DECC-git NEED
============

Extract & analyse data from the anonymised & released versions of DECC's NEED dataset.

The original 'End User License' version of the data is available from the UK Data Archive: Study Number 7518 - National Energy Efficiency Data-Framework, 2014
http://discover.ukdataservice.ac.uk/catalogue/?sn=7518

For full detailed documentation see https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/332169/need_anonymised_dataset_accompanying_documentation.pdf

Notes (mostly to self):
* Gas kWh are weather corrected within the 10 DNO distribution zones before delivery to DECC
* The End User License file (EULF) dataset is a sample of just over 4 million households
* EULF is a semi-random sample of the 8 million records which have an Energy Performance Certificate.
 * It includes only those with valid values on key variables (Property Age, Property Type, Floor Area Band and Energy Efficiency Band) and (especially) valid observations for electricity in 2012.
 * Records were selected based on the frequency of household type in the dataset relative to the total dwelling stock so that uncommon property types (e.g. older detached properties) are over-represented and common types (e.g. flats where turnover is high) are under-represented. The supplied weight corrects for this for descriptive analysis.
 * Implications for sample bias unclear - there may be other systematic biases not captured by the weight?
* UPRN = unique property reference number = linkage mechanism (uses AddressBase)
* Bias caused by linkage failure is unknown, although the DECC NEED Data Framework report from 2011 suggests match rates of 94%-100% (https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/209264/Annex_B_-_Quality_Assurance.pdf)
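
A minimal sketch of how the supplied weight might be applied in descriptive analysis, using only the Python standard library. The field names (`prop_type`, `elec_kwh`, `weight`) and the three rows are made up for illustration and are not the variable names or values in the EULF; check the documentation for the real ones.

```python
def weighted_mean(records, value_key, weight_key):
    """Weighted mean of value_key, skipping records where the value is None."""
    total = 0.0
    weight_sum = 0.0
    for rec in records:
        value = rec.get(value_key)
        if value is None:
            continue
        total += rec[weight_key] * value
        weight_sum += rec[weight_key]
    if weight_sum == 0:
        raise ValueError("no valid observations to average")
    return total / weight_sum


# Hypothetical rows: the over-sampled detached property gets a weight below 1,
# the under-sampled flats get weights above 1.
sample = [
    {"prop_type": "detached", "elec_kwh": 6000.0, "weight": 0.5},
    {"prop_type": "flat", "elec_kwh": 2000.0, "weight": 2.0},
    {"prop_type": "flat", "elec_kwh": 3000.0, "weight": 1.5},
]

unweighted = sum(r["elec_kwh"] for r in sample) / len(sample)
weighted = weighted_mean(sample, "elec_kwh", "weight")

print(round(unweighted, 1))  # 3666.7
print(weighted)              # 2875.0
```

In this toy example the unweighted mean is pulled up by the over-represented detached property; the weighted mean down-weights it. This only corrects for the sampling design, not for any other bias noted above.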

Issues:
* the E/Gcons*valid variable has some undefined labels (L,M,G):
 * 0 = off gas/elec (documented)
 * V = valid reading  (documented: gas range 0 - 50,000; electricity range = 100 - 25,000)
 * L = large? (> 50,000 for gas or > 25,000 for electricity?)
 * M = missing?
 * G = ?
* Ideally DECC should set missing values to -99 to aid re-coding and avoid unpleasant surprises in naive analysis!
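
Until the labels are clarified, one defensive approach is to tabulate the validity flag first and then keep consumption only where the flag is `V`. The sketch below assumes each record has been read as a (flag, kWh string) pair. The rows are invented for illustration, and treating `L`, `M` and `G` as unusable is an assumption, not documented behaviour.

```python
from collections import Counter

# Flags whose meaning is documented; anything else is reported for inspection.
DOCUMENTED_FLAGS = {"0", "V"}


def clean_consumption(kwh, flag):
    """Return kwh as a float only for valid ('V') readings, otherwise None."""
    return float(kwh) if flag == "V" else None


# Hypothetical (flag, kWh) pairs as they might come out of a CSV reader.
rows = [
    ("V", "3500"),
    ("0", ""),
    ("L", "61000"),
    ("M", ""),
    ("G", "4200"),
    ("V", "12000"),
]

counts = Counter(flag for flag, _ in rows)
undocumented = sorted(set(counts) - DOCUMENTED_FLAGS)
cleaned = [clean_consumption(kwh, flag) for flag, kwh in rows]
valid = [v for v in cleaned if v is not None]

print(undocumented)             # ['G', 'L', 'M']
print(sum(valid) / len(valid))  # 7750.0
```

Checking the flag counts before any analysis makes it obvious how many records would be dropped, and using `None` rather than a numeric placeholder means a forgotten filter fails loudly instead of skewing a mean.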