DECC NEED
============

Extract and analyse data from the anonymised, released versions of DECC's NEED (National Energy Efficiency Data-Framework) dataset.

Original 2014 'End User License' version of the data:
* Available from the UK Data Archive: Study Number 7518 - National Energy Efficiency Data-Framework, 2014 (http://discover.ukdataservice.ac.uk/catalogue/?sn=7518)
* Detailed documentation: https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/332169/need_anonymised_dataset_accompanying_documentation.pdf
* Full coding details of variables at: https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/315189/need_dataset_look_ups.xlsx

You may find that the scripts also work with the Public Use File (https://www.gov.uk/government/statistics/national-energy-efficiency-data-framework-need-anonymised-data-2014), but I have not tested this.
### Terms of Use
GPL: V2 - http://choosealicense.com/licenses/gpl-2.0/

See license file for details.

[YMMV](http://en.wiktionary.org/wiki/YMMV)

Notes (mostly to self)
----------------------
* Gas kWh values are weather-corrected within the 10 DNO distribution zones before delivery to DECC.
* The End User License file (EULF) dataset is a sample of just over 4 million households.
* The EULF is a semi-random sample of the 8 million records that have an Energy Performance Certificate (EPC).
  * It includes only records with valid values on key variables (Property Age, Property Type, Floor Area Band and Energy Efficiency Band) and, especially, valid electricity observations for 2012.
  * Records were selected according to the frequency of each household type in the dataset relative to the total dwelling stock, so that uncommon property types (e.g. older detached properties) are over-represented and common types (e.g. flats, where turnover is high) are under-represented. The supplied weight corrects for this in descriptive analysis.
  * The implications for sample bias are unclear: there may be other systematic biases that the weight does not capture.
* UPRN (Unique Property Reference Number) is the linkage key across EPCs, gas/electricity data and EST data on energy efficiency installations (it uses AddressBase).
  * PV installations were added for the 2015 report; see https://www.gov.uk/government/statistics/national-energy-efficiency-data-framework-need-report-summary-of-analysis-2015
* Bias caused by linkage failure is unknown, although the DECC NEED Data Framework report from 2013 suggests match rates of 94%-100% (https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/209264/Annex_B_-_Quality_Assurance.pdf).
* Both gas and electricity consumption are rounded and the rounding range ('to nearest n') increases through the distributions (see https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/315189/need_dataset_look_ups.xlsx). The reasons for this are explained in the consultation response at https://www.gov.uk/government/consultations/national-energy-efficiency-data-framework-making-data-available
* The `Gcons*valid` variable codes:
  * G = Gas consumption invalid, greater than 50,000
  * L = Gas consumption invalid, less than 100
  * M = Gas consumption data is missing in the source data
  * 0 = Property does not have a gas connection
  * V = Valid gas consumption (between 100 and 50,000 inclusive)
  * NB: there are valid gas readings of '0', which presumably lay between 100 and 249 before rounding (the first gas rounding band is 'to nearest 500').
* The `Econs*valid` variable codes:
  * G = Electricity consumption invalid, greater than 25,000 (the DECC lookup table says 50,000)
  * L = Electricity consumption invalid, less than 100
  * M = Electricity consumption data is missing in the source dataset
  * V = Valid electricity consumption (between 100 and 25,000 inclusive)
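
To illustrate the `Gcons*valid` and `Econs*valid` codes listed above, here is a minimal Python sketch that assigns a code to a single annual consumption value, and shows how rounding to the nearest 500 can turn a valid low gas reading into 0. It is not DECC's processing code: the function names, the hypothetical `has_connection` flag and the round-half-up rule are assumptions, and the electricity upper limit follows these notes (25,000) rather than the lookup table (50,000).

```python
from typing import Optional

MIN_VALID = 100      # below this: code L
GAS_MAX = 50_000     # above this: gas code G
ELEC_MAX = 25_000    # above this: electricity code G (lookup table says 50,000)


def validity_code(kwh: Optional[float], upper: float,
                  has_connection: bool = True) -> str:
    """Assign a Gcons*valid / Econs*valid style code to one annual value.

    Illustrative only. `has_connection` is a hypothetical flag for the
    gas-only '0' code (no gas connection); it is checked first.
    """
    if not has_connection:
        return "0"
    if kwh is None:
        return "M"           # missing in source data
    if kwh > upper:
        return "G"           # implausibly high
    if kwh < MIN_VALID:
        return "L"           # implausibly low
    return "V"               # valid: MIN_VALID <= kwh <= upper


def round_to_nearest(kwh: float, n: int) -> int:
    """Round to the nearest multiple of n, halves rounded up (assumed rule)."""
    return int((kwh + n / 2) // n) * n


print(validity_code(12_000, GAS_MAX))                        # V
print(validity_code(30_000, ELEC_MAX))                       # G
print(validity_code(None, ELEC_MAX))                         # M
print(validity_code(None, GAS_MAX, has_connection=False))    # 0
# A valid gas reading of 180 kWh is reported as 0 after rounding to nearest 500
print(round_to_nearest(180, 500))                            # 0
```

Passing the upper limit as a parameter makes it easy to rerun the classification with 50,000 for electricity if the lookup table turns out to be the correct reference.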

Notes to DECC (!)
-----------------
* Ideally, could missing values be set to -99 to aid re-coding and avoid unpleasant surprises in naive analysis?
* Can the consumption rounding be kept constant across the distributions?
* Check the coding of `Gcons` values of 0 for 'valid' cases?
* Distinguish between electric and 'other' heating in 'main heating fuel'?
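
To illustrate the first point, here is a minimal Python sketch of the re-coding step that a distinctive missing-value code would make easy: the sentinel is converted to `None` before averaging, whereas a naive mean silently includes it. The -99 value is the proposal above rather than the current NEED format, the function names are hypothetical, and the consumption figures are made up.

```python
import math
from typing import Iterable, List, Optional

MISSING = -99  # proposed sentinel for missing values (not the current format)


def recode_missing(values: Iterable[float],
                   sentinel: float = MISSING) -> List[Optional[float]]:
    """Replace the sentinel with None so it cannot enter arithmetic."""
    return [None if v == sentinel else v for v in values]


def mean_ignoring_missing(values: Iterable[Optional[float]]) -> float:
    """Mean of the non-missing values; NaN if nothing is left."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else math.nan


raw = [12_000, -99, 8_000]            # made-up annual gas kWh values
naive = sum(raw) / len(raw)           # -99 silently dragged into the mean
clean = mean_ignoring_missing(recode_missing(raw))
print(naive, clean)
```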