From 0f4889dcbe5e4d9ae5ef4730d61530982434e953 Mon Sep 17 00:00:00 2001
From: Ben Anderson <b.anderson@soton.ac.uk>
Date: Tue, 16 Sep 2014 09:43:08 +0100
Subject: [PATCH] updated readme following DECC NEED user event
---
 NEED/README.md | 33 ++++++++++++++++++++-------------
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/NEED/README.md b/NEED/README.md
index 3107586..b7a06c0 100644
--- a/NEED/README.md
+++ b/NEED/README.md
@@ -3,10 +3,11 @@ DECC-git NEED
 
 Extract & analyse data from the anonymised & released versions of DECC's NEED dataset.
 
-Original 'End User License' version of the data available from: UK DATA ARCHIVE: Study Number 7518 - National Energy Efficiency Data-Framework, 2014
+Original 'End User License' version of the data:
+* available from: UK DATA ARCHIVE: Study Number 7518 - National Energy Efficiency Data-Framework, 2014
 http://discover.ukdataservice.ac.uk/catalogue/?sn=7518
-
-For full detailed documentation see https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/332169/need_anonymised_dataset_accompanying_documentation.pdf
+* Detailed documentation: https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/332169/need_anonymised_dataset_accompanying_documentation.pdf
+* Full coding details of variables at: https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/315189/need_dataset_look_ups.xlsx
 
 Notes (mostly to self):
 * gas kwh are weather corrected within the 10 DNO distribution zones before delivery to DECC
@@ -15,15 +16,21 @@ Notes (mostly to self):
 * It includes only those with valid values on key variables (Property Age, Property Type, Floor Area Band and Energy Efficiency Band) and (especially) valid observations for electricity in 2012.
 * Records were selected based on the frequency of household type in the dataset relative to the total dwelling stock so that uncommon property types (e.g. older detached properties) are over-represented and common types (e.g.
flats where turnover is high) are under-represented. The supplied weight corrects for this for descriptive analysis.
 * Implications for sample bias unclear - there may be other systematic biases not captured by the weight?
-* UPRN = unique property reference = linkage mechanism (uses AddressBase)
-* Bias caused by linkage failure is unknown although the DECC NEED Data Framework report from 2011 suggest match rates of 94%-100% (https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/209264/Annex_B_-_Quality_Assurance.pdf)
+* UPRN = unique property reference = linkage mechanism across EPCs, gas/electricity data and EST data on energy efficiency installations (uses AddressBase)
+ * hoping to add PV etc installations soon
+* Bias caused by linkage failure is unknown although the DECC NEED Data Framework report from 2013 suggest match rates of 94%-100% (https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/209264/Annex_B_-_Quality_Assurance.pdf)
+* Both gas and electricity consumption are rounded and the rounding range ('to nearest n') increases through the distributions (see https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/315189/need_dataset_look_ups.xlsx)
+* the E/Gcons*valid variable codes:
+ * 0 = off gas/elec
+ * V = valid reading (gas range 100 - 50,000; electricity range = 100 - 25,000)
+ * L = Gas consumption invalid, less than 100
+ * M = Gas consumption data is missing in source data
+ * G = Gas consumption invalid, greater than 50,000
+ * NB - there are valid gas readings of '0' which presumably were > 100 by < 249 (first gas 'heap' = 'nearest 500')
-Issues:
-* the E/Gcons*valid variable has some undefined labels (L,M,G):
- * 0 = off gas/elec (documented)
- * V = valid reading (documented: gas range 0 - 50,000; electricity range = 100 - 25,000)
- * L = large? (> 50k or 25k depending?)
- * M = missing?
- * G = ?
-* ideally DECC should set missing to -99 to aid re-coding and avoid unpleasant surprises in naive analysis!
+Notes to DECC (!)
+* ideally could set missing to -99 to aid re-coding and avoid unpleasant surprises in naive analysis?
+* can the consumption rounding be constant through the distributions?
+* check coding of Gcons ref 0 values for 'valid' cases?
+YMMV
\ No newline at end of file
--
GitLab
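
A minimal sketch of how the validity codes in the README notes above might be used before any descriptive analysis. It is not part of this patch or of DECC's documentation. The record keys used in the example (`gcons`, `gvalid`, `weight`) and both function names are hypothetical; substitute the variable names from the look-ups spreadsheet. Only readings flagged `V` are kept, so the `0`, `L`, `M` and `G` cases never enter a mean by accident.

```python
def usable_kwh(value, flag):
    """Return the reading as a float only when its validity flag is 'V'.

    Any other flag (0, L, M, G in the README notes) yields None, so a
    missing or out-of-range reading cannot be mistaken for a real number.
    """
    if flag != "V":
        return None
    return float(value)


def weighted_mean_kwh(records, value_key, flag_key, weight_key):
    """Weighted mean of valid readings; None if no valid reading exists.

    records is an iterable of dicts; the three key names are supplied by
    the caller because the real column names are not assumed here.
    """
    total = 0.0
    weight_sum = 0.0
    for rec in records:
        kwh = usable_kwh(rec[value_key], rec[flag_key])
        if kwh is None:
            continue  # skip off-gas, missing and invalid readings
        weight = float(rec[weight_key])
        total += kwh * weight
        weight_sum += weight
    if weight_sum == 0:
        return None
    return total / weight_sum


# Made-up example rows, for illustration only (not NEED data).
example = [
    {"gcons": 10000, "gvalid": "V", "weight": 2},
    {"gcons": 20000, "gvalid": "V", "weight": 1},
    {"gcons": 0, "gvalid": "M", "weight": 5},
]
mean_gas = weighted_mean_kwh(example, "gcons", "gvalid", "weight")
```

Note that a reading of 0 flagged `V` is kept as 0.0 here. Given the NB above about valid gas readings rounded to 0, it may be safer to count those cases separately before trusting a mean.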