diff --git a/NEED/README.md b/NEED/README.md
index b7a06c0434ec7b6fff534258196fb7d36fad4c26..50fa13f8d2ba549b334c1733f9b91578cd8cc2c6 100644
--- a/NEED/README.md
+++ b/NEED/README.md
@@ -20,15 +20,16 @@ Notes (mostly to self):
 * hoping to add PV etc installations soon
 * Bias caused by linkage failure is unknown although the DECC NEED Data Framework report from 2013 suggest match rates of 94%-100% (https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/209264/Annex_B_-_Quality_Assurance.pdf)
 * Both gas and electricity consumption are rounded and the rounding range ('to nearest n') increases through the distributions (see https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/315189/need_dataset_look_ups.xlsx)
-* the E/Gcons*valid variable codes:
-    * 0 = off gas/elec
-    * V = valid reading (gas range 100 - 50,000; electricity range = 100 - 25,000)
+* the Gcons*valid variable codes:
+    * G = Gas consumption invalid, greater than 50,000
     * L = Gas consumption invalid, less than 100
     * M = Gas consumption data is missing in source data
-    * G = Gas consumption invalid, greater than 50,000
+    * 0 = Property does not have a gas connection
+    * V = Valid gas consumption (between 100 and 50,000 inclusive)
     * NB - there are valid gas readings of '0' which presumably were > 100 by < 249 (first gas 'heap' = 'nearest 500')
-
-Notes to DECC (!)
+* the Econs*valid variable codes:
+    * G Electricity consumption invalid, greater than 25,000 (DECC lookup table says 50,000) * L Electricity consumption invalid, less than 100 * M Electricity consumption data is missing in source dataset
+    * V Valid electricity consumption (between 100 and 25,000 inclusive) Notes to DECC (!)
 * ideally could set missing to -99 to aid re-coding and avoid unpleasant surprises in naive analysis?
 * can the consumption rounding be constant through the distributions?
 * check coding of Gcons ref 0 values for 'valid' cases?
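
Reviewer note: the code lists in this diff could be turned into lookup tables for re-coding. The Python below is only a minimal sketch of that idea, not part of the repository. The names `GAS_VALID_CODES`, `ELEC_VALID_CODES`, `describe_flag` and `valid_consumption` are hypothetical, and it assumes each record carries a consumption value alongside its `Gcons*valid` or `Econs*valid` flag.

```python
# Hypothetical lookup tables, transcribed from the code lists in the README diff.
GAS_VALID_CODES = {
    "G": "Gas consumption invalid, greater than 50,000",
    "L": "Gas consumption invalid, less than 100",
    "M": "Gas consumption data is missing in source data",
    "0": "Property does not have a gas connection",
    "V": "Valid gas consumption (between 100 and 50,000 inclusive)",
}

ELEC_VALID_CODES = {
    "G": "Electricity consumption invalid, greater than 25,000",
    "L": "Electricity consumption invalid, less than 100",
    "M": "Electricity consumption data is missing in source dataset",
    "V": "Valid electricity consumption (between 100 and 25,000 inclusive)",
}

_TABLES = {"gas": GAS_VALID_CODES, "elec": ELEC_VALID_CODES}


def describe_flag(fuel, flag):
    """Return the README description for a flag, or None if the code is not listed."""
    return _TABLES[fuel].get(str(flag).strip().upper())


def valid_consumption(value, flag):
    """Return the value only when the flag is 'V', otherwise None.

    Using None (rather than a numeric placeholder) keeps invalid or missing
    readings out of naive sums and means. Values flagged 'V' are returned
    unchanged, including the rounded readings of 0 mentioned in the README.
    """
    return value if str(flag).strip().upper() == "V" else None


if __name__ == "__main__":
    # Made-up example rows: (consumption, flag)
    rows = [(12000, "V"), (0, "M"), (0, "V"), (60000, "G")]
    print([valid_consumption(v, f) for v, f in rows])
```

Returning `None` for every non-`V` flag is one design choice among several; a numeric sentinel such as the -99 suggested in the notes to DECC would also work, provided analyses filter it out explicitly.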