* Bias caused by linkage failure is unknown, although the DECC NEED Data Framework report from 2013 suggests match rates of 94%-100% (https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/209264/Annex_B_-_Quality_Assurance.pdf)
* Both gas and electricity consumption are rounded and the rounding range ('to nearest n') increases through the distributions (see https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/315189/need_dataset_look_ups.xlsx)
* the E/Gcons*valid variable codes:
* 0 = off gas/elec
* V = valid reading (gas range = 100 - 50,000; electricity range = 100 - 25,000)
* the Gcons*valid variable codes:
* G = Gas consumption invalid, greater than 50,000
* L = Gas consumption invalid, less than 100
* M = Gas consumption data is missing in source data
* 0 = Property does not have a gas connection
* V = Valid gas consumption (between 100 and 50,000 inclusive)
* NB - there are valid gas readings of '0', which presumably were between 100 and 249 before rounding (first gas 'heap' = 'nearest 500')
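
The zero-valued 'valid' gas readings are what rounding to the nearest 500 would produce for small readings. A minimal sketch, assuming the first gas band rounds to the nearest 500 as noted above and that ties round up (the function name `round_to_nearest` and the tie rule are assumptions, not DECC's documented method):

```python
def round_to_nearest(value: int, step: int) -> int:
    """Round a non-negative integer to the nearest multiple of step.

    Ties round up (an assumption; the actual DECC rule is not known).
    """
    return (value + step // 2) // step * step


# Readings in the valid range but below 250 collapse to 0.
for raw in (100, 249, 250, 749):
    print(raw, "->", round_to_nearest(raw, 500))
```

Under these assumptions a reading of 100 or 249 is stored as 0 while still being flagged 'V', whereas 250 is stored as 500.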
Notes to DECC (!)
* the Econs*valid variable codes:
* G = Electricity consumption invalid, greater than 25,000 (DECC lookup table says 50,000)
* L = Electricity consumption invalid, less than 100
* M = Electricity consumption data is missing in source dataset
* V = Valid electricity consumption (between 100 and 25,000 inclusive)
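
The code lists above can be read as a simple rule on the unrounded consumption value. The sketch below is one reading of those descriptions, not DECC's actual procedure: the function `valid_flag`, the `has_gas` argument, and the order in which the checks are applied are all assumptions, and the electricity limit uses the 25,000 figure rather than the 50,000 in the lookup table.

```python
from typing import Optional

# Validity limits as stated in the code lists above.
UPPER_LIMIT = {"gas": 50_000, "elec": 25_000}
LOWER_LIMIT = 100


def valid_flag(cons: Optional[float], fuel: str, has_gas: bool = True) -> str:
    """Return a *valid-style code for one annual consumption value.

    fuel is "gas" or "elec"; has_gas is a hypothetical indicator of a
    gas connection, used only for the gas-specific code "0".
    """
    if fuel == "gas" and not has_gas:
        return "0"  # property does not have a gas connection
    if cons is None:
        return "M"  # missing in source data
    if cons > UPPER_LIMIT[fuel]:
        return "G"  # invalid, greater than the upper limit
    if cons < LOWER_LIMIT:
        return "L"  # invalid, less than 100
    return "V"      # valid, limits inclusive


print(valid_flag(30_000, "gas"), valid_flag(30_000, "elec"))
```

The same reading of 30,000 is valid for gas but invalid for electricity, which is why the discrepancy in the lookup table matters.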
Notes to DECC (!)
* ideally, could missing values be set to -99 to aid re-coding and avoid unpleasant surprises in naive analysis?
* can the consumption rounding be constant through the distributions?
* check the coding of Gcons values of 0 in 'valid' cases?
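
Until something like these changes is made, a naive analysis has to filter on the valid flag explicitly. A minimal sketch of that filtering, using made-up (consumption, flag) pairs rather than real NEED records; the helper name `valid_only` is hypothetical:

```python
from statistics import mean

# Hypothetical gas records for one year: (consumption, valid flag).
records = [(12000, "V"), (0, "V"), (60000, "G"), (None, "M"), (50, "L")]


def valid_only(rows):
    """Keep consumption values whose flag is 'V'; drop everything else."""
    return [cons for cons, flag in rows if flag == "V"]


clean = valid_only(records)
print(len(clean), mean(clean))
```

Note that the 0 reading flagged 'V' survives the filter and pulls the mean down, which is the case the last note asks to have checked.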