* UPRN (Unique Property Reference Number) = the linkage key across EPCs, gas/electricity consumption data and EST data on energy efficiency installations (uses AddressBase)
* hoping to add PV and other installations soon
* Bias caused by linkage failure is unknown, although the 2013 DECC NEED Data Framework report suggests match rates of 94-100% (https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/209264/Annex_B_-_Quality_Assurance.pdf)
* Both gas and electricity consumption are rounded and the rounding range ('to nearest n') increases through the distributions (see https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/315189/need_dataset_look_ups.xlsx). The reasons for this are explained in the consultation response at https://www.gov.uk/government/consultations/national-energy-efficiency-data-framework-making-data-available
* the Gcons*valid variable codes:
* G = Gas consumption invalid, greater than 50,000
* L = Gas consumption invalid, less than 100
* M = Gas consumption data is missing in source data
* 0 = Property does not have a gas connection
* V = Valid gas consumption (between 100 and 50,000 inclusive)
* NB - there are valid gas readings of '0', which presumably were between 100 and 249 before rounding (the first gas 'heap' is rounded to the nearest 500)
* the Econs*valid variable codes:
* G = Electricity consumption invalid, greater than 25,000 (DECC lookup table says 50,000)
* L = Electricity consumption invalid, less than 100
* M = Electricity consumption data is missing in source dataset
* V = Valid electricity consumption (between 100 and 25,000 inclusive)
* Notes to DECC (!)
* ideally could set missing to -99 to aid re-coding and avoid unpleasant surprises in naive analysis?
* can the consumption rounding be constant through the distributions?
* check the coding of Gcons readings of 0 that are flagged as 'valid'?
* distinguish between electric & 'other' heating in 'main heating fuel'?
* Ben Anderson, Energy & Climate Change, Faculty of Engineering & Environment, University of Southampton
* b.anderson@soton.ac.uk
* (c) University of Southampton
* Unless there is a different license file in the folder in which this script is found, the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license applies
* http://creativecommons.org/licenses/by-nc/4.0/
clear all
capture noisily log close
* written for Mac OSX - remember to change filesystem delimiter for other platforms
* use the pre-processed long-form file, which contains all years of consumption data but not the constant values (housing characteristics etc.), which are in the xwave file
* log-transform consumption because it is heavily right-skewed; the logged values are roughly normal and better suited to linear regression
* Gcons = gas
* Econs = Electricity
* Presumably those without gas use oil or electricity for heating. We have no oil consumption data, so we should probably restrict the analysis to gas-using households only to avoid this confounding factor?
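* A minimal sketch of the log transform described above. It assumes that
* Gcons2012 and Econs2012 are numeric consumption variables (kWh) and that
* Gcons2012Valid / Econs2012Valid are string variables holding the G/L/M/0/V
* codes from the header notes (if they are numeric with value labels, the
* == "V" comparisons will need changing). The new names lnGcons2012 and
* lnEcons2012 are hypothetical. Note that ln(0) returns missing in Stata,
* so any 'valid' zero readings drop out of the logged variables.
capture drop lnGcons2012 lnEcons2012
generate double lnGcons2012 = ln(Gcons2012) if Gcons2012Valid == "V"
generate double lnEcons2012 = ln(Econs2012) if Econs2012Valid == "V"
* compare skewness before and after logging
summarize Gcons2012 lnGcons2012 Econs2012 lnEcons2012, detail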
* check what's valid
tab Gcons2012Valid Econs2012Valid, mi // codes G, L, M, 0 and V are defined in the header notes (0 = no gas connection)
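* Sketch: flag households with a valid gas reading (the gas-users-only
* restriction suggested above) and count the 'valid' zero gas readings noted
* in the header. Assumes the *Valid flags are string-coded as in the header
* notes; the flag name gas_sample is hypothetical.
capture drop gas_sample
generate byte gas_sample = (Gcons2012Valid == "V")
count if gas_sample & Gcons2012 == 0
* electricity validity within the valid-gas sample
tab Econs2012Valid if gas_sample, mi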
* output all the results - that's a lot of t tests!
* we could put them all out in one file but it would be really hard to find the ones you want!
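* Sketch: the estout calls below read stored estimates named rlog_Gcons2012,
* rlog_Gcons2012q*, rlog_Gcons2012_* and so on. One way such names could be
* created is with -estimates store- after each regression. This assumes a
* logged gas variable, hypothetically named lnGcons2012, already exists. The
* local xvars is a hypothetical placeholder for the covariate list (not given
* in this section); left empty it fits a constant-only model.
* estout itself is a user-written package: ssc install estout
local xvars ""
regress lnGcons2012 `xvars'
estimates store rlog_Gcons2012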
estout rlog_Gcons2012 using "`rpath'/NEED-EULF-2014-log-gas-model-`version'-$S_DATE.txt", replace cells("b se p _star") stats(r2 r2_a N ll)
estout rlog_Gcons2012q* using "`rpath'/NEED-EULF-2014-log-gas-models-quintiles-`version'-$S_DATE.txt", replace cells("b se p _star") stats(r2 r2_a N ll)
estout rlog_Gcons2012_* using "`rpath'/NEED-EULF-2014-log-gas-models-by-property-type-`version'-$S_DATE.txt", replace cells("b se p _star") stats(r2 r2_a N ll)
estout rlog_Econs2012 using "`rpath'/NEED-EULF-2014-log-elec-model-`version'-$S_DATE.txt", replace cells("b se p _star") stats(r2 r2_a N ll)
estout rlog_Econs2012q* using "`rpath'/NEED-EULF-2014-log-elec-models-quintiles-`version'-$S_DATE.txt", replace cells("b se p _star") stats(r2 r2_a N ll)
estout rlog_Econs2012_* using "`rpath'/NEED-EULF-2014-log-elec-models-by-property-type-`version'-$S_DATE.txt", replace cells("b se p _star") stats(r2 r2_a N ll)
estout rlog_Allcons2012 using "`rpath'/NEED-EULF-2014-log-energy-model-`version'-$S_DATE.txt", replace cells("b se p _star") stats(r2 r2_a N ll)
estout rlog_Allcons2012q* using "`rpath'/NEED-EULF-2014-log-energy-models-quintiles-`version'-$S_DATE.txt", replace cells("b se p _star") stats(r2 r2_a N ll)
estout rlog_Allcons2012_* using "`rpath'/NEED-EULF-2014-log-energy-models-by-property-type-`version'-$S_DATE.txt", replace cells("b se p _star") stats(r2 r2_a N ll)