Administrator approval is now required for registering new accounts. If you are registering a new account, and are external to the University, please ask the repository owner to contact ServiceLine to request your account be approved. Repository owners must include the newly registered email address, and specific repository in the request for approval.

* UPRN = unique property reference = linkage mechanism across EPCs, gas/electricity data and EST data on energy efficiency installations (uses AddressBase)

* hoping to add PV etc installations soon

* Bias caused by linkage failure is unknown although the DECC NEED Data Framework report from 2013 suggest match rates of 94%-100% (https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/209264/Annex_B_-_Quality_Assurance.pdf)

* Both gas and electricity consumption are rounded and the rounding range ('to nearest n') increases through the distributions (see https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/315189/need_dataset_look_ups.xlsx)

* Both gas and electricity consumption are rounded and the rounding range ('to nearest n') increases through the distributions (see https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/315189/need_dataset_look_ups.xlsx). The reasons for this are explained in the consultation response at https://www.gov.uk/government/consultations/national-energy-efficiency-data-framework-making-data-available

* the Gcons*valid variable codes:

* G = Gas consumption invalid, greater than 50,000

* L = Gas consumption invalid, less than 100

* M = Gas consumption data is missing in source data

* 0 = Property does not have a gas connection

* V = Valid gas consumption (between 100 and 50,000 inclusive)

* NB - there are valid gas readings of '0' which presumably were > 100 by < 249 (first gas 'heap' = 'nearest 500')

* NB - there are valid gas readings of '0' which presumably were > 100 but < 249 (first gas 'heap' = 'nearest 500')

* the Econs*valid variable codes:

* G Electricity consumption invalid, greater than 25,000 (DECC lookup table says 50,000)
* L Electricity consumption invalid, less than 100
* M Electricity consumption data is missing in source dataset

* V Valid electricity consumption (between 100 and 25,000 inclusive)
Notes to DECC (!)

* ideally could set missing to -99 to aid re-coding and avoid unpleasant surprises in naive analysis?

* can the consumption rounding be constant through the distributions?

* check coding of Gcons ref 0 values for 'valid' cases?

* distinguish between electric & 'other' heating in 'main heating fuel'?

* Ben Anderson, Energy & Climate Change, Faculty of Engineering & Environment, University of Southampton

* b.anderson@soton.ac.uk

* (c) University of Southampton

* Unless there is a different license file in the folder in which this script is found, the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license applies

* http://creativecommons.org/licenses/by-nc/4.0/

clear all

capture noisily log close

* written for Mac OSX - remember to change filesystem delimiter for other platforms

local home "/Users/ben/Documents"

local proot "`home'/Work/Data/Social Science Datatsets/DECC"

* for clam

* local proot "`home'/Work/NEED"

local dpath "`proot'/NEED/End User Licence File 2014/processed"

local rpath "`proot'/results/NEED/"

local version "v1.1"

set more off

log using "`rpath'/analyse-NEED-EULF-2014-descriptives-`version'-$S_DATE.smcl", replace

* use the pre-processed long form file which contains all years of consumption data but not the constant values (housing charactersitics etc) which are in the xwave file

use "`dpath'/need_eul_may2014_longfile.dta", clear

table EconsValid year, c(count Econs min Econs mean Econs max Econs)

table GconsValid year, c(count Gcons min Gcons mean Gcons max Gcons)

* Ben Anderson, Energy & Climate Change, Faculty of Engineering & Environment, University of Southampton

* b.anderson@soton.ac.uk

* (c) University of Southampton

* Unless there is a different license file in the folder in which this script is found, the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license applies

* http://creativecommons.org/licenses/by-nc/4.0/

clear all

capture noisily log close

* written for Mac OSX - remember to change filesystem delimiter for other platforms

local home "/Users/ben/Documents"

local proot "`home'/Work/Data/Social Science Datatsets/DECC"

* for clam

* local proot "`home'/Work/NEED"

local dpath "`proot'/NEED/End User Licence File 2014/processed"

local rpath "`proot'/results/NEED/"

local version "v1.1"

set more off

log using "`rpath'/analyse-NEED-EULF-2014-models-`version'-$S_DATE.smcl", replace

* use the pre-processed wide form file which contains all years of consumption data but not the constant values which are in the xwave file

use "`dpath'/need_eul_may2014_consumptionfile_wide.dta", clear

* we're goinmg to use 2012 data only

keep HH_ID *2012*

* log the consumption as it's very skewed -> becomes semi-normal & OK for linear regression

* Gcons = gas

* Econs = Electricity

* Presumably those without gas use oil or electricity for heating - we don't have oil so we should probably restrict analysis to gas-using hosueholds only to avoid this confounding factor?

* check what's valid

tab Gcons2012Valid Econs2012Valid, mi // what does G,L,M mean? Presumably O = off gas?

tabstat Gcons2012, by(Gcons2012Valid) s(mean min max n)

keep if Gcons2012Valid == "V"

gen log_Gcons2012 = log(Gcons2012)

gen log_Econs2012 = log(Econs2012)

* combine consumption

* treat missing (gas) as 0

egen Allcons2012 = rowtotal(Gcons2012 Econs2012)

gen log_Allcons2012 = log(Allcons2012)

* create log consumption quintiles

egen quinlog_Allcons2012 = cut(log_Allcons2012), group(5)

egen quinlog_Gcons2012 = cut(log_Gcons2012), group(5)

egen quinlog_Econs2012 = cut(log_Econs2012), group(5)

* merge in the pre-processed cross-year fixed values file

merge 1:1 HH_ID using "`dpath'/need_eul_may2014_xwavefile.dta"

* fix some of the variables

* combine IMD: this is a bit dodgy as they are not strictly comparable

gen ba_imd = IMD_ENG

replace ba_imd = IMD_WALES if ba_imd == .

* must use as category variables!!

* set unkown to be 10 -> adds to end of contrasts so can see effect

replace LOFT_DEPTH = 10 if LOFT_DEPTH == .

* set unkown to be 2020 -> adds to end of contrasts so can see effect

tabstat `v'2012, by(MAIN_HEAT_FUEL) s(n mean min max)

* all hhs model

qui: regress log_`v'2012 `generic_hvars' ///

`generic_rvars' ///

i.BOILER_YEAR

est store rlog_`v'2012

di "* -> `v' estat to test for heteroskedasticity & omitted vars"

estat ovtest

estat hettest

* we ought to be testing for linearity too

di "* -> `v' linktest to test for model specification"

di "* if p of _hatsq < 0.05 -> mis-spec"

di "* http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter2/statareg2.htm"

linktest

* models by property type - to see if rsq & coefficients vary

foreach p of local ptypes {

di "* -> testing log_`v'2012 for `pt`p''"

qui: regress log_`v'2012 `generic_hvarsnp' ///

`generic_rvars' ///

i.BOILER_YEAR ///

if PROP_TYPE == `p'

est store rlog_`v'2012_`pt`p''

di "* -> `v' 2012 `pt`p'' - estat to test for heteroskedasticity & omitted vars"

estat ovtest

estat hettest

* we ought to be testing for linearity too

di "* -> `v' `pt`p'' linktest to test for model specification"

di "* if p of _hatsq < 0.05 -> mis-spec"

di "* http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter2/statareg2.htm"

linktest

}

* models for different consumption quintiles - to see if rsq & coefficients vary

foreach q of numlist 0/4 {

di "* -> testing log_`v'2012 for quintile: `q'"

qui: regress log_`v'2012 `generic_hvars' ///

`generic_rvars' ///

i.BOILER_YEAR ///

if quinlog_`v'2012 == `q'

est store rlog_`v'2012q`q'

di "* -> quintile: `q' - estat to test for heteroskedasticity & omitted vars"

estat ovtest

estat hettest

* we ought to be testing for linearity too

di "* -> quintile: `q' - linktest"

di "* if p of _hatsq < 0.05 -> mis-spec"

di "* http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter2/statareg2.htm"

linktest

}

}

* output all the results - that's a lot of t tests!

* we could put them all out in one file but it would be really hard to find the ones you want!

estout rlog_Gcons2012 using "`rpath'/NEED-EULF-2014-log-gas-model-`version'-$S_DATE.txt", replace cells("b se p _star") stats(r2 r2_a N ll)

estout rlog_Gcons2012q* using "`rpath'/NEED-EULF-2014-log-gas-models-quintiles-`version'-$S_DATE.txt", replace cells("b se p _star") stats(r2 r2_a N ll)

estout rlog_Gcons2012_* using "`rpath'/NEED-EULF-2014-log-gas-models-by-property-type-`version'-$S_DATE.txt", replace cells("b se p _star") stats(r2 r2_a N ll)

estout rlog_Econs2012 using "`rpath'/NEED-EULF-2014-log-elec-model-`version'-$S_DATE.txt", replace cells("b se p _star") stats(r2 r2_a N ll)

estout rlog_Econs2012q* using "`rpath'/NEED-EULF-2014-log-elec-models-quintiles-`version'-$S_DATE.txt", replace cells("b se p _star") stats(r2 r2_a N ll)

estout rlog_Econs2012_* using "`rpath'/NEED-EULF-2014-log-elec-models-by-property-type-`version'-$S_DATE.txt", replace cells("b se p _star") stats(r2 r2_a N ll)

estout rlog_Allcons2012 using "`rpath'/NEED-EULF-2014-log-energy-model-`version'-$S_DATE.txt", replace cells("b se p _star") stats(r2 r2_a N ll)

estout rlog_Allcons2012q* using "`rpath'/NEED-EULF-2014-log-energy-models-quintiles-`version'-$S_DATE.txt", replace cells("b se p _star") stats(r2 r2_a N ll)

estout rlog_Allcons2012_* using "`rpath'/NEED-EULF-2014-log-energy-models-by-property-type-`version'-$S_DATE.txt", replace cells("b se p _star") stats(r2 r2_a N ll)