Administrator approval is now required for registering new accounts. If you are registering a new account, and are external to the University, please ask the repository owner to contact ServiceLine to request your account be approved. Repository owners must include the newly registered email address, and specific repository in the request for approval.

* Ben Anderson, Energy & Climate Change, Faculty of Engineering & Environment, University of Southampton

* b.anderson@soton.ac.uk

* (c) University of Southampton

* Unless there is a different license file in the folder in which this script is found, the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license applies

* http://creativecommons.org/licenses/by-nc/4.0/

*******************************************

* Script to:

* - Extract data from DECC LSOA level domestic energy consumption data (https://www.gov.uk/government/collections/mlsoa-and-llsoa-electricity-and-gas-estimates)

/*

Copyright (C) 2014 University of Southampton

Author: Ben Anderson (b.anderson@soton.ac.uk, @dataknut, https://github.com/dataknut)

[Energy & Climate Change, Faculty of Engineering & Environment, University of Southampton]

This program is free software; you can redistribute it and/or modify

it under the terms of the GNU General Public License as published by

the Free Software Foundation; either version 2 of the License

(http://choosealicense.com/licenses/gpl-2.0/), or (at your option) any later version.

This program is distributed in the hope that it will be useful,

but WITHOUT ANY WARRANTY; without even the implied warranty of

MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the

* Ben Anderson, Energy & Climate Change, Faculty of Engineering & Environment, University of Southampton

* b.anderson@soton.ac.uk

* (c) University of Southampton

* NB this script uses 2 data files derived from the original data using the 'process' script

/*

Copyright (C) 2014 University of Southampton

Author: Ben Anderson (b.anderson@soton.ac.uk, @dataknut, https://github.com/dataknut)

[Energy & Climate Change, Faculty of Engineering & Environment, University of Southampton]

This program is free software; you can redistribute it and/or modify

it under the terms of the GNU General Public License as published by

the Free Software Foundation; either version 2 of the License

(http://choosealicense.com/licenses/gpl-2.0/), or (at your option) any later version.

This program is distributed in the hope that it will be useful,

but WITHOUT ANY WARRANTY; without even the implied warranty of

MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the

GNU General Public License for more details.

#YMMV - http://en.wiktionary.org/wiki/YMMV

* Unless there is a different license file in the folder in which this script is found, the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license applies

* Ben Anderson, Energy & Climate Change, Faculty of Engineering & Environment, University of Southampton

* b.anderson@soton.ac.uk

* (c) University of Southampton

* Unless there is a different license file in the folder in which this script is found, the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license applies

* http://creativecommons.org/licenses/by-nc/4.0/

clear all

capture noisily log close

* written for Mac OSX - remember to change filesystem delimiter for other platforms

local home "/Users/ben/Documents"

local proot "`home'/Work/Data/Social Science Datatsets/DECC"

* for clam

* local proot "`home'/Work/NEED"

local dpath "`proot'/NEED/End User Licence File 2014/processed"

local rpath "`proot'/results/NEED/"

* local verrsion "1.0"

* initial models - all households for electricity models

local verrsion "1.1"

* restrict to gas only households to avoid complications of:

* - primary electric heating (presumably)

* - oil heating

set more off

log using "`rpath'/analyse-NEED-EULF-2014-models-`version'-$S_DATE.smcl", replace

* use the pre-processed wide form file which contains all years of consumption data but not the constant values which are in the xwave file

use "`dpath'/need_eul_may2014_consumptionfile_wide.dta", clear

* we're going to use 2012 data only

keep HH_ID *2012*

* log the consumption as it's very skewed -> becomes semi-normal & OK for linear regression

* Gcons = gas

* Econs = Electricity

* Presumably those without gas use oil or electricity for heating - we don't have oil so we should probably restrict analysis to gas-using hosueholds only to avoid this confounding factor?

* check what's valid

tab Gcons2012Valid Econs2012Valid, mi // what does G,L,M mean? Presumably O = off gas?

tabstat Gcons2012, by(Gcons2012Valid) s(mean min max n)

keep if Gcons2012Valid == "V"

gen log_Gcons2012 = log(Gcons2012)

gen log_Econs2012 = log(Econs2012)

* combine consumption

* treat missing (gas) as 0

egen Allcons2012 = rowtotal(Gcons2012 Econs2012)

gen log_Allcons2012 = log(Allcons2012)

* create log consumption quintiles

egen quinlog_Allcons2012 = cut(log_Allcons2012), group(5)

egen quinlog_Gcons2012 = cut(log_Gcons2012), group(5)

egen quinlog_Econs2012 = cut(log_Econs2012), group(5)

* merge in the pre-processed cross-year fixed values file

merge 1:1 HH_ID using "`dpath'/need_eul_may2014_xwavefile.dta"

* fix some of the variables

* combine IMD: this is a bit dodgy as they are not strictly comparable

gen ba_imd = IMD_ENG

replace ba_imd = IMD_WALES if ba_imd == .

* must use as category variables!!

* set unkown to be 10 -> adds to end of contrasts so can see effect

replace LOFT_DEPTH = 10 if LOFT_DEPTH == .

* set unkown to be 2020 -> adds to end of contrasts so can see effect

tabstat `v'2012, by(MAIN_HEAT_FUEL) s(n mean min max)

* all hhs model

qui: regress log_`v'2012 `generic_hvars' ///

`generic_rvars' ///

i.BOILER_YEAR

est store rlog_`v'2012

di "* -> `v' estat to test for heteroskedasticity & omitted vars"

estat ovtest

estat hettest

* we ought to be testing for linearity too

di "* -> `v' linktest to test for model specification"

di "* if p of _hatsq < 0.05 -> mis-spec"

di "* http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter2/statareg2.htm"

linktest

* models by property type - to see if rsq & coefficients vary

foreach p of local ptypes {

di "* -> testing log_`v'2012 for `pt`p''"

qui: regress log_`v'2012 `generic_hvarsnp' ///

`generic_rvars' ///

i.BOILER_YEAR ///

if PROP_TYPE == `p'

est store rlog_`v'2012_`pt`p''

di "* -> `v' 2012 `pt`p'' - estat to test for heteroskedasticity & omitted vars"

estat ovtest

estat hettest

* we ought to be testing for linearity too

di "* -> `v' `pt`p'' linktest to test for model specification"

di "* if p of _hatsq < 0.05 -> mis-spec"

di "* http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter2/statareg2.htm"

linktest

}

* models for different consumption quintiles - to see if rsq & coefficients vary

foreach q of numlist 0/4 {

di "* -> testing log_`v'2012 for quintile: `q'"

qui: regress log_`v'2012 `generic_hvars' ///

`generic_rvars' ///

i.BOILER_YEAR ///

if quinlog_`v'2012 == `q'

est store rlog_`v'2012q`q'

di "* -> quintile: `q' - estat to test for heteroskedasticity & omitted vars"

estat ovtest

estat hettest

* we ought to be testing for linearity too

di "* -> quintile: `q' - linktest"

di "* if p of _hatsq < 0.05 -> mis-spec"

di "* http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter2/statareg2.htm"

linktest

}

}

* output all the results - that's a lot of t tests!

* we could put them all out in one file but it would be really hard to find the ones you want!

estout rlog_Gcons2012 using "`rpath'/NEED-EULF-2014-log-gas-model-`version'-$S_DATE.txt", replace cells("b se p _star") stats(r2 r2_a N ll)

estout rlog_Gcons2012q* using "`rpath'/NEED-EULF-2014-log-gas-models-quintiles-`version'-$S_DATE.txt", replace cells("b se p _star") stats(r2 r2_a N ll)

estout rlog_Gcons2012_* using "`rpath'/NEED-EULF-2014-log-gas-models-by-property-type-`version'-$S_DATE.txt", replace cells("b se p _star") stats(r2 r2_a N ll)

estout rlog_Econs2012 using "`rpath'/NEED-EULF-2014-log-elec-model-`version'-$S_DATE.txt", replace cells("b se p _star") stats(r2 r2_a N ll)

estout rlog_Econs2012q* using "`rpath'/NEED-EULF-2014-log-elec-models-quintiles-`version'-$S_DATE.txt", replace cells("b se p _star") stats(r2 r2_a N ll)

estout rlog_Econs2012_* using "`rpath'/NEED-EULF-2014-log-elec-models-by-property-type-`version'-$S_DATE.txt", replace cells("b se p _star") stats(r2 r2_a N ll)

estout rlog_Allcons2012 using "`rpath'/NEED-EULF-2014-log-energy-model-`version'-$S_DATE.txt", replace cells("b se p _star") stats(r2 r2_a N ll)

estout rlog_Allcons2012q* using "`rpath'/NEED-EULF-2014-log-energy-models-quintiles-`version'-$S_DATE.txt", replace cells("b se p _star") stats(r2 r2_a N ll)

estout rlog_Allcons2012_* using "`rpath'/NEED-EULF-2014-log-energy-models-by-property-type-`version'-$S_DATE.txt", replace cells("b se p _star") stats(r2 r2_a N ll)

* Ben Anderson, Energy & Climate Change, Faculty of Engineering & Environment, University of Southampton

* b.anderson@soton.ac.uk

* (c) University of Southampton

/*

* Unless there is a different license file in the folder in which this script is found, the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license applies

* http://creativecommons.org/licenses/by-nc/4.0/

Copyright (C) 2014 University of Southampton

Author: Ben Anderson (b.anderson@soton.ac.uk, @dataknut, https://github.com/dataknut)

[Energy & Climate Change, Faculty of Engineering & Environment, University of Southampton]

This program is free software; you can redistribute it and/or modify

it under the terms of the GNU General Public License as published by

the Free Software Foundation; either version 2 of the License

(http://choosealicense.com/licenses/gpl-2.0/), or (at your option) any later version.

This program is distributed in the hope that it will be useful,

but WITHOUT ANY WARRANTY; without even the implied warranty of

MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the

* Gcons<year>valid variable has undefined labels: G, L, M = ? Presumably 0 = off gas & V = valid?

* ideally DECC should set missing to -99 to aid re-coding and avoid unpleasant surprises in naive analysis!

* Author: Ben Anderson, Energy & Climate Change, Faculty of Engineering & Environment, University of Southampton

* b.anderson@soton.ac.uk

* (c) University of Southampton

/*

* Unless there is a different license file in the folder in which this script is found, the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license applies

* http://creativecommons.org/licenses/by-nc/4.0/

Copyright (C) 2014 University of Southampton

Author: Ben Anderson (b.anderson@soton.ac.uk, @dataknut, https://github.com/dataknut)

[Energy & Climate Change, Faculty of Engineering & Environment, University of Southampton]

This program is free software; you can redistribute it and/or modify

it under the terms of the GNU General Public License as published by

the Free Software Foundation; either version 2 of the License

(http://choosealicense.com/licenses/gpl-2.0/), or (at your option) any later version.

This program is distributed in the hope that it will be useful,

but WITHOUT ANY WARRANTY; without even the implied warranty of

MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the

The charts here show r sq values for different 'kichen sink' multivariate regression models. The idea was to see how much of the variation in gas & electricity consumption could be attributed to the 'physical' variables in the datasets and thus that the residual would represent the 'effect' of people's behaviour on energy consumption - see stats code for explanations!