analyse-NEED-EULF-2014-models-v2.0.do 9.45 KB
Newer Older
1
2
* Script to analyse DECC's 2014 EULF NEED data to:
* - investigate % variance of energy consumption due to dwelling type variables as a way to infer the % of variance due to people
3
4
5
6
7
8

* NB this script uses 2 data files derived from the original data using the 'process' script

* Original data available from: UK DATA ARCHIVE: Study Number 7518 - National Energy Efficiency Data-Framework, 2014
* http://discover.ukdataservice.ac.uk/catalogue/?sn=7518

9
/*   
10

11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
Copyright (C) 2014  University of Southampton

Author: Ben Anderson (b.anderson@soton.ac.uk, @dataknut, https://github.com/dataknut) 
	[Energy & Climate Change, Faculty of Engineering & Environment, University of Southampton]

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License 
(http://choosealicense.com/licenses/gpl-2.0/), or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

#YMMV - http://en.wiktionary.org/wiki/YMMV

*/
29
30
31
32
33
34
35
36
37
38
39

clear all

capture noisily log close

* written for Mac OSX - remember to change filesystem delimiter for other platforms
local home "/Users/ben/Documents"
local proot "`home'/Work/Data/Social Science Datatsets/DECC"
* for clam
* local proot "`home'/Work/NEED"
local dpath "`proot'/NEED/End User Licence File 2014/processed"
Ben Anderson's avatar
Ben Anderson committed
40
41
local rpath "`proot'/results/NEED"

Ben Anderson's avatar
Ben Anderson committed
42
*local version "1.0"
Ben Anderson's avatar
Ben Anderson committed
43
44
* initial models - all households for electricity models

Ben Anderson's avatar
Ben Anderson committed
45
*local version "1.1"
Ben Anderson's avatar
Ben Anderson committed
46
47
48
49
* restrict to gas only households to avoid complications of:
* - primary electric heating (presumably)
* - oil heating

Ben Anderson's avatar
Ben Anderson committed
50
*local version "v2_1pc"
Ben Anderson's avatar
Ben Anderson committed
51
52
53
54
55
56
*local sample 1
*local sampleby "EE_BAND PROP_TYPE"
* changed from using log consumption to consumption decile to avoid complications due to variable rounding ranges in original data (see readme)
* restricted analysis to households where gas is main heat source as it is better predicted by variables included & is more relevant to EPC (heat)
* uses 1% sample (c 30k) making sure keep proportions of property type and EE_Band to see if linktest fails with smaller n

57
58
59
local version "v2_10pc"
local sample 10
local sampleby "EE_BAND PROP_TYPE"
Ben Anderson's avatar
Ben Anderson committed
60
61
62
* uses 10% sample (c 300k) making sure keep proportions of property type and EE_Band to see if margin plots and co-efficients are the same 
* (linktest etc will probably now fail due to larger n)

63
64
65
*local version "v2_100pc"
*local sample 100
*local sampleby "EE_BAND PROP_TYPE"
66
* uses full sample (c 3m) to see if margin plots and co-efficients are the same 
Ben Anderson's avatar
Ben Anderson committed
67
* (linktest etc will probably now fail due to larger n)
68
69
70
71
72
73

set more off

log using "`rpath'/analyse-NEED-EULF-2014-models-`version'-$S_DATE.smcl", replace

* use the pre-processed wide form file which contains all years of consumption data but not the constant values which are in the xwave file
Ben Anderson's avatar
Ben Anderson committed
74
use "`dpath'/need_eul_may2014_consumptionfile_wide_`sample'pc.dta", clear
75

Ben Anderson's avatar
Ben Anderson committed
76
* we're going to use 2012 data only
77
78
79

keep HH_ID *2012*

Ben Anderson's avatar
Ben Anderson committed
80
* merge in the pre-processed cross-year fixed values file
Ben Anderson's avatar
Ben Anderson committed
81
merge 1:1 HH_ID using "`dpath'/need_eul_may2014_xwavefile_`sample'pc.dta"
82
83

* check what's valid
Ben Anderson's avatar
Ben Anderson committed
84
tab Gcons2012Valid Econs2012Valid, mi // O = off gas, V = valid, L = too low, G = too big, M = missing
85
tabstat Gcons2012, by(Gcons2012Valid) s(mean min max n)
Ben Anderson's avatar
Ben Anderson committed
86
87
* do off-gas use a lot more electricty (heating)?
tabstat Econs2012, by(Gcons2012Valid) s(mean min max n)
88

Ben Anderson's avatar
Ben Anderson committed
89
histogram Gcons2012, by(MAIN_HEAT_FUEL, total) name(histo_Gcons2012)
Ben Anderson's avatar
Ben Anderson committed
90
graph export "`rpath'/graphs/histo_Gcons2012_by_main_heating_fuel_`version'.png", replace
91

Ben Anderson's avatar
Ben Anderson committed
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
tabstat Gcons2012, by(MAIN_HEAT_FUEL) s(n mean min max)

* keep if valid gas & gas = main heat fuel
keep if Gcons2012Valid == "V" & MAIN_HEAT_FUEL == 1

***** random sample ****
* select a random sample but ensure proportions of sampleby are kept
di "* Keeping `sample'% sample by `sampleby'"
sample `sample', by(`sampleby')

tab `sampleby', mi

* log the consumption as it's very skewed -> becomes semi-normal & OK for linear regression
* Gcons = gas
* Econs = Electricity

* create log & deciles
* log - creates a normal distribution
* deciles - avoids the consumption rounding range differences (hopefully)
111
gen log_Gcons2012 = log(Gcons2012)
Ben Anderson's avatar
Ben Anderson committed
112
egen Gcons2012_dec = cut(Gcons2012), group(10)
113
gen log_Econs2012 = log(Econs2012)
Ben Anderson's avatar
Ben Anderson committed
114
egen Econs2012_dec = cut(Econs2012), group(10)
115
116
117
118
119

* combine consumption
* treat missing (gas) as 0
egen Allcons2012 = rowtotal(Gcons2012 Econs2012)

Ben Anderson's avatar
Ben Anderson committed
120
121
*gen log_Allcons2012 = log(Allcons2012)
egen Allcons2012_dec = cut(Allcons2012), group(10)
122

Ben Anderson's avatar
Ben Anderson committed
123
124
125
126
* create log consumption quintiles
*egen quinlog_Allcons2012 = cut(log_Allcons2012), group(5)
*egen quinlog_Gcons2012 = cut(log_Gcons2012), group(5)
*egen quinlog_Econs2012 = cut(log_Econs2012), group(5)
127
128
129
130
131
132
133
134

* fix some of the variables

* combine IMD: this is a bit dodgy as they are not strictly comparable
gen ba_imd = IMD_ENG
replace ba_imd = IMD_WALES if ba_imd == .

* household level vars
135
* for models
Ben Anderson's avatar
Ben Anderson committed
136
local generic_hvars "i.BOILER_YEAR i.MAIN_HEAT_FUEL i.LI_YEAR i.LOFT_DEPTH i.FLOOR_AREA_BAND WALL_CONS i.CWI_YEAR i.PROP_TYPE i.PROP_AGE i.EE_BAND"
137
138
139
* for graphs
local generic_hvarsg "BOILER_YEAR MAIN_HEAT_FUEL LI_YEAR LOFT_DEPTH FLOOR_AREA_BAND WALL_CONS CWI_YEAR PROP_TYPE PROP_AGE EE_BAND"
* for models by type
Ben Anderson's avatar
Ben Anderson committed
140
local generic_hvarsnp "i.BOILER_YEAR i.MAIN_HEAT_FUEL i.LI_YEAR i.LOFT_DEPTH i.FLOOR_AREA_BAND WALL_CONS i.CWI_YEAR i.PROP_AGE i.EE_BAND"
141
142
143
144
145
146
147
148
149
150

* area level vars
local generic_rvars "i.ba_region i.ba_imd"

* define different property types
local ptypes "101 102 103 104 105 106"
local pt101 "detached"
local pt102 "semi"
local pt103 "end_terr"
local pt104 "mid_terr"
Ben Anderson's avatar
Ben Anderson committed
151
local pt105 "bungalow"
152
153
local pt106 "flat"

Ben Anderson's avatar
Ben Anderson committed
154
155
156
* check for weirdness
su `generic_hvarsg'

Ben Anderson's avatar
Ben Anderson committed
157
158
159
* now loop over the energy types & run linear regression models
* NB - the rounding of the consumption values may lead to modelling problems

Ben Anderson's avatar
Ben Anderson committed
160
161
162
163
164
* add Econs Allcons for electricity & sum of both
* rename so graph names don't break
rename log_Gcons2012 lg2012
rename Gcons2012_dec g2012dec
local vars "lg2012 g2012dec"
165
166
167
168
foreach v of local vars {
	* check distributions of original consumption values
	
	* all hhs model
Ben Anderson's avatar
Ben Anderson committed
169
	qui: regress `v' `generic_hvars' ///
170
171
172
		`generic_rvars' ///
		i.BOILER_YEAR
	
Ben Anderson's avatar
Ben Anderson committed
173
	est store `v'
174
	* test a variable
Ben Anderson's avatar
Ben Anderson committed
175
	predict `v'_r, resid
176
177
178
179
	foreach testv of local generic_hvarsg {
		di "Testing residuals against `v'"
		* based on http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter2/statareg2.htm
		* can't use factor variables in acprplot...
Ben Anderson's avatar
Ben Anderson committed
180
		graph hbox `v'_r, over(`testv') name(rtest_`v'_`testv')
181
		graph export "`rpath'/graphs/rplot_`v'_`testv'-`version'.png", replace
Ben Anderson's avatar
Ben Anderson committed
182
		* possibly floor area - heteroskedasticity ?
183
184
185
	}
	

186
187
188
189
190
191
192
193
194
195
	di "* -> `v' estat to test for heteroskedasticity & omitted vars"
	estat ovtest
	estat hettest
	
	* we ought to be testing for linearity too
	di "* -> `v' linktest to test for model specification"
	di "* if p of _hatsq < 0.05 -> mis-spec"
	di "* http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter2/statareg2.htm"
	linktest	
	
Ben Anderson's avatar
Ben Anderson committed
196
197
	di "* test EPC margins for `v'"
	margins EE_BAND
198
	marginsplot, name(mplot_`v'_EE_BAND) note("All dwellings")
Ben Anderson's avatar
Ben Anderson committed
199
	graph export "`rpath'/graphs/mplot_`v'_EE_BAND-`version'.png", replace
Ben Anderson's avatar
Ben Anderson committed
200
	
201
202
	* models by property type - to see if rsq & coefficients vary
	foreach p of local ptypes {
Ben Anderson's avatar
Ben Anderson committed
203
204
		di "* -> testing `v' for `pt`p''"
		qui: regress `v' `generic_hvarsnp' ///
205
206
207
			`generic_rvars'	///
			i.BOILER_YEAR ///
			if PROP_TYPE == `p'
Ben Anderson's avatar
Ben Anderson committed
208
		est store `v'_`pt`p''
209
210
211
212
213
214
215
216
217
		
		di "* -> `v' 2012 `pt`p'' - estat to test for heteroskedasticity & omitted vars"
		estat ovtest
		estat hettest
		
		* we ought to be testing for linearity too
		di "* -> `v' `pt`p'' linktest to test for model specification"
		di "* if p of _hatsq < 0.05 -> mis-spec"
		di "* http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter2/statareg2.htm"
Ben Anderson's avatar
Ben Anderson committed
218
219
220
		linktest
		di "* test EPC margins for `v' (`pt`p'')"
		margins EE_BAND
221
		marginsplot, name(mplot_`v'_EE_BAND_`pt`p'') note("Type: `pt`p''")
Ben Anderson's avatar
Ben Anderson committed
222
		graph export "`rpath'/graphs/mplot_`v'_EE_BAND_`pt`p''-`version'.png", replace
Ben Anderson's avatar
Ben Anderson committed
223

224
225
	}
	* models for different consumption quintiles - to see if rsq & coefficients vary
Ben Anderson's avatar
Ben Anderson committed
226
	/* doesn't make much sense to do this if using deciles as dependent variable
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
	foreach q of numlist 0/4 {
		di "* -> testing log_`v'2012 for quintile: `q'"
		qui: regress log_`v'2012 `generic_hvars' ///
			`generic_rvars'	///
			i.BOILER_YEAR ///
			if quinlog_`v'2012 == `q'
		est store rlog_`v'2012q`q'
		
		di "* -> quintile: `q' - estat to test for heteroskedasticity & omitted vars"
		estat ovtest
		estat hettest
		
		* we ought to be testing for linearity too
		di "* -> quintile: `q' - linktest"
		di "* if p of _hatsq < 0.05 -> mis-spec"
		di "* http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter2/statareg2.htm"
		linktest	
	}
Ben Anderson's avatar
Ben Anderson committed
245
	*/
246
247
248
249
}

* output all the results - that's a lot of t tests!
* we could put them all out in one file but it would be really hard to find the ones you want!
Ben Anderson's avatar
Ben Anderson committed
250
251
estout lg2012 using "`rpath'/models/NEED-EULF-2014-log-gas-model-`version'.txt", replace cells("b se p _star") stats(r2 r2_a N ll)
estout lg2012_* using "`rpath'/models/NEED-EULF-2014-log-gas-models-by-property-type-`version'.txt", replace cells("b se p _star") stats(r2 r2_a N ll)
252

Ben Anderson's avatar
Ben Anderson committed
253
254
estout g2012dec using "`rpath'/models/NEED-EULF-2014-gas-deciles-model-`version'.txt", replace cells("b se p _star") stats(r2 r2_a N ll)
estout g2012dec_* using "`rpath'/models/NEED-EULF-2014-gas-deciles-models-by-property-type-`version'.txt", replace cells("b se p _star") stats(r2 r2_a N ll)
255
256
257
258

di "* Done!"

log close