Commit 2282b734 authored by Ben Anderson

updated processing script

parent c5136350
* Script to turn original wide 2014 EULF version of DECC's NEED data into:
* 1. a stata wide form xwave file containing the fixed value variables
* 2. a stata wide form file containing just the yearly consumption variables (linked to 1. via HH_ID)
* 3. a stata long form file containing just the yearly consumption variables (linked to 1. via HH_ID)
* 4. Create codebooks from the above
/*
Script to turn original wide 2014 EULF version of DECC's NEED data into:
1. a stata wide form xwave file containing the fixed value variables
2. a stata wide form file containing just the yearly consumption variables (linked to 1. via HH_ID)
3. a stata long form file containing just the yearly consumption variables (linked to 1. via HH_ID)
4. Create codebooks from the above
* Original data available from: UK DATA ARCHIVE: Study Number 7518 - National Energy Efficiency Data-Framework, 2014
* http://discover.ukdataservice.ac.uk/catalogue/?sn=7518
Original data available from: UK DATA ARCHIVE: Study Number 7518 - National Energy Efficiency Data-Framework, 2014
* Notes:
* This dataset is a sample of just over 4 million households which have had an Energy Performance Certificate from the full NEED 'all dwellings' dataset
* Is this all those who have had an EPC or a random sample of all those who've had an EPC?
* Sample bias is unknown - which kinds of dwellings have an EPC?
* Gcons<year>valid variable has undefined labels: G, L, M = ? Presumably 0 = off gas & V = valid?
* ideally DECC should set missing to -99 to aid re-coding and avoid unpleasant surprises in naive analysis!
the Gcons*valid variable codes:
G = Gas consumption invalid, greater than 50,000
L = Gas consumption invalid, less than 100
M = Gas consumption data is missing in source data
0 = Property does not have a gas connection
V = Valid gas consumption (between 100 and 50,000 inclusive)
NB - there are valid gas readings of '0' which presumably were > 100 but < 249 (first gas 'heap' = 'nearest 500')
the Econs*valid variable codes:
G = Electricity consumption invalid, greater than 25,000 (DECC lookup table says 50,000)
L = Electricity consumption invalid, less than 100
M = Electricity consumption data is missing in source dataset
V = Valid electricity consumption (between 100 and 25,000 inclusive)
/*
http://discover.ukdataservice.ac.uk/catalogue/?sn=7518
Copyright (C) 2014 University of Southampton
@@ -50,6 +26,25 @@ GNU General Public License for more details.
#YMMV - http://en.wiktionary.org/wiki/YMMV
******************
Notes:
This dataset is a sample of just over 4 million households which have had an Energy Performance Certificate from the full NEED 'all dwellings' dataset
Is this all those who have had an EPC or a random sample of all those who've had an EPC?
Sample bias is unknown - which kinds of dwellings have an EPC?
Ideally DECC should set missing to -99 to aid re-coding and avoid unpleasant surprises in naive analysis!
The Gcons*valid variable codes:
G = Gas consumption invalid, greater than 50,000
L = Gas consumption invalid, less than 100
M = Gas consumption data is missing in source data
0 = Property does not have a gas connection
V = Valid gas consumption (between 100 and 50,000 inclusive)
NB - there are valid gas readings of '0' which presumably were > 100 but < 249 (first gas 'heap' = 'nearest 500')
The Econs*valid variable codes:
G = Electricity consumption invalid, greater than 25,000 (DECC lookup table says 50,000)
L = Electricity consumption invalid, less than 100
M = Electricity consumption data is missing in source dataset
V = Valid electricity consumption (between 100 and 25,000 inclusive)
*/
clear all
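* Illustrative sketch only, not part of the original processing script:
* one possible way to apply the Gcons*valid codes documented in the notes above.
* ASSUMPTIONS: the variable names Gcons2012 and Gcons2012Valid and the toy
* values below are hypothetical; check the real names against the NEED codebook.
input long HH_ID double Gcons2012 str1 Gcons2012Valid
1 15000 "V"
2 60000 "G"
3 50 "L"
4 . "M"
5 . "0"
6 0 "V"
end
* keep consumption only where the flag marks it as valid, otherwise leave missing
* (HH_ID 6 shows why filtering on Gcons2012 > 0 would wrongly drop a valid reading)
gen double Gcons2012_clean = Gcons2012 if Gcons2012Valid == "V"
* separate indicator for dwellings without a gas connection (flag "0")
gen byte no_gas = (Gcons2012Valid == "0")
tab Gcons2012Valid, missing
su Gcons2012_clean
* drop the toy data again so the rest of the script starts from an empty dataset
clear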
......