Commit 2282b734 authored by Ben Anderson

updated processing script

parent c5136350
-* Script to turn original wide 2014 EULF version of DECC's NEED data into:
-* 1. a stata wide form xwave file containing the fixed value variables
-* 2. a stata wide form file containing just the yearly consumption variables (linked to 1. via HH_ID)
-* 3. a stata long form file containing just the yearly consumption variables (linked to 1. via HH_ID)
-* 4. Create codebooks from the above
-* Original data available from: UK DATA ARCHIVE: Study Number 7518 - National Energy Efficiency Data-Framework, 2014
-* http://discover.ukdataservice.ac.uk/catalogue/?sn=7518
-* Notes:
-* This dataset is a sample of just over 4 million households which have had an Energy Performance Certificate from the full NEED 'all dwellings' dataset
-* Is this all those who have had an EPC or a random sample of all those who've had an EPC?
-* Sample bias is unknown - which kinds of dwellings have an EPC?
-* Gcons<year>valid variable has undefined labels: G, L, M = ? Presumably 0 = off gas & V = valid?
-* ideally DECC should set missing to -99 to aid re-coding and avoid unpleasant surprises in naive analysis!
-the Gcons*valid variable codes:
-G = Gas consumption invalid, greater than 50,000
-L = Gas consumption invalid, less than 100
-M = Gas consumption data is missing in source data
-0 = Property does not have a gas connection
-V = Valid gas consumption (between 100 and 50,000 inclusive)
-NB - there are valid gas readings of '0' which presumably were > 100 but < 249 (first gas 'heap' = 'nearest 500')
-the Econs*valid variable codes:
-G Electricity consumption invalid, greater than 25,000 (DECC lookup table says 50,000)
-L Electricity consumption invalid, less than 100
-M Electricity consumption data is missing in source dataset
-V Valid electricity consumption (between 100 and 25,000 inclusive)
-/*
+/*
+Script to turn original wide 2014 EULF version of DECC's NEED data into:
+1. a stata wide form xwave file containing the fixed value variables
+2. a stata wide form file containing just the yearly consumption variables (linked to 1. via HH_ID)
+3. a stata long form file containing just the yearly consumption variables (linked to 1. via HH_ID)
+4. Create codebooks from the above
+Original data available from: UK DATA ARCHIVE: Study Number 7518 - National Energy Efficiency Data-Framework, 2014
+http://discover.ukdataservice.ac.uk/catalogue/?sn=7518
 Copyright (C) 2014 University of Southampton
...@@ -50,6 +26,25 @@ GNU General Public License for more details.
 #YMMV - http://en.wiktionary.org/wiki/YMMV
+******************
+Notes:
+This dataset is a sample of just over 4 million households which have had an Energy Performance Certificate from the full NEED 'all dwellings' dataset
+Is this all those who have had an EPC or a random sample of all those who've had an EPC?
+Sample bias is unknown - which kinds of dwellings have an EPC?
+Ideally DECC should set missing to -99 to aid re-coding and avoid unpleasant surprises in naive analysis!
+The Gcons*valid variable codes:
+G = Gas consumption invalid, greater than 50,000
+L = Gas consumption invalid, less than 100
+M = Gas consumption data is missing in source data
+0 = Property does not have a gas connection
+V = Valid gas consumption (between 100 and 50,000 inclusive)
+NB - there are valid gas readings of '0' which presumably were > 100 but < 249 (first gas 'heap' = 'nearest 500')
+The Econs*valid variable codes:
+G Electricity consumption invalid, greater than 25,000 (DECC lookup table says 50,000)
+L Electricity consumption invalid, less than 100
+M Electricity consumption data is missing in source dataset
+V Valid electricity consumption (between 100 and 25,000 inclusive)
 */
 clear all
......
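
A note on using the Gcons*valid codes documented in the header comment: the sketch below shows one possible way to keep only the readings flagged V and to mark off-gas properties separately. It is not taken from the committed script. The toy rows are invented, the variable names Gcons2011 / Gcons2011valid merely assume the Gcons<year> / Gcons<year>valid pattern mentioned in the notes, and the _clean and offgas names are hypothetical.

```stata
* Toy wide-form data (invented values; variable names are assumed, not from the real file)
clear
input long HH_ID double Gcons2011 str1 Gcons2011valid double Gcons2012 str1 Gcons2012valid
1 12000 "V" 11500 "V"
2 60000 "G"     0 "V"
3     . "M"    50 "L"
4     . "0"     . "0"
end

foreach y in 2011 2012 {
    * Keep the reading only when it is flagged valid (V); G, L, M and 0 become missing
    generate double Gcons`y'_clean = Gcons`y' if Gcons`y'valid == "V"
    * Separate indicator for properties without a gas connection (flag "0")
    generate byte offgas`y' = (Gcons`y'valid == "0")
}

list HH_ID Gcons2011_clean offgas2011 Gcons2012_clean offgas2012
```

Household 2 in 2012 illustrates the NB in the notes: a reading of 0 flagged V is kept as a valid 0 rather than set to missing, so any analysis that treats 0 as "no consumption" would need to handle that case explicitly.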