diff --git a/NEED/process-NEED-EULF-2014.do b/NEED/process-NEED-EULF-2014.do
index ce33ad9ae0b3099c50f4d2e02e0315029422640c..a627b959203458745b07ba5683143a99a22050a3 100644
--- a/NEED/process-NEED-EULF-2014.do
+++ b/NEED/process-NEED-EULF-2014.do
@@ -1,37 +1,13 @@
-* Script to turn original wide 2014 EULF version of DECC's NEED data into:
-* 1. a stata wide form xwave file containing the fixed value variables
-* 2. a stata wide form file containing just the yearly consumption variables (linked to 1. via HH_ID)
-* 3. a stata long form file containing just the yearly consumption variables (linked to 1. via HH_ID)
-* 4. Create codebooks from the above
+/*
+Script to turn original wide 2014 EULF version of DECC's NEED data into:
+	1. a stata wide form xwave file containing the fixed value variables
+	2. a stata wide form file containing just the yearly consumption variables (linked to 1. via HH_ID)
+	3. a stata long form file containing just the yearly consumption variables (linked to 1. via HH_ID)
+	4. Create codebooks from the above
 
-* Original data available from: UK DATA ARCHIVE: Study Number 7518 - National Energy Efficiency Data-Framework, 2014
-* http://discover.ukdataservice.ac.uk/catalogue/?sn=7518
+Original data available from: UK DATA ARCHIVE: Study Number 7518 - National Energy Efficiency Data-Framework, 2014
 
-* Notes:
-* This dataset is a sample of just over 4 million households which have had an Energy Performance Certificate from the full NEED 'all dwellings' dataset
-* Is this all those who have had an EPC or a random sample of all those who've had an EPC?
-* Sample bias is unkown - which kinds of dwellings have an EPC?
-* Gcons<year>valid variable has undefined labels: G, L, M = ? Presumably 0 = off gas & V = valid?
-* ideally DECC should set missing to -99 to aid re-coding and avoid unpleasant surprises in naive analysis!
-
-the Gcons*valid variable codes:
-
-	G = Gas consumption invalid, greater than 50,000
-	L = Gas consumption invalid, less than 100
-	M = Gas consumption data is missing in source data
-	0 = Property does not have a gas connection
-	V = Valid gas consumption (between 100 and 50,000 inclusive)
-	NB - there are valid gas readings of '0' which presumably were > 100 but < 249 (first gas 'heap' = 'nearest 500')
-
-the Econs*valid variable codes:
-
-	G Electricity consumption invalid, greater than 25,000 (DECC lookup table says 50,000)
-	L Electricity consumption invalid, less than 100
-	M Electricity consumption data is missing in source dataset
-	V Valid electricity consumption (between 100 and 25,000 inclusive)
-
-
-/*
+http://discover.ukdataservice.ac.uk/catalogue/?sn=7518
 
 Copyright (C) 2014 University of Southampton
 
@@ -50,6 +26,25 @@ GNU General Public License for more details.
 
 #YMMV - http://en.wiktionary.org/wiki/YMMV
 
+******************
+Notes:
+This dataset is a sample of just over 4 million households which have had an Energy Performance Certificate from the full NEED 'all dwellings' dataset
+Is this all those who have had an EPC or a random sample of all those who've had an EPC?
+Sample bias is unknown - which kinds of dwellings have an EPC?
+Ideally DECC should set missing to -99 to aid re-coding and avoid unpleasant surprises in naive analysis!
+The Gcons*valid variable codes:
+	G = Gas consumption invalid, greater than 50,000
+	L = Gas consumption invalid, less than 100
+	M = Gas consumption data is missing in source data
+	0 = Property does not have a gas connection
+	V = Valid gas consumption (between 100 and 50,000 inclusive)
+	NB - there are valid gas readings of '0' which presumably were > 100 but < 249 (first gas 'heap' = 'nearest 500')
+The Econs*valid variable codes:
+	G Electricity consumption invalid, greater than 25,000 (DECC lookup table says 50,000)
+	L Electricity consumption invalid, less than 100
+	M Electricity consumption data is missing in source dataset
+	V Valid electricity consumption (between 100 and 25,000 inclusive)
+
 */
 
 clear all