From 7c97403b7e4b5f112a09b8f22b0fd54f7b3567b8 Mon Sep 17 00:00:00 2001
From: Ben Anderson <b.anderson@soton.ac.uk>
Date: Mon, 15 Sep 2014 13:54:59 +0100
Subject: [PATCH] updated readme

---
 NEED/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/NEED/README.md b/NEED/README.md
index 7293df9..2302272 100644
--- a/NEED/README.md
+++ b/NEED/README.md
@@ -8,14 +8,14 @@ http://discover.ukdataservice.ac.uk/catalogue/?sn=7518
 For full detailed documentation see https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/332169/need_anonymised_dataset_accompanying_documentation.pdf
-* Notes (mostly to self):
+Notes (mostly to self):
 * gas kwh are weather corrected on distribution zones before delivery to DECC
 * This dataset is a sample of just over 4 million households which have had an Energy Performance Certificate from the full NEED 'all dwellings' dataset
 * It is a semi-random sample of the 8m records with an EPC, it includes only those with valid values on all variables and (especially) valid observations for electricity in 2012. Uncommon property types are over-represented, common types are under-represented and the weight corrects for this
 * Sample bias is unclear - which kinds of dwellings have an EPC (e.g. flats where frequent churn may be over-represented?)
 * UPRN = unique property reference = linkage mechanism (uses AddressBase)
-* Issues:
+Issues:
 * Gcons<year>valid variable has undefined labels: G, L, M = ? 0 = off gas & V = valid reading (range 0 - 50,000) so presumably L = large (> 50,000?) and M = missing? But G?
 * ideally DECC should set missing to -99 to aid re-coding and avoid unpleasant surprises in naive analysis!
--
GitLab