From ee5a243da7babcc8e3fc184193718cc90ab8d62a Mon Sep 17 00:00:00 2001
From: Ben Anderson <b.anderson@soton.ac.uk>
Date: Mon, 29 Jun 2015 12:45:52 +0100
Subject: [PATCH] updated readme re PV installs

---
 NEED/README.md | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)
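A minimal Python sketch of applying the Gcons*valid codes listed in the README notes before analysis, assuming each record carries a gas value and its validity code. The code letters and thresholds are copied from the notes (not re-checked against the DECC lookup table); the names `GCONS_VALID_CODES` and `usable_gas_kwh` are hypothetical, not NEED field names.

```python
# Hypothetical helper: codes and thresholds follow the README notes;
# GCONS_VALID_CODES and usable_gas_kwh are illustrative names only.
from typing import Optional

GCONS_VALID_CODES = {
    "G": "invalid, greater than 50,000",
    "L": "invalid, less than 100",
    "M": "missing in source data",
    "0": "no gas connection",
    "V": "valid, between 100 and 50,000 inclusive",
}


def usable_gas_kwh(gcons: float, valid_code: str) -> Optional[float]:
    """Return gas consumption (kWh) for records coded 'V', otherwise None."""
    if valid_code not in GCONS_VALID_CODES:
        raise ValueError(f"unknown Gcons validity code: {valid_code!r}")
    if valid_code != "V":
        return None
    # NB: a 'V' record can still hold 0, presumably a reading between 100 and 249
    # rounded to the nearest 500, so 0 here does not mean 'no gas connection'.
    return gcons


print(usable_gas_kwh(12500, "V"))  # 12500
print(usable_gas_kwh(0, "0"))      # None
```

Returning `None` rather than 0 for non-'V' records keeps "no gas connection" and "invalid or missing" out of consumption averages, while the valid-but-zero case noted in the README survives as a real value.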

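The notes say the supplied weight corrects the EULF's deliberate over- and under-representation of property types for descriptive analysis. A minimal sketch of a weighted mean, assuming one weight per household record; `weighted_mean` is a hypothetical helper and the values are toy numbers, not NEED data.

```python
# Weighted mean of consumption using per-record weights.
# Assumption: each EULF record carries one weight; the inputs below are toy numbers.
from typing import Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Return sum(w * v) / sum(w); raise if lengths differ or weights sum to zero."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Toy example: the first record stands for an under-represented type (weight 3),
# the second for an over-represented type (weight 1).
print(weighted_mean([1000, 3000], [3, 1]))  # 1500.0
```

The unweighted mean of the same two values would be 2000.0, which is the kind of distortion the weight is meant to remove; as the notes say, other systematic biases may not be captured by it.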
diff --git a/NEED/README.md b/NEED/README.md
index 0ea5bbf..0f1e2fd 100644
--- a/NEED/README.md
+++ b/NEED/README.md
@@ -20,21 +20,21 @@ See license file for details.
 Notes (mostly to self)
 ----------------------
 * gas kwh are weather corrected within the 10 DNO distribution zones before delivery to DECC
-* The End User License file (EULF) dataset is a sample of just over 4 million households 
-* EULF is a semi-random sample of the 8m records which have an Energy Performance Certificate. 
- * It includes only those with valid values on key variables (Property Age, Property Type, Floor Area Band and Energy Efficiency Band) and (especially) valid observations for electricity in 2012. 
- * Records were selected based on the frequency of household type in the dataset relative to the total dwelling stock so that uncommon property types (e.g. older detached properties) are over-represented and common types (e.g. flats where turnover is high) are under-represented. The supplied weight corrects for this for descriptive analysis. 
+* The End User Licence File (EULF) dataset is a sample of just over 4 million households.
+* EULF is a semi-random sample of the 8m records which have an Energy Performance Certificate.
+ * It includes only those with valid values on key variables (Property Age, Property Type, Floor Area Band and Energy Efficiency Band) and (especially) valid observations for electricity in 2012.
+ * Records were selected based on the frequency of household type in the dataset relative to the total dwelling stock, so that uncommon property types (e.g. older detached properties) are over-represented and common types (e.g. flats, where turnover is high) are under-represented. For descriptive analysis, the supplied weight corrects for this.
  * Implications for sample bias unclear - there may be other systematic biases not captured by the weight?
 * UPRN = unique property reference = linkage mechanism across EPCs, gas/electricity data and EST data on energy efficiency installations (uses AddressBase)
- * hoping to add PV etc installations soon
+ * PV installations were added for the 2015 report (see https://www.gov.uk/government/statistics/national-energy-efficiency-data-framework-need-report-summary-of-analysis-2015)
 * Bias caused by linkage failure is unknown although the DECC NEED Data Framework report from 2013 suggest match rates of 94%-100% (https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/209264/Annex_B_-_Quality_Assurance.pdf)
-* Both gas and electricity consumption are rounded and the rounding range ('to nearest n') increases through the distributions (see https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/315189/need_dataset_look_ups.xlsx). The reasons for this are explained in the consultation response at https://www.gov.uk/government/consultations/national-energy-efficiency-data-framework-making-data-available 
+* Both gas and electricity consumption are rounded and the rounding range ('to nearest n') increases through the distributions (see https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/315189/need_dataset_look_ups.xlsx). The reasons for this are explained in the consultation response at https://www.gov.uk/government/consultations/national-energy-efficiency-data-framework-making-data-available
 * the Gcons*valid variable codes:
  * G = Gas consumption invalid, greater than 50,000
  * L = Gas consumption invalid, less than 100
  * M = Gas consumption data is missing in source data
  * 0 = Property does not have a gas connection
- * V = Valid gas consumption (between 100 and 50,000 inclusive) 
+ * V = Valid gas consumption (between 100 and 50,000 inclusive)
  * NB - there are valid gas readings of '0' which presumably were > 100 but < 249 (first gas 'heap' = 'nearest 500')
 * the Econs*valid variable codes:
  * G	Electricity consumption invalid, greater than 25,000 (DECC lookup table says 50,000)
@@ -48,4 +48,3 @@ Notes to DECC (!)
 * can the consumption rounding be constant through the distributions?
 * check coding of Gcons ref 0 values for 'valid' cases?
 * distinguish between electric & 'other' heating in 'main heating fuel'?
-
-- 
GitLab