diff --git a/NEED/README.md b/NEED/README.md
index 814d37b86d41e3281d54ba14f75b6d7d1a2b3bda..7293df90b13ac9b4247dd2699f420a561d358a62 100644
--- a/NEED/README.md
+++ b/NEED/README.md
@@ -3,14 +3,19 @@ DECC-git NEED
 
 Extract & analyse data from the public versions of DECC's  NEED dataset
 
-Original data available from: UK DATA ARCHIVE: Study Number 7518 - National Energy Efficiency Data-Framework, 2014
+The original 'End User Licence' version of the data is available from the UK Data Archive, Study Number 7518 - National Energy Efficiency Data-Framework, 2014:
 http://discover.ukdataservice.ac.uk/catalogue/?sn=7518
 
-* Notes:
+For full documentation see: https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/332169/need_anonymised_dataset_accompanying_documentation.pdf
+
+* Notes (mostly to self):
+* Gas kWh values are weather-corrected by distribution zone before delivery to DECC
 * This dataset is a sample of just over 4 million households which have had an Energy Performance Certificate from the full NEED 'all dwellings' dataset
-* Is this all those who have had an EPC or a random sample of all those who've had an EPC?
-* Sample bias is unknown - which kinds of dwellings have an EPC?
-* Gcons<year>valid variable has undefined labels: G, L, M = ? Presumably 0 = off gas & V = valid?
+* It is a semi-random sample of the 8m records with an EPC: it includes only records with valid values on all variables and, in particular, a valid electricity observation for 2012. Uncommon property types are over-represented and common types under-represented; the weight variable corrects for this
+* Sample bias is unclear: which kinds of dwellings have an EPC? (e.g. are flats, with their frequent churn, over-represented?)
+* UPRN = Unique Property Reference Number = the linkage mechanism (uses AddressBase)
+
+* Issues:
+* The Gcons<year>valid variable has undefined labels G, L and M. 0 = off gas and V = valid reading (range 0 - 50,000), so presumably L = large (> 50,000?) and M = missing? But what is G?
 * ideally DECC should set missing to -99 to aid re-coding and avoid unpleasant surprises in naive analysis!
 
-UPRN = unique property reference = linkage mechanism
\ No newline at end of file