diff --git a/NEED/README.md b/NEED/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..1072ebe5f9fb713d303504eb1985110604dd4fca
--- /dev/null
+++ b/NEED/README.md
@@ -0,0 +1,14 @@
+DECC-git NEED
+============
+
+Extract and analyse data from the public versions of DECC's NEED dataset.
+
+Original data available from: UK DATA ARCHIVE: Study Number 7518 - National Energy Efficiency Data-Framework, 2014
+http://discover.ukdataservice.ac.uk/catalogue/?sn=7518
+
+* Notes:
+* This dataset is a sample of just over 4 million households with an Energy Performance Certificate (EPC), drawn from the full NEED 'all dwellings' dataset
+* Open question: does the sample include every household that has had an EPC, or only a random sample of them?
+* Sample bias is unknown - which kinds of dwellings have an EPC?
+* The `Gcons<year>valid` variables have undefined labels: G, L, M = ? Presumably 0 = off gas and V = valid?
+* Ideally DECC should code missing values as -99 to aid re-coding and avoid unpleasant surprises in naive analysis!