diff --git a/NEED/process-NEED-EULF-2014.do b/NEED/process-NEED-EULF-2014.do
index ce33ad9ae0b3099c50f4d2e02e0315029422640c..a627b959203458745b07ba5683143a99a22050a3 100644
--- a/NEED/process-NEED-EULF-2014.do
+++ b/NEED/process-NEED-EULF-2014.do
@@ -1,37 +1,13 @@
-* Script to turn original wide 2014 EULF version of DECC's NEED data into:
-* 1. a stata wide form xwave file containing the fixed value variables
-* 2. a stata wide form file containing just the yearly consumption variables (linked to 1. via HH_ID)
-* 3. a stata long form file containing just the yearly consumption variables (linked to 1. via HH_ID)
-* 4. Create codebooks from the above
+/* 
+Script to turn the original wide 2014 EULF version of DECC's NEED data into:
+ 1. a Stata wide-form xwave file containing the fixed-value variables
+ 2. a Stata wide-form file containing just the yearly consumption variables (linked to 1. via HH_ID)
+ 3. a Stata long-form file containing just the yearly consumption variables (linked to 1. via HH_ID)
+ 4. codebooks for each of the above
 
-* Original data available from: UK DATA ARCHIVE: Study Number 7518 - National Energy Efficiency Data-Framework, 2014
-* http://discover.ukdataservice.ac.uk/catalogue/?sn=7518
+Original data available from: UK DATA ARCHIVE: Study Number 7518 - National Energy Efficiency Data-Framework, 2014
 
-* Notes:
-* This dataset is a sample of just over 4 million households which have had an Energy Performance Certificate from the full NEED 'all dwellings' dataset
-* Is this all those who have had an EPC or a random sample of all those who've had an EPC?
-* Sample bias is unkown - which kinds of dwellings have an EPC?
-* Gcons<year>valid variable has undefined labels: G, L, M = ? Presumably 0 = off gas & V = valid?
-* ideally DECC should set missing to -99 to aid re-coding and avoid unpleasant surprises in naive analysis!
-
-the Gcons*valid variable codes:
-
-    G = Gas consumption invalid, greater than 50,000
-    L = Gas consumption invalid, less than 100
-    M = Gas consumption data is missing in source data
-    0 = Property does not have a gas connection
-    V = Valid gas consumption (between 100 and 50,000 inclusive)
-    NB - there are valid gas readings of '0' which presumably were > 100 but < 249 (first gas 'heap' = 'nearest 500')
-
-the Econs*valid variable codes:
-
-    G Electricity consumption invalid, greater than 25,000 (DECC lookup table says 50,000)
-    L Electricity consumption invalid, less than 100
-    M Electricity consumption data is missing in source dataset
-    V Valid electricity consumption (between 100 and 25,000 inclusive)
-
-
-/*   
+http://discover.ukdataservice.ac.uk/catalogue/?sn=7518
 
 Copyright (C) 2014  University of Southampton
 
@@ -50,6 +26,25 @@ GNU General Public License for more details.
 
 #YMMV - http://en.wiktionary.org/wiki/YMMV
 
+******************
+Notes:
+This dataset is a sample of just over 4 million households, drawn from the full NEED 'all dwellings' dataset, which have had an Energy Performance Certificate (EPC)
+Open question: is this every household that has had an EPC, or a random sample of those households?
+Sample bias is unknown - which kinds of dwellings have an EPC?
+Ideally DECC should set missing values to -99 to aid re-coding and avoid unpleasant surprises in naive analysis!
+The Gcons*valid variable codes:
+    G = Gas consumption invalid, greater than 50,000
+    L = Gas consumption invalid, less than 100
+    M = Gas consumption data is missing in source data
+    0 = Property does not have a gas connection
+    V = Valid gas consumption (between 100 and 50,000 inclusive)
+    NB - there are valid gas readings of '0', which presumably were > 100 but < 249 before rounding (first gas 'heap' = 'nearest 500')
+The Econs*valid variable codes:
+    G = Electricity consumption invalid, greater than 25,000 (DECC lookup table says 50,000)
+    L = Electricity consumption invalid, less than 100
+    M = Electricity consumption data is missing in source dataset
+    V = Valid electricity consumption (between 100 and 25,000 inclusive)
+
 */
 
 clear all