Commit 9a61a3fe authored by Ben Anderson

Update README.md

parent fd2cdcad
@@ -13,7 +13,7 @@ Notes (mostly to self):
* The End User License file (EULF) dataset is a sample of just over 4 million households
* EULF is a semi-random sample of the 8m records which have an Energy Performance Certificate.
* It includes only those with valid values on key variables (Property Age, Property Type, Floor Area Band and Energy Efficiency Band) and (especially) valid observations for electricity in 2012.
-* Records were selected based on the frequency of household type in the dataset relative to the total dwelling stock so that uncommon property types are over-represented, common types are under-represented and the supplied weight corrects for this.
+* Records were selected based on the frequency of household type in the dataset relative to the total dwelling stock so that uncommon property types (e.g. older detached properties) are over-represented and common types (e.g. flats where turnover is high) are under-represented. The supplied weight corrects for this for descriptive analysis.
* Implications for sample bias unclear - there may be other systematic biases not captured by the weight?
* UPRN = unique property reference = linkage mechanism (uses AddressBase)
* Bias caused by linkage failure is unknown, although the DECC NEED Data Framework report from 2011 suggests match rates of 94%-100% (https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/209264/Annex_B_-_Quality_Assurance.pdf)
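
To illustrate the note on weights, here is a minimal sketch of how a supplied weight changes a descriptive statistic. The records, field order and weight values are invented for illustration and are not taken from the EULF. The sketch assumes the weight is a simple per-record multiplier, which the notes above do not confirm.

```python
# Hypothetical records: (property_type, annual_kwh, weight).
# All values are made up for illustration; they are not EULF data.
records = [
    ("detached", 5000.0, 0.5),  # over-represented type: weight below 1
    ("detached", 5400.0, 0.5),
    ("flat", 2500.0, 2.0),      # under-represented type: weight above 1
]


def unweighted_mean(rows):
    """Plain average of annual_kwh, ignoring the weight."""
    return sum(kwh for _, kwh, _ in rows) / len(rows)


def weighted_mean(rows):
    """Average of annual_kwh with each record scaled by its weight."""
    total_weight = sum(w for _, _, w in rows)
    return sum(kwh * w for _, kwh, w in rows) / total_weight


print(unweighted_mean(records))
print(weighted_mean(records))
```

In this toy sample the unweighted mean is pulled towards the over-sampled detached properties, while the weighted mean shifts back towards the under-sampled flats.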