Apart from a few exempted buildings, a dwelling must have an EPC when constructed, sold or let. This means that over time we will have an EPC for an increasing number of properties and we should already have EPCs for all rented properties.
EPCs are not necessarily up to date. For example if a property has not been sold or let since a major upgrade, the effects of that upgrade may not be visible in the data.
Further reading:
check what feeds in automatically e.f. RHI installs etc
We have to assume the data we have is the current state of play for these dwellings.
Load the data for the area of interest - in this case the City of Southampton.
df <- path.expand("~/data/EW_epc/domestic-E06000045-Southampton/certificates.csv")
sotonEPCsDT <- data.table::fread(df)
The EPC data file has 91833 records for Southampton and 90 variables. We’re not interested in all of these, we want:
We’re also going to keep:
If an EPC has been updated or refreshed, the EPC dataset will hold multiple EPC records for that property (see Table 2.1 for some examples). For the current purposes we only want the most recent record for each dwelling.
uniqBRN_DT <- sotonEPCsDT[, .(nRecords = .N,
firstEPC = min(INSPECTION_DATE),
lastEPC = max(INSPECTION_DATE)), keyby = .(BUILDING_REFERENCE_NUMBER)]
kableExtra::kable(head(uniqBRN_DT[nRecords > 1]), cap = "Examples of multiple records") %>%
kable_styling()
BUILDING_REFERENCE_NUMBER | nRecords | firstEPC | lastEPC |
---|---|---|---|
444 | 3 | 2012-10-09 | 2014-03-13 |
668 | 2 | 2009-05-19 | 2018-02-16 |
697 | 2 | 2013-02-14 | 2013-11-25 |
805 | 2 | 2009-01-28 | 2019-01-08 |
871 | 2 | 2009-03-06 | 2019-05-08 |
1362 | 2 | 2009-06-12 | 2015-12-09 |
We select the most recent within BUILDING_REFERENCE_NUMBER and then check that this matches the maximum (most recent) INSPECTION_DATE from the original dataset.
# select just these vars
dt <- sotonEPCsDT[, .(BUILDING_REFERENCE_NUMBER, LMK_KEY, LODGEMENT_DATE,INSPECTION_DATE, PROPERTY_TYPE, BUILT_FORM,
ENVIRONMENT_IMPACT_CURRENT, ENERGY_CONSUMPTION_CURRENT, CO2_EMISSIONS_CURRENT, TENURE,
PHOTO_SUPPLY, WIND_TURBINE_COUNT, TOTAL_FLOOR_AREA,
POSTCODE, LOCAL_AUTHORITY_LABEL)]
# select most recent record within BUILDING_REFERENCE_NUMBER - how?
# better check this is doing so
setkey(dt,BUILDING_REFERENCE_NUMBER, INSPECTION_DATE) # sort by date within reference number
sotonUniqueEPCsDT <- unique(dt, by = "BUILDING_REFERENCE_NUMBER",
fromLast = TRUE) # takes the most recent as we have sorted by INSPECTION_DATE within BUILDING_REFERENCE_NUMBER using setkey
setkey(uniqBRN_DT, BUILDING_REFERENCE_NUMBER)
setkey(sotonUniqueEPCsDT, BUILDING_REFERENCE_NUMBER)
dt <- uniqBRN_DT[sotonUniqueEPCsDT]
dt[, diff := INSPECTION_DATE - lastEPC] # should be 0
message("Check difference between original max date and INSPECTION_DATE of selected record - should be 0")
## Check difference between original max date and INSPECTION_DATE of selected record - should be 0
summary(dt$diff)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 0 0 0 0 0
nLatestEPCs <- uniqueN(sotonUniqueEPCsDT$BUILDING_REFERENCE_NUMBER)
This leaves us with 71,600 EPCs. These are the most recent EPCs for the dwellings in the Southampton EPC dataset.
Now check the distributions of the retained variables.
skimr::skim(sotonUniqueEPCsDT)
## Warning: Couldn't find skimmers for class: integer64; No user-defined `sfl` provided. Falling back to
## `character`.
Name | sotonUniqueEPCsDT |
Number of rows | 71600 |
Number of columns | 15 |
_______________________ | |
Column type frequency: | |
character | 7 |
Date | 2 |
numeric | 6 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
BUILDING_REFERENCE_NUMBER | 0 | 1 | 17 | 21 | 0 | 71600 | 0 |
LMK_KEY | 0 | 1 | 29 | 34 | 0 | 71600 | 0 |
PROPERTY_TYPE | 0 | 1 | 4 | 10 | 0 | 5 | 0 |
BUILT_FORM | 0 | 1 | 8 | 20 | 0 | 7 | 0 |
TENURE | 0 | 1 | 0 | 16 | 1986 | 6 | 0 |
POSTCODE | 0 | 1 | 8 | 8 | 0 | 5107 | 0 |
LOCAL_AUTHORITY_LABEL | 0 | 1 | 11 | 11 | 0 | 1 | 0 |
Variable type: Date
skim_variable | n_missing | complete_rate | min | max | median | n_unique |
---|---|---|---|---|---|---|
LODGEMENT_DATE | 0 | 1 | 2008-10-01 | 2020-06-30 | 2014-10-20 | 4132 |
INSPECTION_DATE | 0 | 1 | 2007-03-02 | 2020-06-30 | 2014-10-10 | 3907 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
ENVIRONMENT_IMPACT_CURRENT | 0 | 1.00 | 62.55 | 15.75 | 1.0 | 52.0 | 63.0 | 73 | 115.00 | ▁▂▇▅▁ |
ENERGY_CONSUMPTION_CURRENT | 0 | 1.00 | 262.97 | 140.59 | -184.0 | 173.0 | 233.0 | 327 | 1597.00 | ▃▇▁▁▁ |
CO2_EMISSIONS_CURRENT | 0 | 1.00 | 3.16 | 1.94 | -1.8 | 1.8 | 2.8 | 4 | 77.00 | ▇▁▁▁▁ |
PHOTO_SUPPLY | 38590 | 0.46 | 0.59 | 5.11 | 0.0 | 0.0 | 0.0 | 0 | 100.00 | ▇▁▁▁▁ |
WIND_TURBINE_COUNT | 5555 | 0.92 | 0.00 | 0.03 | -1.0 | 0.0 | 0.0 | 0 | 1.00 | ▁▁▇▁▁ |
TOTAL_FLOOR_AREA | 0 | 1.00 | 72.98 | 34.91 | 0.0 | 49.0 | 69.0 | 87 | 1353.68 | ▇▁▁▁▁ |
As we would expect we have 71600 unique property reference numbers. We can also see some strangeness. In some cases we seem to have:
This is not surprising since the kWh/y and tCO2/y values are estimated using a model but before we go any further we’d better check if these anomalies are significant in number.
Load postcodes for Southampton (contains other geo-codes for linkage).
Source: https://geoportal.statistics.gov.uk/datasets/national-statistics-postcode-lookup-august-2020
# Load the postcode based MSOA codes
soPostcodesDT <- data.table::fread(path.expand("~/data/UK_postcodes/NSPL_AUG_2020_UK/Data/multi_csv/NSPL_AUG_2020_UK_SO.csv"))
#soPostcodesDT <- soPostcodesDT[is.na(doterm)] # keep current
# keep all as some of the defunct ones will be in the EPC data (!)
sotonPostcodesDT <- soPostcodesDT[laua == "E06000045"] # keep Southampton City
sotonPostcodesReducedDT <- sotonPostcodesDT[, .(pcd, pcd2, pcds, laua, msoa11, lsoa11)]
message("Example data")
## Example data
head(sotonPostcodesReducedDT)
## pcd pcd2 pcds laua msoa11 lsoa11
## 1: SO1 0AA SO1 0AA SO1 0AA E06000045 E02003577 E01032748
## 2: SO1 0AB SO1 0AB SO1 0AB E06000045 E02003577 E01032748
## 3: SO1 0AD SO1 0AD SO1 0AD E06000045 E02003577 E01032748
## 4: SO1 0AE SO1 0AE SO1 0AE E06000045 E02003571 E01017140
## 5: SO1 0AF SO1 0AF SO1 0AF E06000045 E02003571 E01017140
## 6: SO1 0AG SO1 0AG SO1 0AG E06000045 E02003577 E01032748
Load BEIS energy demand data.
Source: https://geoportal.statistics.gov.uk/datasets/national-statistics-postcode-lookup-august-2020
beisElecDT <- data.table::fread("~/data/beis/MSOA_DOM_ELEC_csv/MSOA_ELEC_2018.csv")
sotonElecDT <- beisElecDT[LAName %like% "Southampton", .(nElecMeters = METERS,
beisElecMWh = KWH/1000,
MSOACode, LAName)
]
beisGasDT <- data.table::fread("~/data/beis/MSOA_DOM_GAS_csv/MSOA_GAS_2018.csv")
sotonGasDT <- beisGasDT[LAName %like% "Southampton", .(nGasMeters = METERS,
beisGasMWh = KWH/1000,
MSOACode)]
setkey(sotonElecDT, MSOACode)
setkey(sotonGasDT, MSOACode)
sotonEnergyDT <- sotonGasDT[sotonElecDT]
sotonEnergyDT[, beisEnergyMWh := beisElecMWh + beisGasMWh]
#head(sotonEnergyDT)
message("Example data (retained variables)")
## Example data (retained variables)
head(sotonEnergyDT)
## nGasMeters beisGasMWh MSOACode nElecMeters beisElecMWh LAName beisEnergyMWh
## 1: 2557 38480.93 E02003549 2832 11196.005 Southampton 49676.93
## 2: 2876 28049.73 E02003550 3527 13074.440 Southampton 41124.17
## 3: 1649 17358.87 E02003551 2446 8957.742 Southampton 26316.61
## 4: 2009 17667.12 E02003552 2809 10383.889 Southampton 28051.01
## 5: 2303 24996.91 E02003553 2464 8479.993 Southampton 33476.91
## 6: 2378 29664.71 E02003554 2873 10048.060 Southampton 39712.77
Load Census 2011 tenure data.
Source: https://www.nomisweb.co.uk/census/2011/ks402ew
# census tenure ----
dt <- data.table::fread(path.expand("~/data/census2011/2011_MSOA_householdTenure_Soton.csv"))
dt[, census2011_socialRent := `Tenure: Social rented; measures: Value`]
dt[, census2011_privateRent := `Tenure: Private rented; measures: Value`]
dt[, census2011_ownerOccupy := `Tenure: Owned; measures: Value`]
dt[, census2011_other := `Tenure: Living rent free; measures: Value`]
dt[, MSOACode := `geography code`]
dt[, hhCheck := census2011_socialRent + census2011_privateRent + census2011_ownerOccupy + census2011_other]
dt[, nHHs_tenure := `Tenure: All households; measures: Value`]
dt[, socRent_pc := 100*(census2011_socialRent/nHHs_tenure)]
dt[, privRent_pc := 100*(census2011_privateRent/nHHs_tenure)]
dt[, ownerOcc_pc := 100*(census2011_ownerOccupy/nHHs_tenure)]
tenureDT <- dt[, .(MSOACode, nHHs_tenure, socRent_pc, privRent_pc, ownerOcc_pc)]
message("Example data (retained variables)")
## Example data (retained variables)
head(tenureDT) # all tenure data
## MSOACode nHHs_tenure socRent_pc privRent_pc ownerOcc_pc
## 1: E02002559 3646 4.443225 16.346681 77.53703
## 2: E02002560 2511 2.907208 12.385504 83.91079
## 3: E02002561 2507 11.408057 8.575987 79.17830
## 4: E02002562 2933 23.389021 26.321173 49.36925
## 5: E02002563 2343 23.772941 8.664106 65.98378
## 6: E02002564 4137 29.804206 8.581097 60.04351
Load IMD data.
Source: https://www.nomisweb.co.uk/census/2011/qs119ew
# add the deprivation data by MSOA
dt <- data.table::fread(path.expand("~/data/census2011/2011_MSOA_deprivation.csv"))
dt[, nHHs_deprivation := `Household Deprivation: All categories: Classification of household deprivation; measures: Value`]
dt[, MSOACode := `geography code`]
#sotonDep_DT[, .(nHouseholds = sum(totalHouseholds)), keyby = .(LAName)]
dt[, dep0_pc := 100*(`Household Deprivation: Household is not deprived in any dimension; measures: Value`/nHHs_deprivation)]
dt[, dep1_pc := 100*(`Household Deprivation: Household is deprived in 1 dimension; measures: Value`/nHHs_deprivation)]
dt[, dep2_pc := 100*(`Household Deprivation: Household is deprived in 2 dimensions; measures: Value`/nHHs_deprivation)]
dt[, dep3_pc := 100*(`Household Deprivation: Household is deprived in 3 dimensions; measures: Value`/nHHs_deprivation)]
dt[, dep4_pc := 100*(`Household Deprivation: Household is deprived in 4 dimensions; measures: Value`/nHHs_deprivation)]
deprivationDT <- dt[, .(MSOACode, nHHs_deprivation, dep0_pc, dep1_pc, dep2_pc, dep3_pc, dep4_pc)]
# sneak the LA name in there too
dt <- sotonEnergyDT[,.(MSOACode,LAName)]
setkey(dt, MSOACode)
setkey(deprivationDT, MSOACode)
sotonDeprivationDT <- deprivationDT[dt] # has the side effect of dropping non-Soton MSOAs
message("Example data (retained variables)")
## Example data (retained variables)
head(sotonDeprivationDT)
## MSOACode nHHs_deprivation dep0_pc dep1_pc dep2_pc dep3_pc dep4_pc LAName
## 1: E02003549 2849 52.36925 32.88873 12.24991 2.316602 0.1755002 Southampton
## 2: E02003550 3216 43.09701 32.92910 18.19030 5.254975 0.5286070 Southampton
## 3: E02003551 2256 33.68794 33.99823 23.00532 8.289007 1.0195035 Southampton
## 4: E02003552 2646 28.11791 32.01058 29.28949 9.901738 0.6802721 Southampton
## 5: E02003553 2394 39.01420 32.95739 19.88304 6.975773 1.1695906 Southampton
## 6: E02003554 2646 46.48526 32.38851 17.27135 3.514739 0.3401361 Southampton
# merge with census for future use
setkey(sotonDeprivationDT, MSOACode)
setkey(tenureDT, MSOACode)
sotonCensus2011_DT <- tenureDT[sotonDeprivationDT] # only Soton MSOAs
We recode the current energy consumption into categories for comparison with other low values and the presence of wind turbines/PV. We use -ve, 0 and 1 kWh as the thresholds of interest.
ggplot2::ggplot(sotonUniqueEPCsDT, aes(x = ENERGY_CONSUMPTION_CURRENT)) +
geom_histogram(binwidth = 5) +
facet_wrap(~TENURE) +
geom_vline(xintercept = 0, alpha = 0.4)
Figure 3.1: Histogram of ENERGY_CONSUMPTION_CURRENT (reference line = 0)
underZero <- nrow(sotonUniqueEPCsDT[ENERGY_CONSUMPTION_CURRENT < 0])
t <- with(sotonUniqueEPCsDT[ENERGY_CONSUMPTION_CURRENT < 0],
table(BUILT_FORM,TENURE))
kableExtra::kable(t, caption = "Properties with ENERGY_CONSUMPTION_CURRENT < 0") %>%
kable_styling()
owner-occupied | rental (social) | unknown | ||
---|---|---|---|---|
Detached | 0 | 2 | 0 | 0 |
End-Terrace | 2 | 0 | 2 | 0 |
Mid-Terrace | 3 | 1 | 1 | 0 |
NO DATA! | 0 | 0 | 0 | 2 |
Semi-Detached | 6 | 0 | 0 | 1 |
# do we think this is caused by solar/wind?
sotonUniqueEPCsDT[, hasWind := ifelse(WIND_TURBINE_COUNT > 0, "Yes", "No")]
#table(sotonUniqueEPCsDT$hasWind)
sotonUniqueEPCsDT[, hasPV := ifelse(PHOTO_SUPPLY >0, "Yes", "No")]
#table(sotonUniqueEPCsDT$hasPV)
sotonUniqueEPCsDT[, consFlag := ifelse(ENERGY_CONSUMPTION_CURRENT < 0, "-ve kWh/y", NA)]
sotonUniqueEPCsDT[, consFlag := ifelse(ENERGY_CONSUMPTION_CURRENT == 0, "0 kWh/y", consFlag)]
sotonUniqueEPCsDT[, consFlag := ifelse(ENERGY_CONSUMPTION_CURRENT > 0 &
ENERGY_CONSUMPTION_CURRENT <= 1, "0-1 kWh/y", consFlag)]
sotonUniqueEPCsDT[, consFlag := ifelse(ENERGY_CONSUMPTION_CURRENT > 1, "1+ kWh/y", consFlag)]
t <- sotonUniqueEPCsDT[, .(nObs = .N), keyby = .(consFlag, hasWind, hasPV)]
kableExtra::kable(t, caption = "Properties in ENERGY_CONSUMPTION_CURRENT category by presence of microgeneration") %>%
kable_styling()
consFlag | hasWind | hasPV | nObs |
---|---|---|---|
-ve kWh/y | NA | NA | 5 |
-ve kWh/y | No | NA | 15 |
0 kWh/y | NA | NA | 1 |
0 kWh/y | No | No | 1 |
1+ kWh/y | NA | NA | 5549 |
1+ kWh/y | No | NA | 33014 |
1+ kWh/y | No | No | 32534 |
1+ kWh/y | No | Yes | 447 |
1+ kWh/y | Yes | NA | 6 |
1+ kWh/y | Yes | No | 28 |
There are only 20 dwellings where ENERGY_CONSUMPTION_CURRENT < 0 and none of them seem to have PV or a wind turbine so we can probably ignore them.
# repeat with a density plot to allow easy overlap
# exclude those with no data
ggplot2::ggplot(sotonUniqueEPCsDT[TENURE != "NO DATA!" &
TENURE != "unknown" &
TENURE != ""], aes(x = ENERGY_CONSUMPTION_CURRENT,
fill = TENURE, alpha = 0.2)) +
geom_density() +
facet_wrap(~BUILT_FORM) +
guides(alpha = FALSE) +
geom_vline(xintercept = 0, alpha = 0.4) +
theme(legend.position = "bottom")
Figure 3.2: Comparing distributions of ENERGY_CONSUMPTION_CURRENT by tenure and built form (reference line = 0)
Recommendation: We should exclude any property where ENERGY_CONSUMPTION_CURRENT <= 0
Next we do the same for current emissions. Repeat the coding for total floor area using 0 and 1 TCO2/y as the threshold of interest.
ggplot2::ggplot(sotonUniqueEPCsDT, aes(x = CO2_EMISSIONS_CURRENT)) +
geom_histogram(binwidth = 1)
Figure 3.3: Histogram of CO2_EMISSIONS_CURRENT
nZeroEmissions <- nrow(sotonUniqueEPCsDT[CO2_EMISSIONS_CURRENT < 0])
sotonUniqueEPCsDT[, emissionsFlag := ifelse(CO2_EMISSIONS_CURRENT < 0, "-ve CO2/y", NA)]
sotonUniqueEPCsDT[, emissionsFlag := ifelse(CO2_EMISSIONS_CURRENT == 0, "0 CO2/y", emissionsFlag)]
sotonUniqueEPCsDT[, emissionsFlag := ifelse(CO2_EMISSIONS_CURRENT > 0 &
CO2_EMISSIONS_CURRENT <= 1, "0-1 TCO2/y", emissionsFlag)]
sotonUniqueEPCsDT[, emissionsFlag := ifelse(CO2_EMISSIONS_CURRENT > 1, "1+ TCO2/y", emissionsFlag)]
t <- sotonUniqueEPCsDT[, .(nObs = .N), keyby = .(emissionsFlag, hasWind, hasPV)]
kableExtra::kable(t, caption = "Properties with CO2_EMISSIONS_CURRENT < 0 by presence of microgeneration") %>%
kable_styling()
emissionsFlag | hasWind | hasPV | nObs |
---|---|---|---|
-ve CO2/y | NA | NA | 4 |
-ve CO2/y | No | NA | 16 |
-ve CO2/y | No | No | 2 |
0 CO2/y | NA | NA | 5 |
0 CO2/y | No | No | 1 |
0-1 TCO2/y | NA | NA | 3445 |
0-1 TCO2/y | No | NA | 1941 |
0-1 TCO2/y | No | No | 532 |
0-1 TCO2/y | No | Yes | 19 |
0-1 TCO2/y | Yes | NA | 1 |
0-1 TCO2/y | Yes | No | 1 |
1+ TCO2/y | NA | NA | 2101 |
1+ TCO2/y | No | NA | 31072 |
1+ TCO2/y | No | No | 32000 |
1+ TCO2/y | No | Yes | 428 |
1+ TCO2/y | Yes | NA | 5 |
1+ TCO2/y | Yes | No | 27 |
kableExtra::kable(round(100*(prop.table(table(sotonUniqueEPCsDT$emissionsFlag,
sotonUniqueEPCsDT$consFlag,
useNA = "always")
)
)
,2)
, caption = "% properties in CO2_EMISSIONS_CURRENT categories by ENERGY_CONSUMPTION_CURRENT categories") %>%
kable_styling()
-ve kWh/y | 0 kWh/y | 1+ kWh/y | NA | |
---|---|---|---|---|
-ve CO2/y | 0.03 | 0 | 0.00 | 0 |
0 CO2/y | 0.00 | 0 | 0.00 | 0 |
0-1 TCO2/y | 0.00 | 0 | 8.29 | 0 |
1+ TCO2/y | 0.00 | 0 | 91.67 | 0 |
NA | 0.00 | 0 | 0.00 | 0 |
There are 22 properties with 0 or negative emissions. It looks like they are also the properties with -ve kWh as we might expect. So we can safely ignore them.
Recommendation: we should exclude any property where CO2_EMISSIONS_CURRENT <= 0
Environmental impact
is some sort of numerical scale that unlikely to be normally distributed.
ggplot2::ggplot(sotonEPCsDT, aes(x = ENVIRONMENT_IMPACT_CURRENT)) +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Figure 3.4: Histogram of ENVIRONMENT_IMPACT_CURRENT
Environmental impact
should decrease as emissions increase…
ggplot2::ggplot(sotonEPCsDT, aes(x = CO2_EMISSIONS_CURRENT,
y = ENVIRONMENT_IMPACT_CURRENT,
colour = TENURE)) +
geom_point() +
facet_wrap(TENURE~.) +
theme(legend.position = "bottom")
Figure 3.5: Plot of ENVIRONMENT_IMPACT_CURRENT vs CO2_EMISSIONS_CURRENT
It does but what is the relationship between ENVIRONMENT_IMPACT_CURRENT and CO2_EMISSIONS_CURRENT? It is not linear… (Figure 3.5) and there are some interesting outliers.
Repeat the coding for total floor area using 5 m2 as the threshold of interest.
ggplot2::ggplot(sotonUniqueEPCsDT, aes(x = TOTAL_FLOOR_AREA)) +
geom_histogram(binwidth = 1)
Figure 3.6: Histogram of TOTAL_FLOOR_AREA
nZeroFloorArea <- nrow(sotonUniqueEPCsDT[TOTAL_FLOOR_AREA < 0])
sotonUniqueEPCsDT[, floorFlag := ifelse(TOTAL_FLOOR_AREA == 0, "0 m2", NA)]
sotonUniqueEPCsDT[, floorFlag := ifelse(TOTAL_FLOOR_AREA > 0 &
TOTAL_FLOOR_AREA <= 5, "0-5 m2", floorFlag)]
sotonUniqueEPCsDT[, floorFlag := ifelse(TOTAL_FLOOR_AREA > 5, "5+ m2", floorFlag)]
t <- with(sotonUniqueEPCsDT, table(floorFlag, consFlag))
kableExtra::kable(round(100*prop.table(t),3), caption = "% properties with TOTAL_FLOOR_AREA category by ENERGY_CONSUMPTION_CURRENT category") %>%
kable_styling()
-ve kWh/y | 0 kWh/y | 1+ kWh/y | |
---|---|---|---|
0 m2 | 0.000 | 0.000 | 0.096 |
0-5 m2 | 0.000 | 0.000 | 0.001 |
5+ m2 | 0.028 | 0.003 | 99.872 |
kableExtra::kable(head(sotonUniqueEPCsDT[, .(BUILDING_REFERENCE_NUMBER, PROPERTY_TYPE, TOTAL_FLOOR_AREA,
ENERGY_CONSUMPTION_CURRENT)][order(-TOTAL_FLOOR_AREA)], 10),
caption = "Top 10 by floor area (largest)") %>%
kable_styling()
BUILDING_REFERENCE_NUMBER | PROPERTY_TYPE | TOTAL_FLOOR_AREA | ENERGY_CONSUMPTION_CURRENT |
---|---|---|---|
9507976768 | House | 1353.680 | 140 |
3834614378 | House | 1123.000 | 120 |
9808638568 | House | 973.210 | 522 |
5181048568 | House | 861.360 | 279 |
1654050778 | House | 855.000 | 170 |
898835568 | House | 846.421 | 161 |
8250419178 | House | 800.000 | 185 |
3914088378 | House | 796.000 | 216 |
2590863278 | House | 714.000 | 224 |
4947642078 | Flat | 694.000 | 88 |
kableExtra::kable(head(sotonUniqueEPCsDT[, .(BUILDING_REFERENCE_NUMBER, PROPERTY_TYPE, TOTAL_FLOOR_AREA,
ENERGY_CONSUMPTION_CURRENT)][order(TOTAL_FLOOR_AREA)], 10),
caption = "Bottom 10 by floor area (smallest)") %>%
kable_styling()
BUILDING_REFERENCE_NUMBER | PROPERTY_TYPE | TOTAL_FLOOR_AREA | ENERGY_CONSUMPTION_CURRENT |
---|---|---|---|
184420668 | Flat | 0 | 104 |
627876968 | Flat | 0 | 58 |
666790668 | Flat | 0 | 110 |
747876968 | Flat | 0 | 70 |
884420668 | Flat | 0 | 119 |
884838868 | Flat | 0 | 71 |
1478001668 | Flat | 0 | 115 |
1926004568 | Flat | 0 | 144 |
1935420668 | Flat | 0 | 103 |
2145420668 | Flat | 0 | 117 |
kableExtra::kable(round(100*prop.table(t),3), caption = "% properties with TOTAL_FLOOR_AREA category by ENERGY_CONSUMPTION_CURRENT category") %>%
kable_styling()
-ve kWh/y | 0 kWh/y | 1+ kWh/y | |
---|---|---|---|
0 m2 | 0.000 | 0.000 | 0.096 |
0-5 m2 | 0.000 | 0.000 | 0.001 |
5+ m2 | 0.028 | 0.003 | 99.872 |
Table 3.3 shows that the properties with floor area of < 5m2 are not necessarily the ones with 0 or negative kWh values. Nevertheless they represent a small proportion of all properties.
The scale of the x axis in Figure 3.6 also suggests a few very large properties.
Recommendation: We should exclude any property where TOTAL_FLOOR_AREA <= 5
We know that we do not have EPC records for every dwelling. But how many are we missing? We will check this at MSOA level as it allows us to link to other MSOA level datasets that tell us how many households, dwellings or energy meters to expect. Arguably it would be better to do this at LSOA level but…
First we’ll use the BEIS 2018 MSOA level annual electricity data to estimate the number of meters (not properties) - some addresses can have 2 meters (e.g. standard & economy 7). However this is more useful than the number of gas meters since not all dwellings have mains gas but all (should?) have an electricity meter.
message("Number of electricity & gas meters")
## Number of electricity & gas meters
sotonEnergyDT[, .(nElecMeters = sum(nElecMeters),
nGasMeters = sum(nGasMeters)), keyby = .(LAName)]
## LAName nElecMeters nGasMeters
## 1: Southampton 108333 81645
Next we’ll check for the number of households reported by the 2011 Census.
would be better to use the Census dwellings counts but this gives us tenure which is useful
#censusDT <- data.table::fread(path.expand("~/data/"))
t <- sotonCensus2011_DT[, .(nHouseholds = sum(nHHs_deprivation)), keyby = .(LAName)]
kableExtra::kable(t, caption = "Census derived household counts") %>%
kable_styling()
LAName | nHouseholds |
---|---|
Southampton | 98254 |
That’s lower than the number of electricity meters (as expected) but note that as it is a count of households rather than dwellings, it doesn’t allow for dwellings that were empty on census night.
# Postcodes don't help - no count of addresses in the data (there used to be??)
# but we can use it to check which Soton postcodes are missing from the EPC file
sotonPostcodesReducedDT[, c("pc_chunk1","pc_chunk2" ) := tstrsplit(pcds,
split = " "
)
]
We should not have single digit postcodes in the postcode data - i.e. S01 should not be there (since 1993). Southampton City is unusual in only having double digit postcodes.
# EPC
# set up counters
# use final cleaned EPC data
sotonUniqueEPCsDT[, epcIsSocialRent := ifelse(TENURE == "rental (social)", 1, 0)]
sotonUniqueEPCsDT[, epcIsPrivateRent := ifelse(TENURE == "rental (private)", 1, 0)]
sotonUniqueEPCsDT[, epcIsOwnerOcc := ifelse(TENURE == "owner-occupied", 1, 0)]
sotonUniqueEPCsDT[, epcIsUnknownTenure := ifelse(TENURE == "NO DATA!" |
TENURE == "" , 1, 0)]
# aggregate EPCs to postcodes
sotonEpcPostcodes_DT <- sotonUniqueEPCsDT[, .(nEPCs = .N,
sumEPC_tCO2 = sum(CO2_EMISSIONS_CURRENT, na.rm = TRUE),
n_epcIsSocialRent = sum(epcIsSocialRent, na.rm = TRUE),
n_epcIsPrivateRent = sum(epcIsPrivateRent, na.rm = TRUE),
n_epcIsOwnerOcc = sum(epcIsOwnerOcc, na.rm = TRUE),
n_epcIsUnknownTenure = sum(epcIsUnknownTenure, na.rm = TRUE),
sumEpcMWh = sum(ENERGY_CONSUMPTION_CURRENT* TOTAL_FLOOR_AREA)/1000), # crucial as ENERGY_CONSUMPTION_CURRENT = kWh/m2
keyby = .(POSTCODE, LOCAL_AUTHORITY_LABEL)]
sotonEpcPostcodes_DT[, c("pc_chunk1","pc_chunk2" ) := tstrsplit(POSTCODE,
split = " "
)
]
# check original EPC data for Soton - which postcodes are covered?
sotonEPCsDT[, c("pc_chunk1","pc_chunk2" ) := tstrsplit(POSTCODE,
split = " "
)
]
t <- sotonEPCsDT[, .(nEPCs = .N), keyby = .(postcode_sector = pc_chunk1)]
kableExtra::kable(t, caption = "Count of most recent EPCs per postcode sector for Southampton") %>%
kable_styling()
postcode_sector | nEPCs |
---|---|
SO14 | 14213 |
SO15 | 17855 |
SO16 | 20270 |
SO17 | 8446 |
SO18 | 10661 |
SO19 | 20388 |
It looks like we have EPCs for each postcode sector and we only have double digit postcodes which is good.
# match the EPC postcode summaries to the postcode extract
sotonPostcodesReducedDT[, POSTCODE_s := stringr::str_remove(pcds, " ")]
setkey(sotonPostcodesReducedDT, POSTCODE_s)
sotonPostcodesReducedDT[, MSOACode := msoa11]
message("Number of postcodes: ",uniqueN(sotonPostcodesReducedDT$POSTCODE_s))
## Number of postcodes: 14228
sotonEpcPostcodes_DT[, POSTCODE_s := stringr::str_remove(POSTCODE, " ")]
setkey(sotonEpcPostcodes_DT, POSTCODE_s)
message("Number of postcodes with EPCs: ",uniqueN(sotonEpcPostcodes_DT$POSTCODE_s))
## Number of postcodes with EPCs: 5107
dt <- sotonEpcPostcodes_DT[sotonPostcodesReducedDT]
# aggregate to MSOA - watch for NAs where no EPCs in a given postcode
sotonEpcMSOA_DT <- dt[, .(nEPCs = sum(nEPCs, na.rm = TRUE),
sumEPC_tCO2 = sum(sumEPC_tCO2, na.rm = TRUE),
n_epcIsSocialRent = sum(n_epcIsSocialRent, na.rm = TRUE),
n_epcIsPrivateRent = sum(n_epcIsPrivateRent, na.rm = TRUE),
n_epcIsOwnerOcc = sum(n_epcIsOwnerOcc, na.rm = TRUE),
n_epcIsUnknownTenure = sum(n_epcIsUnknownTenure, na.rm = TRUE),
sumEpcMWh = sum(sumEpcMWh, na.rm = TRUE)
),
keyby = .(MSOACode) # change name on the fly for easier matching
]
#summary(sotonEpcMSOA_DT)
So we have some postcodes with no EPCs.
Join the estimates together at MSOA level for comparison. There are 32 MSOAs in Southampton.
# 32 LSOAs in Soton
# add census & deprivation to energy
setkey(sotonEnergyDT, MSOACode)
setkey(sotonCensus2011_DT, MSOACode)
setkey(sotonEpcMSOA_DT, MSOACode)
sotonMSOA_DT <- sotonCensus2011_DT[sotonEnergyDT]
#names(sotonMSOA_DT)
sotonMSOA_DT <- sotonEpcMSOA_DT[sotonMSOA_DT]
#names(sotonMSOA_DT)
# add MSOA names from the postcode LUT
msoaNamesDT <- data.table::as.data.table(readxl::read_xlsx(path.expand("~/data/UK_postcodes/NSPL_AUG_2020_UK/Documents/MSOA (2011) names and codes UK as at 12_12.xlsx")))
msoaNamesDT[, MSOACode := MSOA11CD]
msoaNamesDT[, MSOAName := MSOA11NM]
setkey(msoaNamesDT, MSOACode)
sotonMSOA_DT <- msoaNamesDT[sotonMSOA_DT]
#names(sotonMSOA_DT)
Table3.6 compares all three sources of counts. Clearly we have fewer EPCs in 2020 than both households in 2011 and electricity meters in 2018.
t <- sotonMSOA_DT[, .(nHouseholds_2011 = sum(nHHs_tenure),
nElecMeters_2018 = sum(nElecMeters),
nEPCs_2020 = sum(nEPCs),
total_MWh_BEIS_2018 = sum(beisEnergyMWh),
total_kWh_EPCs_2020 = sum(sumEpcMWh)), keyby = .(LAName)]
kableExtra::kable(t, caption = "Comparison of different estimates of the number of dwellings and total energy use") %>%
kable_styling()
LAName | nHouseholds_2011 | nElecMeters_2018 | nEPCs_2020 | total_MWh_BEIS_2018 | total_kWh_EPCs_2020 |
---|---|---|---|---|---|
Southampton | 98254 | 108333 | 71527 | 1276983 | 1290907 |
nHouseholds_2011f <- sum(sotonMSOA_DT$nHHs_tenure)
nElecMeters_2018f <- sum(sotonMSOA_DT$nElecMeters)
nEPCs_2020f <- sum(sotonMSOA_DT$nEPCs)
makePC <- function(x,y,r){
# make a percent of x/y and round it to r decimal places
pc <- round(100*(x/y),r)
return(pc)
}
The number of EPCs we have is:
We can also see that despite having ‘missing’ EPCs, the estimated total EPC-derived energy demand is marginally higher than the BEIS-derived weather corrected energy demand data. Given that the BEIS data accounts for all heating, cooking, hot water, lighting and appliance use we would expect the EPC data to be lower even if no EPCs were missing…
Figure 3.7 suggests that rates vary considerably by MSOA but are relatively consistent across the two baseline ‘truth’ estimates with the exception of E02003577 which appears to have many more EPCs than Census 2011 households. It is worth noting that this MSOA covers the city centre and dock areas which have had substantial new build since 2011 and so may have households inhabiting dwellings that did not exist at Census 2011. This is also supported by the considerably higher EPC derived energy demand data compared to BEIS’s 2018 data - although it suggests the dwellings are either very new (since 2018) or are yet to be occupied.
ggpairs(sotonMSOA_DT[, .(nHHs_tenure, nElecMeters, nEPCs)])
Figure 3.7: Pairs plot of estimates of meters, households and EPCs by MSOA
Figure 3.8 (see Table 7.1 below for details) extends this analysis to show the % missing compared to the relevant baseline coloured by the % of owner-occupied dwellings in the MSOA according to Census 2011. As we would expect given the EPC inspection process, those MSOAs with the lowest EPC coverage on both baseline measures tend to have higher proportions of owner occupiers and therefore are likely to have more dwellings that have never required an EPC inspection.
t <- sotonMSOA_DT[, .(MSOAName, MSOACode, nHHs_tenure,nElecMeters,nEPCs,
dep0_pc, socRent_pc, privRent_pc, ownerOcc_pc,sumEpcMWh, beisEnergyMWh )]
t[, pc_missingHH := makePC(nEPCs,nHHs_tenure,1)]
t[, pc_missingMeters := makePC(nEPCs,nElecMeters,1)]
t[, pc_energyBEIS := makePC(sumEpcMWh,beisEnergyMWh,1)]
kt1 <- t
ggplot2::ggplot(t, aes(x = pc_missingHH,
y = pc_missingMeters,
colour = round(ownerOcc_pc))) +
geom_abline(alpha = 0.2, slope=1, intercept=0) +
geom_point() +
scale_color_continuous(name = "% owner occupiers \n(Census 2011)", high = "red", low = "green") +
#theme(legend.position = "bottom") +
labs(x = "EPCs 2020 as % of Census 2011 households",
y = "EPCs 2020 as % of electricity meters 2018",
caption = "x = y line included for clarity")
Figure 3.8: % ‘missing’ rates comparison
outlierMSOA <- t[pc_missingHH > 100]
We can use the same approach to compare estimates of total energy demand at the MSOA level. To do this we compare:
current primary energy
(space heating, hot water and lighting) and of course also suffers from missing EPCs (see above)We should not expect the values to match but we might reasonably expect a correlation.
ggplot2::ggplot(t, aes(x = sumEpcMWh,
y = beisEnergyMWh,
colour = round(ownerOcc_pc))) +
geom_abline(alpha = 0.2, slope=1, intercept=0) +
geom_point() +
scale_color_continuous(name = "% owner occupiers \n(Census 2011)", high = "red", low = "green") +
#theme(legend.position = "bottom") +
labs(x = "EPC 2020 derived total MWh/year",
y = "BEIS 2018 derived total MWh/year",
caption = "x = y line included for clarity")
Figure 3.9: Energy demand comparison
outlier <- t[sumEpcMWh > 70000]
Figure 3.9 shows that both of these are true. MSOAs with a high proportion of owner occupiers (and therefore more likely to have missing EPCs) tend to have higher observed energy demand than the EOC data suggests - they are above the reference line. MSOAs with a lower proportion of owner occupiers (and therefore more likely to have more complete EPC coverage) tend to be on or below the line. As before we have the same notable outlier (E02003577) and for the same reasons… In this case this produces a much higher energy demand estimate than the BEIS 2018 data records.
While we’re here we’ll also check the BEIS data. Table 3.7 shows the five highest and lowest MSOAs by annual electricity use.
t1 <- head(sotonMSOA_DT[, .(MSOA11NM, MSOA11CD, beisElecMWh, nElecMeters,
beisGasMWh, nGasMeters)][order(-beisElecMWh)],5)
kableExtra::kable(t1, caption = "Southampton MSOAs: BEIS 2018 energy data ordered by highest electricity (top 5)") %>%
kable_styling()
MSOA11NM | MSOA11CD | beisElecMWh | nElecMeters | beisGasMWh | nGasMeters |
---|---|---|---|---|---|
Southampton 029 | E02003577 | 27352.70 | 6734 | 20108.63 | 2420 |
Southampton 014 | E02003562 | 14757.18 | 3921 | 36532.48 | 2983 |
Southampton 022 | E02003570 | 14719.37 | 4142 | 34730.60 | 3083 |
Southampton 031 | E02003579 | 13860.94 | 4460 | 34052.12 | 3068 |
Southampton 021 | E02003569 | 13719.22 | 3999 | 27661.45 | 2722 |
t2 <- tail(sotonMSOA_DT[, .(MSOA11NM, MSOA11CD, beisElecMWh, nElecMeters,
beisGasMWh, nGasMeters)][order(-beisElecMWh)],5)
kableExtra::kable(t2, caption = "Southampton MSOAs: BEIS 2018 energy data ordered by lowest electricity (bottom 5)") %>%
kable_styling()
MSOA11NM | MSOA11CD | beisElecMWh | nElecMeters | beisGasMWh | nGasMeters |
---|---|---|---|---|---|
Southampton 024 | E02003572 | 9347.893 | 2597 | 30332.49 | 2381 |
Southampton 018 | E02003566 | 9221.544 | 2831 | 26826.22 | 2607 |
Southampton 008 | E02003556 | 9199.673 | 2589 | 26412.36 | 2295 |
Southampton 003 | E02003551 | 8957.742 | 2446 | 17358.87 | 1649 |
Southampton 005 | E02003553 | 8479.993 | 2464 | 24996.91 | 2303 |
We have identified some issues with a small number of the properties in the EPC dataset. These are not unexpected given that much of the estimates rely on partial or presumed data. Data entry errors are also quite likely. As a result we exclude:
finalEPCDT <- sotonUniqueEPCsDT[ENERGY_CONSUMPTION_CURRENT > 0 &
TOTAL_FLOOR_AREA > 5 &
CO2_EMISSIONS_CURRENT > 0]
skimr::skim(finalEPCDT)
## Warning: Couldn't find skimmers for class: integer64; No user-defined `sfl` provided. Falling back to
## `character`.
Name | finalEPCDT |
Number of rows | 71502 |
Number of columns | 24 |
_______________________ | |
Column type frequency: | |
character | 12 |
Date | 2 |
numeric | 10 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
BUILDING_REFERENCE_NUMBER | 0 | 1.00 | 17 | 21 | 0 | 71502 | 0 |
LMK_KEY | 0 | 1.00 | 29 | 34 | 0 | 71502 | 0 |
PROPERTY_TYPE | 0 | 1.00 | 4 | 10 | 0 | 5 | 0 |
BUILT_FORM | 0 | 1.00 | 8 | 20 | 0 | 7 | 0 |
TENURE | 0 | 1.00 | 0 | 16 | 1905 | 6 | 0 |
POSTCODE | 0 | 1.00 | 8 | 8 | 0 | 5105 | 0 |
LOCAL_AUTHORITY_LABEL | 0 | 1.00 | 11 | 11 | 0 | 1 | 0 |
hasWind | 5546 | 0.92 | 2 | 3 | 0 | 2 | 0 |
hasPV | 38495 | 0.46 | 2 | 3 | 0 | 2 | 0 |
consFlag | 0 | 1.00 | 8 | 8 | 0 | 1 | 0 |
emissionsFlag | 0 | 1.00 | 9 | 10 | 0 | 2 | 0 |
floorFlag | 0 | 1.00 | 5 | 5 | 0 | 1 | 0 |
Variable type: Date
skim_variable | n_missing | complete_rate | min | max | median | n_unique |
---|---|---|---|---|---|---|
LODGEMENT_DATE | 0 | 1 | 2008-10-01 | 2020-06-30 | 2014-10-22 | 4132 |
INSPECTION_DATE | 0 | 1 | 2007-03-02 | 2020-06-30 | 2014-10-14 | 3906 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
ENVIRONMENT_IMPACT_CURRENT | 0 | 1.00 | 62.51 | 15.72 | 1.00 | 52.0 | 63.00 | 73 | 100.00 | ▁▂▆▇▂ |
ENERGY_CONSUMPTION_CURRENT | 0 | 1.00 | 263.23 | 140.47 | 4.00 | 174.0 | 233.00 | 327 | 1597.00 | ▇▂▁▁▁ |
CO2_EMISSIONS_CURRENT | 0 | 1.00 | 3.17 | 1.94 | 0.10 | 1.8 | 2.85 | 4 | 77.00 | ▇▁▁▁▁ |
PHOTO_SUPPLY | 38495 | 0.46 | 0.59 | 5.11 | 0.00 | 0.0 | 0.00 | 0 | 100.00 | ▇▁▁▁▁ |
WIND_TURBINE_COUNT | 5546 | 0.92 | 0.00 | 0.02 | -1.00 | 0.0 | 0.00 | 0 | 1.00 | ▁▁▇▁▁ |
TOTAL_FLOOR_AREA | 0 | 1.00 | 73.05 | 34.86 | 5.85 | 49.0 | 69.00 | 87 | 1353.68 | ▇▁▁▁▁ |
epcIsSocialRent | 0 | 1.00 | 0.21 | 0.40 | 0.00 | 0.0 | 0.00 | 0 | 1.00 | ▇▁▁▁▂ |
epcIsPrivateRent | 0 | 1.00 | 0.27 | 0.44 | 0.00 | 0.0 | 0.00 | 1 | 1.00 | ▇▁▁▁▃ |
epcIsOwnerOcc | 0 | 1.00 | 0.41 | 0.49 | 0.00 | 0.0 | 0.00 | 1 | 1.00 | ▇▁▁▁▆ |
epcIsUnknownTenure | 0 | 1.00 | 0.04 | 0.19 | 0.00 | 0.0 | 0.00 | 0 | 1.00 | ▇▁▁▁▁ |
This leaves us with a total of 71,502 properties.
finalEPCDT[, POSTCODE_s := stringr::str_remove_all(POSTCODE, " ")]
sotonPostcodesReducedDT[, POSTCODE_s := stringr::str_remove_all(pcds, " ")]
setkey(finalEPCDT, POSTCODE_s)
setkey(sotonPostcodesReducedDT, POSTCODE_s)
dt <- sotonPostcodesReducedDT[finalEPCDT]
dt[, MSOACode := msoa11]
setkey(dt, MSOACode)
setkey(sotonCensus2011_DT, MSOACode)
dt <- sotonCensus2011_DT[dt]
of <- path.expand("~/data/EW_epc/domestic-E06000045-Southampton/EPCs_liveFinalClean.csv")
data.table::fwrite(dt, file = of)
message("Gziping ", of)
## Gziping /Users/ben/data/EW_epc/domestic-E06000045-Southampton/EPCs_liveFinalClean.csv
# Gzip it
# in case it fails (it will on windows - you will be left with a .csv file)
try(system( paste0("gzip -f '", of,"'"))) # include ' or it breaks on spaces
message("Gzipped ", of)
## Gzipped /Users/ben/data/EW_epc/domestic-E06000045-Southampton/EPCs_liveFinalClean.csv
NB: this failed to match an EPC postcode to an MSOA for 72 EPCs The table below shows which postcodes these were by date.
dt[is.na(MSOACode), .(nEPCs = .N), keyby = .(POSTCODE_s, TENURE, INSPECTION_DATE)]
## POSTCODE_s TENURE INSPECTION_DATE nEPCs
## 1: SO156GB unknown 2015-04-08 3
## 2: SO156GB unknown 2015-06-09 1
## 3: SO156GB unknown 2015-07-06 24
## 4: SO156GB unknown 2015-08-04 12
## 5: SO156GB unknown 2015-08-14 9
## 6: SO156GB unknown 2015-08-19 1
## 7: SO160AL 2009-02-17 4
## 8: SO160AL rental (private) 2019-11-20 2
## 9: SO162AJ unknown 2018-03-22 2
## 10: SO167HE owner-occupied 2009-11-30 1
## 11: SO167HE owner-occupied 2017-10-04 1
## 12: SO168AD 2011-03-18 1
## 13: SO168AD owner-occupied 2008-10-01 1
## 14: SO168AD owner-occupied 2009-09-29 1
## 15: SO168AD owner-occupied 2019-04-08 1
## 16: SO168AD owner-occupied 2020-06-17 1
## 17: SO181HS rental (private) 2010-04-27 1
## 18: SO185BR owner-occupied 2018-06-05 1
## 19: SO185BS owner-occupied 2012-02-15 1
## 20: SO185BS owner-occupied 2018-03-14 1
## 21: SO196AQ unknown 2014-11-05 3
## POSTCODE_s TENURE INSPECTION_DATE nEPCs
Finally we save the MSOA table into the repo data directory for future use. We don’t usually advocate keeping data in a git repo but this is small, aggregated and mostly harmless.
of <- here::here("data", "sotonMSOAdata.csv")
data.table::fwrite(sotonMSOA_DT, of)
message("Saved ", nrow(sotonMSOA_DT), " rows of data.")
## Saved 32 rows of data.
kableExtra::kable(kt1[order(-pc_missingHH)], digits = 2, caption = "EPC records as a % of n census households and n meters per MSOA") %>%
kable_styling()
MSOAName | MSOACode | nHHs_tenure | nElecMeters | nEPCs | dep0_pc | socRent_pc | privRent_pc | ownerOcc_pc | sumEpcMWh | beisEnergyMWh | pc_missingHH | pc_missingMeters | pc_energyBEIS |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Southampton 029 | E02003577 | 4908 | 6734 | 5917 | 37.92 | 27.61 | 43.09 | 24.67 | 77383.23 | 47461.33 | 120.6 | 87.9 | 163.0 |
Southampton 023 | E02003571 | 3040 | 3530 | 2958 | 47.96 | 20.23 | 56.97 | 21.48 | 47820.53 | 33485.69 | 97.3 | 83.8 | 142.8 |
Southampton 017 | E02003565 | 2563 | 2840 | 2427 | 48.81 | 11.63 | 57.24 | 28.95 | 42637.78 | 35191.63 | 94.7 | 85.5 | 121.2 |
Southampton 022 | E02003570 | 3635 | 4142 | 3429 | 25.56 | 29.82 | 45.69 | 22.59 | 58338.87 | 49449.97 | 94.3 | 82.8 | 118.0 |
Southampton 031 | E02003579 | 3357 | 4460 | 3137 | 44.92 | 11.56 | 23.80 | 63.09 | 46309.49 | 47913.06 | 93.4 | 70.3 | 96.7 |
Southampton 013 | E02003561 | 3181 | 3489 | 2663 | 39.80 | 19.68 | 44.73 | 33.98 | 49231.81 | 38430.85 | 83.7 | 76.3 | 128.1 |
Southampton 021 | E02003569 | 3527 | 3999 | 2754 | 40.71 | 15.00 | 38.28 | 44.32 | 47718.63 | 41380.67 | 78.1 | 68.9 | 115.3 |
Southampton 009 | E02003557 | 2753 | 3103 | 2137 | 50.78 | 7.37 | 38.98 | 52.49 | 47067.97 | 42842.23 | 77.6 | 68.9 | 109.9 |
Southampton 020 | E02003568 | 3820 | 3900 | 2959 | 50.08 | 4.53 | 50.92 | 42.93 | 53468.76 | 47024.09 | 77.5 | 75.9 | 113.7 |
Southampton 010 | E02003558 | 2924 | 3222 | 2223 | 33.96 | 32.32 | 25.82 | 39.81 | 38565.55 | 34421.99 | 76.0 | 69.0 | 112.0 |
Southampton 015 | E02003563 | 3483 | 3818 | 2553 | 37.81 | 21.79 | 24.17 | 51.79 | 46524.57 | 39920.85 | 73.3 | 66.9 | 116.5 |
Southampton 027 | E02003575 | 2808 | 2987 | 2028 | 29.59 | 51.14 | 8.23 | 37.64 | 36550.74 | 28941.46 | 72.2 | 67.9 | 126.3 |
Southampton 007 | E02003555 | 3140 | 3763 | 2261 | 34.59 | 30.96 | 11.15 | 56.15 | 35181.90 | 40416.83 | 72.0 | 60.1 | 87.0 |
Southampton 005 | E02003553 | 2394 | 2464 | 1687 | 39.01 | 25.44 | 40.27 | 32.00 | 31991.63 | 33476.91 | 70.5 | 68.5 | 95.6 |
Southampton 014 | E02003562 | 3636 | 3921 | 2513 | 45.68 | 9.13 | 29.24 | 59.19 | 47128.06 | 51289.66 | 69.1 | 64.1 | 91.9 |
Southampton 032 | E02003580 | 2617 | 2825 | 1804 | 27.21 | 55.48 | 6.65 | 35.69 | 31120.78 | 24488.16 | 68.9 | 63.9 | 127.1 |
Southampton 025 | E02003573 | 3236 | 3470 | 2106 | 29.57 | 43.54 | 6.12 | 47.84 | 38992.17 | 41714.91 | 65.1 | 60.7 | 93.5 |
Southampton 003 | E02003551 | 2256 | 2446 | 1456 | 33.69 | 38.96 | 15.29 | 42.95 | 27395.69 | 26316.61 | 64.5 | 59.5 | 104.1 |
Southampton 012 | E02003560 | 3040 | 3191 | 1952 | 26.97 | 53.52 | 8.75 | 36.12 | 33862.76 | 34252.94 | 64.2 | 61.2 | 98.9 |
Southampton 006 | E02003554 | 2646 | 2873 | 1684 | 46.49 | 14.55 | 21.05 | 63.00 | 35570.20 | 39712.77 | 63.6 | 58.6 | 89.6 |
Southampton 004 | E02003552 | 2646 | 2809 | 1653 | 28.12 | 47.47 | 9.26 | 40.97 | 30104.34 | 28051.01 | 62.5 | 58.8 | 107.3 |
Southampton 016 | E02003564 | 3474 | 3563 | 2164 | 39.38 | 22.54 | 12.09 | 63.39 | 43679.63 | 43718.49 | 62.3 | 60.7 | 99.9 |
Southampton 018 | E02003566 | 2607 | 2831 | 1617 | 35.21 | 29.84 | 8.36 | 59.42 | 27441.65 | 36047.76 | 62.0 | 57.1 | 76.1 |
Southampton 028 | E02003576 | 3434 | 3614 | 2121 | 38.99 | 22.83 | 18.58 | 56.41 | 39556.61 | 44100.48 | 61.8 | 58.7 | 89.7 |
Southampton 001 | E02003549 | 2849 | 2832 | 1737 | 52.37 | 11.23 | 25.06 | 62.06 | 41619.53 | 49676.93 | 61.0 | 61.3 | 83.8 |
Southampton 002 | E02003550 | 3216 | 3527 | 1932 | 43.10 | 21.05 | 11.04 | 66.08 | 36605.55 | 41124.17 | 60.1 | 54.8 | 89.0 |
Southampton 019 | E02003567 | 2991 | 3200 | 1780 | 39.18 | 27.28 | 14.11 | 56.80 | 33255.30 | 43448.21 | 59.5 | 55.6 | 76.5 |
Southampton 026 | E02003574 | 3412 | 3599 | 1972 | 40.77 | 11.78 | 13.66 | 71.72 | 37589.53 | 45562.64 | 57.8 | 54.8 | 82.5 |
Southampton 030 | E02003578 | 2641 | 2830 | 1519 | 44.07 | 10.64 | 15.68 | 72.13 | 27644.41 | 36572.30 | 57.5 | 53.7 | 75.6 |
Southampton 024 | E02003572 | 2484 | 2597 | 1367 | 45.61 | 8.13 | 15.46 | 75.28 | 30078.45 | 39680.38 | 55.0 | 52.6 | 75.8 |
Southampton 011 | E02003559 | 3065 | 3165 | 1678 | 53.38 | 5.97 | 15.14 | 76.70 | 42376.73 | 55256.20 | 54.7 | 53.0 | 76.7 |
Southampton 008 | E02003556 | 2471 | 2589 | 1339 | 42.57 | 12.51 | 16.15 | 70.42 | 28093.93 | 35612.03 | 54.2 | 51.7 | 78.9 |
Allaire, JJ, Yihui Xie, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, Hadley Wickham, Joe Cheng, and Winston Chang. 2018. Rmarkdown: Dynamic Documents for R. https://CRAN.R-project.org/package=rmarkdown.
Dowle, M, A Srinivasan, T Short, S Lianoglou with contributions from R Saporta, and E Antonyan. 2015. Data.table: Extension of Data.frame. https://CRAN.R-project.org/package=data.table.
Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.
Wickham, Hadley, and Jennifer Bryan. 2017. Readxl: Read Excel Files. https://CRAN.R-project.org/package=readxl.
Xie, Yihui. 2016a. Bookdown: Authoring Books and Technical Documents with R Markdown. Boca Raton, Florida: Chapman; Hall/CRC. https://github.com/rstudio/bookdown.
———. 2016b. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://CRAN.R-project.org/package=knitr.
Zhu, Hao. 2018. KableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.