Commit 6cad116c authored by Ben Anderson

ran to html. again. please please can we have pages on gitlab

parent 558ba988
@@ -57,14 +57,26 @@ lsoaCensusPath <- paste0(dPath, "/UK_Census/2011Data/EW/lsoa/processed")
oacPath <- paste0(dPath, "/UK OAC/2011/CDRC/Tables")
beisPath <- paste0(dPath, "/DECC/SubnationalEnergyData/2010-2018")
nsplPath <- "~/Data/UK_NSPL/NSPL_FEB_2020_UK/"
# residential consumption as % of total
consRatioThreshold <- 0.90
# Functions ----
getSAVESampleArea <- function(dt){
# filters and returns just those areas which were in the original SAVE sample frame
# requires LAName
saveDT <- dt[LAName %like% "Southampton" | LAName %like% "Portsmouth" |
LAName %like% "Isle of Wight" | LAName %like% "Hampshire"]
return(saveDT)
}
```
# Intro
Fractional weights are sufficient for the calculation of area level weighted statistics. However, a number of applications such as local electricity network demand modelling or agent-based transport/mobility models require household units, implying the need for integerisation of weights to enable the selection of specific households to fit a given area. Lovelace & Ballas introduced a new approach to taking fractional weights generated using IPF and integerising them to give exact numbers of households per unit area. Their work showed that their method produced better fitting populations than other integerisation approaches (internal validation) but did not provide external validation to demonstrate the extent to which their approach reproduced the accuracy of fractional weights when calculating area level statistics. This paper estimates the mean and median annual household electricity consumption (in kWh) for English MSOAs using standard IPF and Lovelace and Ballas' approaches. It then compares the results with observed MSOA level electricity consumption for areas where residential consumption is more than 90% of total electricity consumption.
Fractional weights are sufficient for the calculation of area level weighted statistics. However, a number of applications such as local electricity network demand modelling or agent-based transport/mobility models require household units, implying the need for integerisation of weights to enable the selection of specific households to fit a given area. Lovelace & Ballas introduced a new approach to taking fractional weights generated using IPF and [integerising them](https://spatial-microsim-book.robinlovelace.net/smsimr.html#sintegerisation) to give exact numbers of households per unit area. Their work showed that their method produced better fitting populations than other integerisation approaches (internal validation) but did not provide external validation to demonstrate the extent to which their approach reproduced the accuracy of fractional weights when calculating area level statistics. This paper estimates the mean and median annual household electricity consumption (in kWh) for English MSOAs using standard IPF and Lovelace and Ballas' approaches. It then compares the results with observed MSOA level electricity consumption for areas where residential consumption is more than 90% of total electricity consumption.
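
To make the integerisation step concrete, here is a minimal sketch of a 'truncate, replicate, sample' style procedure on toy weights. It is an illustration under stated assumptions, not the code used in this analysis: the function name `integerise_trs` and the toy vector `frac_weights` are invented for the example.

```r
# Sketch only: TRS-style integerisation of fractional weights for one area.
# `integerise_trs` and `frac_weights` are hypothetical names for illustration.
integerise_trs <- function(w) {
  w_int <- floor(w)                  # truncate: keep the whole part of each weight
  remainder <- w - w_int             # fractional part left over
  deficit <- round(sum(remainder))   # number of households still to allocate
  if (deficit > 0) {
    # sample: top up `deficit` cases, with probability proportional to remainder
    top_up <- sample.int(length(w), size = deficit, prob = remainder)
    w_int[top_up] <- w_int[top_up] + 1
  }
  w_int
}

set.seed(42)
frac_weights <- c(0.3, 1.7, 2.5, 0.5, 3.0)   # toy IPF weights, one per survey household
int_weights <- integerise_trs(frac_weights)

# the area total should be preserved
sum(frac_weights)
sum(int_weights)

# replicate: row indices of the survey households selected for this area
selected <- rep(seq_along(int_weights), times = int_weights)
```

The integer weights differ from the truncated weights by at most one, so any change in area level statistics comes only from how the fractional remainders are allocated.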
# Data prep
@@ -77,33 +89,45 @@ Various datasets can be linked at these levels:
* MSOA
    * BEIS domestic gas/elec (annual kWh)
    * BEIS non-domestic gas/elec (annual kWh)
Note that the [BEIS documentation](https://www.gov.uk/government/publications/regional-energy-data-guidance-note) states:
* Gas:
    * Annualised and weather corrected
    * Consumers using less than 73,200 kWh a year are classified as domestic.
* Electricity:
    * Annualised, not weather corrected
    * Excludes: NI; Central Volume Allocation (CVA) users; large industrial consumers who receive their electricity through high voltage lines of the transmission system and hence have different arrangements with their electricity suppliers; electricity used by companies that generate their own electricity and consume it without passing over the public distribution network
    * The automatic cut-off point for non-domestic consumption is 100,000 kWh. Domestic consumers with consumption of between 50,000 and 100,000 kWh are reallocated to the non-domestic sector following a validation process
## Load MSOA electricity data
We might as well work at MSOA level rather than try to aggregate LSOA upwards.
```{r loadBeisMSOA}
```{r loadMSOA}
# BEIS data
dtMsoaDElec <- data.table::fread(paste0(beisPath, "/MSOA_DOM_ELEC_csv/MSOA_ELEC_2018.csv"))
dtMsoaDElec[, nDElecMeters := METERS]
dtMsoaDElec[, kwhDElec := KWH]
dtMsoaDElec[, kwhDMeanElec := MEAN]
dtMsoaDElec[, kwhDMedianElec := MEDIAN]
# need to select only the relabelled vars for clarity
dtMsoaDElec <- dtMsoaDElec[, .(LAName, LACode, MSOAName, MSOACode,
nDElecMeters, kwhDElec, kwhDMeanElec, kwhDMedianElec)]
dtMsoaDGas <- data.table::fread(paste0(beisPath, "/MSOA_DOM_GAS_csv/MSOA_GAS_2018.csv"))
dtMsoaDGas[, nDGasMeters := METERS]
dtMsoaDGas[, kwhDGas := KWH]
dtMsoaDGas[, kwhDMeanGas := MEAN]
dtMsoaDGas[, kwhDMedianGas := MEDIAN]
dtMsoaDGas <- dtMsoaDGas[, .(MSOACode, nDGasMeters, kwhDGas, kwhDMeanGas, kwhDMedianGas)]
dtMsoaNDElec <- data.table::fread(paste0(beisPath, "/MSOA_ND_ELEC_csv/MSOA_NONDOM_ELEC_2018.csv"))
dtMsoaNDElec[, nNDElecMeters := METERS]
dtMsoaNDElec[, kwhNDElec := KWH]
dtMsoaNDElec[, kwhNDMeanElec := MEAN]
dtMsoaNDElec[, kwhNDMedianElec := MEDIAN]
dtMsoaNDElec <- dtMsoaNDElec[, .(MSOACode, nNDElecMeters, kwhNDElec, kwhNDMeanElec, kwhNDMedianElec)]
dtMsoaNDGas <- data.table::fread(paste0(beisPath, "/MSOA_ND_GAS_csv/MSOA_NONDOM_GAS_2018.csv"))
@@ -111,43 +135,185 @@ dtMsoaNDGas[, nNDGasMeters := METERS]
dtMsoaNDGas[, kwhNDGas := KWH]
dtMsoaNDGas[, kwhNDMeanGas := MEAN]
dtMsoaNDGas[, kwhNDMedianGas := MEDIAN]
dtMsoaNDGas <- dtMsoaNDGas[, .(MSOACode, nNDGasMeters, kwhNDGas, kwhNDMeanGas, kwhNDMedianGas)]
setkey(dtMsoaDElec, MSOACode)
setkey(dtMsoaNDElec, MSOACode)
setkey(dtMsoaDGas, MSOACode)
setkey(dtMsoaNDGas, MSOACode)
# NSPL-based LUT with some useful filter columns
dtMsoaLUT <- data.table::fread(paste0(nsplPath, "processed/msoaDT.csv"))
dtMsoaLUT[, MSOACode := msoa11]
setkey(dtMsoaLUT, MSOACode)
```
For now what we want to do is flag the MSOAs where most energy demand is residential - we can then use these for validation against the small area model-based estimates.
At this point it is interesting to note that the various MSOA files have different numbers of rows:
* MSOA look up table from NSPL: `r nrow(dtMsoaLUT)` rows with `r uniqueN(dtMsoaLUT$MSOACode)` unique MSOAs
* domestic elec: `r nrow(dtMsoaDElec)` rows with `r uniqueN(dtMsoaDElec$MSOACode)` unique MSOAs
* domestic gas: `r nrow(dtMsoaDGas)` rows with `r uniqueN(dtMsoaDGas$MSOACode)` unique MSOAs
* non-domestic elec: `r nrow(dtMsoaNDElec)` rows with `r uniqueN(dtMsoaNDElec$MSOACode)` unique MSOAs
* non-domestic gas: `r nrow(dtMsoaNDGas)` rows with `r uniqueN(dtMsoaNDGas$MSOACode)` unique MSOAs
Why might this be? It only applies to the non-domestic electricity data.
```{r testdtMsoaNDElec}
dtMsoaDElec[, dElecMSOA := 1] # to trace MSOA sources
dtMsoaDGas[, dGasMSOA := 1]
dtMsoaNDGas[, ndGasMSOA := 1]
dtMsoaNDElec[, ndElecMSOA := 1]
# merge to the MSOA lookup
dt <- dtMsoaLUT[dtMsoaNDElec] # merge to get useful labels
skim(dt)
```
So we get `r nrow(dt[is.na(cty)])` rows where the MSOA look up table columns are NA. So these must be the extra ones. What's in them?
```{r testNA}
ht <- head(dt[is.na(cty), .(MSOACode,nNDElecMeters,kwhNDElec,
kwhNDMeanElec,kwhNDMedianElec,ndElecMSOA)])
kableExtra::kable(ht, caption = "First few rows of 'extra' MSOAs") %>%
kable_styling()
```
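
An alternative way to isolate the unmatched rows is a `data.table` anti-join, which returns only the rows with no match in the lookup table. The sketch below uses toy stand-ins (`lut`, `ndElec`) rather than the real tables, and the codes and kWh values are made up for illustration.

```r
library(data.table)

# toy stand-ins for the MSOA lookup table and the non-domestic electricity table
lut <- data.table(MSOACode = c("E02000001", "E02000002", "E02000003"))
ndElec <- data.table(
  MSOACode  = c("E02000001", "E02000002", "E02000003", "Half-Hourly", "Half-Hourly"),
  kwhNDElec = c(100, 200, 300, 5000, 7000)
)

# anti-join: rows of ndElec whose MSOACode does not appear in lut
extra <- ndElec[!lut, on = "MSOACode"]
extra
```

This avoids having to rely on an `NA` in one particular lookup column (such as `cty`) to identify the extra rows.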
So it seems that the half-hourly electricity consumption data which comes from "larger non-domestic consumers" has not been allocated to MSOAs. Indeed the guidance states:
>"Half Hourly consumption (consumption by the larger non-domestic customers) totals, at a local authority level are provided in the Middle Layer Super Output Area (MSOA) level datasets"
Note that there are no LA codes in this dataset so these totals cannot be allocated to particular aggregations of MSOAs using this data.
```{r testHalfHourlyNDElec}
dtMsoaNDElec[, source := ifelse(MSOACode == "Half-Hourly", "Half-Hourly - large", "Annualised - small")]
totKwh <- sum(dtMsoaNDElec$kwhNDElec)
t <- dtMsoaNDElec[, .(nObs = .N,
sumkWh = sum(kwhNDElec),
meankWh = mean(as.numeric(kwhNDElec)), # careful
mean_meankWh = mean(kwhNDMeanElec),
nMeters = sum(nNDElecMeters),
pcTotal = 100*(sum(kwhNDElec)/totKwh)),
keyby = .(source)]
kableExtra::kable(t, caption = "Sum, mean and % of total non-domestic electricity by source",
digits = 3) %>%
kable_styling()
```
As we can see the mean of the means is considerably smaller than the mean of the individual observations in both cases.
Overall, this is OK for our purposes but might cause problems for someone who thought they had captured _all_ non-domestic consumption at MSOA level - there would be a lot missing!
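
The gap between the two means in the table is easier to read with a toy example. The sketch below assumes, as the column names suggest, that `KWH` is an area total and `MEAN` is a per-meter mean. The table `toy` and its values are invented for illustration.

```r
library(data.table)

# toy data: total kWh and meter counts for three areas (illustrative values)
toy <- data.table(
  area   = c("A", "B", "C"),
  kwh    = c(1000, 4000, 90000),
  meters = c(10, 20, 100)
)
toy[, meanPerMeter := kwh / meters]   # per-meter mean within each area

mean(toy$kwh)                     # mean of area totals (kWh per area)
mean(toy$meanPerMeter)            # unweighted mean of per-meter means (kWh per meter)
sum(toy$kwh) / sum(toy$meters)    # meter-weighted mean (kWh per meter)
```

The first figure is per area and the other two are per meter, so they are not directly comparable. The unweighted mean of means also differs from the meter-weighted mean whenever meter counts vary between areas.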
```{r linkBeisMSOA}
setkey(dtMsoaDElec, MSOACode)
setkey(dtMsoaNDElec, MSOACode)
setkey(dtMsoaDGas, MSOACode)
setkey(dtMsoaNDGas, MSOACode)
dtMsoaSpine <- dtMsoaDElec[dtMsoaDGas]
dtMsoaSpine <- dtMsoaSpine[dtMsoaDGas]
dtMsoaSpine <- dtMsoaSpine[dtMsoaNDGas]
dtMsoaSpine <- dtMsoaSpine[dtMsoaNDElec]
dtMsoaSpine <- dtMsoaDElec[dtMsoaDGas] # domestic
dtMsoaSpine <- dtMsoaSpine[dtMsoaNDGas] # ND gas
# lose the unallocated half-hour data
dtMsoaSpine <- dtMsoaSpine[dtMsoaNDElec[source != "Half-Hourly - large"]] # ND elec
dtMsoaSpine[, elecRatio := kwhDElec/(kwhDElec + kwhNDElec)]
dtMsoaSpine[, gasRatio := kwhDGas/(kwhNDGas + kwhDGas)]
t <- dtMsoaSpine[order(-elecRatio)]
kableExtra::kable(head(t[, .(MSOAName, MSOACode,
elecRatio, kwhDElec, kwhNDElec,
gasRatio, kwhDGas, kwhNDGas)],10),
caption = "All MSOAs") %>%
kable_styling()
dtMsoaSpinel <- dtMsoaLUT[dtMsoaSpine]
dtMsoaSpineIoW <- dtMsoaSpine[MSOAName %like% "Wight"]
# how many > X%?
# All MSOAs > consRatioThreshold
nrow(dtMsoaSpine[elecRatio > consRatioThreshold])
n <- nrow(dtMsoaSpinel[elecRatio > consRatioThreshold])
kableExtra::kable(head(dtMsoaSpinel[elecRatio > consRatioThreshold, .(MSOAName, MSOACode,
elecRatio, kwhDElec, kwhNDElec,
gasRatio, kwhDGas, kwhNDGas)],10),
caption = paste0("All ",n," English MSOAs where % domestic electricity > ",
consRatioThreshold,
" - first few rows")
) %>%
kable_styling()
# SE region?
seDT <- dtMsoaSpinel[GOR10NM == "South East"]
n <- nrow(seDT[elecRatio > consRatioThreshold])
kableExtra::kable(head(seDT[elecRatio > consRatioThreshold, .(MSOAName, MSOACode,
elecRatio, kwhDElec, kwhNDElec,
gasRatio, kwhDGas, kwhNDGas)],10),
caption = paste0("All ",n," south east England MSOAs where % domestic electricity > ",
consRatioThreshold,
" - first few rows")
) %>%
kable_styling()
# Solent sample region?
saveDT <- getSAVESampleArea(dtMsoaSpinel)
n <- nrow(saveDT[elecRatio > consRatioThreshold])
kableExtra::kable(head(saveDT[elecRatio > consRatioThreshold, .(MSOAName, MSOACode,
elecRatio, kwhDElec, kwhNDElec,
gasRatio, kwhDGas, kwhNDGas)],10),
caption = paste0("All ",n," Solent region MSOAs where % domestic electricity > ",
consRatioThreshold,
" - first few rows")
) %>%
kable_styling()
```
These would be the MSOAs to use if we wanted to get LV network data that _should_ mostly be due to domestic consumption.
But in this case, we don't. We just need to simulate mean annual kWh for each LSOA in the Solent area (as that matches our sample) using the two methods and then validate against the appropriate BEIS data!
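
A minimal sketch of that validation step, using toy data, might look like the following. All names here (`hh`, `obs`, `wFrac`, `wInt`, `kwhObsMean`, `rmse`) and all values are hypothetical; the real weights and BEIS values would replace them.

```r
library(data.table)

# toy survey households with fractional and integerised weights for two LSOAs
hh <- data.table(
  LSOACode = rep(c("L1", "L2"), each = 3),
  kwh      = c(2000, 3500, 5000, 2500, 4000, 6000),
  wFrac    = c(1.4, 2.1, 0.5, 0.8, 1.9, 1.3),   # fractional (IPF) weights
  wInt     = c(1, 2, 1, 1, 2, 1)                # integerised weights
)

# toy 'observed' means standing in for the BEIS LSOA values
obs <- data.table(LSOACode = c("L1", "L2"), kwhObsMean = c(3300, 4100))

# weighted mean annual kWh per LSOA under each set of weights
est <- hh[, .(kwhFracMean = weighted.mean(kwh, wFrac),
              kwhIntMean  = weighted.mean(kwh, wInt)),
          keyby = LSOACode]

# attach the observed values and compare
res <- obs[est, on = "LSOACode"]

rmse <- function(x, y) sqrt(mean((x - y)^2))   # root mean squared error
res[, .(rmseFrac = rmse(kwhFracMean, kwhObsMean),
        rmseInt  = rmse(kwhIntMean, kwhObsMean))]
```

RMSE is only one possible summary. Any error measure could be swapped in, as long as the same measure is applied to both sets of estimates.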
## LSOA data
Load domestic elec data
```{r loadBeisLSOA}
dtLsoaElec <- data.table::fread(paste0(beisPath, "/LSOA_DOM_ELEC_csv/LSOA_ELEC_2018.csv"))
# to prevent confusion
dtLsoaElec[, nDElecMeters := METERS]
dtLsoaElec[, kwhDElec := KWH]
dtLsoaElec[, kwhDMeanElec := MEAN]
dtLsoaElec[, kwhDMedianElec := MEDIAN]
dtLsoaGas <- data.table::fread(paste0(beisPath, "/LSOA_DOM_GAS_csv/LSOA_GAS_2018.csv"))
dtLsoaGas[, nDGasMeters := METERS]
dtLsoaGas[, kwhDGas := KWH]
dtLsoaGas[, kwhDMeanGas := MEAN]
dtLsoaGas[, kwhDMedianGas := MEDIAN]
# Select Solent region sample
dtSaveLSOAElec <- getSAVESampleArea(dtLsoaElec)
# plot by LA for fun
ggplot2::ggplot(dtSaveLSOAElec, aes(x = LAName, group = LAName, y = kwhDElec/1000)) +
geom_boxplot() +
labs(y = "MWh",
caption = "Total annualised electricity consumption per LSOA in 2018 by local authority")
```
We've probably got off-gas electric heating in East Hampshire?
# Spatial microsim
Load SAVE LSOA level weights
```{r loadSAVEweights}
```
# Comparison of results with LSOA level data
# Conclusions
# Annexes
## BEIS data