Commit 3c9c9317 authored by Ben Anderson

Merge branch 'master' into 'master'

various fixes to postcode district data

See merge request !8
parents 154c05f9 18afacbf
# useful functions
# use source(here::here("R", "functions.R")) to load
# require() only warns and returns FALSE if a package is missing, so stop explicitly
if (!require(flextable)) stop("The flextable package is required: install.packages('flextable')")

makeFlexTable <- function(df, cap = "caption"){
  # makes a pretty flextable - see https://cran.r-project.org/web/packages/flextable/index.html
  ft <- flextable::flextable(df)
  ft <- flextable::colformat_double(ft, digits = 1) # show doubles to 1 decimal place
  ft <- flextable::fontsize(ft, size = 9) # body text
  ft <- flextable::fontsize(ft, size = 10, part = "header")
  ft <- flextable::set_caption(ft, caption = cap)
  return(flextable::autofit(ft)) # fit column widths to their content
}
@@ -47,71 +47,80 @@ library(ggplot2)
```
# It's the cats, stupid
Inspired by `@giulio_mattioli`'s [recent paper on the car dependence of dog ownership](https://twitter.com/giulio_mattioli/status/1466361022747455492) we thought we'd take a look at [cats](https://twitter.com/giulio_mattioli/status/1466710752606179331) and residential energy demand. Why? Well, people like to keep their cats warm but, more importantly, they also cut big holes in doors and/or windows to let the cats in and out. Hardly a thermally sealed envelope...
# What's the data?
We could also use `@SERL_UK`'s [smart meter gas/elec data](https://twitter.com/dataknut/status/1466712963222540289?s=20), dwelling characteristics and pet ownership (but no species detail :-)
So for now we're using:
* postcode district level estimates of cat ownership in the UK in 2015. Does such a thing exist? [YEAH](https://data.gov.uk/dataset/febd29ff-7e7d-4f82-9908-031f7f0e0860/cat-population-per-postcode-district)! "_This dataset gives the mean estimate for population for each district, and was generated as part of the delivery of commissioned research. The data contained within this dataset are modelled figures, based on national estimates for pet population, and available information on Veterinary activity across GB. The data are accurate as of 01/01/2015. The data provided are summarised to the postcode district level. Further information on this research is available in a research publication by James Aegerter, David Fouracre & Graham C. Smith, discussing the structure and density of pet cat and dog populations across Great Britain._"
* experimental postcode level data on domestic [gas](https://www.gov.uk/government/collections/sub-national-gas-consumption-data) and [electricity](https://www.gov.uk/government/collections/sub-national-electricity-consumption-data) 'consumption' for 2015 aggregated to postcode districts
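A UK postcode such as `AB12 3CD` has an outward code (`AB12`, the postcode district) and an inward code (`3CD`); the postcode sector is the district plus the first character of the inward code (`AB12 3`). Splitting a postcode on its space and keeping the first part, as the loading script does with `data.table::tstrsplit()`, therefore gives districts rather than sectors. A minimal base-R sketch, assuming each postcode has exactly one space between the outward and inward codes; `postcode_district()` and `postcode_sector()` are hypothetical helpers, not functions from this project:

```r
# Hypothetical helpers (not part of this project): derive the postcode
# district and sector from full postcodes.
# Assumes exactly one space between the outward and inward codes.
postcode_district <- function(pc) {
  pc <- toupper(trimws(pc))
  # the outward code (everything before the space) is the district
  sub(" .*$", "", pc)
}

postcode_sector <- function(pc) {
  pc <- toupper(trimws(pc))
  # district + space + first character of the inward code
  sub("^([^ ]+) ([^ ]).*$", "\\1 \\2", pc)
}

pcs <- c("AB12 3CD", "ab12 4ef", "SA63 9ZZ")  # made-up examples
postcode_district(pcs)  # "AB12" "AB12" "SA63"
postcode_sector(pcs)    # "AB12 3" "AB12 4" "SA63 9"
```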
```{r loadCats}
# cats
cats_DT <- data.table::fread(paste0(dp, "UK_Animal and Plant Health Agency/APHA0372-Cat_Density_Postcode_District.csv"))
cats_DT[, pcd_district := PostcodeDistrict]
setkey(cats_DT, pcd_district)
nrow(cats_DT)
setkey(pc_district_energy_dt, pcd_district)
nrow(pc_district_energy_dt)
pc_district <- pc_district_energy_dt[cats_DT] # keeps only postcode districts where we have cat data
# this may include areas where we have no energy data
nrow(pc_district)
nrow(pc_district[!is.na(GOR10NM)])
# there are postcode districts with no electricity meters - for now we'll remove them
# pending further investigation
summary(pc_district)
table(pc_district$GOR10NM, pc_district$rgn)
# districts and estimated cats per region (districts with a region match only)
t <- pc_district[!is.na(GOR10NM), .(nPostcodeDistricts = .N,
                                    sumCats = sum(EstimatedCatPopulation)), keyby = .(GOR10NM)]
t[, catsPerDistrict := sumCats/nPostcodeDistricts]
makeFlexTable(t, cap = "Regions covered")
```
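`pc_district_energy_dt[cats_DT]` is a data.table `X[Y]` join: it returns a row for every row of `cats_DT`, with `NA` in the energy and region columns for districts that have no match, which is why the region table filters on `!is.na(GOR10NM)`. For readers more used to base R, roughly the same result comes from `merge(..., all.y = TRUE)`. A minimal sketch on made-up toy data (the district codes and values are hypothetical):

```r
# Toy data (made up): energy and region data for two districts,
# cat estimates for three
energy <- data.frame(pcd_district = c("AB1", "AB2"),
                     GOR10NM = c("Region A", "Region B"),
                     total_elec_kWh = c(1e6, 2e6),
                     stringsAsFactors = FALSE)
cats <- data.frame(pcd_district = c("AB1", "AB2", "ZZ9"),
                   EstimatedCatPopulation = c(500, 800, 300),
                   stringsAsFactors = FALSE)

# Roughly what energy[cats] does on key pcd_district: keep every cat row,
# filling NA where there is no matching energy/region row
joined <- merge(energy, cats, by = "pcd_district", all.y = TRUE)
nrow(joined)                # 3: every cat district is kept
sum(is.na(joined$GOR10NM))  # 1: ZZ9 has no energy or region data

# Dropping unmatched districts before tabulating
with_region <- joined[!is.na(joined$GOR10NM), ]
nrow(with_region)           # 2
```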
# What do we find?
Well, in some places there seem to be a lot of estimated cats per household...
(We calculated mean cats per household by dividing the estimated cat population by the number of domestic electricity meters, which is probably a reasonable proxy for the number of households.)
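Because this ratio divides by the meter count, a district with very few (or zero) meters can produce a very large (or infinite) value and float to the top of a table sorted by `mean_Cats`. A minimal sketch of a guarded ratio on made-up numbers, assuming that districts without a positive meter count should simply be treated as missing:

```r
# Toy values (made up): estimated cats and electricity meters per district
cats   <- c(1200, 300, 50)
meters <- c(4000, 0, NA)

naive <- cats / meters  # 0.3, Inf (300 / 0), NA (50 / NA)

# Treat districts without a positive meter count as missing instead
mean_cats <- ifelse(!is.na(meters) & meters > 0, cats / meters, NA_real_)
mean_cats  # 0.3 NA NA
```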
```{r maxCats}
pc_district[, mean_Cats := EstimatedCatPopulation/nElecMeters]
t <- head(pc_district[, .(PostcodeDistrict, EstimatedCatPopulation, mean_Cats, nPostcodes, nElecMeters)][order(-mean_Cats)],10)
makeFlexTable(t, cap = "Top 10 postcode districts by number of cats per 'household'")
```
SA63 is in south west [Wales](https://www.google.co.uk/maps/place/Clarbeston+Road+SA63/@51.8852685,-4.9147384,12z/data=!3m1!4b1!4m5!3m4!1s0x4868d5805b12efe5:0xca42ee4bc84a2f77!8m2!3d51.8900045!4d-4.8502065) while LL23 is on the edge of the [Snowdonia National Park](https://www.google.co.uk/maps/place/Bala+LL23/@52.8953768,-3.7752989,11z/data=!3m1!4b1!4m5!3m4!1s0x4865404ae1208f67:0x65a437b997c0dfb2!8m2!3d52.8825403!4d-3.6497989)....
Do these places have some largish catteries but few houses? 8,233 is a lot of estimated cats.
## More dwellings, more cats?
Is there a correlation between estimated total cats and the number of dwellings (electricity meters)?
```{r testTotalElecMeters}
ggplot2::ggplot(pc_district[!is.na(GOR10NM)], aes(x = nElecMeters, y = EstimatedCatPopulation,
                                                  colour = GOR10NM)) +
  geom_point() +
  geom_smooth()
```
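`geom_smooth()` shows the shape of the relationship but not its strength. A minimal base-R sketch of putting a number on it with `cor()` and `cor.test()`, on made-up toy values (in the report the inputs would be columns such as `nElecMeters` and `EstimatedCatPopulation`):

```r
# Toy data (made up): dwellings (meters) and estimated cats for six districts
meters <- c(1000, 2500, 4000, 6000, 9000, NA)
cats   <- c( 300,  600, 1100, 1500, 2600, 400)

# Pearson measures linear association; Spearman uses ranks and is less
# sensitive to a few extreme districts. "complete.obs" drops the NA pair.
r_pearson  <- cor(meters, cats, use = "complete.obs", method = "pearson")
r_spearman <- cor(meters, cats, use = "complete.obs", method = "spearman")

# cor.test() drops incomplete pairs and adds a p-value
ct <- cor.test(meters, cats, method = "pearson")
c(pearson = r_pearson, spearman = r_spearman, p = ct$p.value)
```

With only a handful of districts the p-value means little; across all postcode districts the same calls would run on the full columns.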
## More cats, more gas?
Is there a correlation between estimated cat ownership and total gas use?
```{r testTotalGas}
ggplot2::ggplot(pc_district[!is.na(GOR10NM)],
                aes(x = EstimatedCatPopulation, y = total_gas_kWh, colour = GOR10NM)) +
  geom_smooth() +
  geom_point()
```
@@ -119,50 +128,55 @@ Or mean gas use and mean cats?
```{r testMeanGas}
pc_district[, mean_gas_kWh := total_gas_kWh/nGasMeters]
ggplot2::ggplot(pc_district[!is.na(GOR10NM)],
                aes(x = mean_Cats, y = mean_gas_kWh, colour = GOR10NM)) +
  geom_smooth() +
  geom_point()
```
## More cats, more electricity?
Or total electricity use and cats?
```{r testTotalElec}
ggplot2::ggplot(pc_district[!is.na(GOR10NM)],
                aes(x = EstimatedCatPopulation, y = total_elec_kWh, colour = GOR10NM)) +
  geom_smooth() +
  geom_point()
```
Or mean elec use and mean cats?
```{r testMeanElec}
pc_district[, mean_elec_kWh := total_elec_kWh/nElecMeters] # per electricity meter
ggplot2::ggplot(pc_district[!is.na(GOR10NM)],
                aes(x = mean_Cats, y = mean_elec_kWh, colour = GOR10NM)) +
  geom_smooth() +
  geom_point()
```
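Each per-meter mean should use the meter count for its own fuel: dividing electricity by gas meters only gives the same answer when the two counts happen to be equal, and they can differ, for example where some dwellings are off the gas grid. A minimal sketch with made-up numbers for a single hypothetical district:

```r
# Toy district (made up): fewer gas meters than electricity meters
total_elec_kWh <- 3.8e6
total_gas_kWh  <- 1.2e7
nElecMeters    <- 1000
nGasMeters     <- 800

# Each mean uses the meter count for its own fuel
mean_elec_kWh <- total_elec_kWh / nElecMeters  # 3800
mean_gas_kWh  <- total_gas_kWh / nGasMeters    # 15000

# Dividing electricity by gas meters overstates the per-meter figure
wrong_mean_elec <- total_elec_kWh / nGasMeters # 4750
```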
## More cats, more energy?
Or total energy use and total cats?
```{r testTotalEnergy}
pc_district[, total_energy_kWh := total_gas_kWh + total_elec_kWh]
ggplot2::ggplot(pc_district[!is.na(GOR10NM)],
                aes(x = EstimatedCatPopulation, y = total_energy_kWh, colour = GOR10NM)) +
  geom_smooth() +
  geom_point()
```
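`total_gas_kWh + total_elec_kWh` is `NA` for any district missing either fuel (for instance a district with electricity but no gas data after the join), so those districts are dropped, with a warning, when plotted. Whether a missing fuel should instead count as zero is an analysis choice; a minimal sketch of both options on made-up values:

```r
# Toy values (made up): the second district has no gas data,
# the third has neither fuel
total_gas_kWh  <- c(1.0e7, NA, NA)
total_elec_kWh <- c(3.0e6, 2.5e6, NA)

# `+` propagates NA
sum_plus <- total_gas_kWh + total_elec_kWh
sum_plus  # 13000000 NA NA

# rowSums(na.rm = TRUE) treats a missing fuel as zero, which only makes
# sense if "no data" really means "no meters"; a row with both fuels
# missing becomes 0 rather than NA
sum_narm <- rowSums(cbind(total_gas_kWh, total_elec_kWh), na.rm = TRUE)
sum_narm  # 13000000 2500000 0
```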
Let's try a boxplot by cat deciles... Figure \@ref(fig:catDeciles) suggests that median total energy use is higher in postcode districts with more estimated cats.
```{r catDeciles, fig.cap = "Cat ownership deciles and total annual residential electricity & gas use"}
pc_district[, cat_decile := dplyr::ntile(EstimatedCatPopulation, 10)]
#head(pc_district[is.na(cat_decile)])
ggplot2::ggplot(pc_district[!is.na(cat_decile) & !is.na(GOR10NM)],
                aes(x = as.factor(cat_decile), y = total_energy_kWh/1000000)) + # kWh -> GWh
  geom_boxplot() +
  facet_wrap(. ~ GOR10NM) +
  labs(x = "Cat ownership deciles",
       y = "Total domestic electricity & gas (GWh)",
       caption = "Postcode districts (Data: BEIS & Animal and Plant Health Agency, 2015)")
```
Well...
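`dplyr::ntile(EstimatedCatPopulation, 10)` ranks districts by estimated cat population and splits them into ten ordered groups of (nearly) equal size, leaving `NA` values as `NA`. A base-R sketch of the same idea; `ntile_sketch()` is hypothetical and may not match dplyr's handling of ties or of group sizes when the count is not a multiple of `n`:

```r
# Hypothetical helper: assign values to n ordered, roughly equal-sized groups.
# Tie and remainder handling may differ from dplyr::ntile().
ntile_sketch <- function(x, n) {
  r <- rank(x, ties.method = "first", na.last = "keep")  # NA stays NA
  ceiling(r * n / sum(!is.na(x)))
}

cats <- c(120, 950, 40, 610, NA, 300, 75, 880, 210, 430, 1500)  # made-up
deciles <- ntile_sketch(cats, 10)
deciles  # 3 9 1 7 NA 5 2 8 4 6 10
```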
@@ -170,7 +184,7 @@ Well...
pc_district[, total_energy_kWh := total_gas_kWh + total_elec_kWh]
pc_district[, mean_energy_kWh := total_energy_kWh/nElecMeters]
ggplot2::ggplot(pc_district[!is.na(GOR10NM)], aes(x = mean_Cats, y = mean_energy_kWh)) +
  geom_point()
```
@@ -5,6 +5,8 @@ library(data.table)
library(here)
# Functions ----
source(here::here("R", "functions.R"))
makeReport <- function(f){
# default = html
rmarkdown::render(input = paste0(here::here("itsTheCatsStupid", f), ".Rmd"),
@@ -29,28 +31,34 @@ authors = "Ben Anderson"
#> load the postcode data here (slow)
postcodes_elec_dt <- data.table::fread(paste0(dp, "beis/subnationalElec/Postcode_level_all_meters_electricity_2015.csv"))
# the outward code (the part before the space) is the postcode district
postcodes_elec_dt[, pcd_district := data.table::tstrsplit(POSTCODE, " ", keep = c(1))]
pc_district_elec_dt <- postcodes_elec_dt[, .(elec_nPostcodes = .N,
                                             total_elec_kWh = sum(`Consumption (kWh)`, na.rm = TRUE),
                                             nElecMeters = sum(`Number of meters`, na.rm = TRUE)
                                             ), keyby = .(pcd_district)]
nrow(pc_district_elec_dt)
postcodes_gas_dt <- data.table::fread(paste0(dp, "beis/subnationalGas/Experimental_Gas_Postcode_Statistics_2015.csv"))
postcodes_gas_dt[, pcd_district := data.table::tstrsplit(POSTCODE, " ", keep = c(1))]
pc_district_gas_dt <- postcodes_gas_dt[, .(gas_nPostcodes = .N,
                                           total_gas_kWh = sum(`Consumption (kWh)`, na.rm = TRUE),
                                           nGasMeters = sum(`Number of meters`, na.rm = TRUE)),
                                       keyby = .(pcd_district)]
nrow(pc_district_gas_dt)
setkey(pc_district_elec_dt, pcd_district)
setkey(pc_district_gas_dt, pcd_district)
# X[Y] join: one row per district with electricity data, gas columns NA where unmatched
pc_district_energy_dt <- pc_district_gas_dt[pc_district_elec_dt]
# load one we prepared earlier using https://git.soton.ac.uk/SERG/mapping-with-r/-/blob/master/R/postcodeWrangling.R
pc_district_region_dt <- data.table::fread(paste0(dp, "UK_postcodes/postcode_districts_2016.csv"))
setkey(pc_district_region_dt, pcd_district)
nrow(pc_district_region_dt)
pc_district_region_dt[, .(n = .N), keyby = .(GOR10CD, GOR10NM)] # districts per region
nrow(pc_district_energy_dt)
# X[Y] join: keeps every district in the region lookup, energy columns NA where unmatched
pc_district_energy_dt <- pc_district_energy_dt[pc_district_region_dt]
nrow(pc_district_energy_dt)
#> re-run report here ----
makeReport(rmdFile)
postcodes <- data.table::fread("~/Dropbox/data/UK_postcodes/NSPL_AUG_2020_UK/Data/NSPL_AUG_2020_UK.csv.gz")
postcodes[, pcd_sector := data.table::tstrsplit(pcds, " ", keep = c(1))]
pc_sectors_dt <- postcodes[, .(nPostcodes = .N), keyby = .(pcd_sector, rgn)]
pc_sectors_dt[, GOR10CD := rgn]
region_codes <- readxl::read_xlsx("~/Dropbox/data/UK_postcodes/NSPL_AUG_2020_UK/Documents/Region names and codes EN as at 12_10 (GOR).xlsx")
region_code_dt <- data.table::as.data.table(region_codes)
setkey(region_code_dt, GOR10CD)
setkey(pc_sectors_dt, GOR10CD)
dt <- region_code_dt[pc_sectors_dt]
data.table::fwrite(dt, file = here::here("data", "postcode_sectors_dt.csv"))
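The per-district aggregation in the loading script (grouping postcode rows by `pcd_district` and summing consumption and meter counts with `na.rm = TRUE`) can be sketched in base R for readers without data.table. The toy rows and the simplified column names `kWh` and `meters` are made up, standing in for `Consumption (kWh)` and `Number of meters`:

```r
# Toy postcode-level rows (made up), with simplified column names
pc <- data.frame(POSTCODE = c("AB1 2CD", "AB1 3EF", "AB2 1GH", "AB2 1JK"),
                 kWh      = c(40000, 55000, NA, 30000),
                 meters   = c(10, 12, 8, 7),
                 stringsAsFactors = FALSE)

# Outward code (text before the space) = postcode district
pc$pcd_district <- sub(" .*$", "", pc$POSTCODE)

# Per-district counts and totals, ignoring missing values
district <- data.frame(
  pcd_district = sort(unique(pc$pcd_district)),
  nPostcodes   = as.vector(table(pc$pcd_district)),
  total_kWh    = as.vector(tapply(pc$kWh, pc$pcd_district, sum, na.rm = TRUE)),
  nMeters      = as.vector(tapply(pc$meters, pc$pcd_district, sum, na.rm = TRUE)),
  stringsAsFactors = FALSE
)
district
```

`table()` and `tapply()` both order groups by sorted district code, so the columns line up with `sort(unique(...))`.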