Compare revisions

--- a/itsTheCatsStupid/itsTheCatsStupid.Rmd
+++ b/itsTheCatsStupid/itsTheCatsStupid.Rmd
+---
+params:
+  subtitle: ""
+  title: ""
+  authors: ""
+title: '`r params$title`'
+subtitle: '`r params$subtitle`'
+author: '`r params$authors`'
+date: 'Last run at: `r Sys.time()`'
+output:
+  bookdown::html_document2:
+    self_contained: true
+    fig_caption: yes
+    code_folding: hide
+    number_sections: yes
+    toc: yes
+    toc_depth: 2
+    toc_float: TRUE
+  bookdown::pdf_document2:
+    fig_caption: yes
+    number_sections: yes
+  bookdown::word_document2:
+    fig_caption: yes
+    number_sections: yes
+    toc: yes
+    toc_depth: 2
+    fig_width: 5
+bibliography: '`r path.expand("~/bibliography.bib")`'
+---
+
+<hr>
+
+>This fridayFagPacket was first published at...
+
+<hr>
+
+# fridayFagPackets
+
+Numbers that could have been done on the back of one and should probably come with a similar health warning...
+
+>Find out [more](https://dataknut.github.io/fridayFagPackets/).
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+library(ggplot2)
+
+```
+
+# It's the cats, stupid
+Inspired by `@giulio_mattioli`'s [recent paper on the car dependence of dog ownership](https://twitter.com/giulio_mattioli/status/1466361022747455492) we thought we'd take a look at [cats](https://twitter.com/giulio_mattioli/status/1466710752606179331) and residential energy demand. Why? Well people like to keep their cats warm but, more importantly, they also cut big holes in doors and/or windows to let the cats in and out. Hardly a thermally sealed envelope...
+
+# What's the data?
+
+We could also use `@SERL_UK`'s [smart meter gas/elec data](https://twitter.com/dataknut/status/1466712963222540289?s=20), dwelling characteristics and pet ownership (but no species detail :-) 
+
+So for now we're using:
+
+ * postcode district level estimates of cat ownership in the UK in 2015. Does such a thing exist? [YEAH](https://data.gov.uk/dataset/febd29ff-7e7d-4f82-9908-031f7f0e0860/cat-population-per-postcode-district)! "_This dataset gives the mean estimate for population for each district, and was generated as part of the delivery of commissioned research. The data contained within this dataset are modelled figures, based on national estimates for pet population, and available information on Veterinary activity across GB. The data are accurate as of 01/01/2015. The data provided are summarised to the postcode district level. Further information on this research is available in a research publication by James Aegerter, David Fouracre & Graham C. Smith, discussing the structure and density of pet cat and dog populations across Great Britain._"
+ * experimental postcode level data on domestic [gas](https://www.gov.uk/government/collections/sub-national-gas-consumption-data) and [electricity](https://www.gov.uk/government/collections/sub-national-electricity-consumption-data) 'consumption' for 2015 aggregated to postcode districts
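+
+The postcode to postcode district aggregation happens in `makeFile.R` before this report is rendered. A minimal sketch of that step on made-up toy rows (the `toy_*` names and all values are illustrative assumptions; only the column names come from the BEIS files):
+
+```r
+library(data.table)
+
+# Toy postcode-level rows (values made up), using the BEIS column names
+toy_pc <- data.table(POSTCODE = c("SO17 1BJ", "SO17 2AB", "SA63 4XY"),
+                     `Consumption (kWh)` = c(1000, 2000, 500),
+                     `Number of meters` = c(10, 20, 5))
+
+# The postcode district is the part of the postcode before the space
+toy_pc[, pcd_district := tstrsplit(POSTCODE, " ", keep = 1)]
+
+# Sum consumption and meters within each district
+toy_district <- toy_pc[, .(nPostcodes = .N,
+                           total_kWh = sum(`Consumption (kWh)`, na.rm = TRUE),
+                           nMeters = sum(`Number of meters`, na.rm = TRUE)),
+                       keyby = .(pcd_district)]
+
+# Mean consumption per meter, our stand-in for per dwelling
+toy_district[, mean_kWh := total_kWh / nMeters]
+print(toy_district)
+```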
+
+```{r loadCats}
+
+# cats
+cats_DT <- data.table::fread(paste0(dp, "UK_Animal and Plant Health Agency/APHA0372-Cat_Density_Postcode_District.csv"))
+cats_DT[, pcd_district := PostcodeDistrict]
+
+setkey(cats_DT, pcd_district)
+
+nrow(cats_DT)
+
+setkey(pc_district_energy_dt, pcd_district)
+
+nrow(pc_district_energy_dt)
+
+pc_district <- pc_district_energy_dt[cats_DT] # keeps only postcode districts where we have cat data
+# this may include areas where we have no energy data
+pc_district[, mean_Cats := EstimatedCatPopulation/nElecMeters]
+
+nrow(pc_district)
+nrow(pc_district[!is.na(GOR10NM)])
+# there are postcode districts with no electricity meters - for now we'll remove them
+# pending further investigation
+
+t <- pc_district[!is.na(GOR10NM), .(nPostcodeDistricts = .N,
+                     sumCats = sum(EstimatedCatPopulation),
+                     sumElecMeters = sum(nElecMeters, na.rm = TRUE)), keyby=.(GOR10NM)]
+
+t[, catsPerDistrict := sumCats/nPostcodeDistricts]
+t[, catsPerDwelling := sumCats/sumElecMeters]
+makeFlexTable(t, cap = "Regions covered (1 electricity meter assumed to be 1 dwelling)")
+
+message("Total number of electricity meters: ", sum(t$sumElecMeters))
+```
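+
+To see what the `pc_district_energy_dt[cats_DT]` join does, here's a minimal sketch on made-up toy tables (the `*_toy` names and values are assumptions for illustration only). data.table's `X[Y]` keeps every row of `Y`, so a district with cats but no energy data survives with `NA` in the energy columns:
+
+```r
+library(data.table)
+
+# Toy stand-ins for the energy and cat tables (values made up)
+energy_toy <- data.table(pcd_district = c("AB1", "AB2", "AB3"),
+                         nElecMeters = c(100, 200, 300))
+cats_toy <- data.table(pcd_district = c("AB2", "AB3", "ZZ9"),
+                       EstimatedCatPopulation = c(50, 60, 70))
+setkey(energy_toy, pcd_district)
+setkey(cats_toy, pcd_district)
+
+# X[Y]: every row of cats_toy is kept; AB1 (no cat data) is dropped,
+# ZZ9 (no energy data) gets NA for nElecMeters
+joined_toy <- energy_toy[cats_toy]
+joined_toy[, mean_Cats := EstimatedCatPopulation / nElecMeters]
+print(joined_toy)
+```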
+
+# What do we find?
+
+Well, in some places there seem to be a lot of estimated cats per household...
+
+(We calculated mean cats per household by dividing the estimated cat population by the number of electricity meters, which should be a reasonable proxy for the number of households.)
+
+```{r maxCats}
+
+t <- head(pc_district[, .(Region = GOR10NM, PostcodeDistrict, EstimatedCatPopulation, mean_Cats, nPostcodes, nElecMeters,nGasMeters)][order(-mean_Cats)],10)
+makeFlexTable(t, cap = "Top 10 postcode districts by number of cats per 'household'")
+```
+SA63 is in south west [Wales](https://www.google.co.uk/maps/place/Clarbeston+Road+SA63/@51.8852685,-4.9147384,12z/data=!3m1!4b1!4m5!3m4!1s0x4868d5805b12efe5:0xca42ee4bc84a2f77!8m2!3d51.8900045!4d-4.8502065) while LL23 is on the edge of the [Snowdonia National Park](https://www.google.co.uk/maps/place/Bala+LL23/@52.8953768,-3.7752989,11z/data=!3m1!4b1!4m5!3m4!1s0x4865404ae1208f67:0x65a437b997c0dfb2!8m2!3d52.8825403!4d-3.6497989)....
+
+Do these places have some largish catteries but few houses?
+
+```{r histoCats, fig.cap = "Histogram of mean number of cats per dwelling"}
+ggplot2::ggplot(pc_district, aes(x = mean_Cats)) +
+  geom_histogram() +
+  labs(caption = "Postcode districts")
+
+summary(pc_district$mean_Cats)
+```
+
+## More dwellings, more cats?
+
+Is there a correlation between estimated total cats and the number of dwellings (as measured by the number of electricity meters)?
+
+```{r testTotalElecMeters}
+ggplot2::ggplot(pc_district[!is.na(GOR10NM)], aes(x = nElecMeters , y = EstimatedCatPopulation, 
+                                 colour = GOR10NM)) +
+  scale_color_discrete(name = "UK `Region`") +
+  geom_point() +
+  geom_smooth()
+```
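+
+The plot can be backed up with a number. A minimal sketch, assuming a rank (Spearman) correlation is an acceptable summary and using made-up toy values rather than the real data:
+
+```r
+# Toy data (values made up): dwellings and cats for five districts
+toy <- data.frame(nElecMeters = c(100, 250, 400, 800, 1200),
+                  EstimatedCatPopulation = c(30, 60, 90, 210, 260))
+
+# Spearman works on ranks, so a few extreme districts matter less than with Pearson
+rho <- cor(toy$nElecMeters, toy$EstimatedCatPopulation, method = "spearman")
+print(rho)
+```
+
+On the real table the equivalent call would be `cor(pc_district$nElecMeters, pc_district$EstimatedCatPopulation, method = "spearman", use = "complete.obs")`, where `use = "complete.obs"` drops districts with missing values.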
+
+## More cats, more energy?
+
+Clearly postcode districts with more dwellings will have higher energy use totals. So we need to compare the mean number of cats per dwelling with mean energy use per dwelling.
+
+This will need to accommodate some outliers in terms of mean number of cats as we saw above and potentially also in terms of mean energy.
+
+```{r meanEnergyOutliers}
+pc_district[, total_gas_kWh := ifelse(is.na(total_gas_kWh), 0, total_gas_kWh)]
+pc_district[, total_energy_kWh := total_gas_kWh + total_elec_kWh]
+pc_district[, mean_energy_kWh := total_energy_kWh/nElecMeters]
+
+summary(pc_district$mean_energy_kWh)
+t <- head(pc_district[, .(Region = GOR10NM, PostcodeDistrict, mean_energy_kWh, nPostcodes, nElecMeters,nGasMeters)][order(-mean_energy_kWh)],10)
+makeFlexTable(t, cap = "Top 10 postcode districts by mean energy per 'household'")
+```
+
+Is there a correlation between mean cats per dwelling and mean energy use per dwelling?
+
+```{r meanCatsAndMeanEnergy}
+
+
+ggplot2::ggplot(pc_district[!is.na(GOR10NM)], 
+                aes(x = mean_Cats, y = mean_energy_kWh, colour = GOR10NM)) +
+  geom_smooth() +
+  geom_point()
+```
+
+Or mean gas use and mean cats?
+
+```{r testMeanGas}
+
+ggplot2::ggplot(pc_district[!is.na(GOR10NM)], 
+                aes(x = mean_Cats, y = mean_gas_kWh, colour = GOR10NM)) +
+  geom_smooth() +
+  geom_point()
+```
+
+## More cats, more electricity?
+
+Or total electricity use and cats?
+
+```{r testTotalElec}
+ggplot2::ggplot(pc_district[!is.na(GOR10NM)], aes(x = EstimatedCatPopulation, y = total_elec_kWh, colour = GOR10NM)) +
+  geom_smooth() +
+  geom_point()
+```
+
+Or mean elec use and mean cats?
+
+```{r testMeanElec}
+
+ggplot2::ggplot(pc_district[!is.na(GOR10NM)], aes(x = mean_Cats, y = mean_elec_kWh, colour = GOR10NM)) +
+  geom_smooth() +
+  geom_point()
+```
+
+## More cats, more total energy?
+
+Or total energy use and total cats?
+
+```{r testTotalEnergy}
+
+pc_district[, total_gas_kWh := ifelse(is.na(total_gas_kWh), 0, total_gas_kWh)]
+pc_district[, total_energy_kWh := total_gas_kWh + total_elec_kWh]
+pc_district[, mean_energy_kWh := total_energy_kWh/nElecMeters]
+
+ggplot2::ggplot(pc_district[!is.na(GOR10NM)], aes(x = EstimatedCatPopulation, y = total_energy_kWh, colour = GOR10NM)) +
+  geom_smooth() +
+  geom_point()
+  
+```
+
+Let's try a boxplot by cat deciles... Figure \@ref(fig:catDeciles) suggests the median total energy use is higher in postcode districts with more cats, but both totals will rise with the number of dwellings.
+
+```{r catDeciles, fig.cap = "Cat ownership deciles and total annual residential electricity & gas use"}
+pc_district[, cat_decile := dplyr::ntile(EstimatedCatPopulation, 10)]
+#head(pc_district[is.na(cat_decile)])
+ggplot2::ggplot(pc_district[!is.na(cat_decile) & !is.na(GOR10NM)], aes(x = as.factor(cat_decile), y = total_energy_kWh/1000000)) +
+  geom_boxplot() +
+  facet_wrap(. ~ GOR10NM) +
+  labs(x = "Cat ownership deciles",
+       y = "Total domestic electricity & gas GWh",
+       caption = "Postcode districts (Data: BEIS & Animal and Plant Health Agency, 2015)")
+```
+Well...
+
+```{r testMeanEnergy}
+pc_district[, total_energy_kWh := total_gas_kWh + total_elec_kWh]
+pc_district[, mean_energy_kWh := total_energy_kWh/nElecMeters]
+
+ggplot2::ggplot(pc_district[!is.na(GOR10NM)], aes(x = mean_Cats, y = mean_energy_kWh)) +
+  geom_point()
+```
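+
+The deciles above rank districts by total cats. A hedged sketch of the same idea per dwelling, on made-up toy data (the `toy` table and the `mean_cat_decile` column are hypothetical names, not part of the analysis above):
+
+```r
+library(data.table)
+
+# Toy data (values made up): 20 districts with per-dwelling cats and energy
+set.seed(42)
+toy <- data.table(mean_Cats = runif(20, 0.1, 1),
+                  mean_energy_kWh = runif(20, 10000, 20000))
+
+# Deciles of cats per dwelling rather than of total cats
+toy[, mean_cat_decile := dplyr::ntile(mean_Cats, 10)]
+
+# Median per-dwelling energy within each decile
+toy_medians <- toy[, .(median_energy_kWh = median(mean_energy_kWh), n = .N),
+                   keyby = .(mean_cat_decile)]
+print(toy_medians)
+```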
+
+# Data Annexes
+
+Cats data
+
+```{r skimCats}
+skimr::skim(cats_DT)
+```
+
+
+# R packages used
+
+ * bookdown [@bookdown]
+ * data.table [@data.table]
+ * ggplot2 [@ggplot2]
+ * knitr [@knitr]
+ * rmarkdown [@rmarkdown]
+ 
+# References
--- a/itsTheCatsStupid/makeFile.R
+++ b/itsTheCatsStupid/makeFile.R
+# loads the data and runs the Rmd render
+
+# Packages ----
+library(data.table)
+library(here)
+
+# Functions ----
+source(here::here("R", "functions.R"))
+
+makeReport <- function(f){
+  # default = html
+  rmarkdown::render(input = paste0(here::here("itsTheCatsStupid", f), ".Rmd"),
+                    params = list(title = title,
+                                  subtitle = subtitle,
+                                  authors = authors),
+                    output_file = paste0(here::here("docs/"), f, ".html")
+  )
+}
+
+# Set data path ----
+dp <- "~/Dropbox/data/"
+
+# Run report ----
+
+#> define yaml ----
+rmdFile <- "itsTheCatsStupid" # not the full path
+title = "#backOfaFagPacket: It's the Cats, stupid"
+subtitle = "Does cat ownership correlate with home energy demand?"
+authors = "Ben Anderson"
+
+#> load the postcode data here (slow)
+
+postcodes_elec_dt <- data.table::fread(paste0(dp, "beis/subnationalElec/Postcode_level_all_meters_electricity_2015.csv"))
+postcodes_elec_dt[, pcd_district := data.table::tstrsplit(POSTCODE, " ", keep = c(1))]
+pc_district_elec_dt <- postcodes_elec_dt[, .(elec_nPostcodes = .N, 
+                                           total_elec_kWh = sum(`Consumption (kWh)`, na.rm = TRUE),
+                                           nElecMeters = sum(`Number of meters`, na.rm = TRUE)
+                                           ), keyby = .(pcd_district)]
+nrow(pc_district_elec_dt)
+summary(pc_district_elec_dt)
+
+postcodes_gas_dt <- data.table::fread(paste0(dp, "beis/subnationalGas/Experimental_Gas_Postcode_Statistics_2015.csv"))
+postcodes_gas_dt[, pcd_district := data.table::tstrsplit(POSTCODE, " ", keep = c(1))]
+pc_district_gas_dt <- postcodes_gas_dt[, .(gas_nPostcodes = .N,
+                                           total_gas_kWh = sum(`Consumption (kWh)`, na.rm = TRUE),
+                                           nGasMeters = sum(`Number of meters`, na.rm = TRUE)), keyby = .(pcd_district)]
+nrow(pc_district_gas_dt)
+summary(pc_district_gas_dt)
+
+setkey(pc_district_elec_dt, pcd_district)
+setkey(pc_district_gas_dt, pcd_district)
+
+pc_district_energy_dt <- pc_district_gas_dt[pc_district_elec_dt]
+pc_district_energy_dt[, mean_gas_kWh := total_gas_kWh/nGasMeters]
+pc_district_energy_dt[, mean_elec_kWh := total_elec_kWh/nElecMeters]
+
+# load one we prepared earlier using https://git.soton.ac.uk/SERG/mapping-with-r/-/blob/master/R/postcodeWrangling.R
+pc_district_region_dt <- data.table::fread(paste0(dp, "UK_postcodes/postcode_districts_2016.csv"))
+setkey(pc_district_region_dt, pcd_district)
+nrow(pc_district_region_dt)
+
+pc_district_region_dt[, .(n = .N), keyby = .(GOR10CD, GOR10NM)]
+nrow(pc_district_energy_dt)
+pc_district_energy_dt <- pc_district_energy_dt[pc_district_region_dt]
+nrow(pc_district_energy_dt)
+
+#> re-run report here ----
+makeReport(rmdFile)
--- a/retrofitAndDontDie/.gitkeep
+++ b/retrofitAndDontDie/.gitkeep
--- a/retrofitAndDontDie/retrofitAndDontDie.xlsx
+++ b/retrofitAndDontDie/retrofitAndDontDie.xlsx
--- a/retrofitOrBust/makeFile.R
+++ b/retrofitOrBust/makeFile.R
 makeReport <- function(f){
  # default = html
-  rmarkdown::render(input = paste0(here::here("2020-10-16-retrofOrBust", f), ".Rmd"),
+  rmarkdown::render(input = paste0(here::here("retrofitOrBust", f), ".Rmd"),
                    params = list(title = title,
                                  subtitle = subtitle,
                                  authors = authors),
                    output_file = paste0(here::here("docs/"), f, ".html")
  )
-  # word
-  rmarkdown::render(input = paste0(here::here("2020-10-16-retrofOrBust", f), ".Rmd"),
-                    params = list(title = title,
-                                  subtitle = subtitle,
-                                  authors = authors),
-                    output_file = paste0(here::here("docs/"), f, ".docx"),
-                    output_format = "word_document"
-  )
 }

 # >> run report ----
 rmdFile <- "retrofitOrBust" # not the full path
-title = "Retrofit or bust?"
+title = "#backOfaFagPacket: Retrofit or bust?"
 subtitle = ""
 authors = "Ben Anderson"


--- a/retrofitOrBust/retrofitOrBust.Rmd
+++ b/retrofitOrBust/retrofitOrBust.Rmd
@@ -33,6 +33,13 @@ bibliography: '`r path.expand("~/github/dataknut/refs/refs.bib")`'
 >This fridayFagPacket was first published as a [blog](https://dataknut.wordpress.com/2020/10/16/retrofit-or-bust/)

 <hr>
+
+# fridayFagPackets
+
+Numbers that could have been done on the back of one and should probably come with a similar health warning...
+
+>Find out [more](https://dataknut.github.io/fridayFagPackets/).
+
 ```{r setup, include=FALSE}
 knitr::opts_chunk$set(echo = TRUE)
 ```