This fridayFagPacket was first published at…


1 fridayFagPackets

Numbers that could have been done on the back of one and should probably come with a similar health warning…

Find out more.

2 It’s the cats, stupid

Inspired by @giulio_mattioli’s recent paper on the car dependence of dog ownership, we thought we’d take a look at cats and residential energy demand. Why? Well, people like to keep their cats warm but, more importantly, they also cut big holes in doors and/or windows to let the cats in and out. Hardly a thermally sealed envelope…

3 What’s the data?

We could also use @SERL_UK’s smart meter gas/elec data, dwelling characteristics and pet ownership (but no species detail :-)

So for now we’re using:

  • postcode district level estimates of cat ownership in the UK in 2015. Does such a thing exist? YEAH! “This dataset gives the mean estimate for population for each district, and was generated as part of the delivery of commissioned research. The data contained within this dataset are modelled figures, based on national estimates for pet population, and available information on Veterinary activity across GB. The data are accurate as of 01/01/2015. The data provided are summarised to the postcode district level. Further information on this research is available in a research publication by James Aegerter, David Fouracre & Graham C. Smith, discussing the structure and density of pet cat and dog populations across Great Britain.”
  • experimental postcode level data on domestic gas and electricity ‘consumption’ for 2015 aggregated to postcode districts
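# cats
library(data.table) # provides fread(), setkey() and := (presumably loaded in a set-up chunk not shown)
# dp is the path to the data folder; it is defined outside the code shown here
cats_DT <- data.table::fread(paste0(dp, "UK_Animal and Plant Health Agency/APHA0372-Cat_Density_Postcode_District.csv"))
cats_DT[, pcd_district := PostcodeDistrict]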

setkey(cats_DT, pcd_district)

nrow(cats_DT)
## [1] 2830
setkey(pc_district_energy_dt, pcd_district)

nrow(pc_district_energy_dt)
## [1] 3185
pc_district <- pc_district_energy_dt[cats_DT] # keeps only postcode districts where we have cat data
# this may include areas where we have no energy data
pc_district[, mean_Cats := EstimatedCatPopulation/nElecMeters]

nrow(pc_district)
## [1] 2987
nrow(pc_district[!is.na(GOR10NM)])
## [1] 2936
# there are postcode districts with no electricity meters - for now we'll remove them
# pending further investigation

# summarise the cat estimates by region
region_summary <- pc_district[!is.na(GOR10NM), .(nPostcodeDistricts = .N,
                     sumCats = sum(EstimatedCatPopulation)), keyby=.(GOR10NM)]

region_summary[, catsPerDistrict := sumCats/nPostcodeDistricts]
makeFlexTable(region_summary, cap = "Regions covered")
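
The join above uses data.table's `X[Y]` form, which returns a row for every row of `Y` (here the cat data) and fills the `X` columns with `NA` where nothing matches. A minimal sketch with made-up districts and values (none of these numbers come from the real files) shows why the result can have more rows than the cat table, as the jump from 2,830 to 2,987 rows suggests happens here, and why some rows carry no energy data. It assumes the energy table can hold more than one row per district:

```r
library(data.table)

# Toy tables; district codes and all values are invented for illustration
energy <- data.table(
  pcd_district = c("AB1", "AB1", "CD2"),
  nElecMeters  = c(100, 50, 200)
)
cats <- data.table(
  pcd_district = c("AB1", "CD2", "EF3"),
  EstimatedCatPopulation = c(30, 80, 10)
)
setkey(energy, pcd_district)
setkey(cats, pcd_district)

# One row per match in energy, plus any cat rows with no match at all
joined <- energy[cats]

nrow(cats)    # 3
nrow(joined)  # 4: AB1 matches twice, EF3 is kept with NA meters
joined[is.na(nElecMeters), pcd_district]  # "EF3"
```

Counting rows before and after the join, as the original code does, is a cheap way to spot this kind of duplication.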

4 What do we find?

Well, in some places there seem to be a lot of estimated cats per household…

(We calculated mean cats per household by dividing the estimated cat population by the number of electricity meters, which is probably a reasonable proxy for the number of households.)

# the ten districts with the most estimated cats per electricity meter
top10 <- head(pc_district[, .(PostcodeDistrict, EstimatedCatPopulation, mean_Cats, nPostcodes, nElecMeters)][order(-mean_Cats)], 10)
makeFlexTable(top10, cap = "Top 10 postcode districts by number of cats per 'household'")

SA63 is in south west Wales while LL23 is on the edge of the Snowdonia National Park…

Do these places have some largish catteries but few houses? 8,233 is a lot of estimated cats.
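
One way to check whether the top of the table is driven by small denominators is to flag districts with very few meters. The sketch below uses invented numbers and a hypothetical threshold of 100 meters (`min_meters` is our assumption, not something taken from the analysis above):

```r
library(data.table)

# Invented example values; only the column names follow the analysis above
dt <- data.table(
  PostcodeDistrict       = c("XX1", "XX2", "XX3"),
  EstimatedCatPopulation = c(5000, 5000, 400),
  nElecMeters            = c(10000, 50, 0)
)

min_meters <- 100  # hypothetical cut-off for a "too few dwellings" flag

# Return NA rather than Inf when a district has zero meters
dt[, mean_Cats := ifelse(nElecMeters > 0,
                         EstimatedCatPopulation / nElecMeters,
                         NA_real_)]
dt[, few_meters := nElecMeters < min_meters]

dt[order(-mean_Cats)]
```

In this toy case XX1 and XX2 have the same number of cats, but XX2 gets 100 cats per meter against 0.5 for XX1, purely because it has so few meters.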

4.1 More dwellings, more cats?

Is there a correlation between estimated total cats and the number of dwellings (electricity meters)?

ggplot2::ggplot(pc_district[!is.na(GOR10NM)], aes(x = nElecMeters , y = EstimatedCatPopulation, 
                                 colour = GOR10NM)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 92 rows containing non-finite values (stat_smooth).
## Warning: Removed 92 rows containing missing values (geom_point).
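
The plot gives a visual impression only. To put a number on it, a rank correlation is one option, since it is less sensitive to a few outlying districts than Pearson's r. A minimal sketch on simulated data follows; the simulated relationship is an assumption for illustration, not a result:

```r
set.seed(42)

# Simulated districts: cats loosely proportional to meters, with noise
n_districts <- 200
nElecMeters <- round(runif(n_districts, 500, 20000))
EstimatedCatPopulation <- pmax(
  0, round(0.3 * nElecMeters + rnorm(n_districts, sd = 500))
)
EstimatedCatPopulation[1:5] <- NA  # mimic districts with missing estimates

# use = "complete.obs" drops pairs containing NA before computing rho
rho <- cor(nElecMeters, EstimatedCatPopulation,
           method = "spearman", use = "complete.obs")
rho
```

On the real data the same call would take the `nElecMeters` and `EstimatedCatPopulation` columns of `pc_district`.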

4.2 More cats, more gas?

Is there a correlation between estimated cat ownership and total gas use?

ggplot2::ggplot(pc_district[!is.na(GOR10NM)], 
                aes(x = EstimatedCatPopulation, y = total_gas_kWh, colour = GOR10NM)) +
  geom_smooth() +
  geom_point()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 291 rows containing non-finite values (stat_smooth).
## Warning: Removed 291 rows containing missing values (geom_point).

Or mean gas use and mean cats?

ggplot2::ggplot(pc_district[!is.na(GOR10NM)], 
                aes(x = mean_Cats, y = mean_gas_kWh, colour = GOR10NM)) +
  geom_smooth() +
  geom_point()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 291 rows containing non-finite values (stat_smooth).
## Warning: Removed 291 rows containing missing values (geom_point).

4.3 More cats, more electricity?

Or total electricity use and cats?

ggplot2::ggplot(pc_district[!is.na(GOR10NM)], aes(x = EstimatedCatPopulation, y = total_elec_kWh, colour = GOR10NM)) +
  geom_smooth() +
  geom_point()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 92 rows containing non-finite values (stat_smooth).
## Warning: Removed 92 rows containing missing values (geom_point).

Or mean elec use and mean cats?

ggplot2::ggplot(pc_district[!is.na(GOR10NM)], aes(x = mean_Cats, y = mean_elec_kWh, colour = GOR10NM)) +
  geom_smooth() +
  geom_point()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 92 rows containing non-finite values (stat_smooth).
## Warning: Removed 92 rows containing missing values (geom_point).

4.4 More cats, more energy?

Or total energy use and total cats?

pc_district[, total_gas_kWh := ifelse(is.na(total_gas_kWh), 0, total_gas_kWh)]
pc_district[, total_energy_kWh := total_gas_kWh + total_elec_kWh]
pc_district[, mean_energy_kWh := total_energy_kWh/nElecMeters]

ggplot2::ggplot(pc_district[!is.na(GOR10NM)], aes(x = EstimatedCatPopulation, y = total_energy_kWh, colour = GOR10NM)) +
  geom_smooth() +
  geom_point()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 92 rows containing non-finite values (stat_smooth).
## Warning: Removed 92 rows containing missing values (geom_point).
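
Setting missing gas totals to zero, as the code above does, treats districts with no recorded gas use as using no gas, so their total energy is electricity only rather than `NA`. That is presumably why fewer rows are dropped here than in the gas plots. A small sketch with invented values follows; `fcoalesce()` is data.table's NA-replacement helper and is shown as an alternative to the `ifelse()` call, not as what the analysis actually ran:

```r
library(data.table)

# Invented values for three districts; the second has no gas record
dt <- data.table(
  total_gas_kWh  = c(2e6, NA, 5e5),
  total_elec_kWh = c(1e6, 8e5, 3e5),
  nElecMeters    = c(300, 200, 100)
)

# Without the replacement the sum is NA for the second district
dt[, total_naive := total_gas_kWh + total_elec_kWh]

# Replace NA with 0 (both arguments must share a type, here double)
dt[, total_gas_kWh := fcoalesce(total_gas_kWh, 0)]
dt[, total_energy_kWh := total_gas_kWh + total_elec_kWh]
dt[, mean_energy_kWh := total_energy_kWh / nElecMeters]

dt[, .(total_naive, total_energy_kWh, mean_energy_kWh)]
```

The second district ends up with a mean of 4,000 kWh per meter, electricity only, which will look low next to districts on the gas grid.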

Let’s try a boxplot by cat deciles… Figure 4.1 suggests that median total energy use is higher in postcode districts with more estimated cats.

pc_district[, cat_decile := dplyr::ntile(EstimatedCatPopulation, 10)]
#head(pc_district[is.na(cat_decile)])
ggplot2::ggplot(pc_district[!is.na(cat_decile) & !is.na(GOR10NM)], aes(x = as.factor(cat_decile), y = total_energy_kWh/1000000)) +
  geom_boxplot() +
  facet_wrap(. ~ GOR10NM) +
  labs(x = "Cat ownership deciles",
       y = "Total domestic electricity & gas GWh",
       caption = "Postcode districts (Data: BEIS & Animal and Plant Health Agency, 2015)")
## Warning: Removed 92 rows containing non-finite values (stat_boxplot).
Figure 4.1: Cat ownership deciles and total annual residential electricity & gas use
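
The medians that the boxplot shows can also be tabulated directly. The sketch below uses simulated data and a base-R stand-in for `dplyr::ntile()`; the `decile_of()` helper is hypothetical and only approximates `ntile()` when group sizes are uneven. The simulation builds in a positive relationship, so the rising medians here are by construction and say nothing about the real districts:

```r
library(data.table)
set.seed(1)

# Hypothetical helper: rank-based split into n roughly equal groups
decile_of <- function(x, n = 10) {
  r <- rank(x, ties.method = "first", na.last = "keep")
  as.integer(ceiling(r * n / sum(!is.na(x))))
}

# Simulated districts; the numbers are not real
dt <- data.table(EstimatedCatPopulation = round(runif(500, 100, 9000)))
dt[, total_energy_kWh := 2e4 * EstimatedCatPopulation + rnorm(.N, sd = 2e7)]

dt[, cat_decile := decile_of(EstimatedCatPopulation)]

# Median total energy (GWh) and district count per decile
medians <- dt[, .(nDistricts = .N,
                  median_energy_GWh = median(total_energy_kWh, na.rm = TRUE) / 1e6),
              keyby = cat_decile]
medians
```

Adding `GOR10NM` to `keyby` would give the per-region medians that match the facets in the figure.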

Well…

pc_district[, total_energy_kWh := total_gas_kWh + total_elec_kWh]
pc_district[, mean_energy_kWh := total_energy_kWh/nElecMeters]

ggplot2::ggplot(pc_district[!is.na(GOR10NM)], aes(x = mean_Cats, y = mean_energy_kWh)) +
  geom_point()
## Warning: Removed 92 rows containing missing values (geom_point).

5 R packages used

  • bookdown (Xie 2016a)
  • data.table (Dowle et al. 2015)
  • ggplot2 (Wickham 2009)
  • knitr (Xie 2016b)
  • rmarkdown (Allaire et al. 2018)

References

Allaire, JJ, Yihui Xie, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, Hadley Wickham, Joe Cheng, and Winston Chang. 2018. Rmarkdown: Dynamic Documents for R. https://CRAN.R-project.org/package=rmarkdown.
Dowle, M, A Srinivasan, T Short, S Lianoglou with contributions from R Saporta, and E Antonyan. 2015. Data.table: Extension of Data.frame. https://CRAN.R-project.org/package=data.table.
Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.
Xie, Yihui. 2016a. Bookdown: Authoring Books and Technical Documents with R Markdown. Boca Raton, Florida: Chapman and Hall/CRC. https://github.com/rstudio/bookdown.
———. 2016b. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://CRAN.R-project.org/package=knitr.