This fridayFagPacket was first published at…


1 fridayFagPackets

Numbers that could have been done on the back of one and should probably come with a similar health warning…

Find out more.

2 It’s the cats, stupid

Inspired by @giulio_mattioli’s recent paper on the car dependence of dog ownership, we thought we’d take a look at cats and residential energy demand. Why? Well, people like to keep their cats warm but, more importantly, they also cut big holes in doors and/or windows to let the cats in and out. Hardly a thermally sealed envelope!

3 What’s the data?

For now we’re using:

  • postcode district level estimates of cat ownership in the UK in 2015. Does such a thing exist? YEAH! “This dataset gives the mean estimate for population for each district, and was generated as part of the delivery of commissioned research. The data contained within this dataset are modelled figures, based on national estimates for pet population, and available information on Veterinary activity across GB. The data are accurate as of 01/01/2015. The data provided are summarised to the postcode district level. Further information on this research is available in a research publication by James Aegerter, David Fouracre & Graham C. Smith, discussing the structure and density of pet cat and dog populations across Great Britain.”
  • experimental postcode level data on domestic gas and electricity ‘consumption’ for 2015 aggregated to postcode sectors
# cats
cats_DT <- data.table::fread(paste0(dp, "UK_Animal and Plant Health Agency/APHA0372-Cat_Density_Postcode_District.csv"))
cats_DT[, pcd_sector := PostcodeDistrict]

setkey(cats_DT, pcd_sector)

nrow(cats_DT)
## [1] 2830
setkey(pc_sector_energy_dt, pcd_sector)

nrow(pc_sector_energy_dt)
## [1] 3803
pc_district <- merge(cats_DT, pc_sector_energy_dt, by = "pcd_sector") # inner join: keeps only rows whose pcd_sector appears in both tables

nrow(pc_district)
## [1] 3788
# there are postcode sectors with no electricity meters - for now we'll remove them
# pending further investigation

summary(pc_district)
##   pcd_sector        PostcodeDistrict   EstimatedCatPopulation   GOR10CD            GOR10CDO           GOR10NM         
##  Length:3788        Length:3788        Min.   :    4.8        Length:3788        Length:3788        Length:3788       
##  Class :character   Class :character   1st Qu.: 1598.3        Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Median : 3114.0        Mode  :character   Mode  :character   Mode  :character  
##                                        Mean   : 3956.0                                                                
##                                        3rd Qu.: 5444.9                                                                
##                                        Max.   :23544.5                                                                
##                                                                                                                       
##    GOR10NMW             rgn              nPostcodes     total_gas_kWh         nGasMeters     i.nPostcodes   
##  Length:3788        Length:3788        Min.   :   1.0   Min.   :    22434   Min.   :    5   Min.   :   1.0  
##  Class :character   Class :character   1st Qu.:  21.0   1st Qu.: 60036467   1st Qu.: 4228   1st Qu.: 137.0  
##  Mode  :character   Mode  :character   Median : 540.5   Median :119491532   Median : 8706   Median : 251.0  
##                                        Mean   : 641.3   Mean   :127616957   Mean   : 9456   Mean   : 271.9  
##                                        3rd Qu.:1050.2   3rd Qu.:177856608   3rd Qu.:13308   3rd Qu.: 378.0  
##                                        Max.   :3653.0   Max.   :772507910   Max.   :57429   Max.   :1264.0  
##                                                         NA's   :242         NA's   :242                     
##  total_elec_kWh       nElecMeters   
##  Min.   :    10532   Min.   :    5  
##  1st Qu.: 12040943   1st Qu.: 2840  
##  Median : 21663677   Median : 5456  
##  Mean   : 23973489   Mean   : 6178  
##  3rd Qu.: 32810099   3rd Qu.: 8579  
##  Max.   :153555399   Max.   :40496  
## 
table(pc_district$GOR10NM, pc_district$rgn)
##                           
##                                E12000001 E12000002 E12000003 E12000004 E12000005 E12000006 E12000007 E12000008
##                            837         0         0         0         0         0         0         0         0
##   (pseudo) Scotland          0         0         0         0         0         0         0         0         0
##   (pseudo) Wales             0         0         0         0         0         0         0         0         0
##   East Midlands              0         0         0         0       170         0         0         0         0
##   East of England            0         0         0         0         0         0       264         0         0
##   London                     0         0         0         0         0         0         0       292         0
##   North East                 0       125         0         0         0         0         0         0         0
##   North West                 0         0       272         0         0         0         0         0         0
##   South East                 0         0         0         0         0         0         0         0       412
##   South West                 0         0         0         0         0         0         0         0         0
##   West Midlands              0         0         0         0         0       249         0         0         0
##   Yorkshire and The Humber   0         0         0       232         0         0         0         0         0
##                           
##                            E12000009 S99999999 W99999999
##                                    0         0         0
##   (pseudo) Scotland                0       439         0
##   (pseudo) Wales                   0         0       205
##   East Midlands                    0         0         0
##   East of England                  0         0         0
##   London                           0         0         0
##   North East                       0         0         0
##   North West                       0         0         0
##   South East                       0         0         0
##   South West                     291         0         0
##   West Midlands                    0         0         0
##   Yorkshire and The Humber         0         0         0
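One thing the row counts above hint at: the merged table has 3788 rows, more than the 2830 rows of cat data. An inner join can only return more rows than its smaller input if some keys repeat, so it may be worth checking for duplicated and unmatched keys before reading too much into the plots. A minimal sketch with made-up toy tables (the values and the `AA…` codes are invented; only the column names come from the code above):

```r
library(data.table)

# Toy stand-ins for cats_DT and pc_sector_energy_dt (all values invented)
cats <- data.table(pcd_sector = c("AA1", "AA2", "AA3"),
                   EstimatedCatPopulation = c(100, 200, 300))
energy <- data.table(pcd_sector = c("AA1", "AA2", "AA2", "AA4"),
                     nElecMeters = c(50, 40, 60, 70))

# merge() defaults to an inner join: only keys present in both tables survive
inner <- merge(cats, energy, by = "pcd_sector")

# Keys that repeat in a table multiply rows in the result
dup_keys <- energy[, .N, by = pcd_sector][N > 1, pcd_sector]

# Keys present in only one table are silently dropped
cats_only   <- setdiff(cats$pcd_sector, energy$pcd_sector)
energy_only <- setdiff(energy$pcd_sector, cats$pcd_sector)

nrow(inner)
dup_keys
cats_only
energy_only
```

Here `AA2` appears twice in the energy table, so its cat estimate is counted twice after the merge, which would also inflate any totals summed over the merged rows.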

We could also use @SERL_UK’s smart meter gas/elec data, dwelling characteristics and pet ownership (but no species detail :-)

4 What do we find?

Well, in some places there seem to be a lot of estimated cats…

(We calculated mean cats per household by dividing the estimated cat population by the number of electricity meters, which is probably a reasonable proxy for the number of households.)

pc_district[, mean_Cats := EstimatedCatPopulation/nElecMeters]
head(pc_district[, .(PostcodeDistrict, EstimatedCatPopulation, mean_Cats, nPostcodes, nElecMeters)][order(-mean_Cats)])
##    PostcodeDistrict EstimatedCatPopulation mean_Cats nPostcodes nElecMeters
## 1:             SA63                1981.13 13.854056         94         143
## 2:             LL23                8233.15 10.582455        222         778
## 3:             LL66                 104.70  9.518182         23          11
## 4:             BS28                4034.58  8.945854        226         451
## 5:             LA17                1341.85  8.886424         61         151
## 6:             EH25                7489.82  8.884721          1         843

SA63 is in south west Wales while LL23 is on the edge of the Snowdonia National Park…
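Some of these ratios rest on very few meters: LL66 has only 11, so its 9.5 cats per meter could easily be an artefact of a small denominator. A hedged sketch of flagging such districts, using three rows from the table above. The `min_meters` threshold of 100 is an arbitrary, illustrative choice, not something the data suggest:

```r
library(data.table)

# Three rows copied from the output above
top <- data.table(
  PostcodeDistrict       = c("SA63", "LL23", "LL66"),
  EstimatedCatPopulation = c(1981.13, 8233.15, 104.70),
  nElecMeters            = c(143, 778, 11)
)

# Same ratio as in the text
top[, mean_Cats := EstimatedCatPopulation / nElecMeters]

# Flag districts where the denominator is small (threshold is hypothetical)
min_meters <- 100
top[, small_base := nElecMeters < min_meters]

top
```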

4.1 More dwellings, more cats?

Is there a correlation between estimated total cats and the number of dwellings (electricity meters)?

ggplot2::ggplot(pc_district, aes(x = nElecMeters , y = EstimatedCatPopulation, 
                                 colour = GOR10NM)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
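The plot can be backed up with a number. A minimal sketch using base R's `cor()` on toy values (invented for illustration); with the real data the same call would take `pc_district$nElecMeters` and `pc_district$EstimatedCatPopulation`:

```r
# Toy values, invented for illustration; one NA to show the missing-value handling
meters <- c(100, 200, 300, 400, NA)
cats   <- c(10, 40, 90, 160, 50)

# Pearson measures linear association; Spearman only needs a monotonic one
r_pearson  <- cor(meters, cats, use = "complete.obs", method = "pearson")
r_spearman <- cor(meters, cats, use = "complete.obs", method = "spearman")

r_pearson
r_spearman
```

Spearman may be the safer choice here given the curved loess fits and the handful of very large districts.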

5 More cats, more gas?

Is there a correlation between estimated cat ownership and total gas use?

ggplot2::ggplot(pc_district, aes(x = EstimatedCatPopulation, y = total_gas_kWh,
                                 colour = GOR10NM)) +
  geom_point()
## Warning: Removed 242 rows containing missing values (geom_point).

Or mean gas use and mean cats?

pc_district[, mean_gas_kWh := total_gas_kWh/nGasMeters]
ggplot2::ggplot(pc_district, aes(x = mean_Cats, y = mean_gas_kWh, colour = GOR10NM)) +
  geom_point()
## Warning: Removed 242 rows containing missing values (geom_point).

6 More cats, more electricity?

Or total electricity use and cats?

ggplot2::ggplot(pc_district, aes(x = EstimatedCatPopulation, y = total_elec_kWh, colour = GOR10NM)) +
  geom_point()

Or mean elec use and mean cats?

pc_district[, mean_elec_kWh := total_elec_kWh/nElecMeters] # divide by electricity (not gas) meters
ggplot2::ggplot(pc_district, aes(x = mean_Cats, y = mean_elec_kWh, colour = GOR10NM)) +
  geom_point()

7 More cats, more energy?

Or total energy use and total cats?

pc_district[, total_energy_kWh := total_gas_kWh + total_elec_kWh]

ggplot2::ggplot(pc_district, aes(x = EstimatedCatPopulation, y = total_energy_kWh, colour = GOR10NM)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 242 rows containing non-finite values (stat_smooth).
## Warning: Removed 242 rows containing missing values (geom_point).
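The 242 dropped rows are the districts with no gas data: `NA + x` is `NA` in R, so `total_energy_kWh` is missing wherever `total_gas_kWh` is. If those districts are simply off the gas grid (an assumption; the data do not say why the values are missing), one option is to count missing gas as zero. A sketch with toy numbers:

```r
library(data.table)

# Toy values, invented for illustration
toy <- data.table(total_gas_kWh  = c(1000, NA, 3000),
                  total_elec_kWh = c(500, 700, 900))

# As in the text: a missing gas value makes the total missing too
toy[, total_energy_kWh := total_gas_kWh + total_elec_kWh]

# Alternative: treat missing gas as zero (ASSUMES missing means no gas supply)
toy[, total_energy_kWh_gas0 := fcoalesce(total_gas_kWh, 0) + total_elec_kWh]

# How many rows would the plots silently drop?
n_missing <- toy[is.na(total_energy_kWh), .N]

toy
n_missing
```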

Well, there may be something in there? Let’s try a boxplot by cat deciles (Figure 7.1)…

pc_district[, cat_decile := dplyr::ntile(EstimatedCatPopulation, 10)]
#head(pc_district[is.na(cat_decile)])
ggplot2::ggplot(pc_district[!is.na(cat_decile)], aes(x = as.factor(cat_decile), y = total_energy_kWh/1000000)) +
  geom_boxplot() +
  facet_wrap(. ~ GOR10NM) +
  labs(x = "Cat ownership deciles",
       y = "Total domestic electricity & gas GWh",
       caption = "Postcode sectors (Data: BEIS & Animal and Plant Health Agency, 2015)")
## Warning: Removed 242 rows containing non-finite values (stat_boxplot).

Figure 7.1: Cat ownership deciles and total annual residential electricity & gas use
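For readers unfamiliar with `dplyr::ntile()`: it ranks the values and cuts the ranks into equally sized groups, leaving `NA` as `NA` (hence the `!is.na(cat_decile)` filter above). Note that the deciles here are of total estimated cats, not cats per household. A small sketch on invented values:

```r
# 20 invented values (in descending order) plus one NA, cut into 10 groups
x <- c(seq(200, 10, by = -10), NA)
decile <- dplyr::ntile(x, 10)

# Each decile gets two values; the NA stays NA
table(decile, useNA = "ifany")
```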

Well…

pc_district[, total_energy_kWh := total_gas_kWh + total_elec_kWh]
pc_district[, mean_energy_kWh := total_energy_kWh/nElecMeters]

ggplot2::ggplot(pc_district, aes(x = mean_Cats, y = mean_energy_kWh)) +
  geom_point()
## Warning: Removed 242 rows containing missing values (geom_point).
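A caveat on the 'total' plots further up: total cats and total energy both grow with the number of dwellings, so they will correlate even if cats have no effect at all. Dividing by meters, as in the last plot, is one way round that; another is a regression that includes the meter count. A toy sketch in which energy is constructed to depend only on meters (all numbers invented), just to show the mechanics:

```r
# Invented data: energy depends ONLY on the number of meters, by construction
meters <- c(1000, 2000, 3000, 4000, 5000, 6000)
cats   <- c(600, 900, 1700, 1900, 2600, 2900)
energy <- 4000 * meters

# Naive model: cats look important because they track dwelling numbers
naive <- lm(energy ~ cats)

# Adjusted model: once meters are included, the cat coefficient is ~0
adjusted <- lm(energy ~ cats + meters)

coef(naive)["cats"]
coef(adjusted)
```

With the real data the equivalent would be something like `lm(total_energy_kWh ~ EstimatedCatPopulation + nElecMeters, data = pc_district)`, which is a suggestion rather than anything run for this packet.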

8 R packages used

  • bookdown (Xie 2016a)
  • data.table (Dowle et al. 2015)
  • ggplot2 (Wickham 2009)
  • knitr (Xie 2016b)
  • rmarkdown (Allaire et al. 2018)

References

Allaire, JJ, Yihui Xie, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, Hadley Wickham, Joe Cheng, and Winston Chang. 2018. Rmarkdown: Dynamic Documents for R. https://CRAN.R-project.org/package=rmarkdown.
Dowle, M, A Srinivasan, T Short, S Lianoglou with contributions from R Saporta, and E Antonyan. 2015. Data.table: Extension of Data.frame. https://CRAN.R-project.org/package=data.table.
Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.
Xie, Yihui. 2016a. Bookdown: Authoring Books and Technical Documents with R Markdown. Boca Raton, Florida: Chapman; Hall/CRC. https://github.com/rstudio/bookdown.
Xie, Yihui. 2016b. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://CRAN.R-project.org/package=knitr.