@dataknut
This work is (c) the author(s).
This work is licensed under a Creative Commons Attribution 4.0 International License unless otherwise marked.
For the avoidance of doubt and explanation of terms please refer to the full license notice and legal code.
If you wish to use any of the material from this paper please cite as:
This work is (c) 2019 the authors.
This report uses circuit level extracts for ‘Heat Pumps’ from the NZ GREEN Grid Household Electricity Demand Data (https://dx.doi.org/10.5255/UKDA-SN-853334 (Anderson et al. 2018)). These have been extracted using the code found in https://github.com/CfSOtago/GREENGridData/blob/master/examples/code/extractCleanGridSpy1minCircuit.R
This work was supported by:
This report contains the analysis for a paper of the same name. The text is stored elsewhere for ease of editing.
hhID | linkID | r_dateTime | circuit | powerW |
---|---|---|---|---|
Length:14250284 | Length:14250284 | Min. :2015-04-01 00:00:00 | Length:14250284 | Min. : -655.00 |
Class :character | Class :character | 1st Qu.:2015-06-22 12:39:00 | Class :character | 1st Qu.: 0.00 |
Mode :character | Mode :character | Median :2015-09-16 13:12:00 | Mode :character | Median : 0.00 |
NA | NA | Mean :2015-09-21 08:00:39 | NA | Mean : 147.92 |
NA | NA | 3rd Qu.:2015-12-17 17:52:00 | NA | 3rd Qu.: 61.29 |
NA | NA | Max. :2016-03-31 23:59:00 | NA | Max. :27759.00 |
Notice that there are negawatts: the minimum powerW is -655 W. We remove rf_46 and all negative values as per https://cfsotago.github.io/GREENGridData/gridSpy1mOutliersReport_v1.0.html
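As a rough illustration of this cleaning step, the sketch below drops one household and all negative readings from a toy data frame. It is not the code used for the report: the toy data, the object names and the assumption that rf_46 is matched on linkID (rather than hhID) are ours.

```r
# Toy stand-in for the heat pump extract; column names follow the summary table above.
heatPumpDT <- data.frame(
  linkID = c("rf_01", "rf_01", "rf_46", "rf_02", "rf_02"),
  powerW = c(120.5, -655, 300, 0, 61.29),
  stringsAsFactors = FALSE
)

# Assumption: rf_46 is identified by linkID (in the real data it may be hhID).
excludeHouseholds <- c("rf_46")

# Keep rows that are not from an excluded household and have non-negative power.
keep <- !(heatPumpDT$linkID %in% excludeHouseholds) & heatPumpDT$powerW >= 0
cleanDT <- heatPumpDT[keep, ]

nrow(cleanDT)        # rows remaining after cleaning
min(cleanDT$powerW)  # no longer negative
```

Zero values are kept on purpose here, since a heat pump that is switched off is a valid observation.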
hhID | linkID | r_dateTime | circuit | powerW | month | year | tmpM | season |
---|---|---|---|---|---|---|---|---|
Length:13298965 | Length:13298965 | Min. :2015-04-01 00:00:00 | Length:13298965 | Min. : 0.0 | Min. : 1.000 | Min. :2015 | Min. : 1.000 | Spring:3351249 |
Class :character | Class :character | 1st Qu.:2015-06-20 15:32:00 | Class :character | 1st Qu.: 0.0 | 1st Qu.: 4.000 | 1st Qu.:2015 | 1st Qu.: 4.000 | Summer:2875049 |
Mode :character | Mode :character | Median :2015-09-14 20:06:00 | Mode :character | Median : 0.0 | Median : 7.000 | Median :2015 | Median : 7.000 | Autumn:3471128 |
NA | NA | Mean :2015-09-19 21:24:45 | NA | Mean : 152.0 | Mean : 6.581 | Mean :2015 | Mean : 6.581 | Winter:3601539 |
NA | NA | 3rd Qu.:2015-12-16 12:26:00 | NA | 3rd Qu.: 50.8 | 3rd Qu.: 9.000 | 3rd Qu.:2015 | 3rd Qu.: 9.000 | NA |
NA | NA | Max. :2016-03-31 23:59:00 | NA | Max. :27759.0 | Max. :12.000 | Max. :2016 | Max. :12.000 | NA |
Number of households in cleaned heatpump data: 28
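The cleaned table above carries month and season columns. One plausible way to derive season from month is sketched below. The month-to-season mapping (Southern Hemisphere meteorological seasons) and the helper name monthToSeason are our assumptions; the report does not state how season was coded.

```r
# Assumed mapping: Spring = Sep-Nov, Summer = Dec-Feb, Autumn = Mar-May, Winter = Jun-Aug.
monthToSeason <- function(month) {
  season <- ifelse(month %in% 9:11, "Spring",
            ifelse(month %in% c(12, 1, 2), "Summer",
            ifelse(month %in% 3:5, "Autumn", "Winter")))
  # Same level order as the season column in the table above
  factor(season, levels = c("Spring", "Summer", "Autumn", "Winter"))
}

# Toy timestamps; UTC is used only to keep the example self-contained.
dateTimes <- as.POSIXct(c("2015-04-01 00:00:00", "2015-07-15 18:30:00",
                          "2015-10-02 07:00:00", "2016-01-20 12:00:00"), tz = "UTC")
months <- as.integer(format(dateTimes, "%m"))
seasons <- monthToSeason(months)
table(seasons)
```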
season | nPeople | meanMeanW | sdMeanW | nHouseholds |
---|---|---|---|---|
Spring | NA | 595.994212 | 443.635765 | 2 |
Spring | 1 | 92.230234 | 103.648048 | 2 |
Spring | 2 | 89.339624 | 44.338145 | 4 |
Spring | 3 | 210.076391 | 187.625482 | 6 |
Spring | 4+ | 175.856103 | 148.738840 | 11 |
Summer | 1 | 4.019881 | 3.746534 | 2 |
Summer | 2 | 35.275766 | 61.099420 | 3 |
Summer | 3 | 86.328405 | 145.661285 | 6 |
Summer | 4+ | 33.637416 | 74.408925 | 10 |
Autumn | NA | 387.203399 | 316.302379 | 2 |
Autumn | 1 | 70.587984 | 79.862519 | 2 |
Autumn | 2 | 73.233719 | 56.284769 | 4 |
Autumn | 3 | 245.460272 | 209.918748 | 7 |
Autumn | 4+ | 199.479290 | 165.371666 | 13 |
Winter | NA | 661.964787 | 275.647550 | 2 |
Winter | 1 | 169.532436 | 213.880258 | 2 |
Winter | 2 | 282.138922 | 71.265180 | 4 |
Winter | 3 | 476.930850 | 302.869555 | 7 |
Winter | 4+ | 413.121623 | 279.067726 | 12 |
meanMeanW | sdMeanW | nHouseholds |
---|---|---|
410.6491 | 269.1503 | 27 |
Observations are summarised to mean W per household during 16:00 - 20:00 on weekdays for year = 2015.
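The summary described above (mean W per household in the weekday 16:00 - 20:00 window, then the mean and standard deviation of those household means) can be sketched in base R as follows. The toy observations are invented, and treating the window as 16:00 to 19:59 is our assumption.

```r
# Toy observations: only the rows flagged "inside" should survive the filter.
obs <- data.frame(
  linkID = c("rf_01", "rf_01", "rf_01", "rf_02", "rf_02", "rf_02"),
  r_dateTime = as.POSIXct(c(
    "2015-07-01 17:00:00",  # Wednesday, inside
    "2015-07-01 19:59:00",  # Wednesday, inside
    "2015-07-04 17:00:00",  # Saturday, excluded
    "2015-07-02 16:00:00",  # Thursday, inside
    "2015-07-02 20:00:00",  # Thursday, excluded (assumed window ends at 19:59)
    "2016-01-05 17:00:00"   # 2016, excluded
  ), tz = "UTC"),
  powerW = c(400, 600, 9999, 200, 9999, 9999),
  stringsAsFactors = FALSE
)

lt <- as.POSIXlt(obs$r_dateTime)
isPeak    <- lt$hour >= 16 & lt$hour < 20
isWeekday <- lt$wday %in% 1:5          # 0 = Sunday, 6 = Saturday
is2015    <- (lt$year + 1900) == 2015
peak <- obs[isPeak & isWeekday & is2015, ]

# Step 1: mean W per household; step 2: mean and sd of those household means
hhMeans <- aggregate(powerW ~ linkID, data = peak, FUN = mean)
summaryRow <- data.frame(
  meanMeanW   = mean(hhMeans$powerW),
  sdMeanW     = sd(hhMeans$powerW),
  nHouseholds = nrow(hhMeans)
)
summaryRow
```

Summarising to household means first means every household counts once in the power calculations, however many one-minute observations it contributes.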
Figure 4.1 shows the initial p = 0.01 plot.
Figure 4.1: Power analysis results (p = 0.01, power = 0.8)
Effect size at n = 1000 (p = 0.01): 9.29%.
Figure 4.2 shows the plot for all results.
Figure 4.2: Power analysis results (power = 0.8)
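At the same effect size (9.29%, n = 1000, p = 0.01), the full table below shows that smaller samples suffice at less strict thresholds: between n = 600 and n = 650 at p = 0.05, about n = 450 at p = 0.1 and between n = 250 and n = 300 at p = 0.2.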
Full table of results:
sampleN | p = 0.01 | p = 0.05 | p = 0.1 | p = 0.2 |
---|---|---|---|---|
50 | 42.11 | 32.82 | 27.95 | 22.10 |
100 | 29.57 | 23.13 | 19.72 | 15.62 |
150 | 24.09 | 18.86 | 16.09 | 12.75 |
200 | 20.83 | 16.32 | 13.93 | 11.04 |
250 | 18.62 | 14.60 | 12.46 | 9.87 |
300 | 16.99 | 13.32 | 11.37 | 9.01 |
350 | 15.73 | 12.33 | 10.53 | 8.34 |
400 | 14.71 | 11.53 | 9.85 | 7.80 |
450 | 13.86 | 10.87 | 9.28 | 7.36 |
500 | 13.15 | 10.31 | 8.80 | 6.98 |
550 | 12.54 | 9.83 | 8.39 | 6.65 |
600 | 12.00 | 9.41 | 8.04 | 6.37 |
650 | 11.53 | 9.04 | 7.72 | 6.12 |
700 | 11.11 | 8.72 | 7.44 | 5.90 |
750 | 10.73 | 8.42 | 7.19 | 5.70 |
800 | 10.39 | 8.15 | 6.96 | 5.52 |
850 | 10.08 | 7.91 | 6.75 | 5.35 |
900 | 9.80 | 7.69 | 6.56 | 5.20 |
950 | 9.53 | 7.48 | 6.39 | 5.06 |
1000 | 9.29 | 7.29 | 6.22 | 4.93 |
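The effect sizes in this table appear to be consistent with a one-sided, two-sample t test power calculation using the mean (410.6491 W) and standard deviation (269.1503 W) reported earlier, with the detectable difference expressed as a percentage of the mean. The sketch below uses stats::power.t.test() from base R to illustrate that reading. It is our reconstruction, not the code that produced the table, and effectSizePct is a hypothetical helper name.

```r
meanW <- 410.6491  # mean of household means from the earlier summary table
sdW   <- 269.1503  # standard deviation of household means

# Smallest detectable difference (as % of the mean) for n households per group.
effectSizePct <- function(n, sigLevel, power = 0.8) {
  res <- stats::power.t.test(n = n, sd = sdW, sig.level = sigLevel, power = power,
                             type = "two.sample", alternative = "one.sided")
  100 * res$delta / meanW
}

# Compare with the n = 1000 row of the table above
es01 <- effectSizePct(1000, 0.01)
es05 <- effectSizePct(1000, 0.05)
round(c(p0.01 = es01, p0.05 = es05), 2)
```

The same function can be applied over sampleN = seq(50, 1000, 50) with sapply() to rebuild a whole column.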
Power analysis for proportions does not require a prior sample. As a relatively simple example, suppose we were interested in the adoption of heat pumps in two equal sized samples. Suppose we thought that uptake might be 40% in one sample (say, home owners) and 25% in the other (rental properties) (ref BRANZ 2015). What sample size would we need in each group to detect a significant difference with power = 0.8 and at various p values?
pwr::pwr.2p.test() (Champely 2018) can give us the answer…
n | sig.level | power | props |
---|---|---|---|
224.94 | 0.01 | 0.8 | p1 = 0.4 p2 = 0.25 |
151.17 | 0.05 | 0.8 | p1 = 0.4 p2 = 0.25 |
119.07 | 0.10 | 0.8 | p1 = 0.4 p2 = 0.25 |
86.73 | 0.20 | 0.8 | p1 = 0.4 p2 = 0.25 |
We can repeat this for other values of p1 and p2. For example, suppose both were much smaller (e.g. 10% and 15%)… Clearly we need much larger samples.
n | sig.level | power | props |
---|---|---|---|
1012.35 | 0.01 | 0.8 | p1 = 0.1 p2 = 0.15 |
680.35 | 0.05 | 0.8 | p1 = 0.1 p2 = 0.15 |
535.89 | 0.10 | 0.8 | p1 = 0.1 p2 = 0.15 |
390.31 | 0.20 | 0.8 | p1 = 0.1 p2 = 0.15 |
The above used an arcsine transform.
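To make the arcsine step concrete, the sketch below computes the effect size h = 2·asin(√p1) − 2·asin(√p2) and a closed-form approximation to the sample size per group in base R. The function names are ours. The closed form ignores the far tail of the two-sided test, so it can differ from the tabulated values in the first decimal place at the larger p values.

```r
# Arcsine-transformed difference between two proportions
arcsineH <- function(p1, p2) 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))

# Approximate n per group for a two-sided test of two equal-sized samples
nPerGroup <- function(p1, p2, sigLevel, power = 0.8) {
  h <- abs(arcsineH(p1, p2))
  2 * ((qnorm(1 - sigLevel / 2) + qnorm(power)) / h)^2
}

sigLevels <- c(0.01, 0.05, 0.10, 0.20)
nLargeGap <- sapply(sigLevels, function(a) nPerGroup(0.40, 0.25, a))
nSmallGap <- sapply(sigLevels, function(a) nPerGroup(0.10, 0.15, a))
round(data.frame(sig.level = sigLevels, n_40_vs_25 = nLargeGap, n_10_vs_15 = nSmallGap), 2)
```

Because n scales with 1/h², shrinking the gap between the proportions from 15 to 5 percentage points is what drives the much larger samples in the second table.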
As a double check, using eqn to assess margin of error…
\[me = \pm z \times \sqrt{\frac{p(1-p)}{n-1}}\]
If p = 0.4, z = 1.96 (p = 0.05) and n is about 151 (the p = 0.05 row of the first table above), then the margin of error = +/- 0.078 (7.8%). So we could quote the Heat Pump uptake for owner-occupiers as 40% (+/- 7.8% [or 32.2 - 47.8] with p = 0.05).
This may be far too wide an error margin for our purposes so we may instead have recruited 500 per sample. Now the margin of error is +/- 0.043 (4.3%) so we can now quote the Heat Pump uptake for owner-occupiers as 40% (+/- 4.3% [or 35.7 - 44.3] with p = 0.05).
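The two margins of error quoted above can be checked with a few lines of base R. This is a sketch under our assumptions: p = 0.4, z taken from qnorm() for p = 0.05, and n = 151 for the first case (a value that reproduces the quoted 0.078). marginOfError is a hypothetical helper name.

```r
# Margin of error for a proportion p from a sample of size n
marginOfError <- function(p, n, sigLevel = 0.05) {
  z <- qnorm(1 - sigLevel / 2)  # about 1.96 for sigLevel = 0.05
  z * sqrt(p * (1 - p) / (n - 1))
}

me151 <- marginOfError(0.4, 151)
me500 <- marginOfError(0.4, 500)

# Margins and the implied intervals around 40%, in percent
round(100 * c(me151 = me151, me500 = me500), 1)
round(100 * (0.4 + c(-1, 1) * me151), 1)
round(100 * (0.4 + c(-1, 1) * me500), 1)
```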
We use the base GREEN Grid household data and the number of people per household, but re-sample it slightly.
NB 1: we create a small sample roughly 2 * the size of the GREEN Grid data. Due to small number effects and the random re-sampling with replacement process, there will be random fluctuations in the results with each run. As a consequence the results in this section will probably not match the results in the paper…
NB 2: sometimes the small-n random process doesn’t create households of a given type that we then need for analysis. Don’t worry, just re-knit until it does :-)
nPeople | mean W | sd W | n households |
---|---|---|---|
1 | 260.2741 | 135.26975 | 5 |
2 | 243.5189 | 30.20443 | 7 |
3 | 549.5341 | 319.97555 | 9 |
4+ | 432.8072 | 279.30838 | 29 |
So a sample of 50.
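Re-sampling with replacement, as described in NB 1, can be sketched as below. The base data are invented stand-ins for the per-household means, the object names are ours, and the seed is fixed only so that this example is repeatable.

```r
set.seed(2019)  # fixed here only so the sketch is repeatable

# Hypothetical stand-in for the per-household means used as the base sample
base <- data.frame(
  linkID  = sprintf("rf_%02d", 1:25),
  nPeople = rep(c("1", "2", "3", "4+"), times = c(2, 4, 7, 12)),
  meanW   = round(runif(25, min = 50, max = 900), 1),
  stringsAsFactors = FALSE
)

# Draw twice as many households as we started with, with replacement
idx <- sample(nrow(base), size = 2 * nrow(base), replace = TRUE)
resampled <- base[idx, ]

# Summarise the synthetic sample by household size
byPeople <- aggregate(meanW ~ nPeople, data = resampled,
                      FUN = function(x) c(meanW = mean(x), sdW = sd(x), n = length(x)))
byPeople
```

With so few households in the smaller groups, a given draw can miss a group entirely or include it only once (giving an undefined standard deviation), which is the small-n problem that NB 2 describes.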
T test 1 <-> 3
1 person mean | 3 persons mean | Mean difference | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|
260.2741 | 549.5341 | -289.2599 | -2.358998 | 0.036821 | -557.5082 | -21.01171 |
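The results show that the mean power demand for 3 person households was 549.53 W and for 1 person households was 260.27 W. This is a (very) large difference in the mean of 289.26 W. The t test in the table above gives t = -2.36 and p = 0.037, with a confidence interval for the difference of -557.51 W to -21.01 W.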
T test 1 <-> 4+
1 person mean | 4+ persons mean | Mean difference | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|
260.2741 | 432.8072 | -172.5331 | -2.165191 | 0.0528381 | -347.5762 | 2.509981 |
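Now the mean difference is smaller (172.53 W) and the t test gives t = -2.17 and p = 0.053, so the confidence interval (-347.58 W to 2.51 W) includes zero.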
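The comparisons above are two-sample t tests on household mean W. A minimal sketch of how such a table row could be produced with stats::t.test() follows. The two groups are random draws with roughly the means and standard deviations of the small sample, so the numbers will not match the tables, and we are assuming rather than confirming that the report used the default (Welch, two-sided, 95%) settings.

```r
set.seed(1)  # repeatable toy data

# Hypothetical household means for two groups (not the GREEN Grid values)
onePerson   <- rnorm(5, mean = 260, sd = 135)
threePerson <- rnorm(9, mean = 550, sd = 320)

# Welch two-sample t test (R's default: unequal variances, two-sided, 95% CI)
tt <- t.test(onePerson, threePerson)

resultRow <- data.frame(
  mean1     = unname(tt$estimate[1]),
  mean2     = unname(tt$estimate[2]),
  meanDiff  = unname(tt$estimate[1] - tt$estimate[2]),
  statistic = unname(tt$statistic),
  p.value   = tt$p.value,
  conf.low  = tt$conf.int[1],
  conf.high = tt$conf.int[2]
)
resultRow
```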
NB: we create a larger sample roughly 40 * the size of the GREEN Grid data. Due to the random re-sampling with replacement process, there will be random fluctuations in the results with each run. As a consequence the results in this section will probably not exactly match the results in the paper but as the new sample is large they should be quite close…
nPeople | mean W | sd W | n households |
---|---|---|---|
1 | 171.1944 | 152.06488 | 91 |
2 | 285.4655 | 60.94574 | 149 |
3 | 474.2346 | 274.00976 | 297 |
4+ | 435.3399 | 270.75666 | 463 |
So n = 1000
Figure 5.1: Mean W demand per group for large sample (Error bars = 95% confidence intervals for the sample mean)
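The error bars in Figure 5.1 are 95% confidence intervals for each group mean. The sketch below shows how they could be computed from the large-sample table above. It assumes a t-based interval (mean ± t × sd / √n); the report may have used a normal approximation instead, which would give almost identical bars at these sample sizes.

```r
# Group summaries copied from the large-sample table above
largeSample <- data.frame(
  nPeople = c("1", "2", "3", "4+"),
  meanW   = c(171.1944, 285.4655, 474.2346, 435.3399),
  sdW     = c(152.06488, 60.94574, 274.00976, 270.75666),
  n       = c(91, 149, 297, 463),
  stringsAsFactors = FALSE
)

# Standard error of each mean and the matching t critical value
se    <- largeSample$sdW / sqrt(largeSample$n)
tCrit <- qt(0.975, df = largeSample$n - 1)

largeSample$ciLow  <- largeSample$meanW - tCrit * se
largeSample$ciHigh <- largeSample$meanW + tCrit * se
largeSample
```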
re-run T tests 1 vs 3
1 person mean | 3 persons mean | Mean difference | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|
171.1944 | 474.2346 | -303.0403 | -13.45974 | 0 | -347.3629 | -258.7177 |
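In this case the mean difference is 303.04 W (t = -13.46), the p value is too small to display (it rounds to 0) and the confidence interval (-347.36 W to -258.72 W) is well away from zero.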
re-run T tests 1 person vs 4+
1 person mean | 4+ persons mean | Mean difference | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|
171.1944 | 435.3399 | -264.1455 | -13.00654 | 0 | -304.1695 | -224.1215 |
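In this case the mean difference is 264.15 W (t = -13.01), the p value again rounds to 0 and the confidence interval (-304.17 W to -224.12 W) is well away from zero.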
Analysis completed in 55.18 seconds (0.92 minutes) using knitr in RStudio with R version 3.5.1 (2018-07-02) running on x86_64-apple-darwin15.6.0.
R packages used:
devtools::install_github("dataknut/dkUtils")
Session info:
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
##
## Matrix products: default
## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitr_1.21 GREENGridData_1.0 pwr_1.2-2
## [4] forcats_0.4.0 broom_0.5.1 lubridate_1.7.4
## [7] readr_1.3.1 ggplot2_3.1.0 dplyr_0.8.0.1
## [10] data.table_1.12.0 weGotThePower_0.1 dkUtils_0.0.0.9000
## [13] bookdown_0.9 markdown_0.9
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.0 highr_0.7 cellranger_1.1.0
## [4] compiler_3.5.1 pillar_1.3.1 plyr_1.8.4
## [7] prettyunits_1.0.2 progress_1.2.0 tools_3.5.1
## [10] digest_0.6.18 lattice_0.20-38 nlme_3.1-137
## [13] evaluate_0.13 tibble_2.0.1 gtable_0.2.0
## [16] pkgconfig_2.0.2 rlang_0.3.1 yaml_2.2.0
## [19] xfun_0.4 withr_2.1.2 stringr_1.4.0
## [22] generics_0.0.2 hms_0.4.2 grid_3.5.1
## [25] tidyselect_0.2.5 glue_1.3.0 R6_2.4.0
## [28] readxl_1.3.0 rmarkdown_1.11 tidyr_0.8.2
## [31] reshape2_1.4.3 purrr_0.3.0 magrittr_1.5
## [34] ellipsis_0.0.2 backports_1.1.3 scales_1.0.0
## [37] htmltools_0.3.6 assertthat_0.2.0 colorspace_1.4-0
## [40] labeling_0.3 stringi_1.3.1 lazyeval_0.2.1
## [43] munsell_0.5.0 crayon_1.3.4
Anderson, Ben, David Eyers, Rebecca Ford, Diana Giraldo Ocampo, Rana Peniamina, Janet Stephenson, Kiti Suomalainen, Lara Wilcocks, and Michael Jack. 2018. “New Zealand GREEN Grid Household Electricity Demand Study 2014-2018,” September. doi:10.5255/UKDA-SN-853334.
Champely, Stephane. 2018. Pwr: Basic Functions for Power Analysis. https://CRAN.R-project.org/package=pwr.
Csárdi, Gábor, and Rich FitzJohn. 2016. Progress: Terminal Progress Bars. https://CRAN.R-project.org/package=progress.
Dowle, M, A Srinivasan, T Short, S Lianoglou with contributions from R Saporta, and E Antonyan. 2015. Data.table: Extension of Data.frame. https://CRAN.R-project.org/package=data.table.
Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.
R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.
Wickham, Hadley, and Romain Francois. 2016. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Wickham, Hadley, Jim Hester, and Romain Francois. 2016. Readr: Read Tabular Data. https://CRAN.R-project.org/package=readr.
Xie, Yihui. 2016. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://CRAN.R-project.org/package=knitr.