1 About

1.1 License

This work is (c) the author(s).

This work is licensed under a Creative Commons Attribution 4.0 International License unless otherwise marked.

For the avoidance of doubt and an explanation of terms, please refer to the full license notice and legal code.

1.2 Citation

If you wish to use any of the material from this paper, please cite it as:

  • Ben Anderson and Tom Rushby. (2018) Statistical Power, Statistical Significance, Study Design and Decision Making: A Worked Example (Sizing Demand Response Trials in New Zealand), Southampton: University of Southampton.

This work is (c) 2018 the authors.

1.3 History

Code & report history:

1.4 Data

This report uses circuit-level extracts for ‘Heat Pumps’ from the NZ GREEN Grid Household Electricity Demand Data (https://dx.doi.org/10.5255/UKDA-SN-853334; Anderson et al. 2018). These were extracted using the code at https://github.com/CfSOtago/GREENGridData/blob/master/examples/code/extractCleanGridSpy1minCircuit.R

1.5 Acknowledgements

This work was supported by:

2 Introduction

This report contains the analysis for a paper of the same name. The text is stored elsewhere for ease of editing.

3 Error, power, significance and decision making

4 Sample design: statistical power

4.1 Means

Table 4.1: Summary of mean consumption per household by season
Season   Mean of household means (W)   SD of household means (W)
Spring                      58.80597                   113.53102
Summer                      35.13947                    83.90258
Autumn                      68.37439                   147.37279
Winter                     162.66915                   325.51171

Observations are summarised to mean W per household for weekday observations between 16:00 and 20:00 in 2015.
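
For orientation, a minimal sketch of this summarisation step is shown below. It is not the report's actual extraction code (which lives in the GREENGridData repository linked above), and the file name and column names (linkID, r_dateTime, powerW) are assumptions.

```r
# Sketch only: summarise the 1-minute heat pump circuit extract to mean W per
# household for weekday 16:00-20:00 observations in 2015.
# File and column names (linkID, r_dateTime, powerW) are assumed, not confirmed.
library(data.table)
library(lubridate)

hpDT <- fread("heatPumpCircuits2015.csv")  # hypothetical extract file

hpDT[, r_dateTime := ymd_hms(r_dateTime, tz = "Pacific/Auckland")]

# southern-hemisphere meteorological seasons (assumed definition), Jan..Dec
seasons <- c("Summer", "Summer", "Autumn", "Autumn", "Autumn",
             "Winter", "Winter", "Winter", "Spring", "Spring",
             "Spring", "Summer")
hpDT[, season := seasons[month(r_dateTime)]]

# keep 2015 weekday observations between 16:00 and 20:00
hpDT <- hpDT[year(r_dateTime) == 2015 &
               wday(r_dateTime, week_start = 1) <= 5 &
               hour(r_dateTime) %in% 16:19]

# mean W per household per season, then the per-season summary as in Table 4.1
hhDT <- hpDT[, .(meanW = mean(powerW, na.rm = TRUE)), by = .(linkID, season)]
hhDT[, .(meanMeanW = mean(meanW), sdMeanW = sd(meanW)), by = season]
```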

Figure 4.1 shows the initial p = 0.01 plot.

Figure 4.1: Power analysis results (p = 0.01, power = 0.8)

Effect size at n = 1000: 28.37.
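
The detectable effect sizes reported here come from a power calculation for a two-sample comparison of means. A minimal sketch using pwr::pwr.t.test is shown below; it assumes a two-sample, two-sided test and uses the winter mean and standard deviation from Table 4.1 to convert Cohen's d back into Watts and into a percentage of the mean, so it will not necessarily reproduce the report's exact values.

```r
# Sketch only: detectable effect size for a given sample size, p value and power.
# Assumes a two-sample, two-sided t test; the report's settings may differ.
library(pwr)

winterMeanW <- 162.67   # winter mean W (Table 4.1)
winterSdW   <- 325.51   # winter sd W (Table 4.1)

res <- pwr.t.test(n = 1000, sig.level = 0.01, power = 0.8,
                  type = "two.sample", alternative = "two.sided")

detectableW   <- res$d * winterSdW                 # detectable difference in Watts
detectablePct <- 100 * detectableW / winterMeanW   # as % of the winter mean
detectablePct
```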

Figure 4.2 shows the plot for all results.

Figure 4.2: Power analysis results (power = 0.8)

Full table of results:

Table 4.2: Power analysis for means results table (partial)
Sample size (n)   p = 0.01   p = 0.05   p = 0.1   p = 0.2
50 128.57 100.21 85.33 67.49
100 90.27 70.61 60.21 47.68
150 73.53 57.58 49.13 38.92
200 63.61 49.84 42.53 33.70
250 56.86 44.56 38.03 30.14
300 51.88 40.67 34.71 27.51
350 48.01 37.65 32.14 25.47
400 44.90 35.21 30.06 23.82
450 42.33 33.20 28.34 22.46
500 40.15 31.49 26.88 21.31
550 38.27 30.02 25.63 20.31
600 36.64 28.74 24.54 19.45
650 35.20 27.61 23.57 18.69
700 33.92 26.61 22.72 18.01
750 32.77 25.71 21.95 17.40
800 31.72 24.89 21.25 16.84
850 30.77 24.14 20.61 16.34
900 29.91 23.46 20.03 15.88
950 29.11 22.84 19.50 15.46
1000 28.37 22.26 19.00 15.06

4.2 Proportions

Unlike the power analysis for means, this does not require a sample, since the effect size for proportions does not depend on an estimated standard deviation.

Figure 4.3 shows the initial p = 0.05 plot. This shows the difference in proportions that would be required to detect an effect at each sample size.
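
A comparable sketch for proportions uses pwr::pwr.2p.test, which works with the arcsine-transformed effect size h; the baseline proportion of 0.5 below is purely illustrative and is not taken from the report.

```r
# Sketch only: detectable difference in proportions for a given sample size.
# The baseline proportion (0.5) is an illustrative assumption.
library(pwr)

res <- pwr.2p.test(n = 1000, sig.level = 0.05, power = 0.8,
                   alternative = "two.sided")

# invert Cohen's h (h = 2*asin(sqrt(p2)) - 2*asin(sqrt(p1))) at the assumed baseline
baseline   <- 0.5
detectable <- sin(asin(sqrt(baseline)) + res$h / 2)^2
100 * (detectable - baseline)   # detectable difference in percentage points
```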

Figure 4.3: Power analysis results for proportions (p = 0.05, power = 0.8)

Figure 4.4 shows the plot for all results.

Figure 4.4: Power analysis results (power = 0.8)

Full table of results:

Table 4.3: Power analysis for proportions results table (partial)
Sample size (n)   p = 0.01   p = 0.05   p = 0.1   p = 0.2
50 68.35 56.03 49.73 42.44
100 48.33 39.62 35.16 30.01
150 39.46 32.35 28.71 24.51
200 34.17 28.01 24.86 21.22
250 30.57 25.06 22.24 18.98
300 27.90 22.88 20.30 17.33
350 25.83 21.18 18.80 16.04
400 24.16 19.81 17.58 15.01
450 22.78 18.68 16.58 14.15
500 21.61 17.72 15.72 13.42
550 20.61 16.89 15.00 12.80
600 19.73 16.17 14.36 12.25
650 18.96 15.54 13.79 11.77
700 18.27 14.97 13.29 11.34
750 17.65 14.47 12.84 10.96
800 17.09 14.01 12.43 10.61
850 16.57 13.59 12.06 10.29
900 16.11 13.21 11.72 10.00
950 15.68 12.86 11.41 9.73
1000 15.28 12.53 11.12 9.49

5 Testing for differences: effect sizes, confidence intervals and p values

5.1 Getting it ‘wrong’

Table 5.1: Number of households and summary statistics per group
Group            Mean (W)      SD (W)   n households
Control         162.66915   325.51171             28
Intervention 1   35.13947    83.90258             22
Intervention 2   58.80597   113.53102             26
Intervention 3   68.37439   147.37279             29

T test: Group 1

Table 5.2: T test results (Group 1 vs Control)
Control mean   Group 1 mean   Mean difference   statistic     p.value   conf.low   conf.high
    162.6691       35.13947         -127.5297   -1.990661   0.0552626    -258.11    3.050644

The results show that the mean power demand for the control group was 162.67W and for Intervention 1 was 35.14W. This is a (very) large difference in means of 127.53W. The results of the t test are:

  • effect size = 128W or 78% representing a substantial bang for buck for whatever caused the difference;
  • 95% confidence interval for the test = -258.11 to 3.05 representing considerable uncertainty/variation;
  • p value of 0.055 representing a relatively low risk of a false positive result but which (just) fails the conventional p < 0.05 threshold.
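
For reference, results of this kind can be produced with a standard Welch two-sample t test tidied with broom, as in the sketch below; the simulated household values are hypothetical stand-ins generated from the summary statistics in Table 5.1, not the report's data.

```r
# Sketch only: Welch two-sample t test (Intervention 1 vs Control), tidied with
# broom to give the columns shown in Table 5.2. The simulated vectors below are
# hypothetical stand-ins, not the report's household data.
library(broom)

set.seed(42)
controlW <- rnorm(28, mean = 162.67, sd = 325.51)  # 28 control households
group1W  <- rnorm(22, mean = 35.14,  sd = 83.90)   # 22 Intervention 1 households

tidy(t.test(group1W, controlW))
# estimate = mean difference; also statistic, p.value, conf.low, conf.high
```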

T test: Group 2

Table 5.3: T test results (Group 2 vs Control)
Control mean   Group 2 mean   Mean difference   statistic     p.value    conf.low   conf.high
    162.6691       58.80597         -103.8632   -1.587604   0.1216582   -236.8285    29.10212

Now:

  • effect size = 104W or 63.85% representing a still reasonable bang for buck for whatever caused the difference;
  • 95% confidence interval for the test = -236.83 to 29.1 representing even greater uncertainty/variation;
  • p value of 0.122 representing a higher risk of a false positive result, which fails the conventional p < 0.05 threshold and also the less conservative p < 0.1 threshold.

Detecting Intervention Group 2’s effect size of 63.85% would have required control and trial groups of at least 47 households each.
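
As a rough sketch of how such a required sample size can be estimated, pwr::pwr.t.test can be solved for n given an assumed Cohen's d; the choice of standard deviation (and of a one- or two-sided test) is an assumption here and strongly affects the result, so this will not necessarily reproduce the figure of 47 above.

```r
# Sketch only: required n per group to detect the Group 2 difference with
# power = 0.8 at p = 0.05. Using the Intervention 2 SD is an assumption; the
# report's calculation may use a different SD or sidedness.
library(pwr)

meanDiffW <- 103.86   # observed mean difference (Table 5.3)
sdW       <- 113.53   # Intervention 2 SD (Table 5.1) - an assumption

d <- meanDiffW / sdW
ceiling(pwr.t.test(d = d, sig.level = 0.05, power = 0.8,
                   type = "two.sample")$n)   # n required in each group
```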

5.2 Getting it ‘right’

Table 5.4: Number of households and summary statistics per group
Group            Mean (W)     SD (W)   n households
Control         175.77480   331.8449           1148
Intervention 1   36.21610    83.0941            904
Intervention 2   67.89879   121.1046           1018
Intervention 3   70.97243   147.6943           1130

Figure 5.1: Mean W demand per group for large sample (Error bars = 95% confidence intervals for the sample mean)

Re-run T test: Group 1

Table 5.5: T test results (Group 1 vs Control)
Control mean   Group 1 mean   Mean difference   statistic   p.value    conf.low   conf.high
    175.7748       67.89879          -107.876   -10.27012         0   -128.4801   -87.27195

In this case:

  • effect size = 107.88W or 61.37% representing a still reasonable bang for buck for whatever caused the difference;
  • 95% confidence interval for the test = -128.48 to -87.27 representing much less uncertainty/variation;
  • p value of < 0.001 (reported as 0) representing a very low risk of a false positive result as it passes all conventional thresholds.

6 Summary and recommendations

6.1 Statistical power and sample design

6.2 Reporting statistical tests of difference (effects)

6.3 Making inferences and taking decisions

7 Acknowledgements

8 Runtime

Analysis completed in 73.55 seconds (1.23 minutes) using knitr in RStudio with R version 3.5.1 (2018-07-02) running on x86_64-apple-darwin15.6.0.

9 R environment

R packages used:

  • base R - for the basics (R Core Team 2016)
  • data.table - for fast (big) data handling (Dowle et al. 2015)
  • lubridate - date manipulation (Grolemund and Wickham 2011)
  • ggplot2 - for slick graphics (Wickham 2009)
  • readr - for csv reading/writing (Wickham, Hester, and Francois 2016)
  • dplyr - for select and contains (Wickham and Francois 2016)
  • progress - for progress bars (Csárdi and FitzJohn 2016)
  • knitr - to create this document & neat tables (Xie 2016)
  • GREENGrid - for local NZ GREEN Grid project utilities

Session info:

## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.20         forcats_0.3.0      broom_0.5.0       
## [4] lubridate_1.7.4    readr_1.1.1        ggplot2_3.0.0     
## [7] dplyr_0.7.6        data.table_1.11.4  dkUtils_0.0.0.9000
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.18      highr_0.7         pillar_1.3.0     
##  [4] compiler_3.5.1    plyr_1.8.4        bindr_0.1.1      
##  [7] tools_3.5.1       digest_0.6.15     lattice_0.20-35  
## [10] nlme_3.1-137      evaluate_0.11     tibble_1.4.2     
## [13] gtable_0.2.0      pkgconfig_2.0.1   rlang_0.3.0.1    
## [16] cli_1.0.0         yaml_2.2.0        xfun_0.3         
## [19] bindrcpp_0.2.2    pwr_1.2-2         withr_2.1.2      
## [22] stringr_1.3.1     hms_0.4.2         rprojroot_1.3-2  
## [25] grid_3.5.1        tidyselect_0.2.5  glue_1.3.0       
## [28] R6_2.2.2          fansi_0.2.3       rmarkdown_1.10   
## [31] bookdown_0.7      reshape2_1.4.3    weGotThePower_0.1
## [34] tidyr_0.8.2       purrr_0.2.5       magrittr_1.5     
## [37] backports_1.1.2   scales_0.5.0      htmltools_0.3.6  
## [40] assertthat_0.2.0  colorspace_1.3-2  labeling_0.3     
## [43] utf8_1.1.4        stringi_1.2.4     lazyeval_0.2.1   
## [46] munsell_0.5.0     crayon_1.3.4

References

Anderson, Ben, David Eyers, Rebecca Ford, Diana Giraldo Ocampo, Rana Peniamina, Janet Stephenson, Kiti Suomalainen, Lara Wilcocks, and Michael Jack. 2018. “New Zealand GREEN Grid Household Electricity Demand Study 2014-2018,” September. doi:10.5255/UKDA-SN-853334.

Csárdi, Gábor, and Rich FitzJohn. 2016. Progress: Terminal Progress Bars. https://CRAN.R-project.org/package=progress.

Dowle, M, A Srinivasan, T Short, S Lianoglou with contributions from R Saporta, and E Antonyan. 2015. Data.table: Extension of Data.frame. https://CRAN.R-project.org/package=data.table.

Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.

R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.

Wickham, Hadley, and Romain Francois. 2016. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.

Wickham, Hadley, Jim Hester, and Romain Francois. 2016. Readr: Read Tabular Data. https://CRAN.R-project.org/package=readr.

Xie, Yihui. 2016. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://CRAN.R-project.org/package=knitr.