1 About

1.1 Paper circulation

  • Public

1.2 License

This work is made available under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) License.

This means you are free to:

  • Share — copy and redistribute the material in any medium or format
  • Adapt — remix, transform, and build upon the material for any purpose, even commercially.

Under the following terms:

  • Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
  • No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

Notices:

  • You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.
  • No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material. #YMMV

For the avoidance of doubt and explanation of terms please refer to the full license notice and legal code.

1.3 Citation

If you wish to use any of the material from this paper please cite as:

  • Ben Anderson and Tom Rushby. (2018) Statistical Power, Statistical Significance, Study Design and Decision Making: A Worked Example (Sizing Demand Response Trials in New Zealand), Southampton: University of Southampton.

This work is (c) 2018 the authors.

1.4 History

Code history is generally tracked via the paper repo:

1.5 Data

This report uses circuit-level extracts for ‘Heat Pumps’ from the NZ GREEN Grid Household Electricity Demand Data (https://dx.doi.org/10.5255/UKDA-SN-853334; Anderson et al. 2018). The extracts were produced using the code at https://github.com/CfSOtago/GREENGridData/blob/master/examples/code/extractCleanGridSpy1minCircuit.R

1.6 Acknowledgements

This work was supported by:

2 Introduction

This report contains the analysis for a paper of the same name. The text is stored elsewhere for ease of editing.

3 Error, power, significance and decision making

4 Sample design: statistical power

Figure 4.1 shows the initial p = 0.05 plot.

Figure 4.1: Power analysis results (p = 0.05, power = 0.8)

Effect size at n = 1000: 11.12.

Figure 4.2 shows the plot for all results.

Figure 4.2: Power analysis results (power = 0.8)

Full table of results:

Table 4.1: Full results table (part)
sampleN  p = 0.01  p = 0.05  p = 0.1  p = 0.2
     50     64.25     50.08    42.64    33.73
    100     45.11     35.28    30.09    23.83
    150     36.75     28.78    24.55    19.45
    200     31.79     24.91    21.25    16.84
    250     28.41     22.27    19.01    15.06
    300     25.93     20.32    17.35    13.75
    350     23.99     18.81    16.06    12.73
    400     22.44     17.60    15.02    11.90
    450     21.15     16.59    14.16    11.22
    500     20.06     15.74    13.43    10.65
    550     19.13     15.00    12.81    10.15
    600     18.31     14.36    12.26     9.72
    650     17.59     13.80    11.78     9.34
    700     16.95     13.30    11.35     9.00
    750     16.37     12.85    10.97     8.69
    800     15.85     12.44    10.62     8.42
    850     15.38     12.07    10.30     8.17
    900     14.95     11.73    10.01     7.94
    950     14.55     11.41     9.74     7.72
   1000     14.18     11.12     9.50     7.53
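
A table of this kind can be sketched with base R's `power.t.test()`, which solves for the detectable difference when `delta` is left unset. The inputs used for this report are not stated here, so the sketch below assumes a two-sample, one-sided test and expresses the effect as a percentage of the standard deviation (`sd = 1`). Both are assumptions, as is the helper name `detectable_effect`. Under them the values should come out close to those in Table 4.1, which is suggestive rather than conclusive.

```r
# Sketch only: the test direction and the sd = 1 scaling are assumptions,
# not settings confirmed by the report.
detectable_effect <- function(n, sig_level, power = 0.8) {
  res <- power.t.test(
    n = n,                      # households per group
    delta = NULL,               # left unset so power.t.test() solves for it
    sd = 1,                     # assumed: effect measured in sd units
    sig.level = sig_level,
    power = power,
    type = "two.sample",
    alternative = "one.sided"   # assumed
  )
  100 * res$delta               # express as a percentage
}

sample_n <- seq(50, 1000, by = 50)
p_levels <- c(0.01, 0.05, 0.1, 0.2)

# One row per sample size, one column per p value threshold
effects <- sapply(p_levels, function(p) {
  sapply(sample_n, detectable_effect, sig_level = p)
})
dimnames(effects) <- list(as.character(sample_n), paste("p =", p_levels))

print(round(effects, 2))
```

Reading down any column shows the detectable effect shrinking roughly with the square root of the sample size, which is why halving the effect size needs about four times as many households.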

5 Testing for differences: effect sizes, confidence intervals and p values

5.1 Getting it ‘wrong’

Table 5.1: Number of households and summary statistics per group
Group           Mean (W)   SD (W)     n households
Control         162.66915  325.51171  28
Intervention 1   35.13947   83.90258  22
Intervention 2   58.80597  113.53102  26
Intervention 3   68.37439  147.37279  29

T test Group 1

Table 5.2: T test results (Group 1 vs Control)
Control mean  Group 1 mean  Mean difference  statistic  p.value    conf.low  conf.high
162.6691      35.13947      -127.5297        -1.990661  0.0552626  -258.11   3.050644

The results show that the mean power demand for the control group was 162.67W and for Intervention 1 was 35.14W, a (very) large difference in means of 127.53W. The results of the t test are:

  • effect size = 128W or 78% representing a substantial bang for buck for whatever caused the difference;
  • 95% confidence interval for the test = -258.11 to 3.05 representing considerable uncertainty/variation;
  • p value of 0.055 representing a relatively low risk of a false positive result but which (just) fails the conventional p < 0.05 threshold.
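
The figures in Table 5.2 are consistent with a Welch (unequal variance) t test, which is the default in R's `t.test()`. The household-level data are not reproduced here, so as a minimal sketch the same quantities can be recomputed from the summary statistics in Table 5.1. The helper `welch_from_summary` is hypothetical and not part of the report's code.

```r
# Welch two-sample t test computed from summary statistics.
# welch_from_summary() is a hypothetical helper, not a function from the report.
welch_from_summary <- function(m1, s1, n1, m2, s2, n2, conf_level = 0.95) {
  v1 <- s1^2 / n1                        # squared standard error, group 1
  v2 <- s2^2 / n2                        # squared standard error, group 2
  se <- sqrt(v1 + v2)
  mean_diff <- m2 - m1                   # intervention minus control
  t_stat <- mean_diff / se
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  t_crit <- qt(1 - (1 - conf_level) / 2, df)
  list(
    difference = mean_diff,
    statistic = t_stat,
    df = df,
    p.value = 2 * pt(-abs(t_stat), df),  # two-sided
    conf.int = mean_diff + c(-1, 1) * t_crit * se,
    pct_of_control = 100 * abs(mean_diff) / m1
  )
}

# Control and Intervention 1 rows of Table 5.1
res <- welch_from_summary(
  m1 = 162.66915, s1 = 325.51171, n1 = 28,
  m2 = 35.13947,  s2 = 83.90258,  n2 = 22
)
print(res)
```

The `pct_of_control` element is the mean difference as a percentage of the control mean, which is how the 78% figure above is obtained.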

T test Group 2

Table 5.3: T test results (Group 2 vs Control)
Control mean  Group 2 mean  Mean difference  statistic  p.value    conf.low   conf.high
162.6691      58.80597      -103.8632        -1.587604  0.1216582  -236.8285  29.10212

Now:

  • effect size = 104W or 63.85% representing a still reasonable bang for buck for whatever caused the difference;
  • 95% confidence interval for the test = -236.83 to 29.10 representing even greater uncertainty/variation;
  • p value of 0.122 representing a higher risk of a false positive result which fails the conventional p < 0.05 threshold and also the less conservative p < 0.1.

To detect Intervention Group 2’s effect size of 63.85% would have required control and trial groups of 47 households each.
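
A required group size of this kind can be obtained from base R's `power.t.test()` by leaving `n` unset. The significance level, test direction and standard deviation behind the figure of 47 are not stated here, so the inputs below are placeholders and the output is not expected to match it. The sketch only shows the shape of the calculation.

```r
# Placeholder inputs: none of these values are taken from the report.
effect_sd_units <- 0.5          # hypothetical effect, in standard deviations

res <- power.t.test(
  n = NULL,                     # left unset so power.t.test() solves for it
  delta = effect_sd_units,
  sd = 1,
  sig.level = 0.05,
  power = 0.8,
  type = "two.sample",
  alternative = "two.sided"
)

n_per_group <- ceiling(res$n)   # round up to whole households
print(n_per_group)
```

Rounding up rather than to the nearest integer keeps the achieved power at or above the target.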

5.2 Getting it ‘right’

Table 5.4: Number of households and summary statistics per group
Group           Mean (W)   SD (W)     n households
Control         157.38342  319.32150  1190
Intervention 1   34.49009   80.06252   836
Intervention 2   60.35725  113.91731  1020
Intervention 3   67.36074  142.31734  1154

Figure 5.1: Mean W demand per group for large sample (Error bars = 95% confidence intervals for the sample mean)
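
Error bars of this kind can be recomputed from the summary statistics in Table 5.4. How the intervals in the figure were calculated is not stated here, so this sketch assumes the usual t-based interval for a sample mean. The column names are illustrative.

```r
# Summary statistics copied from Table 5.4
groups <- data.frame(
  group = c("Control", "Intervention 1", "Intervention 2", "Intervention 3"),
  mean_w = c(157.38342, 34.49009, 60.35725, 67.36074),
  sd_w = c(319.32150, 80.06252, 113.91731, 142.31734),
  n = c(1190, 836, 1020, 1154)
)

# Assumed: t-based interval, mean +/- t * sd / sqrt(n)
se <- groups$sd_w / sqrt(groups$n)
t_crit <- qt(0.975, df = groups$n - 1)
groups$ci_low <- groups$mean_w - t_crit * se
groups$ci_high <- groups$mean_w + t_crit * se

print(groups[, c("group", "mean_w", "ci_low", "ci_high")])
```

With these inputs the Control interval sits entirely above all three intervention intervals.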

Re-run T test Group 2

Table 5.5: T test results (Group 2 vs Control)
Control mean  Group 2 mean  Mean difference  statistic  p.value  conf.low   conf.high
157.3834      60.35725      -97.02617        -9.780754  0        -116.4846  -77.5677

In this case:

  • effect size = 97.03W or 61.65% representing a still reasonable bang for buck for whatever caused the difference;
  • 95% confidence interval for the test = -116.48 to -77.57 representing much less uncertainty/variation;
  • p value of < 0.001 (shown as 0 after rounding) representing a very low risk of a false positive result as it passes all conventional thresholds.
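
A p value shown as 0 is a rounding artefact, not a true zero. As a rough sketch, the value can be recomputed from the Control and Intervention 2 rows of Table 5.4 (whose means match those in Table 5.5), assuming a Welch t test, and then printed against a reporting floor with base R's `format.pval()`. The floor of 0.001 is an illustrative choice.

```r
# Control and Intervention 2 rows of Table 5.4
m_c <- 157.38342; s_c <- 319.32150; n_c <- 1190
m_i <- 60.35725;  s_i <- 113.91731; n_i <- 1020

# Welch t statistic and Welch-Satterthwaite degrees of freedom (assumed test)
v_c <- s_c^2 / n_c
v_i <- s_i^2 / n_i
t_stat <- (m_i - m_c) / sqrt(v_c + v_i)
df <- (v_c + v_i)^2 / (v_c^2 / (n_c - 1) + v_i^2 / (n_i - 1))
p_value <- 2 * pt(-abs(t_stat), df)

print(t_stat)
print(p_value)                             # very small, but not exactly zero
print(format.pval(p_value, eps = 0.001))   # reported against a floor instead
```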

6 Summary and recommendations

6.1 Statistical power and sample design

6.2 Reporting statistical tests of difference (effects)

6.3 Making inferences and taking decisions

7 Acknowledgments

8 Runtime

Analysis completed in 51.97 seconds ( 0.87 minutes) using knitr in RStudio with R version 3.5.1 (2018-07-02) running on x86_64-apple-darwin15.6.0.

9 R environment

R packages used:

  • base R - for the basics (R Core Team 2016)
  • data.table - for fast (big) data handling (Dowle et al. 2015)
  • lubridate - date manipulation (Grolemund and Wickham 2011)
  • ggplot2 - for slick graphics (Wickham 2009)
  • readr - for csv reading/writing (Wickham, Hester, and Francois 2016)
  • dplyr - for select and contains (Wickham and Francois 2016)
  • progress - for progress bars (Csárdi and FitzJohn 2016)
  • knitr - to create this document & neat tables (Xie 2016)
  • GREENGrid - for local NZ GREEN Grid project utilities

Session info:

## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] knitr_1.20         broom_0.5.0        GREENGridData_1.0 
##  [4] SAVEr_0.0.1.9000   lubridate_1.7.4    readr_1.1.1       
##  [7] ggplot2_3.1.0      dplyr_0.7.7        data.table_1.11.8 
## [10] myUtils_0.0.0.9000
## 
## loaded via a namespace (and not attached):
##  [1] progress_1.2.0    tidyselect_0.2.5  xfun_0.4         
##  [4] purrr_0.2.5       reshape2_1.4.3    haven_1.1.2      
##  [7] lattice_0.20-35   colorspace_1.3-2  htmltools_0.3.6  
## [10] yaml_2.2.0        utf8_1.1.4        rlang_0.3.0.1    
## [13] pillar_1.3.0      glue_1.3.0        withr_2.1.2      
## [16] tidyverse_1.2.1   modelr_0.1.2      readxl_1.1.0     
## [19] bindrcpp_0.2.2    bindr_0.1.1       plyr_1.8.4       
## [22] stringr_1.3.1     munsell_0.5.0     gtable_0.2.0     
## [25] cellranger_1.1.0  rvest_0.3.2       evaluate_0.12    
## [28] labeling_0.3      forcats_0.3.0     fansi_0.4.0      
## [31] highr_0.7         Rcpp_0.12.19      scales_1.0.0     
## [34] backports_1.1.2   jsonlite_1.5      hms_0.4.2        
## [37] digest_0.6.18     stringi_1.2.4     bookdown_0.7     
## [40] grid_3.5.1        rprojroot_1.3-2   cli_1.0.1        
## [43] tools_3.5.1       magrittr_1.5      lazyeval_0.2.1   
## [46] tibble_1.4.2      crayon_1.3.4      tidyr_0.8.1      
## [49] pkgconfig_2.0.2   xml2_1.2.0        prettyunits_1.0.2
## [52] assertthat_0.2.0  rmarkdown_1.10    httr_1.3.1       
## [55] R6_2.3.0          nlme_3.1-137      compiler_3.5.1

References

Anderson, Ben, David Eyers, Rebecca Ford, Diana Giraldo Ocampo, Rana Peniamina, Janet Stephenson, Kiti Suomalainen, Lara Wilcocks, and Michael Jack. 2018. “New Zealand GREEN Grid Household Electricity Demand Study 2014-2018,” September. doi:10.5255/UKDA-SN-853334.

Csárdi, Gábor, and Rich FitzJohn. 2016. Progress: Terminal Progress Bars. https://CRAN.R-project.org/package=progress.

Dowle, M, A Srinivasan, T Short, S Lianoglou with contributions from R Saporta, and E Antonyan. 2015. Data.table: Extension of Data.frame. https://CRAN.R-project.org/package=data.table.

Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.

R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.

Wickham, Hadley, and Romain Francois. 2016. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.

Wickham, Hadley, Jim Hester, and Romain Francois. 2016. Readr: Read Tabular Data. https://CRAN.R-project.org/package=readr.

Xie, Yihui. 2016. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://CRAN.R-project.org/package=knitr.