
Commit 301c0016 authored by Ben Anderson

added equation, updated proportions, re-ran to word & html

parent c4e914d2
---
title: "Equation tests"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
Variance:
$\sigma^{2} = \frac{\sum\limits_{i=1}^{n} \left(x_{i} - \bar{x}\right)^{2}} {n-1}$
Standard deviation:
$\sigma = \sqrt{\frac{\sum\limits_{i=1}^{n} \left(x_{i} - \bar{x}\right)^{2}} {n-1}}$
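As a quick numerical check of the two formulas above, the sketch below computes the variance and standard deviation by hand with the n - 1 denominator and compares them with base R's `var()` and `sd()`. The vector `x` is made-up illustration data, not taken from the GREEN Grid extracts.

```r
# Illustrative data only (an assumption, not from the study)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)
xBar <- mean(x)

# Variance with the n - 1 denominator, as in the formula above
varByHand <- sum((x - xBar)^2) / (n - 1)
# Standard deviation is the square root of the variance
sdByHand <- sqrt(varByHand)

# Base R's var() and sd() use the same n - 1 denominator
all.equal(varByHand, var(x))
all.equal(sdByHand, sd(x))
```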
Margins of error (proportions)
$$me = \pm z \times \sqrt{\frac{p(1-p)} {n-1}}$$
\ No newline at end of file
......@@ -9,6 +9,10 @@ author: '`r paste0(params$author)` (Contact: b.anderson@soton.ac.uk, `@dataknut`
date: 'Last run at: `r Sys.time()`'
always_allow_html: yes
output:
bookdown::word_document2:
fig_caption: yes
toc: yes
toc_depth: 2
bookdown::html_document2:
code_folding: hide
fig_caption: yes
......@@ -17,10 +21,6 @@ output:
toc: yes
toc_depth: 2
toc_float: yes
bookdown::word_document2:
fig_caption: yes
toc: yes
toc_depth: 2
bookdown::pdf_document2:
fig_caption: yes
keep_tex: yes
......@@ -50,6 +50,7 @@ rmdLibs <- c("data.table", # data munching
"broom", # tidy test results
"dkUtils", # utilities from devtools::install_github("dataknut/dkUtils")
"forcats", # category manipulation
"pwr", # power stuff
"knitr" # for kable
)
# load them
......@@ -72,7 +73,7 @@ labelProfilePlot <- function(plot){
myParams <- list()
myParams$repoLoc <- dkUtils::findParentDirectory("weGotThePower")
myParams$dPath <- "~/Dropbox/Work/data/nzGREENGrid/dataExtracts/"
myParams$dPath <- "~/Dropbox/Work/Otago_CfS_Ben/data/nzGREENGrid/dataExtracts/"
#myParams$dPath <- "~/Data/NZ_GREENGrid/safe/gridSpy/1min/dataExtracts/"
# created from https://dx.doi.org/10.5255/UKDA-SN-853334
# using https://github.com/CfSOtago/GREENGridData/blob/master/examples/code/extractCleanGridSpy1minCircuit.R
......@@ -350,7 +351,7 @@ knitr::kable(dt, caption = "Samples required if p1 = 40% and p2 = 25%", digits =
We can repeat this for other values of p1 and p2. For example, suppose both were much smaller (e.g. 10% and 15%)... Clearly we need _much_ larger samples.
```{r propTable1}
```{r propTable2}
dt <- getPropN(p1 =0.1, p2 =0.15)
knitr::kable(dt, caption = "Samples required if p1 = 10% and p2 = 15%", digits = 2)
```
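The definition of `getPropN()` is not shown in this diff, so the following is only a sketch of the kind of calculation it might wrap. It uses the arcsine-transformed effect size (Cohen's h) and the closed-form normal approximation for a two-sided test with equal group sizes. The helper name `propNSketch` and its defaults are assumptions made for illustration.

```r
# Hypothetical helper (not the repository's getPropN()): per-group sample size
# for a two-sided test of two proportions with equal group sizes
propNSketch <- function(p1, p2, sig.level = 0.05, power = 0.8) {
  # Cohen's effect size h via the arcsine transform
  h <- 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))
  # Normal approximation: n per group = 2 * ((z_alpha/2 + z_power) / h)^2
  z <- qnorm(1 - sig.level / 2) + qnorm(power)
  2 * (z / h)^2
}

sigLevels <- c(0.01, 0.05, 0.1, 0.2)
n40v25 <- sapply(sigLevels, function(a) propNSketch(0.4, 0.25, sig.level = a))
n10v15 <- sapply(sigLevels, function(a) propNSketch(0.1, 0.15, sig.level = a))
round(data.frame(sig.level = sigLevels, n40v25, n10v15), 2)
```

Under these assumptions the results should land close to the `n` column of the rendered tables (roughly 151 and 680 per group at sig.level = 0.05). Small differences at the looser significance levels are expected because the closed form ignores the opposite tail of the two-sided test.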
......@@ -360,6 +361,8 @@ The above used an arcsine transform.
As a double check, using eqn to assess margin of error...
$$me = \pm z \times \sqrt{\frac{p(1-p)} {n-1}}$$
If:
* p = 0.4 (40%)
......@@ -383,8 +386,6 @@ emr <- round(em,3)
This may be far too wide an error margin for our purposes so we may instead have recruited 500 per sample. Now the margin of error is +/- `r emr` (`r 100*emr`%) so we can now quote the Heat Pump uptake for owner-occupiers as 40% (+/- `r 100*emr`% [or `r 40 - 100*emr` - `r 40 + 100*emr`] with p = 0.05).
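The margin of error arithmetic can be reproduced in a few lines of base R. This is a minimal sketch, assuming a 95% interval (z of about 1.96) and the n - 1 denominator from the equation; `moeSketch` is a hypothetical helper name, not a function from the repository.

```r
# Hypothetical helper: margin of error for a proportion, using the n - 1
# denominator from the margin of error equation
moeSketch <- function(p, n, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2) # about 1.96 for a 95% interval
  z * sqrt(p * (1 - p) / (n - 1))
}

me151 <- round(moeSketch(p = 0.4, n = 151), 3)
me500 <- round(moeSketch(p = 0.4, n = 500), 3)
c(n151 = me151, n500 = me500)
```

These values agree with the +/- 0.078 and +/- 0.043 quoted in the rendered HTML for n = 151 and n = 500.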
In much the same way as we did for means, we can calculate error margins
# Testing for differences: effect sizes, confidence intervals and p values
## Getting it 'wrong'
......@@ -597,7 +598,8 @@ R packages used:
* dplyr - for select and contains [@dplyr]
* progress - for progress bars [@progress]
* knitr - to create this document & neat tables [@knitr]
* GREENGrid - for local NZ GREEN Grid project utilities
* pwr - non-base power analysis [@pwr]
* dkUtils - for local dataknut utilities :-) `devtools::install_github("dataknut/dkUtils")`
Session info:
......
......@@ -239,7 +239,7 @@ div.tocify {
<h1 class="title toc-ignore">Statistical Power, Statistical Significance, Study Design and Decision Making: A Worked Example</h1>
<h3 class="subtitle"><em>Sizing Demand Response Trials in New Zealand</em></h3>
<h4 class="author"><em>Ben Anderson and Tom Rushby (Contact: <a href="mailto:b.anderson@soton.ac.uk">b.anderson@soton.ac.uk</a>, <code>@dataknut</code>)</em></h4>
<h4 class="date"><em>Last run at: 2018-11-09 17:04:01</em></h4>
<h4 class="date"><em>Last run at: 2018-11-13 09:57:46</em></h4>
</div>
......@@ -514,183 +514,93 @@ Figure 4.2: Power analysis results (power = 0.8)
</div>
<div id="proportions" class="section level2">
<h2><span class="header-section-number">4.2</span> Proportions</h2>
<p>Does not require a sample.</p>
<p>Figure <a href="#fig:propSampleSizeFig80">4.3</a> shows the initial p = 0.05 plot. This shows the difference that would be required</p>
<pre><code>## Scale for &#39;y&#39; is already present. Adding another scale for &#39;y&#39;, which
## will replace the existing scale.</code></pre>
<div class="figure"><span id="fig:propSampleSizeFig80"></span>
<img src="weGotThePowerDraftPaper_files/figure-html/propSampleSizeFig80-1.png" alt="Power analysis results for proportions (p = 0.05, power = 0.8)" width="672" />
<p class="caption">
Figure 4.3: Power analysis results for proportions (p = 0.05, power = 0.8)
</p>
</div>
<pre><code>## Saving 7 x 5 in image</code></pre>
<p>Figure <a href="#fig:propSampleSizeFig80all">4.4</a> shows the plot for all results.</p>
<pre><code>## Scale for &#39;y&#39; is already present. Adding another scale for &#39;y&#39;, which
## will replace the existing scale.</code></pre>
<div class="figure"><span id="fig:propSampleSizeFig80all"></span>
<img src="weGotThePowerDraftPaper_files/figure-html/propSampleSizeFig80all-1.png" alt="Power analysis results (power = 0.8)" width="672" />
<p class="caption">
Figure 4.4: Power analysis results (power = 0.8)
</p>
</div>
<pre><code>## Saving 7 x 5 in image</code></pre>
<p>Full table of results:</p>
<pre><code>## Using &#39;effectSize&#39; as value column. Use &#39;value.var&#39; to override</code></pre>
<p>Does not require a sample. As a relatively simple example, suppose we were interested in the adoption of heat pumps in two equal-sized samples. Suppose we thought uptake might be 40% in one sample (say, home owners) and 25% in the other (rental properties) (ref BRANZ 2015). What sample size would we need to detect a significant difference with power = 0.8 at various p values?</p>
<p><code>pwr::pwr.2p.test()</code> (ref pwr) can give us the answer…</p>
<table>
<caption><span id="tab:propPowerTable">Table 4.3: </span>Power analysis for proportions results table (partial)</caption>
<caption><span id="tab:propTable1">Table 4.3: </span>Samples required if p1 = 40% and p2 = 25%</caption>
<thead>
<tr class="header">
<th align="right">sampleN</th>
<th align="right">p = 0.01</th>
<th align="right">p = 0.05</th>
<th align="right">p = 0.1</th>
<th align="right">p = 0.2</th>
<th align="right">n</th>
<th align="right">sig.level</th>
<th align="right">power</th>
<th align="left">props</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="right">50</td>
<td align="right">68.35</td>
<td align="right">56.03</td>
<td align="right">49.73</td>
<td align="right">42.44</td>
</tr>
<tr class="even">
<td align="right">100</td>
<td align="right">48.33</td>
<td align="right">39.62</td>
<td align="right">35.16</td>
<td align="right">30.01</td>
</tr>
<tr class="odd">
<td align="right">150</td>
<td align="right">39.46</td>
<td align="right">32.35</td>
<td align="right">28.71</td>
<td align="right">24.51</td>
</tr>
<tr class="even">
<td align="right">200</td>
<td align="right">34.17</td>
<td align="right">28.01</td>
<td align="right">24.86</td>
<td align="right">21.22</td>
</tr>
<tr class="odd">
<td align="right">250</td>
<td align="right">30.57</td>
<td align="right">25.06</td>
<td align="right">22.24</td>
<td align="right">18.98</td>
<td align="right">224.94</td>
<td align="right">0.01</td>
<td align="right">0.8</td>
<td align="left">p1 = 0.4 p2 = 0.25</td>
</tr>
<tr class="even">
<td align="right">300</td>
<td align="right">27.90</td>
<td align="right">22.88</td>
<td align="right">20.30</td>
<td align="right">17.33</td>
<td align="right">151.17</td>
<td align="right">0.05</td>
<td align="right">0.8</td>
<td align="left">p1 = 0.4 p2 = 0.25</td>
</tr>
<tr class="odd">
<td align="right">350</td>
<td align="right">25.83</td>
<td align="right">21.18</td>
<td align="right">18.80</td>
<td align="right">16.04</td>
<td align="right">119.07</td>
<td align="right">0.10</td>
<td align="right">0.8</td>
<td align="left">p1 = 0.4 p2 = 0.25</td>
</tr>
<tr class="even">
<td align="right">400</td>
<td align="right">24.16</td>
<td align="right">19.81</td>
<td align="right">17.58</td>
<td align="right">15.01</td>
<td align="right">86.73</td>
<td align="right">0.20</td>
<td align="right">0.8</td>
<td align="left">p1 = 0.4 p2 = 0.25</td>
</tr>
<tr class="odd">
<td align="right">450</td>
<td align="right">22.78</td>
<td align="right">18.68</td>
<td align="right">16.58</td>
<td align="right">14.15</td>
</tr>
<tr class="even">
<td align="right">500</td>
<td align="right">21.61</td>
<td align="right">17.72</td>
<td align="right">15.72</td>
<td align="right">13.42</td>
</tr>
<tr class="odd">
<td align="right">550</td>
<td align="right">20.61</td>
<td align="right">16.89</td>
<td align="right">15.00</td>
<td align="right">12.80</td>
</tr>
<tr class="even">
<td align="right">600</td>
<td align="right">19.73</td>
<td align="right">16.17</td>
<td align="right">14.36</td>
<td align="right">12.25</td>
</tr>
<tr class="odd">
<td align="right">650</td>
<td align="right">18.96</td>
<td align="right">15.54</td>
<td align="right">13.79</td>
<td align="right">11.77</td>
</tr>
<tr class="even">
<td align="right">700</td>
<td align="right">18.27</td>
<td align="right">14.97</td>
<td align="right">13.29</td>
<td align="right">11.34</td>
</tr>
<tr class="odd">
<td align="right">750</td>
<td align="right">17.65</td>
<td align="right">14.47</td>
<td align="right">12.84</td>
<td align="right">10.96</td>
</tr>
<tr class="even">
<td align="right">800</td>
<td align="right">17.09</td>
<td align="right">14.01</td>
<td align="right">12.43</td>
<td align="right">10.61</td>
</tbody>
</table>
<p>We can repeat this for other values of p1 and p2. For example, suppose both were much smaller (e.g. 10% and 15%)… Clearly we need <em>much</em> larger samples.</p>
<table>
<caption><span id="tab:propTable2">Table 4.4: </span>Samples required if p1 = 10% and p2 = 15%</caption>
<thead>
<tr class="header">
<th align="right">n</th>
<th align="right">sig.level</th>
<th align="right">power</th>
<th align="left">props</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="right">850</td>
<td align="right">16.57</td>
<td align="right">13.59</td>
<td align="right">12.06</td>
<td align="right">10.29</td>
<td align="right">1012.35</td>
<td align="right">0.01</td>
<td align="right">0.8</td>
<td align="left">p1 = 0.1 p2 = 0.15</td>
</tr>
<tr class="even">
<td align="right">900</td>
<td align="right">16.11</td>
<td align="right">13.21</td>
<td align="right">11.72</td>
<td align="right">10.00</td>
<td align="right">680.35</td>
<td align="right">0.05</td>
<td align="right">0.8</td>
<td align="left">p1 = 0.1 p2 = 0.15</td>
</tr>
<tr class="odd">
<td align="right">950</td>
<td align="right">15.68</td>
<td align="right">12.86</td>
<td align="right">11.41</td>
<td align="right">9.73</td>
<td align="right">535.89</td>
<td align="right">0.10</td>
<td align="right">0.8</td>
<td align="left">p1 = 0.1 p2 = 0.15</td>
</tr>
<tr class="even">
<td align="right">1000</td>
<td align="right">15.28</td>
<td align="right">12.53</td>
<td align="right">11.12</td>
<td align="right">9.49</td>
<td align="right">390.31</td>
<td align="right">0.20</td>
<td align="right">0.8</td>
<td align="left">p1 = 0.1 p2 = 0.15</td>
</tr>
</tbody>
</table>
<p>The above used an arcsine transform.</p>
<p>As a double check, using eqn to assess margin of error…</p>
<p><span class="math display">\[me = \pm z \times \sqrt{\frac{p(1-p)} {n-1}}\]</span></p>
<p>If:</p>
<ul>
<li>p = 0.4 (40%)</li>
<li>n = 151</li>
</ul>
<p>then the margin of error = +/- 0.078 (7.8%). So we could quote the Heat Pump uptake for owner-occupiers as 40% (+/- 7.8% [or 32.2 - 47.8] with p = 0.05).</p>
<p>This may be far too wide an error margin for our purposes so we may instead have recruited 500 per sample. Now the margin of error is +/- 0.043 (4.3%) so we can now quote the Heat Pump uptake for owner-occupiers as 40% (+/- 4.3% [or 35.7 - 44.3] with p = 0.05).</p>
</div>
</div>
<div id="testing-for-differences-effect-sizes-confidence-intervals-and-p-values" class="section level1">
......@@ -816,27 +726,27 @@ Figure 4.4: Power analysis results (power = 0.8)
<tbody>
<tr class="odd">
<td align="left">Control</td>
<td align="right">175.77480</td>
<td align="right">331.8449</td>
<td align="right">1148</td>
<td align="right">169.60064</td>
<td align="right">328.56355</td>
<td align="right">1140</td>
</tr>
<tr class="even">
<td align="left">Intervention 1</td>
<td align="right">36.21610</td>
<td align="right">83.0941</td>
<td align="right">904</td>
<td align="right">34.50149</td>
<td align="right">82.94015</td>
<td align="right">907</td>
</tr>
<tr class="odd">
<td align="left">Intervention 2</td>
<td align="right">67.89879</td>
<td align="right">121.1046</td>
<td align="right">1018</td>
<td align="right">59.84020</td>
<td align="right">112.74650</td>
<td align="right">1008</td>
</tr>
<tr class="even">
<td align="left">Intervention 3</td>
<td align="right">70.97243</td>
<td align="right">147.6943</td>
<td align="right">1130</td>
<td align="right">73.24102</td>
<td align="right">148.25869</td>
<td align="right">1145</td>
</tr>
</tbody>
</table>
......@@ -862,20 +772,20 @@ Figure 5.1: Mean W demand per group for large sample (Error bars = 95% confidenc
</thead>
<tbody>
<tr class="odd">
<td align="right">175.7748</td>
<td align="right">67.89879</td>
<td align="right">-107.876</td>
<td align="right">-10.27012</td>
<td align="right">169.6006</td>
<td align="right">59.8402</td>
<td align="right">-109.7604</td>
<td align="right">-10.59573</td>
<td align="right">0</td>
<td align="right">-128.4801</td>
<td align="right">-87.27195</td>
<td align="right">-130.0807</td>
<td align="right">-89.44015</td>
</tr>
</tbody>
</table>
<p>In this case:</p>
<ul>
<li>effect size = 107.8760107W or 61.37% representing a still <em>reasonable bang for buck</em> for whatever caused the difference;</li>
<li>95% confidence interval for the test = -128.48 to -87.27 representing <em>much less</em> uncertainty/variation;</li>
<li>effect size = 109.7604326W or 64.72% representing a still <em>reasonable bang for buck</em> for whatever caused the difference;</li>
<li>95% confidence interval for the test = -130.08 to -89.44 representing <em>much less</em> uncertainty/variation;</li>
<li>p value of 0 representing a <em>very low</em> risk of a false positive result as it passes all conventional thresholds.</li>
</ul>
</div>
......@@ -897,7 +807,7 @@ Figure 5.1: Mean W demand per group for large sample (Error bars = 95% confidenc
</div>
<div id="runtime" class="section level1">
<h1><span class="header-section-number">8</span> Runtime</h1>
<p>Analysis completed in 73.55 seconds ( 1.23 minutes) using <a href="https://cran.r-project.org/package=knitr">knitr</a> in <a href="http://www.rstudio.com">RStudio</a> with R version 3.5.1 (2018-07-02) running on x86_64-apple-darwin15.6.0.</p>
<p>Analysis completed in 51.37 seconds ( 0.86 minutes) using <a href="https://cran.r-project.org/package=knitr">knitr</a> in <a href="http://www.rstudio.com">RStudio</a> with R version 3.5.1 (2018-07-02) running on x86_64-apple-darwin15.6.0.</p>
</div>
<div id="r-environment" class="section level1">
<h1><span class="header-section-number">9</span> R environment</h1>
......@@ -911,7 +821,8 @@ Figure 5.1: Mean W demand per group for large sample (Error bars = 95% confidenc
<li>dplyr - for select and contains <span class="citation">(Wickham and Francois 2016)</span></li>
<li>progress - for progress bars <span class="citation">(Csárdi and FitzJohn 2016)</span></li>
<li>knitr - to create this document &amp; neat tables <span class="citation">(Xie 2016)</span></li>
<li>GREENGrid - for local NZ GREEN Grid project utilities</li>
<li>pwr - non-base power analysis <span class="citation">(Champely 2018)</span></li>
<li>dkUtils - for local dataknut utilities :-) <code>devtools::install_github(&quot;dataknut/dkUtils&quot;)</code></li>
</ul>
<p>Session info:</p>
<pre><code>## R version 3.5.1 (2018-07-02)
......@@ -929,27 +840,28 @@ Figure 5.1: Mean W demand per group for large sample (Error bars = 95% confidenc
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitr_1.20 forcats_0.3.0 broom_0.5.0
## [4] lubridate_1.7.4 readr_1.1.1 ggplot2_3.0.0
## [7] dplyr_0.7.6 data.table_1.11.4 dkUtils_0.0.0.9000
## [1] knitr_1.20 pwr_1.2-2 forcats_0.3.0
## [4] broom_0.5.0 lubridate_1.7.4 readr_1.1.1
## [7] ggplot2_3.1.0 dplyr_0.7.7 data.table_1.11.8
## [10] dkUtils_0.0.0.9000
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.18 highr_0.7 pillar_1.3.0
## [1] Rcpp_0.12.19 highr_0.7 pillar_1.3.0
## [4] compiler_3.5.1 plyr_1.8.4 bindr_0.1.1
## [7] tools_3.5.1 digest_0.6.15 lattice_0.20-35
## [10] nlme_3.1-137 evaluate_0.11 tibble_1.4.2
## [13] gtable_0.2.0 pkgconfig_2.0.1 rlang_0.3.0.1
## [16] cli_1.0.0 yaml_2.2.0 xfun_0.3
## [19] bindrcpp_0.2.2 pwr_1.2-2 withr_2.1.2
## [22] stringr_1.3.1 hms_0.4.2 rprojroot_1.3-2
## [25] grid_3.5.1 tidyselect_0.2.5 glue_1.3.0
## [28] R6_2.2.2 fansi_0.2.3 rmarkdown_1.10
## [31] bookdown_0.7 reshape2_1.4.3 weGotThePower_0.1
## [34] tidyr_0.8.2 purrr_0.2.5 magrittr_1.5
## [37] backports_1.1.2 scales_0.5.0 htmltools_0.3.6
## [40] assertthat_0.2.0 colorspace_1.3-2 labeling_0.3
## [43] utf8_1.1.4 stringi_1.2.4 lazyeval_0.2.1
## [46] munsell_0.5.0 crayon_1.3.4</code></pre>
## [7] tools_3.5.1 digest_0.6.18 lattice_0.20-35
## [10] nlme_3.1-137 evaluate_0.12 tibble_1.4.2
## [13] gtable_0.2.0 pkgconfig_2.0.2 rlang_0.3.0.1
## [16] cli_1.0.1 yaml_2.2.0 xfun_0.4
## [19] bindrcpp_0.2.2 withr_2.1.2 stringr_1.3.1
## [22] hms_0.4.2 rprojroot_1.3-2 grid_3.5.1
## [25] tidyselect_0.2.5 glue_1.3.0 R6_2.3.0
## [28] fansi_0.4.0 rmarkdown_1.10 bookdown_0.7
## [31] reshape2_1.4.3 weGotThePower_0.1 tidyr_0.8.1
## [34] purrr_0.2.5 magrittr_1.5 backports_1.1.2
## [37] scales_1.0.0 htmltools_0.3.6 assertthat_0.2.0
## [40] colorspace_1.3-2 labeling_0.3 utf8_1.1.4
## [43] stringi_1.2.4 lazyeval_0.2.1 munsell_0.5.0
## [46] crayon_1.3.4</code></pre>
</div>
<div id="references" class="section level1 unnumbered">
<h1>References</h1>
......@@ -957,6 +869,9 @@ Figure 5.1: Mean W demand per group for large sample (Error bars = 95% confidenc
<div id="ref-anderson_new_2018">
<p>Anderson, Ben, David Eyers, Rebecca Ford, Diana Giraldo Ocampo, Rana Peniamina, Janet Stephenson, Kiti Suomalainen, Lara Wilcocks, and Michael Jack. 2018. “New Zealand GREEN Grid Household Electricity Demand Study 2014-2018,” September. doi:<a href="https://doi.org/10.5255/UKDA-SN-853334">10.5255/UKDA-SN-853334</a>.</p>
</div>
<div id="ref-pwr">
<p>Champely, Stephane. 2018. <em>Pwr: Basic Functions for Power Analysis</em>. <a href="https://CRAN.R-project.org/package=pwr" class="uri">https://CRAN.R-project.org/package=pwr</a>.</p>
</div>
<div id="ref-progress">
<p>Csárdi, Gábor, and Rich FitzJohn. 2016. <em>Progress: Terminal Progress Bars</em>. <a href="https://CRAN.R-project.org/package=progress" class="uri">https://CRAN.R-project.org/package=progress</a>.</p>
</div>
......