Commit 94de08e5 authored by Ben Anderson

fixed data paths for HCS use; added notes re random re-sampling process; removed word version as they won't match (due to random re-sampling process)
parent 94dd429b
@@ -9,10 +9,6 @@ author: '`r paste0(params$authors)` (Contact: b.anderson@soton.ac.uk, `@dataknut
date: 'Last run at: `r Sys.time()`'
always_allow_html: yes
output:
  bookdown::word_document2:
    fig_caption: yes
    toc: yes
    toc_depth: 2
  bookdown::html_document2:
    code_folding: hide
    fig_caption: yes
@@ -21,6 +17,10 @@ output:
    toc: yes
    toc_depth: 2
    toc_float: yes
  bookdown::word_document2:
    fig_caption: yes
    toc: yes
    toc_depth: 2
  bookdown::pdf_document2:
    fig_caption: yes
    keep_tex: yes
@@ -74,12 +74,13 @@ labelProfilePlot <- function(plot){
myParams <- list()
myParams$repoLoc <- dkUtils::findParentDirectory("weGotThePower")
myParams$dPath <- "~/Dropbox/Work/Otago_CfS_Ben/data/nzGREENGrid/"
#myParams$dPath <- "~/Data/NZ_GREENGrid/safe/gridSpy/1min/dataExtracts/"
myParams$dPath <- "/Volumes/hum-csafe/Research Projects/GREEN Grid/" # requires Otago HCS access
heatPumpData <- paste0(myParams$dPath, "cleanData/safe/gridSpy/1min/dataExtracts/Heat Pump_2015-04-01_2016-03-31_observations.csv.gz")
# created from https://dx.doi.org/10.5255/UKDA-SN-853334
# using https://github.com/CfSOtago/GREENGridData/blob/master/examples/code/extractCleanGridSpy1minCircuit.R
heatPumpData <- paste0(myParams$dPath, "dataExtracts/Heat Pump_2015-04-01_2016-03-31_observations.csv.gz")
ggHHData <- paste0(myParams$dPath, "ggHouseholdAttributesSafe.csv")
ggHHData <- paste0(myParams$dPath, "Packaged Data for Sharing Externally/ReShare/reshare_v1.0/ggHouseholdAttributesSafe.csv.zip")
myParams$GGDataDOI <- "https://dx.doi.org/10.5255/UKDA-SN-853334"
plotCaption <- paste0("Source: ", myParams$GGDataDOI)
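The script switches data locations by re-assigning `myParams$dPath` and commenting lines in and out. A minimal sketch of an alternative, assuming you simply want the first root directory that exists on the current machine; `pickDataPath()` and `candidatePaths` are hypothetical names, not part of the repo:

```r
# Hypothetical helper (not in the weGotThePower repo): return the first
# candidate directory that exists on this machine.
pickDataPath <- function(candidates) {
  # path.expand() resolves "~" before checking the directory exists
  found <- candidates[dir.exists(path.expand(candidates))]
  if (length(found) == 0) {
    stop("None of the candidate data paths exist: ",
         paste(candidates, collapse = "; "))
  }
  found[[1]]
}

# Roots taken from the original script, in order of preference
candidatePaths <- c(
  "/Volumes/hum-csafe/Research Projects/GREEN Grid/", # requires Otago HCS access
  "~/Dropbox/Work/Otago_CfS_Ben/data/nzGREENGrid/"
)
```

Usage would be `myParams$dPath <- pickDataPath(candidatePaths)`. The heat pump extract and household attribute files sit under different sub-folders in the two locations, so the file paths built from `myParams$dPath` would also need to match whichever root is chosen.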
@@ -252,7 +253,7 @@ testPower <- 0.8
testMean <- mean(linkedTestDT[season == "Winter"]$meanW)
testSD <- sd(linkedTestDT[season == "Winter"]$meanW)
# use package function
# use package function - for details of what this does see https://github.com/dataknut/weGotThePower/blob/master/R/power.R
meansPowerDT <- weGotThePower::estimateMeanEffectSizes(testMean,testSD,testSamples,testPower) # auto-produces range of p values
```
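`weGotThePower::estimateMeanEffectSizes()` is described in the linked `power.R`; its internals are not reproduced here. As a rough base-R sketch of the same idea (an assumption, not the package's code), `stats::power.t.test()` can solve for the smallest difference in mean W detectable at a given power, sample size per group and significance level. `minDetectableDiff()` is a hypothetical helper:

```r
# Sketch only (NOT the weGotThePower code): smallest difference in mean W
# detectable by a two-sample, two-sided t test, for each combination of
# sample size per group and significance level.
minDetectableDiff <- function(baseMean, baseSD, nPerGroup, power = 0.8,
                              sigLevels = c(0.01, 0.05, 0.1)) {
  grid <- expand.grid(n = nPerGroup, sigLevel = sigLevels)
  # leaving delta = NULL asks power.t.test() to solve for it
  grid$deltaW <- mapply(function(n, a) {
    power.t.test(n = n, delta = NULL, sd = baseSD,
                 sig.level = a, power = power)$delta
  }, grid$n, grid$sigLevel)
  # detectable difference as a percentage of the baseline mean
  grid$deltaPct <- 100 * grid$deltaW / baseMean
  grid
}
```

For example, `minDetectableDiff(testMean, testSD, nPerGroup = c(50, 100, 500), power = testPower)` would return one row per sample size and significance level pair. `power.t.test()` assumes equal group sizes and a common SD, which is a simplification of the household-size comparisons later in the document.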
@@ -455,7 +456,9 @@ This may be far too wide an error margin for our purposes so we may instead have
Use base GREENGrid and number of people but re-sample slightly.
> NB: we create a small sample roughly 2 * the size of the GREEN Grid data. Due to small numbers and the random re-sampling with replacement process, there will be random fluctuations in the results with each run. As a result the results in this section will probably not match the results in the paper...
> NB 1: we create a small sample roughly 2 * the size of the GREEN Grid data. Due to small number effects and the random re-sampling with replacement process, there will be random fluctuations in the results with each run. As a consequence the results in this section will probably not match the results in the paper...
> NB 2: sometimes the small-n random process doesn't create households of a given type that we then need for analysis. Don't worry, just re-knit until it does :-)
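The run-to-run differences, and the need to re-knit, come from an unseeded random draw. A hedged sketch of one way to make the draw repeatable and retry automatically, assuming a data frame with one row per household and a household-size column; `resampleHouseholds()`, the `nPeople` column name and the seed value are all hypothetical:

```r
# Hypothetical sketch: resample households with replacement, fixing the RNG
# seed so each knit gives the same draw, and redrawing (up to maxTries) until
# every household-size group needed for the t tests is present.
resampleHouseholds <- function(hh, size, groupVar = "nPeople",
                               required = c("1", "2", "3", "4+"),
                               seed = 42, maxTries = 100) {
  set.seed(seed)
  for (i in seq_len(maxTries)) {
    idx <- sample(nrow(hh), size = size, replace = TRUE)
    draw <- hh[idx, , drop = FALSE]
    if (all(required %in% draw[[groupVar]])) return(draw)
  }
  stop("Required groups still missing after ", maxTries, " draws")
}
```

Calling it as `resampleHouseholds(hh, size = 2 * nrow(hh))` mirrors the roughly 2x sample described above. Because `set.seed()` resets the global random number stream, later random draws in the document also become repeatable; a fixed seed does not, by itself, make the results match the paper.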
```{r smallNTable}
@@ -566,7 +569,7 @@ Now:
## Getting it 'right'
> NB: we create a larger sample roughly 40 * the size of the GREEN Grid data. Due to the random re-sampling with replacement process, there will be random fluctuations in the results with each run. As a result the results in this section will probably not exactly match the results in the paper but as the sample is large they should be quite close...
> NB: we create a larger sample roughly 40 * the size of the GREEN Grid data. Due to small number effects and the random re-sampling with replacement process, there will be random fluctuations in the results with each run. As a consequence the results in this section will probably not exactly match the results in the paper but as the new sample is large they should be quite close...
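Why the roughly 40x sample should land close to the paper while the roughly 2x one may not: the spread of a resampled mean across runs shrinks with the square root of the resample size, so moving from 2x to 40x should cut it by about sqrt(20), i.e. roughly 4.5 times. A toy illustration using made-up log-normal values for 28 households (the number of households in the cleaned heat pump data); `baseW` and `runToRunSD()` are hypothetical and the values are not GREEN Grid data:

```r
# Toy illustration with made-up data: 28 log-normal "households" stand in
# for the cleaned heat pump sample; these are NOT GREEN Grid values.
set.seed(1)
baseW <- rlnorm(28, meanlog = 5.8, sdlog = 0.6)

# Standard deviation of the resampled mean across many repeat runs
runToRunSD <- function(x, size, reps = 1000) {
  sd(replicate(reps, mean(sample(x, size = size, replace = TRUE))))
}

small <- runToRunSD(baseW, size = 2 * length(baseW))   # ~2x resample
large <- runToRunSD(baseW, size = 40 * length(baseW))  # ~40x resample
small / large  # expected to be roughly sqrt(20), i.e. about 4.5
```

The resampled rows are copies of the same 28 households, so the smaller spread reflects the larger resample size rather than additional measured households.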
```{r creatLargeN}
# fix.
@@ -239,7 +239,7 @@ div.tocify {
<h1 class="title toc-ignore">Statistical Power, Statistical Significance, Study Design and Decision Making: A Worked Example</h1>
<h3 class="subtitle"><em>Sizing Demand Response Trials in New Zealand</em></h3>
<h4 class="author"><em>Ben Anderson, Tom Rushby, Abubakr Bahaj and Patrick James (Contact: <a href="mailto:b.anderson@soton.ac.uk">b.anderson@soton.ac.uk</a>, <code>@dataknut</code>)</em></h4>
<h4 class="date"><em>Last run at: 2019-01-08 16:34:45</em></h4>
<h4 class="date"><em>Last run at: 2019-01-08 16:53:56</em></h4>
</div>
@@ -449,7 +449,7 @@ div.tocify {
</tbody>
</table>
<p>Number of households in cleaned heatpump data: 28</p>
<pre><code>## Loading: ~/Dropbox/Work/Otago_CfS_Ben/data/nzGREENGrid/ggHouseholdAttributesSafe.csv</code></pre>
<pre><code>## Loading: /Volumes/hum-csafe/Research Projects/GREEN Grid/Packaged Data for Sharing Externally/ReShare/reshare_v1.0/ggHouseholdAttributesSafe.csv.zip</code></pre>
<pre><code>## Parsed with column specification:
## cols(
## .default = col_integer(),
@@ -471,9 +471,9 @@ div.tocify {
## `Energy Storage` = col_character(),
## `Other Generation Device` = col_character(),
## hasLongSurvey = col_character(),
## StartDate = col_character(),
## Q14_1 = col_double()
## # ... with 12 more columns
## Q14_1 = col_double(),
## Q19_2 = col_character()
## # ... with 11 more columns
## )</code></pre>
<pre><code>## See spec(...) for full column specifications.</code></pre>
<table>
@@ -927,7 +927,10 @@ Figure 4.2: Power analysis results (power = 0.8)
<h2><span class="header-section-number">5.1</span> Getting it ‘wrong’</h2>
<p>Use base GREENGrid and number of people but re-sample slightly.</p>
<blockquote>
<p>NB: we create a small sample roughly 2 * the size of the GREEN Grid data. Due to small numbers and the random re-sampling with replacement process, there will be random fluctuations in the results with each run. As a result the results in this section will probably not match the results in the paper…</p>
<p>NB 1: we create a small sample roughly 2 * the size of the GREEN Grid data. Due to small number effects and the random re-sampling with replacement process, there will be random fluctuations in the results with each run. As a consequence the results in this section will probably not match the results in the paper…</p>
</blockquote>
<blockquote>
<p>NB 2: sometimes the small-n random process doesn’t create households of a given type that we then need for analysis. Don’t worry, just re-knit until it does :-)</p>
</blockquote>
<table>
<caption><span id="tab:smallNTable">Table 5.1: </span>Number of households and summary statistics per group (winter heat pump use)</caption>
@@ -942,27 +945,27 @@ Figure 4.2: Power analysis results (power = 0.8)
<tbody>
<tr class="odd">
<td align="left">1</td>
<td align="right">219.9445</td>
<td align="right">174.63250</td>
<td align="right">3</td>
<td align="right">320.7686</td>
<td align="right">0.0000</td>
<td align="right">5</td>
</tr>
<tr class="even">
<td align="left">2</td>
<td align="right">245.5305</td>
<td align="right">29.07944</td>
<td align="right">7</td>
<td align="right">303.8158</td>
<td align="right">82.4674</td>
<td align="right">8</td>
</tr>
<tr class="odd">
<td align="left">3</td>
<td align="right">456.6428</td>
<td align="right">297.16180</td>
<td align="right">14</td>
<td align="right">496.0141</td>
<td align="right">382.4916</td>
<td align="right">10</td>
</tr>
<tr class="even">
<td align="left">4+</td>
<td align="right">337.1304</td>
<td align="right">221.18396</td>
<td align="right">26</td>
<td align="right">443.0247</td>
<td align="right">287.5028</td>
<td align="right">27</td>
</tr>
</tbody>
</table>
@@ -984,21 +987,21 @@ Figure 4.2: Power analysis results (power = 0.8)
</thead>
<tbody>
<tr class="odd">
<td align="right">219.9445</td>
<td align="right">456.6428</td>
<td align="right">-236.6983</td>
<td align="right">-1.844202</td>
<td align="right">0.1249639</td>
<td align="right">-567.4628</td>
<td align="right">94.06627</td>
<td align="right">320.7686</td>
<td align="right">496.0141</td>
<td align="right">-175.2455</td>
<td align="right">-1.448855</td>
<td align="right">0.1813074</td>
<td align="right">-448.8635</td>
<td align="right">98.37247</td>
</tr>
</tbody>
</table>
<p>The results show that the mean power demand for the control group was 456.64W and for Intervention 1 was 219.94W. This is a (very) large difference in the mean of 236.7W. The results of the t test are:</p>
<p>The results show that the mean power demand for the control group was 496.01W and for Intervention 1 was 320.77W. This is a (very) large difference in the mean of 175.25W. The results of the t test are:</p>
<ul>
<li>effect size = 237W or 52% representing a <em>substantial bang for buck</em> for whatever caused the difference;</li>
<li>95% confidence interval for the test = -567.46 to 94.07 representing <em>considerable</em> uncertainty/variation;</li>
<li>p value of 0.125 representing a <em>relatively low</em> risk of a false positive result but which (just) fails the conventional p &lt; 0.05 threshold.</li>
<li>effect size = 175W or 35% representing a <em>substantial bang for buck</em> for whatever caused the difference;</li>
<li>95% confidence interval for the test = -448.86 to 98.37 representing <em>considerable</em> uncertainty/variation;</li>
<li>p value of 0.181 representing a <em>relatively low</em> risk of a false positive result but which fails the conventional p &lt; 0.05 threshold.</li>
</ul>
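R's `t.test()` defaults to the Welch (unequal variance) test. Assuming that is what produced these tables, the reported difference, t, p value and 95% CI can be reproduced from the per-group mean, SD and n alone. `welchFromSummary()` is a hypothetical helper; `pctOfGroup2` is the difference as a percentage of the second (comparison) group's mean, which matches how the effect size percentages above appear to be calculated:

```r
# Welch two-sample t test from per-group summary statistics
# (mean W, SD and n for each of the two groups being compared).
welchFromSummary <- function(m1, s1, n1, m2, s2, n2, conf = 0.95) {
  v1 <- s1^2 / n1                 # squared standard error, group 1
  v2 <- s2^2 / n2                 # squared standard error, group 2
  se <- sqrt(v1 + v2)
  meanDiff <- m1 - m2
  tStat <- meanDiff / se
  # Welch-Satterthwaite approximation to the degrees of freedom
  dfW <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  p <- 2 * pt(-abs(tStat), dfW)   # two-sided p value
  halfWidth <- qt(1 - (1 - conf) / 2, dfW) * se
  list(meanDiff = meanDiff, t = tStat, df = dfW, p = p,
       ciLower = meanDiff - halfWidth, ciUpper = meanDiff + halfWidth,
       pctOfGroup2 = 100 * abs(meanDiff) / m2)
}
```

With the new Table 5.1 values for 1 vs 3 person households, `welchFromSummary(320.7686, 0, 5, 496.0141, 382.4916, 10)` gives t of about -1.449 on 9 df, p of about 0.181, a 95% CI of about -448.86 to 98.37 and a difference of about 35% of the 3-person mean, in line with the table. Because the 1-person group has an SD of 0, the Welch degrees of freedom collapse to n - 1 of the other group.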
<p>T test 1 &lt;-&gt; 4+</p>
<table>
@@ -1016,27 +1019,27 @@ Figure 4.2: Power analysis results (power = 0.8)
</thead>
<tbody>
<tr class="odd">
<td align="right">219.9445</td>
<td align="right">337.1304</td>
<td align="right">-117.1859</td>
<td align="right">-1.067661</td>
<td align="right">0.3689357</td>
<td align="right">-480.9376</td>
<td align="right">246.5658</td>
<td align="right">320.7686</td>
<td align="right">443.0247</td>
<td align="right">-122.2561</td>
<td align="right">-2.209582</td>
<td align="right">0.0361451</td>
<td align="right">-235.9884</td>
<td align="right">-8.523724</td>
</tr>
</tbody>
</table>
<p>Now:</p>
<ul>
<li>effect size = 117W or 34.76% representing a still <em>reasonable bang for buck</em> for whatever caused the difference;</li>
<li>95% confidence interval for the test = -480.94 to 246.57 representing <em>even greater</em> uncertainty/variation;</li>
<li>p value of 0.369 representing a <em>higher</em> risk of a false positive result which fails the conventional p &lt; 0.05 threshold and also the less conservative p &lt; 0.1.</li>
<li>effect size = 122W or 27.6% representing a still <em>reasonable bang for buck</em> for whatever caused the difference;</li>
<li>95% confidence interval for the test = -235.99 to -8.52 representing <em>less</em> uncertainty/variation than the 1 &lt;-&gt; 3 comparison, and an interval that excludes zero;</li>
<li>p value of 0.036 representing a <em>lower</em> risk of a false positive result which passes the conventional p &lt; 0.05 threshold and also the less conservative p &lt; 0.1.</li>
</ul>
</div>
<div id="getting-it-right" class="section level2">
<h2><span class="header-section-number">5.2</span> Getting it ‘right’</h2>
<blockquote>
<p>NB: we create a larger sample roughly 40 * the size of the GREEN Grid data. Due to the random re-sampling with replacement process, there will be random fluctuations in the results with each run. As a result the results in this section will probably not exactly match the results in the paper but as the sample is large they should be quite close…</p>
<p>NB: we create a larger sample roughly 40 * the size of the GREEN Grid data. Due to small number effects and the random re-sampling with replacement process, there will be random fluctuations in the results with each run. As a consequence the results in this section will probably not exactly match the results in the paper but as the new sample is large they should be quite close…</p>
</blockquote>
<table>
<caption><span id="tab:creatLargeN">Table 5.4: </span>Number of households and summary statistics per group</caption>
@@ -1051,27 +1054,27 @@ Figure 4.2: Power analysis results (power = 0.8)
<tbody>
<tr class="odd">
<td align="left">1</td>
<td align="right">176.4068</td>
<td align="right">151.94566</td>
<td align="right">88</td>
<td align="right">164.0661</td>
<td align="right">152.05614</td>
<td align="right">83</td>
</tr>
<tr class="even">
<td align="left">2</td>
<td align="right">276.2114</td>
<td align="right">59.33808</td>
<td align="right">156</td>
<td align="right">276.8820</td>
<td align="right">59.47509</td>
<td align="right">180</td>
</tr>
<tr class="odd">
<td align="left">3</td>
<td align="right">486.1851</td>
<td align="right">296.32261</td>
<td align="right">281</td>
<td align="right">464.4023</td>
<td align="right">276.92468</td>
<td align="right">285</td>
</tr>
<tr class="even">
<td align="left">4+</td>
<td align="right">423.5017</td>
<td align="right">265.73994</td>
<td align="right">475</td>
<td align="right">409.6488</td>
<td align="right">267.02900</td>
<td align="right">452</td>
</tr>
</tbody>
</table>
@@ -1098,20 +1101,20 @@ Figure 5.1: Mean W demand per group for large sample (Error bars = 95% confidenc
</thead>
<tbody>
<tr class="odd">
<td align="right">176.4068</td>
<td align="right">486.1851</td>
<td align="right">-309.7783</td>
<td align="right">-12.92046</td>
<td align="right">164.0661</td>
<td align="right">464.4023</td>
<td align="right">-300.3362</td>
<td align="right">-12.83388</td>
<td align="right">0</td>
<td align="right">-356.967</td>
<td align="right">-262.5896</td>
<td align="right">-346.4263</td>
<td align="right">-254.246</td>
</tr>
</tbody>
</table>
<p>In this case:</p>
<ul>
<li>effect size = 309.78W or 63.72% representing a still <em>reasonable bang for buck</em> for whatever caused the difference;</li>
<li>95% confidence interval for the test = -356.97 to -262.59 representing <em>much less</em> uncertainty/variation;</li>
<li>effect size = 300.34W or 64.67% representing a still <em>reasonable bang for buck</em> for whatever caused the difference;</li>
<li>95% confidence interval for the test = -346.43 to -254.25 representing <em>much less</em> uncertainty/variation;</li>
<li>p value of 0 representing a <em>very low</em> risk of a false positive result as it passes all conventional thresholds.</li>
</ul>
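The "p value of 0" is a rounding artefact: the true p value is positive but far below 0.001, so reporting it as p &lt; 0.001 avoids implying a zero probability. A small sketch recomputing it from the 1 vs 3 person summary statistics in Table 5.4, assuming the Welch test (R's `t.test()` default) was used; the variable names are illustrative only:

```r
# 1 vs 3 person households, large resample (mean W, SD, n per group)
m1 <- 164.0661; s1 <- 152.05614; n1 <- 83
m2 <- 464.4023; s2 <- 276.92468; n2 <- 285

v1 <- s1^2 / n1
v2 <- s2^2 / n2
tStat <- (m1 - m2) / sqrt(v1 + v2)
dfW <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))  # Welch-Satterthwaite df
p <- 2 * pt(-abs(tStat), dfW)                             # two-sided p value

round(p, 3)                  # 0 at three decimal places
signif(p, 3)                 # the actual tiny, non-zero value
format.pval(p, eps = 0.001)  # reports it as below 0.001
```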
<p>re-run T tests 1 person vs 4+</p>
@@ -1130,20 +1133,20 @@ Figure 5.1: Mean W demand per group for large sample (Error bars = 95% confidenc
</thead>
<tbody>
<tr class="odd">
<td align="right">176.4068</td>
<td align="right">423.5017</td>
<td align="right">-247.0949</td>
<td align="right">-12.1879</td>
<td align="right">164.0661</td>
<td align="right">409.6488</td>
<td align="right">-245.5827</td>
<td align="right">-11.75696</td>
<td align="right">0</td>
<td align="right">-287.0707</td>
<td align="right">-207.1191</td>
<td align="right">-286.7853</td>
<td align="right">-204.3801</td>
</tr>
</tbody>
</table>
<p>In this case:</p>
<ul>
<li>effect size = 247.09W or 58.35% representing a still <em>reasonable bang for buck</em> for whatever caused the difference;</li>
<li>95% confidence interval for the test = -287.07 to -207.12 representing <em>much less</em> uncertainty/variation;</li>
<li>effect size = 245.58W or 59.95% representing a still <em>reasonable bang for buck</em> for whatever caused the difference;</li>
<li>95% confidence interval for the test = -286.79 to -204.38 representing <em>much less</em> uncertainty/variation;</li>
<li>p value of 0 representing a <em>very low</em> risk of a false positive result as it passes all conventional thresholds.</li>
</ul>
</div>
@@ -1165,7 +1168,7 @@ Figure 5.1: Mean W demand per group for large sample (Error bars = 95% confidenc
</div>
<div id="runtime" class="section level1">
<h1><span class="header-section-number">8</span> Runtime</h1>
<p>Analysis completed in 35.98 seconds (0.6 minutes) using <a href="https://cran.r-project.org/package=knitr">knitr</a> in <a href="http://www.rstudio.com">RStudio</a> with R version 3.5.1 (2018-07-02) running on x86_64-apple-darwin15.6.0.</p>
<p>Analysis completed in 43.65 seconds (0.73 minutes) using <a href="https://cran.r-project.org/package=knitr">knitr</a> in <a href="http://www.rstudio.com">RStudio</a> with R version 3.5.1 (2018-07-02) running on x86_64-apple-darwin15.6.0.</p>
</div>
<div id="r-environment" class="section level1">
<h1><span class="header-section-number">9</span> R environment</h1>
This source diff could not be displayed because it is too large.