Commit 94de08e5 authored by Ben Anderson

fixed data paths for HCS use; added notes re random re-sampling process; removed word version as they won't match (due to random re-sampling process)
parent 94dd429b
......@@ -9,10 +9,6 @@ author: '`r paste0(params$authors)` (Contact: b.anderson@soton.ac.uk, `@dataknut
date: 'Last run at: `r Sys.time()`'
always_allow_html: yes
output:
bookdown::word_document2:
fig_caption: yes
toc: yes
toc_depth: 2
bookdown::html_document2:
code_folding: hide
fig_caption: yes
......@@ -21,6 +17,10 @@ output:
toc: yes
toc_depth: 2
toc_float: yes
bookdown::word_document2:
fig_caption: yes
toc: yes
toc_depth: 2
bookdown::pdf_document2:
fig_caption: yes
keep_tex: yes
......@@ -74,12 +74,13 @@ labelProfilePlot <- function(plot){
myParams <- list()
myParams$repoLoc <- dkUtils::findParentDirectory("weGotThePower")
myParams$dPath <- "~/Dropbox/Work/Otago_CfS_Ben/data/nzGREENGrid/"
#myParams$dPath <- "~/Data/NZ_GREENGrid/safe/gridSpy/1min/dataExtracts/"
myParams$dPath <- "/Volumes/hum-csafe/Research Projects/GREEN Grid/" # requires Otago HCS access
heatPumpData <- paste0(myParams$dPath, "cleanData/safe/gridSpy/1min/dataExtracts/Heat Pump_2015-04-01_2016-03-31_observations.csv.gz")
# created from https://dx.doi.org/10.5255/UKDA-SN-853334
# using https://github.com/CfSOtago/GREENGridData/blob/master/examples/code/extractCleanGridSpy1minCircuit.R
heatPumpData <- paste0(myParams$dPath, "dataExtracts/Heat Pump_2015-04-01_2016-03-31_observations.csv.gz")
ggHHData <- paste0(myParams$dPath, "ggHouseholdAttributesSafe.csv")
ggHHData <- paste0(myParams$dPath, "Packaged Data for Sharing Externally/ReShare/reshare_v1.0/ggHouseholdAttributesSafe.csv.zip")
myParams$GGDataDOI <- "https://dx.doi.org/10.5255/UKDA-SN-853334"
plotCaption <- paste0("Source: ", myParams$GGDataDOI)
......@@ -252,7 +253,7 @@ testPower <- 0.8
testMean <- mean(linkedTestDT[season == "Winter"]$meanW)
testSD <- sd(linkedTestDT[season == "Winter"]$meanW)
# use package function
# use package function - for details of what this does see https://github.com/dataknut/weGotThePower/blob/master/R/power.R
meansPowerDT <- weGotThePower::estimateMeanEffectSizes(testMean,testSD,testSamples,testPower) # auto-produces range of p values
```
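For readers without the `weGotThePower` package to hand, the kind of calculation this hunk feeds can be sketched with base R's `stats::power.t.test`. This is an assumption about the general approach, not a copy of what `estimateMeanEffectSizes` does internally (see the linked `power.R` for that). The mean, standard deviation, sample sizes and significance levels below are made-up illustrative values, not GREEN Grid results.

```r
# Illustrative values only: NOT taken from the GREEN Grid data
testMean <- 400                            # hypothetical mean winter demand (W)
testSD <- 250                              # hypothetical standard deviation (W)
testSamples <- c(50, 100, 200, 400, 800)   # hypothetical per-group sample sizes
testPower <- 0.8
sigLevels <- c(0.01, 0.05, 0.1)            # hypothetical range of p value thresholds

# For each sample size and threshold, solve for the smallest difference in
# means (delta) that a two-sample t test could detect with the given power.
# Leaving delta unspecified tells power.t.test to solve for it.
grid <- expand.grid(n = testSamples, p = sigLevels)
grid$deltaW <- mapply(function(n, p) {
  stats::power.t.test(n = n, sd = testSD, sig.level = p, power = testPower)$delta
}, grid$n, grid$p)

# Express the detectable difference as a % of the mean
grid$deltaPc <- 100 * grid$deltaW / testMean
print(grid)
```

The detectable effect size should shrink as the per-group sample size grows, which is the trade-off the power analysis plots are intended to show.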
......@@ -455,7 +456,9 @@ This may be far too wide an error margin for our purposes so we may instead have
Use base GREENGrid and number of people but re-sample slightly.
> NB: we create a small sample roughly 2 * the size of the GREEN Grid data. Due to small numbers and the random re-sampling with replacement process, there will be random fluctuations in the results with each run. As a result the results in this section will probably not match the results in the paper...
> NB 1: we create a small sample roughly 2 * the size of the GREEN Grid data. Due to small number effects and the random re-sampling with replacement process, there will be random fluctuations in the results with each run. As a consequence the results in this section will probably not match the results in the paper...
> NB 2: sometimes the small-n random process doesn't create households of a given type that we then need for analysis. Don't worry, just re-knit until it does :-)
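One possible way to avoid re-knitting by hand is sketched below: fix the random seed so each run draws the same sample, and redraw automatically until every household type is present. This is a hedged illustration, not what the Rmd currently does. `baseDT`, `resampleUntilComplete` and the group counts are hypothetical stand-ins for the real household data and code.

```r
# Hypothetical stand-in for the household data: 28 households by number of people
baseDT <- data.frame(
  linkID = sprintf("hh_%02d", 1:28),
  nPeople = rep(c("1", "2", "3", "4+"), times = c(2, 5, 8, 13))
)

# Re-sample households with replacement until every group we need is present
resampleUntilComplete <- function(df, size, needed, maxTries = 100) {
  for (i in seq_len(maxTries)) {
    draw <- df[sample(nrow(df), size = size, replace = TRUE), , drop = FALSE]
    if (all(needed %in% draw$nPeople)) {
      return(draw)
    }
  }
  stop("No complete sample after ", maxTries, " tries")
}

set.seed(12345) # fixing the seed makes each knit draw the same sample
smallDT <- resampleUntilComplete(baseDT,
                                 size = 2 * nrow(baseDT),
                                 needed = c("1", "2", "3", "4+"))
table(smallDT$nPeople)
```

Fixing the seed would also make the tables reproducible between runs, at the cost of hiding the run-to-run variation that NB 1 describes.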
```{r smallNTable}
......@@ -566,7 +569,7 @@ Now:
## Getting it 'right'
> NB: we create a larger sample roughly 40 * the size of the GREEN Grid data. Due to the random re-sampling with replacement process, there will be random fluctuations in the results with each run. As a result the results in this section will probably not exactly match the results in the paper but as the sample is large they should be quite close...
> NB: we create a larger sample roughly 40 * the size of the GREEN Grid data. Due to the random re-sampling with replacement process, there will be random fluctuations in the results with each run. As a consequence the results in this section will probably not exactly match the results in the paper but as the new sample is large they should be quite close...
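Why the larger sample should give results that are "quite close" from run to run can be illustrated with simulated data. The sketch below is an assumption-laden toy example: the two groups are drawn from hypothetical normal distributions (real demand data will be more skewed), and `drawGroups` and `ciWidth` are names invented here.

```r
set.seed(42)

# Hypothetical populations: winter demand (W) for two household groups
drawGroups <- function(n) {
  list(a = rnorm(n, mean = 200, sd = 150),
       b = rnorm(n, mean = 450, sd = 280))
}

# Width of the 95% confidence interval for the difference in means
ciWidth <- function(g) {
  tt <- stats::t.test(g$a, g$b) # Welch two-sample t test by default
  diff(tt$conf.int)
}

small <- drawGroups(10)   # small-n case
large <- drawGroups(300)  # large-n case
c(small = ciWidth(small), large = ciWidth(large))
```

The interval for the large sample should be much narrower, which is the same pattern the small-n and large-n t test tables show.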
```{r creatLargeN}
# fix.
......
......@@ -239,7 +239,7 @@ div.tocify {
<h1 class="title toc-ignore">Statistical Power, Statistical Significance, Study Design and Decision Making: A Worked Example</h1>
<h3 class="subtitle"><em>Sizing Demand Response Trials in New Zealand</em></h3>
<h4 class="author"><em>Ben Anderson, Tom Rushby, Abubakr Bahaj and Patrick James (Contact: <a href="mailto:b.anderson@soton.ac.uk">b.anderson@soton.ac.uk</a>, <code>@dataknut</code>)</em></h4>
<h4 class="date"><em>Last run at: 2019-01-08 16:34:45</em></h4>
<h4 class="date"><em>Last run at: 2019-01-08 16:53:56</em></h4>
</div>
......@@ -449,7 +449,7 @@ div.tocify {
</tbody>
</table>
<p>Number of households in cleaned heatpump data: 28</p>
<pre><code>## Loading: ~/Dropbox/Work/Otago_CfS_Ben/data/nzGREENGrid/ggHouseholdAttributesSafe.csv</code></pre>
<pre><code>## Loading: /Volumes/hum-csafe/Research Projects/GREEN Grid/Packaged Data for Sharing Externally/ReShare/reshare_v1.0/ggHouseholdAttributesSafe.csv.zip</code></pre>
<pre><code>## Parsed with column specification:
## cols(
## .default = col_integer(),
......@@ -471,9 +471,9 @@ div.tocify {
## `Energy Storage` = col_character(),
## `Other Generation Device` = col_character(),
## hasLongSurvey = col_character(),
## StartDate = col_character(),
## Q14_1 = col_double()
## # ... with 12 more columns
## Q14_1 = col_double(),
## Q19_2 = col_character()
## # ... with 11 more columns
## )</code></pre>
<pre><code>## See spec(...) for full column specifications.</code></pre>
<table>
......@@ -927,7 +927,10 @@ Figure 4.2: Power analysis results (power = 0.8)
<h2><span class="header-section-number">5.1</span> Getting it ‘wrong’</h2>
<p>Use base GREENGrid and number of people but re-sample slightly.</p>
<blockquote>
<p>NB: we create a small sample roughly 2 * the size of the GREEN Grid data. Due to small numbers and the random re-sampling with replacement process, there will be random fluctuations in the results with each run. As a result the results in this section will probably not match the results in the paper…</p>
<p>NB 1: we create a small sample roughly 2 * the size of the GREEN Grid data. Due to small number effects and the random re-sampling with replacement process, there will be random fluctuations in the results with each run. As a consequence the results in this section will probably not match the results in the paper…</p>
</blockquote>
<blockquote>
<p>NB 2: sometimes the small-n random process doesn’t create households of a given type that we then need for analysis. Don’t worry, just re-knit until it does :-)</p>
</blockquote>
<table>
<caption><span id="tab:smallNTable">Table 5.1: </span>Number of households and summary statistics per group (winter heat pump use)</caption>
......@@ -942,27 +945,27 @@ Figure 4.2: Power analysis results (power = 0.8)
<tbody>
<tr class="odd">
<td align="left">1</td>
<td align="right">219.9445</td>
<td align="right">174.63250</td>
<td align="right">3</td>
<td align="right">320.7686</td>
<td align="right">0.0000</td>
<td align="right">5</td>
</tr>
<tr class="even">
<td align="left">2</td>
<td align="right">245.5305</td>
<td align="right">29.07944</td>
<td align="right">7</td>
<td align="right">303.8158</td>
<td align="right">82.4674</td>
<td align="right">8</td>
</tr>
<tr class="odd">
<td align="left">3</td>
<td align="right">456.6428</td>
<td align="right">297.16180</td>
<td align="right">14</td>
<td align="right">496.0141</td>
<td align="right">382.4916</td>
<td align="right">10</td>
</tr>
<tr class="even">
<td align="left">4+</td>
<td align="right">337.1304</td>
<td align="right">221.18396</td>
<td align="right">26</td>
<td align="right">443.0247</td>
<td align="right">287.5028</td>
<td align="right">27</td>
</tr>
</tbody>
</table>
......@@ -984,21 +987,21 @@ Figure 4.2: Power analysis results (power = 0.8)
</thead>
<tbody>
<tr class="odd">
<td align="right">219.9445</td>
<td align="right">456.6428</td>
<td align="right">-236.6983</td>
<td align="right">-1.844202</td>
<td align="right">0.1249639</td>
<td align="right">-567.4628</td>
<td align="right">94.06627</td>
<td align="right">320.7686</td>
<td align="right">496.0141</td>
<td align="right">-175.2455</td>
<td align="right">-1.448855</td>
<td align="right">0.1813074</td>
<td align="right">-448.8635</td>
<td align="right">98.37247</td>
</tr>
</tbody>
</table>
<p>The results show that the mean power demand for the control group was 456.64W and for Intervention 1 was 219.94W. This is a (very) large difference in the mean of 236.7. The results of the t test are:</p>
<p>The results show that the mean power demand for the control group was 496.01W and for Intervention 1 was 320.77W. This is a (very) large difference in the mean of 175.25. The results of the t test are:</p>
<ul>
<li>effect size = 237W or 52% representing a <em>substantial bang for buck</em> for whatever caused the difference;</li>
<li>95% confidence interval for the test = -567.46 to 94.07 representing <em>considerable</em> uncertainty/variation;</li>
<li>p value of 0.125 representing a <em>relatively low</em> risk of a false positive result but which (just) fails the conventional p &lt; 0.05 threshold.</li>
<li>effect size = 175W or 35% representing a <em>substantial bang for buck</em> for whatever caused the difference;</li>
<li>95% confidence interval for the test = -448.86 to 98.37 representing <em>considerable</em> uncertainty/variation;</li>
<li>p value of 0.181 representing a <em>relatively low</em> risk of a false positive result but which (just) fails the conventional p &lt; 0.05 threshold.</li>
</ul>
<p>T test 1 &lt;-&gt; 4+</p>
<table>
......@@ -1016,27 +1019,27 @@ Figure 4.2: Power analysis results (power = 0.8)
</thead>
<tbody>
<tr class="odd">
<td align="right">219.9445</td>
<td align="right">337.1304</td>
<td align="right">-117.1859</td>
<td align="right">-1.067661</td>
<td align="right">0.3689357</td>
<td align="right">-480.9376</td>
<td align="right">246.5658</td>
<td align="right">320.7686</td>
<td align="right">443.0247</td>
<td align="right">-122.2561</td>
<td align="right">-2.209582</td>
<td align="right">0.0361451</td>
<td align="right">-235.9884</td>
<td align="right">-8.523724</td>
</tr>
</tbody>
</table>
<p>Now:</p>
<ul>
<li>effect size = 117W or 34.76% representing a still <em>reasonable bang for buck</em> for whatever caused the difference;</li>
<li>95% confidence interval for the test = -480.94 to 246.57 representing <em>even greater</em> uncertainty/variation;</li>
<li>p value of 0.369 representing a <em>higher</em> risk of a false positive result which fails the conventional p &lt; 0.05 threshold and also the less conservative p &lt; 0.1.</li>
<li>effect size = 122W or 27.6% representing a still <em>reasonable bang for buck</em> for whatever caused the difference;</li>
<li>95% confidence interval for the test = -235.99 to -8.52 representing <em>less</em> uncertainty/variation than the previous test in this run;</li>
<li>p value of 0.036 representing a <em>lower</em> risk of a false positive result which passes the conventional p &lt; 0.05 threshold in this run.</li>
</ul>
</div>
<div id="getting-it-right" class="section level2">
<h2><span class="header-section-number">5.2</span> Getting it ‘right’</h2>
<blockquote>
<p>NB: we create a larger sample roughly 40 * the size of the GREEN Grid data. Due to the random re-sampling with replacement process, there will be random fluctuations in the results with each run. As a result the results in this section will probably not exactly match the results in the paper but as the sample is large they should be quite close…</p>
<p>NB: we create a larger sample roughly 40 * the size of the GREEN Grid data. Due to the random re-sampling with replacement process, there will be random fluctuations in the results with each run. As a consequence the results in this section will probably not exactly match the results in the paper but as the new sample is large they should be quite close…</p>
</blockquote>
<table>
<caption><span id="tab:creatLargeN">Table 5.4: </span>Number of households and summary statistics per group</caption>
......@@ -1051,27 +1054,27 @@ Figure 4.2: Power analysis results (power = 0.8)
<tbody>
<tr class="odd">
<td align="left">1</td>
<td align="right">176.4068</td>
<td align="right">151.94566</td>
<td align="right">88</td>
<td align="right">164.0661</td>
<td align="right">152.05614</td>
<td align="right">83</td>
</tr>
<tr class="even">
<td align="left">2</td>
<td align="right">276.2114</td>
<td align="right">59.33808</td>
<td align="right">156</td>
<td align="right">276.8820</td>
<td align="right">59.47509</td>
<td align="right">180</td>
</tr>
<tr class="odd">
<td align="left">3</td>
<td align="right">486.1851</td>
<td align="right">296.32261</td>
<td align="right">281</td>
<td align="right">464.4023</td>
<td align="right">276.92468</td>
<td align="right">285</td>
</tr>
<tr class="even">
<td align="left">4+</td>
<td align="right">423.5017</td>
<td align="right">265.73994</td>
<td align="right">475</td>
<td align="right">409.6488</td>
<td align="right">267.02900</td>
<td align="right">452</td>
</tr>
</tbody>
</table>
......@@ -1098,20 +1101,20 @@ Figure 5.1: Mean W demand per group for large sample (Error bars = 95% confidenc
</thead>
<tbody>
<tr class="odd">
<td align="right">176.4068</td>
<td align="right">486.1851</td>
<td align="right">-309.7783</td>
<td align="right">-12.92046</td>
<td align="right">164.0661</td>
<td align="right">464.4023</td>
<td align="right">-300.3362</td>
<td align="right">-12.83388</td>
<td align="right">0</td>
<td align="right">-356.967</td>
<td align="right">-262.5896</td>
<td align="right">-346.4263</td>
<td align="right">-254.246</td>
</tr>
</tbody>
</table>
<p>In this case:</p>
<ul>
<li>effect size = 309.778282W or 63.72% representing a still <em>reasonable bang for buck</em> for whatever caused the difference;</li>
<li>95% confidence interval for the test = -356.97 to -262.59 representing <em>much less</em> uncertainty/variation;</li>
<li>effect size = 300.336192W or 64.67% representing a still <em>reasonable bang for buck</em> for whatever caused the difference;</li>
<li>95% confidence interval for the test = -346.43 to -254.25 representing <em>much less</em> uncertainty/variation;</li>
<li>p value of 0 representing a <em>very low</em> risk of a false positive result as it passes all conventional thresholds.</li>
</ul>
<p>re-run T tests 1 person vs 4+</p>
......@@ -1130,20 +1133,20 @@ Figure 5.1: Mean W demand per group for large sample (Error bars = 95% confidenc
</thead>
<tbody>
<tr class="odd">
<td align="right">176.4068</td>
<td align="right">423.5017</td>
<td align="right">-247.0949</td>
<td align="right">-12.1879</td>
<td align="right">164.0661</td>
<td align="right">409.6488</td>
<td align="right">-245.5827</td>
<td align="right">-11.75696</td>
<td align="right">0</td>
<td align="right">-287.0707</td>
<td align="right">-207.1191</td>
<td align="right">-286.7853</td>
<td align="right">-204.3801</td>
</tr>
</tbody>
</table>
<p>In this case:</p>
<ul>
<li>effect size = 247.094924W or 58.35% representing a still <em>reasonable bang for buck</em> for whatever caused the difference;</li>
<li>95% confidence interval for the test = -287.07 to -207.12 representing <em>much less</em> uncertainty/variation;</li>
<li>effect size = 245.5827379W or 59.95% representing a still <em>reasonable bang for buck</em> for whatever caused the difference;</li>
<li>95% confidence interval for the test = -286.79 to -204.38 representing <em>much less</em> uncertainty/variation;</li>
<li>p value of 0 representing a <em>very low</em> risk of a false positive result as it passes all conventional thresholds.</li>
</ul>
</div>
......@@ -1165,7 +1168,7 @@ Figure 5.1: Mean W demand per group for large sample (Error bars = 95% confidenc
</div>
<div id="runtime" class="section level1">
<h1><span class="header-section-number">8</span> Runtime</h1>
<p>Analysis completed in 35.98 seconds ( 0.6 minutes) using <a href="https://cran.r-project.org/package=knitr">knitr</a> in <a href="http://www.rstudio.com">RStudio</a> with R version 3.5.1 (2018-07-02) running on x86_64-apple-darwin15.6.0.</p>
<p>Analysis completed in 43.65 seconds ( 0.73 minutes) using <a href="https://cran.r-project.org/package=knitr">knitr</a> in <a href="http://www.rstudio.com">RStudio</a> with R version 3.5.1 (2018-07-02) running on x86_64-apple-darwin15.6.0.</p>
</div>
<div id="r-environment" class="section level1">
<h1><span class="header-section-number">9</span> R environment</h1>
......
This source diff could not be displayed because it is too large. You can view the blob instead.