diff --git a/Rmd/cleaningFeederData.Rmd b/Rmd/cleaningFeederData.Rmd
index 7a0f7ad6fb7e50fac17b8ef9ead186f94bf76e66..aa31ad5ec002c798e509b654822e30c77169eca0 100644
--- a/Rmd/cleaningFeederData.Rmd
+++ b/Rmd/cleaningFeederData.Rmd
@@ -9,25 +9,24 @@ author: '`r params$authors`'
 date: 'Last run at: `r Sys.time()`'
 output:
   bookdown::html_document2:
-    code_folding: hide
+    self_contained: TRUE
     fig_caption: yes
+    code_folding: hide
     number_sections: yes
-    self_contained: yes
-    toc: yes
-    toc_depth: 3
-    toc_float: yes
-  bookdown::word_document2:
-    fig_caption: yes
     toc: yes
     toc_depth: 2
+    toc_float: TRUE
   bookdown::pdf_document2:
     fig_caption: yes
-    keep_tex: yes
+    number_sections: yes
+  bookdown::word_document2:
+    fig_caption: yes
     number_sections: yes
     toc: yes
     toc_depth: 2
+    fig_width: 5
 always_allow_html: yes
-bibliography: '`r path.expand("~/bibliography.bib")`'
+bibliography: '`r paste0(here::here(), "/bibliography.bib")`'
 ---

 ```{r setup}
@@ -73,8 +72,12 @@ Loaded data from `r dFile`... (using drake)

 ```{r loadData}
 origDataDT <- drake::readd(origData) # readd the drake object
-head(origDataDT)
+
 uniqDataDT <- drake::readd(uniqData) # readd the drake object
+
+kableExtra::kable(head(origDataDT), digits = 2,
+                  caption = "Counts per feeder (long table)") %>%
+  kable_styling()
 ```

 Check data prep worked OK.
@@ -82,8 +85,8 @@ Check data prep worked OK.
 ```{r dataPrep}
 # check
 t <- origDataDT[, .(nObs = .N,
-                    firstDate = min(rDateTime),
-                    lastDate = max(rDateTime),
+                    firstDate = min(rDateTime, na.rm = TRUE),
+                    lastDate = max(rDateTime, na.rm = TRUE),
                     meankW = mean(kW, na.rm = TRUE)
                     ),
                 keyby = .(region, feeder_ID)]
@@ -104,7 +107,7 @@ message("So we have ", tidyNum(nrow(origDataDT) - nrow(uniqDataDT)), " duplicate
 pc <- 100*((nrow(origDataDT) - nrow(uniqDataDT))/nrow(origDataDT))
 message("That's ", round(pc,2), "%")

-feederDT <- uniqDataDT # use dt with no duplicates
+feederDT <- uniqDataDT[!is.na(rDateTime)] # use dt with no duplicates

 origDataDT <- NULL # save memory
 ```
@@ -114,7 +117,7 @@ So we remove the duplicates...

 Try aggregated demand profiles of mean kW by season and feeder and day of the week... Remove the legend so we can see the plot.

-```{r kwProfiles}
+```{r kwProfiles, fig.width=8}
 plotDT <- feederDT[, .(meankW = mean(kW),
                        nObs = .N), keyby = .(rTime, season, feeder_ID, rDoW)]

@@ -131,7 +134,7 @@ Is that what we expect?

 Number of observations per feeder per day - gaps will be visible (totally missing days) as will low counts (partially missing days) - we would expect 24 * 4... Convert this to a % of expected...

-```{r basicCountTile, fig.height=10}
+```{r basicCountTile, fig.height=10, fig.width=8}
 plotDT <- feederDT[, .(nObs = .N), keyby = .(rDate, feeder_ID)]
 plotDT[, propExpected := nObs/(24*4)]

@@ -148,7 +151,7 @@ This is not good. There are both gaps (missing days) and partial days. **Lots**

 What does it look like if we aggregate across all feeders by time? There are `r uniqueN(feederDT$feeder_ID)` feeders so we should get this many at best How close do we get?

-```{r aggVisN}
+```{r aggVisN, fig.width=8}
 plotDT <- feederDT[, .(nObs = .N,
                        meankW = mean(kW)),
                    keyby = .(rTime, rDate, season)]
@@ -167,7 +170,7 @@ That really doesn't look too good. There are some very odd fluctuations in there

 What do the mean kw patterns look like per feeder per day?

-```{r basickWTile, fig.height=10}
+```{r basickWTile, fig.height=10, fig.width=8}
 plotDT <- feederDT[, .(meankW = mean(kW, na.rm = TRUE)), keyby = .(rDate, feeder_ID)]

 ggplot2::ggplot(plotDT, aes(x = rDate, y = feeder_ID, fill = meankW)) +
@@ -183,7 +186,7 @@ Missing data is even more clearly visible.

 What about mean kw across all feeders?

-```{r aggViskW}
+```{r aggViskW, fig.width=8}
 plotDT <- feederDT[, .(nObs = .N,
                        meankW = mean(kW)),
                    keyby = .(rTime, rDate, season)]
@@ -213,7 +216,7 @@ summary(dateTimesDT)

 Let's see how many unique feeders we have per dateTime. Surely we have at least one sending data each half-hour?

-```{r tileFeeders}
+```{r tileFeeders, fig.width=8}
 ggplot2::ggplot(dateTimesDT, aes(x = rDate, y = rTime, fill = nFeeders)) +
   geom_tile() +
   scale_fill_viridis_c() +
@@ -224,7 +227,7 @@ No. As we suspected from the previous plots, we clearly have some dateTimes wher

 Are there time of day patterns? It looks like it...

-```{r missingProfiles}
+```{r missingProfiles, fig.width=8}
 dateTimesDT[, rYear := lubridate::year(rDateTime)]
 plotDT <- dateTimesDT[, .(meanN = mean(nFeeders),
                           meankW = mean(meankW)), keyby = .(rTime, season, rYear)]
@@ -240,7 +243,7 @@ Oh yes. After 2003. Why?

 What about the kW?

-```{r kWProfiles}
+```{r kWProfiles, fig.width=8}

 ggplot2::ggplot(plotDT, aes(y = meankW, x = rTime, colour = season)) +
   geom_line() +
@@ -251,7 +254,7 @@ ggplot2::ggplot(plotDT, aes(y = meankW, x = rTime, colour = season)) +

 Those look as we'd expect. But do we see a correlation between the number of observations per hour and the mean kW after 2003? There is a suspicion that as mean kw goes up so do the number of observations per hour... although this could just be a correlation with low demand periods (night time?)

-```{r compareProfiles}
+```{r compareProfiles, fig.width=8}
 ggplot2::ggplot(plotDT, aes(y = meankW, x = meanN, colour = season)) +
   geom_point() +
   facet_wrap(rYear ~ .)
+
@@ -278,7 +281,7 @@ The wide dataset has a count of NAs per row (dateTime) from which we infer how m

 ```{r}
 wDT <- drake::readd(wideData) # back from the drake
-head(wDT)
+names(wDT)
 ```

 If we take the mean of the number of feeders reporting per day (date) then a value of 25 will indicate a day when _all_ feeders have _all_ data (since it would be the mean of all the '25's).
@@ -307,13 +310,13 @@ nrow(aggDT[propExpected == 1])

 If we plot the mean then we will see which days get closest to having a full dataset.

-```{r bestDaysMean}
+```{r bestDaysMean, fig.width=8}
 ggplot2::ggplot(aggDT, aes(x = rDate, colour = season, y = meanOK)) +
   geom_point()
 ```

 Re-plot by the % of expected if we assume we _should_ have 25 feeders * 24 hours * 4 per hour (will be the same shape):
-```{r bestDaysProp}
+```{r bestDaysProp, fig.width=8}
 ggplot2::ggplot(aggDT, aes(x = rDate, colour = season, y = 100*propExpected)) +
   geom_point() + labs(y = "%")
 ```
diff --git a/Rmd/cleaningFeederData_allData.log b/Rmd/cleaningFeederData_allData.log
new file mode 100644
index 0000000000000000000000000000000000000000..8b7564885cbe3188d26fd17986f41690f91ac17e
--- /dev/null
+++ b/Rmd/cleaningFeederData_allData.log
@@ -0,0 +1,540 @@
+This is pdfTeX, Version 3.1415926-2.5-1.40.14 (TeX Live 2013) (format=pdflatex 2020.4.15) 8 JUL 2020 22:59
+entering extended mode
+ restricted \write18 enabled.
+ %&-line parsing enabled.
+**/home/ba1e12/git.Soton/ba1e12/datacleaning/docs/cleaningFeederData_allData.te
+x
+
+(/home/ba1e12/git.Soton/ba1e12/datacleaning/docs/cleaningFeederData_allData.tex
+LaTeX2e <2011/06/27>
+Babel <v3.8m> and hyphenation patterns for english, dumylang, nohyphenation, lo
+aded.
+[... ~480 lines of LaTeX class, font and package loading messages trimmed; the run fails as follows ...]
+
+! LaTeX Error: File `framed.sty' not found.
+
+Type X to quit or <RETURN> to proceed,
+or enter new name. (Default extension: sty)
+
+Enter file name:
+! Emergency stop.
+<read *>
+
+l.40 \definecolor
+                 {shadecolor}{RGB}{248,248,248}^^M
+Here is how much of TeX's memory you used:
+ 10860 strings out of 495063
+ 153982 string characters out of 3182201
+ 252362 words of memory out of 3000000
+ 14008 multiletter control sequences out of 15000+200000
+ 4403 words of font info for 15 fonts, out of 3000000 for 9000
+ 14 hyphenation exceptions out of 8191
+ 31i,0n,35p,299b,272s stack positions out of 5000i,500n,10000p,200000b,50000s
+
+! ==> Fatal error occurred, no output PDF file produced!
diff --git a/_drakeCleanFeeders.R b/_drakeCleanFeeders.R
index c4a47172175e04508a12aec8622122cd85c4cdfb..9e0fbd70f41d78079d0bc2a59b3c8e8e379ca6dd 100644
--- a/_drakeCleanFeeders.R
+++ b/_drakeCleanFeeders.R
@@ -154,8 +154,9 @@ my_plan <- drake::drake_plan(
   wideData = toWide(uniqData),
   saveLong = saveData(uniqData, "L"), # doesn't actually return anything
   saveWide = saveData(wideData, "W"), # doesn't actually return anything
-  htmlOut = makeReport(rmdFile, version, "html"), # html output
-  pdfOut = makeReport(rmdFile, version, "pdf") # pdf - must be some way to do this without re-running the whole thing
+  # pdf output fails
+  #pdfOut = makeReport(rmdFile, version, "pdf"), # pdf - must be some way to do this without re-running the whole thing
+  htmlOut = makeReport(rmdFile, version, "html") # html output
 )

 # see https://books.ropensci.org/drake/projects.html#usage
diff --git a/bibliography.bib b/bibliography.bib
new file mode 100644
index 0000000000000000000000000000000000000000..04c87119d59d7bd4c4a37bb1e495e517a61a957d
--- /dev/null
+++ b/bibliography.bib
@@ -0,0 +1,117 @@
+##############
+# R packages
+
+@Manual{baseR,
+  title = {R: A Language and Environment for Statistical Computing},
+  author = {{R Core Team}},
+  organization = {R Foundation for Statistical Computing},
+  address = {Vienna, Austria},
+  year = {2016},
+  url = {https://www.R-project.org/},
+}
+
+@Manual{bookdown,
+  title = {bookdown: Authoring Books and Technical Documents with R Markdown},
+  author = {Yihui Xie},
+  year = {2018},
+  note = {R package version 0.9},
+  url = {https://github.com/rstudio/bookdown},
+}
+
+@Manual{data.table,
+  title = {data.table: Extension of Data.frame},
+  author = {M Dowle and A Srinivasan and T Short and S Lianoglou with contributions from R Saporta and E Antonyan},
+  year = {2015},
+  note = {R package version 1.9.6},
+  url = {https://CRAN.R-project.org/package=data.table},
+}
+
+@Article{drake,
+  title = {The drake R package: a pipeline toolkit for reproducibility and high-performance computing},
+  author = {William Michael Landau},
+  journal = {Journal of Open Source Software},
+  year = {2018},
+  volume = {3},
+  number = {21},
+  url = {https://doi.org/10.21105/joss.00550},
+}
+
+@Book{ggplot2,
+  author = {Hadley Wickham},
+  title = {ggplot2: Elegant Graphics for Data Analysis},
+  publisher = {Springer-Verlag New York},
+  year = {2009},
+  isbn = {978-0-387-98140-6},
+  url = {http://ggplot2.org},
+}
+
+@Manual{here,
+  title = {here: A Simpler Way to Find Your Files},
+  author = {Kirill Müller},
+  year = {2017},
+  note = {R package version 0.1},
+  url = {https://CRAN.R-project.org/package=here},
+}
+
+@Manual{kableExtra,
+  title = {kableExtra: Construct Complex Table with 'kable' and Pipe Syntax},
+  author = {Hao Zhu},
+  year = {2019},
+  note = {R package version 1.0.1},
+  url = {https://CRAN.R-project.org/package=kableExtra},
+}
+
+@Manual{knitr,
+  title = {knitr: A General-Purpose Package for Dynamic Report Generation in R},
+  author = {Yihui Xie},
+  year = {2016},
+  url = {https://CRAN.R-project.org/package=knitr},
+}
+
+@Article{lubridate,
+  title = {Dates and Times Made Easy with {lubridate}},
+  author = {Garrett Grolemund and Hadley Wickham},
+  journal = {Journal of Statistical Software},
+  year = {2011},
+  volume = {40},
+  number = {3},
+  pages = {1--25},
+  url = {http://www.jstatsoft.org/v40/i03/},
+}
+
+@Manual{rmarkdown,
+  title = {rmarkdown: Dynamic Documents for R},
+  author = {JJ Allaire and Yihui Xie and Jonathan McPherson and Javier Luraschi and Kevin Ushey and Aron Atkins and Hadley Wickham and Joe Cheng and Winston Chang and Richard Iannone},
+  year = {2020},
+  note = {R package version 2.1},
+  url = {https://github.com/rstudio/rmarkdown},
+}
+
+@Book{rmarkdownBook,
+  title = {R Markdown: The Definitive Guide},
+  author = {Yihui Xie and J.J. Allaire and Garrett Grolemund},
+  publisher = {Chapman and Hall/CRC},
+  address = {Boca Raton, Florida},
+  year = {2018},
+  note = {ISBN 9781138359338},
+  url = {https://bookdown.org/yihui/rmarkdown},
+}
+
+@Manual{skimr,
+  title = {skimr: skimr},
+  author = {Eduardo {Arino de la Rubia} and Hao Zhu and Shannon Ellis and Elin Waring and Michael Quinn},
+  year = {2017},
+  note = {R package version 1.0},
+  url = {https://github.com/ropenscilabs/skimr},
+}
+
+@Manual{tidyverse,
+  title = {tidyverse: Easily Install and Load 'Tidyverse' Packages},
+  author = {Hadley Wickham},
+  year = {2017},
+  note = {R package version 1.1.1},
+  url = {https://CRAN.R-project.org/package=tidyverse},
+}
+
+@Manual{viridis,
+  title = {viridis: Default Color Maps from 'matplotlib'},
+  author = {Simon Garnier},
+  year = {2018},
+  note = {R package version 0.5.1},
+  url = {https://CRAN.R-project.org/package=viridis},
+}
\ No newline at end of file
diff --git a/docs/cleaningFeederData_allData.tex b/docs/cleaningFeederData_allData.tex
new file mode 100644
index 0000000000000000000000000000000000000000..6db441e85ad06170a8b9aff0165dec4d94a824df
--- /dev/null
+++ b/docs/cleaningFeederData_allData.tex
@@ -0,0 +1,1977 @@
+\documentclass[]{article}
+\usepackage{lmodern}
+\usepackage{amssymb,amsmath}
+\usepackage{ifxetex,ifluatex}
+\usepackage{fixltx2e} % provides \textsubscript
+\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
+  \usepackage[T1]{fontenc}
+  \usepackage[utf8]{inputenc}
+\else % if luatex or xelatex
+  \ifxetex
+    \usepackage{mathspec}
+  \else
+    \usepackage{fontspec}
+  \fi
+  \defaultfontfeatures{Ligatures=TeX,Scale=MatchLowercase}
+\fi
+% use upquote if available, for straight quotes in verbatim environments
+\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
+% use microtype if available
+\IfFileExists{microtype.sty}{%
+\usepackage[]{microtype}
+\UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
+}{}
+\PassOptionsToPackage{hyphens}{url} % url is loaded by
hyperref +\usepackage[unicode=true]{hyperref} +\hypersetup{ + pdftitle={Testing electricity substation/feeder data}, + pdfauthor={Ben Anderson \& Ellis Ridett}, + pdfborder={0 0 0}, + breaklinks=true} +\urlstyle{same} % don't use monospace font for urls +\usepackage[margin=1in]{geometry} +\usepackage{color} +\usepackage{fancyvrb} +\newcommand{\VerbBar}{|} +\newcommand{\VERB}{\Verb[commandchars=\\\{\}]} +\DefineVerbatimEnvironment{Highlighting}{Verbatim}{commandchars=\\\{\}} +% Add ',fontsize=\small' for more characters per line +\usepackage{framed} +\definecolor{shadecolor}{RGB}{248,248,248} +\newenvironment{Shaded}{\begin{snugshade}}{\end{snugshade}} +\newcommand{\KeywordTok}[1]{\textcolor[rgb]{0.13,0.29,0.53}{\textbf{#1}}} +\newcommand{\DataTypeTok}[1]{\textcolor[rgb]{0.13,0.29,0.53}{#1}} +\newcommand{\DecValTok}[1]{\textcolor[rgb]{0.00,0.00,0.81}{#1}} +\newcommand{\BaseNTok}[1]{\textcolor[rgb]{0.00,0.00,0.81}{#1}} +\newcommand{\FloatTok}[1]{\textcolor[rgb]{0.00,0.00,0.81}{#1}} +\newcommand{\ConstantTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}} +\newcommand{\CharTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}} +\newcommand{\SpecialCharTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}} +\newcommand{\StringTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}} +\newcommand{\VerbatimStringTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}} +\newcommand{\SpecialStringTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}} +\newcommand{\ImportTok}[1]{#1} +\newcommand{\CommentTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textit{#1}}} +\newcommand{\DocumentationTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}} +\newcommand{\AnnotationTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}} +\newcommand{\CommentVarTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}} +\newcommand{\OtherTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{#1}} +\newcommand{\FunctionTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}} +\newcommand{\VariableTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}} 
+\newcommand{\ControlFlowTok}[1]{\textcolor[rgb]{0.13,0.29,0.53}{\textbf{#1}}} +\newcommand{\OperatorTok}[1]{\textcolor[rgb]{0.81,0.36,0.00}{\textbf{#1}}} +\newcommand{\BuiltInTok}[1]{#1} +\newcommand{\ExtensionTok}[1]{#1} +\newcommand{\PreprocessorTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textit{#1}}} +\newcommand{\AttributeTok}[1]{\textcolor[rgb]{0.77,0.63,0.00}{#1}} +\newcommand{\RegionMarkerTok}[1]{#1} +\newcommand{\InformationTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}} +\newcommand{\WarningTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}} +\newcommand{\AlertTok}[1]{\textcolor[rgb]{0.94,0.16,0.16}{#1}} +\newcommand{\ErrorTok}[1]{\textcolor[rgb]{0.64,0.00,0.00}{\textbf{#1}}} +\newcommand{\NormalTok}[1]{#1} +\usepackage{graphicx,grffile} +\makeatletter +\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi} +\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi} +\makeatother +% Scale images if necessary, so that they will not overflow the page +% margins by default, and it is still possible to overwrite the defaults +% using explicit options in \includegraphics[width, height, ...]{} +\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio} +\IfFileExists{parskip.sty}{% +\usepackage{parskip} +}{% else +\setlength{\parindent}{0pt} +\setlength{\parskip}{6pt plus 2pt minus 1pt} +} +\setlength{\emergencystretch}{3em} % prevent overfull lines +\providecommand{\tightlist}{% + \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}} +\setcounter{secnumdepth}{0} +% Redefines (sub)paragraphs to behave more like sections +\ifx\paragraph\undefined\else +\let\oldparagraph\paragraph +\renewcommand{\paragraph}[1]{\oldparagraph{#1}\mbox{}} +\fi +\ifx\subparagraph\undefined\else +\let\oldsubparagraph\subparagraph +\renewcommand{\subparagraph}[1]{\oldsubparagraph{#1}\mbox{}} +\fi + +% set default figure placement to htbp +\makeatletter +\def\fps@figure{htbp} +\makeatother + 
+\usepackage{etoolbox} +\makeatletter +\providecommand{\subtitle}[1]{% add subtitle to \maketitle + \apptocmd{\@title}{\par {\large #1 \par}}{}{} +} +\makeatother + +\title{Testing electricity substation/feeder data} +\providecommand{\subtitle}[1]{} +\subtitle{Outliers and missing data\ldots{}} +\author{Ben Anderson \& Ellis Ridett} +\date{Last run at: 2020-07-08 22:56:02} + +\begin{document} +\maketitle + +\begin{Shaded} +\begin{Highlighting}[] +\CommentTok{# Knitr setup ----} +\NormalTok{knitr}\OperatorTok{::}\NormalTok{opts_chunk}\OperatorTok{$}\KeywordTok{set}\NormalTok{(}\DataTypeTok{echo =} \OtherTok{TRUE}\NormalTok{)} +\NormalTok{knitr}\OperatorTok{::}\NormalTok{opts_chunk}\OperatorTok{$}\KeywordTok{set}\NormalTok{(}\DataTypeTok{warning =} \OtherTok{FALSE}\NormalTok{) }\CommentTok{# for final tidy run} +\NormalTok{knitr}\OperatorTok{::}\NormalTok{opts_chunk}\OperatorTok{$}\KeywordTok{set}\NormalTok{(}\DataTypeTok{message =} \OtherTok{FALSE}\NormalTok{) }\CommentTok{# for final tidy run} + +\CommentTok{# Set start time ----} +\NormalTok{startTime <-}\StringTok{ }\KeywordTok{proc.time}\NormalTok{()} + +\CommentTok{# Libraries ----} +\NormalTok{rmdLibs <-}\StringTok{ }\KeywordTok{c}\NormalTok{(}\StringTok{"kableExtra"} \CommentTok{# tables} +\NormalTok{)} +\CommentTok{# load them} +\NormalTok{dataCleaning}\OperatorTok{::}\KeywordTok{loadLibraries}\NormalTok{(rmdLibs)} +\end{Highlighting} +\end{Shaded} + +\begin{verbatim} +## kableExtra +## TRUE +\end{verbatim} + +\begin{Shaded} +\begin{Highlighting}[] +\CommentTok{# Parameters ----} +\CommentTok{#dFile <- "~/Dropbox/Ben_IOW_SS.csv" # edit for your set up} + + +\CommentTok{# Functions ----} +\CommentTok{# put more general ones that could be useful to everyone in /R so they are built into the package.} + +\CommentTok{# put functions relevant to this analysis here} +\end{Highlighting} +\end{Shaded} + +\section{Intro}\label{intro} + +We have some electricity substation feeder data that has been cleaned to +give 
mean kW per 15 minutes. + +There seem to be some NA kW values and a lot of missing time stamps. We +want to select the `best' (i.e most complete) days within a +day-of-the-week/season/year sampling frame. If we can't do that we may +have to resort to seasonal mean kW profiles by hour \& day of the +week\ldots{} + +Code used to generate this report: +\url{https://git.soton.ac.uk/ba1e12/spatialec/-/blob/master/isleOfWight/cleaningFeederData.Rmd} + +\section{Data prep}\label{data-prep} + +\subsection{Load data}\label{load-data} + +Loaded data from +/mnt/SERG\_data/Ellis\_IOW/Cleaned\_SS\_Amps/amps\_all\_substations.csv.gz\ldots{} +(using drake) + +\begin{Shaded} +\begin{Highlighting}[] +\NormalTok{origDataDT <-}\StringTok{ }\NormalTok{drake}\OperatorTok{::}\KeywordTok{readd}\NormalTok{(origData) }\CommentTok{# readd the drake object} +\KeywordTok{head}\NormalTok{(origDataDT)} +\end{Highlighting} +\end{Shaded} + +\begin{verbatim} +## Time region sub_region rDateTime rTime +## 1: 2003-01-13T10:30:00Z ARRN ARRN 2003-01-13 10:30:00 10:30:00 +## 2: 2003-01-13T10:45:00Z ARRN ARRN 2003-01-13 10:45:00 10:45:00 +## 3: 2003-01-13T11:15:00Z ARRN ARRN 2003-01-13 11:15:00 11:15:00 +## 4: 2003-01-13T11:30:00Z ARRN ARRN 2003-01-13 11:30:00 11:30:00 +## 5: 2003-01-13T11:45:00Z ARRN ARRN 2003-01-13 11:45:00 11:45:00 +## 6: 2003-01-13T12:15:00Z ARRN ARRN 2003-01-13 12:15:00 12:15:00 +## rDate rYear rDoW kW feeder_ID season +## 1: 2003-01-13 2003 Mon 2.0000000 ARRN_ARRN Winter +## 2: 2003-01-13 2003 Mon 18.2500000 ARRN_ARRN Winter +## 3: 2003-01-13 2003 Mon 0.6666667 ARRN_ARRN Winter +## 4: 2003-01-13 2003 Mon 28.5000000 ARRN_ARRN Winter +## 5: 2003-01-13 2003 Mon 19.5555556 ARRN_ARRN Winter +## 6: 2003-01-13 2003 Mon 12.8000000 ARRN_ARRN Winter +\end{verbatim} + +\begin{Shaded} +\begin{Highlighting}[] +\NormalTok{uniqDataDT <-}\StringTok{ }\NormalTok{drake}\OperatorTok{::}\KeywordTok{readd}\NormalTok{(uniqData) }\CommentTok{# readd the drake object} +\end{Highlighting} +\end{Shaded} + 
+Check data prep worked OK.
+
+\begin{Shaded}
+\begin{Highlighting}[]
+\CommentTok{# check}
+\NormalTok{t <-}\StringTok{ }\NormalTok{origDataDT[, .(}\DataTypeTok{nObs =}\NormalTok{ .N,}
+ \DataTypeTok{firstDate =} \KeywordTok{min}\NormalTok{(rDateTime),}
+ \DataTypeTok{lastDate =} \KeywordTok{max}\NormalTok{(rDateTime),}
+ \DataTypeTok{meankW =} \KeywordTok{mean}\NormalTok{(kW, }\DataTypeTok{na.rm =} \OtherTok{TRUE}\NormalTok{)}
+\NormalTok{), keyby =}\StringTok{ }\NormalTok{.(region, feeder_ID)]}
+
+\NormalTok{kableExtra}\OperatorTok{::}\KeywordTok{kable}\NormalTok{(t, }\DataTypeTok{digits =} \DecValTok{2}\NormalTok{,}
+ \DataTypeTok{caption =} \StringTok{"Counts per feeder (long table)"}\NormalTok{) }\OperatorTok{%>%}
+\StringTok{ }\KeywordTok{kable_styling}\NormalTok{()}
+\end{Highlighting}
+\end{Shaded}
+
+\begin{longtable}{llrllr}
+\caption{Counts per feeder (long table)}\\
+\hline
+region & feeder\_ID & nObs & firstDate & lastDate & meankW\\
+\hline
+\endhead
+ARRN & ARRN\_ARRN & 94909 & 2003-01-13 10:30:00 & 2017-10-25 22:15:00 & 151.74\\
+BINS & BINS\_C1T0 & 218480 & 2001-09-21 07:30:00 & 2017-10-13 23:45:00 & 94.70\\
+BINS & BINS\_C2T0 & 208447 & 2001-09-21 07:30:00 & 2017-10-13 23:45:00 & 93.91\\
+BINS & BINS\_E1L5 & 414980 & 2001-09-21 07:30:00 & 2017-10-13 23:15:00 & 94.31\\
+BINS & BINS\_E2L5 & 115260 & 2001-10-10 12:00:00 & 2017-10-04 20:45:00 & 20.58\\
+BINS & BINS\_E3L5 & 337064 & 2001-10-10 12:00:00 & 2017-10-14 23:30:00 & 59.67\\
+FFPV & FFPV\_FFPV & 32278 & 2014-09-25 09:15:00 & 2017-10-11 16:15:00 & 36.14\\
+FRES & FRES\_E1L5 & 452480 & 2001-10-10 12:00:00 & 2017-10-13 23:45:00 & 53.08\\
+FRES & FRES\_E1T0 & 188186 & 2001-09-11 15:15:00 & 2017-09-01 19:00:00 & 128.98\\
+FRES & FRES\_E2L5 & 178744 & 2001-10-10 12:00:00 & 2017-10-13 23:30:00 & 25.64\\
+FRES & FRES\_E2T0 & 164910 & 2001-09-11 15:15:00 & 2017-10-12 23:45:00 & 122.44\\
+FRES & FRES\_E3L5 & 463006 & 2001-10-10 12:00:00 & 2017-10-13 23:00:00 & 50.65\\
+FRES & FRES\_E4L5 & 15752 & 2010-07-30 17:00:00 & 2017-09-18 19:45:00 & 60.89\\
+FRES & FRES\_E6L5 & 317352 & 2001-09-11 15:15:00 & 2017-10-13 23:00:00 & 85.32\\
+NEWP & NEWP\_E11L5 & 367422 & 2005-01-20 10:00:00 & 2017-09-28 23:15:00 & 72.32\\
+NEWP & NEWP\_E13L5 & 252979 & 2010-01-01 00:15:00 & 2017-09-28 23:45:00 & 126.20\\
+NEWP & NEWP\_E15L5 & 295094 & 2008-01-07 12:00:00 & 2017-10-10 23:45:00 & 76.95\\
+NEWP & NEWP\_E17L5 & 63422 & 2011-03-10 12:45:00 & 2017-10-11 23:30:00 & 11.44\\
+NEWP & NEWP\_E19L5 & 126299 & 2011-03-14 09:45:00 & 2017-10-11 23:45:00 & 18.38\\
+NEWP & NEWP\_E1L5 & 318151 & 2001-10-10 12:15:00 & 2017-09-26 23:45:00 & 45.66\\
+NEWP & NEWP\_E1T0 & 101494 & 2001-09-11 15:30:00 & 2017-09-18 19:45:00 & 475.07\\
+NEWP & NEWP\_E2L5 & 67835 & 2001-09-11 15:30:00 & 2017-09-26 22:45:00 & 58.44\\
+NEWP & NEWP\_E2T0 & 399812 & 2001-10-10 12:15:00 & 2017-09-27 12:00:00 & 426.55\\
+NEWP & NEWP\_E3L5 & 480643 & 2001-10-10 12:15:00 & 2017-09-26 23:45:00 & 73.64\\
+NEWP & NEWP\_E3T0 & 246265 & 2005-08-03 11:15:00 & 2017-09-26 23:45:00 & 383.05\\
+NEWP & NEWP\_E4L5 & 191514 & NA & NA & 105.57\\
+NEWP & NEWP\_E5L5 & 448392 & 2001-09-11 15:15:00 & 2017-09-27 23:45:00 & 42.46\\
+NEWP & NEWP\_E6L5 & 434217 & 2001-09-11 15:30:00 & 2017-09-27 23:45:00 & 69.91\\
+NEWP & NEWP\_E7L5 & 306799 & 2001-10-10 12:15:00 & 2017-09-27 23:15:00 & 71.96\\
+NEWP & NEWP\_E8L5 & 537871 & 2001-10-10 12:15:00 & 2017-09-27 23:30:00 & 139.40\\
+NEWP & NEWP\_E9L5 & 363063 & 2002-12-19 22:30:00 & 2017-09-28 23:45:00 & 101.30\\
+RYDE & RYDE\_E1L5 & 356616 & 2001-09-21 09:30:00 & 2017-10-11 23:45:00 & 70.48\\
+RYDE & RYDE\_E1T0 \&E1S0 & 251062 & 2001-10-10 12:15:00 & 2017-10-11 23:30:00 & 336.55\\
+RYDE & RYDE\_E2L5 & 297293 & NA & NA & 71.14\\
+RYDE & RYDE\_E2T0 & 238332 & 2001-10-10 12:15:00 & 2017-10-11 23:45:00 & 351.26\\
+RYDE & RYDE\_E3L5 & 304293 & 2001-09-21 09:30:00 & 2017-10-11 23:45:00 & 85.22\\
+RYDE & RYDE\_E4L5 & 519366 & NA & NA & 70.23\\
+RYDE & RYDE\_E5L5 & 362368 & 2001-09-21 09:30:00 & 2017-10-12 23:15:00 & 82.05\\
+RYDE & RYDE\_E6L5 & 442859 & 2001-09-21 09:30:00 & 2017-10-12 23:45:00 & 96.24\\
+RYDE & RYDE\_E7L5 & 324195 & 2001-09-21 09:30:00 & 2017-10-12 22:45:00 & 69.86\\
+RYDE & RYDE\_E8L5 & 275373 & 2001-10-10 12:15:00 & 2017-10-12 23:15:00 & 57.04\\
+RYDE & RYDE\_E9L5 & 267617 & 2001-09-25 17:00:00 & 2017-10-12 23:30:00 & 59.20\\
+SADO & SADO\_E1L5 & 212775 & 2001-09-21 13:30:00 & 2017-10-25 23:15:00 & 50.98\\
+SADO & SADO\_E1T0 & 421960 & 2001-09-21 13:30:00 & 2017-10-25 23:45:00 & 230.66\\
+SADO & SADO\_E2L5 & 178715 & 2001-09-21 13:30:00 & 2017-10-25 23:15:00 & 39.74\\
+SADO & SADO\_E2T0 & 412191 & 2001-10-10 12:15:00 & 2017-10-25 23:30:00 & 173.51\\
+SADO & SADO\_E3L5 & 272831 & 2001-09-21 13:30:00 & 2017-10-25 23:15:00 & 64.61\\
+SADO & SADO\_E4L5 & 479020 & 2001-09-21 13:30:00 & 2017-10-25 23:45:00 & 58.38\\
+SADO & SADO\_E5L5 & 343918 & 2001-09-21 13:30:00 & 2017-10-25 23:45:00 & 82.67\\
+SADO & SADO\_E6L5 & 239227 & 2001-09-21 13:30:00 & 2017-10-25 23:30:00 & 56.34\\
+SADO & SADO\_E8L5 & 282455 & 2004-08-16 17:45:00 & 2017-10-25 23:30:00 & 89.57\\
+SHAL & SHAL\_C3L5 & 163204 & 2001-10-10 12:45:00 & 2017-10-15 23:15:00 & 38.22\\
+SHAL & SHAL\_C4L5 & 187940 & 2001-09-11 15:30:00 & 2017-10-15 23:45:00 & 38.77\\
+SHAL & SHAL\_C5L5 & 29417 & 2015-12-03 15:00:00 & 2017-10-15 23:45:00 & 36.35\\
+SHAL & SHAL\_E1L5 & 465913 & 2001-10-10 12:15:00 & 2017-10-14 23:30:00 & 70.65\\
+SHAL & SHAL\_E1T0 & 181132 & 2001-10-10 12:15:00 & 2017-10-14 23:15:00 & 101.23\\
+SHAL & SHAL\_E2L5 & 290286 & 2001-10-10 12:15:00 & 2017-10-15 23:00:00 & 47.09\\
+SHAL & SHAL\_E2T0 & 174129 & 2001-10-10 12:30:00 & 2017-10-14 22:45:00 & 107.44\\
+SHAL & SHAL\_E3L5 & 258805 & 2010-03-11 07:00:00 & 2017-10-15 23:45:00 & 33.26\\
+SHAL & SHAL\_E4L5 & 322135 & 2001-09-11 15:30:00 & 2017-10-16 12:30:00 & 54.03\\
+SHAN & SHAN\_E1L5 & 288894 & 2001-09-21 14:15:00 & 2017-10-24 23:15:00 & 63.52\\
+SHAN & SHAN\_E1T0 & 330691 & 2001-10-10 12:15:00 & 2017-10-24 23:45:00 & 226.58\\
+SHAN & SHAN\_E2L5 & 321760 & 2001-09-21 14:15:00 & 2017-10-25 23:15:00 & 72.63\\
+SHAN & SHAN\_E2T0 & 315053 & 2001-10-10 12:15:00 & 2017-10-24 23:45:00 & 186.69\\
+SHAN & SHAN\_E3L5 & 105606 & 2001-09-21 14:15:00 & 2017-10-25 23:15:00 & 26.30\\
+SHAN & SHAN\_E4L5 & 216626 & 2001-09-21 14:15:00 & 2017-10-25 23:30:00 & 33.63\\
+SHAN & SHAN\_E5L5 & 254742 & 2001-09-21 14:15:00 & 2017-10-25 23:15:00 & 48.50\\
+SHAN & SHAN\_E6L5 & 363107 & 2001-09-21 14:15:00 & 2017-10-25 23:15:00 & 68.69\\
+SHAN & SHAN\_E7L5 & 384165 & 2001-09-21 14:15:00 & 2017-10-25 23:15:00 & 66.12\\
+SHAN & SHAN\_E8L5 & 146605 & 2002-02-05 17:30:00 & 2017-10-25 23:15:00 & 25.28\\
+VENT & VENT\_E1L5 & 203617 & 2001-09-11 15:45:00 & 2017-10-15 23:15:00 & 33.24\\
+VENT & VENT\_E1T0 & 240745 & 2001-09-11 15:30:00 & 2017-10-15 23:15:00 & 191.42\\
+VENT & VENT\_E2L5 & 402307 & 2001-09-27 14:00:00 & 2017-10-16 23:45:00 & 46.68\\
+VENT & VENT\_E2T0 & 208020 & 2001-09-11 15:45:00 & 2017-10-15 23:30:00 & 115.47\\
+VENT & VENT\_E3L5 & 493337 & 2001-09-11 15:45:00 & 2017-10-16 23:45:00 & 83.59\\
+VENT & VENT\_E4L5 & 387037 & 2001-09-11 15:45:00 & 2017-10-16 23:30:00 & 40.86\\
+VENT & VENT\_E5L5 & 481677 & 2001-09-27 14:00:00 & 2017-10-16 23:45:00 & 88.43\\
+VENT & VENT\_E6L5 & 6631 & 2001-09-27 14:00:00 & 2017-10-24 20:15:00 & 3.95\\
+\hline
+\end{longtable}
+
+Do a duplicate check by feeder\_ID, dateTime \& kW. In theory there
+should not be any.
+ +\begin{Shaded} +\begin{Highlighting}[] +\KeywordTok{message}\NormalTok{(}\StringTok{"Original data nrows: "}\NormalTok{, }\KeywordTok{tidyNum}\NormalTok{(}\KeywordTok{nrow}\NormalTok{(origDataDT)))} + +\KeywordTok{message}\NormalTok{(}\StringTok{"Unique data nrows: "}\NormalTok{, }\KeywordTok{tidyNum}\NormalTok{(}\KeywordTok{nrow}\NormalTok{(uniqDataDT)))} + +\KeywordTok{message}\NormalTok{(}\StringTok{"So we have "}\NormalTok{, }\KeywordTok{tidyNum}\NormalTok{(}\KeywordTok{nrow}\NormalTok{(origDataDT) }\OperatorTok{-}\StringTok{ }\KeywordTok{nrow}\NormalTok{(uniqDataDT)), }\StringTok{" duplicates..."}\NormalTok{)} + +\NormalTok{pc <-}\StringTok{ }\DecValTok{100}\OperatorTok{*}\NormalTok{((}\KeywordTok{nrow}\NormalTok{(origDataDT) }\OperatorTok{-}\StringTok{ }\KeywordTok{nrow}\NormalTok{(uniqDataDT))}\OperatorTok{/}\KeywordTok{nrow}\NormalTok{(origDataDT))} +\KeywordTok{message}\NormalTok{(}\StringTok{"That's "}\NormalTok{, }\KeywordTok{round}\NormalTok{(pc,}\DecValTok{2}\NormalTok{), }\StringTok{"%"}\NormalTok{)} + +\NormalTok{feederDT <-}\StringTok{ }\NormalTok{uniqDataDT }\CommentTok{# use dt with no duplicates} +\NormalTok{origDataDT <-}\StringTok{ }\OtherTok{NULL} \CommentTok{# save memory} +\end{Highlighting} +\end{Shaded} + +So we remove the duplicates\ldots{} + +\section{Basic patterns}\label{basic-patterns} + +Try aggregated demand profiles of mean kW by season and feeder and day +of the week\ldots{} Remove the legend so we can see the plot. 
+ +\begin{Shaded} +\begin{Highlighting}[] +\NormalTok{plotDT <-}\StringTok{ }\NormalTok{feederDT[, .(}\DataTypeTok{meankW =} \KeywordTok{mean}\NormalTok{(kW),} + \DataTypeTok{nObs =}\NormalTok{ .N), keyby =}\StringTok{ }\NormalTok{.(rTime, season, feeder_ID, rDoW)]} + +\NormalTok{ggplot2}\OperatorTok{::}\KeywordTok{ggplot}\NormalTok{(plotDT, }\KeywordTok{aes}\NormalTok{(}\DataTypeTok{x =}\NormalTok{ rTime, }\DataTypeTok{y =}\NormalTok{ meankW, }\DataTypeTok{colour =}\NormalTok{ feeder_ID)) }\OperatorTok{+} +\StringTok{ }\KeywordTok{geom_line}\NormalTok{() }\OperatorTok{+} +\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position=}\StringTok{"none"}\NormalTok{) }\OperatorTok{+}\StringTok{ }\CommentTok{# remove legend so we can see the plot} +\StringTok{ }\KeywordTok{facet_grid}\NormalTok{(season }\OperatorTok{~}\StringTok{ }\NormalTok{rDoW)} +\end{Highlighting} +\end{Shaded} + +\includegraphics{/home/ba1e12/git.Soton/ba1e12/datacleaning/docs/cleaningFeederData_allData_files/figure-latex/kwProfiles-1.pdf} + +Is that what we expect? 
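The seasonal demand-profile aggregation rendered above can be sketched standalone; a minimal runnable version with toy data (hypothetical values — the real `plotDT` is built from `feederDT` and then plotted with ggplot2 faceted by season and day of week):

```r
library(data.table)

# toy 15-minute readings (hypothetical values, not project data)
dt <- data.table(
  feeder_ID = "TEST_1",
  season    = "Winter",
  rDoW      = c("Mon", "Mon", "Mon", "Mon"),
  rTime     = c("10:30", "10:30", "10:45", "10:45"),
  kW        = c(2, 4, 6, 10)
)

# mean kW profile by time of day within season/feeder/day-of-week
plotDT <- dt[, .(meankW = mean(kW), nObs = .N),
             keyby = .(rTime, season, feeder_ID, rDoW)]

stopifnot(plotDT[rTime == "10:30", meankW] == 3)  # (2 + 4) / 2
stopifnot(plotDT[rTime == "10:45", meankW] == 8)  # (6 + 10) / 2
```

`keyby` both groups and sorts/keys the result, which is why the plotted time axis comes out ordered.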
+ +\section{Test for missing}\label{test-for-missing} + +Number of observations per feeder per day - gaps will be visible +(totally missing days) as will low counts (partially missing days) - we +would expect 24 * 4\ldots{} Convert this to a \% of expected\ldots{} + +\begin{Shaded} +\begin{Highlighting}[] +\NormalTok{plotDT <-}\StringTok{ }\NormalTok{feederDT[, .(}\DataTypeTok{nObs =}\NormalTok{ .N), keyby =}\StringTok{ }\NormalTok{.(rDate, feeder_ID)]} +\NormalTok{plotDT[, propExpected }\OperatorTok{:}\ErrorTok{=}\StringTok{ }\NormalTok{nObs}\OperatorTok{/}\NormalTok{(}\DecValTok{24}\OperatorTok{*}\DecValTok{4}\NormalTok{)]} + +\NormalTok{ggplot2}\OperatorTok{::}\KeywordTok{ggplot}\NormalTok{(plotDT, }\KeywordTok{aes}\NormalTok{(}\DataTypeTok{x =}\NormalTok{ rDate, }\DataTypeTok{y =}\NormalTok{ feeder_ID, }\DataTypeTok{fill =} \DecValTok{100}\OperatorTok{*}\NormalTok{propExpected)) }\OperatorTok{+} +\StringTok{ }\KeywordTok{geom_tile}\NormalTok{() }\OperatorTok{+} +\StringTok{ }\KeywordTok{scale_x_date}\NormalTok{(}\DataTypeTok{date_breaks =} \StringTok{"3 months"}\NormalTok{, }\DataTypeTok{date_labels =} \StringTok{"%B %Y"}\NormalTok{) }\OperatorTok{+} +\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{axis.text.x=}\KeywordTok{element_text}\NormalTok{(}\DataTypeTok{angle=}\DecValTok{90}\NormalTok{, }\DataTypeTok{hjust=}\DecValTok{1}\NormalTok{)) }\OperatorTok{+} +\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position=}\StringTok{"bottom"}\NormalTok{) }\OperatorTok{+} +\StringTok{ }\KeywordTok{scale_fill_viridis_c}\NormalTok{(}\DataTypeTok{name=}\StringTok{"% expected"}\NormalTok{)} +\end{Highlighting} +\end{Shaded} + +\includegraphics{/home/ba1e12/git.Soton/ba1e12/datacleaning/docs/cleaningFeederData_allData_files/figure-latex/basicCountTile-1.pdf} + +This is not good. There are both gaps (missing days) and partial days. +\textbf{Lots} of partial days. Why is the data relatively good up to the +end of 2003? 
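The "% of expected" calculation above divides the observation count per feeder-day by the 24 * 4 = 96 quarter-hours a complete day should hold. A self-contained sketch with toy timestamps (hypothetical — one complete day, one half-missing day):

```r
library(data.table)

# toy 15-minute stamps: day 1 complete (96 obs), day 2 only half present
dt <- data.table(
  feeder_ID = "TEST_1",
  rDateTime = seq(as.POSIXct("2003-01-13 00:00", tz = "UTC"),
                  by = "15 min", length.out = 96 + 48)
)
dt[, rDate := as.Date(rDateTime)]

# observations per feeder per day as a proportion of the expected 24 * 4
plotDT <- dt[, .(nObs = .N), keyby = .(rDate, feeder_ID)]
plotDT[, propExpected := nObs / (24 * 4)]

stopifnot(plotDT[rDate == as.Date("2003-01-13"), propExpected] == 1)
stopifnot(plotDT[rDate == as.Date("2003-01-14"), propExpected] == 0.5)
```

Completely missing days simply never appear as rows here, which is why they show up as gaps (not zeros) in the tile plot.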
+
+What does it look like if we aggregate across all feeders by time? There
+are 78 feeders so we should get this many at best. How close do we get?
+
+\begin{Shaded}
+\begin{Highlighting}[]
+\NormalTok{plotDT <-}\StringTok{ }\NormalTok{feederDT[, .(}\DataTypeTok{nObs =}\NormalTok{ .N,}
+ \DataTypeTok{meankW =} \KeywordTok{mean}\NormalTok{(kW)), keyby =}\StringTok{ }\NormalTok{.(rTime, rDate, season)]}
+
+\NormalTok{plotDT[, propExpected }\OperatorTok{:}\ErrorTok{=}\StringTok{ }\NormalTok{nObs}\OperatorTok{/}\KeywordTok{uniqueN}\NormalTok{(feederDT}\OperatorTok{$}\NormalTok{feeder_ID)] }\CommentTok{# we now have all feeders per time so...}
+
+\NormalTok{ggplot2}\OperatorTok{::}\KeywordTok{ggplot}\NormalTok{(plotDT, }\KeywordTok{aes}\NormalTok{(}\DataTypeTok{x =}\NormalTok{ rDate, }\DataTypeTok{y =}\NormalTok{ rTime, }\DataTypeTok{fill =} \DecValTok{100}\OperatorTok{*}\NormalTok{propExpected)) }\OperatorTok{+}
+\StringTok{ }\KeywordTok{geom_tile}\NormalTok{() }\OperatorTok{+}
+\StringTok{ }\KeywordTok{scale_x_date}\NormalTok{(}\DataTypeTok{date_breaks =} \StringTok{"6 months"}\NormalTok{, }\DataTypeTok{date_labels =} \StringTok{"%B %Y"}\NormalTok{) }\OperatorTok{+}
+\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{axis.text.x=}\KeywordTok{element_text}\NormalTok{(}\DataTypeTok{angle=}\DecValTok{90}\NormalTok{, }\DataTypeTok{hjust=}\DecValTok{1}\NormalTok{)) }\OperatorTok{+}
+\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position=}\StringTok{"bottom"}\NormalTok{) }\OperatorTok{+}
+\StringTok{ }\KeywordTok{scale_fill_viridis_c}\NormalTok{(}\DataTypeTok{name=}\StringTok{"% expected"}\NormalTok{)}
+\end{Highlighting}
+\end{Shaded}
+
+\includegraphics{/home/ba1e12/git.Soton/ba1e12/datacleaning/docs/cleaningFeederData_allData_files/figure-latex/aggVisN-1.pdf}
+
+That really doesn't look too good. There are some very odd fluctuations
+in there. And something changed after 2003\ldots{}
+
+What do the mean kW patterns look like per feeder per day? 
+
+\begin{Shaded}
+\begin{Highlighting}[]
+\NormalTok{plotDT <-}\StringTok{ }\NormalTok{feederDT[, .(}\DataTypeTok{meankW =} \KeywordTok{mean}\NormalTok{(kW, }\DataTypeTok{na.rm =} \OtherTok{TRUE}\NormalTok{)), keyby =}\StringTok{ }\NormalTok{.(rDate, feeder_ID)]}
+
+\NormalTok{ggplot2}\OperatorTok{::}\KeywordTok{ggplot}\NormalTok{(plotDT, }\KeywordTok{aes}\NormalTok{(}\DataTypeTok{x =}\NormalTok{ rDate, }\DataTypeTok{y =}\NormalTok{ feeder_ID, }\DataTypeTok{fill =}\NormalTok{ meankW)) }\OperatorTok{+}
+\StringTok{ }\KeywordTok{geom_tile}\NormalTok{() }\OperatorTok{+}
+\StringTok{ }\KeywordTok{scale_x_date}\NormalTok{(}\DataTypeTok{date_breaks =} \StringTok{"3 months"}\NormalTok{, }\DataTypeTok{date_labels =} \StringTok{"%B %Y"}\NormalTok{) }\OperatorTok{+}
+\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{axis.text.x=}\KeywordTok{element_text}\NormalTok{(}\DataTypeTok{angle=}\DecValTok{90}\NormalTok{, }\DataTypeTok{hjust=}\DecValTok{1}\NormalTok{)) }\OperatorTok{+}
+\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position=}\StringTok{"bottom"}\NormalTok{) }\OperatorTok{+}
+\StringTok{ }\KeywordTok{scale_fill_viridis_c}\NormalTok{(}\DataTypeTok{name=}\StringTok{"Mean kW"}\NormalTok{)}
+\end{Highlighting}
+\end{Shaded}
+
+\includegraphics{/home/ba1e12/git.Soton/ba1e12/datacleaning/docs/cleaningFeederData_allData_files/figure-latex/basickWTile-1.pdf}
+
+Missing data is even more clearly visible.
+
+What about mean kW across all feeders? 
+ +\begin{Shaded} +\begin{Highlighting}[] +\NormalTok{plotDT <-}\StringTok{ }\NormalTok{feederDT[, .(}\DataTypeTok{nObs =}\NormalTok{ .N,} + \DataTypeTok{meankW =} \KeywordTok{mean}\NormalTok{(kW)), keyby =}\StringTok{ }\NormalTok{.(rTime, rDate, season)]} + +\NormalTok{ggplot2}\OperatorTok{::}\KeywordTok{ggplot}\NormalTok{(plotDT, }\KeywordTok{aes}\NormalTok{(}\DataTypeTok{x =}\NormalTok{ rDate, }\DataTypeTok{y =}\NormalTok{ rTime, }\DataTypeTok{fill =}\NormalTok{ meankW)) }\OperatorTok{+} +\StringTok{ }\KeywordTok{geom_tile}\NormalTok{() }\OperatorTok{+} +\StringTok{ }\KeywordTok{scale_x_date}\NormalTok{(}\DataTypeTok{date_breaks =} \StringTok{"6 months"}\NormalTok{, }\DataTypeTok{date_labels =} \StringTok{"%B %Y"}\NormalTok{) }\OperatorTok{+} +\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{axis.text.x=}\KeywordTok{element_text}\NormalTok{(}\DataTypeTok{angle=}\DecValTok{90}\NormalTok{, }\DataTypeTok{hjust=}\DecValTok{1}\NormalTok{)) }\OperatorTok{+} +\StringTok{ }\KeywordTok{theme}\NormalTok{(}\DataTypeTok{legend.position=}\StringTok{"bottom"}\NormalTok{) }\OperatorTok{+} +\StringTok{ }\KeywordTok{scale_fill_viridis_c}\NormalTok{(}\DataTypeTok{name=}\StringTok{"kW"}\NormalTok{)} +\end{Highlighting} +\end{Shaded} + +\includegraphics{/home/ba1e12/git.Soton/ba1e12/datacleaning/docs/cleaningFeederData_allData_files/figure-latex/aggViskW-1.pdf} + +\section{\texorpdfstring{Which days have the `least' +missing?}{Which days have the least missing?}}\label{which-days-have-the-least-missing} + +This is quite tricky as we may have completely missing dateTimes. But we +can test for this by counting the number of observations per dateTime +and then seeing if the dateTimes are contiguous. 
+ +\begin{Shaded} +\begin{Highlighting}[] +\NormalTok{dateTimesDT <-}\StringTok{ }\NormalTok{feederDT[, .(}\DataTypeTok{nFeeders =} \KeywordTok{uniqueN}\NormalTok{(feeder_ID),} + \DataTypeTok{meankW =} \KeywordTok{mean}\NormalTok{(kW, }\DataTypeTok{na.rm =} \OtherTok{TRUE}\NormalTok{)), } +\NormalTok{ keyby =}\StringTok{ }\NormalTok{.(rDateTime, rTime, rDate, season)] }\CommentTok{# keep season} +\NormalTok{dateTimesDT[, dtDiff }\OperatorTok{:}\ErrorTok{=}\StringTok{ }\NormalTok{rDateTime }\OperatorTok{-}\StringTok{ }\KeywordTok{shift}\NormalTok{(rDateTime)] }\CommentTok{# should be 15 mins} + + +\KeywordTok{summary}\NormalTok{(dateTimesDT)} +\end{Highlighting} +\end{Shaded} + +\begin{verbatim} +## rDateTime rTime rDate +## Min. :2001-09-11 15:15:00 Length:549530 Min. :2001-09-11 +## 1st Qu.:2006-02-13 01:30:00 Class1:hms 1st Qu.:2006-02-13 +## Median :2010-01-20 06:00:00 Class2:difftime Median :2010-01-20 +## Mean :2010-01-05 16:51:28 Mode :numeric Mean :2010-01-05 +## 3rd Qu.:2013-12-22 21:30:00 3rd Qu.:2013-12-22 +## Max. :2020-12-31 07:15:00 Max. :2020-12-31 +## NA's :1 NA's :1 +## season nFeeders meankW dtDiff +## Spring:137919 Min. : 1.00 Min. : 0.00 Length:549530 +## Summer:132245 1st Qu.:31.00 1st Qu.: 80.40 Class :difftime +## Autumn:141490 Median :39.00 Median : 96.95 Mode :numeric +## Winter:137876 Mean :39.72 Mean : 97.00 +## 3rd Qu.:47.00 3rd Qu.:113.00 +## Max. :77.00 Max. :439.56 +## NA's :1 +\end{verbatim} + +Let's see how many unique feeders we have per dateTime. Surely we have +at least one sending data each half-hour? 
+ +\begin{Shaded} +\begin{Highlighting}[] +\NormalTok{ggplot2}\OperatorTok{::}\KeywordTok{ggplot}\NormalTok{(dateTimesDT, }\KeywordTok{aes}\NormalTok{(}\DataTypeTok{x =}\NormalTok{ rDate, }\DataTypeTok{y =}\NormalTok{ rTime, }\DataTypeTok{fill =}\NormalTok{ nFeeders)) }\OperatorTok{+} +\StringTok{ }\KeywordTok{geom_tile}\NormalTok{() }\OperatorTok{+} +\StringTok{ }\KeywordTok{scale_fill_viridis_c}\NormalTok{() }\OperatorTok{+} +\StringTok{ }\KeywordTok{labs}\NormalTok{(}\DataTypeTok{caption =} \StringTok{"Number of unique feeders in each dateTime"}\NormalTok{)} +\end{Highlighting} +\end{Shaded} + +\includegraphics{/home/ba1e12/git.Soton/ba1e12/datacleaning/docs/cleaningFeederData_allData_files/figure-latex/tileFeeders-1.pdf} + +No. As we suspected from the previous plots, we clearly have some +dateTimes where we have no data \emph{at all}! + +Are there time of day patterns? It looks like it\ldots{} + +\begin{Shaded} +\begin{Highlighting}[] +\NormalTok{dateTimesDT[, rYear }\OperatorTok{:}\ErrorTok{=}\StringTok{ }\NormalTok{lubridate}\OperatorTok{::}\KeywordTok{year}\NormalTok{(rDateTime)]} +\NormalTok{plotDT <-}\StringTok{ }\NormalTok{dateTimesDT[, .(}\DataTypeTok{meanN =} \KeywordTok{mean}\NormalTok{(nFeeders),} + \DataTypeTok{meankW =} \KeywordTok{mean}\NormalTok{(meankW)), keyby =}\StringTok{ }\NormalTok{.(rTime, season, rYear)]} + +\NormalTok{ggplot2}\OperatorTok{::}\KeywordTok{ggplot}\NormalTok{(plotDT, }\KeywordTok{aes}\NormalTok{(}\DataTypeTok{y =}\NormalTok{ meanN, }\DataTypeTok{x =}\NormalTok{ rTime, }\DataTypeTok{colour =}\NormalTok{ season)) }\OperatorTok{+} +\StringTok{ }\KeywordTok{geom_line}\NormalTok{() }\OperatorTok{+} +\StringTok{ }\KeywordTok{facet_wrap}\NormalTok{(rYear }\OperatorTok{~}\StringTok{ }\NormalTok{.) 
}\OperatorTok{+}
+\StringTok{ }\KeywordTok{labs}\NormalTok{(}\DataTypeTok{y =} \StringTok{"Mean n feeders reporting"}\NormalTok{,}
+ \DataTypeTok{caption =} \StringTok{"Mean n feeders by time of day"}\NormalTok{)}
+\end{Highlighting}
+\end{Shaded}
+
+\includegraphics{/home/ba1e12/git.Soton/ba1e12/datacleaning/docs/cleaningFeederData_allData_files/figure-latex/missingProfiles-1.pdf}
+
+Oh yes. After 2003. Why?
+
+What about the kW?
+
+\begin{Shaded}
+\begin{Highlighting}[]
+\NormalTok{ggplot2}\OperatorTok{::}\KeywordTok{ggplot}\NormalTok{(plotDT, }\KeywordTok{aes}\NormalTok{(}\DataTypeTok{y =}\NormalTok{ meankW, }\DataTypeTok{x =}\NormalTok{ rTime, }\DataTypeTok{colour =}\NormalTok{ season)) }\OperatorTok{+}
+\StringTok{ }\KeywordTok{geom_line}\NormalTok{() }\OperatorTok{+}
+\StringTok{ }\KeywordTok{facet_wrap}\NormalTok{(rYear }\OperatorTok{~}\StringTok{ }\NormalTok{.) }\OperatorTok{+}
+\StringTok{ }\KeywordTok{labs}\NormalTok{(}\DataTypeTok{y =} \StringTok{"Mean kw reporting"}\NormalTok{,}
+ \DataTypeTok{caption =} \StringTok{"Mean kw by time of day"}\NormalTok{)}
+\end{Highlighting}
+\end{Shaded}
+
+\includegraphics{/home/ba1e12/git.Soton/ba1e12/datacleaning/docs/cleaningFeederData_allData_files/figure-latex/kWProfiles-1.pdf}
+
+Those look as we'd expect. But do we see a correlation between the
+number of observations per hour and the mean kW after 2003? There is a
+suspicion that as mean kW goes up, so does the number of observations
+per hour\ldots{} although this could just be a correlation with low
+demand periods (night time?) 
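That suspicion can also be checked numerically rather than by eye: a positive Pearson correlation between the mean number of feeders reporting and the mean kW in each time slot would support it. A hedged sketch using toy vectors standing in for plotDT's meanN and meankW columns (with the real data, something like plotDT[, cor(meanN, meankW), keyby = rYear] would give a per-year figure):

```r
# toy values standing in for plotDT$meanN and plotDT$meankW
meanN  <- c(10, 20, 25, 40, 55)
meankW <- c(82, 91, 95, 118, 130)
# Pearson correlation: values near +1 would support the suspicion
r <- cor(meanN, meankW)
round(r, 2)
```

A strong correlation would not by itself prove threshold-triggered logging, but it would be consistent with it.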
+ +\begin{Shaded} +\begin{Highlighting}[] +\NormalTok{ggplot2}\OperatorTok{::}\KeywordTok{ggplot}\NormalTok{(plotDT, }\KeywordTok{aes}\NormalTok{(}\DataTypeTok{y =}\NormalTok{ meankW, }\DataTypeTok{x =}\NormalTok{ meanN, }\DataTypeTok{colour =}\NormalTok{ season)) }\OperatorTok{+} +\StringTok{ }\KeywordTok{geom_point}\NormalTok{() }\OperatorTok{+} +\StringTok{ }\KeywordTok{facet_wrap}\NormalTok{(rYear }\OperatorTok{~}\StringTok{ }\NormalTok{.) }\OperatorTok{+} +\StringTok{ }\KeywordTok{labs}\NormalTok{(}\DataTypeTok{y =} \StringTok{"Mean kw per quarter hour"}\NormalTok{,} + \DataTypeTok{x =} \StringTok{"Mean number feeders reporting"}\NormalTok{)} +\end{Highlighting} +\end{Shaded} + +\includegraphics{/home/ba1e12/git.Soton/ba1e12/datacleaning/docs/cleaningFeederData_allData_files/figure-latex/compareProfiles-1.pdf} + +Yes. The higher the kW, the more observations we get from 2004 onwards. +Why? + +It is distinctly odd that after 2003: + +\begin{itemize} +\tightlist +\item + we appear to have the most feeders reporting data at `peak' times +\item + we have a lot of missing dateTimes between 00:30 and 05:00 +\end{itemize} + +If the monitors were set to only collect data when the power (or Wh in a +given time frame) was above a given threshold then it would look like +this\ldots{} That wouldn't happen\ldots{} would it? + +\section{\texorpdfstring{Selecting the `best' +days}{Selecting the best days}}\label{selecting-the-best-days} + +Here we use a wide form of the feeder data which has each feeder as a +column. + +We should have 78 feeders. We want to find days when all of these +feeders have complete data. 
+ +The wide dataset has a count of NAs per row (dateTime) from which we +infer how many feeders are reporting: + +\begin{Shaded} +\begin{Highlighting}[] +\NormalTok{wDT <-}\StringTok{ }\NormalTok{drake}\OperatorTok{::}\KeywordTok{readd}\NormalTok{(wideData) }\CommentTok{# back from the drake} +\KeywordTok{head}\NormalTok{(wDT)} +\end{Highlighting} +\end{Shaded} + +\begin{verbatim} +## rDateTime ARRN_ARRN BINS_C1T0 BINS_C2T0 BINS_E1L5 BINS_E2L5 +## 1: 2001-09-11 15:15:00 NA NA NA NA NA +## 2: 2001-09-11 15:30:00 NA NA NA NA NA +## 3: 2001-09-11 15:45:00 NA NA NA NA NA +## 4: 2001-09-21 07:30:00 NA 0 0 0 NA +## 5: 2001-09-21 08:00:00 NA 0 0 0 NA +## 6: 2001-09-21 08:30:00 NA NA NA NA NA +## BINS_E3L5 FFPV_FFPV FRES_E1L5 FRES_E1T0 FRES_E2L5 FRES_E2T0 FRES_E3L5 +## 1: NA NA NA 0 NA 0 NA +## 2: NA NA NA NA NA NA NA +## 3: NA NA NA NA NA NA NA +## 4: NA NA NA NA NA NA NA +## 5: NA NA NA NA NA NA NA +## 6: NA NA NA 0 NA NA NA +## FRES_E4L5 FRES_E6L5 NEWP_E11L5 NEWP_E13L5 NEWP_E15L5 NEWP_E17L5 NEWP_E19L5 +## 1: NA 0 NA NA NA NA NA +## 2: NA NA NA NA NA NA NA +## 3: NA NA NA NA NA NA NA +## 4: NA NA NA NA NA NA NA +## 5: NA NA NA NA NA NA NA +## 6: NA 0 NA NA NA NA NA +## NEWP_E1L5 NEWP_E1T0 NEWP_E2L5 NEWP_E2T0 NEWP_E3L5 NEWP_E3T0 NEWP_E4L5 +## 1: NA NA NA NA NA NA 0 +## 2: NA 0 0 NA NA NA NA +## 3: NA NA NA NA NA NA NA +## 4: NA NA NA NA NA NA NA +## 5: NA NA NA NA NA NA NA +## 6: NA NA NA NA NA NA NA +## NEWP_E5L5 NEWP_E6L5 NEWP_E7L5 NEWP_E8L5 NEWP_E9L5 RYDE_E1L5 RYDE_E1T0 &E1S0 +## 1: 0 NA NA NA NA NA NA +## 2: NA 0 NA NA NA NA NA +## 3: NA NA NA NA NA NA NA +## 4: NA NA NA NA NA NA NA +## 5: NA NA NA NA NA NA NA +## 6: NA NA NA NA NA NA NA +## RYDE_E2L5 RYDE_E2T0 RYDE_E3L5 RYDE_E4L5 RYDE_E5L5 RYDE_E6L5 RYDE_E7L5 +## 1: NA NA NA NA NA NA NA +## 2: NA NA NA NA NA NA NA +## 3: NA NA NA NA NA NA NA +## 4: NA NA NA NA NA NA NA +## 5: NA NA NA NA NA NA NA +## 6: NA NA NA NA NA NA NA +## RYDE_E8L5 RYDE_E9L5 SADO_E1L5 SADO_E1T0 SADO_E2L5 SADO_E2T0 SADO_E3L5 +## 1: NA NA NA NA 
NA NA NA
+## 2: NA NA NA NA NA NA NA
+## 3: NA NA NA NA NA NA NA
+## 4: NA NA NA NA NA NA NA
+## 5: NA NA NA NA NA NA NA
+## 6: NA NA NA NA NA NA NA
+## SADO_E4L5 SADO_E5L5 SADO_E6L5 SADO_E8L5 SHAL_C3L5 SHAL_C4L5 SHAL_C5L5
+## 1: NA NA NA NA NA NA NA
+## 2: NA NA NA NA NA 0 NA
+## 3: NA NA NA NA NA NA NA
+## 4: NA NA NA NA NA NA NA
+## 5: NA NA NA NA NA NA NA
+## 6: NA NA NA NA NA NA NA
+## SHAL_E1L5 SHAL_E1T0 SHAL_E2L5 SHAL_E2T0 SHAL_E3L5 SHAL_E4L5 SHAN_E1L5
+## 1: NA NA NA NA NA NA NA
+## 2: NA NA NA NA NA 0 NA
+## 3: NA NA NA NA NA NA NA
+## 4: NA NA NA NA NA NA NA
+## 5: NA NA NA NA NA NA NA
+## 6: NA NA NA NA NA NA NA
+## SHAN_E1T0 SHAN_E2L5 SHAN_E2T0 SHAN_E3L5 SHAN_E4L5 SHAN_E5L5 SHAN_E6L5
+## 1: NA NA NA NA NA NA NA
+## 2: NA NA NA NA NA NA NA
+## 3: NA NA NA NA NA NA NA
+## 4: NA NA NA NA NA NA NA
+## 5: NA NA NA NA NA NA NA
+## 6: NA NA NA NA NA NA NA
+## SHAN_E7L5 SHAN_E8L5 VENT_E1L5 VENT_E1T0 VENT_E2L5 VENT_E2T0 VENT_E3L5
+## 1: NA NA NA NA NA NA NA
+## 2: NA NA NA 0 NA NA NA
+## 3: NA NA 0 0 NA 0 0
+## 4: NA NA NA NA NA NA NA
+## 5: NA NA NA NA NA NA NA
+## 6: NA NA NA NA NA NA NA
+## VENT_E4L5 VENT_E5L5 VENT_E6L5 nNA nFeedersReporting
+## 1: NA NA NA 73 5
+## 2: NA NA NA 72 6
+## 3: 0 NA NA 73 5
+## 4: NA NA NA 75 3
+## 5: NA NA NA 75 3
+## 6: NA NA NA 76 2
+\end{verbatim}
+
+If we take the mean of the number of feeders reporting per day (date)
+then a value of 78 will indicate a day when \emph{all} feeders have
+\emph{all} data (since it would be the mean of all the '78's). 
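The logic can be illustrated on a toy wide table: the daily mean of nFeedersReporting equals the total number of feeders only if every feeder reports in every quarter-hour of that day. A sketch with two invented feeder columns over four intervals (column names are illustrative, not the real feeder IDs):

```r
library(data.table)

# toy wide data: 2 feeders (A, B) over 4 quarter-hours; B misses one reading
toy <- data.table(rDate = as.Date("2020-01-01"),
                  A = c(1.2, 2.0, 3.1, 4.4),
                  B = c(5.0, NA, 7.2, 8.1))
toy[, nFeedersReporting := rowSums(!is.na(.SD)), .SDcols = c("A", "B")]
# a fully complete day would give a mean of exactly 2 (the feeder count);
# the single NA drags it below 2
meanOK <- toy[, mean(nFeedersReporting)]
meanOK # 1.75 here
```

The same reasoning scales to the full table: any missing reading anywhere in a day pulls the daily mean below the feeder count.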
+
+\begin{Shaded}
+\begin{Highlighting}[]
+\NormalTok{wDT <-}\StringTok{ }\KeywordTok{addSeason}\NormalTok{(wDT, }\DataTypeTok{dateVar =} \StringTok{"rDateTime"}\NormalTok{, }\DataTypeTok{h =} \StringTok{"N"}\NormalTok{)}
+\NormalTok{wDT[, rDoW }\OperatorTok{:}\ErrorTok{=}\StringTok{ }\NormalTok{lubridate}\OperatorTok{::}\KeywordTok{wday}\NormalTok{(rDateTime)]}
+\NormalTok{wDT[, rDate }\OperatorTok{:}\ErrorTok{=}\StringTok{ }\NormalTok{lubridate}\OperatorTok{::}\KeywordTok{date}\NormalTok{(rDateTime)]}
+
+\CommentTok{# how many days have all feeders sending data in all dateTimes?}
+
+\NormalTok{aggDT <-}\StringTok{ }\NormalTok{wDT[, .(}\DataTypeTok{meanOK =} \KeywordTok{mean}\NormalTok{(nFeedersReporting),}
+ \DataTypeTok{minOk =} \KeywordTok{min}\NormalTok{(nFeedersReporting),}
+ \DataTypeTok{maxOk =} \KeywordTok{max}\NormalTok{(nFeedersReporting),}
+ \DataTypeTok{sumOK =} \KeywordTok{sum}\NormalTok{(nFeedersReporting) }\CommentTok{# will have a max of n feeders * 24 hours * 4 quarter hours}
+\NormalTok{),}
+\NormalTok{keyby =}\StringTok{ }\NormalTok{.(rDate, season)]}
+
+\NormalTok{aggDT[, propExpected }\OperatorTok{:}\ErrorTok{=}\StringTok{ }\NormalTok{sumOK}\OperatorTok{/}\NormalTok{(}\KeywordTok{uniqueN}\NormalTok{(feederDT}\OperatorTok{$}\NormalTok{feeder_ID)}\OperatorTok{*}\DecValTok{24}\OperatorTok{*}\DecValTok{4}\NormalTok{)] }\CommentTok{# we expect 78*24*4}
+
+\KeywordTok{summary}\NormalTok{(aggDT)}
+\end{Highlighting}
+\end{Shaded}
+
+\begin{verbatim}
+##      rDate               season         meanOK          minOk
+##  Min.   :2001-09-11   Spring:1531   Min.   : 1.00   Min.   : 0.0
+##  1st Qu.:2006-04-21   Summer:1471   1st Qu.:34.05   1st Qu.:14.0
+##  Median :2010-06-23   Autumn:1568   Median :37.38   Median :18.0
+##  Mean   :2010-07-24   Winter:1525   Mean   :37.54   Mean   :21.2
+##  3rd Qu.:2014-08-24                 3rd Qu.:40.85   3rd Qu.:22.0
+##  Max.   :2020-12-31                 Max.   :63.85   Max.   :62.0
+##  NA's   :1
+##      maxOk           sumOK       propExpected
+##  Min.   : 1.00   Min.   :   1   Min.   
:0.0001335
+##  1st Qu.:53.00   1st Qu.:3261   1st Qu.:0.4354968
+##  Median :57.00   Median :3582   Median :0.4783654
+##  Mean   :54.27   Mean   :3581   Mean   :0.4782117
+##  3rd Qu.:61.00   3rd Qu.:3916   3rd Qu.:0.5229033
+##  Max.   :77.00   Max.   :6130   Max.   :0.8186432
+##
+\end{verbatim}
+
+\begin{Shaded}
+\begin{Highlighting}[]
+\KeywordTok{message}\NormalTok{(}\StringTok{"How many days have 100%?"}\NormalTok{)}
+\KeywordTok{nrow}\NormalTok{(aggDT[propExpected }\OperatorTok{==}\StringTok{ }\DecValTok{1}\NormalTok{])}
+\end{Highlighting}
+\end{Shaded}
+
+\begin{verbatim}
+## [1] 0
+\end{verbatim}
+
+If we plot the mean then we will see which days get closest to having a
+full dataset.
+
+\begin{Shaded}
+\begin{Highlighting}[]
+\NormalTok{ggplot2}\OperatorTok{::}\KeywordTok{ggplot}\NormalTok{(aggDT, }\KeywordTok{aes}\NormalTok{(}\DataTypeTok{x =}\NormalTok{ rDate, }\DataTypeTok{colour =}\NormalTok{ season, }\DataTypeTok{y =}\NormalTok{ meanOK)) }\OperatorTok{+}\StringTok{ }\KeywordTok{geom_point}\NormalTok{()}
+\end{Highlighting}
+\end{Shaded}
+
+\includegraphics{/home/ba1e12/git.Soton/ba1e12/datacleaning/docs/cleaningFeederData_allData_files/figure-latex/bestDaysMean-1.pdf}
+
+Re-plot by the \% of expected if we assume we \emph{should} have 78
+feeders * 24 hours * 4 per hour (will be the same shape):
+
+\begin{Shaded}
+\begin{Highlighting}[]
+\NormalTok{ggplot2}\OperatorTok{::}\KeywordTok{ggplot}\NormalTok{(aggDT, }\KeywordTok{aes}\NormalTok{(}\DataTypeTok{x =}\NormalTok{ rDate, }\DataTypeTok{colour =}\NormalTok{ season, }\DataTypeTok{y =} \DecValTok{100}\OperatorTok{*}\NormalTok{propExpected)) }\OperatorTok{+}\StringTok{ }\KeywordTok{geom_point}\NormalTok{() }\OperatorTok{+}
+\StringTok{ }\KeywordTok{labs}\NormalTok{(}\DataTypeTok{y =} \StringTok{"%"}\NormalTok{)}
+\end{Highlighting}
+\end{Shaded}
+
+\includegraphics{/home/ba1e12/git.Soton/ba1e12/datacleaning/docs/cleaningFeederData_allData_files/figure-latex/bestDaysProp-1.pdf}
+
+This also tells us that there is some reason why we get 
fluctuations in
+the number of data points per hour after 2003.
+
+\section{Summary}\label{summary}
+
+So there are no days with 100\% data. We need a different approach.
+
+\section{Runtime}\label{runtime}
+
+Analysis completed in 211.44 seconds ( 3.52 minutes) using
+\href{https://cran.r-project.org/package=knitr}{knitr} in
+\href{http://www.rstudio.com}{RStudio} with R version 3.6.0 (2019-04-26)
+running on x86\_64-redhat-linux-gnu.
+
+\section{R environment}\label{r-environment}
+
+\subsection{R packages used}\label{r-packages-used}
+
+\begin{itemize}
+\tightlist
+\item
+  base R (R Core Team 2016)
+\item
+  bookdown (Xie 2018)
+\item
+  data.table (Dowle et al. 2015)
+\item
+  ggplot2 (Wickham 2009)
+\item
+  kableExtra (Zhu 2019)
+\item
+  knitr (Xie 2016)
+\item
+  lubridate (Grolemund and Wickham 2011)
+\item
+  rmarkdown (Allaire et al. 2020)
+\item
+  skimr (Arino de la Rubia et al. 2017)
+\end{itemize}
+
+\subsection{Session info}\label{session-info}
+
+\begin{verbatim}
+## R version 3.6.0 (2019-04-26)
+## Platform: x86_64-redhat-linux-gnu (64-bit)
+## Running under: Red Hat Enterprise Linux
+##
+## Matrix products: default
+## BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
+##
+## locale:
+##  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
+##  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
+##  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
+##  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
+##  [9] LC_ADDRESS=C               LC_TELEPHONE=C
+## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
+##
+## attached base packages:
+## [1] stats     graphics  grDevices utils     datasets  methods   base
+##
+## other attached packages:
+## [1] kableExtra_1.1.0   skimr_2.1.1        ggplot2_3.3.1      hms_0.5.3
+## [5] lubridate_1.7.9    here_0.1           drake_7.12.2       data.table_1.12.0
+## [9] dataCleaning_0.1.0
+##
+## loaded via a namespace (and not attached):
+##  [1] storr_1.2.1       progress_1.2.2    tidyselect_1.1.0  xfun_0.14
+##  [5] repr_1.1.0        purrr_0.3.4       colorspace_1.4-0  vctrs_0.3.1
+##  [9] generics_0.0.2    viridisLite_0.3.0 htmltools_0.3.6   
yaml_2.2.0
+## [13] base64enc_0.1-3   rlang_0.4.6       pillar_1.4.4      txtq_0.2.0
+## [17] glue_1.4.1        withr_2.1.2       lifecycle_0.2.0   stringr_1.4.0
+## [21] munsell_0.5.0     gtable_0.2.0      rvest_0.3.5       evaluate_0.14
+## [25] labeling_0.3      knitr_1.28        parallel_3.6.0    fansi_0.4.0
+## [29] highr_0.7         Rcpp_1.0.1        readr_1.3.1       scales_1.0.0
+## [33] backports_1.1.3   filelock_1.0.2    webshot_0.5.2     jsonlite_1.6
+## [37] digest_0.6.25     stringi_1.2.4     dplyr_1.0.0       grid_3.6.0
+## [41] rprojroot_1.3-2   cli_2.0.2         tools_3.6.0       magrittr_1.5
+## [45] base64url_1.4     tibble_3.0.1      crayon_1.3.4      pkgconfig_2.0.2
+## [49] ellipsis_0.3.1    xml2_1.3.2        prettyunits_1.0.2 httr_1.4.1
+## [53] assertthat_0.2.0  rmarkdown_2.2     rstudioapi_0.11   R6_2.3.0
+## [57] igraph_1.2.2      compiler_3.6.0
+\end{verbatim}
+
+\section{The raw data cleaning code}\label{the-raw-data-cleaning-code}
+
+(c) Mikey Harper :-)
+
+Starts here:
+
+Scripts used to clean and merge substation data.
+
+\subsection{Input files}\label{input-files}
+
+Analysis will first look at the primary data. There are different types
+of files which refer to different parameters. Different search terms are
+used to extract these:
+
+\begin{Shaded}
+\begin{Highlighting}[]
+\CommentTok{# Find files with AMPS. 
Exclude files which contain DI~CO} +\NormalTok{files_AMPS <-}\StringTok{ }\KeywordTok{list.files}\NormalTok{(}\StringTok{"../Primary"}\NormalTok{, }\DataTypeTok{recursive =}\NormalTok{ T, }\DataTypeTok{pattern =} \StringTok{"~AMPS"}\NormalTok{, }\DataTypeTok{full.names =}\NormalTok{ T) }\OperatorTok{%>%} +\StringTok{ }\NormalTok{.[}\OperatorTok{!}\NormalTok{stringr}\OperatorTok{::}\KeywordTok{str_detect}\NormalTok{ (., }\StringTok{"DI~CO"}\NormalTok{)]} + +\NormalTok{files_AMPS} +\end{Highlighting} +\end{Shaded} + +\subsection{Process Amps}\label{process-amps} + +\begin{Shaded} +\begin{Highlighting}[] +\CommentTok{# Show a sample} +\NormalTok{fileSelect <-}\StringTok{ }\NormalTok{files_AMPS[}\DecValTok{4}\NormalTok{]} +\KeywordTok{head}\NormalTok{(}\KeywordTok{read_csv}\NormalTok{(fileSelect, }\DataTypeTok{skip =} \DecValTok{3}\NormalTok{))} +\end{Highlighting} +\end{Shaded} + +Again a function is used to do all the processing on the input CSVs. +This is slightly amended from the \texttt{processkV} function. 
+
+\begin{Shaded}
+\begin{Highlighting}[]
+\NormalTok{processAMPS <-}\StringTok{ }\ControlFlowTok{function}\NormalTok{(filePath, }\DataTypeTok{databaseCon =}\NormalTok{ con)\{}
+
+  \KeywordTok{message}\NormalTok{(}\StringTok{"Processing "}\NormalTok{, filePath)}
+
+  \CommentTok{# 1st Level}
+\NormalTok{  dirName_}\DecValTok{1}\NormalTok{ <-}\StringTok{ }\NormalTok{filePath }\OperatorTok{%>%}\StringTok{ }
+\StringTok{    }\KeywordTok{dirname}\NormalTok{() }\OperatorTok{%>%}\StringTok{ }
+\StringTok{    }\NormalTok{basename}
+
+  \CommentTok{# 2nd Level}
+\NormalTok{  dirName_}\DecValTok{2}\NormalTok{ <-}\StringTok{ }\NormalTok{filePath }\OperatorTok{%>%}\StringTok{ }
+\StringTok{    }\KeywordTok{dirname}\NormalTok{() }\OperatorTok{%>%}\StringTok{ }
+\StringTok{    }\KeywordTok{dirname}\NormalTok{() }\OperatorTok{%>%}\StringTok{ }
+\StringTok{    }\NormalTok{basename}
+
+  \ControlFlowTok{if}\NormalTok{ (dirName_}\DecValTok{2} \OperatorTok{==}\StringTok{ "Primary"}\NormalTok{)\{}
+\NormalTok{    dirName_}\DecValTok{2}\NormalTok{ <-}\StringTok{ }\NormalTok{dirName_}\DecValTok{1}
+\NormalTok{    dirName_}\DecValTok{1}\NormalTok{ <-}\StringTok{ ""}
+\NormalTok{  \}}
+
+  \CommentTok{# Load the CSV. There were some tab separated files which are saved as CSVs, which confuse the search. Therefore if the data is loaded incorrectly (only having a single column), the code will try and load it as a TSV.}
+\NormalTok{  dataLoaded <-}\StringTok{ }\KeywordTok{suppressWarnings}\NormalTok{(}\KeywordTok{read_csv}\NormalTok{(filePath, }\DataTypeTok{skip =} \DecValTok{3}\NormalTok{, }\DataTypeTok{col_types =} \KeywordTok{cols}\NormalTok{(}\DataTypeTok{Value =} \KeywordTok{col_number}\NormalTok{())))}
+  \ControlFlowTok{if}\NormalTok{(}\KeywordTok{ncol}\NormalTok{(dataLoaded) }\OperatorTok{==}\StringTok{ }\DecValTok{1}\NormalTok{)\{}
+\NormalTok{    dataLoaded <-}\StringTok{ }\KeywordTok{suppressWarnings}\NormalTok{(}\KeywordTok{read_tsv}\NormalTok{(filePath, }\DataTypeTok{skip =} \DecValTok{3}\NormalTok{, }\DataTypeTok{col_types =} \KeywordTok{cols}\NormalTok{()))}
+\NormalTok{  \}}
+
+  \CommentTok{# Reformat data}
+\NormalTok{  dataLoaded <-}
+\StringTok{    }\NormalTok{dataLoaded }\OperatorTok{%>%}
+\StringTok{    }\KeywordTok{mutate_at}\NormalTok{(}\KeywordTok{vars}\NormalTok{(Time), }\ControlFlowTok{function}\NormalTok{(x)\{}\KeywordTok{gsub}\NormalTok{(}\StringTok{'[^ -~]'}\NormalTok{, }\StringTok{''}\NormalTok{, x)\}) }\OperatorTok{%>%}\StringTok{ }\CommentTok{# Remove invalid UTF characters}
+\StringTok{    }\KeywordTok{mutate}\NormalTok{(}\DataTypeTok{Time =}\NormalTok{ lubridate}\OperatorTok{::}\KeywordTok{dmy_hms}\NormalTok{(Time),}
+           \DataTypeTok{Time =}\NormalTok{ lubridate}\OperatorTok{::}\KeywordTok{floor_date}\NormalTok{(Time, }\DataTypeTok{unit =} \StringTok{"15 minutes"}\NormalTok{)) }\OperatorTok{%>%}\StringTok{ }
+\StringTok{    }\KeywordTok{group_by}\NormalTok{(Time) }\OperatorTok{%>%}
+\StringTok{    }\KeywordTok{summarise}\NormalTok{(}\DataTypeTok{Value =} \KeywordTok{mean}\NormalTok{(Value, }\DataTypeTok{na.rm =}\NormalTok{ T)) }\OperatorTok{%>%}
+\StringTok{    }\KeywordTok{mutate}\NormalTok{(}\DataTypeTok{region =}\NormalTok{ dirName_}\DecValTok{2}\NormalTok{,}
+           \DataTypeTok{sub_region =}\NormalTok{ dirName_}\DecValTok{1}
+\NormalTok{    )}
+
+  \CommentTok{# There are some datasets which contain no values, which can cause errors in running}
+  \CommentTok{# If this happens, return NULL}
+  \ControlFlowTok{if}\NormalTok{(}\KeywordTok{is.character}\NormalTok{(dataLoaded}\OperatorTok{$}\NormalTok{Value)) }\KeywordTok{return}\NormalTok{(}\OtherTok{NULL}\NormalTok{)}
+
+  \KeywordTok{return}\NormalTok{(dataLoaded)}
+\NormalTok{\}}
+\end{Highlighting}
+\end{Shaded}
+
+Run the function below:
+
+\begin{Shaded}
+\begin{Highlighting}[]
+\NormalTok{Amps <-}\StringTok{ }\NormalTok{purrr}\OperatorTok{::}\KeywordTok{map_df}\NormalTok{(files_AMPS, processAMPS)}
+\end{Highlighting}
+\end{Shaded}
+
+\begin{Shaded}
+\begin{Highlighting}[]
+\NormalTok{Amps_stats <-}\StringTok{ }\NormalTok{Amps }\OperatorTok{%>%}
+\StringTok{  }\KeywordTok{group_by}\NormalTok{(region) }\OperatorTok{%>%}
+\StringTok{  }\KeywordTok{summarise}\NormalTok{(}\DataTypeTok{mean =}\NormalTok{ (}\KeywordTok{mean}\NormalTok{(Value, }\DataTypeTok{na.rm =}\NormalTok{ T)),}
+            \DataTypeTok{n =} \KeywordTok{n}\NormalTok{(),}
+            \DataTypeTok{sd =} \KeywordTok{sd}\NormalTok{(Value, }\DataTypeTok{na.rm =}\NormalTok{ T),}
+            \DataTypeTok{var =} \KeywordTok{var}\NormalTok{(Value, }\DataTypeTok{na.rm =}\NormalTok{ T))}
+
+\NormalTok{Amps_stats}
+
+\NormalTok{readr}\OperatorTok{::}\KeywordTok{write_csv}\NormalTok{(Amps_stats, }\DataTypeTok{path =} \StringTok{"../Amps_stats.csv"}\NormalTok{)}
+\end{Highlighting}
+\end{Shaded}
+
+\begin{Shaded}
+\begin{Highlighting}[]
+\KeywordTok{ggplot}\NormalTok{(Amps) }\OperatorTok{+}
+\StringTok{  }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{x =}\NormalTok{ Time, }\DataTypeTok{y =}\NormalTok{ Value, }\DataTypeTok{colour =}\NormalTok{ region)) }\OperatorTok{+}
+\StringTok{  }\KeywordTok{facet_grid}\NormalTok{(region}\OperatorTok{~}\NormalTok{., }\DataTypeTok{scales =} \StringTok{"free_y"}\NormalTok{) }\OperatorTok{+}
+\StringTok{  }\KeywordTok{labs}\NormalTok{(}\DataTypeTok{title =} \StringTok{"Cleaned data for 
Amps"}\NormalTok{)}
+\end{Highlighting}
+\end{Shaded}
+
+\subsection{Processing data}\label{processing-data}
+
+\begin{Shaded}
+\begin{Highlighting}[]
+\NormalTok{readr}\OperatorTok{::}\KeywordTok{write_csv}\NormalTok{(Amps, }\DataTypeTok{path =} \StringTok{"amps_all_substations.csv"}\NormalTok{)}
+\end{Highlighting}
+\end{Shaded}
+
+\begin{Shaded}
+\begin{Highlighting}[]
+\KeywordTok{library}\NormalTok{(odbc)}
+
+\KeywordTok{library}\NormalTok{(DBI)}
+\CommentTok{# Create a file-backed RSQLite database}
+\NormalTok{con <-}\StringTok{ }\KeywordTok{dbConnect}\NormalTok{(RSQLite}\OperatorTok{::}\KeywordTok{SQLite}\NormalTok{(), }\StringTok{"amps.sqlite"}\NormalTok{)}
+
+\KeywordTok{dbListTables}\NormalTok{(con)}
+
+
+\KeywordTok{dbWriteTable}\NormalTok{(con, }\StringTok{"amps"}\NormalTok{, Amps)}
+\KeywordTok{dbListTables}\NormalTok{(con)}
+\end{Highlighting}
+\end{Shaded}
+
+\subsection{Querying the data}\label{querying-the-data}
+
+\begin{Shaded}
+\begin{Highlighting}[]
+\NormalTok{con <-}\StringTok{ }\KeywordTok{dbConnect}\NormalTok{(RSQLite}\OperatorTok{::}\KeywordTok{SQLite}\NormalTok{(), }\StringTok{"amps.sqlite"}\NormalTok{)}
+
+
+\KeywordTok{library}\NormalTok{(dbplyr)}
+
+\NormalTok{Amps_db <-}\StringTok{ }\KeywordTok{tbl}\NormalTok{(con, }\StringTok{"amps"}\NormalTok{)}
+
+
+\NormalTok{Amps_db }\OperatorTok{%>%}
+\StringTok{  }\KeywordTok{group_by}\NormalTok{(region) }\OperatorTok{%>%}
+\StringTok{  }\KeywordTok{summarise}\NormalTok{(}\DataTypeTok{mean =}\NormalTok{ (}\KeywordTok{mean}\NormalTok{(Value, }\DataTypeTok{na.rm =}\NormalTok{ T)),}
+            \DataTypeTok{n =} \KeywordTok{n}\NormalTok{(),}
+            \DataTypeTok{sd =} \KeywordTok{sd}\NormalTok{(Value, }\DataTypeTok{na.rm =}\NormalTok{ T),}
+            \DataTypeTok{var =} \KeywordTok{var}\NormalTok{(Value, }\DataTypeTok{na.rm =}\NormalTok{ T))}
+\end{Highlighting}
+\end{Shaded}
+
+\subsection{Round to Nearest N
+minutes}\label{round-to-nearest-n-minutes}
+
+\begin{Shaded}
+\begin{Highlighting}[] 
+\NormalTok{processAMPS_5mins <-}\StringTok{ }\ControlFlowTok{function}\NormalTok{(filePath)\{}
+
+  \KeywordTok{message}\NormalTok{(}\StringTok{"Processing "}\NormalTok{, filePath)}
+
+  \CommentTok{# 1st Level}
+\NormalTok{  dirName_}\DecValTok{1}\NormalTok{ <-}\StringTok{ }\NormalTok{filePath }\OperatorTok{%>%}\StringTok{ }
+\StringTok{    }\KeywordTok{dirname}\NormalTok{() }\OperatorTok{%>%}\StringTok{ }
+\StringTok{    }\NormalTok{basename}
+
+  \CommentTok{# 2nd Level}
+\NormalTok{  dirName_}\DecValTok{2}\NormalTok{ <-}\StringTok{ }\NormalTok{filePath }\OperatorTok{%>%}\StringTok{ }
+\StringTok{    }\KeywordTok{dirname}\NormalTok{() }\OperatorTok{%>%}\StringTok{ }
+\StringTok{    }\KeywordTok{dirname}\NormalTok{() }\OperatorTok{%>%}\StringTok{ }
+\StringTok{    }\NormalTok{basename}
+
+  \ControlFlowTok{if}\NormalTok{ (dirName_}\DecValTok{2} \OperatorTok{==}\StringTok{ "Primary"}\NormalTok{)\{}
+\NormalTok{    dirName_}\DecValTok{2}\NormalTok{ <-}\StringTok{ }\NormalTok{dirName_}\DecValTok{1}
+\NormalTok{    dirName_}\DecValTok{1}\NormalTok{ <-}\StringTok{ ""}
+\NormalTok{  \}}
+
+  \CommentTok{# Load the CSV. There were some tab separated files which are saved as CSVs, which confuse the search. Therefore if the data is loaded incorrectly (only having a single column), the code will try and load it as a TSV.}
+\NormalTok{  dataLoaded <-}\StringTok{ }\KeywordTok{suppressWarnings}\NormalTok{(}\KeywordTok{read_csv}\NormalTok{(filePath, }\DataTypeTok{skip =} \DecValTok{3}\NormalTok{, }\DataTypeTok{col_types =} \KeywordTok{cols}\NormalTok{()))}
+  \ControlFlowTok{if}\NormalTok{(}\KeywordTok{ncol}\NormalTok{(dataLoaded) }\OperatorTok{==}\StringTok{ }\DecValTok{1}\NormalTok{)\{}
+\NormalTok{    dataLoaded <-}\StringTok{ }\KeywordTok{suppressWarnings}\NormalTok{(}\KeywordTok{read_tsv}\NormalTok{(filePath, }\DataTypeTok{skip =} \DecValTok{3}\NormalTok{, }\DataTypeTok{col_types =} \KeywordTok{cols}\NormalTok{()))}
+\NormalTok{  \}}
+
+  \CommentTok{# Reformat data}
+\NormalTok{  dataLoaded <-}
+\StringTok{    }\NormalTok{dataLoaded }\OperatorTok{%>%}
+\StringTok{    }\KeywordTok{mutate_at}\NormalTok{(}\KeywordTok{vars}\NormalTok{(Time), }\ControlFlowTok{function}\NormalTok{(x)\{}\KeywordTok{gsub}\NormalTok{(}\StringTok{'[^ -~]'}\NormalTok{, }\StringTok{''}\NormalTok{, x)\}) }\OperatorTok{%>%}\StringTok{ }\CommentTok{# Remove invalid UTF characters}
+\StringTok{    }\KeywordTok{mutate}\NormalTok{(}\DataTypeTok{Time =}\NormalTok{ lubridate}\OperatorTok{::}\KeywordTok{dmy_hms}\NormalTok{(Time),}
+           \DataTypeTok{region =}\NormalTok{ dirName_}\DecValTok{2}\NormalTok{,}
+           \DataTypeTok{sub_region =}\NormalTok{ dirName_}\DecValTok{1}\NormalTok{,}
+           \DataTypeTok{code =} \KeywordTok{paste}\NormalTok{(region, sub_region, }\DataTypeTok{sep =} \StringTok{"_"}\NormalTok{),}
+\NormalTok{    ) }\OperatorTok{%>%}
+\StringTok{    }\KeywordTok{mutate}\NormalTok{(}\DataTypeTok{Time =}\NormalTok{ lubridate}\OperatorTok{::}\KeywordTok{floor_date}\NormalTok{(Time, }\DataTypeTok{unit =} \StringTok{"5 minutes"}\NormalTok{)) }\OperatorTok{%>%}
+\StringTok{    }\KeywordTok{group_by}\NormalTok{(Time, region, code) }\OperatorTok{%>%}
+\StringTok{    }\KeywordTok{summarise}\NormalTok{(}\DataTypeTok{Value =} 
\KeywordTok{mean}\NormalTok{(Value)) }\OperatorTok{%>%}
+\StringTok{    }\KeywordTok{arrange}\NormalTok{(Time)}
+
+  \CommentTok{# There are some datasets which contain no values, which can cause errors in running}
+  \CommentTok{# If this happens, return NULL}
+  \ControlFlowTok{if}\NormalTok{(}\KeywordTok{is.character}\NormalTok{(dataLoaded}\OperatorTok{$}\NormalTok{Value)) }\KeywordTok{return}\NormalTok{(}\OtherTok{NULL}\NormalTok{)}
+
+  \CommentTok{# Returns the loaded and cleaned dataframe}
+  \KeywordTok{return}\NormalTok{(dataLoaded)}
+\NormalTok{\}}
+\end{Highlighting}
+\end{Shaded}
+
+Nearest 5 minutes:
+
+\begin{Shaded}
+\begin{Highlighting}[]
+\NormalTok{Amps_5mins <<-}\StringTok{ }\NormalTok{purrr}\OperatorTok{::}\KeywordTok{map_df}\NormalTok{(files_AMPS[}\DecValTok{1}\OperatorTok{:}\DecValTok{4}\NormalTok{], processAMPS_5mins)}
+\end{Highlighting}
+\end{Shaded}
+
+\section*{References}\label{references}
+\addcontentsline{toc}{section}{References}
+
+\hypertarget{refs}{}
+\hypertarget{ref-rmarkdown}{}
+Allaire, JJ, Yihui Xie, Jonathan McPherson, Javier Luraschi, Kevin
+Ushey, Aron Atkins, Hadley Wickham, Joe Cheng, Winston Chang, and
+Richard Iannone. 2020. \emph{Rmarkdown: Dynamic Documents for R}.
+\url{https://github.com/rstudio/rmarkdown}.
+
+\hypertarget{ref-skimr}{}
+Arino de la Rubia, Eduardo, Hao Zhu, Shannon Ellis, Elin Waring, and
+Michael Quinn. 2017. \emph{Skimr: Skimr}.
+\url{https://github.com/ropenscilabs/skimr}.
+
+\hypertarget{ref-data.table}{}
+Dowle, M, A Srinivasan, T Short, S Lianoglou with contributions from R
+Saporta, and E Antonyan. 2015. \emph{Data.table: Extension of
+Data.frame}. \url{https://CRAN.R-project.org/package=data.table}.
+
+\hypertarget{ref-lubridate}{}
+Grolemund, Garrett, and Hadley Wickham. 2011. ``Dates and Times Made
+Easy with lubridate.'' \emph{Journal of Statistical Software} 40 (3):
+1--25. \url{http://www.jstatsoft.org/v40/i03/}.
+
+\hypertarget{ref-baseR}{}
+R Core Team. 2016. 
\emph{R: A Language and Environment for Statistical +Computing}. Vienna, Austria: R Foundation for Statistical Computing. +\url{https://www.R-project.org/}. + +\hypertarget{ref-ggplot2}{} +Wickham, Hadley. 2009. \emph{Ggplot2: Elegant Graphics for Data +Analysis}. Springer-Verlag New York. \url{http://ggplot2.org}. + +\hypertarget{ref-knitr}{} +Xie, Yihui. 2016. \emph{Knitr: A General-Purpose Package for Dynamic +Report Generation in R}. \url{https://CRAN.R-project.org/package=knitr}. + +\hypertarget{ref-bookdown}{} +---------. 2018. \emph{Bookdown: Authoring Books and Technical Documents +with R Markdown}. \url{https://github.com/rstudio/bookdown}. + +\hypertarget{ref-kableExtra}{} +Zhu, Hao. 2019. \emph{KableExtra: Construct Complex Table with 'Kable' +and Pipe Syntax}. \url{https://CRAN.R-project.org/package=kableExtra}. + +\end{document}