Commit 4d734d62 authored by B.Anderson

fixed bib (was absent); html no longer self-contained so plots exist separately; floating ToC still broken in html (why?); pdf fails (probably table too big)
parent c11387e4
@@ -9,25 +9,24 @@ author: '`r params$authors`'
date: 'Last run at: `r Sys.time()`'
output:
bookdown::html_document2:
code_folding: hide
self_contained: TRUE
fig_caption: yes
code_folding: hide
number_sections: yes
self_contained: yes
toc: yes
toc_depth: 3
toc_float: yes
bookdown::word_document2:
fig_caption: yes
toc: yes
toc_depth: 2
toc_float: TRUE
bookdown::pdf_document2:
fig_caption: yes
keep_tex: yes
number_sections: yes
bookdown::word_document2:
fig_caption: yes
number_sections: yes
toc: yes
toc_depth: 2
fig_width: 5
always_allow_html: yes
bibliography: '`r path.expand("~/bibliography.bib")`'
bibliography: '`r paste0(here::here(), "/bibliography.bib")`'
---
```{r setup}
@@ -73,8 +72,12 @@ Loaded data from `r dFile`... (using drake)
```{r loadData}
origDataDT <- drake::readd(origData) # readd the drake object
head(origDataDT)
uniqDataDT <- drake::readd(uniqData) # readd the drake object
kableExtra::kable(head(origDataDT), digits = 2,
caption = "Counts per feeder (long table)") %>%
kable_styling()
```
Check data prep worked OK.
@@ -82,8 +85,8 @@ Check data prep worked OK.
```{r dataPrep}
# check
t <- origDataDT[, .(nObs = .N,
firstDate = min(rDateTime),
lastDate = max(rDateTime),
firstDate = min(rDateTime, na.rm = TRUE),
lastDate = max(rDateTime, na.rm = TRUE),
meankW = mean(kW, na.rm = TRUE)
), keyby = .(region, feeder_ID)]
@@ -104,7 +107,7 @@ message("So we have ", tidyNum(nrow(origDataDT) - nrow(uniqDataDT)), " duplicate
pc <- 100*((nrow(origDataDT) - nrow(uniqDataDT))/nrow(origDataDT))
message("That's ", round(pc,2), "%")
feederDT <- uniqDataDT # use dt with no duplicates
feederDT <- uniqDataDT[!is.na(rDateTime)] # use dt with no duplicates
origDataDT <- NULL # save memory
```
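The `na.rm = TRUE` arguments and the `!is.na(rDateTime)` filter in this hunk guard against missing timestamps, and `pc` is the percentage of rows that were duplicates. A minimal base-R sketch of both steps on a hypothetical five-row table (values invented for illustration; `unique()` on a data.frame is an assumed stand-in for however `uniqData` is actually built in the drake plan):

```r
# Hypothetical toy data standing in for origDataDT (values are invented)
toy <- data.frame(
  feeder_ID = c("F1", "F1", "F1", "F2", "F2"),
  rDateTime = as.POSIXct(c("2003-01-01 00:00:00", "2003-01-01 00:15:00",
                           "2003-01-01 00:15:00", NA,
                           "2003-01-01 00:00:00"), tz = "UTC"),
  kW = c(1.2, 1.5, 1.5, 0.9, 2.0)
)

# min() returns NA if any timestamp is missing, hence na.rm = TRUE
firstNaive <- min(toy$rDateTime)                # NA
firstSafe  <- min(toy$rDateTime, na.rm = TRUE)  # 2003-01-01 00:00:00 UTC

# Drop exact duplicate rows, then rows with no timestamp
uniq  <- unique(toy)
clean <- uniq[!is.na(uniq$rDateTime), ]

# Percentage of rows that were duplicates, as in the pc calculation
pc <- 100 * (nrow(toy) - nrow(uniq)) / nrow(toy)
message("That's ", round(pc, 2), "%")  # 1 duplicate out of 5 rows: 20%
```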
@@ -114,7 +117,7 @@ So we remove the duplicates...
Try aggregated demand profiles of mean kW by season and feeder and day of the week... Remove the legend so we can see the plot.
```{r kwProfiles}
```{r kwProfiles, fig.width=8}
plotDT <- feederDT[, .(meankW = mean(kW),
nObs = .N), keyby = .(rTime, season, feeder_ID, rDoW)]
@@ -131,7 +134,7 @@ Is that what we expect?
Number of observations per feeder per day: gaps will be visible (totally missing days), as will low counts (partially missing days). We would expect 24 * 4 = 96 per day, so convert this to a % of expected...
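As a worked example of the expected count: readings every 15 minutes give 24 * 4 = 96 per feeder per day, so `propExpected` is `nObs / 96`. A base-R sketch on hypothetical toy data (one complete feeder-day and one half day), using `aggregate()` in place of the data.table grouping the report uses:

```r
# Hypothetical 15-minute readings: F1 has a full day, F2 only the first 12 hours
times <- seq(as.POSIXct("2003-01-01 00:00:00", tz = "UTC"),
             by = "15 min", length.out = 96)
toy <- data.frame(
  feeder_ID = c(rep("F1", 96), rep("F2", 48)),
  rDate     = as.Date(c(times, times[1:48]), tz = "UTC"),
  one       = 1
)

# Count observations per feeder per day
counts <- aggregate(one ~ feeder_ID + rDate, data = toy, FUN = sum)
names(counts)[names(counts) == "one"] <- "nObs"

# 24 hours * 4 readings per hour = 96 expected readings per feeder-day
counts$propExpected <- counts$nObs / (24 * 4)
counts
```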
```{r basicCountTile, fig.height=10}
```{r basicCountTile, fig.height=10, fig.width=8}
plotDT <- feederDT[, .(nObs = .N), keyby = .(rDate, feeder_ID)]
plotDT[, propExpected := nObs/(24*4)]
@@ -148,7 +151,7 @@ This is not good. There are both gaps (missing days) and partial days. **Lots**
What does it look like if we aggregate across all feeders by time? There are `r uniqueN(feederDT$feeder_ID)` feeders, so at best we should get this many observations per time. How close do we get?
```{r aggVisN}
```{r aggVisN, fig.width=8}
plotDT <- feederDT[, .(nObs = .N,
meankW = mean(kW)), keyby = .(rTime, rDate, season)]
@@ -167,7 +170,7 @@ That really doesn't look too good. There are some very odd fluctuations in there
What do the mean kW patterns look like per feeder per day?
```{r basickWTile, fig.height=10}
```{r basickWTile, fig.height=10, fig.width=8}
plotDT <- feederDT[, .(meankW = mean(kW, na.rm = TRUE)), keyby = .(rDate, feeder_ID)]
ggplot2::ggplot(plotDT, aes(x = rDate, y = feeder_ID, fill = meankW)) +
@@ -183,7 +186,7 @@ Missing data is even more clearly visible.
What about mean kW across all feeders?
```{r aggViskW}
```{r aggViskW, fig.width=8}
plotDT <- feederDT[, .(nObs = .N,
meankW = mean(kW)), keyby = .(rTime, rDate, season)]
@@ -213,7 +216,7 @@ summary(dateTimesDT)
Let's see how many unique feeders we have per dateTime. Surely we have at least one sending data each half-hour?
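The per-timestamp feeder count can be sketched in base R with `tapply()`, counting distinct `feeder_ID` values at each timestamp. The toy data below are hypothetical, and the real computation of `dateTimesDT$nFeeders` sits in a collapsed part of the diff, so treat this as an illustration of the idea rather than the report's code:

```r
# Hypothetical long-format readings: three feeders at 00:00, only F1 at 00:30
# (F1 appears twice at 00:30 to show duplicates are counted once)
toy <- data.frame(
  feeder_ID = c("F1", "F2", "F3", "F1", "F1"),
  rDateTime = as.POSIXct(c("2003-01-01 00:00:00", "2003-01-01 00:00:00",
                           "2003-01-01 00:00:00", "2003-01-01 00:30:00",
                           "2003-01-01 00:30:00"), tz = "UTC")
)

# Number of distinct feeders reporting at each timestamp
nFeeders <- tapply(toy$feeder_ID,
                   format(toy$rDateTime, "%Y-%m-%d %H:%M"),
                   function(ids) length(unique(ids)))
nFeeders
```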
```{r tileFeeders}
```{r tileFeeders, fig.width=8}
ggplot2::ggplot(dateTimesDT, aes(x = rDate, y = rTime, fill = nFeeders)) +
geom_tile() +
scale_fill_viridis_c() +
@@ -224,7 +227,7 @@ No. As we suspected from the previous plots, we clearly have some dateTimes wher
Are there time-of-day patterns? It looks like it...
```{r missingProfiles}
```{r missingProfiles, fig.width=8}
dateTimesDT[, rYear := lubridate::year(rDateTime)]
plotDT <- dateTimesDT[, .(meanN = mean(nFeeders),
meankW = mean(meankW)), keyby = .(rTime, season, rYear)]
@@ -240,7 +243,7 @@ Oh yes. After 2003. Why?
What about the kW?
```{r kWProfiles}
```{r kWProfiles, fig.width=8}
ggplot2::ggplot(plotDT, aes(y = meankW, x = rTime, colour = season)) +
geom_line() +
@@ -251,7 +254,7 @@ ggplot2::ggplot(plotDT, aes(y = meankW, x = rTime, colour = season)) +
Those look as we'd expect. But do we see a correlation between the number of observations per hour and the mean kW after 2003? There is a suspicion that as mean kW goes up so does the number of observations per hour, although this could just be a correlation with low-demand periods (night time?).
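One way to put a number on that suspicion is a correlation of `meankW` against `meanN` within each season and year, computed on a table shaped like `plotDT`. A base-R sketch with hypothetical values (chosen so one group is perfectly positive and the other perfectly negative); a strong correlation would still not separate a reporting artefact from the low-demand explanation:

```r
# Hypothetical profile table shaped like plotDT (values invented)
plotDT <- data.frame(
  season = rep(c("Winter", "Summer"), each = 4),
  rYear  = 2004,
  meanN  = c(10, 12, 14, 16, 10, 12, 14, 16),
  meankW = c(1.0, 1.2, 1.4, 1.6, 2.0, 1.8, 1.6, 1.4)
)

# Pearson correlation of meankW with meanN within each season/year group
grp   <- interaction(plotDT$season, plotDT$rYear, drop = TRUE)
corBy <- sapply(split(plotDT, grp), function(d) cor(d$meanN, d$meankW))
corBy
```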
```{r compareProfiles}
```{r compareProfiles, fig.width=8}
ggplot2::ggplot(plotDT, aes(y = meankW, x = meanN, colour = season)) +
geom_point() +
facet_wrap(rYear ~ .) +
@@ -278,7 +281,7 @@ The wide dataset has a count of NAs per row (dateTime) from which we infer how m
```{r}
wDT <- drake::readd(wideData) # back from the drake
head(wDT)
names(wDT)
```
If we take the mean of the number of feeders reporting per day (date) then a value of 25 will indicate a day when _all_ feeders have _all_ data (since it would be the mean of all the '25's).
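A small worked example of why a daily mean of 25 identifies complete days: each row's count of reporting feeders is `25 - nNA`, so any row below 25 pulls the daily mean below 25. The sketch assumes a hypothetical column `nNA` (NA count per timestamp) and uses only four timestamps per day to stay short:

```r
nFeedersTotal <- 25  # number of feeders, as stated in the text

# Hypothetical NA counts (feeders missing) per timestamp for two days;
# only 4 timestamps per day to keep the toy small (a real day has 96)
wide <- data.frame(
  rDate = rep(as.Date(c("2003-01-01", "2003-01-02")), each = 4),
  nNA   = c(0, 0, 0, 0, 0, 2, 0, 1)
)
wide$nOK <- nFeedersTotal - wide$nNA  # feeders reporting at each timestamp

# Daily mean reaches 25 only if every timestamp that day has all 25 feeders
meanOK <- tapply(wide$nOK, wide$rDate, mean)
meanOK  # 2003-01-01: 25, 2003-01-02: 24.25
```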
@@ -307,13 +310,13 @@ nrow(aggDT[propExpected == 1])
If we plot the mean then we will see which days get closest to having a full dataset.
```{r bestDaysMean}
```{r bestDaysMean, fig.width=8}
ggplot2::ggplot(aggDT, aes(x = rDate, colour = season, y = meanOK)) + geom_point()
```
Re-plot by the % of expected if we assume we _should_ have 25 feeders * 24 hours * 4 per hour (will be the same shape):
```{r bestDaysProp}
```{r bestDaysProp, fig.width=8}
ggplot2::ggplot(aggDT, aes(x = rDate, colour = season, y = 100*propExpected)) + geom_point() +
labs(y = "%")
```
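The "same shape" remark follows from the arithmetic. If `propExpected` is the day's total of feeder readings divided by 25 * 24 * 4 = 2400 (an assumption; its definition sits in a collapsed part of the diff) and `meanOK` is that total divided by 96 timestamps, then `propExpected = meanOK / 25`. A base-R check with a hypothetical day:

```r
nFeedersTotal  <- 25
expectedPerDay <- nFeedersTotal * 24 * 4  # 25 feeders * 24 h * 4 per hour = 2400

# Hypothetical day of 96 timestamps: 72 with all 25 feeders, 24 with only 24
nOK <- c(rep(25, 72), rep(24, 24))

propExpected <- sum(nOK) / expectedPerDay  # 2376 / 2400 = 0.99
meanOK       <- mean(nOK)                  # 2376 / 96 = 24.75

# The measures differ only by the constant factor 25, so plotted
# against date they have the same shape
all.equal(propExpected, meanOK / nFeedersTotal)  # TRUE
```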
@@ -154,8 +154,9 @@ my_plan <- drake::drake_plan(
wideData = toWide(uniqData),
saveLong = saveData(uniqData, "L"), # doesn't actually return anything
saveWide = saveData(wideData, "W"), # doesn't actually return anything
htmlOut = makeReport(rmdFile, version, "html"), # html output
pdfOut = makeReport(rmdFile, version, "pdf") # pdf - must be some way to do this without re-running the whole thing
# pdf output fails
#pdfOut = makeReport(rmdFile, version, "pdf"), # pdf - must be some way to do this without re-running the whole thing
htmlOut = makeReport(rmdFile, version, "html") # html output
)
# see https://books.ropensci.org/drake/projects.html#usage
##############
# R packages
@Manual{baseR,
title = {R: A Language and Environment for Statistical Computing},
author = {{R Core Team}},
organization = {R Foundation for Statistical Computing},
address = {Vienna, Austria},
year = {2016},
url = {https://www.R-project.org/},
}
@Manual{bookdown,
title = {bookdown: Authoring Books and Technical Documents with R Markdown},
author = {Yihui Xie},
year = {2018},
note = {R package version 0.9},
url = {https://github.com/rstudio/bookdown},
}
@Manual{data.table,
title = {data.table: Extension of Data.frame},
author = {M Dowle and A Srinivasan and T Short and S Lianoglou with contributions from R Saporta and E Antonyan},
year = {2015},
note = {R package version 1.9.6},
url = {https://CRAN.R-project.org/package=data.table},
}
@Article{drake,
title = {The drake R package: a pipeline toolkit for reproducibility and high-performance computing},
author = {William Michael Landau},
journal = {Journal of Open Source Software},
year = {2018},
volume = {3},
number = {21},
url = {https://doi.org/10.21105/joss.00550},
}
@Book{ggplot2,
author = {Hadley Wickham},
title = {ggplot2: Elegant Graphics for Data Analysis},
publisher = {Springer-Verlag New York},
year = {2009},
isbn = {978-0-387-98140-6},
url = {http://ggplot2.org},
}
@Manual{here,
title = {here: A Simpler Way to Find Your Files},
author = {Kirill Müller},
year = {2017},
note = {R package version 0.1},
url = {https://CRAN.R-project.org/package=here},
}
@Manual{kableExtra,
title = {kableExtra: Construct Complex Table with 'kable' and Pipe Syntax},
author = {Hao Zhu},
year = {2019},
note = {R package version 1.0.1},
url = {https://CRAN.R-project.org/package=kableExtra},
}
@Manual{knitr,
title = {knitr: A General-Purpose Package for Dynamic Report Generation in R},
author = {Yihui Xie},
year = {2016},
url = {https://CRAN.R-project.org/package=knitr},
}
@Article{lubridate,
title = {Dates and Times Made Easy with {lubridate}},
author = {Garrett Grolemund and Hadley Wickham},
journal = {Journal of Statistical Software},
year = {2011},
volume = {40},
number = {3},
pages = {1--25},
url = {http://www.jstatsoft.org/v40/i03/},
}
@Manual{rmarkdown,
title = {rmarkdown: Dynamic Documents for R},
author = {JJ Allaire and Yihui Xie and Jonathan McPherson and Javier Luraschi and Kevin Ushey and Aron Atkins and Hadley Wickham and Joe Cheng and Winston Chang and Richard Iannone},
year = {2020},
note = {R package version 2.1},
url = {https://github.com/rstudio/rmarkdown},
}
@Book{rmarkdownBook,
title = {R Markdown: The Definitive Guide},
author = {Yihui Xie and J.J. Allaire and Garrett Grolemund},
publisher = {Chapman and Hall/CRC},
address = {Boca Raton, Florida},
year = {2018},
note = {ISBN 9781138359338},
url = {https://bookdown.org/yihui/rmarkdown},
}
@Manual{skimr,
title = {skimr: skimr},
author = {Eduardo {Arino de la Rubia} and Hao Zhu and Shannon Ellis and Elin Waring and Michael Quinn},
year = {2017},
note = {R package version 1.0},
url = {https://github.com/ropenscilabs/skimr},
}
@Manual{tidyverse,
title = {tidyverse: Easily Install and Load 'Tidyverse' Packages},
author = {Hadley Wickham},
year = {2017},
note = {R package version 1.1.1},
url = {https://CRAN.R-project.org/package=tidyverse},
}
@Manual{viridis,
title = {viridis: Default Color Maps from 'matplotlib'},
author = {Simon Garnier},
year = {2018},
note = {R package version 0.5.1},
url = {https://CRAN.R-project.org/package=viridis},
}
\ No newline at end of file