Commit 4d734d62 authored by B.Anderson

fixed bib (was absent); html no longer self-contained so plots exist separately; floating ToC still broken in html (why?); pdf fails (probably table too big)
parent c11387e4
3 merge requests: !3 merge a few edits, !2 fixed pdf build, !1 Re run ellis full data
This commit is part of merge request !3.
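The commit message notes that the PDF build still fails, probably because a table is too wide for the page. If the culprit is the kableExtra table added in this commit, one common fix is to let LaTeX scale the table down to the page width; the following is only a sketch under that assumption (the `latex_options` take effect when knitting to PDF and are ignored in HTML).

```r
# Sketch only: shrink an over-wide kable to the page width in PDF output.
# Assumes kableExtra is attached (it re-exports the magrittr %>% pipe) and that
# origDataDT exists as in the loadData chunk below.
kableExtra::kable(head(origDataDT), digits = 2,
                  caption = "Counts per feeder (long table)") %>%
  kable_styling(latex_options = "scale_down")
```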
@@ -9,25 +9,24 @@ author: '`r params$authors`'
 date: 'Last run at: `r Sys.time()`'
 output:
   bookdown::html_document2:
-    code_folding: hide
+    self_contained: TRUE
     fig_caption: yes
+    code_folding: hide
     number_sections: yes
-    self_contained: yes
-    toc: yes
-    toc_depth: 3
-    toc_float: yes
-  bookdown::word_document2:
-    fig_caption: yes
     toc: yes
     toc_depth: 2
+    toc_float: TRUE
   bookdown::pdf_document2:
     fig_caption: yes
-    keep_tex: yes
+    number_sections: yes
+  bookdown::word_document2:
+    fig_caption: yes
     number_sections: yes
     toc: yes
     toc_depth: 2
+    fig_width: 5
 always_allow_html: yes
-bibliography: '`r path.expand("~/bibliography.bib")`'
+bibliography: '`r paste0(here::here(), "/bibliography.bib")`'
 ---
 ```{r setup}
@@ -73,8 +72,12 @@ Loaded data from `r dFile`... (using drake)
 ```{r loadData}
 origDataDT <- drake::readd(origData) # readd the drake object
-head(origDataDT)
 uniqDataDT <- drake::readd(uniqData) # readd the drake object
+
+kableExtra::kable(head(origDataDT), digits = 2,
+                  caption = "Counts per feeder (long table)") %>%
+  kable_styling()
 ```
 Check data prep worked OK.
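For readers less familiar with drake, the chunk above pulls previously built targets out of drake's cache: `drake::readd()` returns a target's value, while `drake::loadd()` assigns it into the calling environment. A minimal illustration:

```r
# Two standard ways to retrieve a built drake target from the cache
origDataDT <- drake::readd(origData)  # returns the cached value
drake::loadd(uniqData)                # creates 'uniqData' in this environment
```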
@@ -82,8 +85,8 @@ Check data prep worked OK.
 ```{r dataPrep}
 # check
 t <- origDataDT[, .(nObs = .N,
-                    firstDate = min(rDateTime),
-                    lastDate = max(rDateTime),
+                    firstDate = min(rDateTime, na.rm = TRUE),
+                    lastDate = max(rDateTime, na.rm = TRUE),
                     meankW = mean(kW, na.rm = TRUE)
                     ), keyby = .(region, feeder_ID)]
@@ -104,7 +107,7 @@ message("So we have ", tidyNum(nrow(origDataDT) - nrow(uniqDataDT)), " duplicate
 pc <- 100*((nrow(origDataDT) - nrow(uniqDataDT))/nrow(origDataDT))
 message("That's ", round(pc,2), "%")
-feederDT <- uniqDataDT # use dt with no duplicates
+feederDT <- uniqDataDT[!is.na(rDateTime)] # use dt with no duplicates
 origDataDT <- NULL # save memory
 ```
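The changed line now also drops rows with a missing `rDateTime`. An optional sanity check (a sketch using the objects from the chunk above) would report how many rows that extra filter removes:

```r
# Optional check: how many rows have a missing rDateTime before filtering?
nNA <- nrow(uniqDataDT[is.na(rDateTime)])
message("Dropping ", nNA, " rows with missing rDateTime (",
        round(100 * nNA / nrow(uniqDataDT), 2), "%)")
```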
@@ -114,7 +117,7 @@ So we remove the duplicates...
 Try aggregated demand profiles of mean kW by season and feeder and day of the week... Remove the legend so we can see the plot.
-```{r kwProfiles}
+```{r kwProfiles, fig.width=8}
 plotDT <- feederDT[, .(meankW = mean(kW),
                        nObs = .N), keyby = .(rTime, season, feeder_ID, rDoW)]
@@ -131,7 +134,7 @@ Is that what we expect?
 Number of observations per feeder per day - gaps will be visible (totally missing days) as will low counts (partially missing days) - we would expect 24 * 4... Convert this to a % of expected...
-```{r basicCountTile, fig.height=10}
+```{r basicCountTile, fig.height=10, fig.width=8}
 plotDT <- feederDT[, .(nObs = .N), keyby = .(rDate, feeder_ID)]
 plotDT[, propExpected := nObs/(24*4)]
@@ -148,7 +151,7 @@ This is not good. There are both gaps (missing days) and partial days. **Lots**
 What does it look like if we aggregate across all feeders by time? There are `r uniqueN(feederDT$feeder_ID)` feeders so we should get this many at best. How close do we get?
-```{r aggVisN}
+```{r aggVisN, fig.width=8}
 plotDT <- feederDT[, .(nObs = .N,
                        meankW = mean(kW)), keyby = .(rTime, rDate, season)]
@@ -167,7 +170,7 @@ That really doesn't look too good. There are some very odd fluctuations in there
 What do the mean kW patterns look like per feeder per day?
-```{r basickWTile, fig.height=10}
+```{r basickWTile, fig.height=10, fig.width=8}
 plotDT <- feederDT[, .(meankW = mean(kW, na.rm = TRUE)), keyby = .(rDate, feeder_ID)]
 ggplot2::ggplot(plotDT, aes(x = rDate, y = feeder_ID, fill = meankW)) +
@@ -183,7 +186,7 @@ Missing data is even more clearly visible.
 What about mean kW across all feeders?
-```{r aggViskW}
+```{r aggViskW, fig.width=8}
 plotDT <- feederDT[, .(nObs = .N,
                        meankW = mean(kW)), keyby = .(rTime, rDate, season)]
@@ -213,7 +216,7 @@ summary(dateTimesDT)
 Let's see how many unique feeders we have per dateTime. Surely we have at least one sending data each half-hour?
-```{r tileFeeders}
+```{r tileFeeders, fig.width=8}
 ggplot2::ggplot(dateTimesDT, aes(x = rDate, y = rTime, fill = nFeeders)) +
   geom_tile() +
   scale_fill_viridis_c() +
@@ -224,7 +227,7 @@ No. As we suspected from the previous plots, we clearly have some dateTimes wher
 Are there time of day patterns? It looks like it...
-```{r missingProfiles}
+```{r missingProfiles, fig.width=8}
 dateTimesDT[, rYear := lubridate::year(rDateTime)]
 plotDT <- dateTimesDT[, .(meanN = mean(nFeeders),
                           meankW = mean(meankW)), keyby = .(rTime, season, rYear)]
@@ -240,7 +243,7 @@ Oh yes. After 2003. Why?
 What about the kW?
-```{r kWProfiles}
+```{r kWProfiles, fig.width=8}
 ggplot2::ggplot(plotDT, aes(y = meankW, x = rTime, colour = season)) +
   geom_line() +
@@ -251,7 +254,7 @@ ggplot2::ggplot(plotDT, aes(y = meankW, x = rTime, colour = season)) +
 Those look as we'd expect. But do we see a correlation between the number of observations per hour and the mean kW after 2003? There is a suspicion that as mean kW goes up so does the number of observations per hour... although this could just be a correlation with low demand periods (night time?)
-```{r compareProfiles}
+```{r compareProfiles, fig.width=8}
 ggplot2::ggplot(plotDT, aes(y = meankW, x = meanN, colour = season)) +
   geom_point() +
   facet_wrap(rYear ~ .) +
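The scatter plots give a visual answer to the correlation question above; if a numeric check is wanted, the same `plotDT` could be summarised with Pearson correlations per year and season (a sketch, assuming `plotDT` as built in the chunk above):

```r
# Correlation between mean observation count and mean kW, by year and season
plotDT[, .(pearsonR = cor(meanN, meankW, use = "complete.obs")),
       keyby = .(rYear, season)]
```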
@@ -278,7 +281,7 @@ The wide dataset has a count of NAs per row (dateTime) from which we infer how m
 ```{r}
 wDT <- drake::readd(wideData) # back from the drake
-head(wDT)
+names(wDT)
 ```
 If we take the mean of the number of feeders reporting per day (date) then a value of 25 will indicate a day when _all_ feeders have _all_ data (since it would be the mean of all the '25's).
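The code that derives the per-row feeder count from the NA pattern sits in a part of the diff not shown here, so the following is only an illustrative sketch of the idea described above; the column names (the feeder value columns, `nFeedersReporting`, `meanOK`) are assumptions, not necessarily the project's actual names:

```r
# Illustrative only: infer how many feeders report at each dateTime from the
# NA pattern in the wide table, then average per day (25 = a complete day)
feederCols <- setdiff(names(wDT), c("rDateTime", "rDate", "rTime", "season"))
wDT[, nFeedersReporting := rowSums(!is.na(.SD)), .SDcols = feederCols]
aggDT <- wDT[, .(meanOK = mean(nFeedersReporting)), keyby = rDate]
```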
@@ -307,13 +310,13 @@ nrow(aggDT[propExpected == 1])
 If we plot the mean then we will see which days get closest to having a full dataset.
-```{r bestDaysMean}
+```{r bestDaysMean, fig.width=8}
 ggplot2::ggplot(aggDT, aes(x = rDate, colour = season, y = meanOK)) + geom_point()
 ```
 Re-plot by the % of expected if we assume we _should_ have 25 feeders * 24 hours * 4 per hour (will be the same shape):
-```{r bestDaysProp}
+```{r bestDaysProp, fig.width=8}
 ggplot2::ggplot(aggDT, aes(x = rDate, colour = season, y = 100*propExpected)) + geom_point() +
   labs(y = "%")
 ```
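Spelling out the "expected" denominator used in the re-plot above (25 feeders, each reporting every 15 minutes):

```r
# Expected observations per day if every feeder reported every quarter hour
nFeeders <- 25
obsPerFeederDay <- 24 * 4                     # 96 quarter-hourly readings
expectedPerDay <- nFeeders * obsPerFeederDay  # 2,400 observations per day
```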
......
This diff is collapsed.
@@ -154,8 +154,9 @@ my_plan <- drake::drake_plan(
   wideData = toWide(uniqData),
   saveLong = saveData(uniqData, "L"), # doesn't actually return anything
   saveWide = saveData(wideData, "W"), # doesn't actually return anything
-  htmlOut = makeReport(rmdFile, version, "html"), # html output
-  pdfOut = makeReport(rmdFile, version, "pdf") # pdf - must be some way to do this without re-running the whole thing
+  # pdf output fails
+  #pdfOut = makeReport(rmdFile, version, "pdf"), # pdf - must be some way to do this without re-running the whole thing
+  htmlOut = makeReport(rmdFile, version, "html") # html output
 )
 # see https://books.ropensci.org/drake/projects.html#usage
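On the inline comment that there "must be some way to do this without re-running the whole thing": because drake caches completed targets, re-introducing `pdfOut` and calling `make()` again would only rebuild the out-of-date report target, or the report wrapper can simply be called by hand once the plan has run. A sketch, assuming `makeReport()` just knits `rmdFile` to the requested format:

```r
# Two possible ways to get the PDF without rebuilding the data targets:
# 1) one-off manual render after the plan has completed
makeReport(rmdFile, version, "pdf")

# 2) keep pdfOut in the plan; drake skips targets that are already up to date
drake::make(my_plan)
```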
......
##############
# R packages
@Manual{baseR,
title = {R: A Language and Environment for Statistical Computing},
author = {{R Core Team}},
organization = {R Foundation for Statistical Computing},
address = {Vienna, Austria},
year = {2016},
url = {https://www.R-project.org/},
}
@Manual{bookdown,
title = {bookdown: Authoring Books and Technical Documents with R Markdown},
author = {Yihui Xie},
year = {2018},
note = {R package version 0.9},
url = {https://github.com/rstudio/bookdown},
}
@Manual{data.table,
title = {data.table: Extension of Data.frame},
author = {M Dowle and A Srinivasan and T Short and S Lianoglou with contributions from R Saporta and E Antonyan},
year = {2015},
note = {R package version 1.9.6},
url = {https://CRAN.R-project.org/package=data.table},
}
@Article{drake,
title = {The drake R package: a pipeline toolkit for reproducibility and high-performance computing},
author = {William Michael Landau},
journal = {Journal of Open Source Software},
year = {2018},
volume = {3},
number = {21},
url = {https://doi.org/10.21105/joss.00550},
}
@Book{ggplot2,
author = {Hadley Wickham},
title = {ggplot2: Elegant Graphics for Data Analysis},
publisher = {Springer-Verlag New York},
year = {2009},
isbn = {978-0-387-98140-6},
url = {http://ggplot2.org},
}
@Manual{here,
title = {here: A Simpler Way to Find Your Files},
author = {Kirill Müller},
year = {2017},
note = {R package version 0.1},
url = {https://CRAN.R-project.org/package=here},
}
@Manual{kableExtra,
title = {kableExtra: Construct Complex Table with 'kable' and Pipe Syntax},
author = {Hao Zhu},
year = {2019},
note = {R package version 1.0.1},
url = {https://CRAN.R-project.org/package=kableExtra},
}
@Manual{knitr,
title = {knitr: A General-Purpose Package for Dynamic Report Generation in R},
author = {Yihui Xie},
year = {2016},
url = {https://CRAN.R-project.org/package=knitr},
}
@Article{lubridate,
title = {Dates and Times Made Easy with {lubridate}},
author = {Garrett Grolemund and Hadley Wickham},
journal = {Journal of Statistical Software},
year = {2011},
volume = {40},
number = {3},
pages = {1--25},
url = {http://www.jstatsoft.org/v40/i03/},
}
@Manual{rmarkdown,
title = {rmarkdown: Dynamic Documents for R},
author = {JJ Allaire and Yihui Xie and Jonathan McPherson and Javier Luraschi and Kevin Ushey and Aron Atkins and Hadley Wickham and Joe Cheng and Winston Chang and Richard Iannone},
year = {2020},
note = {R package version 2.1},
url = {https://github.com/rstudio/rmarkdown},
}
@Book{rmarkdownBook,
title = {R Markdown: The Definitive Guide},
author = {Yihui Xie and J.J. Allaire and Garrett Grolemund},
publisher = {Chapman and Hall/CRC},
address = {Boca Raton, Florida},
year = {2018},
note = {ISBN 9781138359338},
url = {https://bookdown.org/yihui/rmarkdown},
}
@Manual{skimr,
title = {skimr: skimr},
author = {Eduardo {Arino de la Rubia} and Hao Zhu and Shannon Ellis and Elin Waring and Michael Quinn},
year = {2017},
note = {R package version 1.0},
url = {https://github.com/ropenscilabs/skimr},
}
@Manual{tidyverse,
title = {tidyverse: Easily Install and Load 'Tidyverse' Packages},
author = {Hadley Wickham},
year = {2017},
note = {R package version 1.1.1},
url = {https://CRAN.R-project.org/package=tidyverse},
}
@Manual{viridis,
title = {viridis: Default Color Maps from 'matplotlib'},
author = {Simon Garnier},
year = {2018},
note = {R package version 0.5.1},
url = {https://CRAN.R-project.org/package=viridis},
}
\ No newline at end of file
This diff is collapsed.