
Commit 4d734d62 authored by B.Anderson

fixed bib (was absent); html no longer self-contained so plots exist separately; floating ToC still broken in html (why?); pdf fails (probably table too big)
parent c11387e4
@@ -9,25 +9,24 @@ author: '`r params$authors`'
date: 'Last run at: `r Sys.time()`'
output:
bookdown::html_document2:
code_folding: hide
self_contained: TRUE
fig_caption: yes
code_folding: hide
number_sections: yes
self_contained: yes
toc: yes
toc_depth: 3
toc_float: yes
bookdown::word_document2:
fig_caption: yes
toc: yes
toc_depth: 2
toc_float: TRUE
bookdown::pdf_document2:
fig_caption: yes
keep_tex: yes
number_sections: yes
bookdown::word_document2:
fig_caption: yes
number_sections: yes
toc: yes
toc_depth: 2
fig_width: 5
always_allow_html: yes
bibliography: '`r path.expand("~/bibliography.bib")`'
bibliography: '`r paste0(here::here(), "/bibliography.bib")`'
---
```{r setup}
@@ -73,8 +72,12 @@ Loaded data from `r dFile`... (using drake)
```{r loadData}
origDataDT <- drake::readd(origData) # readd the drake object
head(origDataDT)
uniqDataDT <- drake::readd(uniqData) # readd the drake object
kableExtra::kable(head(origDataDT), digits = 2,
caption = "Counts per feeder (long table)") %>%
kable_styling()
```
Check data prep worked OK.
@@ -82,8 +85,8 @@ Check data prep worked OK.
```{r dataPrep}
# check
t <- origDataDT[, .(nObs = .N,
firstDate = min(rDateTime),
lastDate = max(rDateTime),
firstDate = min(rDateTime, na.rm = TRUE),
lastDate = max(rDateTime, na.rm = TRUE),
meankW = mean(kW, na.rm = TRUE)
), keyby = .(region, feeder_ID)]
@@ -104,7 +107,7 @@ message("So we have ", tidyNum(nrow(origDataDT) - nrow(uniqDataDT)), " duplicate
pc <- 100*((nrow(origDataDT) - nrow(uniqDataDT))/nrow(origDataDT))
message("That's ", round(pc,2), "%")
feederDT <- uniqDataDT # use dt with no duplicates
feederDT <- uniqDataDT[!is.na(rDateTime)] # use dt with no duplicates
origDataDT <- NULL # save memory
```
@@ -114,7 +117,7 @@ So we remove the duplicates...
Try aggregated demand profiles of mean kW by season and feeder and day of the week... Remove the legend so we can see the plot.
```{r kwProfiles}
```{r kwProfiles, fig.width=8}
plotDT <- feederDT[, .(meankW = mean(kW),
nObs = .N), keyby = .(rTime, season, feeder_ID, rDoW)]
@@ -131,7 +134,7 @@ Is that what we expect?
Number of observations per feeder per day - gaps will be visible (totally missing days) as will low counts (partially missing days) - we would expect 24 * 4... Convert this to a % of expected...
```{r basicCountTile, fig.height=10}
```{r basicCountTile, fig.height=10, fig.width=8}
plotDT <- feederDT[, .(nObs = .N), keyby = .(rDate, feeder_ID)]
plotDT[, propExpected := nObs/(24*4)]
@@ -148,7 +151,7 @@ This is not good. There are both gaps (missing days) and partial days. **Lots**
What does it look like if we aggregate across all feeders by time? There are `r uniqueN(feederDT$feeder_ID)` feeders so we should get this many at best How close do we get?
```{r aggVisN}
```{r aggVisN, fig.width=8}
plotDT <- feederDT[, .(nObs = .N,
meankW = mean(kW)), keyby = .(rTime, rDate, season)]
@@ -167,7 +170,7 @@ That really doesn't look too good. There are some very odd fluctuations in there
What do the mean kw patterns look like per feeder per day?
```{r basickWTile, fig.height=10}
```{r basickWTile, fig.height=10, fig.width=8}
plotDT <- feederDT[, .(meankW = mean(kW, na.rm = TRUE)), keyby = .(rDate, feeder_ID)]
ggplot2::ggplot(plotDT, aes(x = rDate, y = feeder_ID, fill = meankW)) +
@@ -183,7 +186,7 @@ Missing data is even more clearly visible.
What about mean kw across all feeders?
```{r aggViskW}
```{r aggViskW, fig.width=8}
plotDT <- feederDT[, .(nObs = .N,
meankW = mean(kW)), keyby = .(rTime, rDate, season)]
@@ -213,7 +216,7 @@ summary(dateTimesDT)
Let's see how many unique feeders we have per dateTime. Surely we have at least one sending data each half-hour?
```{r tileFeeders}
```{r tileFeeders, fig.width=8}
ggplot2::ggplot(dateTimesDT, aes(x = rDate, y = rTime, fill = nFeeders)) +
geom_tile() +
scale_fill_viridis_c() +
@@ -224,7 +227,7 @@ No. As we suspected from the previous plots, we clearly have some dateTimes wher
Are there time of day patterns? It looks like it...
```{r missingProfiles}
```{r missingProfiles, fig.width=8}
dateTimesDT[, rYear := lubridate::year(rDateTime)]
plotDT <- dateTimesDT[, .(meanN = mean(nFeeders),
meankW = mean(meankW)), keyby = .(rTime, season, rYear)]
@@ -240,7 +243,7 @@ Oh yes. After 2003. Why?
What about the kW?
```{r kWProfiles}
```{r kWProfiles, fig.width=8}
ggplot2::ggplot(plotDT, aes(y = meankW, x = rTime, colour = season)) +
geom_line() +
@@ -251,7 +254,7 @@ ggplot2::ggplot(plotDT, aes(y = meankW, x = rTime, colour = season)) +
Those look as we'd expect. But do we see a correlation between the number of observations per hour and the mean kW after 2003? There is a suspicion that as mean kw goes up so do the number of observations per hour... although this could just be a correlation with low demand periods (night time?)
```{r compareProfiles}
```{r compareProfiles, fig.width=8}
ggplot2::ggplot(plotDT, aes(y = meankW, x = meanN, colour = season)) +
geom_point() +
facet_wrap(rYear ~ .) +
@@ -278,7 +281,7 @@ The wide dataset has a count of NAs per row (dateTime) from which we infer how m
```{r}
wDT <- drake::readd(wideData) # back from the drake
head(wDT)
names(wDT)
```
If we take the mean of the number of feeders reporting per day (date) then a value of 25 will indicate a day when _all_ feeders have _all_ data (since it would be the mean of all the '25's).
@@ -307,13 +310,13 @@ nrow(aggDT[propExpected == 1])
If we plot the mean then we will see which days get closest to having a full dataset.
```{r bestDaysMean}
```{r bestDaysMean, fig.width=8}
ggplot2::ggplot(aggDT, aes(x = rDate, colour = season, y = meanOK)) + geom_point()
```
Re-plot by the % of expected if we assume we _should_ have 25 feeders * 24 hours * 4 per hour (will be the same shape):
```{r bestDaysProp}
```{r bestDaysProp, fig.width=8}
ggplot2::ggplot(aggDT, aes(x = rDate, colour = season, y = 100*propExpected)) + geom_point() +
labs(y = "%")
```
......
@@ -154,8 +154,9 @@ my_plan <- drake::drake_plan(
wideData = toWide(uniqData),
saveLong = saveData(uniqData, "L"), # doesn't actually return anything
saveWide = saveData(wideData, "W"), # doesn't actually return anything
htmlOut = makeReport(rmdFile, version, "html"), # html output
pdfOut = makeReport(rmdFile, version, "pdf") # pdf - must be some way to do this without re-running the whole thing
# pdf output fails
#pdfOut = makeReport(rmdFile, version, "pdf"), # pdf - must be some way to do this without re-running the whole thing
htmlOut = makeReport(rmdFile, version, "html") # html output
)
# see https://books.ropensci.org/drake/projects.html#usage
......
##############
# R packages
@Manual{baseR,
title = {R: A Language and Environment for Statistical Computing},
author = {{R Core Team}},
organization = {R Foundation for Statistical Computing},
address = {Vienna, Austria},
year = {2016},
url = {https://www.R-project.org/},
}
@Manual{bookdown,
title = {bookdown: Authoring Books and Technical Documents with R Markdown},
author = {Yihui Xie},
year = {2018},
note = {R package version 0.9},
url = {https://github.com/rstudio/bookdown},
}
@Manual{data.table,
title = {data.table: Extension of Data.frame},
author = {M Dowle and A Srinivasan and T Short and S Lianoglou with contributions from R Saporta and E Antonyan},
year = {2015},
note = {R package version 1.9.6},
url = {https://CRAN.R-project.org/package=data.table},
}
@Article{drake,
title = {The drake R package: a pipeline toolkit for reproducibility and high-performance computing},
author = {William Michael Landau},
journal = {Journal of Open Source Software},
year = {2018},
volume = {3},
number = {21},
url = {https://doi.org/10.21105/joss.00550},
}
@Book{ggplot2,
author = {Hadley Wickham},
title = {ggplot2: Elegant Graphics for Data Analysis},
publisher = {Springer-Verlag New York},
year = {2009},
isbn = {978-0-387-98140-6},
url = {http://ggplot2.org},
}
@Manual{here,
title = {here: A Simpler Way to Find Your Files},
author = {Kirill Müller},
year = {2017},
note = {R package version 0.1},
url = {https://CRAN.R-project.org/package=here},
}
@Manual{kableExtra,
title = {kableExtra: Construct Complex Table with 'kable' and Pipe Syntax},
author = {Hao Zhu},
year = {2019},
note = {R package version 1.0.1},
url = {https://CRAN.R-project.org/package=kableExtra},
}
@Manual{knitr,
title = {knitr: A General-Purpose Package for Dynamic Report Generation in R},
author = {Yihui Xie},
year = {2016},
url = {https://CRAN.R-project.org/package=knitr},
}
@Article{lubridate,
title = {Dates and Times Made Easy with {lubridate}},
author = {Garrett Grolemund and Hadley Wickham},
journal = {Journal of Statistical Software},
year = {2011},
volume = {40},
number = {3},
pages = {1--25},
url = {http://www.jstatsoft.org/v40/i03/},
}
@Manual{rmarkdown,
title = {rmarkdown: Dynamic Documents for R},
author = {JJ Allaire and Yihui Xie and Jonathan McPherson and Javier Luraschi and Kevin Ushey and Aron Atkins and Hadley Wickham and Joe Cheng and Winston Chang and Richard Iannone},
year = {2020},
note = {R package version 2.1},
url = {https://github.com/rstudio/rmarkdown},
}
@Book{rmarkdownBook,
title = {R Markdown: The Definitive Guide},
author = {Yihui Xie and J.J. Allaire and Garrett Grolemund},
publisher = {Chapman and Hall/CRC},
address = {Boca Raton, Florida},
year = {2018},
note = {ISBN 9781138359338},
url = {https://bookdown.org/yihui/rmarkdown},
}
@Manual{skimr,
title = {skimr: skimr},
author = {Eduardo {Arino de la Rubia} and Hao Zhu and Shannon Ellis and Elin Waring and Michael Quinn},
year = {2017},
note = {R package version 1.0},
url = {https://github.com/ropenscilabs/skimr},
}
@Manual{tidyverse,
title = {tidyverse: Easily Install and Load 'Tidyverse' Packages},
author = {Hadley Wickham},
year = {2017},
note = {R package version 1.1.1},
url = {https://CRAN.R-project.org/package=tidyverse},
}
@Manual{viridis,
title = {viridis: Default Color Maps from 'matplotlib'},
author = {Simon Garnier},
year = {2018},
note = {R package version 0.5.1},
url = {https://CRAN.R-project.org/package=viridis},
}
\ No newline at end of file