Re-run ellis full data

Merged Ben Anderson requested to merge reRunEllisFullData into master
6 files changed: +1206 -30
+11 -14
@@ -27,7 +27,6 @@ output:
 toc: yes
 toc_depth: 2
 fig_width: 5
-always_allow_html: yes
 bibliography: '`r paste0(here::here(), "/bibliography.bib")`'
 ---
@@ -64,7 +63,7 @@ We have some electricity substation feeder data that has been cleaned to give me
 There seem to be some NA kW values and a lot of missing time stamps. We want to select the 'best' (i.e most complete) days within a day-of-the-week/season/year sampling frame. If we can't do that we may have to resort to seasonal mean kW profiles by hour & day of the week...
-Code used to generate this report: https://git.soton.ac.uk/ba1e12/spatialec/-/blob/master/isleOfWight/cleaningFeederData.Rmd
+The code used to generate this report is in: https://git.soton.ac.uk/ba1e12/dataCleaning/Rmd/
 # Data prep
@@ -78,8 +77,7 @@ origDataDT <- drake::readd(origData) # readd the drake object
 uniqDataDT <- drake::readd(uniqData) # readd the drake object
 kableExtra::kable(head(origDataDT), digits = 2,
-caption = "Counts per feeder (long table)") %>%
-kable_styling()
+caption = "First 6 rows of data")
 ```
 Do a duplicate check by feeder_ID, dateTime & kW. In theory there should not be any.
@@ -89,16 +87,19 @@ message("Original data nrows: ", tidyNum(nrow(origDataDT)))
 message("Unique data nrows: ", tidyNum(nrow(uniqDataDT)))
-message("So we have ", tidyNum(nrow(origDataDT) - nrow(uniqDataDT)), " duplicates...")
+nDups <- tidyNum(nrow(origDataDT) - nrow(uniqDataDT))
+
+message("So we have ", tidyNum(nDups), " duplicates...")
 pc <- 100*((nrow(origDataDT) - nrow(uniqDataDT))/nrow(origDataDT))
 message("That's ", round(pc,2), "%")
 feederDT <- uniqDataDT[!is.na(rDateTime)] # use dt with no duplicates
 origDataDT <- NULL # save memory
+
 ```
-There were `r tidyNum(nrow(origDataDT) - nrow(uniqDataDT))` duplicates - that's `r round(pc,2)` % of the observations loaded.
+There were `r tidyNum(nDups)` duplicates - that's ~ `r round(pc,2)` % of the observations loaded.
 So we remove the duplicates...
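A note on the hunk above, as a minimal sketch rather than the author's code. Once `origDataDT` has been set to `NULL`, `nrow(origDataDT)` returns `NULL`, which is presumably why the inline text now reads a stored `nDups` instead of recomputing the difference. The sketch below uses base R data frames instead of data.table, toy values, and a hypothetical `fmtNum()` helper standing in for `tidyNum()`, whose definition is not part of this diff. Unlike the diff, it keeps the count numeric and formats it only where it is printed.

```r
# Hypothetical stand-in for the report's tidyNum() helper (definition not shown in the diff).
fmtNum <- function(x) format(x, big.mark = ",", scientific = FALSE, trim = TRUE)

# Toy feeder data with one exact duplicate row (illustrative values only).
origDF <- data.frame(
  feeder_ID = c("A", "A", "A", "B"),
  rDateTime = as.POSIXct(c("2020-01-01 00:00:00", "2020-01-01 00:00:00",
                           "2020-01-01 00:15:00", "2020-01-01 00:00:00"),
                         tz = "UTC"),
  kW = c(1.2, 1.2, 1.4, 0.9)
)
uniqDF <- unique(origDF)  # drops rows that are identical across all columns

# Capture the counts as plain numbers before the original object is discarded.
nOrig <- nrow(origDF)
nDups <- nOrig - nrow(uniqDF)
pc <- 100 * nDups / nOrig

origDF <- NULL  # save memory, as in the report

# Format only at the point of use; nDups and pc stay valid after origDF is gone.
message("So we have ", fmtNum(nDups), " duplicates (", round(pc, 2), "%)")
```

Because `nDups` and `pc` are computed before the original table is dropped, any later inline expression can reuse them safely.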
@@ -321,23 +322,19 @@ ggplot2::ggplot(aggDT, aes(x = rDate, colour = season,
 aggDT[, rDoW := lubridate::wday(rDate, lab = TRUE)]
 h <- head(aggDT[season == "Spring"][order(-propExpected)])
 kableExtra::kable(h, caption = "Best Spring days overall",
-digits = 3) %>%
-kable_styling()
+digits = 3)
 h <- head(aggDT[season == "Summer"][order(-propExpected)])
 kableExtra::kable(h, caption = "Best Summer days overall",
-digits = 3) %>%
-kable_styling()
+digits = 3)
 h <- head(aggDT[season == "Autumn"][order(-propExpected)])
 kableExtra::kable(h, caption = "Best Autumn days overall",
-digits = 3) %>%
-kable_styling()
+digits = 3)
 h <- head(aggDT[season == "Winter"][order(-propExpected)])
 kableExtra::kable(h, caption = "Best Winter days overall",
-digits = 3) %>%
-kable_styling()
+digits = 3)
 ```
 # Summary
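The four per-season table calls in the last hunk differ only in the season name, so they could be folded into one helper. The following is a hedged sketch of that idea, not part of the merge request. It assumes a table with `season` and `propExpected` columns, as in `aggDT`. The `bestDays()` function and the toy `aggDF` values are hypothetical, and base R data frames are used so the example runs without data.table or kableExtra.

```r
# Toy stand-in for aggDT: three days per season (dates and values are illustrative only).
aggDF <- data.frame(
  rDate = as.Date("2020-01-01") + seq(0, 330, by = 30),
  season = rep(c("Winter", "Spring", "Summer", "Autumn"), each = 3),
  propExpected = c(0.91, 0.99, 0.85, 0.70, 0.95, 0.88,
                   0.60, 0.97, 0.92, 0.99, 0.80, 0.75)
)

# Hypothetical helper: the n most complete days for one season, best first.
bestDays <- function(df, s, n = 6) {
  sub <- df[df$season == s, ]
  head(sub[order(-sub$propExpected), ], n)
}

seasons <- c("Spring", "Summer", "Autumn", "Winter")
bestBySeason <- lapply(setNames(seasons, seasons), bestDays, df = aggDF)

# Print one table per season with a caption-style heading.
for (s in seasons) {
  cat("Best", s, "days overall\n")
  print(bestBySeason[[s]], digits = 3)
}
```

In the report itself, the `print()` call would presumably be replaced by the `kableExtra::kable(h, caption = ..., digits = 3)` call from the diff. Tables generated inside a loop in an R Markdown chunk generally need to be printed explicitly, with the chunk option `results = 'asis'`.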