diff --git a/Rmd/cleaningFeederData.Rmd b/Rmd/cleaningFeederData.Rmd
index 0e3e22755e3abeaf866a305344e3e67e0ee0643d..2099f7207ea3afcefdd361cd155b3c63eca7299a 100644
--- a/Rmd/cleaningFeederData.Rmd
+++ b/Rmd/cleaningFeederData.Rmd
@@ -27,7 +27,6 @@ output:
     toc: yes
     toc_depth: 2
     fig_width: 5
-always_allow_html: yes
 bibliography: '`r paste0(here::here(), "/bibliography.bib")`'
 ---
 
@@ -64,7 +63,7 @@ We have some electricity substation feeder data that has been cleaned to give me
 There seem to be some NA kW values and a lot of missing time stamps.
 
 We want to select the 'best' (i.e. most complete) days within a day-of-the-week/season/year sampling frame. If we can't do that we may have to resort to seasonal mean kW profiles by hour & day of the week...
 
-Code used to generate this report: https://git.soton.ac.uk/ba1e12/spatialec/-/blob/master/isleOfWight/cleaningFeederData.Rmd
+The code used to generate this report is in: https://git.soton.ac.uk/ba1e12/dataCleaning/Rmd/
 
 # Data prep
@@ -78,7 +77,7 @@ origDataDT <- drake::readd(origData) # readd the drake object
 uniqDataDT <- drake::readd(uniqData) # readd the drake object
 
 kableExtra::kable(head(origDataDT), digits = 2,
-                  caption = "Counts per feeder (long table)") %>%
+                  caption = "First 6 rows of data") %>%
   kable_styling()
 ```
 
@@ -89,16 +88,19 @@
 message("Original data nrows: ", tidyNum(nrow(origDataDT)))
 message("Unique data nrows: ", tidyNum(nrow(uniqDataDT)))
 
-message("So we have ", tidyNum(nrow(origDataDT) - nrow(uniqDataDT)), " duplicates...")
+nDups <- nrow(origDataDT) - nrow(uniqDataDT)
+
+message("So we have ", tidyNum(nDups), " duplicates...")
 
 pc <- 100*((nrow(origDataDT) - nrow(uniqDataDT))/nrow(origDataDT))
 message("That's ", round(pc,2), "%")
 
 feederDT <- uniqDataDT[!is.na(rDateTime)] # use dt with no duplicates
 
 origDataDT <- NULL # save memory
+
 ```
 
-There were `r tidyNum(nrow(origDataDT) - nrow(uniqDataDT))` duplicates - that's `r round(pc,2)` % of the observations loaded.
+There were `r tidyNum(nDups)` duplicates - that's ~ `r round(pc,2)` % of the observations loaded.
 
 So we remove the duplicates...
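
Not part of the patch: a minimal, self-contained sketch of the dedup-and-count logic the revised chunk implements, run on a small fabricated data.table. The column names (`rDateTime`, `kW`) mirror the report but the values are made up, and base `format()` stands in for the report's `tidyNum()` helper.

```r
library(data.table)

# fabricated example: 8 rows, every half-hourly reading entered twice
origDataDT <- data.table(
  rDateTime = as.POSIXct("2020-01-01 00:00", tz = "UTC") + rep(0:3, each = 2) * 1800,
  kW = rep(c(10.2, 11.5, NA, 9.8), each = 2)
)

uniqDataDT <- unique(origDataDT)                 # drop exact duplicate rows

nDups <- nrow(origDataDT) - nrow(uniqDataDT)     # number of duplicates removed
pc <- 100 * nDups / nrow(origDataDT)             # as a % of rows loaded

feederDT <- uniqDataDT[!is.na(rDateTime)]        # keep rows with a usable timestamp

message("So we have ", format(nDups, big.mark = ","), " duplicates...")
message("That's ", round(pc, 2), "%")
```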
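The 'best day' selection described in the text (most complete day within a day-of-the-week/season sampling frame) is not implemented in this patch. Below is a hedged sketch of one way it could be done with data.table; the season derivation, the single-feeder assumption, and the fabricated input are mine, not the report's, and the year dimension is omitted for brevity.

```r
library(data.table)

# fabricated half-hourly data for one feeder across two dates (illustration only)
feederDT <- data.table(
  rDateTime = as.POSIXct("2020-01-06 00:00", tz = "UTC") + 0:95 * 1800
)
feederDT <- feederDT[sample(.N, 80)]   # drop some rows to mimic missing readings

feederDT[, obsDate := as.Date(rDateTime)]
feederDT[, wkDay := weekdays(rDateTime)]
feederDT[, season := fcase(month(rDateTime) %in% c(12, 1, 2), "Winter",
                           month(rDateTime) %in% 3:5,         "Spring",
                           month(rDateTime) %in% 6:8,         "Summer",
                           default = "Autumn")]

# observations per date within each season/day-of-week frame
dateCounts <- feederDT[, .(nObs = .N), by = .(season, wkDay, obsDate)]

# 'best' = the most complete date per frame; 48 half-hours = a full day
bestDays <- dateCounts[order(-nObs), .SD[1], by = .(season, wkDay)]
bestDays[, pcComplete := 100 * nObs / 48]
```

Frames whose best day still falls well short of 48 observations would be candidates for the seasonal mean kW profile fallback mentioned above.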