Commit cd3ccc9d authored by B.Anderson

added reporting of duplicates; trying to fix tables for pdf output. Vanilla kableExtra::kable() should do it?
parent d45c1667
2 merge requests: !3 merge a few edits, !2 fixed pdf build
@@ -27,7 +27,6 @@ output:
     toc: yes
     toc_depth: 2
     fig_width: 5
-always_allow_html: yes
 bibliography: '`r paste0(here::here(), "/bibliography.bib")`'
 ---
@@ -64,7 +63,7 @@ We have some electricity substation feeder data that has been cleaned to give me…
 There seem to be some NA kW values and a lot of missing time stamps. We want to select the 'best' (i.e. most complete) days within a day-of-the-week/season/year sampling frame. If we can't do that we may have to resort to seasonal mean kW profiles by hour & day of the week...
-Code used to generate this report: https://git.soton.ac.uk/ba1e12/spatialec/-/blob/master/isleOfWight/cleaningFeederData.Rmd
+The code used to generate this report is in: https://git.soton.ac.uk/ba1e12/dataCleaning/Rmd/
 
 # Data prep
@@ -78,7 +77,7 @@ origDataDT <- drake::readd(origData) # readd the drake object
 uniqDataDT <- drake::readd(uniqData) # readd the drake object
 kableExtra::kable(head(origDataDT), digits = 2,
-               caption = "Counts per feeder (long table)") %>%
+               caption = "First 6 rows of data") %>%
   kable_styling()
 ```
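The commit message asks whether vanilla `kableExtra::kable()` will fix the tables for PDF output. For context (a sketch, not part of this commit; `mtcars` stands in for the feeder data), a commonly used LaTeX/PDF-friendly pattern is `booktabs = TRUE` plus `kable_styling()`:

```r
# Sketch only: a PDF-friendly table, assuming kableExtra and the magrittr
# pipe are available as in the Rmd being edited above.
library(kableExtra)
library(magrittr) # for %>%

kableExtra::kable(head(mtcars), digits = 2, booktabs = TRUE,
                  caption = "First 6 rows of data") %>%
  kableExtra::kable_styling(latex_options = "hold_position")
```

In a PDF build `kable()` emits LaTeX, so `booktabs = TRUE` and the `hold_position` styling option usually give cleaner output than the HTML-oriented defaults.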
@@ -89,16 +88,19 @@ message("Original data nrows: ", tidyNum(nrow(origDataDT)))
 message("Unique data nrows: ", tidyNum(nrow(uniqDataDT)))
-message("So we have ", tidyNum(nrow(origDataDT) - nrow(uniqDataDT)), " duplicates...")
+nDups <- nrow(origDataDT) - nrow(uniqDataDT) # store before origDataDT is dropped
+message("So we have ", tidyNum(nDups), " duplicates...")
 pc <- 100*((nrow(origDataDT) - nrow(uniqDataDT))/nrow(origDataDT))
 message("That's ", round(pc,2), "%")
 feederDT <- uniqDataDT[!is.na(rDateTime)] # use dt with no duplicates
 origDataDT <- NULL # save memory
 ```
-There were `r tidyNum(nrow(origDataDT) - nrow(uniqDataDT))` duplicates - that's `r round(pc,2)` % of the observations loaded.
+There were `r tidyNum(nDups)` duplicates - that's ~`r round(pc,2)`% of the observations loaded.
 So we remove the duplicates...
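The duplicate-reporting logic in the last hunk can be sketched standalone as follows (toy data; the `rDateTime`/`kW` column names match the Rmd, but the data values and the use of `format()` in place of the report's `tidyNum()` helper are assumptions):

```r
library(data.table)

# Toy stand-in for the feeder data: every reading loaded twice.
origDataDT <- data.table(
  rDateTime = rep(as.POSIXct("2020-01-01 00:00", tz = "UTC") + (0:2) * 900, 2),
  kW = rep(c(10, 12, NA), 2)
)
uniqDataDT <- unique(origDataDT) # drop exact duplicate rows

nDups <- nrow(origDataDT) - nrow(uniqDataDT) # count before freeing origDataDT
pc <- 100 * nDups / nrow(origDataDT)
message("So we have ", format(nDups, big.mark = ","), " duplicates...") # 3
message("That's ", round(pc, 2), "%") # 50

feederDT <- uniqDataDT[!is.na(rDateTime)] # keep rows with a valid timestamp
origDataDT <- NULL # save memory
```

Computing `nDups` before `origDataDT` is set to `NULL` is what lets an inline `` `r tidyNum(nDups)` `` chunk work later in the report; the old code recomputed it from `origDataDT` after the table had been freed.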