Commit cd3ccc9d authored by B.Anderson
added reporting of duplicates; trying to fix tables for pdf output. Vanilla kableExtra::kable() should do it?
parent d45c1667
@@ -27,7 +27,6 @@ output:
toc: yes
toc_depth: 2
fig_width: 5
always_allow_html: yes
bibliography: '`r paste0(here::here(), "/bibliography.bib")`'
---
@@ -64,7 +63,7 @@ We have some electricity substation feeder data that has been cleaned to give me
There seem to be some NA kW values and a lot of missing time stamps. We want to select the 'best' (i.e. most complete) days within a day-of-the-week/season/year sampling frame. If we can't do that we may have to resort to seasonal mean kW profiles by hour & day of the week...
-Code used to generate this report: https://git.soton.ac.uk/ba1e12/spatialec/-/blob/master/isleOfWight/cleaningFeederData.Rmd
+The code used to generate this report is in: https://git.soton.ac.uk/ba1e12/dataCleaning/Rmd/
# Data prep
@@ -78,7 +77,7 @@ origDataDT <- drake::readd(origData) # readd the drake object
uniqDataDT <- drake::readd(uniqData) # readd the drake object
kableExtra::kable(head(origDataDT), digits = 2,
-caption = "Counts per feeder (long table)") %>%
+caption = "First 6 rows of data") %>%
kable_styling()
```
@@ -89,16 +88,19 @@ message("Original data nrows: ", tidyNum(nrow(origDataDT)))
message("Unique data nrows: ", tidyNum(nrow(uniqDataDT)))
-message("So we have ", tidyNum(nrow(origDataDT) - nrow(uniqDataDT)), " duplicates...")
+nDups <- tidyNum(nrow(origDataDT) - nrow(uniqDataDT))
+message("So we have ", tidyNum(nDups), " duplicates...")
pc <- 100*((nrow(origDataDT) - nrow(uniqDataDT))/nrow(origDataDT))
message("That's ", round(pc,2), "%")
feederDT <- uniqDataDT[!is.na(rDateTime)] # use dt with no duplicates
origDataDT <- NULL # save memory
```
-There were `r tidyNum(nrow(origDataDT) - nrow(uniqDataDT))` duplicates - that's `r round(pc,2)` % of the observations loaded.
+There were `r tidyNum(nDups)` duplicates - that's ~ `r round(pc,2)` % of the observations loaded.
So we remove the duplicates...
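
Note on the last hunk: the old inline text called `nrow(origDataDT)` after the chunk had already set `origDataDT <- NULL`, so the count could no longer be computed at that point. Storing the count in `nDups` before freeing the table avoids this. The sketch below illustrates the pattern on toy data. It is not the report's code: `origDF` and `uniqDF` are made-up stand-ins for `origDataDT` and `uniqDataDT`, and `tidyNum()` is redefined here as an assumption, since its real definition is not shown in this commit. The sketch also keeps `nDups` numeric and formats it only when printing, which is one possible variation on the committed code.

```r
# Hypothetical stand-in for the report's tidyNum() helper (assumed behaviour:
# add thousands separators). The real definition is not part of this diff.
tidyNum <- function(x) format(x, big.mark = ",", scientific = FALSE)

# Toy data standing in for origDataDT: 6 rows, 2 of which repeat an earlier row
origDF <- data.frame(
  feeder = c("A", "A", "B", "B", "B", "C"),
  kW     = c(1.5, 1.5, 2.0, 2.0, 3.1, 0.7)
)
uniqDF <- unique(origDF)  # drop exact duplicate rows

# Compute the summary values while the original table still exists
nDups <- nrow(origDF) - nrow(uniqDF)  # kept numeric; formatted only for display
pc <- 100 * nDups / nrow(origDF)

origDF <- NULL  # free memory; nDups and pc remain usable afterwards

message("So we have ", tidyNum(nDups), " duplicates...")
message("That's ", round(pc, 2), "%")
```

Inline R Markdown expressions such as `tidyNum(nDups)` and `round(pc, 2)` can then be evaluated after the chunk without needing the original table.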