# dataCleaning

A place to store hints, tips and examples for data cleaning. We work with a lot of very dirty data, which often has outliers and missing observations. Since most of it is large-scale 'sensor' data with time stamps, we make heavy use of the following R packages to process and visualise the data so we can see what is odd and what is missing:

 * [data.table](https://rdatatable.gitlab.io/data.table/) - very fast data loading and wrangling
 * [lubridate](https://lubridate.tidyverse.org/) - _the_ way to do dates and dateTimes in R
 * [hms](https://hms.tidyverse.org/) - deals with time (HH:MM:SS)
 * [ggplot2](https://ggplot2.tidyverse.org/) - plots, especially using [geom_tile()](https://ggplot2.tidyverse.org/reference/geom_tile.html) with date on the x axis, time of day on the y and 'fill' set to the sensor value that _should_ be there. This shows up non-random (and random) data holes like [these](https://git.soton.ac.uk/SERG/datacleaning/-/blob/master/rmd/cleaningFeederData_files/figure-latex/missingVis-1.pdf) very nicely.
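
As a rough illustration of the `geom_tile()` approach described above, here is a minimal sketch using made-up half-hourly data. The column names (`rDateTime`, `kW`), the dates and the size of the holes are all assumptions for the example, not taken from the real feeder data. The idea is to build the grid of time stamps that _should_ exist, join the observed data onto it so that missing observations become explicit `NA` rows, and then plot date against time of day.

```r
library(data.table)
library(lubridate)
library(hms)
library(ggplot2)

set.seed(42)

# The time stamps we expect: one week of half-hourly observations (hypothetical)
expected <- data.table(
  rDateTime = seq(
    from = lubridate::ymd_hms("2020-01-01 00:00:00", tz = "UTC"),
    by = "30 min",
    length.out = 7 * 48
  )
)

# Fake 'observed' sensor data: start complete, then punch holes in it
observed <- copy(expected)[, kW := runif(.N, min = 0, max = 5)]
# a non-random hole: 12:00 to 17:59 on one day (12 observations)
observed <- observed[!(lubridate::as_date(rDateTime) == lubridate::ymd("2020-01-03") &
                         lubridate::hour(rDateTime) %in% 12:17)]
# random holes: drop 20 of the remaining observations
observed <- observed[-sample(.N, 20)]

# Left join onto the expected grid so missing observations show up as NA
plotDT <- merge(expected, observed, by = "rDateTime", all.x = TRUE)
plotDT[, rDate := lubridate::as_date(rDateTime)] # date for the x axis
plotDT[, rTime := hms::as_hms(rDateTime)]        # time of day for the y axis

# Tiles with no sensor value are drawn in red so the holes stand out
p <- ggplot(plotDT, aes(x = rDate, y = rTime, fill = kW)) +
  geom_tile() +
  scale_fill_viridis_c(na.value = "red") +
  labs(x = "Date", y = "Time of day", fill = "kW")
print(p)
```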

This repo is an R package. This means:

 * package functions are kept in /R
 * help files auto-created by roxygen are in /man
 * if you clone it you can build it and use the functions
 * `drake::r_make(source = "_drake_XX.R")` is run from inside a make_XX.R file
 * the drake plan is kept inside _drake_XX.R along with the functions and package loading. This is not quite what the drake book recommends but it works for us
 * Rmd scripts called by the drake plan in _drake_XX.R and used for reporting the results of drake plans are in /Rmd
 * outputs are kept in /docs (reports, plots etc)
 * you (and we) keep your data out of it!
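
To make the drake layout above more concrete, here is a hedged sketch of what a `_drake_XX.R` / `make_XX.R` pair might look like. The function `loadSensorData()`, the data path and the report name are placeholders invented for illustration; the real files in this repo may be organised differently.

```r
# ---- _drake_XX.R (hypothetical sketch) ----
library(drake)
library(data.table)

# Functions used by the plan live in the same file as the plan
loadSensorData <- function(f) {
  data.table::fread(f)
}

# The plan: load the data, then render a report from /Rmd into /docs
# (the file names here are placeholders)
plan <- drake::drake_plan(
  sensorDT = loadSensorData(file_in("path/to/your/data.csv")),
  report = rmarkdown::render(
    knitr_in("Rmd/report_XX.Rmd"),
    output_dir = "docs",
    quiet = TRUE
  )
)

# r_make() expects the sourced file to end with a drake_config() call
drake::drake_config(plan)
```

```r
# ---- make_XX.R ----
# Runs the plan defined in _drake_XX.R in a fresh R session (needs the callr package)
drake::r_make(source = "_drake_XX.R")
```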

We'd love your contributions - feel free to:

 * [fork & go](https://happygitwithr.com/fork-and-clone.html)
 * make a [new branch](https://git.soton.ac.uk/SERG/workflow/-/blob/master/howTo/gitBranches.md) in your fork
 * make some improvements
 * send us a pull request (just code, no data please, keep your data [elsewhere](https://git.soton.ac.uk/SERG/workflow/-/blob/master/howTo/otherResources.md)!)