A place to store hints, tips and examples for data cleaning. We use a lot of very dirty data.
A place to store hints, tips and examples for data cleaning. We use a lot of very dirty data which often has outliers and missing observations. Since most of this data is large scale 'sensor' data with time stamps we make a lot of use of t hese R packages:
*[ggplot2](https://ggplot2.tidyverse.org/)'s [geom_tile()](https://ggplot2.tidyverse.org/reference/geom_tile.html) with time of day on the date on the x axis, time on the y and 'fill' set to the sensor value that _should_ be there. This shows up non-random (and random) data holes like [these](https://git.soton.ac.uk/SERG/datacleaning/-/blob/master/rmd/cleaningFeederData_files/figure-latex/missingVis-1.pdf) very nicely.