From f60b011638555bc700c09783548ae6b73950689c Mon Sep 17 00:00:00 2001
From: Ben Anderson <b.anderson@soton.ac.uk>
Date: Wed, 8 Jul 2020 11:23:12 +0100
Subject: [PATCH] Update README.md

---
 README.md | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index e42c183..b4ee2c9 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,11 @@
 # dataCleaning
 
-A place to store hints, tips and examples for data cleaning. We use a lot of very dirty data.
+A place to store hints, tips and examples for data cleaning. We use a lot of very dirty data, which often contains outliers and missing observations. Since most of this data is large-scale, time-stamped 'sensor' data, we make heavy use of these R packages:
+
+ * [data.table](https://rdatatable.gitlab.io/data.table/)
+ * [lubridate](https://lubridate.tidyverse.org/)
+ * [hms](https://hms.tidyverse.org/)
+ * [ggplot2](https://ggplot2.tidyverse.org/)'s [geom_tile()](https://ggplot2.tidyverse.org/reference/geom_tile.html), with the date on the x axis, time of day on the y axis and `fill` set to the sensor value that _should_ be there. This makes both non-random and random data holes, like [these](https://git.soton.ac.uk/SERG/datacleaning/-/blob/master/rmd/cleaningFeederData_files/figure-latex/missingVis-1.pdf), very easy to spot.
 
 This repo is an R package. This means:
 
-- 
GitLab