diff --git a/README.md b/README.md
index 3c63fa818f5a78dddbdbf3d64690c98c15cef1ab..9b155d54189eec347bcdc50fb31565b06ceba698 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,18 @@
+# dirtyData
+
+Similar to [Dark Data](https://press.princeton.edu/books/hardcover/9780691182377/dark-data) but possibly even nastier...
+
 # dataCleaning
 
-A place to store hints, tips and examples for data cleaning. We use a lot of very dirty data which often has outliers and missing observations. Since most of this data is large scale 'sensor' data with time stamps we make a lot of use of these R packages to process and visualise the data so we can see what is odd and what is missing:
+We use a lot of very dirty data which often has outliers and missing observations. Since most of this data is large scale 'sensor' data with time stamps we make a lot of use of these R packages to process and visualise the data so we can see what is odd and what is missing:
 
 * [data.table](https://rdatatable.gitlab.io/data.table/) - very fast data loading and wrangling
 * [lubridate](https://lubridate.tidyverse.org/) - _the_ way to do dates and dateTimes in R
 * [hms](https://hms.tidyverse.org/) - deals with time (HH:MM:SS)
 * [ggplot2](https://ggplot2.tidyverse.org/) - plots, especially using [geom_tile()](https://ggplot2.tidyverse.org/reference/geom_tile.html) with date on the x axis, time of day on the y and 'fill' set to the sensor value that _should_ be there. This shows up non-random (and random) data holes like [these](https://git.soton.ac.uk/SERG/datacleaning/-/blob/master/docs/report_cleanFeeders_allData.pdf) very nicely.
 
+# useIt
+
 This repo is an R package. This means:
 
 * package functions are kept in /R
@@ -20,6 +26,8 @@ This repo is an R package. This means:
 * if you can, **run Rscript ./make_cleanFeeders.R in a terminal not at the RStudio console** <- this stops RStudio from locking up
 * you (and we) keep your data out of it!
 
+# contributeToIt
+
 We'd love your contributions - feel free to:
 
 * [fork & go](https://happygitwithr.com/fork-and-clone.html)