Similar to [Dark Data](https://press.princeton.edu/books/hardcover/9780691182377/dark-data) but possibly even nastier...
# dataCleaning
A place to store hints, tips and examples for data cleaning. We use a lot of very dirty data which often has outliers and missing observations. Since most of this data is large-scale 'sensor' data with timestamps, we make a lot of use of the following R packages to process and visualise the data so we can see what is odd and what is missing:
* [data.table](https://rdatatable.gitlab.io/data.table/) - very fast data loading and wrangling
* [lubridate](https://lubridate.tidyverse.org/) - _the_ way to do dates and dateTimes in R
* [hms](https://hms.tidyverse.org/) - deals with time (HH:MM:SS)
* [ggplot2](https://ggplot2.tidyverse.org/) - plots, especially using [geom_tile()](https://ggplot2.tidyverse.org/reference/geom_tile.html) with date on the x axis, time of day on the y axis and 'fill' set to the sensor value that _should_ be there. This shows up non-random (and random) data holes like [these](https://git.soton.ac.uk/SERG/datacleaning/-/blob/master/docs/report_cleanFeeders_allData.pdf) very nicely.
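
For example, a minimal sketch of the kind of tile plot described above, assuming a data.table `dt` with a POSIXct `dateTime` column and a numeric sensor reading called `kW` (both names are illustrative, not from this repo):

```r
library(data.table)
library(lubridate)
library(hms)
library(ggplot2)

# assumes dt is a data.table with a POSIXct dateTime column and a numeric
# sensor reading called kW - both names are illustrative
dt[, obsDate := as_date(dateTime)]                     # date part for the x axis
dt[, obsTime := as_hms(format(dateTime, "%H:%M:%S"))]  # time of day for the y axis

# missing tiles = observations that _should_ be there but aren't
ggplot(dt, aes(x = obsDate, y = obsTime, fill = kW)) +
  geom_tile() +
  scale_fill_viridis_c(name = "kW") +
  labs(x = "Date", y = "Time of day")
```

Plotting presence/absence of data this way is usually much quicker at revealing gaps than scanning summary tables.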
# useIt
This repo is an R package. This means:
* package functions are kept in /R
...
...
* if you can, **run `Rscript ./make_cleanFeeders.R` in a terminal, not at the RStudio console** - this stops RStudio from locking up
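
A minimal sketch of getting the package functions loaded for interactive use, assuming you have cloned the repo and are working from its root folder (the use of `devtools`/`remotes` and the `.git` URL are assumptions, not requirements):

```r
# either: load everything in /R straight from the source tree (run from the repo root)
devtools::load_all(".")

# or: install from the GitLab repo (the .git URL is an assumption)
# remotes::install_git("https://git.soton.ac.uk/SERG/datacleaning.git")
```

The heavier processing scripts (such as make_cleanFeeders.R) are still best run with Rscript in a terminal, as noted above.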