
Commit e0bba631 authored by Ben Anderson

Update README.md

parent f8cc41ce
# dirtyData
Similar to [Dark Data](https://press.princeton.edu/books/hardcover/9780691182377/dark-data) but possibly even nastier...
# dataCleaning
A place to store hints, tips and examples for data cleaning. We use a lot of very dirty data, which often has outliers and missing observations. Since most of this data is large-scale 'sensor' data with time stamps, we rely heavily on these R packages to process and visualise the data so we can see what is odd and what is missing:
* [data.table](https://rdatatable.gitlab.io/data.table/) - very fast data loading and wrangling
* [lubridate](https://lubridate.tidyverse.org/) - _the_ way to do dates and dateTimes in R
* [hms](https://hms.tidyverse.org/) - deals with time (HH:MM:SS)
* [ggplot2](https://ggplot2.tidyverse.org/) - plots, especially using [geom_tile()](https://ggplot2.tidyverse.org/reference/geom_tile.html) with date on the x axis, time of day on the y and 'fill' set to the sensor value that _should_ be there. This shows up non-random (and random) data holes like [these](https://git.soton.ac.uk/SERG/datacleaning/-/blob/master/docs/report_cleanFeeders_allData.pdf) very nicely.
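
As a rough illustration of how the packages above fit together, here is a minimal sketch that builds a small synthetic half-hourly series, removes a block of observations, and draws the date by time-of-day tile plot described in the ggplot2 entry. The data and the names `kW`, `obsDate`, `obsTime` and `nMissing` are made up for this example and are not taken from the package code.

```r
library(data.table)
library(lubridate)
library(hms)
library(ggplot2)

# Synthetic half-hourly 'sensor' data for one week (made-up values, not real data)
set.seed(42)
dt <- data.table(
  dateTime = seq(ymd_hm("2020-01-01 00:00"), by = "30 min", length.out = 48 * 7)
)
dt[, kW := runif(.N, min = 0, max = 5)]

# Knock out a non-random hole: 02:00 to 05:30 on 3 January
dt <- dt[!(as_date(dateTime) == ymd("2020-01-03") & hour(dateTime) %in% 2:5)]

# Build the grid of timestamps that *should* be there, then join so that
# missing observations appear as rows with kW = NA
expected <- data.table(
  dateTime = seq(min(dt$dateTime), max(dt$dateTime), by = "30 min")
)
full <- dt[expected, on = "dateTime"]

# Split the timestamp into a date (x axis) and a time of day (y axis)
full[, obsDate := as_date(dateTime)]
full[, obsTime := as_hms(dateTime)]

# Count the holes
nMissing <- full[is.na(kW), .N]

# Tile plot: missing values are drawn in red so the hole stands out
p <- ggplot(full, aes(x = obsDate, y = obsTime, fill = kW)) +
  geom_tile() +
  scale_fill_viridis_c(na.value = "red") +
  labs(x = "Date", y = "Time of day", fill = "kW")
```

Calling `print(p)` (or just `p` at the console) draws the plot. Joining against the expected timestamps is one way to make gaps explicit; without it, missing rows simply leave empty space in the tile grid.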
# useIt
This repo is an R package. This means:
* package functions are kept in /R
@@ -20,6 +26,8 @@ This repo is an R package. This means:
* if you can, **run `Rscript ./make_cleanFeeders.R` in a terminal, not at the RStudio console**; this stops RStudio from locking up
* you (and we) keep your data out of it!
# contributeToIt
We'd love your contributions - feel free to:
* [fork & go](https://happygitwithr.com/fork-and-clone.html)
......