Let's start by offering some advice on where _not_ to keep your data:
* in your git/GitHub/GitLab repo because:
    * your repo will bloat
    * you may accidentally publish it via GitHub/GitLab
    * every time you create or change data, git will try to sync it with your remote repo when you push. This will _hammer_ your internet connection and make your git commit process almost unusable
    * GitHub/GitLab will refuse to store data of any useful size (GitHub, for example, blocks files larger than 100 MB)
* on Dropbox/Sharepoint/oneCloud or similar because:
    * every time you create or change data, the service will try to sync it. This will _hammer_ your internet connection
* only on your laptop/PC because:
    * they crash and you'll lose the data
    * you'll lose the device and someone could find/steal/disclose the data
* on a USB drive because:
    * see previous
OK, so where _should_ you keep your data? There are basically two types of places:
* an institutional file store to which you have access
* a cloud data service, such as AWS or Google, to which you can send your code
For most of you the first option may be the only one available for institutional policy reasons. In the case of the University of Southampton you should [read the policy on data storage](https://library.soton.ac.uk/researchdata/storage) including the advice on how to store and transfer data securely. Our suggested options are:
* your [personal filestore](https://knowledgenow.soton.ac.uk/Articles/KB0011651) which, by default, allows _only_ 50 GB, or
* the research filestore (**preferred**) which is accessible via the [web](https://fwa.soton.ac.uk/), via SMB (i.e. the `J:\` drive, AKA `\\soton.ac.uk\resource\`; use the VPN) and, in the case of SERG data, via `/mnt/SERG_data` on the University's RStudio server.
We recommend you use the research filestore because it can hold much larger data volumes and can be made accessible to your colleagues/supervisors if required. For reasons of speed this implies you either:
* mount the filestore/J: drive on your laptop/PC and use a local version of (e.g.) RStudio to load the data. If you are doing this you might want to learn about [drake][drake.md] so that the data is loaded over the network only the first time you run your code, not every time.
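
To make the "load it over the network only once" idea concrete, here is a minimal base R sketch. It is not drake and not our actual workflow: `load_cached()`, the cache location and the demo data are all hypothetical names invented for illustration, and a temporary file stands in for a file on the mounted filestore. drake does considerably more (it tracks dependencies and notices when inputs change), whereas this sketch never refreshes its cache.

```r
# Hypothetical helper (not part of drake or any package): read a file from
# the mounted filestore once, then reuse a local cached copy on later runs.
load_cached <- function(remote_path, cache_path, reader = read.csv) {
  if (file.exists(cache_path)) {
    return(readRDS(cache_path))      # fast path: local disk only
  }
  data <- reader(remote_path)        # slow path: read over the network
  dir.create(dirname(cache_path), recursive = TRUE, showWarnings = FALSE)
  saveRDS(data, cache_path)          # keep a local copy for next time
  data
}

# Self-contained demo: a temporary CSV stands in for a file on the J: drive.
remote_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, kWh = c(1.2, 0.8, 2.5)),
          remote_path, row.names = FALSE)
cache_path <- file.path(tempdir(), "cache", "demo.rds")

first <- load_cached(remote_path, cache_path)   # reads the "remote" file
unlink(remote_path)                             # simulate losing the network
second <- load_cached(remote_path, cache_path)  # served from the local cache
stopifnot(identical(first, second))
```

In real use you would point `remote_path` at a file on the mounted filestore and keep `cache_path` somewhere local that is _not_ inside your git repo. Because this sketch never checks whether the source file has changed, you would need to delete the cache file by hand to force a reload, which is exactly the bookkeeping that drake is meant to take off your hands.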