diff --git a/howTo/keepingData.md b/howTo/keepingData.md
new file mode 100644
index 0000000000000000000000000000000000000000..51e780fea272e38bf450abdc9497c1ffc31bb0f5
--- /dev/null
+++ b/howTo/keepingData.md
@@ -0,0 +1,31 @@
+# Where to keep your data
+
+Let's start by offering some advice on where _not_ to keep your data:
+
+ * in your git/GitHub/GitLab repo, because:
+    * your repo will bloat
+    * you may accidentally publish it via GitHub/GitLab
+    * every time you create or save new data, git will try to sync it with your repo. This will _hammer_ your internet connection and make your git commit process almost unusable
+    * GitHub/GitLab will refuse to store data of any useful size
+ * on Dropbox/SharePoint/oneCloud/whatever or similar, because:
+    * every time you create or save new data, your Dropbox/SharePoint/oneCloud/whatever will try to sync it. This will _hammer_ your internet connection
+ * only on your laptop/PC, because:
+    * they crash and you'll lose the data
+    * you'll lose the device and someone could find/steal/disclose the data
+ * on a USB drive, because:
+    * see previous
+
+OK, so where _should_ you keep your data? There are basically two types of places:
+
+ * an institutional file store to which you have access
+ * a cloud data service to which you can send your code, such as AWS, Google Cloud, etc.
+
+For most of you, the first option may be the only one available for institutional policy reasons. At the University of Southampton you should [read the policy on data storage](https://library.soton.ac.uk/researchdata/storage), including the advice on how to store and transfer data securely. Our suggested options are:
+
+ * your [personal filestore](https://knowledgenow.soton.ac.uk/Articles/KB0011651), which by default _only_ allows 50GB, or
+ * the research filestore (**preferred**), which is accessible via the [web](https://fwa.soton.ac.uk/), via SMB (i.e. the `J:\` drive, AKA `\\soton.ac.uk\resource\`; use the VPN) and, in the case of SERG data, via `/mnt/SERG_data` on the University's RStudio server.
+
+We recommend the research filestore because it can hold much larger data volumes and can be made accessible to your colleagues/supervisors if required. Because the data then lives on a network drive, for reasons of speed you will need to either:
+
+ * mount the filestore/`J:` drive on your laptop/PC and use a local version of (e.g.) RStudio to load the data. If you do this, you might want to learn about [drake](drake.md) so that the data is only loaded over the network the first time you run your code, not every time.
+ *
\ No newline at end of file