Where to keep your data
Let's start by offering some advice on where not to keep your data:
- in your git/hub/lab repo because:
  - your repo will bloat
  - you may accidentally publish the data via GitHub/GitLab
  - every time you create or save new data, git will see it as a change to commit and push. This will hammer your internet connection and make your git commit process almost unusable
  - GitHub/GitLab will refuse to store data of any useful size
- on Dropbox/SharePoint/oneCloud/whatever or similar because:
  - it may breach your institutional policy on data storage
  - every time you create or save new data, your Dropbox/SharePoint/oneCloud/whatever will try to sync it. This will hammer your internet connection
- only on your laptop/PC because:
  - they crash and you'll lose the data
  - you'll lose the laptop itself, and someone could find/steal/disclose the data
- on a USB drive because:
  - see previous
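If you suspect data has already crept into a repo, a quick size scan will show it. The Python sketch below is our own illustration, not a git or GitHub feature: it walks a working copy, skips git's internal .git folder, and lists files over a size threshold, largest first. The helper name find_large_files is hypothetical and the 50 MB default is an arbitrary assumption; for reference, GitHub blocks individual files larger than 100 MiB.

```python
import os
from pathlib import Path

# Arbitrary default (our assumption): flag anything over 50 MB.
# For reference, GitHub blocks individual files larger than 100 MiB.
DEFAULT_THRESHOLD_BYTES = 50 * 1024 * 1024


def find_large_files(repo_root, threshold_bytes=DEFAULT_THRESHOLD_BYTES):
    """Hypothetical helper: list (relative_path, size_in_bytes) for every file
    in the working copy larger than threshold_bytes, largest first."""
    root = Path(repo_root)
    large = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git's internal object store; we only care about your own files
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                continue  # don't follow links pointing outside the repo
            size = path.stat().st_size
            if size > threshold_bytes:
                large.append((str(path.relative_to(root)), size))
    # Biggest offenders first
    return sorted(large, key=lambda item: item[1], reverse=True)
```

Run it from the top of your repo, e.g. find_large_files("."); anything it reports probably belongs in one of the places described below rather than in git.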
OK, so where should you keep your data? There are basically two types of places:
- an institutional file store to which you have access
- a cloud data service, such as AWS, Google etc., to which you can send your code
For most of you the first option may be the only one available, for institutional policy reasons. In the case of the University of Southampton, you should read the policy on data storage, including its advice on how to store and transfer data securely. Our suggested options are:
- your personal filestore (AKA \\filestore.soton.ac.uk\Users\<username>\, AKA “My Documents”), which by default only allows 50GB, or
- (preferred) the resource drive (AKA J:\ or \\soton.ac.uk\resource\), which is accessible via the web, via SMB (use the VPN) and, in the case of SERG data, via /mnt/SERG_data on the University's RStudio server.
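Because the same data can turn up under different paths depending on where your code runs (/mnt/SERG_data on the RStudio server, J:\ on a PC with the drive mapped, \\soton.ac.uk\resource\ over SMB), it can help to look the location up at run time rather than hard-coding one path. This is a minimal Python sketch, assuming those locations give you a view of the same data on your machine (check this; J:\ and the UNC share are drive roots, so you will probably need to append your own sub-folder). The helper name find_data_root is hypothetical.

```python
from pathlib import Path

# Candidate locations for the same shared data, in order of preference.
# Assumption: each of these gives you a view of the same data; the two
# Windows-style entries are drive roots, so append your own sub-folder.
CANDIDATE_ROOTS = [
    Path("/mnt/SERG_data"),          # University RStudio server mount (SERG data)
    Path("J:/"),                     # resource drive mapped on a Windows PC
    Path("//soton.ac.uk/resource"),  # UNC path over SMB (use the VPN)
]


def find_data_root(candidates=CANDIDATE_ROOTS):
    """Hypothetical helper: return the first candidate directory that exists here."""
    for root in candidates:
        if root.is_dir():
            return root
    raise FileNotFoundError(
        "None of the data locations are reachable: "
        + ", ".join(str(c) for c in candidates)
    )
```

The same script then runs unchanged on the RStudio server and on your own PC; only the root it finds differs.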
We recommend you use the J:\ drive because it can hold much larger data volumes and can be made accessible to your colleagues/supervisors if required. For reasons of speed, this means you should do one of the following:
- use the University SVE to run RStudio 'on campus' and thus close to the data. If you try this, be aware that we have found some problems with installed packages not persisting between SVE sessions
- mount the J:\ drive on your laptop/PC and use a local version of (e.g.) RStudio to load the data. If you are doing this you might want to learn about drake, so that the data is only loaded over the network the first time you run your code, not every time
- get access to and use the University's RStudio server (best option: this lets you run your code directly on the data held on the resource drive)
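The "only load it over the network once" idea can be sketched without drake. The Python below is a simpler, swapped-in technique (a file-level local cache), not drake or its successor package targets, which go further and skip re-running whole analysis steps whose inputs haven't changed. It assumes the J:\ drive is mounted and that a remote file's size and modification time are a good enough signal that it has changed; cached_copy is a hypothetical name.

```python
import shutil
from pathlib import Path


def cached_copy(remote_path, cache_dir):
    """Hypothetical helper: return a local copy of remote_path, copying it over
    the network only when there is no local copy yet or the remote file looks
    different (different size, or newer modification time)."""
    remote = Path(remote_path)
    local = Path(cache_dir) / remote.name  # note: same-named files would collide
    remote_stat = remote.stat()  # one quick metadata call over the network
    if local.exists():
        local_stat = local.stat()
        if (local_stat.st_size == remote_stat.st_size
                and local_stat.st_mtime_ns >= remote_stat.st_mtime_ns):
            return local  # cache hit: nothing is transferred
    local.parent.mkdir(parents=True, exist_ok=True)
    # copy2 also copies the modification time, so the check above works next time
    shutil.copy2(remote, local)
    return local
```

You would then load the returned local path with whatever you normally use. Keep the cache folder somewhere covered by the advice above (e.g. not inside your git repo), and remember it is a copy: the J:\ version remains the master.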
As far as we know, the University's Research Filestore (AKA \\xxx.files.soton.ac.uk\<SHARE>\) does not allow direct access, so it can only be used to archive data you are not actively using. See https://library.soton.ac.uk/researchdata/storage for more info.
Update: we understand that the University is looking to transition from the J drive to OneDrive/SharePoint:
_"Over the course of this year, and next, we will be looking to move the University off J:Drive and utilise OneDrive for Business, as part of our Filestore Migration Project.
Filestore Migration will:
- Move personal files from ‘My Documents’ and ‘Desktop’ to One Drive for Business, a Microsoft cloud service that connects you to all your files.
- Move shared files e.g. J: drive to SharePoint Online.
Further information can be found here: Storing files in O365
Please note: For people and processes that have a business need to maintain traditional filestore (such as Linux desktop users), there will be a process to request this instead of One Drive for Business or SharePoint Online."_
We have no idea what this will mean for accessing data from the University's RStudio server service. We currently have an open ticket with iSolutions on this.