29 files changed: +1144 −1510
@@ -6,3 +6,5 @@
# OS X stuff - https://gist.github.com/adamgit/3786883
.DS_Store
.Trashes
# sensitive files
/howTo/r-with-aws/access_keys/aws_access.R
@@ -5,6 +5,6 @@ Be nice:
    * work in a branch of your fork
    * make a pull request to merge your branch to the master
 
Don't know how to do this? [Read our guide](gitBranches.md).
Don't know how to do this? [Read our guide](howTo/gitBranches.md).
 
Need more info? Read the excellent [contributing to opensource code](https://opensource.guide/how-to-contribute/) guidance.
@@ -4,10 +4,33 @@ How we do collaborative reproducible data analysis and reporting. Mostly (but no

This repo does three things:

 * it is a collection of R [how-to resources](resources.md) including some notes on:
   * how to [use git branches](gitBranches.md) 
   * how to use [drake](https://docs.ropensci.org/drake/) to massively speed up and [manage your workflow](https://milesmcbain.xyz/the-drake-post/)
 * it is a [template](template.md) repo that illustrates how we work and which you can copy;
 * it is an R package that you can build if you want to using 'install and restart' from the RStudio Build menu. If you do you will then be able to use its functions viz: `woRkflow::functionName()` (not that it has many).
 * it is a collection of R [how-to resources](howTo/) including some notes on:
   * how to [use git branches](howTo/gitBranches.md) 
   * how to use [drake](howTo/drake.md) to massively speed up and [manage your workflow](https://milesmcbain.xyz/the-drake-post/) (NB: drake has been superseded by [targets](https://books.ropensci.org/targets/) - update on the notes soon)
   * how to access the University [Iridis HPC](howTo/iridis.md)
   * how to use R/RStudio on the University [SVE (remote desktop) service](howTo/sve.md)
   * where to [keep your data](howTo/keepingData.md)
   * how to use [renv](howTo/renv.md) to manage your R environment - including packages
   * how to access Amazon Web Services S3 buckets directly from R using [aws.s3](howTo/aws-s3.md)
 * it is a [template](repoAsATemplate.md) repo that illustrates how we work and which you can copy;
 * it is an R package. This means:
     * package functions are kept in /R
     * help files auto-created by roxygen are in /man
     * if you clone it you can build it using 'install and restart' from the RStudio Build menu and use the functions viz: `woRkflow::functionName()` (not that it has many)
 
If you want to [contribute to the repo](CONTRIBUTING.md) or like how we work and want to use it as a template for your project or package, just [fork and go](https://happygitwithr.com/fork-and-clone.html).
Using drake:
 
 * make_XX.R contains a call to drake::r_make(source = "_drake_XX.R")
 * _drake_XX.R contains the drake plan and the functions & package loading. This is not quite what the [drake book](https://books.ropensci.org/drake/projects.html#usage) recommends but it works for us
 * Rmd scripts called by the drake plan to report results are kept in /Rmd
 * outputs are kept in /docs (reports, plots etc)
 * if you can, **run `Rscript ./make_cleanFeeders.R` in a terminal, not at the RStudio console** <- this stops RStudio from locking up

We'd love your [contributions](CONTRIBUTING.md) - feel free to:

 * [fork & go](https://happygitwithr.com/fork-and-clone.html)
 * make a [new branch](https://git.soton.ac.uk/SERG/workflow/-/blob/master/howTo/gitBranches.md) in your fork
 * make some improvements
 * send us a pull request (just code, no data please, keep your data [elsewhere](https://git.soton.ac.uk/SERG/workflow/-/blob/master/howTo/keepingData.md)!)

As a number of people have pointed out, [fork & go](https://happygitwithr.com/fork-and-clone.html) only works if you have an account on git.soton.ac.uk. If you don't, you can import the repo into (for example) your [github.com](https://github.com/new/import) account and go from there. Presumably the same works on gitlab.com too...
@@ -29,6 +29,7 @@ bibliography: '`r here::here("bibliography.bib")`'
---

```{r knitrSetup, include=FALSE}
startTime <- proc.time()
knitr::opts_chunk$set(echo = FALSE) # by default turn off code echo
```

@@ -56,17 +57,16 @@ There's quite a lot of data...
drake::readd(gWPlot)
```

# Runtime
# R environment

```{r check runtime, include=FALSE}
# within Rmd timing
t <- proc.time() - startTime
elapsed <- t[[3]]
```

Report generated in `r round(elapsed,2)` seconds ( `r round(elapsed/60,2)` minutes) using [knitr](https://cran.r-project.org/package=knitr) in [RStudio](http://www.rstudio.com) with `r R.version.string` running on `r R.version$platform`.

# R environment

## R packages used

 * base R [@baseR]
@@ -9,7 +9,8 @@ reqLibs <- c("data.table", # data munching
             "here", # here
             "lubridate", # dates and times
             "ggplot2", # plots
             "skimr" # for skim
             "skimr", # for skim
             "bookdown" # for making reports (should also install knitr etc)
)
# load them
woRkflow::loadLibraries(reqLibs)
@@ -37,7 +38,7 @@ getData <- function(f,update){
makeGWPlot <- function(dt){
  # expects the eso data as a data.table
  # draws a plot
  dt[, rDateTime := lubridate::ymd_hms(DATETIME)]
  dt[, rDateTime := lubridate::ymd_hms(DATETIME)] # hurrah, somebody read https://speakerdeck.com/jennybc/how-to-name-files?slide=21
  dt[, weekDay := lubridate::wday(rDateTime, label = TRUE)]
  # draw a megaplot for illustrative purposes
  p <- ggplot2::ggplot(dt, aes(x = rDateTime, 
@@ -50,7 +51,7 @@ makeGWPlot <- function(dt){
         caption = "Source: UK Grid ESO (http://data.nationalgrideso.com)")
  return(p)
}

version <- 10
makeReport <- function(f){
  # default = html
  rmarkdown::render(input = paste0(here::here("Rmd", f),".Rmd"), # we love here::here() - it helps us find the .Rmd to use
@@ -60,19 +61,21 @@ makeReport <- function(f){
                    output_file = paste0(here::here("docs", f),".html") # where the output goes
  )
}

# Set up ----
startTime <- proc.time()

# Set the drake plan ----
# Clearly this will fail if you do not have internet access...
plan <- drake::drake_plan(
  esoData = getData(urlToGet, update), # returns data as data.table. If you edit update in any way it will reload - drake is watching you!
  esoData = getData(urlToGet, update), # returns data as data.table. If you edit 'update' in any way it will reload - drake is watching you!
  skimTable = skimr::skim(esoData), # create a data description table
  gWPlot = makeGWPlot(esoData) # make a plot
)

# Run drake plan ----
plan # test the plan

make(plan) # run the plan, re-loading data if needed

# Run the report ----
@@ -80,6 +83,12 @@ make(plan) # run the plan, re-loading data if needed
# drake can't seem to track the .rmd file if it is not explicitly named
makeReport(rmdFile)

# Just to show we can bring spirits back from the deep (i.e. from wherever drake hid them)
dt <- drake::readd(esoData)
dt[, rDateTimeUTC := lubridate::as_datetime(DATETIME)]

message("Data covers ", min(dt$rDateTimeUTC), " to ", max(dt$rDateTimeUTC))

# Finish off ----

t <- proc.time() - startTime # how long did it take?

_drake_basicReport.R

# basic drake makefile

# Libraries ----
library(woRkflow) # remember to build it first :-)
woRkflow::setup() # loads env.R to set up the default paths etc

reqLibs <- c("data.table", # data munching
             "drake", # what's done stays done
             "here", # here
             "lubridate", # dates and times
             "ggplot2", # plots
             "skimr" # for skim
)
# load them
woRkflow::loadLibraries(reqLibs)

# Parameters ----

# Some data to play with:
# https://data.nationalgrideso.com/carbon-intensity1/historic-generation-mix/r/historic_gb_generation_mix

urlToGet <- "http://data.nationalgrideso.com/backend/dataset/88313ae5-94e4-4ddc-a790-593554d8c6b9/resource/7b41ea4d-cada-491e-8ad6-7b62f6a63193/download/df_fuel_ckan.csv"
update <- "please" # edit this in any way (at all) to get drake to re-load the data from the url
rmdFile <- "basicReport" # <- name of the .Rmd file to run at the end 
title <- "UK Electricity Generation"
subtitle <- "UK ESO grid data"
authors <- "Ben Anderson"

# Functions ----
# for use in drake
getData <- function(f,update){
  # gets the data
  message("Getting data from: ", f)
  dt <- data.table::fread(f)
  return(dt)
}

makeGWPlot <- function(dt){
  message("Rebuilding plot")
  # expects the eso data as a data.table
  # draws a plot
  dt[, rDateTime := lubridate::ymd_hms(DATETIME)] # hurrah, somebody read https://speakerdeck.com/jennybc/how-to-name-files?slide=21
  dt[, weekDay := lubridate::wday(rDateTime, label = TRUE)]
  # draw a megaplot for illustrative purposes
  p <- ggplot2::ggplot(dt, aes(x = rDateTime, 
                               y = GENERATION/1000,
                               colour = weekDay)) +
    geom_point() +
    theme(legend.position = "bottom") +
    labs(x = "Time",
         y = "Generation (GW - mean per halfhour?)",
         caption = "Source: UK Grid ESO (http://data.nationalgrideso.com)")
  return(p)
}

makeReport <- function(f){
  message("Re-running report: ", f)
  # default = html
  rmarkdown::render(input = paste0(here::here("Rmd", f),".Rmd"), # we love here::here() - it helps us find the .Rmd to use
                    params = list(title = title,
                                  subtitle = subtitle,
                                  authors = authors),
                    output_file = paste0(here::here("docs", f),".html") # where the output goes
  )
}

# Set the drake plan ----
# Clearly this will fail if you do not have internet access...
plan <- drake::drake_plan(
  esoData = getData(urlToGet, update), # returns data as data.table. If you edit 'update' in any way it will reload - drake is watching you!
  skimTable = skimr::skim(esoData), # create a data description table
  gWPlot = makeGWPlot(esoData), # make a plot
  out = makeReport(rmdFile)
)

drake::drake_config(plan, verbose = 2)

howTo/.gitkeep


howTo/aws-s3.md

# Guide to accessing data from Amazon Web Services (AWS) S3 buckets using R

This guide provides details of how to set up and access files/data stored within an AWS S3 bucket directly from an R session using the [aws.s3 package](https://github.com/cloudyr/aws.s3).

Prerequisite: access to the AWS account where the S3 bucket is located in order to create a user access policy.

## Creating a user access policy (in the AWS console)

Following guidance here: https://www.gormanalysis.com/blog/connecting-to-aws-s3-with-r/

Create user 'rconnector' ... and create user policy 'test-bucket-connector' (see [example access policy](r-with-aws/rconnector-access-policy)).

Make sure to save the access key ID and secret access key for use with the S3 API client.

Use these details to set the following environment variables (see the code below) and store the credentials in an R script in your project folder (in this example, in a subfolder called [access_keys](r-with-aws/access_keys)). For security, exclude this file from the project repository by adding it to your .gitignore file. The R script will look something like this:

```r
Sys.setenv(
  "AWS_ACCESS_KEY_ID" = "mykey",
  "AWS_SECRET_ACCESS_KEY" = "mysecretkey",
  "AWS_DEFAULT_REGION" = "eu-west-2"
)
```

An example script can be found [here](r-with-aws/access_keys/example_credentials_script.R).
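A minimal sketch of loading that credentials script from R and checking it worked, assuming the git-ignored file lives at `howTo/r-with-aws/access_keys/aws_access.R` (the path excluded in our `.gitignore`). `loadAwsCredentials()` is a hypothetical helper, not part of aws.s3: it only checks that the three environment variables aws.s3 reads are non-empty, not that the keys themselves are valid.

```r
# Hypothetical helper (not part of aws.s3): source a git-ignored credentials
# script and confirm the environment variables aws.s3 looks for are now set.
loadAwsCredentials <- function(credsFile) {
  if (!file.exists(credsFile)) {
    stop("Credentials file not found: ", credsFile)
  }
  # running the file executes its Sys.setenv() call, which sets the
  # variables for the whole R session
  source(credsFile, local = TRUE)
  required <- c("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")
  missing <- required[!nzchar(Sys.getenv(required))]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}
```

For example: `loadAwsCredentials(here::here("howTo", "r-with-aws", "access_keys", "aws_access.R"))`. Failing early with a clear message is friendlier than a cryptic 403 from S3 later on.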

## Connecting to the S3 bucket with R

You're ready to go! See [example code](r-with-aws/using_aws-s3_example.R) showing some commands to authenticate R with AWS and read and write files from/to AWS S3 buckets.
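If you just want the gist, the sketch below wraps three aws.s3 functions (`s3read_using()`, `s3write_using()` and `get_bucket_df()`) so that CSV files are read and written with data.table. The wrapper names are ours and the default bucket `my-s3-bucket-name` is the placeholder from the example policy, so swap in your own. Nothing is sent to AWS until you call the functions, and you'll need the aws.s3 and data.table packages installed plus your credentials set as above.

```r
# Hypothetical wrappers around aws.s3; "my-s3-bucket-name" is a placeholder.
readCsvFromS3 <- function(object, bucket = "my-s3-bucket-name") {
  # downloads the object to a temporary file, then reads it with fread()
  aws.s3::s3read_using(FUN = data.table::fread, object = object, bucket = bucket)
}

writeCsvToS3 <- function(dt, object, bucket = "my-s3-bucket-name") {
  # writes dt to a temporary file with fwrite(), then uploads that file
  aws.s3::s3write_using(dt, FUN = data.table::fwrite, object = object, bucket = bucket)
}

listS3Objects <- function(bucket = "my-s3-bucket-name") {
  # returns a data frame with one row per object (Key, LastModified, Size, ...)
  aws.s3::get_bucket_df(bucket = bucket)
}
```

So `dt <- readCsvFromS3("path/to/file.csv")` (a made-up object key) would pull a CSV straight into a data.table without you having to manage the temporary download yourself.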

howTo/drake.md

NB: drake has been superseded by [targets](https://books.ropensci.org/targets/) - more soon

# drake:

 * use [drake](https://docs.ropensci.org/drake/) to massively speed up and [manage your workflow](https://milesmcbain.xyz/the-drake-post/). This includes always:
    * loading and processing all your data inside a drake plan in a .R file. _So it only gets re-run if the code or data changes_
    * creating each of your output objects inside the drake plan. _So they only get re-created if the code or data changes_
    * rendering your .Rmd report at the end of the drake plan. _So you can pass the params in and report the output objects_
    * => the first time you run the plan it will build everything. The second time, e.g. after you fix a .Rmd typo, _only the bits that have changed get re-built_. **Warning: drake can reduce the time it takes to run your code by an order of magnitude. This could seriously damage your tea & cake intake...**
 
We have an example of [using drake](https://git.soton.ac.uk/SERG/workflow/-/blob/master/Rmd/make_basicReport.R)

howTo/iridis.md

# How to use the UoS Iridis HPC

HPC = [High Performance Computer](https://www.southampton.ac.uk/isolutions/staff/iridis.page). Lots of memory, lots of processors, good if you can parallelise your code. Can you?
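Not sure? Here's a minimal sketch of the idea using R's built-in parallel package: each element is processed independently of the others, so `mclapply()` can farm the work out across cores and should give the same answer as plain `lapply()`. `slowSquare()` and `chooseCores()` are made up for illustration. `mclapply()` relies on forking, so the sketch falls back to one core on Windows; on Iridis you'd run this inside a scheduled job and, we assume, set `mc.cores` to match the cores you requested rather than trusting `detectCores()`.

```r
library(parallel)

# Made-up example task: every call is independent, which is what makes it
# safe to run in parallel.
slowSquare <- function(x) {
  Sys.sleep(0.01) # stand-in for some real work
  x^2
}

# Hypothetical helper: all but one core, or 1 on Windows (no forking there)
# or if the core count can't be detected.
chooseCores <- function() {
  if (.Platform$OS.type == "windows") return(1L)
  n <- detectCores()
  if (is.na(n)) return(1L)
  max(1L, n - 1L)
}

inSerial   <- lapply(1:20, slowSquare)
inParallel <- mclapply(1:20, slowSquare, mc.cores = chooseCores())
```

If your code has this shape (a big loop where no iteration depends on another) it's a good candidate for Iridis. If each step needs the previous result, more cores won't help much.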

## Which Iridis?

We have:

 * _Iridis 4: https://hpc.soton.ac.uk/redmine/projects/iridis-4-support/wiki (**retiring February 2022**)_
 * Iridis 5: https://hpc.soton.ac.uk/redmine/projects/iridis-5-support/wiki

Iridis 5 is newer, bigger, better etc. but new projects start on Iridis 4 (_before February 2022_) and, presumably once you prove you need 5, you can transfer.

## How to access
Fill in the form on the [iridis page](https://www.southampton.ac.uk/isolutions/staff/iridis.page).

## How to login

OK, so you are going to need to [get a bit friendly with Unix](https://www.dummies.com/computers/operating-systems/unix/) because Iridis runs Unix (Linux). In general you will be interacting with it using a command-line tool such as a terminal/console. Back to the good old days. 

Assuming you want Iridis 5 to start with, you can log in to any of:

 * iridis5_a.soton.ac.uk
 * iridis5_b.soton.ac.uk
 * iridis5_c.soton.ac.uk

To do this:

 * make sure you are running the UoS VPN
 * open a terminal/console window on your desktop
 * type `ssh YourUsername@iridis5_a.soton.ac.uk` (or whichever you want) & hit return. YourUsername = your UoS username obvs
 * type `yes` to the fingerprint question (DO NOT PANIC!). This should only happen the first time you do this
 * type your UoS password & hit return
 * you're in!

## Now what?

Read these first! 

 * https://hpc.soton.ac.uk/redmine/projects/iridis-5-support/wiki/New_User_Warnings <- **especially this one!**
 * https://hpc.soton.ac.uk/redmine/projects/iridis-5-support/wiki/Getting_started

## R

https://hpc.soton.ac.uk/redmine/projects/iridis-5-support/wiki/R

After you've [loaded the R module](https://hpc.soton.ac.uk/redmine/projects/iridis-5-support/wiki/R), you can interactively run R scripts by running R. Or you can run them non-interactively using the `Rscript <yourScript.R>` command. But you probably shouldn't. 

Instead you should submit your commands (e.g. `Rscript myScript.R`) to the Iridis scheduling system so it can balance everyone's needs. You need to [learn how to do this](https://hpc.soton.ac.uk/redmine/projects/iridis-5-support/wiki/Job_Submission)...
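Scheduled jobs can't stop and ask you questions, so it helps if your script takes its inputs from the command line instead of having paths hard-coded. A minimal sketch, assuming a hypothetical `myScript.R` called as `Rscript myScript.R <inputFile> <outputDir>`; `parseJobArgs()` is our own illustrative helper built on base R's `commandArgs()`.

```r
# Hypothetical helper for a script run via Rscript: pick up the input file
# and output directory from the command line.
parseJobArgs <- function(args = commandArgs(trailingOnly = TRUE)) {
  # trailingOnly = TRUE drops R's own arguments, leaving only ours
  if (length(args) < 2) {
    stop("Usage: Rscript myScript.R <inputFile> <outputDir>")
  }
  list(inputFile = args[[1]], outputDir = args[[2]])
}
```

Inside `myScript.R` you'd then call `jobArgs <- parseJobArgs()` near the top, and the same script can be submitted many times with different inputs.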

## How to get your code on Iridis

If you've set up a repo on git.soton it's easy - Iridis has git installed. You will have to use git at the command line but it's [not so bad](https://docs.gitlab.com/ee/gitlab-basics/start-using-git.html)...

If you haven't then do so.

## Where to put your data

Sadly Iridis cannot see `J:\` or `\\soton.ac.uk\resource\`, so you can't follow the [usual advice](https://git.soton.ac.uk/SERG/workflow/-/blob/master/howTo/keepingData.md). You will need to copy the data you need to your home directory on Iridis. Do this using sftp: [CyberDuck](https://cyberduck.io/) works well on OSX, no doubt there are others for [Windoze](https://hpc.soton.ac.uk/redmine/projects/iridis-4-support/wiki/Transferring_files_to_and_from_Iridis_4).

If you [run out of space](https://hpc.soton.ac.uk/redmine/projects/iridis-4-support/wiki/User_account_information_and_limits) in your home folder/directory (100 GB max) you can also put it in a scratch folder which will take 1 TB. But this is not backed-up - so data in there is at risk.

## Adding packages

Yes [you can do this](https://hpc.soton.ac.uk/redmine/projects/iridis-4-support/wiki/R#Installing-R-packages-locally). It's just a bit more involved than doing it in RStudio because you have to use the R command line.

## What you cannot do

Run RStudio. Well OK, you can. But it needs SSH tunneling and X Window (X11) forwarding. Yeahnah.

## The wish list

 * Iridis supports RStudio Server - so you can login via a web UI
 * Iridis can see `J:\ or \\soton.ac.uk\resource\` so all roads lead to the same data

howTo/keepingData.md

# Where to keep your data

Let's start by offering some advice on where _not_ to keep your data:

 * in your git/hub/lab repo because:
   * your repo will bloat
   * you may accidentally publish it via github/gitlab
   * unless you're smart with `.gitignore`, every time you make or save new data git will try to synch it with your repo. This will _hammer_ your internet connection and make your git commit process almost unusable
   * github/gitlab will refuse to store data of any useful size
 * on Dropbox/Sharepoint/oneCloud/whatever or similar because:
   * it may breach your [institutional policy](https://library.soton.ac.uk/researchdata/storage) on data storage
   * unless you're careful about which folders get synched, every time you make or save new data your Dropbox/Sharepoint/oneCloud/whatever will try to synch it. This will _hammer_ your internet connection
 * only on your laptop/PC because:
   * they crash and you'll lose it
   * you'll lose the laptop and someone could find/steal/disclose the data
 * on a USB drive because:
   * see previous
 
OK, so where _should_ you keep your data? There are basically two types of places:
 
  * an institutional file store to which you have access
  * a cloud data service to which you can send your code, such as AWS, Google etc.
  
For most of you the first option may be the only one available for institutional policy reasons. In the case of the University of Southampton you should [read the policy on data storage](https://library.soton.ac.uk/researchdata/storage) including the advice on how to store and transfer data securely. Our suggested options are:
 
 * your [personal filestore](https://knowledgenow.soton.ac.uk/Articles/KB0011651) (AKA `\\filestore.soton.ac.uk\Users\<username>\`, AKA “My Documents”) which, by default, _only_ allows 50GB, or
 * **preferred ->** the resource drive (AKA `J:\` or `\\soton.ac.uk\resource\`) which is accessible via the [web](https://fwa.soton.ac.uk/), via SMB (use the VPN), from the [University SVE](sve.md) and, in the case of SERG data, via `/mnt/SERG_data` on the University's [RStudio server](https://rstudio.soton.ac.uk/).
 
 We recommend you use the `J:\` drive because it can hold much larger data volumes and can be made accessible to your colleagues/supervisors if required. For reasons of speed this implies you either:
 
  * use the [University SVE](https://sotonac.sharepoint.com/teams/IT/SitePages/Services/SouthamptonVirtualEnvironment.aspx/) to run RStudio 'on campus' and thus close to the data. We have found some problems with persistence of installed packages in between SVE sessions if you try this
  * mount the `J:\` drive on your laptop/PC and use a local version of (e.g.) RStudio to load the data. If you are doing this you might want to learn about [drake](drake.md) so the data is only loaded over the network the first time you run your code, not each time.
  * [get access to and use](https://git.soton.ac.uk/SERG/uosrstudioserver) the University's [RStudio server](https://rstudio.soton.ac.uk/)  **<- best option**, this will let you run your code on the research filestore directly
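If you go down the mounted-drive route, the idea behind "only load it over the network once" can be sketched in a few lines of base R: read the file over the network the first time, keep a local `.rds` copy, and only re-read when the network file is newer than that copy. This is a hand-rolled stand-in for illustration, not how drake works internally. `readWithLocalCache()` is hypothetical, and the cache defaults to `tempdir()`, so point `cacheDir` at a local folder outside your repo (or one listed in `.gitignore`) if you want it to survive between sessions.

```r
# Hypothetical helper: read a (large) file from a network drive once and
# reuse a local .rds copy on later runs. Not drake; a hand-rolled cache.
readWithLocalCache <- function(networkFile, cacheDir = tempdir(),
                               readFun = utils::read.csv) {
  cacheFile <- file.path(cacheDir, paste0(basename(networkFile), ".rds"))
  # re-read if there is no cache yet, or the network copy has been updated
  if (!file.exists(cacheFile) ||
      file.mtime(networkFile) > file.mtime(cacheFile)) {
    message("Loading over the network: ", networkFile)
    dt <- readFun(networkFile)
    saveRDS(dt, cacheFile)
  } else {
    message("Using local cache: ", cacheFile)
    dt <- readRDS(cacheFile)
  }
  dt
}
```

Pass `readFun = data.table::fread` if you prefer data.tables. drake does this kind of bookkeeping (and a lot more) for you, which is why we recommend it.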
 
As far as we know the University's `Research Filestore` (AKA `\\xxx.files.soton.ac.uk\<SHARE>\`) does not allow direct access so it can only be used to archive data you are not actively using. See https://library.soton.ac.uk/researchdata/storage for more info.

> Update: we understand that the University is looking to transition from the J drive to the use of oneDrive/sharepoint:

<hr>

"Over the course of this year, and next, we will be looking to move the University off J:Drive and utilise OneDrive for Business, as part of our Filestore Migration Project.

Filestore Migration will:

1) Move personal files from ‘My Documents’ and ‘Desktop’ to [One Drive for Business](https://support.microsoft.com/en-gb/office/what-is-onedrive-for-work-or-school-187f90af-056f-47c0-9656-cc0ddca7fdc2?ui=en-us&rs=en-gb&ad=gb), a Microsoft cloud service that connects you to all your files.  

2) Move shared files e.g. J: drive to SharePoint Online. 

Further information can be found here: [Storing files in O365](https://sotonac.sharepoint.com/teams/Office365/SitePages/Storing-files-in-Office-365.aspx)

Please note: For people and processes that have a business need to maintain traditional filestore 
(such as Linux desktop users), there will be a process to request this instead of One Drive for Business or SharePoint Online."

</hr>

> We have been assured by iSolutions that data will be migrated on a case by case basis and it might not suit all cases. They agreed that our use of rstudio.soton and J:\\ is a very good case for _not_ migrating. So watch out for that!
# Mirroring repositories in Gitlab

Author: Tom Rushby (@tom_rushby)

Gitlab has a useful page on mirroring: https://docs.gitlab.com/ee/user/project/repository/mirror/

## Gitlab to GitHub mirror

i.e. you want to create a copy on GitHub of a repository hosted on the University Git service (GitLab).

In this case we create a *push mirror*: a downstream repository (on GitHub) that mirrors the commits made to the upstream repository (on Gitlab).

Creating a mirror allows a repository with restricted access to be publicly available (and backed up on another platform), as well as taking advantage of functionality on GitHub such as Pages (currently not available on the University implementation of GitLab).

It's not terribly clear on the page linked above, but you will need to generate a personal access token within your GitHub account. This will be used as the password in the mirroring section of the repo on GitLab (git.soton.ac.uk). https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token

## Step-by-step

1. Create a clean (target, or 'downstream') repository in your GitHub account. I use the identical name as the repo in GitLab and add (mirror) in the description to indicate that it is a mirrored copy.
2. Copy the https URL to the repo e.g. `https://github.com/mygitaccount/myrepo`
3. In the **Gitlab** (source, or 'upstream') repository to be mirrored, go to `Settings` > `Repository` and expand the `Mirroring repositories` section. Enter the GitHub repo URL but in the format `https://gitusername@github.com/gitusername/reponame`.
4. If you haven't already got one, go back to **GitHub** and [create a personal access token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token). In the top right corner, click your profile photo, and click **Settings**. On the Settings page, click **Developer settings** (at the bottom of the left-hand sidebar). Click **Personal access tokens** in the left sidebar and then **Generate new token**. When generating the access token you will need to tick the boxes to allow "repo" and "workflow" permissions. Once complete, copy the access token to the clipboard.
5. Now in **Gitlab**, paste the access token into the **Password** field and click the **Mirror repository** button. The repository should now appear in the Mirrored repositories list with 'Push' as the direction. Each time a commit is made to your Gitlab repository, it will be automatically pushed to the mirrored repository downstream. Clicking the 'Update now' button in the right-hand column of the list will force the push and is useful to test the process when setting up. If the process fails, errors will be shown in the list under 'Last successful update'.
6. Commit to your Gitlab repo and watch your GitHub copy update automatically. Magic!
## 'How to' resources:
# Other 'How to' resources:

 * excellent [guidance for collaborative project teams (especially team leads)](https://opensource.guide/) even if they're not open and not R
 * [What they forgot to teach you](https://rstats.wtf/) about R including some required reading:
    * why you should use here::here() and **not setwd()** to make sure your code works _anywhere_
    * why you should use (RStudio) p/Projects to _manage your code_
    * how you should name data files to _stay sane_
    * [how you should name things](https://speakerdeck.com/jennybc/how-to-name-files) to _stay sane_
    * why you should not add anything to .Renviron or .Rprofile unless you want to _irritate team members_
    * and much more, although:
        *  we don't agree with [keeping your data in your project](https://rstats.wtf/project-oriented-workflow.html#work-in-a-project). Data should be somewhere else, _unless you're a .gitignore wizard_ and your data is small (and non-sensitive/non-commercial/public etc)
        *  we don't agree with [keeping your data in your project](https://rstats.wtf/project-oriented-workflow.html#work-in-a-project). Data should be [somewhere else](keepingData.md), _unless you're a .gitignore wizard_ and your data is small (and non-sensitive/non-commercial/public etc)
 * using [git(hub/lab)](https://happygitwithr.com/) for version control (perhaps via [usethis](https://usethis.r-lib.org/) and knowing about [ohshitgit](https://ohshitgit.com/) just in case)
 * using [git branches](https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell) as a way for different people to work on the same project without clashing 
    * Tom has [blogged](https://twrushby.wordpress.com/2017/03/27/collaboration-with-rstudio-and-git-using-branches/) about this
 * using git forks and [branches](https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell) as a way for different people to work on the same project without clashing 
    * Tom has written a [setup guide](https://git.soton.ac.uk/SERG/feeg6025support/feeg6025-2020-2021/-/blob/master/Guides/SETUP.md) for getting started
    * we have written a [short guide](gitBranches.md)
    * [HappyGit](https://happygitwithr.com/fork-and-clone.html) gives you the details
    * [ohshitgit](https://ohshitgit.com/) may be required here too (but not if you've followed the instructions above)
 * using [git(hub/lab) issues](https://guides.github.com/features/issues/) as a way to manage your project - just like we did for the [new ECCD website](https://git.soton.ac.uk/SERG/sergwebsite/-/issues)
 * how to use [drake](https://docs.ropensci.org/drake/) to massively speed up and [manage your workflow](https://milesmcbain.xyz/the-drake-post/). This includes always:
    * loading and processing all your data inside a drake plan in a .R file. _So it only gets re-run if the code or data changes_
    * creating each of your output objects inside the drake plan. _So they only get re-created if the code or data changes_
    * rendering your .Rmd report at the end of the drake plan. _So you can pass the params in and report the output objects_
    * => the first time you run the plan it will build everything. The second time, e.g. after you fix a .Rmd typo, _only the bits that have changed get re-built_. **Warning: drake can reduce the time it takes to run your code by an order of magnitude. This could seriously damage your tea & cake intake...**
 
Even more ...

 * on [naming files (tidyverse)](https://style.tidyverse.org/files.html)
 * coding style, syntax etc. with [lintr](https://github.com/jimhester/lintr)
 * set up projects as packages with help from [usethis](https://usethis.r-lib.org/)
 * tools for reproducibility
    * Is your project file reproducible? [fertile](https://github.com/baumer-lab/fertile) might help
    * Or try [rrrpkg](https://github.com/ropensci/rrrpkg) to create a research compendium
    
# Set environment variables to authenticate access to AWS S3 bucket
# Use in conjunction with aws.s3 package

Sys.setenv(
  "AWS_ACCESS_KEY_ID" = "mykey",
  "AWS_SECRET_ACCESS_KEY" = "mysecretkey",
  "AWS_DEFAULT_REGION" = "eu-west-2"
)
{
    "Version": "2012-10-17",
    "Id": "PolicyForDestinationBucket",
    "Statement": [
        {
            "Sid": "Permissions on objects and buckets",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::000000000000:role/cross-account-bucket-replication-role"
            },
            "Action": [
                "s3:List*",
                "s3:GetBucketVersioning",
                "s3:PutBucketVersioning",
                "s3:ReplicateDelete",
                "s3:ReplicateObject"
            ],
            "Resource": [
                "arn:aws:s3:::my-s3-bucket-name",
                "arn:aws:s3:::my-s3-bucket-name/*"
            ]
        },
        {
            "Sid": "Permission to override bucket owner",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::999999999999:root"
            },
            "Action": "s3:ObjectOwnerOverrideToBucketOwner",
            "Resource": "arn:aws:s3:::my-s3-bucket-name/*"
        }
    ]
}
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutAnalyticsConfiguration",
                "s3:GetObjectVersionTagging",
                "s3:DeleteAccessPoint",
                "s3:CreateBucket",
                "s3:ReplicateObject",
                "s3:GetObjectAcl",
                "s3:GetBucketObjectLockConfiguration",
                "s3:DeleteBucketWebsite",
                "s3:GetIntelligentTieringConfiguration",
                "s3:DeleteJobTagging",
                "s3:PutLifecycleConfiguration",
                "s3:GetObjectVersionAcl",
                "s3:PutObjectTagging",
                "s3:DeleteObject",
                "s3:DeleteObjectTagging",
                "s3:GetBucketPolicyStatus",
                "s3:GetObjectRetention",
                "s3:GetBucketWebsite",
                "s3:GetJobTagging",
                "s3:PutReplicationConfiguration",
                "s3:GetObjectAttributes",
                "s3:DeleteObjectVersionTagging",
                "s3:PutObjectLegalHold",
                "s3:InitiateReplication",
                "s3:GetObjectLegalHold",
                "s3:GetBucketNotification",
                "s3:PutBucketCORS",
                "s3:GetReplicationConfiguration",
                "s3:ListMultipartUploadParts",
                "s3:PutObject",
                "s3:GetObject",
                "s3:PutBucketNotification",
                "s3:DescribeJob",
                "s3:PutBucketLogging",
                "s3:GetAnalyticsConfiguration",
                "s3:PutBucketObjectLockConfiguration",
                "s3:GetObjectVersionForReplication",
                "s3:CreateAccessPoint",
                "s3:GetLifecycleConfiguration",
                "s3:GetInventoryConfiguration",
                "s3:GetBucketTagging",
                "s3:PutAccelerateConfiguration",
                "s3:DeleteObjectVersion",
                "s3:GetBucketLogging",
                "s3:ListBucketVersions",
                "s3:ReplicateTags",
                "s3:RestoreObject",
                "s3:ListBucket",
                "s3:GetAccelerateConfiguration",
                "s3:GetObjectVersionAttributes",
                "s3:GetBucketPolicy",
                "s3:PutEncryptionConfiguration",
                "s3:GetEncryptionConfiguration",
                "s3:GetObjectVersionTorrent",
                "s3:AbortMultipartUpload",
                "s3:PutBucketTagging",
                "s3:GetBucketRequestPayment",
                "s3:GetAccessPointPolicyStatus",
                "s3:UpdateJobPriority",
                "s3:GetObjectTagging",
                "s3:GetMetricsConfiguration",
                "s3:GetBucketOwnershipControls",
                "s3:DeleteBucket",
                "s3:PutBucketVersioning",
                "s3:GetBucketPublicAccessBlock",
                "s3:ListBucketMultipartUploads",
                "s3:PutIntelligentTieringConfiguration",
                "s3:PutMetricsConfiguration",
                "s3:PutBucketOwnershipControls",
                "s3:PutObjectVersionTagging",
                "s3:PutJobTagging",
                "s3:UpdateJobStatus",
                "s3:GetBucketVersioning",
                "s3:GetBucketAcl",
                "s3:PutInventoryConfiguration",
                "s3:GetObjectTorrent",
                "s3:PutBucketWebsite",
                "s3:PutBucketRequestPayment",
                "s3:PutObjectRetention",
                "s3:GetBucketCORS",
                "s3:GetBucketLocation",
                "s3:GetAccessPointPolicy",
                "s3:ReplicateDelete",
                "s3:GetObjectVersion"
            ],
            "Resource": [
                "arn:aws:s3:::my-aws-bucket",
                "arn:aws:s3:*:999999999999:accesspoint/*",
                "arn:aws:s3:::my-aws-bucket/*",
                "arn:aws:s3:*:999999999999:job/*"
            ]
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "s3:ListStorageLensConfigurations",
                "s3:ListAccessPointsForObjectLambda",
                "s3:GetAccessPoint",
                "s3:GetAccountPublicAccessBlock",
                "s3:ListAllMyBuckets",
                "s3:ListAccessPoints",
                "s3:ListJobs",
                "s3:PutStorageLensConfiguration",
                "s3:ListMultiRegionAccessPoints",
                "s3:CreateJob"
            ],
            "Resource": "*"
        }
    ]
}
 No newline at end of file
+48 −0
# Requires the aws.s3 package; install it if needed
# install.packages("aws.s3")

# Set environment variables to use AWS access keys
source("./howTo/r-with-aws/access_keys/aws_access.R") # Replace with your credentials e.g. next line
# source("./howTo/r-with-aws/access_keys/example_credentials_script.R")

# Get list of buckets
aws.s3::bucketlist()

# set bucket name (less typing) - this is the name of your s3 bucket
my_bucket <- "twr-test-bucket-r"

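# write a file to the temp dir, using a built-in data frame
# (row.names = FALSE avoids an extra unnamed index column when the file is read back)
write.csv(iris, file.path(tempdir(), "iris.csv"), row.names = FALSE)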

# save an object (file from the temp dir) to the bucket
aws.s3::put_object(
  file = file.path(tempdir(), "iris.csv"), 
  object = "iris.csv", 
  bucket = my_bucket
)

# list objects in the bucket
aws.s3::get_bucket(
  bucket = my_bucket
)

# provide a nice table of objects in the bucket
data.table::rbindlist(aws.s3::get_bucket(bucket = my_bucket))

# read an object from s3 bucket, three ways ...

# 1. bucket and object specified separately
aws.s3::s3read_using(
  FUN = read.csv, bucket = my_bucket, object = "iris.csv"
  )

# 2. use the s3 URI
aws.s3::s3read_using(
  FUN = read.csv, object = "s3://twr-test-bucket-r/iris.csv"
  )

# 3. use data.table's fread() function for fast CSV reading
aws.s3::s3read_using(
  FUN = data.table::fread, object = "s3://twr-test-bucket-r/iris.csv"
  )
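
# A hypothetical helper (not part of aws.s3), sketched to show how ways 1 and 2
# relate: an s3:// URI is just the bucket name followed by the object key.
# It only manipulates the string; aws.s3 does its own URI parsing internally.
split_s3_uri <- function(uri) {
  stopifnot(is.character(uri), length(uri) == 1L, startsWith(uri, "s3://"))
  path <- sub("^s3://", "", uri)        # drop the scheme
  list(
    bucket = sub("/.*$", "", path),     # everything before the first "/"
    object = sub("^[^/]*/?", "", path)  # everything after it (may include folders)
  )
}

parts <- split_s3_uri("s3://twr-test-bucket-r/iris.csv")
parts$bucket # -> "twr-test-bucket-r"
parts$object # -> "iris.csv"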

howTo/renv.md

0 → 100644
+249 −0
# Using renv to create reproducible environments for R projects

## What is renv?

`renv` is an environment manager for R projects. It organises the package dependencies within an R project, recording the version of each package used in an analysis and allowing simple transport of projects from one computer to another.

This is achieved through the creation of package 'snapshots' which can be (re)installed (or 'restored') on different computers with one simple command.

`renv` provides an alternative solution to our [workflow/loadLibraries function](https://git.soton.ac.uk/SERG/workflow/-/blob/master/R/loadLibraries.R) and can tackle package persistence problems when [using RStudio within the University SVE](https://git.soton.ac.uk/SERG/workflow/-/blob/master/howTo/sve.md).

The advantage of using `renv` over `woRkflow::loadLibraries()` is that `renv` automatically scans the code in a project to compile a list of the packages used. `renv::snapshot()` also records package versions. Nice!

Using `renv` might help to make collaboration that little bit simpler.

### Install

Start by installing the [renv](https://rstudio.github.io/renv/) package.

```
install.packages('renv')
```

Open your project and initialise renv to create a project-specific local environment and R library.

```
renv::init()
```

If this is your first time using renv, running `renv::init()` will generate output similar to the following:

```
Welcome to renv!
It looks like this is your first time using renv. This is a one-time message,
briefly describing some of renv's functionality.

renv maintains a local cache of data on the filesystem, located at:

  - "C:/Users/twr1m15/AppData/Local/R/cache/R/renv"

This path can be customized: please see the documentation in `?renv::paths`.

renv will also write to files within the active project folder, including:

  - A folder 'renv' in the project directory, and
  - A lockfile called 'renv.lock' in the project directory.

In particular, projects using renv will normally use a private, per-project
R library, in which new packages will be installed. This project library is
isolated from other R libraries on your system.

In addition, renv will update files within your project directory, including:

  - .gitignore
  - .Rbuildignore
  - .Rprofile

Please read the introduction vignette with `vignette("renv")` for more information.
You can browse the package documentation online at https://rstudio.github.io/renv/.
```

If the project already has a lockfile, the following message is displayed:

```
This project already has a lockfile. What would you like to do? 

1: Restore the project from the lockfile.
2: Discard the lockfile and re-initialize the project.
3: Activate the project without snapshotting or installing any packages.
4: Abort project initialization.
```

Initialisation means that, whenever the project is opened, `renv` checks that it is installed on the system and loads itself, giving access to the `renv::restore()` command (see 'Restore' below).

When a project using `renv` is opened, the console confirms it, for example:

```
* Project 'H:/SVE/git.soton/rtools' loaded. [renv 0.15.2]
```

### Lock file

The file `renv.lock` contains a description of the state of the project's library.

For example:

```
{
  "R": {
    "Version": "4.1.1",
    "Repositories": [
      {
        "Name": "CRAN",
        "URL": "https://cran.rstudio.com"
      }
    ]
  },
  "Packages": {
    "renv": {
      "Package": "renv",
      "Version": "0.15.2",
      "Source": "Repository",
      "Repository": "CRAN",
      "Hash": "206c4ef8b7ad6fb1060d69aa7b9dfe69",
      "Requirements": []
    }
  }
}
```
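`renv::status()` is the built-in way to check whether the project library has drifted from the lockfile. If you want to poke at the lockfile yourself, the structure above can be read as a nested list. Below is a minimal sketch, assuming the lockfile has been parsed with `jsonlite::read_json("renv.lock")` (jsonlite is an assumption, not something this guide otherwise uses). `lock_versions()` and `versions_differ()` are hypothetical helpers, not part of renv.

```r
# Hypothetical helpers, not part of renv.
# `lock` is assumed to be the list returned by jsonlite::read_json("renv.lock").

# Named character vector of the package versions recorded in the lockfile.
lock_versions <- function(lock) {
  vapply(lock$Packages, function(p) p$Version, character(1))
}

# Names of packages whose recorded and installed versions differ.
# `installed` is a named character vector, e.g. installed.packages()[, "Version"].
versions_differ <- function(recorded, installed) {
  common <- intersect(names(recorded), names(installed))
  differs <- vapply(common, function(pkg) {
    utils::compareVersion(recorded[[pkg]], installed[[pkg]]) != 0
  }, logical(1))
  common[differs]
}

# A cut-down lockfile with the same shape as the example above.
lock <- list(
  R = list(Version = "4.1.1"),
  Packages = list(
    renv = list(Package = "renv", Version = "0.15.2", Source = "Repository")
  )
)
lock_versions(lock) # -> c(renv = "0.15.2")
```

In a real project this would look something like `versions_differ(lock_versions(jsonlite::read_json("renv.lock")), installed.packages()[, "Version"])`, but `renv::status()` is likely to be the more reliable check.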

## Updating the lock file

Suppose we add some code to the project that requires another package, for example:

```
# install.packages("ggplot2")
library(ggplot2)
```

to create a simple plot using the built-in `cars` dataset:

```
ggplot(data = cars, mapping = aes(x = speed, y = dist)) +
  geom_point()
```

The package(s) can then be added to the project library by running another snapshot ...

```
renv::snapshot()
```

Running `renv::snapshot()` records the new packages (and all their dependencies) in the `renv.lock` file. Feedback is generated, for example (look at the tonne of dependencies pulled in by ggplot2):

```
The following package(s) will be updated in the lockfile:

# CRAN ===============================
- MASS           [* -> 7.3-54]
- Matrix         [* -> 1.3-4]
- R6             [* -> 2.5.1]
- RColorBrewer   [* -> 1.1-2]
- base64enc      [* -> 0.1-3]
- cli            [* -> 3.1.1]
- colorspace     [* -> 2.0-2]
- crayon         [* -> 1.4.2]
- digest         [* -> 0.6.29]
- ellipsis       [* -> 0.3.2]
- evaluate       [* -> 0.14]
- fansi          [* -> 1.0.2]
- farver         [* -> 2.1.0]
- fastmap        [* -> 1.1.0]
- ggplot2        [* -> 3.3.5]
- glue           [* -> 1.6.1]
- gtable         [* -> 0.3.0]
- highr          [* -> 0.9]
- htmltools      [* -> 0.5.2]
- isoband        [* -> 0.2.5]
- jquerylib      [* -> 0.1.4]
- jsonlite       [* -> 1.7.3]
- knitr          [* -> 1.37]
- labeling       [* -> 0.4.2]
- lattice        [* -> 0.20-44]
- lifecycle      [* -> 1.0.1]
- magrittr       [* -> 2.0.2]
- mgcv           [* -> 1.8-36]
- munsell        [* -> 0.5.0]
- nlme           [* -> 3.1-152]
- pillar         [* -> 1.7.0]
- pkgconfig      [* -> 2.0.3]
- rlang          [* -> 1.0.0]
- rmarkdown      [* -> 2.11]
- scales         [* -> 1.1.1]
- stringi        [* -> 1.7.6]
- stringr        [* -> 1.4.0]
- tibble         [* -> 3.1.6]
- tinytex        [* -> 0.36]
- utf8           [* -> 1.2.2]
- vctrs          [* -> 0.3.8]
- viridisLite    [* -> 0.4.0]
- withr          [* -> 2.4.3]
- xfun           [* -> 0.29]
- yaml           [* -> 2.2.2]
```
The packages now appear in the contents of the lock file as shown below ...

```
{
  "R": {
    "Version": "4.1.1",
    "Repositories": [
      {
        "Name": "CRAN",
        "URL": "https://cran.rstudio.com"
      }
    ]
  },
  "Packages": {
    "MASS": {
      "Package": "MASS",
      "Version": "7.3-54",
      "Source": "Repository",
      "Repository": "CRAN",
      "Hash": "0e59129db205112e3963904db67fd0dc",
      "Requirements": []
    },
    "Matrix": {
      "Package": "Matrix",
      "Version": "1.3-4",
      "Source": "Repository",
      "Repository": "CRAN",
      "Hash": "4ed05e9c9726267e4a5872e09c04587c",
      "Requirements": [
        "lattice"
      ]
    },
    "R6": {
      "Package": "R6",
      "Version": "2.5.1",
      "Source": "Repository",
      "Repository": "CRAN",
      "Hash": "470851b6d5d0ac559e9d01bb352b4021",
      "Requirements": []
    },
    "RColorBrewer": {
      "Package": "RColorBrewer",
      "Version": "1.1-2",
      "Source": "Repository",
      "Repository": "CRAN",
      "Hash": "e031418365a7f7a766181ab5a41a5716",
      "Requirements": []
    }
    
    XXXX_CURTAILED_FOR_SPACE_XXXX
  }
}
```

## Restore

The `renv::restore()` command allows a previous snapshot of packages (including package versions) to be installed.

At [SERG](https://energy.soton.ac.uk) we like to work in a collaborative manner ... this often means sharing analysis and code across platforms. To that end we need packages/libraries used within RStudio in any piece of analysis to be restored easily on another system. 

The `renv` package allows all of the packages used in a specific project to be (re)installed using a single command - very useful for porting projects across different computers/contributors.

## Resources

* [renv for R projects](https://rstudio.github.io/renv/index.html)
* [Introduction to renv](https://rstudio.github.io/renv/articles/renv.html) by Kevin Ushey
* [Collaborating with renv](https://rstudio.github.io/renv/articles/collaborating.html)

howTo/sve.md

0 → 100644
+70 −0
# Using R/RStudio on the SVE service

## What is the SVE?

The Southampton Virtual Environment (SVE) is a Windows [virtual desktop](https://sotonac.sharepoint.com/teams/IT/SitePages/Services/SouthamptonVirtualEnvironment.aspx/) service.

The SVE offers two services:

 * Win 10 Student service - this one has research & academic software suites such as RStudio. But it does **not** have persistence: any packages you install or any repos you clone into your local space will vanish when you log out. The only exceptions are if you:
    * clone the repos to your MyDocuments/OneDrive account
    * work out how to use your Windows profile to 'host' the packages (or do some nifty work with the [`renv`](renv.md) package)
 * Win 10 Staff service - this is a generic staff service intended for admin & professional staff use (for now). While it **does** have persistence, it does **not** have RStudio installed...
 
## Why would I use the SVE?
 
1. The 'Student' service hosts most of the applications you'll need, including RStudio
1. It offers easy access to [data folders such as J://](keepingData.md) as you are effectively working 'on campus'. This makes data loading fast. Well... faster than doing it over your home broadband.
1. You can easily access your OneDrive folders.
1. The virtual PC instance you log in to has reasonable memory allocation so unless you have huuuge datasets you should be OK.

But:

1. No R package or repo persistence (see above)
1. It does not seem to be able to 'mount' Sharepoint/Teams folders - so you cannot easily load data held in them

## Git

Git is installed on the SVE - yay! You do know how to use Git, right?  No? [try starting here](https://happygitwithr.com/index.html).

### Git authentication

HTTPS vs SSH? It's up to you but [some argue](https://happygitwithr.com/https-pat.html#https-vs-ssh) that HTTPS is easy to get you going.

Whichever you choose, you will need to authenticate the SVE (local machine) with GitLab/GitHub. You can do this via HTTPS, entering your username and password in each SVE session (not a big deal)... or by using an SSH key (e.g. RSA). This key should now persist on the SVE (it did not before the SVE upgrade).

## RStudio in SVE

### Packages
See note above re persistence.

If you need to add new packages, use the `install.packages()` function. This seems to bypass a permissions issue in C:/Apps/RLibraries which causes the normal RStudio GUI (Packages tab) method to fail.

Better yet, use the [loadLibraries() function](https://git.soton.ac.uk/SERG/workflow/-/blob/master/R/loadLibraries.R) developed at SERG :-)

Even better, consider using the [`renv`](renv.md) package.
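One plausible reason `install.packages()` works where the Packages tab fails: when the default library is not writable, R offers to use a personal library instead. If you would rather check up front, below is a minimal sketch. `first_writable_lib()` is a hypothetical helper; `file.access()` should be treated as a hint rather than a guarantee, particularly on network drives.

```r
# Hypothetical helper: the first library in `libs` the current user can write to,
# or NA if none. file.access() returns 0 when the requested access (mode = 2, write)
# appears to be allowed and -1 otherwise (including for paths that do not exist).
first_writable_lib <- function(libs = .libPaths()) {
  writable <- libs[file.access(libs, mode = 2) == 0]
  if (length(writable) == 0) NA_character_ else writable[[1]]
}

lib <- first_writable_lib()
lib
# If a writable library exists, install into it explicitly, e.g.:
# install.packages("data.table", lib = lib)
```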

### Using Git within RStudio

We recommend storing your local (working) project repositories within your `Documents` folder on the university filestore as [storing Git repositories within cloud-synced folders may cause problems.](https://andreashandel.github.io/MADAcourse/Tools_Github_Introduction.html#GitGitHub_and_other_cloud_based_sync_options)

Git does seem to behave oddly on the SVE: getting Git to work correctly within RStudio on the University network requires some care with file paths.

Using the University filestore for project (working) files requires use of the mapped file path `filestore (H:)` _<u>not</u>_ the `Documents` shortcut in the `Quick access` group of Windows Explorer.

While these two paths refer to the same physical location for your documents folder, they are resolved differently i.e. `H:\` vs `\\filestore.soton.ac.uk\users\xxmyusernamexx\mydocuments`. When a project is started in RStudio from the latter location, the Git repository is not recognised correctly. As a result, the `Git` tab will be missing from the Environment pane.

Loading a project from `Documents` in the `Quick access` group: the figure shows no `Git` tab and thus no access to Git through the RStudio IDE:

![Environment Pane, My Documents](img/rtools_env_pane_mydocs.png)

Loading the project from the same folder via `H:` (mapped drive): the figure shows that the `Git` tab is now present and Git commands are accessible:

![Environment Pane, H](img/rtools_env_pane_h.png)

This problem seems to be limited to the Environment pane within the RStudio IDE: running Git commands through the Terminal (Console pane) picks up the repository correctly, as shown in the image below.

![Terminal Pane](img/rtools_terminal_pane.png)

So using Git through the Terminal is unaffected by the path used to open the project.
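If you want R to tell you which route a project was opened by, below is a minimal sketch. `is_unc_path()` is a hypothetical helper that only inspects the path string; it assumes, as described above, that the `Documents` shortcut resolves to a `\\filestore.soton.ac.uk\...` UNC path. `getwd()` normally reports such a path with forward slashes (`//filestore.soton.ac.uk/...`), so both separators are handled.

```r
# Hypothetical helper: TRUE if `path` is a UNC path (\\server\share\...).
# Forward slashes are converted to backslashes first, so "//server/share"
# is treated the same as "\\server\share".
is_unc_path <- function(path) {
  startsWith(chartr("/", "\\", path), "\\\\")
}

# Run in the console after opening a project on the SVE.
if (is_unc_path(getwd())) {
  message("Project opened via a UNC path: reopen it from H:\\ to get the Git tab.")
}
```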
+19 −0
Yes it's possible. I'm not sure why you would but we're a broad church so...

You need to set the interpreter...

On a mac:

 * open Terminal
 * type `which python`
 * I get "/opt/anaconda3/bin/python"
 * you might get other flavours and locations
 * it needs to be the one that gets updated when you install modules using `pip install numpy pandas matplotlib` etc
 * in RStudio go to Project Options -> Python. If you are not using an RStudio project do this in Global Options instead but remember this will become the default...
 * paste the results of `which python` into the 'select' box. You can try to use the auto-find feature but it didn't find conda for me
 * hit OK. RStudio will want to restart 
 * you may have to do this twice for it to work

 If you open a .py file and try to run it, RStudio will want to install [reticulate](https://rstudio.github.io/reticulate/) which is R's interface to Python (apparently).
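If you'd rather set the interpreter from R than through Project Options, reticulate also reads the `RETICULATE_PYTHON` environment variable. Below is a minimal sketch, assuming the interpreter you want is the first `python3` or `python` on the PATH that R sees. That PATH may differ from your Terminal's (e.g. when RStudio is launched from the macOS Dock), so pasting the explicit path from `which python` is the safer choice. `find_python()` is a hypothetical helper, and the variable has to be set before reticulate first starts Python in the session.

```r
# Hypothetical helper: full path to the first candidate found on the PATH,
# roughly what `which python` does in Terminal; NA if none is found.
find_python <- function(candidates = c("python3", "python")) {
  paths <- Sys.which(candidates)   # "" for anything not found
  paths <- paths[nzchar(paths)]
  if (length(paths) == 0) NA_character_ else unname(paths[[1]])
}

py <- find_python()
# Or hard-code the result of `which python`, e.g. "/opt/anaconda3/bin/python".
if (!is.na(py)) {
  Sys.setenv(RETICULATE_PYTHON = py)  # must happen before reticulate starts Python
}
# reticulate::py_config() then reports which interpreter is in use.
```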

it's all getting a bit serpentine...
@@ -4,7 +4,15 @@

## To do before you come

 * install [RStudio](https://rstudio.com/products/rstudio/)
 * install [RStudio](https://rstudio.com/products/rstudio/) and/or get yourself set up on https://rstudio.soton.ac.uk
 * then install:
    * here
    * drake
    * data.table
    * lubridate
    * hms
    * ggplot2 (or just install the whole tidyverse)
    * skimr
 * make sure you can log in to [git.soton.ac.uk](https://git.soton.ac.uk/)
 * check you have an ssh key set up on git.soton for the laptop/PC/server where you are going to use RStudio - see [rstudio's help](https://support.rstudio.com/hc/en-us/articles/200532077-Version-Control-with-Git-and-SVN) or [this one](https://happygitwithr.com/ssh-keys.html)
 * Make sure you have Teams - we will be using the screen sharing a lot

make_basicReport.R

0 → 100644
+20 −0
# make file for drake
# see https://books.ropensci.org/drake/projects.html#usage

# Set up ----
startTime <- proc.time()

# use r_make to run the plan inside a clean R session so nothing gets contaminated
drake::r_make(source = "_drake_basicReport.R") # where we keep the drake plan etc

# we don't keep this in /R because that's where the package functions live
# we don't use "_drake.R" because we have lots of different plans

# Finish off ----

t <- proc.time() - startTime # how long did it take?
elapsed <- t[[3]]

print("Done")
print(paste0("Completed in ", round(elapsed/60,3), " minutes using ",
             R.version.string, " running on ", R.version$platform))
 No newline at end of file
@@ -10,19 +10,24 @@ Things you should touch:

| Item        | Description  |
| --- | --- |
| **[R/](R/)** | Where we store functions that get built by the package - these are then available for use in any project|
| **[Rmd/](Rmd/)** | Where we store .Rmd files and the .R scripts that call them (usually using a `drake` plan) |
| **[docs/](docs/)** | Where we put output generated by the .R/.Rmd code. This is helpful if you are using [github/lab pages](https://guides.github.com/features/pages/). Unfortunately the University of Southampton gitlab service does not currently support this. |
| **[howTo/](howTo/)** | Our collection of guides and `how-tos` |
| **[man/](man/)** | Where roxygen puts the package man(ual) files |
| **[notData/](notData/)** | Where we do _not_ store [data](/howTo/keepingData.md). R packages expect certain kinds of data in their 'data/' folders. Do not put your data in it. |
| **[.gitignore](.gitignore)** | A place to tell git what _not_ to synchronise e.g. `.csv` or [weird OS files](https://gist.github.com/adamgit/3786883)|
| **[analysis/](analysis/)** | Where we store .Rmd files and the .R scripts that call them (usually using a `drake` plan) |
| **[CONTRIBUTING.md](CONTRIBUTING.md)** | How to contribute (nicely)|
| **[DESCRIPTION](DESCRIPTION)** | But only if you use this as a template for your own repo - it is a special file for packages |
| **[docs/](docs/)** | Where we put output generated by the .R/.Rmd code. This is helpful if you are using [github/lab pages](https://guides.github.com/features/pages/). Unfortunately the University of Southampton gitlab service does not currently support this. |
| **[env.R](env.R)**  | Where we store all the parameters that might be re-used across our repo. Such as colour defaults, data paths etc. We avoid using a project/repo level .Rprofile because it can lead to [a **lot** of confusion](https://support.rstudio.com/hc/en-us/articles/360047157094-Managing-R-with-Rprofile-Renviron-Rprofile-site-Renviron-site-rsession-conf-and-repos-conf). |
| **[LICENSE](LICENSE)** | Edit to suit your needs |
| **[notData/](notData/)** | Where we do not store data. R packages expect certain kinds of data in their 'data/' folders. Do not put your data in it. |
| **[R/](R/)** | Where we store functions that get built |
| **[README.md](README.md)** | Repo readme |
| **[resources.md](resources.md)** | Our collection of guides and `how-tos` |
| **[template.md](template.md)** | This file |
| **[_drake_basicReport.R](_drake_basicReport.R)** | basic drake plan |
| **[bibliography.bib](bibliography.bib)** | a place to keep references (in bibtex style)|
| **[env.R](env.R)**  | Where we store all the parameters that might be re-used across our repo. Such as colour defaults, data paths etc. We avoid using a project/repo level .Rprofile because it can lead to [a **lot** of confusion](https://support.rstudio.com/hc/en-us/articles/360047157094-Managing-R-with-Rprofile-Renviron-Rprofile-site-Renviron-site-rsession-conf-and-repos-conf). |
| **[make_basicReport.R](make_basicReport.R)** | basic reporting script which sources the drake plan |
| **[repoAsATemplate.md](repoAsATemplate.md)** | This file |

More on data:
More on [data](/howTo/keepingData.md):

> We recommend **not** putting your data in your repo at all.