Compare revisions

B.Anderson · B.Anderson · Ben Anderson · Ben Anderson · Ben Anderson · Ben Anderson
--- a/.gitignore
+++ b/.gitignore
@@ -5,4 +5,6 @@
 *.Rproj
 # OS X stuff - https://gist.github.com/adamgit/3786883
 .DS_Store
 .Trashes
\ No newline at end of file
+# sensitive files
+/howTo/r-with-aws/access_keys/aws_access.R
\ No newline at end of file
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -5,6 +5,6 @@ Be nice:
    * work in a branch of your fork
    * make a pull request to merge your branch to the master
-Don't know how to do this? [Read our guide](gitBranches.md).
+Don't know how to do this? [Read our guide](howTo/gitBranches.md).
 Need more info? Read the excellent [contributing to opensource code](https://opensource.guide/how-to-contribute/) guidance.
\ No newline at end of file
--- a/README.md
+++ b/README.md
@@ -4,10 +4,33 @@ How we do collaborative reproducible data analysis and reporting. Mostly (but no
 This repo does three things:
- * it is a collection of R [how-to resources](resources.md) including some notes on:
+ * it is a collection of R [how-to resources](howTo/) including some notes on:
-   * how to [use git branches](gitBranches.md) 
+   * how to [use git branches](howTo/gitBranches.md) 
-   * how to use [drake](https://docs.ropensci.org/drake/) to massively speed up and [manage your workflow](https://milesmcbain.xyz/the-drake-post/)
+   * how to use [drake](howTo/drake.md) to massively speed up and [manage your workflow](https://milesmcbain.xyz/the-drake-post/) (NB: drake has been superseded by [targets](https://books.ropensci.org/targets/) - update on the notes soon)
- * it is a [template](template.md) repo that illustrates how we work and which you can copy;
+   * how to access the University [Iridis HPC](howTo/iridis.md)
- * it is an R package that you can build if you want to using 'install and restart' from the RStudio Build menu. If you do you will then be able to use its functions viz: `woRkflow::functionName()` (not that it has many).
+   * how to use R/RStudio on the University [SVE (remote desktop) service](howTo/sve.md)
+   * where to [keep your data](howTo/keepingData.md)
+   * how to use [renv](howTo/renv.md) to manage your R environment - including packages
+   * how to access Amazon Web Services S3 buckets directly from R using [aws.s3](howTo/aws-s3.md)
+ * it is a [template](repoAsATemplate.md) repo that illustrates how we work and which you can copy;
+ * it is an R package. This means:
+     * package functions are kept in /R
+     * help files auto-created by roxygen are in /man
+     * if you clone it you can build it using 'install and restart' from the RStudio Build menu and use the functions viz: `woRkflow::functionName()` (not that it has many)
+Using drake:
+ * make_XX.R contains a call to drake::r_make(source = "_drake_XX.R")
+ * _drake_XX.R contains the drake plan and the functions & package loading. This is not quite what the [drake book](https://books.ropensci.org/drake/projects.html#usage) recommends but it works for us
+ * Rmd scripts called by the drake plan to report results are kept in /Rmd
+ * outputs are kept in /docs (reports, plots etc)
+ * if you can, **run Rscript ./make_cleanFeeders.R in a terminal not at the RStudio console** <- this stops RStudio from locking up
-If you want to [contribute to the repo](CONTRIBUTING.md), or like how we work and want to use it as a template for your project or package, just [fork and go](https://happygitwithr.com/fork-and-clone.html).
+We'd love your [contributions](CONTRIBUTING.md) - feel free to:
+ * [fork & go](https://happygitwithr.com/fork-and-clone.html)
+ * make a [new branch](https://git.soton.ac.uk/SERG/workflow/-/blob/master/howTo/gitBranches.md) in your fork
+ * make some improvements
+ * send us a pull request (just code, no data please, keep your data [elsewhere](https://git.soton.ac.uk/SERG/workflow/-/blob/master/howTo/keepingData.md)!)
+As a number of people have pointed out [fork & go](https://happygitwithr.com/fork-and-clone.html) only works if you have an account on git.soton.ac.uk. If you don't, you can import the repo to (for example) your [github.com](https://github.com/new/import) account and go from there. And presumably on gitlab.com too...
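The `make_XX.R` / `_drake_XX.R` split described in the README's "Using drake" list can be sketched roughly as follows. This is an illustrative sketch, not the repo's actual script: `runDrakePlan()` is a hypothetical wrapper name, and the default file name simply follows the `_drake_XX.R` pattern. `drake::r_make()` is drake's own function and expects the source file to end with a `drake::drake_config()` call, as `_drake_basicReport.R` does.

```r
# Sketch of what a make_XX.R script might contain (hypothetical wrapper)
runDrakePlan <- function(planFile = "_drake_basicReport.R") {
  # fail early with a readable message if drake is missing
  if (!requireNamespace("drake", quietly = TRUE)) {
    stop("drake is not installed")
  }
  # r_make() needs to find the plan file relative to the working directory
  if (!file.exists(planFile)) {
    stop("Cannot find ", planFile, " - run this from the repo root")
  }
  # run the plan in a fresh R session; only out-of-date targets are rebuilt
  drake::r_make(source = planFile)
}

# runDrakePlan()  # uncomment to run the plan
```

Saved as a script with the last line uncommented, this would be run with `Rscript` in a terminal rather than at the RStudio console, as the README advises.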
--- a/Rmd/basicReport.Rmd
+++ b/Rmd/basicReport.Rmd
@@ -29,6 +29,7 @@ bibliography: '`r here::here("bibliography.bib")`'
 ---
 ```{r knitrSetup, include=FALSE}
+startTime <- proc.time()
 knitr::opts_chunk$set(echo = FALSE) # by default turn off code echo
 ```
@@ -56,17 +57,16 @@ There's quite a lot of data...
 drake::readd(gWPlot)
 ```
-# Runtime
+# R environment
 ```{r check runtime, include=FALSE}
+# within Rmd timing
 t <- proc.time() - startTime
 elapsed <- t[[3]]
 ```
 Report generated in `r round(elapsed,2)` seconds ( `r round(elapsed/60,2)` minutes) using [knitr](https://cran.r-project.org/package=knitr) in [RStudio](http://www.rstudio.com) with `r R.version.string` running on `r R.version$platform`.
-# R environment
 ## R packages used
 * base R [@baseR]
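The timing pattern in this hunk (store `proc.time()` in the setup chunk, subtract it near the end of the report) can be tried outside the .Rmd with plain R. This is only a sketch: the `Sys.sleep()` call stands in for the report's real work, and indexing by name (`"elapsed"`) is used here as a more readable alternative to `t[[3]]`.

```r
startTime <- proc.time()        # as in the knitrSetup chunk

Sys.sleep(0.1)                  # stand-in for the work the report does

t <- proc.time() - startTime    # user, system and elapsed time since the start
elapsed <- t[["elapsed"]]       # wall-clock seconds; the third element of t

message("Report generated in ", round(elapsed, 2), " seconds (",
        round(elapsed / 60, 2), " minutes)")
```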

--- a/Rmd/make_basicReport.R
+++ b/Rmd/make_basicReport.R
@@ -9,7 +9,8 @@ reqLibs <- c("data.table", # data munching
             "here", # here
             "lubridate", # dates and times
             "ggplot2", # plots
-             "skimr" # for skim
+             "skimr", # for skim
+             "bookdown" # for making reports (should also install knitr etc)
 )
 # load them
 woRkflow::loadLibraries(reqLibs)
@@ -50,7 +51,7 @@ makeGWPlot <- function(dt){
         caption = "Source: UK Grid ESO (http://data.nationalgrideso.com)")
  return(p)
 }
+version <- 10
 makeReport <- function(f){
  # default = html
  rmarkdown::render(input = paste0(here::here("Rmd", f),".Rmd"), # we love here:here() - it helps us find the .Rmd to use
@@ -60,6 +61,7 @@ makeReport <- function(f){
                    output_file = paste0(here::here("docs", f),".html") # where the output goes
  )
 }
 # Set up ----
 startTime <- proc.time()
@@ -73,16 +75,19 @@ plan <- drake::drake_plan(
 # Run drake plan ----
 plan # test the plan
 make(plan) # run the plan, re-loading data if needed
 # Run the report ----
 # run the report - don't do this inside the drake plan as 
 # drake can't seem to track the .rmd file if it is not explicitly named
-#makeReport(rmdFile)
+makeReport(rmdFile)
 # Just to show we can bring spirits back from the deep (i.e. from wherever drake hid them)
 dt <- drake::readd(esoData)
-message("Data covers ", min(dt$rDateTime), " to ", max(dt$rDateTime))
+dt[, rDateTimeUTC := lubridate::as_datetime(DATETIME)]
+message("Data covers ", min(dt$rDateTimeUTC), " to ", max(dt$rDateTimeUTC))
 # Finish off ----

--- a/_drake_basicReport.R
+++ b/_drake_basicReport.R
+# basic drake makefile
+# Libraries ----
+library(woRkflow) # remember to build it first :-)
+woRkflow::setup() # load env.R set up the default paths etc
+reqLibs <- c("data.table", # data munching
+             "drake", # what's done stays done
+             "here", # here
+             "lubridate", # dates and times
+             "ggplot2", # plots
+             "skimr" # for skim
+)
+# load them
+woRkflow::loadLibraries(reqLibs)
+# Parameters ----
+# Some data to play with:
+# https://data.nationalgrideso.com/carbon-intensity1/historic-generation-mix/r/historic_gb_generation_mix
+urlToGet <- "http://data.nationalgrideso.com/backend/dataset/88313ae5-94e4-4ddc-a790-593554d8c6b9/resource/7b41ea4d-cada-491e-8ad6-7b62f6a63193/download/df_fuel_ckan.csv"
+update <- "please" # edit this in any way (at all) to get drake to re-load the data from the url
+rmdFile <- "basicReport" # <- name of the .Rmd file to run at the end 
+title <- "UK Electricity Generation"
+subtitle <- "UK ESO grid data"
+authors <- "Ben Anderson"
+# Functions ----
+# for use in drake
+getData <- function(f,update){
+  # gets the data
+  message("Getting data from: ", f)
+  dt <- data.table::fread(f)
+  return(dt)
+}
+makeGWPlot <- function(dt){
+  message("Rebuilding plot")
+  # expects the eso data as a data.table
+  # draws a plot
+  dt[, rDateTime := lubridate::ymd_hms(DATETIME)] # hurrah, somebody read https://speakerdeck.com/jennybc/how-to-name-files?slide=21
+  dt[, weekDay := lubridate::wday(rDateTime, label = TRUE)]
+  # draw a megaplot for illustrative purposes
+  p <- ggplot2::ggplot(dt, aes(x = rDateTime, 
+                               y = GENERATION/1000,
+                               colour = weekDay)) +
+    geom_point() +
+    theme(legend.position = "bottom") +
+    labs(x = "Time",
+         y = "Generation (GW - mean per halfhour?)",
+         caption = "Source: UK Grid ESO (http://data.nationalgrideso.com)")
+  return(p)
+}
+makeReport <- function(f){
+  message("Re-running report: ", f)
+  # default = html
+  rmarkdown::render(input = paste0(here::here("Rmd", f),".Rmd"), # we love here::here() - it helps us find the .Rmd to use
+                    params = list(title = title,
+                                  subtitle = subtitle,
+                                  authors = authors),
+                    output_file = paste0(here::here("docs", f),".html") # where the output goes
+  )
+}
+# Set the drake plan ----
+# Clearly this will fail if you do not have internet access...
+plan <- drake::drake_plan(
+  esoData = getData(urlToGet, update), # returns data as data.table. If you edit 'update' in any way it will reload - drake is watching you!
+  skimTable = skimr::skim(esoData), # create a data description table
+  gWPlot = makeGWPlot(esoData), # make a plot
+  out = makeReport(rmdFile)
+)
+drake::drake_config(plan, verbose = 2)
--- a/docs/basicReport.html
+++ b/docs/basicReport.html
--- a/howTo/.gitkeep
+++ b/howTo/.gitkeep
--- a/howTo/aws-s3.md
+++ b/howTo/aws-s3.md
+# Guide to accessing data from Amazon Web Services (AWS) S3 buckets using R
+This guide provides details of how to set up and access files/data stored within an AWS S3 bucket directly from an R session using the [aws.s3 package](https://github.com/cloudyr/aws.s3).
+Prerequisite: access to the AWS account where the S3 bucket is located in order to create a user access policy.
+## Creating a user access policy (in the AWS console)
+Following guidance here: https://www.gormanalysis.com/blog/connecting-to-aws-s3-with-r/
+Create user 'rconnector' ... and create user policy 'test-bucket-connector' (see [example access policy](howTo/r-with-aws/rconnector-access-policy)).
+Make sure to save the access key ID and secret access key to use with the S3 API client.
+Use these details to set the following environment variables (see below for code) and store the credentials in an R script, e.g. in your project folder (in this example in a subfolder called [access keys](howTo/r-with-aws/access_keys)). Note: for security, exclude this file from the project repository by adding it to your .gitignore file. The R script will look something like the following ...
+```
+Sys.setenv(
+  "AWS_ACCESS_KEY_ID" = "mykey",
+  "AWS_SECRET_ACCESS_KEY" = "mysecretkey",
+  "AWS_DEFAULT_REGION" = "eu-west-2"
+)
+```
+An example script can be found [here](howTo/r-with-aws/access_keys/example_credentials_script.R).
+## Connecting to the S3 bucket with R
+You're ready to go! See [example code](howTo/r-with-aws/using_aws-s3_example.R) showing some commands to authenticate R with AWS and read and write files from/to AWS S3 buckets.
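A minimal sketch of the connection step, assuming the credentials script above has already been sourced. `awsCredentialsSet()` and `readCsvFromBucket()` are hypothetical helper names, not part of aws.s3 or of this repo; `aws.s3::s3read_using()` is the package's own reader. The object and bucket names in the usage line are placeholders.

```r
# Hypothetical helper: TRUE only if all three variables set by the
# credentials script are present and non-empty (values are never printed)
awsCredentialsSet <- function() {
  vars <- c("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")
  all(nzchar(Sys.getenv(vars)))
}

# Hypothetical helper: read a CSV object from an S3 bucket into a data.frame
readCsvFromBucket <- function(object, bucket) {
  if (!awsCredentialsSet()) {
    stop("AWS credentials not set - source your credentials script first")
  }
  # s3read_using() downloads the object to a temp file and passes it to FUN
  aws.s3::s3read_using(FUN = utils::read.csv, object = object, bucket = bucket)
}

# Usage (placeholder names):
# df <- readCsvFromBucket("my-file.csv", bucket = "my-s3-bucket-name")
```

Checking the environment variables first gives a clearer error than a failed request if the credentials script was not sourced.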
--- a/howTo/drake.md
+++ b/howTo/drake.md
+NB: drake has been superseded by [targets](https://books.ropensci.org/targets/) - more soon
+# drake:
+ * use [drake](https://docs.ropensci.org/drake/) to massively speed up and [manage your workflow](https://milesmcbain.xyz/the-drake-post/). This includes always:
+    * loading and processing all your data inside a drake plan in a .R file. _So it only gets re-run if the code or data changes_
+    * creating each of your output objects inside the drake plan. _So they only get re-created if the code or data changes_
+    * rendering your .Rmd report at the end of the drake plan. _So you can pass the params in and report the output objects_
+    * => the first time you run the plan it will build everything. The second time, e.g. after you fix a .Rmd typo, _only the bits that have changed get re-built_. **Warning: drake can reduce the time it takes to run your code by an order of magnitude. This could seriously damage your tea & cake in-take...**
+We have an example of [using drake](https://git.soton.ac.uk/SERG/workflow/-/blob/master/Rmd/make_basicReport.R)
\ No newline at end of file
--- a/gitBranches.md
+++ b/gitBranches.md
--- a/howTo/img/rtools_env_pane_h.png
+++ b/howTo/img/rtools_env_pane_h.png
--- a/howTo/img/rtools_env_pane_mydocs.png
+++ b/howTo/img/rtools_env_pane_mydocs.png
--- a/howTo/img/rtools_terminal_pane.png
+++ b/howTo/img/rtools_terminal_pane.png
--- a/howTo/iridis.md
+++ b/howTo/iridis.md
+# How to use the UoS Iridis HPC
+HPC = [High Performance Computer](https://www.southampton.ac.uk/isolutions/staff/iridis.page). Lots of memory, lots of processors, good if you can parallelise your code. Can you?
+## Which Iridis?
+We have:
+ * _Iridis 4: https://hpc.soton.ac.uk/redmine/projects/iridis-4-support/wiki (**retiring February 2022**)_
+ * Iridis 5: https://hpc.soton.ac.uk/redmine/projects/iridis-5-support/wiki
+5 is newer, bigger & better etc etc but new projects start on 4 (_before February 2022_) and, presumably when you prove you need 5, you can transfer.
+## How to access
+Fill in the form on the [iridis page](https://www.southampton.ac.uk/isolutions/staff/iridis.page).
+## How to login
+OK, so you are going to need to [get a bit friendly with unix](https://www.dummies.com/computers/operating-systems/unix/) because the Iridis service runs unix (linux). In general you will be interacting with it using a command line tool like terminal/console etc. Back to the good old days. 
+Assuming you want iridis 5 to start with you can login to any of:
+ * iridis5_a.soton.ac.uk
+ * iridis5_b.soton.ac.uk
+ * iridis5_c.soton.ac.uk
+To do this:
+ * make sure you are running the UoS vpn
+ * open a terminal/console window on your desktop
+ * type `ssh YourUsername@iridis5_a.soton.ac.uk` (or whichever you want) & hit return. YourUsername = your UoS username obvs
+ * type `yes` to the fingerprint question (DO NOT PANIC!). This should only happen the first time you do this
+ * type your UoS password & hit return
+ * you're in!
+## Now what?
+Read these first! 
+ * https://hpc.soton.ac.uk/redmine/projects/iridis-5-support/wiki/New_User_Warnings <- **especially this one!**
+ * https://hpc.soton.ac.uk/redmine/projects/iridis-5-support/wiki/Getting_started
+## R
+https://hpc.soton.ac.uk/redmine/projects/iridis-5-support/wiki/R
+After you've [loaded the R module](https://hpc.soton.ac.uk/redmine/projects/iridis-5-support/wiki/R), you can interactively run R scripts by running R. Or you can run them non-interactively using the `Rscript <yourScript.R>` command. But you probably shouldn't. 
+Instead you should submit your commands (e.g. `Rscript myScript.R`) to the Iridis scheduling system so it can balance everyone's needs. You need to [learn how to do this](https://hpc.soton.ac.uk/redmine/projects/iridis-5-support/wiki/Job_Submission)...
+## How to get your code on Iridis
+If you've set up a repo on git.soton it's easy - Iridis has git installed. You will have to use git at the command line but it's [not so bad](https://docs.gitlab.com/ee/gitlab-basics/start-using-git.html)...
+If you haven't then do so.
+## Where to put your data
+Sadly Iridis cannot see `J:\ or \\soton.ac.uk\resource\` so you can't follow [usual advice](https://git.soton.ac.uk/SERG/workflow/-/blob/master/howTo/keepingData.md). You will need to copy the data you need to your home directory on Iridis. Do this using sftp - [CyberDuck](https://cyberduck.io/) works well on OSX, no doubt there are others for [Windoze](https://hpc.soton.ac.uk/redmine/projects/iridis-4-support/wiki/Transferring_files_to_and_from_Iridis_4).
+If you [run out of space](https://hpc.soton.ac.uk/redmine/projects/iridis-4-support/wiki/User_account_information_and_limits) in your home folder/directory (100 GB max) you can also put it in a scratch folder which will take 1 TB. But this is not backed-up - so data in there is at risk.
+## Adding packages
+Yes [you can do this](https://hpc.soton.ac.uk/redmine/projects/iridis-4-support/wiki/R#Installing-R-packages-locally). It's just a bit more involved than doing it in RStudio because you have to use the R command line.
+## What you cannot do
+Run RStudio. Well OK, you can. But it needs the ssh tunneling and XWindow thing mentioned above. Yeahnah.
+## The wish list
+ * Iridis supports RStudio Server - so you can login via a web UI
+ * Iridis can see `J:\ or \\soton.ac.uk\resource\` so all roads lead to the same data
--- a/howTo/keepingData.md
+++ b/howTo/keepingData.md
+# Where to keep your data
+Let's start by offering some advice on where _not_ to keep your data:
+ * in your git/hub/lab repo because:
+   * your repo will bloat
+   * you may accidentally publish it via github/gitlab
+   * unless you're smart with `.gitignore` every time you make new or save new data git will try to synch it with your repo. This will _hammer_ your internet connection and make your git commit process almost unusable
+   * github/gitlab will refuse to store data of any useful size
+ * on Dropbox/Sharepoint/oneCloud/whatever or similar because:
+   * it may breach your [institutional policy](https://library.soton.ac.uk/researchdata/storage) on data storage
+   * unless you're smart with `.gitignore` every time you make new or save new data your Dropbox/Sharepoint/oneCloud/whatever will try to synch it. This will _hammer_ your internet connection
+ * only on your laptop/PC because:
+   * they crash and you'll lose it
+   * you'll lose the laptop and someone could find/steal/disclose the data
+ * on a usb drive because
+   * see previous
+OK, so where _should_ you keep your data? There are basically two types of places:
+  * an institutional file store to which you have access
+  * a cloud data service to which you can send your code such as AWS, google etc
+For most of you the first option may be the only one available for institutional policy reasons. In the case of the University of Southampton you should [read the policy on data storage](https://library.soton.ac.uk/researchdata/storage) including the advice on how to store and transfer data securely. Our suggested options are:
+ * your [personal filestore](https://knowledgenow.soton.ac.uk/Articles/KB0011651) (AKA `\\filestore.soton.ac.uk\Users\<username>\`, AKA “My Documents”) which, by default, _only_ allows 50GB or 
+ * **preferred ->** the resource drive (AKA `J:\` or `\\soton.ac.uk\resource\`) which is accessible via the [web](https://fwa.soton.ac.uk/), via SMB (use the VPN), from the [University SVE](sve.md) and, in the case of SERG data, via `/mnt/SERG_data` on the University's [RStudio server](https://rstudio.soton.ac.uk/).
+ We recommend you use the `J:\` drive because it can hold much larger data volumes and can be made accessible to your colleagues/supervisors if required. For reasons of speed this implies you either:
+  * use the [University SVE](https://sotonac.sharepoint.com/teams/IT/SitePages/Services/SouthamptonVirtualEnvironment.aspx/) to run RStudio 'on campus' and thus close to the data. We have found some problems with persistence of installed packages in between SVE sessions if you try this
+  * mount the `J:\` drive on your laptop/PC and use a local version of (e.g.) RStudio to load the data. If you are doing this you might want to learn about [drake](drake.md) so the data is only loaded over the network the first time you run your code, not each time.
+  * [get access to and use](https://git.soton.ac.uk/SERG/uosrstudioserver) the University's [RStudio server](https://rstudio.soton.ac.uk/)  **<- best option**, this will let you run your code on the research filestore directly
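Because the same data can appear under different paths depending on where the code runs (`/mnt/SERG_data` on the RStudio server, `J:\` or `\\soton.ac.uk\resource\` elsewhere), one option is to let the script pick whichever location is visible. The sketch below is illustrative only: `findDataRoot()` is a hypothetical helper, not a woRkflow function, and the candidate list is just the paths named above written with forward slashes.

```r
# Hypothetical helper: return the first candidate data location that exists
findDataRoot <- function(candidates = c("/mnt/SERG_data",
                                        "J:/",
                                        "//soton.ac.uk/resource")) {
  found <- candidates[dir.exists(candidates)]
  if (length(found) == 0) {
    stop("No data location found - is the VPN on and the drive mounted?")
  }
  found[[1]]
}

# Usage (the file name is a placeholder):
# dataFile <- file.path(findDataRoot(), "myProject", "myData.csv")
```

Keeping the lookup in one place means the rest of the code does not need editing when it moves between a laptop and the server.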
+As far as we know the University's `Research Filestore` (AKA `\\xxx.files.soton.ac.uk\<SHARE>\`) does not allow direct access so it can only be used to archive data you are not actively using. See https://library.soton.ac.uk/researchdata/storage for more info.
+> Update: we understand that the University is looking to transition from the J drive to the use of oneDrive/sharepoint:
+<hr>
+"Over the course of this year, and next, we will be looking to move the University off J:Drive and utilise OneDrive for Business, as part of our Filestore Migration Project.
+Filestore Migration will:
+1) Move personal files from ‘My Documents’ and ‘Desktop’ to [One Drive for Business](https://support.microsoft.com/en-gb/office/what-is-onedrive-for-work-or-school-187f90af-056f-47c0-9656-cc0ddca7fdc2?ui=en-us&rs=en-gb&ad=gb), a Microsoft cloud service that connects you to all your files.  
+2) Move shared files e.g. J: drive to SharePoint Online. 
+Further information can be found here: [Storing files in O365](https://sotonac.sharepoint.com/teams/Office365/SitePages/Storing-files-in-Office-365.aspx)
+Please note: For people and processes that have a business need to maintain traditional filestore 
+(such as Linux desktop users), there will be a process to request this instead of One Drive for Business or SharePoint Online."
+</hr>
+> We have been assured by iSolutions that data will be migrated on a case by case basis and it might not suit all cases. They agreed that our use of rstudio.soton and J:\\ is a very good case for _not_ migrating. So watch out for that!
--- a/howTo/mirroringRepositories.md
+++ b/howTo/mirroringRepositories.md
+# Mirroring repositories in Gitlab
+Author: Tom Rushby (@tom_rushby)
+Gitlab has a useful page on mirroring: https://docs.gitlab.com/ee/user/project/repository/mirror/
+## Gitlab to GitHub mirror
+i.e. you want to create a copy on GitHub of a repository held in the University Git service (GitLab).
+In this case we create a *push mirror*: a downstream repository (on GitHub) that mirrors the commits made to the upstream repository (on Gitlab).
+Creating a mirror allows a repository with restricted access to be publicly available (and backed-up on another platform), as well as taking advantage of functionality on GitHub such as pages (currently not available on the University implementation of GitLab).
+It's not terribly clear on the page linked above, but you will need to generate an Access Token within your GitHub account. This will be used as the password in the mirroring section of the repo on GitLab (git.soton.ac.uk). https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token
+## Step-by-step
+1. Create a clean (target, or 'downstream') repository in your GitHub account. I use the identical name as the repo in GitLab and add (mirror) in the description to indicate that it is a mirrored copy.
+2. Copy the https URL to the repo e.g. `https://github.com/mygitaccount/myrepo`
+3. In the **Gitlab** (source, or 'upstream') repository to be mirrored, go to `Settings` > `Repository` and expand the `Mirroring repositories` section. Enter the GitHub repo URL but in the format `https://gitusername@github.com/gitusername/reponame`.
+4. If you haven't already got one, go back to **GitHub** and [create a personal access token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token). In the top right corner, click your profile photo, and click **Settings**. On the Settings page, click **Developer settings** (at the bottom of the left-hand sidebar). Click **Personal access tokens** in the left sidebar and then **Generate new token**. When generating the access token you will need to tick the boxes to allow "repo" and "workflow" permissions. Once complete, copy the access token to the clipboard.
+5. Now in **Gitlab**, paste the access token into the **Password** field and click the **Mirror repository** button. The repository should now appear in the Mirrored repositories list with 'Push' as the direction. Each time a commit is made to your Gitlab repository it will be automatically pushed to the mirrored repository downstream. Clicking the 'Update now' button in the right-hand column of the list will force the push and is useful to test the process when setting up. If the process fails, errors will be shown in the list under 'Last successful update'.
+6. Commit to your Gitlab repo and watch your GitHub copy update automatically. Magic!
--- a/resources.md
+++ b/resources.md
-## 'How to' resources:
+# Other 'How to' resources:
 * excellent [guidance for collaborative project teams (especially team leads)](https://opensource.guide/) even if they're not open and not R
 * [What they forgot to teach you](https://rstats.wtf/) about R including some required reading:
    * why you should use here::here() and **not setwd()** to make sure your code works _anywhere_
    * why you should use (RStudio) p/Projects to _manage your code_
-    * how you should name data files to _stay sane_
+    * [how you should name things](https://speakerdeck.com/jennybc/how-to-name-files) to _stay sane_
    * why you should not add anything to .Renviron or Rprofile unless you want to _irritate team members_
    * and much more, although:
-        *  we don't agree with [keeping your data in your project](https://rstats.wtf/project-oriented-workflow.html#work-in-a-project). Data should be somewhere else, _unless you're a .gitignore wizard_ and your data is small (and non-sensitive/non-commercial/public etc)
+        *  we don't agree with [keeping your data in your project](https://rstats.wtf/project-oriented-workflow.html#work-in-a-project). Data should be [somewhere else](keepingData.md), _unless you're a .gitignore wizard_ and your data is small (and non-sensitive/non-commercial/public etc)
 * using [git(hub/lab)](https://happygitwithr.com/) for version control (perhaps via [usethis](https://usethis.r-lib.org/) and knowing about [ohshitgit](https://ohshitgit.com/) just in case)
- * using [git branches](https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell) as a way for different people to work on the same project without clashing 
+ * using git forks and [branches](https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell) as a way for different people to work on the same project without clashing 
-    * Tom has [blogged](https://twrushby.wordpress.com/2017/03/27/collaboration-with-rstudio-and-git-using-branches/) about this
+    * Tom has written a [setup guide](https://git.soton.ac.uk/SERG/feeg6025support/feeg6025-2020-2021/-/blob/master/Guides/SETUP.md) for getting started
    * we have written a [short guide](gitBranches.md)
    * [HappyGit](https://happygitwithr.com/fork-and-clone.html) gives you the details
    * [ohshitgit](https://ohshitgit.com/) may be required here too (but not if you've followed the instructions above)
 * using [git(hub/lab) issues](https://guides.github.com/features/issues/) as a way to manage your project - just like we did for the [new ECCD website](https://git.soton.ac.uk/SERG/sergwebsite/-/issues)
- * how to use [drake](https://docs.ropensci.org/drake/) to massively speed up and [manage your workflow](https://milesmcbain.xyz/the-drake-post/). This includes always:
-    * loading and processing all your data inside a drake plan in a .R file. _So it only gets re-run if the code or data changes_
+Even more ...
-    * creating each of your output objects inside the drake plan. _So they only get re-created if the code or data changes_
-    * rendering your .Rmd report at the end of the drake plan. _So you can pass the params in and report the output objects_
+ * on [naming files (tidyverse)](https://style.tidyverse.org/files.html)
-    * => the first time you run the plan it will build everything. The second time, e.g. after you fix a .Rmd typo, _only the bits that have changed get re-built_. **Warning: drake can reduce the time it takes to run your code by an order of magnitude. This could seriously damage your tea & cake in-take...**
+ * coding style, syntax etc. with [lintr](https://github.com/jimhester/lintr)
+ * set up projects as packages with help from [usethis](https://usethis.r-lib.org/)
\ No newline at end of file
+ * tools for reproducibility
+    * Is your project file reproducible? [fertile](https://github.com/baumer-lab/fertile) might help
+    * Or try [rrrpkg](https://github.com/ropensci/rrrpkg) to create a research compendium
--- a/howTo/r-with-aws/access_keys/example_credentials_script.R
+++ b/howTo/r-with-aws/access_keys/example_credentials_script.R
+# Set environment variables to authenticate access to AWS S3 bucket
+# Use in conjunction with aws.s3 package
+Sys.setenv(
+  "AWS_ACCESS_KEY_ID" = "mykey",
+  "AWS_SECRET_ACCESS_KEY" = "mysecretkey",
+  "AWS_DEFAULT_REGION" = "eu-west-2"
+)
\ No newline at end of file
--- a/howTo/r-with-aws/bucket-policy
+++ b/howTo/r-with-aws/bucket-policy
+{
+    "Version": "2012-10-17",
+    "Id": "PolicyForDestinationBucket",
+    "Statement": [
+        {
+            "Sid": "Permissions on objects and buckets",
+            "Effect": "Allow",
+            "Principal": {
+                "AWS": "arn:aws:iam::000000000000:role/cross-account-bucket-replication-role"
+            },
+            "Action": [
+                "s3:List*",
+                "s3:GetBucketVersioning",
+                "s3:PutBucketVersioning",
+                "s3:ReplicateDelete",
+                "s3:ReplicateObject"
+            ],
+            "Resource": [
+                "arn:aws:s3:::my-s3-bucket-name",
+                "arn:aws:s3:::my-s3-bucket-name/*"
+            ]
+        },
+        {
+            "Sid": "Permission to override bucket owner",
+            "Effect": "Allow",
+            "Principal": {
+                "AWS": "arn:aws:iam::999999999999:root"
+            },
+            "Action": "s3:ObjectOwnerOverrideToBucketOwner",
+            "Resource": "arn:aws:s3:::my-s3-bucket-name/*"
+        }
+    ]
+}
\ No newline at end of file