{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": [
"s3:PutAnalyticsConfiguration",
"s3:GetObjectVersionTagging",
"s3:DeleteAccessPoint",
"s3:CreateBucket",
"s3:ReplicateObject",
"s3:GetObjectAcl",
"s3:GetBucketObjectLockConfiguration",
"s3:DeleteBucketWebsite",
"s3:GetIntelligentTieringConfiguration",
"s3:DeleteJobTagging",
"s3:PutLifecycleConfiguration",
"s3:GetObjectVersionAcl",
"s3:PutObjectTagging",
"s3:DeleteObject",
"s3:DeleteObjectTagging",
"s3:GetBucketPolicyStatus",
"s3:GetObjectRetention",
"s3:GetBucketWebsite",
"s3:GetJobTagging",
"s3:PutReplicationConfiguration",
"s3:GetObjectAttributes",
"s3:DeleteObjectVersionTagging",
"s3:PutObjectLegalHold",
"s3:InitiateReplication",
"s3:GetObjectLegalHold",
"s3:GetBucketNotification",
"s3:PutBucketCORS",
"s3:GetReplicationConfiguration",
"s3:ListMultipartUploadParts",
"s3:PutObject",
"s3:GetObject",
"s3:PutBucketNotification",
"s3:DescribeJob",
"s3:PutBucketLogging",
"s3:GetAnalyticsConfiguration",
"s3:PutBucketObjectLockConfiguration",
"s3:GetObjectVersionForReplication",
"s3:CreateAccessPoint",
"s3:GetLifecycleConfiguration",
"s3:GetInventoryConfiguration",
"s3:GetBucketTagging",
"s3:PutAccelerateConfiguration",
"s3:DeleteObjectVersion",
"s3:GetBucketLogging",
"s3:ListBucketVersions",
"s3:ReplicateTags",
"s3:RestoreObject",
"s3:ListBucket",
"s3:GetAccelerateConfiguration",
"s3:GetObjectVersionAttributes",
"s3:GetBucketPolicy",
"s3:PutEncryptionConfiguration",
"s3:GetEncryptionConfiguration",
"s3:GetObjectVersionTorrent",
"s3:AbortMultipartUpload",
"s3:PutBucketTagging",
"s3:GetBucketRequestPayment",
"s3:GetAccessPointPolicyStatus",
"s3:UpdateJobPriority",
"s3:GetObjectTagging",
"s3:GetMetricsConfiguration",
"s3:GetBucketOwnershipControls",
"s3:DeleteBucket",
"s3:PutBucketVersioning",
"s3:GetBucketPublicAccessBlock",
"s3:ListBucketMultipartUploads",
"s3:PutIntelligentTieringConfiguration",
"s3:PutMetricsConfiguration",
"s3:PutBucketOwnershipControls",
"s3:PutObjectVersionTagging",
"s3:PutJobTagging",
"s3:UpdateJobStatus",
"s3:GetBucketVersioning",
"s3:GetBucketAcl",
"s3:PutInventoryConfiguration",
"s3:GetObjectTorrent",
"s3:PutBucketWebsite",
"s3:PutBucketRequestPayment",
"s3:PutObjectRetention",
"s3:GetBucketCORS",
"s3:GetBucketLocation",
"s3:GetAccessPointPolicy",
"s3:ReplicateDelete",
"s3:GetObjectVersion"
],
"Resource": [
"arn:aws:s3:::my-aws-bucket",
"arn:aws:s3:*:999999999999:accesspoint/*",
"arn:aws:s3:::my-aws-bucket/*",
"arn:aws:s3:*:999999999999:job/*"
]
},
{
"Sid": "VisualEditor1",
"Effect": "Allow",
"Action": [
"s3:ListStorageLensConfigurations",
"s3:ListAccessPointsForObjectLambda",
"s3:GetAccessPoint",
"s3:GetAccountPublicAccessBlock",
"s3:ListAllMyBuckets",
"s3:ListAccessPoints",
"s3:ListJobs",
"s3:PutStorageLensConfiguration",
"s3:ListMultiRegionAccessPoints",
"s3:CreateJob"
],
"Resource": "*"
}
]
}
# Requires the aws.s3 package; install it first if needed:
# install.packages("aws.s3")
# Set environment variables holding your AWS access keys
source("./howTo/r-with-aws/access_keys/aws_access.R") # replace with your own credentials script, e.g. the next line
# source("./howTo/r-with-aws/access_keys/example_credentials_script.R")
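# Optional sanity check (a sketch, not part of aws.s3). When credentials are
# supplied as environment variables, as the sourced script above does, aws.s3
# looks for AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. AWS_DEFAULT_REGION is
# optional; aws.s3 falls back to a default region if it is unset.
# check_aws_env() is a hypothetical helper: it only confirms the variables are
# non-empty and does NOT validate the keys with AWS.
check_aws_env <- function(vars = c("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")) {
  missing <- vars[!nzchar(Sys.getenv(vars))]
  if (length(missing) > 0) {
    warning("AWS environment variable(s) not set: ", paste(missing, collapse = ", "))
  }
  invisible(length(missing) == 0)
}
check_aws_env()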
# Get list of buckets
aws.s3::bucketlist()
# set bucket name (less typing) - this is the name of your s3 bucket
my_bucket <- "twr-test-bucket-r"
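# write a file to the temp dir using a built-in data frame
# (row.names = FALSE stops an extra unnamed index column appearing when the file is read back)
write.csv(iris, file.path(tempdir(), "iris.csv"), row.names = FALSE)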
# save an object (file from the temp dir) to the bucket
aws.s3::put_object(
file = file.path(tempdir(), "iris.csv"),
object = "iris.csv",
bucket = my_bucket
)
# list objects in the bucket
aws.s3::get_bucket(
bucket = my_bucket
)
# provide a nice table of objects in the bucket
data.table::rbindlist(aws.s3::get_bucket(bucket = my_bucket))
# read an object from s3 bucket, three ways ...
# 1. bucket and object specified separately
aws.s3::s3read_using(
FUN = read.csv, bucket = my_bucket, object = "iris.csv"
)
# 2. use the s3 URI
aws.s3::s3read_using(
FUN = read.csv, object = "s3://twr-test-bucket-r/iris.csv"
)
# 3. use data.table's fread() function for fast CSV reading
aws.s3::s3read_using(
FUN = data.table::fread, object = "s3://twr-test-bucket-r/iris.csv"
)
# Using renv to create reproducible environments for R projects
## What is renv?
An environment manager for R projects: it organises the package dependencies within an R project, recording the version of each package used in an analysis and making it simple to move projects from one computer to another.
This is achieved through the creation of package 'snapshots' which can be (re)installed (or 'restored') on different computers with one simple command.
`renv` provides an alternative solution to our [workflow/loadLibraries function](https://git.soton.ac.uk/SERG/workflow/-/blob/master/R/loadLibraries.R) and can tackle package persistence problems when [using RStudio within the University SVE](https://git.soton.ac.uk/SERG/workflow/-/blob/master/howTo/sve.md).
The advantage of using `renv` over `woRkflow::loadLibraries()` is that `renv` automatically scans the code in a project to compile a list of the packages used, and `renv::snapshot()` also records the version of each package. Nice!
Using `renv` might help to make collaboration that little bit simpler.
### Install
Start by installing the [renv](https://rstudio.github.io/renv/) package.
```
install.packages('renv')
```
Open your project and initialise renv to create a project specific local environment and R library.
```
renv::init()
```
The first time you use renv, running `renv::init()` will generate output similar to the following:
```
Welcome to renv!
It looks like this is your first time using renv. This is a one-time message,
briefly describing some of renv's functionality.
renv maintains a local cache of data on the filesystem, located at:
- "C:/Users/twr1m15/AppData/Local/R/cache/R/renv"
This path can be customized: please see the documentation in `?renv::paths`.
renv will also write to files within the active project folder, including:
- A folder 'renv' in the project directory, and
- A lockfile called 'renv.lock' in the project directory.
In particular, projects using renv will normally use a private, per-project
R library, in which new packages will be installed. This project library is
isolated from other R libraries on your system.
In addition, renv will update files within your project directory, including:
- .gitignore
- .Rbuildignore
- .Rprofile
Please read the introduction vignette with `vignette("renv")` for more information.
You can browse the package documentation online at https://rstudio.github.io/renv/.
```
If the project already has a lockfile the following message will be displayed ...
```
This project already has a lockfile. What would you like to do?
1: Restore the project from the lockfile.
2: Discard the lockfile and re-initialize the project.
3: Activate the project without snapshotting or installing any packages.
4: Abort project initialization.
```
Once the project has been initialised, every time it is opened renv makes sure that the `renv` package is installed and loaded, giving access to the `renv::restore()` command (see 'Restore' below).
The use of `renv` is confirmed on opening a project by feedback in the console, for example:
```
* Project 'H:/SVE/git.soton/rtools' loaded. [renv 0.15.2]
```
### Lock file
The file `renv.lock` contains a description of the state of the project's library.
For example:
```
{
"R": {
"Version": "4.1.1",
"Repositories": [
{
"Name": "CRAN",
"URL": "https://cran.rstudio.com"
}
]
},
"Packages": {
"renv": {
"Package": "renv",
"Version": "0.15.2",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "206c4ef8b7ad6fb1060d69aa7b9dfe69",
"Requirements": []
}
}
}
```
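Each entry under "Packages" above boils down to a package name plus the version that was installed when the snapshot was taken (renv also stores the source, repository and a hash). Here is a base-R illustration of that core idea. It is not renv's own implementation, and `record_versions()` is a hypothetical helper name:

```r
# Illustration only: record_versions() is hypothetical, not part of renv.
# It looks up the installed version of each named package, which is the core
# of what renv.lock stores for every package the project uses.
record_versions <- function(pkgs) {
  versions <- vapply(
    pkgs,
    function(p) as.character(utils::packageVersion(p)),
    character(1)
  )
  data.frame(Package = pkgs, Version = unname(versions), stringsAsFactors = FALSE)
}

# Base packages are always installed, so this runs anywhere
record_versions(c("stats", "utils"))
```

renv goes further by recording the same information for every dependency of those packages, which is why the lock file grows so quickly.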
## Updating the lock file
Suppose we add some code to the repo (in this script) that requires another package, for example ...
```
# install.packages("ggplot2")
library(ggplot2)
```
... and use it to create a simple plot from the built-in `cars` dataset ...
```
ggplot(data = cars, mapping = aes(x = speed, y = dist)) +
geom_point()
```
The package(s) can then be recorded in the lock file by running another snapshot ...
```{r}
renv::snapshot()
```
Running `renv::snapshot()` updates the `renv.lock` file with the new packages and prints feedback, for example (look at the tonne of dependencies for the ggplot2 package) ...
```
The following package(s) will be updated in the lockfile:
# CRAN ===============================
- MASS [* -> 7.3-54]
- Matrix [* -> 1.3-4]
- R6 [* -> 2.5.1]
- RColorBrewer [* -> 1.1-2]
- base64enc [* -> 0.1-3]
- cli [* -> 3.1.1]
- colorspace [* -> 2.0-2]
- crayon [* -> 1.4.2]
- digest [* -> 0.6.29]
- ellipsis [* -> 0.3.2]
- evaluate [* -> 0.14]
- fansi [* -> 1.0.2]
- farver [* -> 2.1.0]
- fastmap [* -> 1.1.0]
- ggplot2 [* -> 3.3.5]
- glue [* -> 1.6.1]
- gtable [* -> 0.3.0]
- highr [* -> 0.9]
- htmltools [* -> 0.5.2]
- isoband [* -> 0.2.5]
- jquerylib [* -> 0.1.4]
- jsonlite [* -> 1.7.3]
- knitr [* -> 1.37]
- labeling [* -> 0.4.2]
- lattice [* -> 0.20-44]
- lifecycle [* -> 1.0.1]
- magrittr [* -> 2.0.2]
- mgcv [* -> 1.8-36]
- munsell [* -> 0.5.0]
- nlme [* -> 3.1-152]
- pillar [* -> 1.7.0]
- pkgconfig [* -> 2.0.3]
- rlang [* -> 1.0.0]
- rmarkdown [* -> 2.11]
- scales [* -> 1.1.1]
- stringi [* -> 1.7.6]
- stringr [* -> 1.4.0]
- tibble [* -> 3.1.6]
- tinytex [* -> 0.36]
- utf8 [* -> 1.2.2]
- vctrs [* -> 0.3.8]
- viridisLite [* -> 0.4.0]
- withr [* -> 2.4.3]
- xfun [* -> 0.29]
- yaml [* -> 2.2.2]
```
The packages now appear in the contents of the lock file as shown below ...
```
{
"R": {
"Version": "4.1.1",
"Repositories": [
{
"Name": "CRAN",
"URL": "https://cran.rstudio.com"
}
]
},
"Packages": {
"MASS": {
"Package": "MASS",
"Version": "7.3-54",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "0e59129db205112e3963904db67fd0dc",
"Requirements": []
},
"Matrix": {
"Package": "Matrix",
"Version": "1.3-4",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "4ed05e9c9726267e4a5872e09c04587c",
"Requirements": [
"lattice"
]
},
"R6": {
"Package": "R6",
"Version": "2.5.1",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "470851b6d5d0ac559e9d01bb352b4021",
"Requirements": []
},
"RColorBrewer": {
"Package": "RColorBrewer",
"Version": "1.1-2",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "e031418365a7f7a766181ab5a41a5716",
"Requirements": []
}
XXXX_CURTAILED_FOR_SPACE_XXXX
}
}
```
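renv picked up ggplot2 because it scans the project's code for packages. The same scan is available directly as `renv::dependencies()`. Below is a rough sketch that assumes renv is installed. The demo folder and `analysis.R` file are made up and live in a temporary directory:

```r
# Sketch: see which packages renv detects in a folder of R code.
# The demo folder and analysis.R are hypothetical, created under tempdir().
if (requireNamespace("renv", quietly = TRUE)) {
  demo_dir <- file.path(tempdir(), "renv-scan-demo")
  dir.create(demo_dir, showWarnings = FALSE)
  writeLines(
    c("library(ggplot2)",                   # attached with library()
      "dt <- data.table::fread('x.csv')"),  # used via pkg::fun
    file.path(demo_dir, "analysis.R")
  )
  # Returns a data frame with one row per detected package use
  deps <- renv::dependencies(demo_dir)
  print(unique(deps$Package))
}
```

In a real project you would simply run `renv::dependencies()` from the project root to preview what a snapshot is likely to pick up.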
## Restore
The `renv::restore()` command allows a previous snapshot of packages (including package versions) to be installed.
At [SERG](https://energy.soton.ac.uk) we like to work in a collaborative manner ... this often means sharing analysis and code across platforms. To that end we need packages/libraries used within RStudio in any piece of analysis to be restored easily on another system.
The `renv` package allows all of the packages used in a specific project to be (re)installed using a single command - very useful for porting projects across different computers/contributors.
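The collaboration loop described in the renv documentation on collaborating works roughly like this. On your machine, run `renv::snapshot()` and commit `renv.lock` along with `.Rprofile` and `renv/activate.R`. A collaborator then clones the repo, opens the project, and restores. The sketch below covers the collaborator's side. The guard around the calls is our addition, so the code does nothing when run outside an renv project:

```r
# Collaborator's side (sketch). Assumes the working directory is the root of a
# cloned project whose renv.lock is under version control.
lockfile_present <- file.exists("renv.lock")

if (lockfile_present && requireNamespace("renv", quietly = TRUE)) {
  renv::restore()  # install the package versions recorded in renv.lock
  renv::status()   # report any remaining differences between lockfile and library
} else {
  message("No renv.lock in ", getwd(), " (or renv not installed): nothing to restore.")
}
```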
## Resources
* [renv for R projects](https://rstudio.github.io/renv/index.html)
* [Introduction to renv](https://rstudio.github.io/renv/articles/renv.html) by Kevin Ushey
* [Collaborating with renv](https://rstudio.github.io/renv/articles/collaborating.html)
# Using R/RStudio on the SVE service
## What is the SVE?
A Windows [virtual desktop](https://sotonac.sharepoint.com/teams/IT/SitePages/Services/SouthamptonVirtualEnvironment.aspx/) service.
The SVE offers two services:
* Win 10 Student service - this one has research & academic software suites such as RStudio, but it does **not** have persistence. This means that any packages you install or any repos you clone into your local space will vanish when you log out. The only exceptions are if you:
* clone the repos to your MyDocuments/OneDrive account
* work out how to use your Windows profile to 'host' the packages (or do some nifty work with the [`renv`](howTo/renv.md) package)
* Win 10 Staff service - this is a generic staff service intended for admin & professional staff use (for now). While it **does** have persistence, it does **not** have RStudio installed...
## Why would I use the SVE?
1. The 'Student' service hosts most of the applications you'll need, including RStudio.
1. It offers easy access to [data folders such as J:](keepingData.md) as you are effectively working 'on campus'. This makes data loading fast. Well... faster than doing it over your home broadband.
1. You can easily access your OneDrive folders.
1. The virtual PC instance you log in to has reasonable memory allocation so unless you have huuuge datasets you should be OK.
But:
1. No R package or repo persistence (see above)
1. It does not seem to be able to 'mount' SharePoint/Teams folders, so you cannot easily load data held in them
## Git
Git is installed on the SVE - yay! You do know how to use Git, right? No? [try starting here](https://happygitwithr.com/index.html).
### Git authentication
HTTPS vs SSH? It's up to you, but [some argue](https://happygitwithr.com/https-pat.html#https-vs-ssh) that HTTPS is the easier way to get going.
Either way, you will need to authenticate the SVE (local machine) with GitLab/GitHub. You can do this via HTTPS, entering your username and password (or personal access token) in each session on the SVE (not a big deal)... or by using an SSH (RSA) key. This key should now persist on the SVE (it didn't before the SVE upgrade).
## RStudio in SVE
### Packages
See note above re persistence.
If you need to add new packages, use the `install.packages()` function. This seems to bypass a permissions issue in C:/Apps/RLibraries that causes the usual RStudio GUI method (the Packages tab) to fail.
Better yet, use the [loadLibraries() function](https://git.soton.ac.uk/SERG/workflow/-/blob/master/R/loadLibraries.R) developed at SERG :-)
Even better, consider using the [`renv`](howTo/renv.md) package.
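If you want installed packages to survive logging out, one option worth trying is a personal library on the filestore (H:), which persists like your Documents folder. We have not tested this on the SVE. Here is a minimal sketch; `use_personal_library()` and the `H:/R/library` path are hypothetical:

```r
# Hypothetical helper: create a personal library folder (if needed) and put
# it first on the library search path for this R session.
use_personal_library <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  # .libPaths() silently drops folders that do not exist, hence creating it first.
  # The first entry is where install.packages() installs by default.
  .libPaths(c(path, .libPaths()))
  invisible(.libPaths())
}

# Usage on the SVE (path is an assumption; run at the start of each session):
# use_personal_library("H:/R/library")
# install.packages("data.table")  # now installs into H:/R/library
```

The change to `.libPaths()` only lasts for the current session, so the call needs to be repeated each time R starts.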
### Using Git within RStudio
We recommend storing your local (working) project repositories within your `Documents` folder on the university filestore as [storing Git repositories within cloud-synced folders may cause problems.](https://andreashandel.github.io/MADAcourse/Tools_Github_Introduction.html#GitGitHub_and_other_cloud_based_sync_options)
Git does seem to have some weird behaviour on the SVE to be aware of... the correct operation of Git within RStudio on the University network requires some careful working practices with respect to file paths.
Using the University filestore for project (working) files requires use of the mapped file path `filestore (H:)` _<u>not</u>_ the `Documents` shortcut in the `Quick access` group of Windows Explorer.
While these two paths refer to the same physical location for your documents folder, they are resolved differently i.e. `H:\` vs `\\filestore.soton.ac.uk\users\xxmyusernamexx\mydocuments`. When a project is started in RStudio from the latter location, the Git repository is not recognised correctly. As a result, the `Git` tab will be missing from the Environment pane.
When the project is loaded from `Documents` in the `Quick access` group, there is no `Git` tab and thus no access to Git processes through the RStudio IDE:
![Environment Pane, My Documents](img/rtools_env_pane_mydocs.png)
When the project is loaded from the same folder via `H:` (the mapped drive), the `Git` tab is present and Git commands are accessible:
![Environment Pane, H](img/rtools_env_pane_h.png)
This problem seems to be limited to the Environment pane within the RStudio IDE: Git commands run through the Terminal (in the Console pane) pick up the repository correctly, as shown in the image below.
![Terminal Pane](img/rtools_terminal_pane.png)
So using Git through the Terminal is unaffected by the path used to open the project.
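A quick way to tell which route a project was opened through is to check the working directory from the R console. The sketch below uses a hypothetical helper, `is_unc_path()`. It assumes that the `Documents` route shows up as a `\\server\share` (or `//server/share`) path, as described above:

```r
# Hypothetical helper: TRUE when a path starts with \\ or //, i.e. a Windows
# UNC path such as \\filestore.soton.ac.uk\users\...
is_unc_path <- function(path) {
  grepl("^(\\\\\\\\|//)", path)
}

# Warn if the current project was opened via the UNC route
if (is_unc_path(getwd())) {
  warning("Project opened via a UNC path: reopen it from the mapped H: drive ",
          "to get the Git tab in RStudio.")
}
```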
Yes, it's possible to run Python in RStudio. I'm not sure why you would, but we're a broad church so...
You need to tell RStudio which Python interpreter to use...
On a mac:
* open Terminal
* type `which python`
* I get "/opt/anaconda3/bin/python"
* you might get other flavours and locations
* it needs to be the one that gets updated when you install modules using `pip install numpy pandas matplotlib` etc.
* in RStudio go to Project Options -> Python. If you are not using an RStudio project do this in Global Options instead but remember this will become the default...
* paste the results of `which python` into the 'select' box. You can try to use the auto-find feature but it didn't find conda for me
* hit OK. RStudio will want to restart
* you may have to do this twice for it to work
If you open a .py file and try to run it, RStudio will want to install [reticulate](https://rstudio.github.io/reticulate/) which is R's interface to Python (apparently).
it's all getting a bit serpentine...
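You can also set the interpreter from R code instead of the Options dialog. reticulate honours the `RETICULATE_PYTHON` environment variable, as long as it is set before Python is first initialised in the session. In the minimal sketch below, the `/opt/anaconda3/bin/python` path is the example from the steps above, and the `Sys.which()` fallback is our addition. On a Mac, RStudio launched from the Dock may not see the same `PATH` as Terminal, so the fallback can come back empty or find a different Python:

```r
# Example interpreter path from the steps above: replace with the output of
# `which python` from your own Terminal.
py_path <- "/opt/anaconda3/bin/python"

# Fall back to whatever python R can see on its PATH ("" if none is found)
if (!file.exists(py_path)) {
  py_path <- unname(Sys.which("python"))
}

if (nzchar(py_path)) {
  # reticulate reads RETICULATE_PYTHON when it first initialises Python,
  # so set it before calling any reticulate function in the session
  Sys.setenv(RETICULATE_PYTHON = py_path)
  message("RETICULATE_PYTHON set to ", py_path)
} else {
  message("No python interpreter found; set py_path by hand.")
}
```

Afterwards, `reticulate::py_config()` reports which interpreter was actually picked up.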
# make file for drake
# see https://books.ropensci.org/drake/projects.html#usage
# Set up ----
startTime <- proc.time()
# use r_make to run the plan inside a clean R session so nothing gets contaminated
drake::r_make(source = "_drake_basicReport.R") # where we keep the drake plan etc
# we don't keep this in /R because that's where the package functions live
# we don't use "_drake.R" because we have lots of different plans
# Finish off ----
t <- proc.time() - startTime # how long did it take?
elapsed <- t[[3]]
print("Done")
print(paste0("Completed in ", round(elapsed/60,3), " minutes using ",
R.version.string, " running on ", R.version$platform))
Things you should touch:
| Item | Description |
| --- | --- |
| **[R/](R/)** | Where we store functions that get built by the package - these are then available for use in any project|
| **[Rmd/](Rmd/)** | Where we store .Rmd files and the .R scripts that call them (usually using a `drake` plan) |
| **[docs/](docs/)** | Where we put output generated by the .R/.Rmd code. This is helpful if you are using [github/lab pages](https://guides.github.com/features/pages/). Unfortunately the University of Southampton gitlab service does not currently support this. |
| **[howTo/](howTo/)** | Our collection of guides and `how-tos` |
| **[man/](man/)** | Where roxygen puts the package man(ual) files |
| **[notData/](notData/)** | Where we do _not_ store [data](/howTo/keepingData.md). R packages expect certain kinds of data in their 'data/' folders. Do not put your data in it. |
| **[.gitignore](.gitignore)** | A place to tell git what _not_ to synchronise e.g. `.csv` or [weird OS files](https://gist.github.com/adamgit/3786883)|
| **[analysis/](analysis/)** | Where we store .Rmd files and the .R scripts that call them (usually using a `drake` plan) |
| **[CONTRIBUTING.md](CONTRIBUTING.md)** | How to contribute (nicely)|
| **[DESCRIPTION](DESCRIPTION)** | But only if you use this as a template for your own repo - it is a special file for packages |
| **[env.R](env.R)** | Where we store all the parameters that might be re-used across our repo. Such as colour defaults, data paths etc. We avoid using a project/repo level .Rprofile because it can lead to [a **lot** of confusion](https://support.rstudio.com/hc/en-us/articles/360047157094-Managing-R-with-Rprofile-Renviron-Rprofile-site-Renviron-site-rsession-conf-and-repos-conf). |
| **[LICENSE](LICENSE)** | Edit to suit your needs |
| **[notData/](notData/)** | Where we do not store data. R packages expect certain kinds of data in their 'data/' folders. Do not put your data in it. |
| **[R/](R/)** | Where we store functions that get built |
| **[README.md](README.md)** | Repo readme |
| **[resources.md](resources.md)** | Our collection of guides and `how-tos` |
| **[template.md](template.md)** | This file |
| **[_drake_basicReport.R](_drake_basicReport.R)** | basic drake plan |
| **[bibliography.bib](bibliography.bib)** | a place to keep references (in bibtex style)|
| **[make_basicReport.R](make_basicReport.R)** | basic reporting script which sources the drake plan |
| **[repoAsATemplate.md](repoAsATemplate.md)** | This file |
More on [data](/howTo/keepingData.md):
> We recommend **not** putting your data in your repo at all.