diff --git a/notes/2020-05-11-UKDS_safeResearcherTraining.md b/notes/2020-05-11-UKDS_safeResearcherTraining.md
new file mode 100644
index 0000000000000000000000000000000000000000..d8bae56be3fc4fc3126bb12d39da0e13ed853654
--- /dev/null
+++ b/notes/2020-05-11-UKDS_safeResearcherTraining.md
@@ -0,0 +1,60 @@
+# UKDS Safe Researcher Training
+
+May 11th 2020
+
+Case Officer - if you want to vary the research or pursue a new idea with the same data, scope creep is OK: it can be approved, or may need a new project.
+
+## SDC in theory
+
+UKDS/ONS: Threshold = 10 (cell counts)
+
+Zeros can be a problem unless they are logical - e.g. no 16 year olds have degrees in a table of age x quals
+
+Watch for dominance e.g. in % ranks etc
+
+Avoid min/max -> unless you have 10 cases (rounding?) Do you need to publish this?
+
+Beware secondary disclosure - e.g. subtract one table's counts from another -> do you get small cell counts?
+
+## SDC & research
+
+Descriptive tables are a pain - inherently high risk
+
+Regression coeffs etc are easier
+
+No automated output checker
+
+Avoid skewed distributions, small numbers, huge outliers, dominant observations, rare events etc
+
+Avoid min/max values unless you can prove they are not identifiable. Same for the median (which is on a box plot)
+
+If asked for new details etc always add these to the outputs - never in an email
+
+Raise/notify an issue early
+
+## Output checking
+
+At end of project ask for output which is syntax files & they will send them
+
+All sessions are recorded - can be used to resolve issues etc
+
+Always include underlying counts for Figures (unweighted)
+
+Beware summary tables which include all units and then categorised tables which leave out a row/cell with 1 item in it.
+
+Maps - use heat maps -> intensity/prevalence, much safer. Not point maps.
+
+Put yourself in the checker's position! Make it easy.
+
+'Good' output:
+
+use https://www.ukdataservice.ac.uk/help/get-in-touch.aspx
+
+-> output request
+
+- publishable quality output (Word & Excel tables?)
+- doesn't support knitr etc - does SERL portal?
+
+User Guide: https://www.ukdataservice.ac.uk/media/622685/sa_user_guide.pdf
+
+
diff --git a/notes/README.md b/notes/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..d86426b96a58066a84491f0703cf0f3ba2139e94
--- /dev/null
+++ b/notes/README.md
@@ -0,0 +1,7 @@
+# SERL
+
+Repo to support work on the SERL project (https://serl.ac.uk/)
+
+## Notes
+
+From meetings, training etc. Public stuff.
\ No newline at end of file