Commit 3ad40be1 authored by Jakub Dylag

Update README

parent c6eb56a7
@@ -14,7 +14,7 @@
 <sup>*</sup>Correspondence to: Jakub J. Dylag, Digital Health and Biomedical Engineering, School of Electronics and Computer Science, Faculty of Engineering and Physical Sciences, University of Southampton, J.J.Dylag@soton.ac.uk
 ### 🖋 How to cite this work
-> Dylag JJ, Chiovoloni R, Akbari A, Fraser SD, Boniface MJ. A Tool for Automating the Curation of Medical Concepts derived from Coding Lists. GitLab [Internet]. May 2024. Available from: https://git.soton.ac.uk/meld/meldb/concepts-processing
+> Dylag JJ, Chiovoloni R, Akbari A, Fraser SD, Boniface MJ. A Tool for Automating the Curation of Medical Concepts derived from Coding Lists. GitLab [Internet]. May 2024. Available from: https://git.soton.ac.uk/meldb/concepts-processing
 ## 🙌 Introduction
 This project generates the medical coding lists that define the cohort phenotypes used as inclusion criteria in MELD-B. The goal is to automatically prepare a code list from an approved clinical specification of inclusion criteria.
@@ -24,8 +24,8 @@ The output code list is then used by data providers to select MELD-B cohorts.
 ## 📃 Method
 ### Process
-1. Approved MELD-B concepts are defined in an Excel spreadsheet (currently CONC_summary_working.xlsx).
-2. Imported code lists in `/codes` are verified against all NHS TRUD registered codes.
+1. Approved MELD-B concepts are defined in a CSV file (currently PHEN_summary_working.csv).
+2. Imported code lists in `/src` are verified against all NHS TRUD registered codes.
 3. Mappings from the imported code lists to the output MELD-B concept code lists are defined in JSON format within `PHEN_assign_v3.json`.
 - See the "JSON Phenotype Mapping" section for more details.
 4. The process is executed from the command line, either manually or via the bash script `run.sh`.
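Step 2 amounts to checking whether each imported code is in the reference set. The following is a minimal illustration, not this project's implementation. It assumes the imported list and the NHS TRUD codes have already been loaded as plain strings. The function name `verify_codes` and the sample codes are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def verify_codes(imported: Iterable[str], registered: Set[str]) -> Tuple[List[str], List[str]]:
    """Split imported codes into (valid, invalid) by lookup in a reference set.

    Hypothetical helper: surrounding whitespace is stripped and blank
    entries are skipped before each lookup.
    """
    valid: List[str] = []
    invalid: List[str] = []
    for raw in imported:
        code = raw.strip()
        if not code:
            continue  # ignore empty cells from the source file
        if code in registered:
            valid.append(code)
        else:
            invalid.append(code)
    return valid, invalid


if __name__ == "__main__":
    # Illustrative values only; a real run would load the TRUD release files.
    registered = {"C10E.", "C10F."}
    valid, invalid = verify_codes([" C10E. ", "XXXX", ""], registered)
    print(valid, invalid)  # ['C10E.'] ['XXXX']
```

Holding the reference codes in a `set` makes each lookup constant time on average. This matters when a terminology release is large.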
@@ -55,57 +55,6 @@ MELD-B refers to various diagnostic code formats included in target datasets.
 * ICD-10 codes are used in hospital settings and are important for the HES-linked datasets.
 * ATC codes are internationally accepted for the classification of medicines and are maintained by the WHO.
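The two formats have recognisable shapes. The regular expressions below are simplified assumptions for illustration, not the validation rules this tool uses. They assume ICD-10 codes are a letter and two digits, optionally followed by a dotted extension. They assume ATC codes follow the WHO hierarchy of letter, two digits, letter, letter, two digits, where a shorter complete prefix denotes a higher level.

```python
import re

# Simplified shape checks (assumptions, not the project's rules).
# ICD-10: letter + two digits, optionally followed by a dot and up to four
# more characters; the dot is optional because some extracts drop it.
ICD10_PATTERN = re.compile(r"[A-Z][0-9]{2}(?:\.?[0-9A-Z]{1,4})?")

# ATC: levels 1-5 are letter, 2 digits, letter, letter, 2 digits;
# any complete prefix (e.g. "N02", "N02BE") is a valid higher level.
ATC_PATTERN = re.compile(r"[A-Z](?:[0-9]{2}(?:[A-Z](?:[A-Z](?:[0-9]{2})?)?)?)?")


def matches(pattern: re.Pattern, code: str) -> bool:
    """Return True if the whole (trimmed, upper-cased) code fits the pattern."""
    return pattern.fullmatch(code.strip().upper()) is not None


if __name__ == "__main__":
    for code in ["E11.9", "E119", "N02BE01", "N0"]:
        print(code, matches(ICD10_PATTERN, code), matches(ATC_PATTERN, code))
```

Shape checks alone cannot tell which terminology a code belongs to. For example, `N02BE01` fits both patterns. A lookup against the registered code sets is still required.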
-### Medical Code Sources
-#### MLTC Coding Sources
-MELD-B has defined a set of phenotypes for MLTC conditions that are considered burdensome. Each phenotype includes one or more diagnoses.
-* Ho et al - https://cronfa.swan.ac.uk/Record/cronfa60877/Download/60877__25107__3700915cf20e418aae714e5639722449.pdf
-* The diagnostic codes have been mapped for SAIL by the ThinkingGroup (https://github.com/THINKINGGroup/phenotypes), and this mapping has been replicated by the RSF (https://github.com/aim-rsf/phenotypes).
-* Azcoaga-Lorenzo, A., Akbari, A., Davies, J., Khunti, K., Kadam, U.T., Lyons, R., McCowan, C., Mercer, S.W., Nirantharakumar, K., Staniszewska, S. and Guthrie, B., 2022. Measuring multimorbidity in research: Delphi consensus study. BMJ Medicine, 1(1), p.e000247.
-* Hanlon et al - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8901063/#pmed.1003931.s005
-* https://github.com/dmcalli2/dynamic_protocols/blob/master/defining_comorbidities_SAIL.md??
-* Hanlon, P., Jani, B.D., Nicholl, B., Lewsey, J., McAllister, D.A. and Mair, F.S., 2022. Associations between multimorbidity and adverse health outcomes in UK Biobank and the SAIL Databank: A comparison of longitudinal cohort studies. PLoS Medicine, 19(3), p.e1003931.
-- ClinicalCodes Project, University of Manchester - https://clinicalcodes.rss.mhs.man.ac.uk/
-- Hollinghurst et al (for Frailty Primary Care) - https://academic.oup.com/ageing/article/48/6/922/5576114
-- Joe Hollinghurst, Richard Fry, Ashley Akbari, Andy Clegg, Ronan A Lyons, Alan Watkins, Sarah E Rodgers, External validation of the electronic Frailty Index using the population of Wales within the Secure Anonymised Information Linkage Databank, Age and Ageing, Volume 48, Issue 6, November 2019, Pages 922–926, https://doi.org/10.1093/ageing/afz110
-- Gilbert et al (for Frailty Secondary Care) - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5946808/
-- Gilbert T, Neuburger J, Kraindler J, Keeble E, Smith P, Ariti C, Arora S, Street A, Parker S, Roberts HC, Bardsley M, Conroy S. Development and validation of a Hospital Frailty Risk Score focusing on older people in acute care settings using electronic hospital records: an observational study. Lancet. 2018 May 5;391(10132):1775-1782. doi: 10.1016/S0140-6736(18)30668-8. Epub 2018 Apr 26. PMID: 29706364; PMCID: PMC5946808.
-- Abbasizanjani et al (for Long Covid) https://www.adruk.org/news-publications/publications-reports/data-insight-clinical-coding-and-capture-of-long-covid-a-cohort-study-in-wales-using-linked-health-and-demographic-data-824
-- Abbasizanjani H, Bedston S, Robinson L, Curds M, Akbari A. Clinical coding and capture of Long COVID: a cohort study in Wales using linked health and demographic data. ADR Wales Data Insight. August 2023. https://adrwales.org/wp-content/uploads/2023/08/Clinical-coding-and-capture-of-Long-COVID.pdf
-* Morton C., Walker A. (for Sickle Cell Disease) https://www.opencodelists.org/codelist/opensafely/sickle-cell-disease/2020-04-14/
-* Department of Public Health and Primary Care, University of Cambridge (for Irritable Bowel Syndrome) https://www.phpc.cam.ac.uk/pcu/research/research-groups/crmh/cprd_cam/codelists/v11/
-<!-- MELD-B has extended the set of conditions to include additional conditions that are considered burdensome, but further work is needed to determine whether these should be included
-* LongCovid: https://bjgp.org/content/71/712/e806
-MELD-B has identified other MLTC lists, but it is not clear how these are processed
-* Cambridge: https://www.phpc.cam.ac.uk/pcu/research/research-groups/crmh/cprd_cam/codelists/v11/
-* CALIBER: https://www.thelancet.com/article/S2589-7500(19)30012-3/fulltext, https://www.caliberresearch.org/portal/codelists
-* Head: https://www.thelancet.com/journals/lanhl/article/PIIS2666-7568(21)00146-X/fulltext
-* Barnett: https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(12)60240-2/fulltext#sec1 -->
-#### Drug Code Sources
-MELD-B uses drug codes as a proxy indicator of burden. These codes are derived from work by Frances Mair.
-* Hanlon, P. https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1003931
-* Hanlon, P., Jani, B.D., Nicholl, B., Lewsey, J., McAllister, D.A. and Mair, F.S., 2022. Associations between multimorbidity and adverse health outcomes in UK Biobank and the SAIL Databank: A comparison of longitudinal cohort studies. PLoS Medicine, 19(3), p.e1003931.
-- Wilkinson et al http://dx.doi.org/10.1136/jech-2021-217090
-- Wilkinson T, Schnier C, Bush K, et al. Drug prescriptions and dementia incidence: a medication-wide association study of 17000 dementia cases among half a million participants. J Epidemiol Community Health 2022;76:223-229.
 ## ⚙️ Setup
 - Delete corrupted files that cannot be read with `bash import.sh`
@@ -187,11 +136,10 @@ Script to preprocess code lists and map them to a given concept/phenotype
 `bash ./run.sh`
 ### Execution (Shell Command)
-usage: `main.py [-h] [-r2] [-r3] [-i] [-s] [-o] [-a] [-m] [-c] [--no-translate] [--no-verify] map summary`
+usage: `python main.py [-h] [-r2] [-r3] [-i] [-s] [-o] [-a] [-m] [-c] [--no-translate] [--no-verify] [--output] [--error-log] mapping_file`
 positional arguments:
-- `map` Concept/Phenotype Assignment File (json)
-- `summary` Summary working Excel document
+- `mapping_file` Concept/Phenotype Assignment File (json)
 optional arguments:
 - `-r2`, `--read2-code` Read V2 Codes Column name in Source File
@@ -204,6 +152,10 @@ optional arguments:
 - `-c`, `--cprd-code` CPRD Product Codes Column name in Source File
 - `--no-translate` Do not translate code types
 - `--no-verify` Do not verify codes are correct
+- `--output` Filepath to save output to
+- `--error-log` Filepath to save error log to
+> **_EXAMPLE:_** `python main.py PHEN_assign_v3.json -r2 --output output/MELD_concepts_readv2.csv --error-log output/MELD_errors.csv`
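The documented command line maps naturally onto Python's `argparse`. What follows is a hypothetical reconstruction for illustration, not the contents of `main.py`. It includes only the options whose names appear above. It omits `-r3`, `-i`, `-s`, `-o`, `-a` and `-m` because their long names and meanings are not shown here. It treats the code-type switches as boolean flags. That is an assumption based on the example, which passes `-r2` without a value.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical sketch of part of the main.py command line."""
    parser = argparse.ArgumentParser(prog="main.py")
    # Positional: the Concept/Phenotype Assignment File (JSON).
    parser.add_argument("mapping_file", help="Concept/Phenotype Assignment File (json)")
    # Code-type switches, assumed to be boolean flags (see lead-in).
    parser.add_argument("-r2", "--read2-code", action="store_true", help="Read V2 codes")
    parser.add_argument("-c", "--cprd-code", action="store_true", help="CPRD Product codes")
    parser.add_argument("--no-translate", action="store_true", help="Do not translate code types")
    parser.add_argument("--no-verify", action="store_true", help="Do not verify codes are correct")
    # Output locations; None when the option is not given.
    parser.add_argument("--output", help="Filepath to save output to")
    parser.add_argument("--error-log", help="Filepath to save error log to")
    return parser


if __name__ == "__main__":
    # Mirrors the README example invocation.
    args = build_parser().parse_args([
        "PHEN_assign_v3.json", "-r2",
        "--output", "output/MELD_concepts_readv2.csv",
        "--error-log", "output/MELD_errors.csv",
    ])
    print(args.mapping_file, args.read2_code, args.output, args.error_log)
```

`argparse` derives attribute names from the long option strings, so `--read2-code` becomes `args.read2_code` and `--error-log` becomes `args.error_log`.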
 ## ❤️ Contributing