Commit b66f3d89 authored by Jakub Dylag

Updated README

parent b0e3147f
@@ -14,7 +14,7 @@
<sup>*</sup>Correspondence to: Jakub J. Dylag, Digital Health and Biomedical Engineering, School of Electronics and Computer Science, Faculty of Engineering and Physical Sciences, University of Southampton, J.J.Dylag@soton.ac.uk
### 🖋 How to cite this work
> Dylag J. J., Chiovoloni R., Akbari A., Fraser S. D., Boniface M. J., A Tool for Automating the Curation of Medical Concepts derived from Coding Lists. May 2024. https://git.soton.ac.uk/meld/meldb/concepts-processing
> Dylag JJ, Chiovoloni R, Akbari A, Fraser SD, Boniface MJ. A Tool for Automating the Curation of Medical Concepts derived from Coding Lists. GitLab [Internet]. May 2024. Available from: https://git.soton.ac.uk/meld/meldb/concepts-processing
## 🙌 Introduction
This project generates the medical coding lists that define the cohort phenotypes used as inclusion criteria in MELD-B. The goal is to automatically prepare a code list from an approved clinical specification of inclusion criteria.
@@ -24,18 +24,14 @@ The output code list is then used by data providers to select MELD-B cohorts.
## 📃 Method
### Process
1. MELD-B conditions are defined in an Excel spreadsheet (currently CONC_summary_working.xlsx).
2.
2. Each sheet in the file includes a mapping from a MELD-B condition to source MLTC condition that is then associated with a diagnostic code list
3. Each sheet is processed to create a master code list
* READ_CODE: full 7 character read code id
* CPRD_GOLD_MEDICAL_CODE_ID: CPRD GOLD medical code id
* CPRD_AURUM_MEDICAL_CODE_ID: CPRD AURUM medical code id
* DESCRIPTION: diagnosis description
* MELDB_CONDITION: MELD-B multimorbidity label
* DATABASE: list of databases mapped
* SOURCE: list of sources mapped
* any other meta data columns (including descriptions etc)
1. Approved MELD-B concepts are defined in an Excel spreadsheet (currently CONC_summary_working.xlsx).
2. Imported Code Lists in `/codes` are verified against all NHS TRUD registered codes.
3. Mappings from Imported Code Lists to the output MELD-B Concept Code Lists are defined in JSON format within `PHEN_assign_v3.json`.
- See the "JSON Phenotype Mapping" section for more details.
4. The process is executed from the command line, either manually or from the bash script `run.sh`.
- See the "Usage" section for more details.
5. Output Concept Code Lists are saved to the `/concepts` git repository, where any changes are tracked.
6. Output Concept Code Lists can be exported into SAIL or any other Data Bank.
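The verification in step 2 can be pictured as a set-membership check. The sketch below is only an illustration and not the project's implementation: the `verify_codes` helper and the sample strings are hypothetical, and in practice the registered set would be loaded from the NHS TRUD tables.

```python
def verify_codes(imported, registered):
    """Split imported codes into those found in the registered set and those not found."""
    registered = set(registered)
    valid = [code for code in imported if code in registered]
    invalid = [code for code in imported if code not in registered]
    return valid, invalid


# Illustrative strings only; real runs would use the downloaded TRUD tables
registered_codes = {"C10..", "H33..", "G20.."}
imported_codes = ["C10..", "XXXXX", "G20.."]

valid, invalid = verify_codes(imported_codes, registered_codes)
print("verified:", valid)
print("not found:", invalid)
```

Keeping the rejected codes in a separate list, rather than silently dropping them, makes it easier to review why an imported list does not fully verify.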
### Medical Coding Standards Supported
| Code Type | Verification | Maps to |
@@ -51,11 +47,11 @@ The output code list is then used by data providers to select MELD-B cohorts.
MELD-B refers to various diagnostic code formats included in target datasets.
* Read V2
* Read codes were used widely in primary care but were replaced by SNOMED-CT from around 2018 https://isd.digital.nhs.uk/trud/user/guest/group/0/pack/9
* SAIL only supports five-character Read V2 codes
* SNOMED-CT was adopted by the NHS around 2018
* CPRD AURUM uses SNOMED codes and includes a mapping to Read codes, but no other database (CPRD GOLD, SAIL) does.
* Mappings exist from SNOMED to Read codes, some provided by CPRD and others by NHS TRUD
* ICD-10 codes are used in hospital settings and are important for the HES-linked datasets.
* ATC codes are internationally accepted for the classification of medicines and are maintained by the WHO.
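Because SAIL only supports five-character Read V2 codes, longer Read codes have to be shortened before they can be matched. The following is a minimal sketch, assuming the seven-character form is the five-character Read code followed by a two-character term suffix. The function name `to_read5` is hypothetical, and this is not necessarily how the project handles the conversion.

```python
def to_read5(code):
    """Return the first five characters of a Read V2 code."""
    code = code.strip()
    if len(code) < 5:
        raise ValueError(f"Read code too short: {code!r}")
    return code[:5]


# Illustrative codes: two seven-character forms and one five-character form
codes = ["C10E.00", "H33..11", "G20.."]
print(sorted({to_read5(c) for c in codes}))
```

Collecting the results in a set removes the duplicates that appear when several seven-character codes share the same five-character prefix.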
@@ -65,12 +61,12 @@ MELD-B refers to various diagnostic code formats included in target datasets.
MELD-B has defined a set of phenotypes for MLTC conditions that are considered burdensome. Each phenotype includes one or more diagnoses.
* Ho et al - https://cronfa.swan.ac.uk/Record/cronfa60877/Download/60877__25107__3700915cf20e418aae714e5639722449.pdf
* The diagnostic codes have been mapped for SAIL by the ThinkingGroup https://github.com/THINKINGGroup/phenotypes, and this mapping has been replicated by the RSF https://github.com/aim-rsf/phenotypes
* Azcoaga-Lorenzo, A., Akbari, A., Davies, J., Khunti, K., Kadam, U.T., Lyons, R., McCowan, C., Mercer, S.W., Nirantharakumar, K., Staniszewska, S. and Guthrie, B., 2022. Measuring multimorbidity in research: Delphi consensus study. BMJ medicine, 1(1), p.e000247.
* Hanlon et al - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8901063/#pmed.1003931.s005
* https://github.com/dmcalli2/dynamic_protocols/blob/master/defining_comorbidities_SAIL.md??
* Hanlon, P., Jani, B.D., Nicholl, B., Lewsey, J., McAllister, D.A. and Mair, F.S., 2022. Associations between multimorbidity and adverse health outcomes in UK Biobank and the SAIL Databank: A comparison of longitudinal cohort studies. PLoS Medicine, 19(3), p.e1003931.
- ClinicalCodes Project, University of Manchester - https://clinicalcodes.rss.mhs.man.ac.uk/
@@ -80,7 +76,7 @@ MELD-B has defined a set of phenotypes for MLTC conditions that are considered b
- Gilbert et al (for Frailty Secondary Care) - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5946808/
- Gilbert T, Neuburger J, Kraindler J, Keeble E, Smith P, Ariti C, Arora S, Street A, Parker S, Roberts HC, Bardsley M, Conroy S. Development and validation of a Hospital Frailty Risk Score focusing on older people in acute care settings using electronic hospital records: an observational study. Lancet. 2018 May 5;391(10132):1775-1782. doi: 10.1016/S0140-6736(18)30668-8. Epub 2018 Apr 26. PMID: 29706364; PMCID: PMC5946808.
- Abbasizanjani et al (for Long Covid) https://www.adruk.org/news-publications/publications-reports/data-insight-clinical-coding-and-capture-of-long-covid-a-cohort-study-in-wales-using-linked-health-and-demographic-data-824/
- Abbasizanjani et al (for Long Covid) https://www.adruk.org/news-publications/publications-reports/data-insight-clinical-coding-and-capture-of-long-covid-a-cohort-study-in-wales-using-linked-health-and-demographic-data-824
- Abbasizanjani H, Bedston S, Robinson L, Curds M, Akbari A. Clinical coding and capture of Long COVID: a cohort study in Wales using linked health and demographic data. ADR Wales Data Insight. August 2023. https://adrwales.org/wp-content/uploads/2023/08/Clinical-coding-and-capture-of-Long-COVID.pdf
@@ -112,29 +108,76 @@ MELD-B uses drug codes as a proxy indicator of burden. This codes are derived fr
## ⚙️ Setup
1. Delete corrupted files: `bash import.sh`
- Delete corrupted files that cannot be read with `bash import.sh`
### Code Translationg Tables
### Code Translation Tables
1. Due to the licensing of NHS TRUD coding tables, the following resources <mark>must be downloaded separately</mark>:
- [nhs_readbrowser_25.0.0_20180401000001](https://isd.digital.nhs.uk/trud/users/guest/filters/2/categories/9/items/8/releases)
- [nhs_datamigration_29.0.0_20200401000001](https://isd.digital.nhs.uk/trud/users/guest/filters/0/categories/9/items/9/releases)
- [ICD10_Edition5_XML_20160401](https://isd.digital.nhs.uk/trud/users/guest/filters/0/categories/28/items/258/releases?source=summary)
- [OPCS-4.10 Data files](https://isd.digital.nhs.uk/trud/users/guest/filters/0/categories/10/items/119/releases)
- [BNF/Snomed Mapping data.xlsx](https://www.nhsbsa.nhs.uk/prescription-data/understanding-our-data/bnf-snomed-mapping)
Due to the licensing of NHS TRUD coding tables, the following
2. Update Conversion Tables:
2. Next, prepare the conversion tables by saving them as `.parquet` tables.
- See "Mappings" section in process_codes_WP.ipynb to generate table with appropriate name
- For reversible conversions create a duplicate table with the name reversed
- For reversible conversions, create a duplicate table with the name reversed. However, be aware that this is <b>NOT ADVISED</b> and goes against NHS guidance.
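One way to see why reversing a conversion table calls for caution: when several source codes map to the same target code, the reversed table can no longer identify a single source code. The sketch below uses plain Python tuples instead of `.parquet` files, and both the codes and the `reverse_table` helper are made up for illustration.

```python
from collections import defaultdict


def reverse_table(pairs):
    """Swap (source, target) pairs and group the sources under each target."""
    reversed_map = defaultdict(list)
    for source, target in pairs:
        reversed_map[target].append(source)
    return dict(reversed_map)


# Hypothetical forward table in which two source codes share one target code
forward = [("A1", "X"), ("A2", "X"), ("B1", "Y")]

reverse = reverse_table(forward)
# Targets that cannot be mapped back to a single source code
ambiguous = {target: sources for target, sources in reverse.items() if len(sources) > 1}
print(ambiguous)
```

A check like this on a real table gives a rough measure of how much information a reversed mapping would lose.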
### JSON phenotype mapping
3. Update JSON Codes List
- Manually Edit the PHEN_asssign_v2.json
- Use "Ho generate JSON" section in process_codes_WP.ipynb to generate JSON for Ho
- Cases which require additional preprocessing
<!-- - Large Table with sub-categorical column
- Need to split table by categorical column
- Then read each categorical file individually
- USE "divide_col" action
- Table with multiple code types in single column
- Need to split column into multiple columns, so only one code type per column
- USE "split_col" action -->
Mappings from Imported Code Lists to the output MELD-B Concept Code Lists are defined in JSON format within `PHEN_assign_v3.json`.
#### Defining the Structure for Folders and Files:
```
"folder":"codes/Medication code source",
"description":"Medication Codes - downloaded 15/12/23",
"files": [
{
"file":"WP02_SAIL_WILK_matched_drug_codes_with_categories.xlsx"
}
]
```
#### Define Column Code Types
```
"columns":{
"read2_code":"READCODE",
"metadata":["DESCRIPTION"]
},
```
#### Define Concepts to be mapped to
```
"meldb_phenotypes": ["ALL_MEDICATIONS"]
```
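The three fragments above can be combined into a single folder entry and checked before being added to the mapping file. The sketch below is illustrative only: it assumes that `columns` and `meldb_phenotypes` sit inside each `file` object, which the fragments do not state explicitly, and `check_folder_entry` is a hypothetical helper rather than part of the tool.

```python
import json

# Assumed set of keys for each file object (not confirmed by the README)
REQUIRED_FILE_KEYS = {"file", "columns", "meldb_phenotypes"}


def check_folder_entry(entry):
    """Return a list of problems found in one folder entry (empty if none)."""
    problems = []
    for key in ("folder", "files"):
        if key not in entry:
            problems.append(f"missing key: {key}")
    for i, file_obj in enumerate(entry.get("files", [])):
        for key in sorted(REQUIRED_FILE_KEYS - set(file_obj)):
            problems.append(f"files[{i}] missing key: {key}")
    return problems


entry = {
    "folder": "codes/Medication code source",
    "description": "Medication Codes - downloaded 15/12/23",
    "files": [
        {
            "file": "WP02_SAIL_WILK_matched_drug_codes_with_categories.xlsx",
            "columns": {"read2_code": "READCODE", "metadata": ["DESCRIPTION"]},
            "meldb_phenotypes": ["ALL_MEDICATIONS"],
        }
    ],
}

print(check_folder_entry(entry))
print(json.dumps(entry, indent=2))
```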
#### Actions: Additional preprocessing (if required):
- Additional processing is required when you wish to subdivide a code list table, or when a single column contains multiple code types. In these cases, add an `actions` object inside the `file` object.
- Table with a sub-categorical column:
- In order to sub-divide a table by a categorical column use the "divide_col" action
- e.g. ``` "actions":{"divide_col": "MMCode"}```
- Table with multiple code types in single column:
- Need to split column into multiple columns, so only one code type per column.
- The "split_col" attribute is the categorical column indicating the code type in that row. The <b>category names should replace column</b> names in the `columns` properties.
- The "codes_col" attribute is the code column with multiple code types in a single column
- e.g.
```
"actions":{
"split_col":"coding_system",
"codes_col":"code"
},
"columns":{
"read2_code":"Read codes v2",
"med_code":"Med codes",
"icd10_code":"ICD10 codes",
"metadata":["description"]
},
```
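The effect of the two actions can be sketched on a small in-memory table. This is an interpretation of the descriptions above and not the tool's own code: `apply_divide_col` and `apply_split_col` are hypothetical names, and the rows are invented sample data.

```python
from collections import defaultdict


def apply_divide_col(rows, column):
    """Partition rows into separate tables, one per value of `column`."""
    tables = defaultdict(list)
    for row in rows:
        tables[row[column]].append(row)
    return dict(tables)


def apply_split_col(rows, split_col, codes_col):
    """Move each code into a column named after its code-type category."""
    result = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k not in (split_col, codes_col)}
        new_row[row[split_col]] = row[codes_col]
        result.append(new_row)
    return result


# Invented sample rows with one code column holding several code types
rows = [
    {"coding_system": "Read codes v2", "code": "C10..", "description": "example A"},
    {"coding_system": "ICD10 codes", "code": "E11", "description": "example B"},
]

print(apply_split_col(rows, "coding_system", "code"))
print(list(apply_divide_col(rows, "coding_system")))
```

After the split, each row carries its code under the category name, which is why the category names take the place of column names in the `columns` property.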
*<b>Large Code Lists</b> with numerous phenotypes (e.g. Ho et al) require a lot of JSON to be generated. See the "Ho generate JSON" section in process_codes_WP.ipynb for example code to generate it.
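For orientation, generating the JSON for many phenotypes might look roughly like the sketch below. It is not the notebook's code: the `build_files` helper, the file names and the phenotype labels are hypothetical, and the nesting of `columns` and `meldb_phenotypes` inside each file object is an assumption.

```python
import json


def build_files(phenotype_files, code_column, code_type="read2_code"):
    """Build one `files` entry per (file name, phenotype) pair."""
    return [
        {
            "file": file_name,
            "columns": {code_type: code_column},
            "meldb_phenotypes": [phenotype],
        }
        for file_name, phenotype in phenotype_files
    ]


# Hypothetical file names and phenotype labels
pairs = [("asthma.csv", "ASTHMA"), ("copd.csv", "COPD")]
entry = {"folder": "codes/example", "files": build_files(pairs, "READCODE")}
print(json.dumps(entry, indent=2))
```

Generating the entries from a list keeps the structure consistent across phenotypes and avoids hand-editing errors in a long mapping file.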
## ⚡ Usage