Commit b66f3d89 authored by Jakub Dylag
Updated README

parent b0e3147f
@@ -14,7 +14,7 @@
<sup>*</sup>Correspondence to: Jakub J. Dylag, Digital Health and Biomedical Engineering, School of Electronics and Computer Science, Faculty of Engineering and Physical Sciences, University of Southampton, J.J.Dylag@soton.ac.uk
### 🖋 How to cite this work
> Dylag JJ, Chiovoloni R, Akbari A, Fraser SD, Boniface MJ. A Tool for Automating the Curation of Medical Concepts derived from Coding Lists. GitLab [Internet]. May 2024. Available from: https://git.soton.ac.uk/meld/meldb/concepts-processing
## 🙌 Introduction
This project generates the medical coding lists that define the cohort phenotypes used as inclusion criteria in MELD-B. The goal is to automatically prepare a code list from an approved clinical specification of inclusion criteria.
@@ -24,18 +24,14 @@ The output code list is then used by data providers to select MELD-B cohorts.
## 📃 Method
### Process
1. Approved MELD-B concepts are defined in an Excel spreadsheet (currently CONC_summary_working.xlsx).
2. Imported code lists in `/codes` are verified against all NHS TRUD registered codes.
3. Mappings from imported code lists to the output MELD-B concept code lists are defined in JSON format within `PHEN_assign_v3.json`.
    - See the "JSON phenotype mapping" section for more details.
4. The process is executed from the command line, either manually or from the bash script `run.sh`.
    - See the "Usage" section for more details.
5. Output concept code lists are saved to the `/concepts` git repository and any changes are tracked.
6. Output concept code lists can be exported into SAIL or any other data bank.
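
Step 2 of the process (verifying imported codes against registered codes) can be pictured as a set-membership check. The sketch below is illustrative only: the function name `verify_codes` and the sample codes are hypothetical, and the real tool checks against the licensed NHS TRUD tables rather than an in-memory set.

```python
def verify_codes(imported, registered):
    """Split imported codes into those found in the reference set and those not."""
    registered = set(registered)
    valid = [code for code in imported if code in registered]
    invalid = [code for code in imported if code not in registered]
    return valid, invalid


# Hypothetical sample data; not taken from the TRUD tables.
registered_codes = {"C10..", "H33..", "G20.."}
valid, invalid = verify_codes(["C10..", "ZZZZZ", "H33.."], registered_codes)
print("verified:", valid)
print("unrecognised:", invalid)
```

Keeping the unrecognised codes in a separate list, rather than silently dropping them, makes it easier to review why an imported code list does not fully match the reference tables.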
### Medical Coding Standards Supported
| Code Type | Verification | Maps to |
@@ -51,11 +47,11 @@ The output code list is then used by data providers to select MELD-B cohorts.
MELD-B refers to various diagnostic code formats included in target datasets.
* Read V2
    * Read codes were used widely in primary care but were replaced by SNOMED-CT from around 2018 https://isd.digital.nhs.uk/trud/user/guest/group/0/pack/9
    * SAIL only supports five-character Read V2 codes
* SNOMED-CT was adopted by the NHS around 2018
    * CPRD AURUM uses SNOMED codes and includes a mapping to Read codes, but no other database (CPRD GOLD, SAIL) does.
    * Mappings exist from SNOMED to Read codes, some provided by CPRD and others by NHS TRUD
* ICD-10 codes are used in hospital settings and are important for the HES-linked datasets.
* ATC codes are internationally accepted for the classification of medicines and are maintained by the WHO.
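
To make the differences between these formats concrete, the sketch below checks only the shape of a code string. The regular expressions are simplified assumptions (for example, five-character dot-padded Read V2 codes and full seven-character ATC codes) and are not the tool's verification logic, which relies on the reference tables.

```python
import re

# Simplified, assumed patterns: they test the shape of a code, not whether it exists.
PATTERNS = {
    "read2": re.compile(r"[0-9A-Za-z][0-9A-Za-z.]{4}"),        # five characters, dot-padded
    "icd10": re.compile(r"[A-Z][0-9]{2}(\.?[0-9A-Z]{1,2})?"),  # e.g. E11 or E11.9
    "atc": re.compile(r"[A-Z][0-9]{2}[A-Z]{2}[0-9]{2}"),       # full seven-character level
}


def looks_like(code_type, code):
    """Return True if `code` has the expected shape for `code_type`."""
    return PATTERNS[code_type].fullmatch(code) is not None


for code_type, code in [("read2", "C10.."), ("icd10", "E11.9"), ("atc", "A10BA02")]:
    print(code_type, code, looks_like(code_type, code))
```

A shape check like this can catch obviously malformed entries early, but it cannot replace verification against the registered code tables.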
@@ -65,12 +61,12 @@ MELD-B refers to various diagnostic code formats included in target datasets.
MELD-B has defined a set of phenotypes for MLTC conditions that are considered burdensome. Each phenotype includes one or more diagnoses.
* Ho et al - https://cronfa.swan.ac.uk/Record/cronfa60877/Download/60877__25107__3700915cf20e418aae714e5639722449.pdf
    * The diagnostic codes have been mapped to SAIL by the THINKINGGroup https://github.com/THINKINGGroup/phenotypes, which has been replicated by the RSF https://github.com/aim-rsf/phenotypes
    * Azcoaga-Lorenzo, A., Akbari, A., Davies, J., Khunti, K., Kadam, U.T., Lyons, R., McCowan, C., Mercer, S.W., Nirantharakumar, K., Staniszewska, S. and Guthrie, B., 2022. Measuring multimorbidity in research: Delphi consensus study. BMJ medicine, 1(1), p.e000247.
* Hanlon et al - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8901063/#pmed.1003931.s005
    * https://github.com/dmcalli2/dynamic_protocols/blob/master/defining_comorbidities_SAIL.md??
    * Hanlon, P., Jani, B.D., Nicholl, B., Lewsey, J., McAllister, D.A. and Mair, F.S., 2022. Associations between multimorbidity and adverse health outcomes in UK Biobank and the SAIL Databank: A comparison of longitudinal cohort studies. PLoS Medicine, 19(3), p.e1003931.
- ClinicalCodes Project, University of Manchester - https://clinicalcodes.rss.mhs.man.ac.uk/
@@ -80,7 +76,7 @@ MELD-B has defined a set of phenotypes for MLTC conditions that are considered b
- Gilbert et al (for Frailty Secondary Care) - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5946808/
    - Gilbert T, Neuburger J, Kraindler J, Keeble E, Smith P, Ariti C, Arora S, Street A, Parker S, Roberts HC, Bardsley M, Conroy S. Development and validation of a Hospital Frailty Risk Score focusing on older people in acute care settings using electronic hospital records: an observational study. Lancet. 2018 May 5;391(10132):1775-1782. doi: 10.1016/S0140-6736(18)30668-8. Epub 2018 Apr 26. PMID: 29706364; PMCID: PMC5946808.
- Abbasizanjani et al (for Long Covid) https://www.adruk.org/news-publications/publications-reports/data-insight-clinical-coding-and-capture-of-long-covid-a-cohort-study-in-wales-using-linked-health-and-demographic-data-824
    - Abbasizanjani H, Bedston S, Robinson L, Curds M, Akbari A. Clinical coding and capture of Long COVID: a cohort study in Wales using linked health and demographic data. ADR Wales Data Insight. August 2023. https://adrwales.org/wp-content/uploads/2023/08/Clinical-coding-and-capture-of-Long-COVID.pdf
@@ -112,29 +108,76 @@ MELD-B uses drug codes as a proxy indicator of burden. This codes are derived fr
## ⚙️ Setup
- Delete corrupted files that cannot be read with `bash import.sh`
### Code Translation Tables
1. Due to the licencing of NHS TRUD coding tables, the following resources <mark>must be downloaded separately</mark>:
- [nhs_readbrowser_25.0.0_20180401000001](https://isd.digital.nhs.uk/trud/users/guest/filters/2/categories/9/items/8/releases)
- [nhs_datamigration_29.0.0_20200401000001](https://isd.digital.nhs.uk/trud/users/guest/filters/0/categories/9/items/9/releases)
- [ICD10_Edition5_XML_20160401](https://isd.digital.nhs.uk/trud/users/guest/filters/0/categories/28/items/258/releases?source=summary)
- [OPCS-4.10 Data files](https://isd.digital.nhs.uk/trud/users/guest/filters/0/categories/10/items/119/releases)
- [BNF/Snomed Mapping data.xlsx](https://www.nhsbsa.nhs.uk/prescription-data/understanding-our-data/bnf-snomed-mapping)
2. Next, prepare the conversion tables by saving them as `.parquet` tables.
    - See the "Mappings" section in process_codes_WP.ipynb to generate a table with the appropriate name
    - For reversible conversions, create a duplicate table with the name reversed. However, be aware this is <b>NOT ADVISED</b> and goes against NHS guidance.
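
One likely reason that reversing a conversion table is discouraged (an assumption here, not something stated in this README) is that forward mappings can be many-to-one, so the reversed table becomes ambiguous. The sketch below uses placeholder codes and a hypothetical helper, `reverse_mapping`, to show the effect.

```python
from collections import defaultdict


def reverse_mapping(forward_pairs):
    """Invert (source, target) pairs into a target -> [sources] dictionary."""
    reverse = defaultdict(list)
    for source, target in forward_pairs:
        reverse[target].append(source)
    return dict(reverse)


# Placeholder codes: two source codes map to the same target code "X".
forward = [("A1", "X"), ("A2", "X"), ("B1", "Y")]
reverse = reverse_mapping(forward)

# Targets with more than one source cannot be converted back unambiguously.
ambiguous = {target: sources for target, sources in reverse.items() if len(sources) > 1}
print(reverse)
print(ambiguous)
```

Any target that appears in `ambiguous` would need a manual decision about which source code to prefer, which is why a reversed table should be treated with caution.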
### JSON phenotype mapping
Mappings from imported code lists to the output MELD-B concept code lists are defined in JSON format within `PHEN_assign_v3.json`.

#### Defining the Structure for Folders and Files:
```
"folder":"codes/Medication code source",
"description":"Medication Codes - downloaded 15/12/23",
"files": [
    {
        "file":"WP02_SAIL_WILK_matched_drug_codes_with_categories.xlsx"
    }
]
```
#### Define Column Code Types
```
"columns":{
    "read2_code":"READCODE",
    "metadata":["DESCRIPTION"]
},
```
#### Define Concepts to be mapped to
```
"meldb_phenotypes": ["ALL_MEDICATIONS"]
```
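
The fragments in this section can be combined into a single folder entry. The sketch below assumes that `columns` and `meldb_phenotypes` sit inside each file object alongside `file`; that nesting, and the helper `missing_keys`, are assumptions for illustration rather than a description of the tool's schema.

```python
import json

# Assumed nesting: "columns" and "meldb_phenotypes" live inside each file object.
entry = {
    "folder": "codes/Medication code source",
    "description": "Medication Codes - downloaded 15/12/23",
    "files": [
        {
            "file": "WP02_SAIL_WILK_matched_drug_codes_with_categories.xlsx",
            "columns": {"read2_code": "READCODE", "metadata": ["DESCRIPTION"]},
            "meldb_phenotypes": ["ALL_MEDICATIONS"],
        }
    ],
}

REQUIRED_FILE_KEYS = ("file", "columns", "meldb_phenotypes")


def missing_keys(entry):
    """List the expected keys that are absent from a folder entry."""
    missing = [key for key in ("folder", "files") if key not in entry]
    for i, file_obj in enumerate(entry.get("files", [])):
        missing += [f"files[{i}].{key}" for key in REQUIRED_FILE_KEYS if key not in file_obj]
    return missing


print(json.dumps(entry, indent=4))
print("missing keys:", missing_keys(entry))
```

Running a check like this before processing can surface incomplete entries early, instead of failing partway through a run.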
#### Actions: Additional preprocessing (if required):
- Additional processing is required when you wish to sub-divide a code list table, or when a single column features multiple code types. In these cases, add an `actions` object inside the `file` object.
- Table with a sub-categorical column:
    - To sub-divide a table by a categorical column, use the "divide_col" action
    - e.g. ``` "actions":{"divide_col": "MMCode"}```
- Table with multiple code types in a single column:
    - The column needs to be split into multiple columns, so that there is only one code type per column.
    - The "split_col" attribute is the categorical column indicating the code type in that row. The <b>category names should replace column</b> names in the `columns` properties.
    - The "codes_col" attribute is the code column that contains multiple code types
    - e.g.
```
"actions":{
    "split_col":"coding_system",
    "codes_col":"code"
},
"columns":{
    "read2_code":"Read codes v2",
    "med_code":"Med codes",
    "icd10_code":"ICD10 codes",
    "metadata":["description"]
},
```
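
The two actions described above can be sketched on plain lists of dictionaries. This is a minimal illustration of the idea under stated assumptions, not the tool's implementation: the helper names `apply_divide_col` and `apply_split_col` and the sample rows are hypothetical.

```python
from collections import defaultdict


def apply_divide_col(rows, col):
    """Group rows into separate tables, one per distinct value of `col`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row)
    return dict(groups)


def apply_split_col(rows, split_col, codes_col):
    """Move each code into a column named after the row's code type."""
    result = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k not in (split_col, codes_col)}
        new_row[row[split_col]] = row[codes_col]
        result.append(new_row)
    return result


# Hypothetical rows with mixed code types in a single "code" column.
rows = [
    {"coding_system": "Read codes v2", "code": "C10..", "description": "Diabetes mellitus"},
    {"coding_system": "ICD10 codes", "code": "E11", "description": "Type 2 diabetes mellitus"},
]
wide = apply_split_col(rows, "coding_system", "code")
tables = apply_divide_col(rows, "coding_system")
print(wide)
print(sorted(tables))
```

After the split, each row carries its code under a key such as "Read codes v2", which is why the category names take the place of column names in the `columns` property.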
*<b>Large code lists</b> with numerous phenotypes (e.g. Ho et al) require a lot of JSON to be generated. See the "Ho generate JSON" section in process_codes_WP.ipynb for example code to generate it.
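
As a rough idea of what generating the JSON for many phenotypes might look like, the sketch below builds one file object per phenotype from a dictionary. The phenotype names, file names and the helper `build_file_entries` are hypothetical, and the notebook's own code may differ.

```python
import json


def build_file_entries(phenotype_files, columns):
    """Create one file object per (phenotype, file name) pair."""
    return [
        {"file": file_name, "columns": columns, "meldb_phenotypes": [phenotype]}
        for phenotype, file_name in sorted(phenotype_files.items())
    ]


# Hypothetical phenotype names and file names, for illustration only.
phenotype_files = {
    "EXAMPLE_PHENOTYPE_A": "example_a.csv",
    "EXAMPLE_PHENOTYPE_B": "example_b.csv",
}
columns = {"read2_code": "READCODE", "metadata": ["DESCRIPTION"]}

entries = build_file_entries(phenotype_files, columns)
print(json.dumps(entries, indent=4))
```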
## ⚡ Usage