Commit 3876e7cd authored by Jakub Dylag

Re-write README

parent 5b98129c
<sup>1</sup> Digital Health and Biomedical Engineering, School of Electronics and Computer Science, Faculty of Engineering and Physical Sciences, University of Southampton<br>
<sup>2</sup> School of Primary Care Population Sciences and Medical Education, University of Southampton <br>
<sup>3</sup> Population Data Science, Swansea University Medical School, Faculty of Medicine, Health & Life Science, Swansea University <br>
<br>
*Correspondence to: Jakub J. Dylag, Digital Health and Biomedical Engineering, School of Electronics and Computer Science, Faculty of Engineering and Physical Sciences, University of Southampton, J.J.Dylag@soton.ac.uk*

### Citation
> Dylag JJ, Chiovoloni R, Akbari A, Fraser SD, Boniface MJ. A Tool for Automating the Curation of Medical Concepts derived from Coding Lists. GitLab [Internet]. May 2024. Available from: https://git.soton.ac.uk/meldb/concepts-processing

## Introduction
This tool automates the verification, translation and organisation of medical coding lists that define cohort phenotypes for inclusion criteria. By converting externally sourced clinical inclusion criteria into actionable code lists, it supports consistent and efficient curation of cohort definitions. Data providers (e.g. SAIL) can then use these code lists to construct study cohorts.

## Methods
### Workflow Overview
1. Approved MELD-B concepts are outlined in a CSV spreadsheet (e.g. `PHEN_summary_working.csv`).
2. Imported code lists in the `/src` directory are validated against NHS TRUD-registered codes.
3. Mappings from imported code lists to the output MELD-B concept code lists are defined in the `PHEN_assign_v3.json` file.
   - See the "Configuration" section for more details.
4. The process is executed from the command line. Refer to the "Usage" section for execution instructions.
5. Output concept code lists are saved to the `/concepts` Git repository, where all changes are tracked.
6. The code lists can be exported to SAIL or any other data bank.
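Step 2 (validating imported codes against NHS TRUD-registered codes) can be pictured with the minimal sketch below. It assumes the registered codes have already been loaded into a Python set; `verify_codes` and the sample codes are hypothetical and are not the tool's internal API.

```python
from typing import Iterable


def verify_codes(codes: Iterable[str], registered: set[str]) -> tuple[list[str], list[str]]:
    """Split imported codes into those present in the reference set and those absent.

    Hypothetical helper: `registered` stands in for the NHS TRUD-registered codes.
    """
    valid: list[str] = []
    invalid: list[str] = []
    for raw in codes:
        code = raw.strip()  # tolerate stray whitespace from CSV exports
        (valid if code in registered else invalid).append(code)
    return valid, invalid


if __name__ == "__main__":
    registered = {"C10..", "G30..", "H33.."}  # illustrative sample only
    ok, bad = verify_codes(["C10..", " G30..", "ZZZ99"], registered)
    print("valid:", ok)
    print("invalid:", bad)
```

Keeping the invalid codes, rather than silently dropping them, makes it possible to report them for manual review.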
### Supported Medical Coding Standards
The tool supports verification of, and translation between, the following diagnostic coding formats:

| Medical Code  | Verification | Translation to                    |
|---------------|--------------|-----------------------------------|
| Readv2        | NHS TRUD     | Readv3, SNOMED, ICD10, OPCS4, ATC |
| Readv3 (CTV3) | NHS TRUD     | Readv3, SNOMED, ICD10, OPCS4      |
| ...           | ...          | ...                               |
| OPCS4         | NHS TRUD     |                                   |
| ATC           | None         |                                   |

#### Notes on Code Systems:
- **Read V2:** Widely used in primary care until replaced by SNOMED-CT from around 2018; still supported by SAIL, which only accepts five-character codes.
- **SNOMED-CT:** Adopted by the NHS around 2018. Mappings from SNOMED-CT to Read codes exist; some are provided by CPRD and others by NHS TRUD.
- **ICD-10:** Used in hospital settings and important for HES-linked datasets.
- **ATC Codes:** Internationally accepted for the classification of medicines and maintained by the WHO.
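Because SAIL only accepts five-character Read V2 codes, code lists may need normalising before export. The sketch below is an illustration under stated assumptions, not the tool's behaviour: it assumes longer entries are a five-character Read code followed by a two-digit term code (e.g. `C10..00`), and that valid codes contain only letters, digits and `.` padding. `to_five_char_read2` is a hypothetical name.

```python
import re
from typing import Optional

# Assumed format: five characters (letters, digits or "." padding),
# optionally followed by a two-digit term code, e.g. "C10..00".
READ2_PATTERN = re.compile(r"^([A-Za-z0-9.]{5})(\d{2})?$")


def to_five_char_read2(code: str) -> Optional[str]:
    """Return the five-character form of a Read V2 code, or None if it does not match."""
    match = READ2_PATTERN.match(code.strip())
    # Read codes are case-sensitive, so the matched characters are kept as-is.
    return match.group(1) if match else None


if __name__ == "__main__":
    for raw in ["C10..00", "G30..", "C10"]:
        print(repr(raw), "->", to_five_char_read2(raw))
```

Returning `None` rather than raising lets a caller collect non-conforming codes for an error log.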

## Installation

1. **Sign Up:** NHS TRUD resources are licensed, so you must first register at [NHS TRUD](https://isd.digital.nhs.uk/trud/user/guest/group/0/account/form) and accept the following licenses:
   - [NHS Read Browser](https://isd.digital.nhs.uk/trud/users/guest/filters/2/categories/9/items/8/releases) (`nhs_readbrowser_25.0.0_20180401000001`)
   - [NHS Data Migration](https://isd.digital.nhs.uk/trud/users/guest/filters/0/categories/9/items/9/releases) (`nhs_datamigration_29.0.0_20200401000001`)
   - [ICD10 Edition 5 XML](https://isd.digital.nhs.uk/trud/users/authenticated/filters/0/categories/28/items/259/releases) (`ICD10_Edition5_XML_20160401`)
   - [OPCS-4.10 Data Files](https://isd.digital.nhs.uk/trud/users/guest/filters/0/categories/10/items/119/releases)
   <!-- - [BNF/Snomed Mapping data.xlsx](https://www.nhsbsa.nhs.uk/prescription-data/understanding-our-data/bnf-snomed-mapping) -->
2. **Obtain API Key:** Once all licenses are accepted, retrieve your API key from [NHS TRUD Account Management](https://isd.digital.nhs.uk/trud/users/authenticated/filters/0/account/manage).
3. **Install TRUD resources:** Download and install the NHS TRUD medical code resources by running the automated extraction script with your API key:
   `python trud_api.py --key <API_KEY>`
   - Processed conversion tables are saved as `.parquet` files in the `maps/processed/` directory.
   - *Note: NHS TRUD defines one-way mappings and does <b>NOT ADVISE</b> reversing them. If you still wish to create two-way mappings, duplicate the relevant `.parquet` table and reverse its filename (e.g. `read2_code_to_snomed_code.parquet` to `snomed_code_to_read2_code.parquet`).*
4. **Optional: Install OMOP Database:** Populate the SQLite3 database with OMOP vocabularies, which can be downloaded from [Athena OHDSI](https://athena.ohdsi.org/vocabulary/list).
   - Tick the following vocabularies (listed with their Athena vocabulary ID):
     - 1: SNOMED
     - 2: ICD9CM
     - 17: Readv2
     - 21: ATC
     - 55: OPCS4
     - 57: HES Specialty
     - 70: ICD10CM
     - 75: dm+d
     - 144: UK Biobank
     - 154: NHS Ethnic Category
     - 155: NHS Place of Service
   - Unzip the downloaded folder and install the vocabularies with:
     `python omop_api.py --install <PATH_TO_DOWNLOADED_FILES>`
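Once the vocabularies are loaded, codes can be looked up in SQLite. This sketch assumes the standard OMOP CDM `concept` table layout (`concept_id`, `concept_name`, `vocabulary_id`, `concept_code`); the schema and database path actually produced by `omop_api.py` may differ. It uses a throwaway in-memory database with made-up rows so it runs on its own, and `lookup_concepts` is a hypothetical helper.

```python
import sqlite3


def lookup_concepts(conn: sqlite3.Connection, vocabulary_id: str, codes: list[str]) -> dict[str, str]:
    """Map source codes to concept names for one vocabulary (assumes an OMOP-style `concept` table)."""
    if not codes:
        return {}
    placeholders = ",".join("?" for _ in codes)  # one bound parameter per code
    query = (
        "SELECT concept_code, concept_name FROM concept "
        f"WHERE vocabulary_id = ? AND concept_code IN ({placeholders})"
    )
    return dict(conn.execute(query, [vocabulary_id, *codes]).fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the tool's SQLite database
    conn.execute(
        "CREATE TABLE concept (concept_id INTEGER, concept_name TEXT, "
        "vocabulary_id TEXT, concept_code TEXT)"
    )
    conn.executemany(
        "INSERT INTO concept VALUES (?, ?, ?, ?)",
        [(1, "Example concept A", "ICD10CM", "E11"), (2, "Example concept B", "ICD10CM", "I10")],
    )
    print(lookup_concepts(conn, "ICD10CM", ["E11", "X99"]))
```

Codes absent from the result (here `X99`) are those with no matching concept in that vocabulary.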

## Configuration
The mappings from imported code lists to output MELD-B concept code lists are defined in JSON format in `PHEN_assign_v3.json`.

### Folder and File Definitions:
```json
"folder":"codes/Medication code source",
"description":"Medication Codes - downloaded 15/12/23",
"files": [
  ...
]
```

### Columns in Files:
```json
"columns":{
  "read2_code":"READCODE",
  "metadata":["DESCRIPTION"]
},
```

### Concept Set Assignment
The `meldb_phenotypes` list names the MELD-B concepts that the file's codes are mapped to:
```json
"meldb_phenotypes": ["ALL_MEDICATIONS"]
```
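To show how these pieces fit together, the sketch below walks one folder entry and lists, for each file, which column holds which code type and which MELD-B concepts it is assigned to. It is a hypothetical reader, not the tool's parser: it assumes each entry in `files` carries a `file` key with the filename (a placeholder, not confirmed by this README) alongside `columns` and `meldb_phenotypes`, and treats every `columns` key except `metadata` as a code type.

```python
import json
from pathlib import PurePosixPath

# Hypothetical folder entry; the "file" key and filename are placeholders.
EXAMPLE_FOLDER = json.loads("""
{
  "folder": "codes/Medication code source",
  "description": "Medication Codes - downloaded 15/12/23",
  "files": [
    {
      "file": "example_medications.csv",
      "columns": {"read2_code": "READCODE", "metadata": ["DESCRIPTION"]},
      "meldb_phenotypes": ["ALL_MEDICATIONS"]
    }
  ]
}
""")


def summarise_folder(folder: dict) -> list[tuple[str, str, str, list[str]]]:
    """Return (file path, code type, source column, concepts) for every code column."""
    rows = []
    for entry in folder["files"]:
        path = str(PurePosixPath(folder["folder"]) / entry["file"])
        for code_type, column in entry["columns"].items():
            if code_type == "metadata":  # descriptive columns, not codes
                continue
            rows.append((path, code_type, column, entry.get("meldb_phenotypes", [])))
    return rows


if __name__ == "__main__":
    for row in summarise_folder(EXAMPLE_FOLDER):
        print(row)
```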

### Additional preprocessing (if required):
Additional processing is required when you wish to sub-divide a code list table, or when a single column contains multiple code types. In these cases, add an `actions` object inside the relevant file object.

#### Table with a sub-categorical column:
To sub-divide a table by a categorical column, use the `divide_col` action:
```json
"actions":{
  "divide_col": "MMCode"
}
```
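As a rough illustration of dividing by a categorical column, the sketch below groups the rows of a CSV code list by the value in the `divide_col` column, producing one sub-table per category. The sample data and the `divide_by_column` helper are hypothetical; how the tool names or assigns each resulting sub-table is not covered here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical code list with a categorical "MMCode" column.
SAMPLE_CSV = """READCODE,DESCRIPTION,MMCode
C10..,Example code 1,DIAB
G30..,Example code 2,CVD
C108.,Example code 3,DIAB
"""


def divide_by_column(rows: list[dict[str, str]], divide_col: str) -> dict[str, list[dict[str, str]]]:
    """Group rows into sub-tables keyed by the value of `divide_col` (first-seen order kept)."""
    groups: dict[str, list[dict[str, str]]] = defaultdict(list)
    for row in rows:
        groups[row[divide_col]].append(row)
    return dict(groups)


if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    for category, sub_table in divide_by_column(rows, "MMCode").items():
        print(category, [r["READCODE"] for r in sub_table])
```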

#### Table with multiple code types in a single column:
The column must be split into multiple columns so that each column holds only one code type.
- The `split_col` attribute names the categorical column that indicates the code type of each row. The <b>category names should replace the column names</b> in the `columns` properties.
- The `codes_col` attribute names the code column that contains multiple code types.

```json
"actions":{
  "split_col":"coding_system",
  "codes_col":"code"
},
"columns":{
  "read2_code":"Read codes v2",
  "med_code":"Med codes",
  "icd10_code":"ICD10 codes",
  "metadata":["description"]
},
```
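The `split_col`/`codes_col` action can be pictured as routing each row of a long table into one column per code type. This sketch assumes a `columns` mapping like the one above (code type to category name) and inverts it so each row's category selects its code type; rows with an unlisted category are skipped. `split_codes` and the sample rows are hypothetical.

```python
def split_codes(
    rows: list[dict[str, str]],
    split_col: str,
    codes_col: str,
    columns: dict[str, object],
) -> dict[str, list[str]]:
    """Collect each row's code under the code type whose category name matches `split_col`."""
    # Invert {"read2_code": "Read codes v2", ...} into {"Read codes v2": "read2_code", ...},
    # skipping "metadata", whose value is a list of descriptive columns.
    category_to_type = {cat: code_type for code_type, cat in columns.items() if code_type != "metadata"}
    result: dict[str, list[str]] = {code_type: [] for code_type in category_to_type.values()}
    for row in rows:
        code_type = category_to_type.get(row[split_col])
        if code_type is not None:  # ignore categories not listed in `columns`
            result[code_type].append(row[codes_col])
    return result


if __name__ == "__main__":
    columns = {
        "read2_code": "Read codes v2",
        "med_code": "Med codes",
        "icd10_code": "ICD10 codes",
        "metadata": ["description"],
    }
    rows = [
        {"coding_system": "Read codes v2", "code": "C10..", "description": "Example 1"},
        {"coding_system": "ICD10 codes", "code": "E11", "description": "Example 2"},
        {"coding_system": "Other", "code": "123", "description": "Example 3"},
    ]
    print(split_codes(rows, "coding_system", "code", columns))
```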

*<b>Large code lists</b> with numerous phenotypes (e.g. Ho et al.) require a lot of JSON to be generated. See the "Ho generate JSON" section in `process_codes_WP.ipynb` for example code that generates it.*

## Usage
This script preprocesses code lists and maps them to the given concepts/phenotypes.

### Command-Line Execution
Execute via the shell with customizable parameters:
```bash
python main.py [OPTIONS] mapping_file
```
usage: `python main.py [-h] [-r2] [-r3] [-i] [-s] [-o] [-a] [--no-translate] [--no-verify] [--output] [--error-log] mapping_file`

**Required Arguments:**
- `mapping_file` Concept/Phenotype Assignment File (json)
- `--output` Filepath to save output to CSV or OMOP SQLite Database

**Optional Arguments:**
- `-r2`, `--read2-code` Read V2 Codes Column name in Source File
- `-r3`, `--read3-code` Read V3 Codes Column name in Source File
- `-i`, `--icd10-code` ICD10 Codes Column name in Source File
- `-s`, `--snomed-code` SNOMED Codes Column name in Source File
- `-o`, `--opcs4-code` OPCS4 Codes Column name in Source File
- `-a`, `--atc-code` ATC Codes Column name in Source File
- `--no-translate` Do not translate code types
- `--no-verify` Do not verify codes are correct
- `--error-log` Filepath to save error log to

> **_EXAMPLE:_** `python main.py PHEN_assign_v3.json -r2 --output output/MELD_concepts_readv2.csv --error-log output/MELD_errors.csv`
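Translation relies on the one-way conversion tables in `maps/processed/`, whose filenames follow a `<source>_to_<target>.parquet` pattern (e.g. `read2_code_to_snomed_code.parquet`). The sketch below illustrates the idea with an in-memory list of made-up (source, target) pairs standing in for a loaded table (in practice one might read it with `pandas.read_parquet`); untranslatable codes are collected separately, loosely like an error log. The helper names are hypothetical and this is not the implementation in `main.py`.

```python
from collections import defaultdict
from pathlib import Path


def parse_map_name(filename: str) -> tuple[str, str]:
    """Split e.g. 'read2_code_to_snomed_code.parquet' into ('read2_code', 'snomed_code')."""
    source, target = Path(filename).stem.split("_to_", 1)
    return source, target


def translate(codes: list[str], pairs: list[tuple[str, str]]) -> tuple[dict[str, list[str]], list[str]]:
    """Translate codes via one-way (source, target) pairs; return (translations, unmapped)."""
    lookup: dict[str, set[str]] = defaultdict(set)
    for source_code, target_code in pairs:
        lookup[source_code].add(target_code)  # one source code may map to several targets
    translations = {c: sorted(lookup[c]) for c in codes if c in lookup}
    unmapped = [c for c in codes if c not in lookup]
    return translations, unmapped


if __name__ == "__main__":
    print(parse_map_name("read2_code_to_snomed_code.parquet"))
    pairs = [("C10..", "1000001"), ("C10..", "1000002"), ("G30..", "1000003")]  # made-up IDs
    print(translate(["C10..", "G30..", "XXXXX"], pairs))
```

Because a source code can map to several targets, naively swapping the pairs does not give a faithful reverse mapping, which is consistent with the NHS TRUD advice against reversing these tables.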

## Contributing
### Commit to GitLab
```
# ...
git tag -a v1.0.0 -m "added features ..."
git push
```

## Acknowledgements
This project was developed in the context of the [MELD-B](https://www.southampton.ac.uk/publicpolicy/support-for-policymakers/policy-projects/Current%20projects/meld-b.page) project, which is funded by the UK [National Institute for Health and Care Research](https://www.nihr.ac.uk/) under grant agreement NIHR203988.

## License
This work is licensed under an [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).

![apache2](https://img.shields.io/github/license/saltstack/salt)