# A Tool for Automating the Curation of Medical Concepts derived from Coding Lists (ACMC)
### Jakub J. Dylag <sup>1</sup>, Roberta Chiovoloni <sup>3</sup>, Ashley Akbari <sup>3</sup>, Simon D. Fraser <sup>2</sup>, Michael J. Boniface <sup>1</sup>
...
### Citation
> Dylag JJ, Chiovoloni R, Akbari A, Fraser SD, Boniface MJ. A Tool for Automating the Curation of Medical Concepts derived from Coding Lists. GitLab [Internet]. May 2024. Available from: https://git.soton.ac.uk/meldb/concepts-processing
## Introduction
This tool automates the verification, translation and organisation of medical coding lists that define cohort phenotypes used as inclusion criteria in cohort analysis. By processing externally sourced clinical inclusion criteria into actionable code lists, it ensures consistent and efficient curation of cohort definitions. Data providers (e.g. SAIL) can subsequently use these code lists to construct study cohorts.
## Methods
### Workflow Overview
The high-level steps for using the tool are outlined below:

**1. Define concept sets:** A domain expert defines a list of [concept sets](#concept-set-assigment) for each observable characteristic of the phenotype in CSV format (e.g., `PHEN_concept_sets.csv`).

**2. Define code lists for concept sets:** A domain expert defines [code lists](#???) for each concept set within the phenotype using supported coding list formats and stores them in the `/src` directory.

**3. Define mapping from code lists to concept sets:** A domain expert defines a [phenotype mapping](#???) that maps code lists to concept sets in JSON format (`PHEN_assign_v3.json`).

**4. Generate versioned phenotype coding lists and translations:** A domain expert or data engineer processes the phenotype mappings [using the command line tool](#usage) to validate the code lists against NHS TRUD-registered codes and mappings, and to generate versioned concept set code lists with translations between coding standards. The output concept set code lists are saved to the `/concepts` Git repository, with all changes tracked, and can be exported to SAIL or any other data bank.
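To make the data flow between the four steps concrete, here is a self-contained Python sketch. It is a hypothetical illustration, not ACMC's implementation: the column names (`concept_set_name`, `code`), the flat JSON layout, the example codes and the in-memory `REGISTERED_CODES` set standing in for NHS TRUD are all assumptions, and the real `PHEN_concept_sets.csv` and `PHEN_assign_v3.json` schemas may differ.

```python
import csv
import io
import json

# --- Hypothetical inputs (column names, layout and codes are assumptions) ---

# Step 1: concept sets, one per row in an assumed "concept_set_name" column.
CONCEPT_SETS_CSV = "concept_set_name\nDIABETES\nHYPERTENSION\n"

# Step 2: code lists that would live under /src, each with an assumed "code" column.
CODE_LISTS = {
    "src/diabetes_read2.csv": "code\nC10..\nC10E.\nXXXXX\n",
    "src/hypertension_read2.csv": "code\nG20..\n",
}

# Step 3: phenotype mapping from code list file to concept set (assumed flat JSON).
MAPPING_JSON = json.dumps({
    "src/diabetes_read2.csv": "DIABETES",
    "src/hypertension_read2.csv": "HYPERTENSION",
})

# Step 4 stand-in: codes treated as registered in NHS TRUD (illustrative only).
REGISTERED_CODES = {"C10..", "C10E.", "G20.."}


def read_column(text, column):
    """Return the values of one column from CSV text."""
    return [row[column] for row in csv.DictReader(io.StringIO(text))]


def build_concept_sets():
    """Validate each mapped code list and group valid codes by concept set."""
    concept_sets = {name: set() for name in read_column(CONCEPT_SETS_CSV, "concept_set_name")}
    rejected = []  # (code list, code) pairs not found in the reference set
    for path, concept_set in json.loads(MAPPING_JSON).items():
        if concept_set not in concept_sets:
            raise KeyError(f"{path} maps to undefined concept set {concept_set!r}")
        for code in read_column(CODE_LISTS[path], "code"):
            if code in REGISTERED_CODES:
                concept_sets[concept_set].add(code)
            else:
                rejected.append((path, code))
    # Sort codes so the output is deterministic.
    return {name: sorted(codes) for name, codes in concept_sets.items()}, rejected


if __name__ == "__main__":
    valid, invalid = build_concept_sets()
    print(valid)    # {'DIABETES': ['C10..', 'C10E.'], 'HYPERTENSION': ['G20..']}
    print(invalid)  # [('src/diabetes_read2.csv', 'XXXXX')]
```

Keeping rejected codes in a separate list, rather than dropping them silently, makes it easy to report which source code lists need correcting.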
### Supported Medical Coding Standards
The tool supports verification and mapping across the medical coding standards below:
- [**Read V2:**](https://digital.nhs.uk/services/terminology-and-classifications/read-codes) NHS clinical terminology standard used in primary care and replaced by SNOMED-CT in 2018. Because it was widely used in primary care, it is still supported by some data providers, e.g. [SAIL Databank](https://saildatabank.com/) (restricted to five-character codes).
- [**SNOMED-CT:**](https://digital.nhs.uk/services/terminology-and-classifications/snomed-ct) International standard clinical terminology for Electronic Healthcare Records, adopted by the NHS in 2018. Mappings to Read codes are partially provided by the [Clinical Practice Research Datalink (CPRD)](https://www.cprd.com/) and [NHS Technology Reference data Update Distribution (TRUD)](https://isd.digital.nhs.uk/trud).
- [**ICD-10:**](https://icd.who.int/browse10/2019/en) The International Classification of Diseases (ICD) is a medical classification list from the World Health Organization (WHO). It is widely used in hospital settings, e.g. Hospital Episode Statistics (HES), and is critical for HES-linked datasets.
- [**ATC Codes:**](https://www.who.int/tools/atc-ddd-toolkit/atc-classification) The Anatomical Therapeutic Chemical (ATC) Classification is a drug classification list maintained by the WHO and used internationally for medication classification.
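A quick syntactic screen can catch malformed codes before they are looked up against NHS TRUD. The patterns below are approximate, hypothetical checks written for illustration, not the tool's validation logic: SNOMED-CT check digits are not verified, national ICD-10 extensions may not match, and the Read V2 rule encodes SAIL's five-character restriction.

```python
import re

# Approximate syntactic patterns (assumptions for illustration, not official validators).
PATTERNS = {
    # Read V2: exactly five characters, letters/digits padded with '.'
    "read2": re.compile(r"[A-Za-z0-9.]{5}"),
    # SNOMED-CT concept IDs: 6-18 digits, no leading zero (check digit not verified)
    "snomed": re.compile(r"[1-9][0-9]{5,17}"),
    # ICD-10: letter + two digits, optional (optionally dotted) subcategory
    "icd10": re.compile(r"[A-Z][0-9]{2}(\.?[0-9A-Z]{1,2})?"),
    # ATC: any level from anatomical group (A) down to chemical substance (A10BA02)
    "atc": re.compile(r"[A-Z]([0-9]{2}([A-Z]([A-Z]([0-9]{2})?)?)?)?"),
}


def is_well_formed(code: str, standard: str) -> bool:
    """Return True if the whole code matches the approximate pattern for its standard."""
    return PATTERNS[standard].fullmatch(code.strip()) is not None


if __name__ == "__main__":
    print(is_well_formed("C10..", "read2"))      # True
    print(is_well_formed("C10E12", "read2"))     # False (six characters)
    print(is_well_formed("73211009", "snomed"))  # True
    print(is_well_formed("E11.9", "icd10"))      # True
    print(is_well_formed("A10BA02", "atc"))      # True
```

A syntactic pass like this only filters obvious typos; whether a well-formed code actually exists still has to be checked against the TRUD reference data.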
## Installation
1. **Set up Conda environment:** Download and install a Python environment by following the instructions to install Miniconda from [https://docs.conda.io/en/latest/miniconda.html](https://docs.conda.io/en/latest/miniconda.html).
   - Recreate the environment by running: `conda env create -f conda.yaml`
   - Activate the environment: `conda activate acmc`
2. **Sign up:** Register at [NHS TRUD](https://isd.digital.nhs.uk/trud/user/guest/group/0/account/form).
3. **Subscribe** and accept the following licences:
   Each data file has a "Subscribe" link that takes you to its licence. You will need to complete "Tell us about your subscription request", summarising why you need access to the data. Your subscription is not approved immediately and remains in the "pending" state until it is; approval is usually granted within 24 hours.
4. **Get API key:** Retrieve your API key from [NHS TRUD Account Management](https://isd.digital.nhs.uk/trud/users/authenticated/filters/0/account/manage).
5. **Install TRUD:** Download and install the NHS TRUD medical code resources by executing the script: `python trud_api.py --key <API_KEY>`.
   Processed tables will be saved as `.parquet` files in the `maps/processed/` directory.
   - *Note: NHS TRUD defines one-way mappings and does **NOT ADVISE** reversing them. If you still wish to reverse these into two-way mappings, duplicate the given `.parquet` table and reverse the filename (e.g. `read2_code_to_snomed_code.parquet` to `snomed_code_to_read2_code.parquet`).*
6. **Optional: Install OMOP database:** Download and install OMOP vocabularies from [Athena OHDSI](https://athena.ohdsi.org/vocabulary/list).
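The note in step 5 on reversing mappings relies on the `<source>_to_<target>.parquet` naming convention (e.g. `read2_code_to_snomed_code.parquet`). The hypothetical helpers below sketch that duplicate-and-rename procedure with the standard library; they assume every processed table follows the convention and that a renamed copy is all that is needed, as the note describes. NHS TRUD advises against reversing one-way mappings, so treat this as an opt-in step.

```python
import shutil
from pathlib import Path


def reversed_mapping_name(filename: str) -> str:
    """Swap source and target in '<source>_to_<target>.parquet'."""
    path = Path(filename)
    source, sep, target = path.stem.partition("_to_")
    if path.suffix != ".parquet" or not sep or not source or not target:
        raise ValueError(f"expected '<source>_to_<target>.parquet', got {filename!r}")
    return f"{target}_to_{source}{path.suffix}"


def duplicate_reversed(table: Path) -> Path:
    """Copy a one-way mapping table under its reversed name (NOT advised by NHS TRUD)."""
    destination = table.with_name(reversed_mapping_name(table.name))
    if destination.exists():
        # Never overwrite an existing table.
        raise FileExistsError(destination)
    shutil.copyfile(table, destination)
    return destination


if __name__ == "__main__":
    print(reversed_mapping_name("read2_code_to_snomed_code.parquet"))
    # -> snomed_code_to_read2_code.parquet
    # On real data (directory from the installation steps):
    # duplicate_reversed(Path("maps/processed/read2_code_to_snomed_code.parquet"))
```

Refusing to overwrite keeps any existing table, including a genuine reverse mapping if one is ever provided, from being clobbered by a renamed copy.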
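For the optional OMOP step, Athena vocabulary downloads are tab-delimited files following the OMOP Common Data Model, including a `CONCEPT.csv` table with `concept_code`, `concept_name` and `vocabulary_id` columns. The minimal sketch below looks up the codes of one vocabulary using only the standard library; how ACMC itself installs or queries these vocabularies is not described here, so the function name and the file location are assumptions.

```python
import csv
import io
from typing import Dict, TextIO


def concepts_for_vocabulary(concept_file: TextIO, vocabulary_id: str) -> Dict[str, str]:
    """Map concept_code -> concept_name for one vocabulary in an OMOP CONCEPT table.

    Athena exports are tab-delimited; QUOTE_NONE stops stray quote characters
    in concept names from being interpreted as CSV quoting.
    """
    reader = csv.DictReader(concept_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {
        row["concept_code"]: row["concept_name"]
        for row in reader
        if row["vocabulary_id"] == vocabulary_id
    }


if __name__ == "__main__":
    # Tiny in-memory stand-in for CONCEPT.csv (illustrative rows, subset of columns).
    sample = io.StringIO(
        "concept_id\tconcept_name\tvocabulary_id\tconcept_code\n"
        "1\tExample condition\tSNOMED\t123456\n"
        "2\tExample drug class\tATC\tA10\n"
    )
    print(concepts_for_vocabulary(sample, "ATC"))  # {'A10': 'Example drug class'}
    # For a real download (path is an assumption):
    # with open("CONCEPT.csv", newline="", encoding="utf-8") as f:
    #     atc = concepts_for_vocabulary(f, "ATC")
```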