

A Tool for Automating the Curation of Medical Concepts derived from Coding Lists (ACMC)
Jakub J. Dylag 1, Roberta Chiovoloni 3, Ashley Akbari 3, Simon D. Fraser 2, Michael J. Boniface 1
1 Digital Health and Biomedical Engineering, School of Electronics and Computer Science, Faculty of Engineering and Physical Sciences, University of Southampton
2 School of Primary Care Population Sciences and Medical Education, University of Southampton
3 Population Data Science, Swansea University Medical School, Faculty of Medicine, Health & Life Science, Swansea University
Correspondence to: Jakub J. Dylag, Digital Health and Biomedical Engineering, School of Electronics and Computer Science, Faculty of Engineering and Physical Sciences, University of Southampton, J.J.Dylag@soton.ac.uk
Citation
Dylag JJ, Chiovoloni R, Akbari A, Fraser SD, Boniface MJ. A Tool for Automating the Curation of Medical Concepts derived from Coding Lists. GitLab [Internet]. May 2024. Available from: https://git.soton.ac.uk/meldb/concepts-processing
Introduction
This tool automates the verification, translation and organisation of medical coding lists defining phenotypes for inclusion criteria in cohort analysis. By processing externally sourced clinical inclusion criteria into actionable code lists, this tool ensures consistent and efficient curation of cohort definitions. These code lists can be subsequently used by data providers to construct study cohorts.
Requirements
- Python 3.9 or higher
Installation
To install the acmc package, run:
pip install acmc
Once installed, you'll be ready to use the acmc tool along with the associated vocabularies.
Getting Started
Install Clinically Assured NHS TRUD Code Mappings
-
Register at TRUD
Register an account at NHS TRUD.
-
Subscribe and Accept Licenses: Subscribe to the following data files:
After subscribing, you'll receive an API key once your request is approved (usually within a few hours).
-
Get TRUD API KEY
Copy your API key from NHS TRUD Account Management and store it securely.
-
Add the TRUD API key as an environment variable
To set the environment variable for the current session on macOS/Linux, run:
export ACMC_TRUD_API_KEY="your_api_key_here"
On Windows (Command Prompt or PowerShell), setx stores the variable for future sessions, so open a new terminal after running:
setx ACMC_TRUD_API_KEY "your_api_key_here"
-
Download and Install TRUD Resources:
Run the following acmc command to download and process the TRUD resources:
acmc trud install
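The key check and install step above can also be scripted. The following is a minimal sketch, assuming only the ACMC_TRUD_API_KEY variable name and the acmc trud install command shown above; the helper names get_trud_api_key and install_trud are hypothetical and are not part of acmc.

```python
import os
import subprocess

ENV_VAR = "ACMC_TRUD_API_KEY"  # variable name taken from the step above


def get_trud_api_key(env=None):
    """Return the TRUD API key from the environment, or raise if it is unset."""
    env = os.environ if env is None else env
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set; set it before installing TRUD")
    return key


def install_trud(env=None, runner=subprocess.run):
    """Hypothetical wrapper: check the key is present, then run `acmc trud install`."""
    get_trud_api_key(env)
    # check=True makes a non-zero exit status raise CalledProcessError
    return runner(["acmc", "trud", "install"], check=True)
```

Calling install_trud() with no arguments uses the real environment and the real acmc executable; the runner parameter exists only so the wrapper can be exercised without invoking the tool.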
Install OMOP Vocabularies
-
Register with OHDSI Athena
-
Download vocabularies from OHDSI Athena
- Required vocabularies include:
- SNOMED
- ICD9CM
- Readv2
- ATC
- OPCS4
- HES Specialty
- ICD10CM
- dm+d
- UK Biobank
- NHS Ethnic Category
- NHS Place of Service
You will be notified by email (usually within an hour) with a vocabularies version number and a link to download a zip file of OMOP database tables in CSV format. The subject will be "OHDSI Standardized Vocabularies. Your download link" from pallas@ohdsi.org.
Content of your package
Vocabularies release version: v20240830
acmc-omop Vocabularies:
SNOMED - Systematic Nomenclature of Medicine - Clinical Terms (IHTSDO)
ICD9CM - International Classification of Diseases, Ninth Revision, Clinical Modification, Volume 1 and 2 (NCHS)
Read - NHS UK Read Codes Version 2 (HSCIC)
ATC - WHO Anatomic Therapeutic Chemical Classification
OPCS4 - OPCS Classification of Interventions and Procedures version 4 (NHS)
HES Specialty - Hospital Episode Statistics Specialty (NHS)
ICD10CM - International Classification of Diseases, Tenth Revision, Clinical Modification (NCHS)
dm+d - Dictionary of Medicines and Devices (NHS)
UK Biobank - UK Biobank (UK Biobank)
NHS Ethnic Category - NHS Ethnic Category
NHS Place of Service - NHS Admission Source and Discharge Destination
Installation of the OHDSI Standardized Vocabularies
Please execute the following process:
Click on this link to download the zip file. Typical file sizes, depending on the number of vocabularies selected, are between 30 and 1500 MB.
Unpack.
Reconstitute CPT-4. See below for details.
If needed, create the tables.
Load the unpacked files into the tables.
Download the OMOP file onto your computer and note the path to the file
-
Install OMOP vocabularies
Run the following acmc command to create a local OMOP database from the OMOP zip file with a specific version:
acmc omop install -f <path to downloaded OMOP zip file> -v <release version from email>
Expected output:
[INFO] - Installing OMOP from zip file: ../data/acmc-omop.zip
[INFO] - Extracted OMOP zip file ../data/acmc-omop.zip to vocab/omop/
[INFO] - Processing 1 of 9 tables: vocab/omop/CONCEPT.csv
[INFO] - Processing 2 of 9 tables: vocab/omop/DOMAIN.csv
[INFO] - Processing 3 of 9 tables: vocab/omop/CONCEPT_CLASS.csv
[INFO] - Processing 4 of 9 tables: vocab/omop/RELATIONSHIP.csv
[INFO] - Processing 5 of 9 tables: vocab/omop/DRUG_STRENGTH.csv
[INFO] - Processing 6 of 9 tables: vocab/omop/VOCABULARY.csv
[INFO] - Processing 7 of 9 tables: vocab/omop/CONCEPT_SYNONYM.csv
[INFO] - Processing 8 of 9 tables: vocab/omop/CONCEPT_ANCESTOR.csv
[INFO] - Processing 9 of 9 tables: vocab/omop/CONCEPT_RELATIONSHIP.csv
[INFO] - OMOP installation completed
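Before running the install, it may help to confirm that the downloaded zip file contains the nine tables named in the expected output. This is a minimal sketch using only the Python standard library; the function name missing_omop_tables is hypothetical, and matching on file names regardless of folder is an assumption about the archive layout.

```python
import zipfile
from pathlib import PurePosixPath

# Table file names taken from the expected output above
EXPECTED_TABLES = {
    "CONCEPT.csv", "DOMAIN.csv", "CONCEPT_CLASS.csv", "RELATIONSHIP.csv",
    "DRUG_STRENGTH.csv", "VOCABULARY.csv", "CONCEPT_SYNONYM.csv",
    "CONCEPT_ANCESTOR.csv", "CONCEPT_RELATIONSHIP.csv",
}


def missing_omop_tables(zip_source):
    """Return the expected table files that are absent from the zip.

    zip_source may be a path or a binary file object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # Compare on the final path component so nested folders are tolerated
        present = {PurePosixPath(name).name for name in archive.namelist()}
    return sorted(EXPECTED_TABLES - present)
```

An empty list means every expected table is present; any names returned are vocabulary tables to re-check in the Athena download.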
Example Usage
Follow these steps to initialize and manage a phenotype using acmc. In this example, we use a source concept list for the concept set Abdominal Pain, created from ClinicalCodes.org. The source concept codes are read2. We generate versioned phenotypes for read2 and translate them to snomed in normalised, standard formats.
-
Initialize a phenotype in the workspace
Use the following acmc command to initialize the phenotype in a local Git repository:
acmc phen init
Expected Output:
[INFO] - Initialising Phenotype in directory: <path>/concepts-processing/workspace/phen
[INFO] - Creating phen directory structure and config files
[INFO] - Phenotype initialised successfully
-
Copy example medical code lists to the phenotype codes directory
From the command prompt, copy the example medical code lists from /examples/codes to the phenotype codes directory:
cp -r ./examples/codes/* ./workspace/phen/codes
- You can view the source code list here: res176-abdominal-pain.csv
- Alternatively, place your own code lists in ./workspace/phen/codes.
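The cp command above is not available in every shell (for example Windows Command Prompt). A portable alternative can be sketched with the Python standard library; the function name copy_code_lists is hypothetical, and the default paths are simply the ones used in this step.

```python
import shutil
from pathlib import Path


def copy_code_lists(src="./examples/codes", dst="./workspace/phen/codes"):
    """Copy everything under src into dst and return the files now in dst."""
    src_path, dst_path = Path(src), Path(dst)
    if not src_path.is_dir():
        raise FileNotFoundError(f"Source directory not found: {src_path}")
    # dirs_exist_ok=True (Python 3.8+) merges into an existing directory
    shutil.copytree(src_path, dst_path, dirs_exist_ok=True)
    return sorted(
        p.relative_to(dst_path).as_posix()
        for p in dst_path.rglob("*")
        if p.is_file()
    )
```

The returned list gives a quick confirmation of which code lists are in the phenotype codes directory after the copy.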
-
Copy the example phenotype configuration file to the phenotype directory
From the command prompt, copy the example phenotype configuration file /examples/config1.yaml to the phenotype directory:
cp -r ./examples/config1.yaml ./workspace/phen/config.yaml
- You can view the configuration file here: config1.yaml
- Alternatively, place your own config.yaml file in ./workspace/phen.
-
Validate the phenotype configuration
Use the following acmc command to validate the phenotype configuration to ensure it's correct:
acmc phen validate
Expected Output:
[INFO] - Validating phenotype: <path>/concepts-processing/workspace/phen
[INFO] - Phenotype validated successfully
-
Generate phenotype in Read2 code format
Use the following acmc command to generate the phenotype in read2 format:
acmc phen map -t read2
Expected Output:
[INFO] - Processing phenotype: <path>/concepts-processing/workspace/phen
[INFO] - Validating phenotype: <path>/concepts-processing/workspace/phen
[INFO] - Phenotype validated successfully
[INFO] - Processing read2 codes...
[INFO] - Converting to target code type read2
[INFO] - Saved mapped concepts to <path>/concepts-processing/workspace/phen/map/read2.csv
[INFO] - Phenotype processed successfully
The concept sets translating read2 to the acmc normalised CSV format will be stored in ./workspace/phen/concept-set/read2/, e.g. ./workspace/phen/concept-set/read2/ABDO_PAIN.csv.
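When acmc is driven from a script, the output location can be recovered from the log rather than hard-coded. The sketch below assumes the log line format shown in the expected output above ("Saved mapped concepts to ..."); the function name saved_paths is hypothetical, and the format may differ between acmc versions.

```python
import re

# Pattern based on the log line shown in the expected output above
SAVED_LINE = re.compile(r"^\[INFO\] - Saved mapped concepts to (?P<path>.+)$")


def saved_paths(log_text):
    """Return every path reported by a 'Saved mapped concepts to ...' line."""
    paths = []
    for line in log_text.splitlines():
        match = SAVED_LINE.match(line.strip())
        if match:
            paths.append(match.group("path"))
    return paths
```

Passing the captured standard output of acmc phen map to saved_paths yields the files written by that run.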
-
Publish phenotype at an initial version
Use the following acmc command to publish the phenotype at an initial version:
acmc phen publish
Expected Output:
[INFO] - Validating phenotype: <path>/concepts-processing/workspace/phen
[INFO] - Phenotype validated successfully
[INFO] - New version: v1.0.3
[INFO] - Phenotype published successfully
-
Generate phenotype in SNOMED code format
Use the following acmc command to generate the phenotype in snomed format:
acmc phen map -t snomed
Expected Output:
[INFO] - Processing phenotype: <path>/concepts-processing/workspace/phen
[INFO] - Validating phenotype: <path>/concepts-processing/workspace/phen
[INFO] - Phenotype validated successfully
[INFO] - Processing read2 codes...
[INFO] - Converting to target code type snomed
[INFO] - Saved mapped concepts to <path>/concepts-processing/workspace/phen/map/snomed.csv
[INFO] - Phenotype processed successfully
The concept sets translating read2 to snomed will be stored in acmc CSV format in ./workspace/phen/concept-set/snomed/, e.g. ./workspace/phen/concept-set/snomed/ABDO_PAIN.csv.
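A generated concept set can be inspected with the standard csv module. This is a minimal sketch that makes no assumption about column names, since the acmc CSV layout is not described here; the function name summarise_concept_set is hypothetical.

```python
import csv
from pathlib import Path


def summarise_concept_set(csv_path):
    """Return (column names, number of data rows) for one concept set CSV file."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        # fieldnames is None for an empty file, so fall back to an empty list
        return list(reader.fieldnames or []), len(rows)
```

For example, summarise_concept_set("./workspace/phen/concept-set/snomed/ABDO_PAIN.csv") reports the header and the number of mapped codes in that file.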
-
Get a copy of the previous version from the repo
Use the following acmc command to retrieve a copy of the previous version (v1.0.3) from the repository:
acmc phen copy -v v1.0.3
Expected Output:
[INFO] - Validating phenotype: <path>/concepts-processing/workspace/phen
[INFO] - Phenotype validated successfully
[INFO] - Copying repo <path>/concepts-processing/workspace/phen to <path>/concepts-processing/workspace/v1.0.3
[INFO] - Checking out version v1.0.3...
[INFO] - Phenotype copied successfully
A copy of the phenotype will be created in the directory ./workspace/v1.0.3
-
Compare the previous version v1.0.3 with the latest version
Use the following acmc command to compare the previous version v1.0.3 with the latest version in the repository:
acmc phen diff -old ./workspace/v1.0.3/
Expected Output:
[INFO] - Validating phenotype: ./workspace/v1.0.3/
[INFO] - Phenotype validated successfully
[INFO] - Validating phenotype: <path>/concepts-processing/workspace/phen
[INFO] - Phenotype validated successfully
[INFO] - Phenotypes diff'd successfully
A report comparing the phenotype versions will be created at ./workspace/phen/v1.0.3_diff.md.
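To illustrate the kind of comparison a version diff involves, the sketch below lists codes added and removed between two versions of a code list. It is not the acmc diff implementation, whose report format is not shown here; the function name code_changes and the column name passed to it are hypothetical.

```python
import csv
import io


def code_changes(old_csv_text, new_csv_text, column):
    """Return (added, removed) values of `column` between two CSV texts."""

    def codes(text):
        # Collect the distinct values of the chosen column
        return {row[column] for row in csv.DictReader(io.StringIO(text))}

    old, new = codes(old_csv_text), codes(new_csv_text)
    return sorted(new - old), sorted(old - new)
```

The two texts could be read from the same concept set file in ./workspace/v1.0.3 and ./workspace/phen respectively.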
-
Publish the phenotype at the next version
Use the following acmc command to publish the phenotype at the next version:
acmc phen publish
Expected Output:
[INFO] - Validating phenotype: <path>/concepts-processing/workspace/phen
[INFO] - Phenotype validated successfully
[INFO] - New version: v1.0.4
[INFO] - Phenotype published successfully
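In the example, publishing moves the phenotype from v1.0.3 to v1.0.4, which looks like a patch-level increment of a vMAJOR.MINOR.PATCH version. The sketch below reproduces that arithmetic only; how acmc actually chooses the next version is an assumption here, and the function name next_patch_version is hypothetical.

```python
import re

# Versions in the example have the form vMAJOR.MINOR.PATCH, e.g. v1.0.3
VERSION = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def next_patch_version(version):
    """Return the version string with its patch number increased by one."""
    match = VERSION.match(version)
    if not match:
        raise ValueError(f"Unexpected version format: {version!r}")
    major, minor, patch = map(int, match.groups())
    return f"v{major}.{minor}.{patch + 1}"
```

With the versions from this walkthrough, next_patch_version("v1.0.3") returns "v1.0.4".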
Support
If you need help, please open an issue in the repository.
Contributing
Please contact the corresponding author, Jakub Dylag, at J.J.Dylag@soton.ac.uk.
Acknowledgements
This project was developed in the context of the MELD-B project, which is funded by the UK National Institute for Health and Care Research (NIHR) under grant agreement NIHR203988.
License
This work is licensed under the Apache License, Version 2.0.