A Tool for Automating the Curation of Medical Concepts derived from Coding Lists (ACMC)

Jakub J. Dylag 1, Roberta Chiovoloni 3, Ashley Akbari 3, Simon D. Fraser 2, Michael J. Boniface 1

1 Digital Health and Biomedical Engineering, School of Electronics and Computer Science, Faculty of Engineering and Physical Sciences, University of Southampton
2 School of Primary Care Population Sciences and Medical Education, University of Southampton
3 Population Data Science, Swansea University Medical School, Faculty of Medicine, Health & Life Science, Swansea University

Correspondence to: Jakub J. Dylag, Digital Health and Biomedical Engineering, School of Electronics and Computer Science, Faculty of Engineering and Physical Sciences, University of Southampton, J.J.Dylag@soton.ac.uk

Citation

Dylag JJ, Chiovoloni R, Akbari A, Fraser SD, Boniface MJ. A Tool for Automating the Curation of Medical Concepts derived from Coding Lists. GitLab [Internet]. May 2024. Available from: https://git.soton.ac.uk/meldb/concepts-processing

Introduction

This tool automates the verification, translation and organisation of medical coding lists defining phenotypes for inclusion criteria in cohort analysis. By processing externally sourced clinical inclusion criteria into actionable code lists, this tool ensures consistent and efficient curation of cohort definitions. These code lists can be subsequently used by data providers to construct study cohorts.

Requirements

  • Python 3.9 or higher

Installation

To install the acmc package, simply run:

pip install acmc

Once installed, you'll be ready to use the acmc tool along with the associated vocabularies.

Getting Started

Install Clinically Assured NHS TRUD Code Mappings

  1. Register at TRUD

    Register an account at NHS TRUD.

  2. Subscribe and Accept Licenses: Subscribe to the following data files:

    After subscribing, you'll receive an API key once your request is approved (usually within 24 hours).

  3. Get TRUD API KEY

    Copy your API key from NHS TRUD Account Management and store it securely.

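  4. Add the TRUD API key as an environment variable

    To set the environment variable temporarily (for the current session), run:

    On macOS/Linux:

    export ACMC_TRUD_API_KEY="your_api_key_here"

    On Windows (Command Prompt):

    set ACMC_TRUD_API_KEY=your_api_key_here

    On Windows (PowerShell):

    $env:ACMC_TRUD_API_KEY = "your_api_key_here"

    To store the variable permanently on Windows, use setx instead. The value is then available in new sessions only, not in the current one:

    setx ACMC_TRUD_API_KEY "your_api_key_here"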
  5. Download and Install TRUD Resources:

    Run the following acmc command to download and process the TRUD resources:

    acmc trud install
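
Before running acmc trud install, it can help to confirm that the key is visible to the process. The following is a minimal Python sketch and is not part of acmc: the helper name trud_key_is_set is hypothetical, and only the variable name ACMC_TRUD_API_KEY is taken from the steps above.

```python
import os

TRUD_KEY_VAR = "ACMC_TRUD_API_KEY"  # variable name taken from the steps above


def trud_key_is_set(env=None):
    """Return True if the TRUD API key is present and not blank.

    `env` defaults to the current process environment; passing a dict
    makes the check easy to try without touching real settings.
    (Hypothetical helper for illustration, not an acmc function.)
    """
    env = os.environ if env is None else env
    return bool(env.get(TRUD_KEY_VAR, "").strip())


if __name__ == "__main__":
    if trud_key_is_set():
        print(f"{TRUD_KEY_VAR} is set; ready to run: acmc trud install")
    else:
        print(f"{TRUD_KEY_VAR} is missing; set it before running: acmc trud install")
```

If the check reports the key as missing on Windows after using setx, open a new terminal session and try again.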

Install OMOP Vocabularies

  1. Register with OHDSI Athena

  2. Download vocabularies from OHDSI Athena

    • Required vocabularies include:
        1. SNOMED
        2. ICD9CM
        3. Readv2
        4. ATC
        5. OPCS4
        6. HES Specialty
        7. ICD10CM
        8. dm+d
        9. UK Biobank
        10. NHS Ethnic Category
        11. NHS Place of Service

    You will be notified by email with a vocabularies version number and a link to download a zip file of OMOP database tables in CSV format. The email is sent from pallas@ohdsi.org with the subject "OHDSI Standardized Vocabularies. Your download link". The content of the email looks similar to this:

Content of your package

Vocabularies release version: v20240830
acmc-omop Vocabularies:
SNOMED	-	Systematic Nomenclature of Medicine - Clinical Terms (IHTSDO)
ICD9CM	-	International Classification of Diseases, Ninth Revision, Clinical Modification, Volume 1 and 2 (NCHS)
Read	-	NHS UK Read Codes Version 2 (HSCIC)
ATC	-	WHO Anatomic Therapeutic Chemical Classification
OPCS4	-	OPCS Classification of Interventions and Procedures version 4 (NHS)
HES Specialty	-	Hospital Episode Statistics Specialty (NHS)
ICD10CM	-	International Classification of Diseases, Tenth Revision, Clinical Modification (NCHS)
dm+d	-	Dictionary of Medicines and Devices (NHS)
UK Biobank	-	UK Biobank (UK Biobank)
NHS Ethnic Category	-	NHS Ethnic Category
NHS Place of Service	-	NHS Admission Source and Discharge Destination

Installation of the OHDSI Standardized Vocabularies

Please execute the following process:

    Click on this link to download the zip file. Typical file sizes, depending on the number of vocabularies selected, are between 30 and 1500 MB.
    Unpack.
    Reconstitute CPT-4. See below for details.
    If needed, create the tables.
    Load the unpacked files into the tables.

  3. Download the OMOP zip file onto your computer and note the path to the file

  4. Install OMOP vocabularies

    Run the following acmc command to create a local OMOP database from the OMOP zip file with a specific version:

    acmc omop install -f <path to downloaded OMOP zip file> -v <release version from email>
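
Before installing, you may want to confirm that the downloaded file is a readable zip archive that contains CSV tables, as the email describes. This is a minimal sketch using only the Python standard library. The function name list_csv_members is hypothetical and the check is not something acmc requires.

```python
import zipfile


def list_csv_members(zip_path):
    """Return the sorted names of the .csv files inside a zip archive.

    Raises zipfile.BadZipFile if the file is not a valid zip archive,
    which usually indicates an incomplete download.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name for name in archive.namelist() if name.lower().endswith(".csv")
        )
```

Calling list_csv_members with the path to the downloaded OMOP zip file returns the table files it contains. An empty list or a BadZipFile error suggests the download should be repeated before running acmc omop install.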

Example

Follow these steps to initialize and manage a phenotype using acmc. In this example, we use a source concept code list for the Concept Set Abdominal Pain created from ClinicalCodes.org. The source concept codes are in read2. We generate a versioned phenotype for read2 and then translate it to snomed, which is published as another version.

  1. Initialize a phenotype in the workspace

    Use the following acmc command to initialize the phenotype in a local Git repository:

acmc phen init
  2. Copy example medical code lists to the phenotype codes directory

    From the command prompt, copy the medical code lists in /examples/codes to the phenotype codes directory:

cp -r ./examples/codes/* ./workspace/phen/codes
  3. Copy the example phenotype configuration file to the phenotype directory

    From the command prompt, copy the example phenotype configuration file /examples/config.json to the phenotype directory:

cp -r ./examples/config.json ./workspace/phen
  4. Validate the phenotype configuration

    Use the following acmc command to validate the phenotype configuration:

acmc phen validate
Expected Output:

Once the command is executed, you should see output similar to this:
[INFO] - Validating phenotype: <path>/concepts-processing/workspace/phen
[INFO] - Phenotype validated successfully
  5. Publish phenotype at an initial version

    Use the following acmc command to publish the phenotype at an initial version:

acmc phen publish
  6. Generate phenotype in SNOMED code format

    Use the following acmc command to generate the phenotype in snomed format:

acmc phen map -t snomed
  7. Get a copy of the previous version from the repo

    Use the following acmc command to retrieve a copy of the previous version (v1.0.3) from the repository:

acmc phen copy -v v1.0.3
  8. Compare the previous version v1.0.3 with the latest version

    Use the following acmc command to compare the previous version (v1.0.3) with the latest version in the repository:

acmc phen diff -old ./workspace/v1.0.3/
  9. Publish the phenotype at the next version

    Use the following acmc command to publish the phenotype at the next version:

acmc phen publish
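
The acmc commands in this example can also be collected into a small script so that the sequence is easy to review and repeat. The sketch below is one possible way to do that and is not part of acmc. The names WORKFLOW and run_steps are hypothetical, the commands and the version v1.0.3 are copied from the steps above, and the two cp steps are omitted, so the example files must still be copied after acmc phen init and before acmc phen validate.

```python
import shlex
import subprocess

# The acmc steps from the example, in order. The cp steps are left out
# because they are plain file copies rather than acmc commands.
WORKFLOW = [
    ["acmc", "phen", "init"],
    ["acmc", "phen", "validate"],
    ["acmc", "phen", "publish"],
    ["acmc", "phen", "map", "-t", "snomed"],
    ["acmc", "phen", "copy", "-v", "v1.0.3"],
    ["acmc", "phen", "diff", "-old", "./workspace/v1.0.3/"],
    ["acmc", "phen", "publish"],
]


def run_steps(commands, dry_run=True):
    """Print each step; when dry_run is False, also execute it.

    check=True makes subprocess raise CalledProcessError on a non-zero
    exit code, so the sequence stops at the first failing step.
    """
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default: only prints the commands, nothing is executed.
    run_steps(WORKFLOW)
```

Keeping dry_run=True as the default means the script only lists the commands unless execution is requested explicitly.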

Usage

The acmc command-line tool provides various commands to interact with TRUD, OMOP, and Phenotype data. Below are the usage details for each command.

General Syntax

acmc [OPTIONS] COMMAND [SUBCOMMAND] [ARGUMENTS]

Where:

  • [OPTIONS] are global options that apply to all commands (e.g., --debug, --version).
  • [COMMAND] is the top-level command (e.g., trud, omop, phen).
  • [SUBCOMMAND] refers to the specific operation within the command (e.g., install, validate).

Global Options

  • --version: Display the acmc tool version number
  • --debug: Enable debug mode for more verbose logging.
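
To illustrate how this syntax fits together, the sketch below models a small part of it with Python's argparse module. It is an illustration only and makes no claim about how acmc is implemented. The program name acmc-sketch and the function build_parser are hypothetical, and only the trud install and phen map commands with options documented in this README are modelled.

```python
import argparse


def build_parser():
    """Build a toy parser shaped like: [OPTIONS] COMMAND [SUBCOMMAND] [ARGUMENTS]."""
    parser = argparse.ArgumentParser(prog="acmc-sketch")
    # Global option: appears before the command.
    parser.add_argument("--debug", action="store_true", help="verbose logging")

    # Top-level commands.
    commands = parser.add_subparsers(dest="command", required=True)

    # trud install
    trud = commands.add_parser("trud")
    trud_sub = trud.add_subparsers(dest="subcommand", required=True)
    trud_sub.add_parser("install")

    # phen map -t <TARGET_CODING> [-d <PHENOTYPE_DIRECTORY>]
    phen = commands.add_parser("phen")
    phen_sub = phen.add_subparsers(dest="subcommand", required=True)
    phen_map = phen_sub.add_parser("map")
    phen_map.add_argument("-t", "--target-coding")
    phen_map.add_argument("-d", "--phen-dir", default="./build/phen")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--debug", "phen", "map", "-t", "snomed"])
    print(args.command, args.subcommand, args.target_coding, args.debug)
```

The global option is parsed by the top-level parser, which is why it is written before the command in the general syntax.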

Commands

TRUD Command

The trud command is used for installing NHS TRUD vocabularies.

  • Install TRUD

    Install clinically assured TRUD medical code mappings:

    acmc trud install

OMOP Command

The omop command is used for installing OMOP vocabularies.

  • Install OMOP

    Install vocabularies in a local OMOP database:

    acmc omop install -d <OMOP_DIRECTORY_PATH> -v <OMOP_VERSION>
    • -d, --omop-dir: (Optional) Directory path to extracted OMOP downloads, default is ./build/omop
    • -v, --version: OMOP vocabularies release version.
  • Clear OMOP

    Clear data from the local OMOP database:

    acmc omop clear
  • Delete OMOP

    Delete the local OMOP database:

    acmc omop delete

PHEN Command

The phen command is used for phenotype-related operations.

  • Initialize Phenotype

    Initialize a phenotype directory locally or from a remote git repository:

    acmc phen init -d <PHENOTYPE_DIRECTORY> -r <REMOTE_URL>
    • -d, --phen-dir: (Optional) Directory to write phenotype configuration (the default is ./build/phen).
    • -r, --remote_url: (Optional) URL to a remote git repository.
  • Validate Phenotype

    Validate the phenotype configuration:

    acmc phen validate -d <PHENOTYPE_DIRECTORY>
    • -d, --phen-dir: (Optional) Directory of phenotype configuration (the default is ./build/phen).
  • Map Phenotype

    Process phenotype mapping and specify the target coding and output format:

    acmc phen map -d <PHENOTYPE_DIRECTORY> -t <TARGET_CODING> -o <OUTPUT_FORMAT>
    • -t, --target-coding: Specify the target coding (e.g., read2, read3, icd10, snomed, opcs4).
    • -d, --phen-dir: (Optional) Directory of phenotype configuration (the default is ./build/phen).
    • -o, --output: Output format(s) (csv, omop, or both), default is 'csv'.
  • Publish Phenotype Configuration

    Publish a phenotype configuration, committing all changes and tagging with a new version number. If the phenotype has been initialised from a remote git URL, then the commit and new version tag will be pushed to the remote repo:

    acmc phen publish -d <PHENOTYPE_DIRECTORY>
    • -d, --phen-dir: (Optional) Directory of phenotype configuration (the default is ./build/phen).
  • Copy Phenotype Configuration

    Copy a phenotype configuration from a source directory to a target directory at a specific version. This is useful for comparing versions of a phenotype with the acmc phen diff command:

    acmc phen copy -d <PHENOTYPE_DIRECTORY> -td <TARGET_DIRECTORY> -v <PHENOTYPE_VERSION>
    • -d, --phen-dir: (Optional) Directory of phenotype configuration (the default is ./build/phen).
    • -td, --target-dir: (Optional) Directory to copy the phenotype configuration to (the default is ./build).
    • -v, --version: The phenotype version to copy.
  • Compare Phenotype Configurations

    Compare a new phenotype version with a previous version of the phenotype:

    acmc phen diff -d <NEW_PHENOTYPE_DIRECTORY> -old <OLD_PHENOTYPE_DIRECTORY>
    • -d, --phen-dir: (Optional) Directory of current phenotype configuration (the default is ./build/phen).
    • -old, --phen-dir-old: (Required) Directory of the old phenotype version.
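
The kind of question a version comparison answers can be shown with a small, self-contained sketch that compares two code lists as sets. This is a conceptual illustration only. The output format of acmc phen diff is not described in this document, the function name diff_codes is hypothetical, and the codes used are placeholders rather than real clinical codes.

```python
def diff_codes(old_codes, new_codes):
    """Return (added, removed) codes, each as a sorted list.

    added   = codes present in the new list but not in the old one
    removed = codes present in the old list but not in the new one
    """
    old_set, new_set = set(old_codes), set(new_codes)
    return sorted(new_set - old_set), sorted(old_set - new_set)


if __name__ == "__main__":
    old = ["A01", "A02", "A03"]  # placeholder codes, not real clinical codes
    new = ["A02", "A03", "A04"]
    added, removed = diff_codes(old, new)
    print("added:", added)
    print("removed:", removed)
```

Using sets ignores ordering and duplicates, so only genuine additions and removals between the two versions are reported.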

License

MIT License

Support

To report a problem, open an issue in the repository.

Contributing

Please contact the corresponding author, Jakub Dylag, at J.J.Dylag@soton.ac.uk.

Acknowledgements

This project was developed in the context of the MELD-B project, which is funded by the UK National Institute of Health Research under grant agreement NIHR203988.

License

This work is licensed under the Apache License, Version 2.0.
