
    A Tool for Automating the Curation of Medical Concepts derived from Coding Lists (ACMC)

    Overview

    ACMC is a tool that automates the verification, translation and organisation of medical coding lists defining phenotypes for inclusion criteria in cohort analysis. By processing externally sourced clinical inclusion criteria into actionable code lists, this tool ensures consistent and efficient curation of cohort definitions. These code lists can be subsequently used by data providers to construct study cohorts.

    Phenotype Workflow

    ACMC has a five-step workflow for creating a phenotype: initialise, validate, map, publish and export.

    Step 1: Initialise

    The initialise step creates a phenotype directory within the acmc workspace. The outcome will be a directory with all required subdirectories and files (see Phenotype directory structure).

    acmc phen init

    Step 2: Validate

    The validate step checks that the phenotype configuration is valid. It verifies the configuration file against the schema and checks consistency between concept sets and source concept coding lists. The outcome will be notifications of the validity of the phenotype configuration.

    acmc phen validate

    Step 3: Map

    The map step performs code translations between source and target coding types. The outcome will be concept sets defined for the target coding types, stored in CSV files.

    acmc phen map

    Step 4: Publish

    The publish step commits the phenotype to a Git repo and increments the version number. The outcome will be a published phenotype at the next version number.

    acmc phen publish

    Step 5: Export

    The export step creates an OMOP database for the phenotype. The outcome will be an OMOP database, including concept sets for all target coding types, exported as CSV files.

    acmc phen export
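
    The five steps can also be chained from a script. The sketch below only assembles the commands documented above and runs them in order. The helper names workflow_commands and run_workflow are hypothetical, and it assumes that the acmc executable is on the PATH, that default options are acceptable for every step, and that acmc returns a non-zero exit status when a step fails.

```python
import shutil
import subprocess

# The five workflow steps, in the order described above.
STEPS = ["init", "validate", "map", "publish", "export"]


def workflow_commands(steps=STEPS):
    """Return the argument list for each `acmc phen <step>` command."""
    return [["acmc", "phen", step] for step in steps]


def run_workflow(steps=STEPS):
    """Run each step in order, stopping at the first failure."""
    if shutil.which("acmc") is None:
        raise RuntimeError("acmc executable not found on PATH")
    for argv in workflow_commands(steps):
        # check=True raises CalledProcessError on a non-zero exit status
        # (assumption: acmc signals failure through its exit status).
        subprocess.run(argv, check=True)
```

    In practice the phenotype is configured between the initialise and validate steps, so a script like this is most useful for re-running validate, map, publish and export after a change.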

    Phenotype Definition

    Phenotype directory structure

    Each phenotype has a standard directory structure that includes configuration, source concept coding lists, maps to target coding types and output concept sets.

    workspace/                          # Default workspace directory
    ├── phen/                           # Default phenotype directory
    │   ├── concepts/                   # Phenotype source concept code lists directory
    │   ├── concept-sets/               # Processed phenotype concept sets
    │   │   ├── csv/                    # Processed phenotype concept sets in ACMC CSV format
    │   │   ├── omop/                   # Processed phenotype concept sets in OMOP CDM database exported as CSV files
    │   ├── map/                        # Process mapping from source to target code types
    │   │   ├── errors/                 # Errors recorded during mapping process
    │   ├── config.yml                 # Phenotype configuration file
    │   ├── vocab_versions.yml         # Versions file for vocabularies used to generate concept sets
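
    As a quick sanity check, the layout above can be compared against a phenotype directory on disk. This is a minimal sketch, not part of ACMC: the function name missing_entries is hypothetical, and some entries (for example concept-sets/omop) may only be populated after later steps such as map or export.

```python
from pathlib import Path

# Paths relative to the phenotype directory, taken from the tree above.
EXPECTED_DIRS = [
    "concepts",
    "concept-sets",
    "concept-sets/csv",
    "concept-sets/omop",
    "map",
    "map/errors",
]
EXPECTED_FILES = ["config.yml", "vocab_versions.yml"]


def missing_entries(phen_dir):
    """Return the expected directories and files that are absent from phen_dir."""
    root = Path(phen_dir)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

    For example, missing_entries("workspace/phen") returns an empty list when every entry in the tree is present.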

    Configuration File

    Each phenotype is defined by a configuration file, config.yml, stored in the root of the phenotype directory. The file is in YAML format.

    Root Phenotype Element

    • phenotype: (object) The root element containing all phenotype-related concept sets and metadata.

    Phenotype Attributes

    • version: (string) Specifies the version of the phenotype definition.
    • omop: (object) Metadata related to OMOP vocabulary.
      • vocabulary_id: (string) Identifier for the vocabulary.
      • vocabulary_name: (string) Human-readable name of the vocabulary.
      • vocabulary_reference: (string, URL) A reference URL for the vocabulary source.

    Concept Sets

    • concept_sets: (array) A list of concept set definitions, where each item has the following attributes:
      • name: (string) Unique name of the concept set.
      • file: (object) Contains file-related metadata.
        • path: (string, file path) Relative path to the source concepts coding list file, relative to <phen_directory>/concepts
        • columns: (object) Key-value pairs mapping column names in the file to coding list types
      • category: (optional, string) A categorical identifier for processing files containing multiple concept sets.
      • actions: (optional, object) Additional transformations on data.
        • divide_col: (string) Specifies a column name in the source concept file to group on.
      • metadata: (object) Reserved for additional metadata.
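
    To make the structure concrete, the sketch below expresses the attributes above as a Python dictionary, mirroring how config.yml might look once parsed, and checks a few of the documented constraints. It rests on several assumptions: concept_sets is nested under phenotype, every attribute not marked optional is required, and the keys of columns are column names with coding list types as values. All values, including the coding type name read2, are placeholders, and check_config is a hypothetical helper rather than ACMC's own validation.

```python
# Hypothetical configuration mirroring the attributes listed above.
# All values are placeholders, not taken from a real phenotype.
config = {
    "phenotype": {
        "version": "0.0.1",
        "omop": {
            "vocabulary_id": "EXAMPLE",
            "vocabulary_name": "Example phenotype vocabulary",
            "vocabulary_reference": "https://example.org/phenotype",
        },
        "concept_sets": [
            {
                "name": "EXAMPLE_CONCEPT_SET",
                "file": {
                    # Relative to <phen_directory>/concepts
                    "path": "example/codes.csv",
                    # Column name -> coding list type (assumed direction)
                    "columns": {"code": "read2"},
                },
                "metadata": {},
            }
        ],
    }
}


def check_config(cfg):
    """Return a list of problems found against the documented structure."""
    phen = cfg.get("phenotype")
    if not isinstance(phen, dict):
        return ["missing root element 'phenotype'"]
    problems = []
    for key in ("version", "omop", "concept_sets"):
        if key not in phen:
            problems.append(f"phenotype.{key} is missing")
    concept_sets = phen.get("concept_sets", [])
    names = [cs.get("name") for cs in concept_sets]
    if len(names) != len(set(names)):
        problems.append("concept set names are not unique")
    for cs in concept_sets:
        file_info = cs.get("file", {})
        for key in ("path", "columns"):
            if key not in file_info:
                problems.append(f"{cs.get('name')}: file.{key} is missing")
    return problems
```

    The acmc phen validate command remains the authoritative check; this sketch only illustrates the shape of the data.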

    Phenotype Version Control

    ACMC uses Git to support versioning of phenotypes. Git is a version control system that tracks changes in documents such as source coding lists, coding maps or configuration files. Using Git allows ACMC to track versions and changes.

    When a phenotype is initialised, the directory is configured as a Git repository. ACMC then provides a simplified interaction with Git through a specific workflow of ACMC commands, including integration with remote Git services such as GitLab or GitHub.

    ACMC does not currently support merging contributions from multiple collaborators on a phenotype through ACMC commands. This has to be done using existing Git tools.

    Version Numbers

    ACMC uses semantic versioning to version phenotypes. Semantic versioning uses three numbers, MAJOR.MINOR.PATCH, where each number is incremented depending on the significance of the change. Although semantic versioning is designed for software, the idea of major, minor and patch changes is retained for phenotypes as follows:

    • MAJOR version when you make changes to concept sets
    • MINOR version when you make changes to coding list concepts
    • PATCH version when you make other minor changes such as documentation
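
    The increment rule can be sketched in a few lines of Python. This is an illustration rather than ACMC's implementation: the function name bump_version is hypothetical, and it assumes the usual semantic versioning convention that lower-order numbers reset to zero when a higher-order number is incremented.

```python
def bump_version(version, increment="patch"):
    """Return the next MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if increment == "major":
        return f"{major + 1}.0.0"
    if increment == "minor":
        return f"{major}.{minor + 1}.0"
    if increment == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown increment: {increment!r}")


print(bump_version("0.0.1"))           # 0.0.2
print(bump_version("0.0.1", "minor"))  # 0.1.0
print(bump_version("0.0.1", "major"))  # 1.0.0
```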

    Workflows

    ACMC supports both local and remote repositories.

    Local Workflow

    A local phenotype is only stored within a directory on a filesystem. The following command will create a git repository with the initial phenotype directory structure and make a commit to the git repository.

    acmc phen init

    You can then configure your phenotype and generate maps to other coding types as required. When you are ready to publish a version of your phenotype, run the following command:

    acmc phen publish

    This will commit the changes to the Git repository and generate a new version number. If this is the first publish, the initial version will be 0.0.1. You can tell ACMC how to increment the version using the -i argument with either major, minor or patch. The default is a patch change, i.e. incrementing the patch number. The following command creates a major release, 1.0.0:

    acmc phen publish -i major

    Remote Workflow

    A remote phenotype is stored on a central server that others can access remotely. Common services include GitHub and GitLab (public or private). You can connect your local phenotype to a remote repository during initialisation or publication. Connect to an empty repo without any previous commits: do not initialise it with a readme.md file, which is often the default. If the remote repo already has commits, you will need to resolve the conflicts manually before ACMC will work.

    To initialise a phenotype with a remote Git repository, use the following command, replacing the Git URL with the URL of your remote repo:

    acmc phen init -r https://git.soton.ac.uk/meldb/remote-phenotype.git

    If you have a local phenotype and later want to connect it to a remote repository, you can do this when you publish it:

    acmc phen publish -r https://git.soton.ac.uk/meldb/remote-phenotype.git

    Fork Remote Workflow

    If there is an existing published remote phenotype that you want to use as a starting point, you can fork the upstream repo and create a new phenotype. The following command creates a fork of the remote repo in a local directory:

    acmc fork -u https://git.soton.ac.uk/meldb/forked-phenotype.git -v 1.0.0

    If you want to fork the repo and connect the fork to a remote repo, run the following command:

    acmc fork -u https://git.soton.ac.uk/meldb/forked-phenotype.git -v 1.0.0 -r https://git.soton.ac.uk/meldb/remote-phenotype.git

    Alternatively, you can connect the remote repo later when you publish:

    acmc phen publish -r https://git.soton.ac.uk/meldb/remote-phenotype.git

    Supported Medical Coding Standards

    The tool supports verification and mapping across diagnostic coding formats below:

    Medical Code     Verification   Translation to
    Readv2           NHS TRUD       Readv3, SNOMED, ICD10, OPCS4, ATC
    Readv3 (CTV3)    NHS TRUD       Readv3, SNOMED, ICD10, OPCS4
    ICD10            NHS TRUD       None
    SNOMED           NHS TRUD       None
    OPCS4            NHS TRUD       None
    ATC              None           None

    • Read V2: NHS clinical terminology standard used in primary care and replaced by SNOMED-CT in 2018; still supported by some data providers, e.g. SAIL Databank, because it was widely used in primary care.
    • SNOMED-CT: international standard for clinical terminology in Electronic Healthcare Records, adopted by the NHS in 2018; mappings to Read codes are partially provided by the Clinical Practice Research Datalink (CPRD) and NHS Technology Reference data Update Distribution (TRUD).
    • ICD-10: International Classification of Diseases (ICD) is a medical classification list from the World Health Organization (WHO), widely used in hospital settings, e.g. Hospital Episode Statistics (HES).
    • ATC Codes: Anatomical Therapeutic Chemical (ATC) Classification is a drug classification list from the World Health Organization (WHO).
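
    The translation column of the table above can be captured as a lookup, which makes it easy to check whether a mapping is supported before configuring a phenotype. This is a sketch only: the keys are the display names from the table, not necessarily the identifiers ACMC uses internally, and can_translate is a hypothetical helper.

```python
# Source coding type -> target coding types, copied from the table above.
TRANSLATIONS = {
    "Readv2": ["Readv3", "SNOMED", "ICD10", "OPCS4", "ATC"],
    "Readv3": ["Readv3", "SNOMED", "ICD10", "OPCS4"],
    "ICD10": [],
    "SNOMED": [],
    "OPCS4": [],
    "ATC": [],
}


def can_translate(source, target):
    """Return True if the table lists target as a translation of source."""
    if source not in TRANSLATIONS:
        raise KeyError(f"unsupported coding type: {source}")
    return target in TRANSLATIONS[source]


print(can_translate("Readv2", "SNOMED"))  # True
print(can_translate("ICD10", "SNOMED"))   # False
```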

    Notes

    Processed resources will be saved in the build/maps/processed/ directory.

    Note: NHS TRUD provides one-way mappings. To reverse a mapping, duplicate the .parquet file and swap the source and target names in the filename (e.g., read2_code_to_snomed_code.parquet becomes snomed_code_to_read2_code.parquet).
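
    The renaming described in the note can be scripted. The sketch below assumes that map files follow the <source>_to_<target>.parquet naming pattern shown in the example; the function names reversed_map_name and duplicate_reversed are hypothetical. It copies the file unchanged, as the note describes, and does not alter the file contents.

```python
import shutil
from pathlib import Path


def reversed_map_name(filename):
    """Swap the source and target parts of '<source>_to_<target>.parquet'."""
    path = Path(filename)
    source, sep, target = path.stem.partition("_to_")
    if not sep or not source or not target:
        raise ValueError(f"expected '<source>_to_<target>' in {filename!r}")
    return f"{target}_to_{source}{path.suffix}"


def duplicate_reversed(map_file):
    """Copy map_file alongside itself under the reversed name; return the new path."""
    src = Path(map_file)
    dst = src.with_name(reversed_map_name(src.name))
    shutil.copyfile(src, dst)
    return dst


print(reversed_map_name("read2_code_to_snomed_code.parquet"))
# snomed_code_to_read2_code.parquet
```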