<center>
  <img src="https://git.soton.ac.uk/meldb/concepts-processing/-/raw/main/docs/img/University_of_Southampton_Logo.png" height="100" style="padding-right: 50px;" />
  <img src="https://git.soton.ac.uk/meldb/concepts-processing/-/raw/main/docs/img/swansea-university-logo-vector.png" height="100" />
</center>

# A Tool for Automating the Curation of Medical Concepts derived from Coding Lists (ACMC)

### Jakub J. Dylag <sup>1</sup>, Roberta Chiovoloni <sup>3</sup>, Ashley Akbari <sup>3</sup>, Simon D. Fraser <sup>2</sup>, Michael J. Boniface <sup>1</sup>

<sup>1</sup> Digital Health and Biomedical Engineering, School of Electronics and Computer Science, Faculty of Engineering and Physical Sciences, University of Southampton<br>
<sup>2</sup> School of Primary Care Population Sciences and Medical Education, University of Southampton <br>
<sup>3</sup> Population Data Science, Swansea University Medical School, Faculty of Medicine, Health & Life Science, Swansea University <br>

*Correspondence to: Jakub J. Dylag, Digital Health and Biomedical Engineering, School of Electronics and Computer Science, Faculty of Engineering and Physical Sciences, University of Southampton, J.J.Dylag@soton.ac.uk*

### Citation
> Dylag JJ, Chiovoloni R, Akbari A, Fraser SD, Boniface MJ. A Tool for Automating the Curation of Medical Concepts derived from Coding Lists. GitLab [Internet]. May 2024. Available from: https://git.soton.ac.uk/meldb/concepts-processing

## Introduction

This tool automates the verification, translation and organisation of medical coding lists that define phenotypes used as inclusion criteria in cohort analysis. By processing externally sourced clinical inclusion criteria into actionable code lists, it supports consistent and efficient curation of cohort definitions. Data providers can then use these code lists to construct study cohorts.

## Requirements

- Python 3.9 or higher
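
A quick way to confirm that the interpreter meets this requirement before installing is sketched below. It assumes the interpreter is available on the `PATH` as `python3`; the function name `check_python_version` is illustrative only and not part of `acmc`.

```bash
# Illustrative helper (not part of acmc): succeed only if python3 is 3.9 or newer.
check_python_version() {
  # sys.version_info compares like a tuple, so (3, 10, ...) >= (3, 9) is true.
  python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 9) else 1)'
}

if check_python_version; then
  echo "Python version OK: $(python3 --version)"
else
  echo "Python 3.9 or higher is required" >&2
fi
```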

## Installation

To install the `acmc` package, simply run:

```bash
pip install acmc
```

Once installed, you'll be ready to use the `acmc` tool along with the associated vocabularies.

## Getting Started

### Install Clinically Assured NHS TRUD Code Mappings

1. **Register at TRUD**

	Register an account at [NHS TRUD](https://isd.digital.nhs.uk/trud/user/guest/group/0/account/form).

2. **Subscribe and Accept Licenses**: Subscribe to the following data files:

   - [NHS Read Browser](https://isd.digital.nhs.uk/trud/users/guest/filters/2/categories/9/items/8/releases)
   - [NHS Data Migration](https://isd.digital.nhs.uk/trud/users/guest/filters/0/categories/8/items/9/releases)
   - [ICD10 Edition 5 XML](https://isd.digital.nhs.uk/trud/users/guest/filters/0/categories/28/items/259/releases)
   - [OPCS-4.10 Data Files](https://isd.digital.nhs.uk/trud/users/guest/filters/0/categories/10/items/119/releases)

   After subscribing, you'll receive an API key once your request is approved (usually within 24 hours).

3. **Get the TRUD API key**

	Copy your API key from [NHS TRUD Account Management](https://isd.digital.nhs.uk/trud/users/authenticated/filters/0/account/manage) and store it securely.

4. **Add the TRUD API key as an environment variable**

	Set the `ACMC_TRUD_API_KEY` environment variable as follows.

	On macOS/Linux (applies to the current shell session only):

   ```bash
   export ACMC_TRUD_API_KEY="your_api_key_here"
   ```

   On Windows (Command Prompt or PowerShell). Note that `setx` stores the variable persistently and takes effect in new terminal windows, not the current one:

   ```bash
   setx ACMC_TRUD_API_KEY "your_api_key_here"
   ```

5. **Download and Install TRUD Resources**:

	Run the following `acmc` command to download and process the TRUD resources:

   ```bash
   acmc trud install
   ```
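
Since the install step follows the setting of `ACMC_TRUD_API_KEY`, a small pre-flight check can catch a missing key early. The sketch below is for macOS/Linux shells; the function name `require_trud_key` is hypothetical and not part of `acmc`, and it only checks that the variable is non-empty, not that the key is valid.

```bash
# Hypothetical helper (not part of acmc): fail fast when the key is missing.
require_trud_key() {
  if [ -z "${ACMC_TRUD_API_KEY:-}" ]; then
    echo "ACMC_TRUD_API_KEY is not set; export it before installing TRUD resources" >&2
    return 1
  fi
  echo "ACMC_TRUD_API_KEY is set"
}

# Run the documented install command only when the key is available.
# (Left commented out so that pasting this block has no side effects.)
# require_trud_key && acmc trud install
```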

### Install OMOP Vocabularies 

1. **Register with [OHDSI Athena](https://athena.ohdsi.org/auth/login)**

2. **Download vocabularies from [OHDSI Athena](https://athena.ohdsi.org/vocabulary/list)**

	* Required vocabularies include:
	  * 1) SNOMED
	  * 2) ICD9CM
	  * 17) Readv2
	  * 21) ATC
	  * 55) OPCS4
	  * 57) HES Specialty
	  * 70) ICD10CM
	  * 75) dm+d
	  * 144) UK Biobank
	  * 154) NHS Ethnic Category
	  * 155) NHS Place of Service

	You will receive an email from `pallas@ohdsi.org` with the subject `OHDSI Standardized Vocabularies. Your download link`. It contains the vocabularies release version and a link to download a zip file of OMOP database tables in CSV format, for example:

```
Content of your package

Vocabularies release version: v20240830
acmc-omop Vocabularies:
SNOMED	-	Systematic Nomenclature of Medicine - Clinical Terms (IHTSDO)
ICD9CM	-	International Classification of Diseases, Ninth Revision, Clinical Modification, Volume 1 and 2 (NCHS)
Read	-	NHS UK Read Codes Version 2 (HSCIC)
ATC	-	WHO Anatomic Therapeutic Chemical Classification
OPCS4	-	OPCS Classification of Interventions and Procedures version 4 (NHS)
HES Specialty	-	Hospital Episode Statistics Specialty (NHS)
ICD10CM	-	International Classification of Diseases, Tenth Revision, Clinical Modification (NCHS)
dm+d	-	Dictionary of Medicines and Devices (NHS)
UK Biobank	-	UK Biobank (UK Biobank)
NHS Ethnic Category	-	NHS Ethnic Category
NHS Place of Service	-	NHS Admission Source and Discharge Destination
Installation of the OHDSI Standardized Vocabularies

Please execute the following process:

    Click on this link to download the zip file. Typical file sizes, depending on the number of vocabularies selected, are between 30 and 1500 MB.
    Unpack.
    Reconstitute CPT-4. See below for details.
    If needed, create the tables.
    Load the unpacked files into the tables.
```

	Download the OMOP zip file to your computer and note its path.

3. **Install OMOP vocabularies**

	Run the following `acmc` command to create a local OMOP database from the OMOP zip file with a specific version:

	```bash
	acmc omop install -f <path to downloaded OMOP zip file> -v <release version from email>
	```
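
The install step can be wrapped in a small function that checks the zip file exists before calling `acmc`. This is a sketch for macOS/Linux shells using the `-f` and `-v` options shown above; the function name `install_omop` and the zip path are illustrative placeholders, and `v20240830` is simply the release version from the sample email.

```bash
# Hypothetical wrapper (not part of acmc) around the documented install command.
install_omop() {
  local zip_path="$1"   # path to the zip file downloaded from Athena
  local version="$2"    # release version quoted in the Athena email

  if [ ! -f "$zip_path" ]; then
    echo "OMOP zip file not found: $zip_path" >&2
    return 1
  fi
  acmc omop install -f "$zip_path" -v "$version"
}

# Example call with placeholder values:
# install_omop "$HOME/Downloads/omop-vocabularies.zip" v20240830
```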

---

## **Example**

Follow these steps to initialize and manage a phenotype using `acmc`. This example uses a source concept code list for the concept set `Abdominal Pain` from [ClinicalCodes.org](https://clinicalcodes.org). The source concept codes are in `read2` format. We generate a versioned phenotype for `read2`, then translate it to `snomed` and publish the result as another version.

1. **Initialize a phenotype in the workspace**

	Use the following `acmc` command to initialize the phenotype in a local Git repository:

```bash
acmc phen init
```

2. **Copy example medical code lists to the phenotype codes directory**

	From the command prompt, copy the medical code lists in `/examples/codes` to the phenotype codes directory:
   - [Download `res176-abdominal-pain.csv`](./examples/codes/clinical-codes-org/Symptom%20code%20lists/Abdominal%20pain/res176-abdominal-pain.csv)
   - Alternatively, place your code lists in `./workspace/phen/codes`.

```bash
cp -r ./examples/codes/* ./workspace/phen/codes
```

3. **Copy the example phenotype configuration file to the phenotype directory**

	From the command prompt, copy the example phenotype configuration file `/examples/config.json` to the phenotype directory:
   - [Download `config.json`](./examples/config.json) 
   - Alternatively, place your own `config.json` file in `./workspace/phen`.

```bash
cp -r ./examples/config.json ./workspace/phen
```

4. **Validate the phenotype configuration**

	Use the following `acmc` command to check that the phenotype configuration is correct:

```bash
acmc phen validate
```

	**Expected Output:**

	Once the command is executed, you should see output similar to this:

```bash
[INFO] - Validating phenotype: <path>/concepts-processing/workspace/phen
[INFO] - Phenotype validated successfully
```

5. **Publish phenotype at an initial version**

	Use the following `acmc` command to publish the phenotype at an initial version:

```bash
acmc phen publish
```

6. **Generate phenotype in SNOMED code format**

	Use the following `acmc` command to generate the phenotype in `snomed` format:

```bash
acmc phen map -t snomed
```

7. **Get a copy of the previous version from the repo**

	Use the following `acmc` command to retrieve a copy of the previous version (`v1.0.3`) from the repository:

```bash
acmc phen copy -v v1.0.3
```

8. **Compare the previous version `v1.0.3` with the latest version**

	Use the following `acmc` command to compare the previous version (`v1.0.3`) with the latest version in the repository:

```bash
acmc phen diff -old ./workspace/v1.0.3/
```

9. **Publish the phenotype at the next version**

	Use the following `acmc` command to publish the phenotype at the next version:

```bash
acmc phen publish
```
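
For reference, the steps in this example can be collected into a single script. This is a sketch only: it repeats the commands exactly as they appear in the walk-through, and the `run` helper, the `example_workflow` function and the `DRY_RUN` variable are illustrative additions (not part of `acmc`) that print each command instead of executing it when `DRY_RUN=1`.

```bash
#!/usr/bin/env bash
# Illustrative helper (not part of acmc): print the command when DRY_RUN=1,
# otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# The walk-through as one function; each step runs only if the previous one succeeded.
example_workflow() {
  run acmc phen init &&                                    # initialize the phenotype
  run cp -r ./examples/codes/* ./workspace/phen/codes &&   # copy the code lists
  run cp -r ./examples/config.json ./workspace/phen &&     # copy the configuration
  run acmc phen validate &&                                # validate the configuration
  run acmc phen publish &&                                 # publish the initial version
  run acmc phen map -t snomed &&                           # map to SNOMED
  run acmc phen copy -v v1.0.3 &&                          # copy the previous version
  run acmc phen diff -old ./workspace/v1.0.3/ &&           # compare the versions
  run acmc phen publish                                    # publish the next version
}

# Preview the commands without executing anything.
DRY_RUN=1 example_workflow
```

Calling `example_workflow` without `DRY_RUN=1` executes the steps for real and stops at the first command that fails.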

## Usage

The `acmc` command-line tool provides various commands to interact with TRUD, OMOP, and Phenotype data. Below are the usage details for each command.

### General Syntax

```bash
acmc [OPTIONS] COMMAND [SUBCOMMAND] [ARGUMENTS]
```

Where:
- `[OPTIONS]` are global options that apply to all commands (e.g., `--debug`, `--version`).
- `[COMMAND]` is the top-level command (e.g., `trud`, `omop`, `phen`).
- `[SUBCOMMAND]` refers to the specific operation within the command (e.g., `install`, `validate`).

### Global Options

- `--version`: Display the `acmc` tool version number.
- `--debug`: Enable debug mode for more verbose logging.

### Commands

#### TRUD Command

The `trud` command is used for installing NHS TRUD vocabularies.

- **Install TRUD**

  Install clinically assured TRUD medical code mappings:

  ```bash
  acmc trud install
  ```

#### OMOP Command

The `omop` command is used for installing OMOP vocabularies.

- **Install OMOP**

  Install vocabularies in a local OMOP database:

  ```bash
  acmc omop install -d <OMOP_DIRECTORY_PATH> -v <OMOP_VERSION>
  ```

  - `-d`, `--omop-dir`: (Optional) Directory path to extracted OMOP downloads, default is `./build/omop`
  - `-v`, `--version`: OMOP vocabularies release version.

- **Clear OMOP**

  Clear data from the local OMOP database:

  ```bash
  acmc omop clear
  ```

- **Delete OMOP**

  Delete the local OMOP database:

  ```bash
  acmc omop delete
  ```

#### PHEN Command

The `phen` command is used for phenotype-related operations.

- **Initialize Phenotype**

  Initialize a phenotype directory locally or from a remote git repository:

  ```bash
  acmc phen init -d <PHENOTYPE_DIRECTORY> -r <REMOTE_URL>
  ```

  - `-d`, `--phen-dir`: (Optional) Directory to write phenotype configuration (the default is ./build/phen).
  - `-r`, `--remote_url`: (Optional) URL to a remote git repository.

- **Validate Phenotype**

  Validate the phenotype configuration:

  ```bash
  acmc phen validate -d <PHENOTYPE_DIRECTORY>
  ```

  - `-d`, `--phen-dir`: (Optional) Directory of phenotype configuration (the default is ./build/phen).

- **Map Phenotype**

  Process phenotype mapping and specify the target coding and output format:

  ```bash
  acmc phen map -d <PHENOTYPE_DIRECTORY> -t <TARGET_CODING> -o <OUTPUT_FORMAT>
  ```

  - `-t`, `--target-coding`: Specify the target coding (e.g., `read2`, `read3`, `icd10`, `snomed`, `opcs4`).
  - `-d`, `--phen-dir`: (Optional) Directory of phenotype configuration (the default is ./build/phen).
  - `-o`, `--output`: Output format(s) (`csv`, `omop`, or both), default is 'csv'.

- **Publish Phenotype Configuration**

  Publish a phenotype configuration, committing all changes and tagging with a new version number. If the phenotype has been initialised from a remote git URL, then the commit and new version tag will be pushed to the remote repo:

  ```bash
  acmc phen publish -d <PHENOTYPE_DIRECTORY>
  ```

  - `-d`, `--phen-dir`: (Optional) Directory of phenotype configuration (the default is ./build/phen).

- **Copy Phenotype Configuration**

  Copy a phenotype configuration from a source directory to a target directory at a specific version. Use this when you want to compare versions of a phenotype with the `acmc phen diff` command:

  ```bash
  acmc phen copy -d <PHENOTYPE_DIRECTORY> -td <TARGET_DIRECTORY> -v <PHENOTYPE_VERSION>
  ```

  - `-d`, `--phen-dir`: (Optional) Directory of phenotype configuration (the default is ./build/phen).
  - `-td`, `--target-dir`: (Optional) Directory to copy the phenotype configuration to, (the default is ./build).
  - `-v`, `--version`: The phenotype version to copy.

- **Compare Phenotype Configurations**

  Compare a new phenotype version with a previous version of the phenotype:

  ```bash
  acmc phen diff -d <NEW_PHENOTYPE_DIRECTORY> -old <OLD_PHENOTYPE_DIRECTORY>
  ```

  - `-d`, `--phen-dir`: (Optional) Directory of current phenotype configuration (the default is ./build/phen).
  - `-old`, `--phen-dir-old`: (Required) Directory of the old phenotype version.
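
The `acmc phen map` options described above can be combined in a short loop, for example to map one phenotype to several target codings in turn. This is a sketch rather than a prescribed workflow: the function name `map_targets` and the `ACMC_CMD` override are illustrative additions (not part of `acmc`), `./build/phen` is the documented default directory, and the codings are taken from the examples listed for `-t`.

```bash
# Illustrative helper (not part of acmc): map a phenotype to several target codings.
# ACMC_CMD can be overridden (e.g. ACMC_CMD=echo) to preview the commands.
map_targets() {
  local phen_dir="$1"   # phenotype directory, e.g. ./build/phen
  shift                 # remaining arguments are the target codings
  local acmc_cmd="${ACMC_CMD:-acmc}"
  local target

  for target in "$@"; do
    # Stop at the first mapping that fails.
    "$acmc_cmd" phen map -d "$phen_dir" -t "$target" -o csv || return 1
  done
}

# Example call with placeholder values:
# map_targets ./build/phen read2 snomed
```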


## License

MIT License

## Support

For issues, open an [issue in the repository](https://git.soton.ac.uk/meldb/concepts-processing/-/issues).

## Contributing

Please contact the corresponding author, Jakub Dylag, at J.J.Dylag@soton.ac.uk.

## Acknowledgements  

This project was developed in the context of the [MELD-B](https://www.southampton.ac.uk/publicpolicy/support-for-policymakers/policy-projects/Current%20projects/meld-b.page) project, which is funded by the UK [National Institute for Health and Care Research](https://www.nihr.ac.uk/) under grant agreement NIHR203988.

## License

This work is licensed under an [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).

![apache2](https://img.shields.io/github/license/saltstack/salt)