diff --git a/README.md b/README.md
index 621a30914ab520a71c120e2871763fba3ef75f25..f932ca0d77839940d6e473929b9051e94385408e 100644
--- a/README.md
+++ b/README.md
@@ -73,7 +73,7 @@ Once installed, you'll be ready to use the `acmc` tool along with the associated
 
 4. **Download and Install TRUD Resources**:
 
-   Run the following acmc command to download and process the TRUD resources:
+   Run the following `acmc` command to download and process the TRUD resources:
 
    ```bash
    acmc trud install
@@ -83,7 +83,7 @@ Once installed, you'll be ready to use the `acmc` tool along with the associated
 
 1. **Register with [OHDSI Athena](https://athena.ohdsi.org/auth/login)**
 
-2. **Request download of vocabularies from [OHDSI Athena](https://athena.ohdsi.org/vocabulary/list)**
+2. **Download vocabularies from [OHDSI Athena](https://athena.ohdsi.org/vocabulary/list)**
 
    * Required vocabularies include:
      * 1) SNOMED
@@ -127,19 +127,100 @@ Please execute the following process:
    Load the unpacked files into the tables.
    ```
 
-4. **Un-zip the downloaded OMOP files to a directory**
+   Download the OMOP file onto your computer and note the path to the file
 
-   Create a directory where you want the OMOP CSV tables to be stored, the default from the current working directory is `./build/omop`. Unzip the OMOP files into that directory
+4. **Install OMOP vocabularies**
 
-5. **Install OMOP vocabularies**
-
-   Run the following acmc command to create a local OMOP database from the download:
+   Run the following `acmc` command to create a local OMOP database from the OMOP zip file with a specific version:
 
    ```bash
-   acmc omop install -d <Directory path to extracted OMOP downloads> -v <release version from email>
+   acmc omop install -f <path to downloaded OMOP zip file> -v <release version from email>
    ```
 
+Here's a well-structured **README.md** example with the series of steps you provided:
+
+---
+
+## **Example**
+
+Follow these steps to initialize and manage a phenotype using `acmc`. In this example, we use a source concept code list for the Concept Set `Abdominal Pain` created from [ClinicalCodes.org](ClinicalCodes.org). The source concept codes are read2. We generate versioned phenotypes for read2 and then translate to snomed with another version.
+
+### **1. Initialize the Phenotype in the Workspace**
+
+Use the following `acmc` command to initialize the phenotype in a local Git repository:
+
+```bash
+acmc phen init
+```
+
+### **2. Copy Example Medical Code Lists to the Phenotype Code Directory**
+
+Copy medical code lists to the phenotype code directory:
+
+```bash
+cp -r ./examples/codes/* ./workspace/phen/codes
+```
+
+### **3. Copy the Example Phenotype Configuration Files**
+
+Copy example phenotype configuration files (`.json`) to the phenotype directory:
+
+```bash
+cp -r ./examples/config.json ./workspace/phen
+```
+
+### **4. Validate the Phenotype Configuration**
+
+Use the following `acmc` command to validate the phenotype configuration to ensure it's correct:
+
+```bash
+acmc phen validate
+```
+
+### **5. Generate Phenotype in Read2 Format**
+
+Use the following `acmc` command to generate the phenotype in `read2` format:
 
-### Running an example workflow
+```bash
+acmc phen map -t read2
+```
+
+### **6. Publish the Phenotype at an Initial Version**
+
+Use the following `acmc` command to publish the phenotype at an initial version:
+
+```bash
+acmc phen publish
+```
+
+### **7. Generate Phenotype in SNOMED Format**
+
+Generate the phenotype in `SNOMED` format:
+
+```bash
+acmc phen map -t snomed
+```
+
+### **8. Get a Copy of the Previous Version in the Repo**
+
+Retrieve a copy of the previous version (`v1.0.3`) from the repository:
+
+```bash
+acmc phen copy -v v1.0.3
+```
+
+### **9. Compare the Previous Version `v1.0.3` with the Latest Version**
+Compare the previous version (`v1.0.3`) with the latest version in the repository:
+```bash
+python acmc.py phen diff -old ./workspace/v1.0.3/
+```
+
+### **10. Publish the Phenotype at the Next Version**
+
+Use the following `acmc` command to publish the phenotype at the next version:
+
+```bash
+acmc phen publish
+```
 
 ## Usage
diff --git a/acmc/main.py b/acmc/main.py
index 6048c451d83d2921b95d6732bebe848713bc03b7..72a055f323d0c9e099f02cad2b04351e334e5e30 100644
--- a/acmc/main.py
+++ b/acmc/main.py
@@ -16,7 +16,7 @@ def trud_install(args):
 
 def omop_install(args):
     """Handle the `omop install` command."""
-    omop.install(args.omop_dir, args.version)
+    omop.install(args.omop_zip_file, args.version)
 
 def omop_clear(args):
     """Handle the `omop clear` command."""
@@ -82,8 +82,14 @@ def main():
 
     # omop install
     omop_install_parser = omop_subparsers.add_parser("install", help="Install OMOP codes within database")
-    omop_install_parser.add_argument("-d", "--omop-dir", type=str, default=str(omop.VOCAB_PATH.resolve()), help="Directory path to extracted OMOP downloads")
-    omop_install_parser.add_argument("-v", "--version", required=True, help="OMOP vocabularies release version")
+    omop_install_parser.add_argument("-f",
+            "--omop-zip-file",
+            required=True,
+            help="Path to downloaded OMOP zip file")
+    omop_install_parser.add_argument("-v",
+            "--version",
+            required=True,
+            help="OMOP vocabularies release version")
     omop_install_parser.set_defaults(func=omop_install)
 
     # omop clear
@@ -100,13 +106,23 @@ def main():
 
     # phen init
     phen_init_parser = phen_subparsers.add_parser("init", help="Initiatise phenotype directory")
-    phen_init_parser.add_argument("-d", "--phen-dir", type=str, default=str(phen.DEFAULT_PHEN_PATH.resolve()), help="Phenotype directory")
-    phen_init_parser.add_argument("-r", "--remote_url", help="URL to remote git repository")
+    phen_init_parser.add_argument("-d",
+            "--phen-dir",
+            type=str,
+            default=str(phen.DEFAULT_PHEN_PATH.resolve()),
+            help="Phenotype workspace directory")
+    phen_init_parser.add_argument("-r",
+            "--remote_url",
+            help="URL to remote git repository")
     phen_init_parser.set_defaults(func=phen_init)
 
     # phen validate
     phen_validate_parser = phen_subparsers.add_parser("validate", help="Validate phenotype configuration")
-    phen_validate_parser.add_argument("-d", "--phen-dir", type=str, default=str(phen.DEFAULT_PHEN_PATH.resolve()), help="Phenotype directory")
+    phen_validate_parser.add_argument("-d",
+            "--phen-dir",
+            type=str,
+            default=str(phen.DEFAULT_PHEN_PATH.resolve()),
+            help="Phenotype workspace directory")
     phen_validate_parser.set_defaults(func=phen_validate)
 
     # phen map
@@ -115,7 +131,7 @@ def main():
             "--phen-dir",
             type=str,
             default=str(phen.DEFAULT_PHEN_PATH.resolve()),
-            help="Phenotype directory")
+            help="Phenotype workspace directory")
     phen_map_parser.add_argument("-t",
             "--target-coding",
             required=True,
@@ -135,7 +151,7 @@ def main():
             "--phen-dir",
             type=str,
             default=str(phen.DEFAULT_PHEN_PATH.resolve()),
-            help="Phenotype directory")
+            help="Phenotype workspace directory")
     phen_publish_parser.set_defaults(func=phen_publish)
 
     # phen copy
@@ -144,7 +160,7 @@ def main():
             "--phen-dir",
             type=str,
             default=str(phen.DEFAULT_PHEN_PATH.resolve()),
-            help="Phenotype directory")
+            help="Phenotype workspace directory")
     phen_copy_parser.add_argument("-td",
             "--target-dir",
             type=str,
@@ -163,11 +179,11 @@ def main():
             "--phen-dir",
             type=str,
             default=str(phen.DEFAULT_PHEN_PATH.resolve()),
-            help="The directory for the new phenotype version")
+            help="Directory for the new phenotype version")
     phen_diff_parser.add_argument("-old",
             "--phen-dir-old",
             required=True,
-            help="The directory of the old phenotype version that is compared to the new one")
+            help="Directory of the old phenotype version that is compared to the new one")
     phen_diff_parser.set_defaults(func=phen_diff)
 
     # Parse arguments
diff --git a/acmc/omop.py b/acmc/omop.py
index e5d828a9dd0d8af63ba822b2e28c16cd67abd5e4..fb40413801822dc2b5dcc13e9c02a38b9f052249 100644
--- a/acmc/omop.py
+++ b/acmc/omop.py
@@ -4,8 +4,10 @@ import sqlite3
 import pandas as pd
 import json
 import logging
+import zipfile
 from pathlib import Path
+
 from acmc import logging_config
 
 # setup logging
@@ -34,29 +36,47 @@ vocabularies = {
         { "id": 154, "name": "NHS Ethnic Category"},
         { "id": 155, "name": "NHS Place of Service"}
     ],
-    "model": []
+    "tables": []
 }
 
 #Populate SQLite3 Database with default OMOP CONCEPTS
-def install (omop_install_folder, version):
+def install (omop_zip_file, version):
     """Installs the OMOP release csv files in a file-based sql database"""
-    logger.info(f"Installing OMOP database from {omop_install_folder}")
+    logger.info(f"Installing OMOP downloads {omop_zip_file}")
+    omop_zip_path = Path(omop_zip_file)
 
-    # check folder for omop install files is a directory
-    omop_install_path = Path(omop_install_folder)
-    if not omop_install_path.is_dir():
-        raise NotADirectoryError(f"Error: '{omop_install_path}' for OMOP installation files is not a directory")
+    # Check if the file exists and is a ZIP file
+    if not omop_zip_path.exists():
+        msg = f"{omop_zip_path} does not exist."
+        logger.error(msg)
+        raise ValueError(msg)
+    if not zipfile.is_zipfile(omop_zip_path):
+        msg = f"Error: {omop_zip_path} is not a valid ZIP file."
+        logger.error(msg)
+        raise ValueError(msg)
+
     # check codes directory exists and if not create it
     if not VOCAB_PATH.exists():
         VOCAB_PATH.mkdir(parents=True)
-        logger.debug(f"OMOP directory '{VOCAB_PATH}' created.")
-
+        logger.debug(f"OMOP directory '{VOCAB_PATH}' created.")
+    else:
+        # removing existing OMOP files
+        csv_files = list(VOCAB_PATH.glob("*.csv"))
+        for file in csv_files:
+            file.unlink()
+            logger.debug(f"Deleted OMOP csv file: {file}")
+
+    # Extract ZIP contents
+    with zipfile.ZipFile(omop_zip_path, 'r') as zip_ref:
+        zip_ref.extractall(VOCAB_PATH)
+        logger.info(f"Extracted OMOP zip file {omop_zip_path} to {VOCAB_PATH}/")
+
     # connect to database, if it does not exist it will be created
     conn = sqlite3.connect(DB_PATH)
     # Iterate through files in the folder
-    for filename in os.listdir(omop_install_folder):
+    for filename in os.listdir(VOCAB_PATH):
         if filename.endswith(".csv"): # Check if the file is a CSV
-            file_path = os.path.join(omop_install_folder, filename)
+            file_path = os.path.join(VOCAB_PATH, filename)
             try:
                 logger.info(f"Reading table: {file_path}")
                 # read the CSV file with the specified delimiter
@@ -67,7 +87,7 @@ def install (omop_install_folder, version):
                 df.to_sql(table_name, conn, if_exists='replace', index=False)
 
                 # add to the metadata
-                vocabularies["model"].append(filename)
+                vocabularies["tables"].append(filename)
             except Exception as e:
                 raise Exception(f"Error reading file {file_path}: {e}")
     conn.close()
diff --git a/docs/index.md b/docs/index.md
new file mode 100644
index 0000000000000000000000000000000000000000..d5c162077744e251cb97815d2b45b5a8746ef946
--- /dev/null
+++ b/docs/index.md
@@ -0,0 +1,37 @@
+## Overview
+
+### Workflow
+
+The high-level steps to use the tools are outlined below:
+
+**1. Define concept sets:** A domain expert defines a list of [concept sets](#defining-concept-sets) for each observable characteristic of the phenotype using CSV file format (e.g., `PHEN_concept_sets.csv`).
+
+**2. Define concept code lists for concept sets:** A domain expert defines [code lists](#defining-concept-codes) for each concept set within the phenotype using supported coding list formats and stores them in the `/src` directory.
+
+**3. Define mapping from code lists to concept sets:** A domain expert defines a [phenotype mapping](#mapping-codes-to-concept-sets) that maps code lists to concept sets.
+
+**4. Generate versioned phenotype coding lists and translations:** A domain expert or data engineer processes the phenotype mappings [using the command line tool](#usage) to validate against NHS TRUD-registered codes and mappings and to generate versioned concept set code lists with translations between coding standards.
+
+### Supported Medical Coding Standards
+
+The tool supports verification and mapping across the diagnostic coding formats below:
+
+| Medical Code  | Verification | Translation to                    |
+|---------------|--------------|-----------------------------------|
+| Readv2        | NHS TRUD     | Readv3, SNOMED, ICD10, OPCS4, ATC |
+| Readv3 (CTV3) | NHS TRUD     | Readv3, SNOMED, ICD10, OPCS4      |
+| ICD10         | NHS TRUD     | None                              |
+| SNOMED        | NHS TRUD     | None                              |
+| OPCS4         | NHS TRUD     | None                              |
+| ATC           | None         | None                              |
+
+- [**Read V2:**](https://digital.nhs.uk/services/terminology-and-classifications/read-codes) NHS clinical terminology standard used in primary care and replaced by SNOMED-CT in 2018; still supported by some data providers because it is widely used in primary care, e.g. [SAIL Databank](https://saildatabank.com/)
+- [**SNOMED-CT:**](https://icd.who.int/browse10/2019/en) international standard for clinical terminology for Electronic Healthcare Records adopted by the NHS in 2018; mappings to Read codes are partially provided by [Clinical Practice Research Datalink (CPRD)](https://www.cprd.com/) and [NHS Technology Reference Update Distribution (TRUD)](https://isd.digital.nhs.uk/trud).
+- [**ICD-10:**](https://icd.who.int/browse10/2019/en) International Classification of Diseases (ICD) is a medical classification list from the World Health Organization (WHO) and is widely used in hospital settings, e.g. Hospital Episode Statistics (HES).
+- [**ATC Codes:**](https://www.who.int/tools/atc-ddd-toolkit/atc-classification) Anatomical Therapeutic Chemical (ATC) Classification is a drug classification list from the World Health Organization (WHO).
+
+## Notes
+
+ Processed resources will be saved in the `build/maps/processed/` directory.
+
+*Note: NHS TRUD provides one-way mappings. To reverse mappings, duplicate the `.parquet` file and reverse the filename (e.g., `read2_code_to_snomed_code.parquet` to `snomed_code_to_read2_code.parquet`).*
\ No newline at end of file
diff --git a/examples/config1.json b/examples/config.json
similarity index 89%
rename from examples/config1.json
rename to examples/config.json
index 00b8ba299f149583166cc0815c8dfdf1a9a3dd81..2acfeae968ed0abe9330a98f763247e8bb1fbaaa 100644
--- a/examples/config1.json
+++ b/examples/config.json
@@ -4,7 +4,7 @@
     "omop": {
         "vocabulary_id": "ACMC_Example",
         "vocabulary_name": "ACMC example phenotype",
-        "vocabulary_reference": "https://www.it-innovation.soton.ac.uk/projects/meldb/concept-processing/example"
+        "vocabulary_reference": "https://git.soton.ac.uk/meldb/concepts-processing/-/tree/main/examples"
     },
     "concept_set": [
         {