added usage example to readme

Commit ed81d814 authored 3 months ago by mjbonifa
--- a/README.md
+++ b/README.md
@@ -73,7 +73,7 @@ Once installed, you'll be ready to use the `acmc` tool along with the associated
 4. **Download and Install TRUD Resources**:
-	Run the following acmc command to download and process the TRUD resources:
+	Run the following `acmc` command to download and process the TRUD resources:
   ```bash
   acmc trud install
@@ -83,7 +83,7 @@ Once installed, you'll be ready to use the `acmc` tool along with the associated
 1. **Register with [OHDSI Athena](https://athena.ohdsi.org/auth/login)**
-2. **Request download of vocabularies from [OHDSI Athena](https://athena.ohdsi.org/vocabulary/list)**
+2. **Download vocabularies from [OHDSI Athena](https://athena.ohdsi.org/vocabulary/list)**
 	* Required vocabularies include:
 	  * 1) SNOMED
@@ -127,19 +127,100 @@ Please execute the following process:
    Load the unpacked files into the tables.
 ```
-4. **Un-zip the downloaded OMOP files to a directory**
+	Download the OMOP file onto your computer and note the path to the file
-	Create a directory where you want the OMOP CSV tables to be stored, the default from the current working directory is `./build/omop`. Unzip the OMOP files into that directory
+4. **Install OMOP vocabularies**
-5. **Install OMOP vocabularies**
+	Run the following `acmc` command to create a local OMOP database from the OMOP zip file with a specific version:
-	Run the following acmc command to create a local OMOP database from the download:
 	```bash
-	acmc omop install -d <Directory path to extracted OMOP downloads> -v <release version from email>
+	acmc omop install -f <path to downloaded OMOP zip file> -v <release version from email>
 	```
+---
+## **Example**
+Follow these steps to initialize and manage a phenotype using `acmc`. In this example, we use a source concept code list for the Concept Set `Abdominal Pain` created from [ClinicalCodes.org](ClinicalCodes.org). The source concept codes are in read2 format. We generate a versioned phenotype for read2 and then translate it to snomed as another version.  
+### **1. Initialize the Phenotype in the Workspace**
+Use the following `acmc` command to initialize the phenotype in a local Git repository:
+```bash
+acmc phen init
+```
+### **2. Copy Example Medical Code Lists to the Phenotype Code Directory**
+Copy medical code lists to the phenotype code directory:
+```bash
+cp -r ./examples/codes/* ./workspace/phen/codes
+```
+### **3. Copy the Example Phenotype Configuration Files**
+Copy example phenotype configuration files (`.json`) to the phenotype directory:
+```bash
+cp -r ./examples/config.json ./workspace/phen
+```
+### **4. Validate the Phenotype Configuration**
+Use the following `acmc` command to validate the phenotype configuration and ensure it is correct:
+```bash
+acmc phen validate
+```
+### **5. Generate Phenotype in Read2 Format**
+Use the following `acmc` command to generate the phenotype in `read2` format:
-### Running an example workflow
+```bash
+acmc phen map -t read2
+```
+### **6. Publish the Phenotype at an Initial Version**
+Use the following `acmc` command to publish the phenotype at an initial version:
+```bash
+acmc phen publish
+```
+### **7. Generate Phenotype in SNOMED Format**
+Generate the phenotype in `SNOMED` format:
+```bash
+acmc phen map -t snomed
+```
+### **8. Get a Copy of the Previous Version in the Repo**
+Retrieve a copy of the previous version (`v1.0.3`) from the repository:
+```bash
+acmc phen copy -v v1.0.3
+```
+### **9. Compare the Previous Version `v1.0.3` with the Latest Version**
+Compare the previous version (`v1.0.3`) with the latest version in the repository:
+```bash
+acmc phen diff -old ./workspace/v1.0.3/
+```
+### **10. Publish the Phenotype at the Next Version**
+Use the following `acmc` command to publish the phenotype at the next version:
+```bash
+acmc phen publish
+```
 ## Usage

--- a/acmc/main.py
+++ b/acmc/main.py
@@ -16,7 +16,7 @@ def trud_install(args):
 def omop_install(args):
    """Handle the `omop install` command."""
-    omop.install(args.omop_dir, args.version)
+    omop.install(args.omop_zip_file, args.version)
 def omop_clear(args):
    """Handle the `omop clear` command."""
@@ -82,8 +82,14 @@ def main():
 	# omop install
 	omop_install_parser = omop_subparsers.add_parser("install", help="Install OMOP codes within database")
-	omop_install_parser.add_argument("-d", "--omop-dir", type=str, default=str(omop.VOCAB_PATH.resolve()), help="Directory path to extracted OMOP downloads")
+	omop_install_parser.add_argument("-f",
-	omop_install_parser.add_argument("-v", "--version", required=True, help="OMOP vocabularies release version")	
+									 "--omop-zip-file",
+									 required=True,
+									 help="Path to downloaded OMOP zip file")
+	omop_install_parser.add_argument("-v",
+									 "--version",
+									 required=True,
+									 help="OMOP vocabularies release version")	
 	omop_install_parser.set_defaults(func=omop_install)
 	# omop clear
@@ -100,13 +106,23 @@ def main():
 	# phen init
 	phen_init_parser = phen_subparsers.add_parser("init", help="Initialise phenotype directory")
-	phen_init_parser.add_argument("-d", "--phen-dir", type=str, default=str(phen.DEFAULT_PHEN_PATH.resolve()), help="Phenotype directory")
+	phen_init_parser.add_argument("-d",
-	phen_init_parser.add_argument("-r", "--remote_url", help="URL to remote git repository")	
+								  "--phen-dir",
+								  type=str,
+								  default=str(phen.DEFAULT_PHEN_PATH.resolve()),
+								  help="Phenotype workspace directory")
+	phen_init_parser.add_argument("-r",
+								  "--remote_url",
+								  help="URL to remote git repository")	
 	phen_init_parser.set_defaults(func=phen_init)
 	# phen validate
 	phen_validate_parser = phen_subparsers.add_parser("validate", help="Validate phenotype configuration")
-	phen_validate_parser.add_argument("-d", "--phen-dir", type=str, default=str(phen.DEFAULT_PHEN_PATH.resolve()), help="Phenotype directory")
+	phen_validate_parser.add_argument("-d",
+									  "--phen-dir",
+									  type=str,
+									  default=str(phen.DEFAULT_PHEN_PATH.resolve()),
+									  help="Phenotype workspace directory")
 	phen_validate_parser.set_defaults(func=phen_validate)
 	# phen map
@@ -115,7 +131,7 @@ def main():
 								 "--phen-dir",
 								 type=str,
 								 default=str(phen.DEFAULT_PHEN_PATH.resolve()),
-								 help="Phenotype directory")
+								 help="Phenotype workspace directory")
 	phen_map_parser.add_argument("-t",
 								 "--target-coding",
 								 required=True,
@@ -135,7 +151,7 @@ def main():
 									 "--phen-dir",
 									 type=str,
 									 default=str(phen.DEFAULT_PHEN_PATH.resolve()),
-									 help="Phenotype directory")
+									 help="Phenotype workspace directory")
 	phen_publish_parser.set_defaults(func=phen_publish)
 	# phen copy
@@ -144,7 +160,7 @@ def main():
 								  "--phen-dir",
 								  type=str,
 								  default=str(phen.DEFAULT_PHEN_PATH.resolve()),
-								  help="Phenotype directory")
+								  help="Phenotype workspace directory")
 	phen_copy_parser.add_argument("-td",
 								  "--target-dir",
 								  type=str,
@@ -163,11 +179,11 @@ def main():
 								  "--phen-dir",
 								  type=str,
 								  default=str(phen.DEFAULT_PHEN_PATH.resolve()),
-								  help="The directory for the new phenotype version")
+								  help="Directory for the new phenotype version")
 	phen_diff_parser.add_argument("-old",
 								  "--phen-dir-old",
 								  required=True,
-								  help="The directory of the old phenotype version that is compared to the new one")	
+								  help="Directory of the old phenotype version that is compared to the new one")	
 	phen_diff_parser.set_defaults(func=phen_diff)	
 	# Parse arguments
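
Review note: the switch from `-d/--omop-dir` to a required `-f/--omop-zip-file` can be checked in isolation. The following is a minimal sketch, not the project's `main.py`: `build_parser` is a hypothetical helper, and the top-level parser layout, the file name `omop.zip`, and the version string `v1` are assumptions made for illustration. Only the two `omop install` options are taken from the hunk above.

```python
import argparse


def build_parser():
    """Build a small parser that mirrors the `omop install` options in this diff.

    Hypothetical helper: the surrounding parser layout is an assumption.
    """
    parser = argparse.ArgumentParser(prog="acmc")
    subparsers = parser.add_subparsers(dest="command", required=True)

    omop_parser = subparsers.add_parser("omop", help="OMOP commands")
    omop_subparsers = omop_parser.add_subparsers(dest="subcommand", required=True)

    install = omop_subparsers.add_parser("install", help="Install OMOP codes within database")
    # Both options are required, as in the changed argument definitions.
    install.add_argument("-f", "--omop-zip-file", required=True,
                         help="Path to downloaded OMOP zip file")
    install.add_argument("-v", "--version", required=True,
                         help="OMOP vocabularies release version")
    return parser


if __name__ == "__main__":
    # argparse turns `--omop-zip-file` into the attribute `omop_zip_file`,
    # which is the name `omop_install` reads after this change.
    args = build_parser().parse_args(["omop", "install", "-f", "omop.zip", "-v", "v1"])
    print(args.omop_zip_file, args.version)
```

Because `-f` no longer has a default, calling `omop install` without it exits with a usage error instead of falling back to a default directory.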

--- a/acmc/omop.py
+++ b/acmc/omop.py
@@ -4,8 +4,10 @@ import sqlite3
 import pandas as pd
 import json
 import logging
+import zipfile
 from pathlib import Path
 from acmc import logging_config
 # setup logging
@@ -34,29 +36,47 @@ vocabularies = {
 		{ "id": 154, "name": "NHS Ethnic Category"},
 		{ "id": 155, "name": "NHS Place of Service"}
 	],
-	"model": []
+	"tables": []
 }
 #Populate SQLite3 Database with default OMOP CONCEPTS 
-def install (omop_install_folder, version):
+def install (omop_zip_file, version):
 	"""Installs the OMOP release csv files in a file-based sql database"""
-	logger.info(f"Installing OMOP database from {omop_install_folder}")
+	logger.info(f"Installing OMOP downloads {omop_zip_file}")
+	omop_zip_path = Path(omop_zip_file)
-	# check folder for omop install files is a directory
+	# Check if the file exists and is a ZIP file
-	omop_install_path = Path(omop_install_folder) 
+	if not omop_zip_path.exists():
-	if not omop_install_path.is_dir():
+		msg = f"{omop_zip_path} does not exist."
-		raise NotADirectoryError(f"Error: '{omop_install_path}' for OMOP installation files is not a directory")    
+		logger.error(msg)
+		raise ValueError(msg)
+	if not zipfile.is_zipfile(omop_zip_path):
+		msg = f"Error: {omop_zip_path} is not a valid ZIP file."
+		logger.error(msg)
+		raise ValueError(msg)
 	# check codes directory exists and if not create it
 	if not VOCAB_PATH.exists():  
 		VOCAB_PATH.mkdir(parents=True)
-		logger.debug(f"OMOP directory '{VOCAB_PATH}' created.")    
+		logger.debug(f"OMOP directory '{VOCAB_PATH}' created.")
+	else:
+		# removing existing OMOP files
+		csv_files = list(VOCAB_PATH.glob("*.csv"))
+		for file in csv_files:
+			file.unlink()  
+			logger.debug(f"Deleted OMOP csv file: {file}")
+	# Extract ZIP contents
+	with zipfile.ZipFile(omop_zip_path, 'r') as zip_ref:
+		zip_ref.extractall(VOCAB_PATH)
+		logger.info(f"Extracted OMOP zip file {omop_zip_path} to {VOCAB_PATH}/")
 	# connect to database, if it does not exist it will be created
 	conn = sqlite3.connect(DB_PATH)    
 	# Iterate through files in the folder
-	for filename in os.listdir(omop_install_folder):
+	for filename in os.listdir(VOCAB_PATH):
 		if filename.endswith(".csv"):  # Check if the file is a CSV
-			file_path = os.path.join(omop_install_folder, filename)
+			file_path = os.path.join(VOCAB_PATH, filename)
 			try:
 				logger.info(f"Reading table: {file_path}")
 				# read the CSV file with the specified delimiter
@@ -67,7 +87,7 @@ def install (omop_install_folder, version):
 				df.to_sql(table_name, conn, if_exists='replace', index=False)
 				# add to the metadata
-				vocabularies["model"].append(filename)
+				vocabularies["tables"].append(filename)
 			except Exception as e:
 				raise Exception(f"Error reading file {file_path}: {e}")
 	conn.close()
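
Review note: the validate, clean, and extract sequence added to `install` can be exercised without a database. The sketch below is an illustration under stated assumptions, not the code in `omop.py`: `extract_omop_zip` is a hypothetical function, the target directory is passed in rather than read from `VOCAB_PATH`, and a missing file raises `FileNotFoundError` here whereas the diff raises `ValueError`.

```python
import zipfile
from pathlib import Path


def extract_omop_zip(zip_file, target_dir):
    """Validate a zip archive and extract it into target_dir.

    Hypothetical helper. Existing CSV files in target_dir are deleted first so
    that tables from an earlier release cannot be mixed with the new ones.
    Returns the sorted names of the CSV files present after extraction.
    """
    zip_path = Path(zip_file)
    target = Path(target_dir)

    if not zip_path.exists():
        # Assumption: a more specific exception than the ValueError in the diff.
        raise FileNotFoundError(f"{zip_path} does not exist.")
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"{zip_path} is not a valid ZIP file.")

    # Create the directory if needed, then remove stale CSV files.
    target.mkdir(parents=True, exist_ok=True)
    for stale in target.glob("*.csv"):
        stale.unlink()

    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(target)

    return sorted(p.name for p in target.glob("*.csv"))
```

Only top-level `*.csv` files are removed and listed, which matches the `glob("*.csv")` call in the diff; CSV files nested in subdirectories of the archive would not be picked up by either version.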

--- a/docs/index.md
+++ b/docs/index.md
+## Overview
+### Workflow
+The high-level steps to use the tools are outlined below:
+**1. Define concept sets:** A domain expert defines a list of [concept sets](#defining-concept-sets) for each observable characteristic of the phenotype using CSV file format (e.g., `PHEN_concept_sets.csv`).
+**2. Define concept code lists for concept sets:** A domain expert defines [code lists](#defining-concept-codes) for each concept set within the phenotype using supported coding list formats and stores them in the `/src` directory.
+**3. Define mapping from code lists to concept sets:** A domain expert defines a [phenotype mapping](#mapping-codes-to-concept-sets) that maps code lists to concept sets.
+**4. Generate versioned phenotype coding lists and translations:** A domain expert or data engineer processes the phenotype mappings [using the command line tool](#usage) to validate them against NHS TRUD-registered codes and mappings, and to generate versioned concept set code lists with translations between coding standards. 
+### Supported Medical Coding Standards
+The tool supports verification and mapping across the diagnostic coding formats below:
+| Medical Code  | Verification | Translation to                    |
+|---------------|--------------|-----------------------------------|
+| Readv2        | NHS TRUD     | Readv3, SNOMED, ICD10, OPCS4, ATC |
+| Readv3 (CTV3) | NHS TRUD     | Readv3, SNOMED, ICD10, OPCS4      |
+| ICD10         | NHS TRUD     | None                              |
+| SNOMED        | NHS TRUD     | None                              |
+| OPCS4         | NHS TRUD     | None                              |
+| ATC           | None         | None                              |
+- [**Read V2:**](https://digital.nhs.uk/services/terminology-and-classifications/read-codes) NHS clinical terminology standard used in primary care and replaced by SNOMED-CT in 2018; still supported by some data providers because it is widely used in primary care, e.g. [SAIL Databank](https://saildatabank.com/).
+- [**SNOMED-CT:**](https://icd.who.int/browse10/2019/en) international standard for clinical terminology for Electronic Healthcare Records adopted by the NHS in 2018; mappings to Read codes are partially provided by [Clinical Practice Research Datalink (CPRD)](https://www.cprd.com/) and [NHS Technology Reference Update Distribution (TRUD)](https://isd.digital.nhs.uk/trud).
+- [**ICD-10:**](https://icd.who.int/browse10/2019/en) International Classification of Diseases (ICD) is a medical classification list from the World Health Organization (WHO) and widely used in hospital settings, e.g. Hospital Episode Statistics (HES).
+- [**ATC Codes:**](https://www.who.int/tools/atc-ddd-toolkit/atc-classification) Anatomical Therapeutic Chemical (ATC) Classification is a drug classification list from the World Health Organization (WHO).
+## Notes
+   Processed resources will be saved in the `build/maps/processed/` directory.
+*Note: NHS TRUD provides one-way mappings. To reverse mappings, duplicate the `.parquet` file and reverse the filename (e.g., `read2_code_to_snomed_code.parquet` to `snomed_code_to_read2_code.parquet`).*
\ No newline at end of file
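
Review note: the reverse-mapping note in `docs/index.md` describes a manual copy and rename. A minimal sketch of that step follows, assuming mapping files are named `<source>_to_<target>.parquet` as in the note's example. `reversed_mapping_name` and `duplicate_reversed` are hypothetical helpers and not part of `acmc`. The sketch copies the file byte for byte and does not touch the columns inside it, so whether the tool accepts the duplicated file depends on how it reads the columns, which this diff does not show.

```python
import shutil
from pathlib import Path


def reversed_mapping_name(filename):
    """Swap the two sides of a `<source>_to_<target>` mapping file name."""
    path = Path(filename)
    # Split on the first "_to_" so that names such as "read2_code" stay intact.
    source, sep, target = path.stem.partition("_to_")
    if not sep or not source or not target:
        raise ValueError(f"'{filename}' is not of the form <source>_to_<target>")
    return f"{target}_to_{source}{path.suffix}"


def duplicate_reversed(mapping_file):
    """Copy a mapping file next to itself under the reversed name.

    The contents are copied unchanged; only the file name is reversed.
    """
    src = Path(mapping_file)
    dst = src.with_name(reversed_mapping_name(src.name))
    shutil.copyfile(src, dst)
    return dst


if __name__ == "__main__":
    print(reversed_mapping_name("read2_code_to_snomed_code.parquet"))
```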
--- a/examples/config1.json
+++ b/examples/config1.json
@@ -4,7 +4,7 @@
        "omop": {
            "vocabulary_id": "ACMC_Example",
            "vocabulary_name": "ACMC example phenotype",
-            "vocabulary_reference": "https://www.it-innovation.soton.ac.uk/projects/meldb/concept-processing/example"
+            "vocabulary_reference": "https://git.soton.ac.uk/meldb/concepts-processing/-/tree/main/examples"
        },
        "concept_set": [
            {