Commit d35aa813 authored by mjbonifa

moved image

parent 50a29145
# A Tool for Automating the Curation of Medical Concepts derived from Coding Lists (ACMC)
<center>
<img src="docs/img/University_of_Southampton_Logo.png" height="100" style="padding-right: 50px;" />
<img src="docs/img/swansea-university-logo-vector.png" height="100" />
</center>
### Jakub J. Dylag <sup>1</sup>, Roberta Chiovoloni <sup>3</sup>, Ashley Akbari <sup>3</sup>, Simon D. Fraser <sup>2</sup>, Michael J. Boniface <sup>1</sup>
<sup>1</sup> Digital Health and Biomedical Engineering, School of Electronics and Computer Science, Faculty of Engineering and Physical Sciences, University of Southampton<br>
@@ -139,141 +139,7 @@ Please execute the following process:
```
acmc omop install -d <Directory path to extracted OMOP downloads> -v <release version from email>
```
### Running an example workflow

## Defining phenotypes
Phenotypes are defined in a JSON configuration file. The file describes how source **concept codes** (i.e. a code list) are mapped to the collection of **concept sets** included in the phenotype.
* **concept_sets**: defines the collection of observable characteristics of the phenotype. See the Observational Health Data Sciences and Informatics (OHDSI) definition of a [Concept Set](https://github.com/OHDSI/Atlas/wiki/Concept-Sets).
* **codes**: defines lists of source concept codes associated with a specific concept set, together with declarative settings (e.g. file paths, column names, code types, actions) for processing the source concept code files. See the OMOP Common Data Model definition of [Concept Codes](https://ohdsi.github.io/TheBookOfOhdsi/StandardizedVocabularies.html#concept-codes).
An example concept set and code list for Abdominal Pain is shown below:
```json
{
  "concept_sets": {
    "version": "3.2.10",
    "omop": {
      "vocabulary_id": "MELDB",
      "vocabulary_name": "Multidisciplinary Ecosystem to study Lifecourse Determinants and Prevention of Early-onset Burdensome Multimorbidity",
      "vocabulary_reference": "https://www.it-innovation.soton.ac.uk/projects/meldb"
    },
    "concept_set": [
      {
        "concept_set_name": "ABDO_PAIN",
        "concept_set_status": "AGREED",
        "metadata": {
          "#": "18",
          "CONCEPT DESCRIPTION": "Abdominal pain",
          "CONCEPT TYPE": "Workload indicator (symptom)",
          "DATE ADDED ": "2023-08-25",
          "REQUEST REASON ": "Clinician SF - requested by email - symptom example from Qualitative Evidence Synthesis",
          "SOURCE INFO": "YES",
          "FUNCTION": "QUERY BY CODING LIST",
          "FUNCTION.1": "https://clinicalcodes.rss.mhs.man.ac.uk/",
          "CODING LIST": "https://git.soton.ac.uk/meld/meldb-external/phenotype/-/tree/main/codes/ClinicalCodes.org%20from%20the%20University%20of%20Manchester/Symptom%20code%20lists/Abdominal%20pain/res176-abdominal-pain.csv ",
          "NOTES": "2023-09-08: Clinical SF confirmed that the clinical view would be that this would need to be recurrent or persistent."
        }
      }
    ]
  },
  "codes": [
    {
      "folder": "clinical-codes-org",
      "description": "SF's clinical codes - downloaded 16/11/23",
      "files": [
        {
          "file": "Symptom code lists/Abdominal pain/res176-abdominal-pain.csv",
          "columns": {
            "read2_code": "code",
            "metadata": [
              "description"
            ]
          },
          "concept_set": [
            "ABDO_PAIN"
          ]
        }
      ]
    }
  ]
}
```
A full example of the phenotype for burdensome multiple long-term conditions from the MELDB project can be found [here](https://git.soton.ac.uk/meldb/concepts/-/blob/main/PHEN_assign_v3.json?ref_type=heads).
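Because files refer to concept sets by name, a quick consistency check can catch typos before processing. The following is a minimal sketch, not part of ACMC: the helper `undefined_concept_sets` is a hypothetical name, and the configuration is a trimmed copy of the Abdominal Pain example.

```python
import json


def undefined_concept_sets(config):
    """Return concept set names that files reference but the config never defines."""
    defined = {
        cs["concept_set_name"] for cs in config["concept_sets"]["concept_set"]
    }
    referenced = {
        name
        for folder in config["codes"]
        for file_entry in folder["files"]
        for name in file_entry.get("concept_set", [])
    }
    return referenced - defined


# Trimmed copy of the Abdominal Pain example; metadata omitted for brevity.
config = json.loads("""
{
  "concept_sets": {
    "version": "3.2.10",
    "concept_set": [
      {"concept_set_name": "ABDO_PAIN", "concept_set_status": "AGREED"}
    ]
  },
  "codes": [
    {
      "folder": "clinical-codes-org",
      "files": [
        {
          "file": "Symptom code lists/Abdominal pain/res176-abdominal-pain.csv",
          "columns": {"read2_code": "code", "metadata": ["description"]},
          "concept_set": ["ABDO_PAIN"]
        }
      ]
    }
  ]
}
""")

missing = undefined_concept_sets(config)
print(sorted(missing))
```

An empty result means every name listed under a file's `"concept_set"` field has a matching definition.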
### Defining concept sets
The `"concept_sets"` object defines the structure for grouping input codes into concept sets based on source concepts. Key elements include:
- **`version`**: Identifies the version of the concept set definitions being used. This helps track changes over time.
- **`concept_set`**: Defines a list of concept set objects along with their attributes:
  - **`concept_set_name`**: Specifies the name of the concept set.
  - **`concept_set_status`**: Indicates the status of the concept set. Only concept sets with the **"AGREED"** status are written to the output.
  - **`metadata`** (optional): Additional properties that are copied into the output. Can be used for descriptive or contextual purposes.
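The status rule can be illustrated with a small filter. This is a sketch under assumptions rather than ACMC's own code: the function name `agreed_concept_sets`, the second concept set `EXAMPLE_SET`, and the status value `"DRAFT"` are all hypothetical.

```python
def agreed_concept_sets(concept_sets):
    """Return the names of concept sets whose status is exactly "AGREED"."""
    return [
        cs["concept_set_name"]
        for cs in concept_sets["concept_set"]
        if cs.get("concept_set_status") == "AGREED"
    ]


concept_sets = {
    "version": "3.2.10",
    "concept_set": [
        {"concept_set_name": "ABDO_PAIN", "concept_set_status": "AGREED"},
        # Hypothetical entry with a hypothetical non-agreed status.
        {"concept_set_name": "EXAMPLE_SET", "concept_set_status": "DRAFT"},
    ],
}

selected = agreed_concept_sets(concept_sets)
print(selected)
```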
### Defining concept codes
The `"codes"` list defines the location and description of all input medical coding lists required for processing. Each folder is defined as an object within the `"codes"` list. Similarly, each file is an object within the folder's `"files"` list.
* **`folder`**: Specifies the directory containing the input files.
* **`description`**: Provides a brief summary of the content or purpose of the files, often including additional context such as the date the data was downloaded.
* **`files`**: Lists the files within the specified folder. Each file is represented as an object with the key `"file"` and the file name as its value. Definitions of the columns in each file are detailed below.
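The folder and file nesting can be walked to list every source file. A minimal sketch, assuming the folders sit under a root directory named `codes` (an assumption, as is the helper name `list_source_files`):

```python
from pathlib import PurePosixPath


def list_source_files(codes, root="codes"):
    """Join root, folder and file name for every file entry in the "codes" list."""
    paths = []
    for folder in codes:
        for file_entry in folder["files"]:
            paths.append(PurePosixPath(root) / folder["folder"] / file_entry["file"])
    return paths


codes = [
    {
        "folder": "clinical-codes-org",
        "description": "SF's clinical codes - downloaded 16/11/23",
        "files": [
            {"file": "Symptom code lists/Abdominal pain/res176-abdominal-pain.csv"}
        ],
    }
]

source_files = list_source_files(codes)
for path in source_files:
    print(path)
```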
### Mapping source column definitions in files to standard vocabulary types
The `"columns"` property within a file object specifies the type and corresponding names of columns in the input file. Each key in the object represents a column type, while the associated value denotes the name of the column in the input file.
The supported column types include:
* **`read2_code`**: Read Version 2 codes
* **`read3_code`**: Read Version 3 codes
* **`icd10_code`**: International Classification of Diseases, 10th Revision
* **`snomed_code`**: SNOMED-CT codes
* **`opcs4_code`**: OPCS Classification of Interventions and Procedures, Version 4
* **`atc_code`**: Anatomical Therapeutic Chemical classification codes
Additionally, the `"metadata"` property ensures that any remaining columns not explicitly categorized by the supported column types are preserved in the output file. These columns are specified as an array of column names to be copied directly.
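The effect of a `"columns"` mapping can be sketched with the standard `csv` module. This is an illustration only, not how ACMC implements it: `standardise_rows` is a hypothetical helper, and the codes and descriptions in the sample table are made-up placeholders.

```python
import csv
import io

SUPPORTED_TYPES = (
    "read2_code", "read3_code", "icd10_code",
    "snomed_code", "opcs4_code", "atc_code",
)


def standardise_rows(rows, columns):
    """Rename source columns to their column type and copy metadata columns as-is."""
    metadata_cols = columns.get("metadata", [])
    result = []
    for row in rows:
        new_row = {}
        for col_type in SUPPORTED_TYPES:
            if col_type in columns:
                # columns[col_type] is the column name used in the input file.
                new_row[col_type] = row[columns[col_type]]
        for name in metadata_cols:
            new_row[name] = row[name]
        result.append(new_row)
    return result


# Placeholder input table; the code values are not real Read codes.
source = io.StringIO(
    "code,description\n"
    "A123.,Example description one\n"
    "B456.,Example description two\n"
)
rows = list(csv.DictReader(source))
columns = {"read2_code": "code", "metadata": ["description"]}

standardised = standardise_rows(rows, columns)
print(standardised[0])
```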
### Mapping codes to concept sets
Each file in the `"codes"` list is mapped to its corresponding concept sets through the `"concept_set"` field.
* **`concept_set`**: Lists the concept sets to which all codes within this file will be assigned.
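A short sketch of this assignment follows, assuming the codes have already been read from the file. The helper `assign_codes`, the placeholder code values, and the second concept set name are hypothetical.

```python
from collections import defaultdict


def assign_codes(file_entry, codes):
    """Assign every code from one file to each concept set the file lists."""
    assigned = defaultdict(list)
    for name in file_entry["concept_set"]:
        assigned[name].extend(codes)
    return dict(assigned)


file_entry = {
    "file": "Symptom code lists/Abdominal pain/res176-abdominal-pain.csv",
    # EXAMPLE_SET is a hypothetical second concept set.
    "concept_set": ["ABDO_PAIN", "EXAMPLE_SET"],
}
codes_in_file = ["A123.", "B456."]  # placeholder values

assignment = assign_codes(file_entry, codes_in_file)
print(assignment)
```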
### Additional preprocessing actions supported
Additional processing is required when you wish to sub-divide a code list table, or when a single column contains multiple code types. In these cases, add an `"actions"` object inside the file object.
#### Table with a sub-categorical column
To sub-divide a table by a categorical column, use the `"divide_col"` action:
```json
"actions": {
  "divide_col": "MMCode"
}
```
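Conceptually, dividing by a column groups the rows by the distinct values of that column. The sketch below shows only that grouping step; what ACMC does with each group afterwards is not described here, and the helper `divide_rows`, the `MMCode` values, and the codes are placeholders.

```python
from collections import defaultdict


def divide_rows(rows, divide_col):
    """Group table rows by the value found in the categorical column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[divide_col]].append(row)
    return dict(groups)


rows = [
    {"MMCode": "1", "code": "A123."},
    {"MMCode": "2", "code": "B456."},
    {"MMCode": "1", "code": "C789."},
]

divided = divide_rows(rows, "MMCode")
print({key: len(group) for key, group in divided.items()})
```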
#### Table with multiple code types in a single column
The column must be split into multiple columns so that each column holds only one code type.
* The `"split_col"` attribute names the categorical column that indicates the code type of each row. The **category names found in that column replace the column names** used as values in the `"columns"` property.
* The `"codes_col"` attribute names the column that holds the codes of multiple types.
```json
"actions": {
  "split_col": "coding_system",
  "codes_col": "code"
},
"columns": {
  "read2_code": "Read codes v2",
  "med_code": "Med codes",
  "icd10_code": "ICD10 codes",
  "metadata": ["description"]
},
```
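One way to picture the split: each row's code moves into a new column named after that row's category, which is why the `"columns"` mapping above refers to category names. This is a sketch of the idea, not ACMC's implementation; `split_codes` is a hypothetical helper and the code values are placeholders.

```python
def split_codes(rows, split_col, codes_col):
    """Move each row's code into a column named after the row's category."""
    result = []
    for row in rows:
        # Keep every other column (e.g. metadata) unchanged.
        new_row = {k: v for k, v in row.items() if k not in (split_col, codes_col)}
        new_row[row[split_col]] = row[codes_col]
        result.append(new_row)
    return result


rows = [
    {"coding_system": "Read codes v2", "code": "A123.", "description": "Example one"},
    {"coding_system": "ICD10 codes", "code": "X00", "description": "Example two"},
]

split_rows = split_codes(rows, split_col="coding_system", codes_col="code")
print(split_rows)
```

After the split, `"read2_code": "Read codes v2"` in the `"columns"` property picks up the first row and `"icd10_code": "ICD10 codes"` the second.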
**Large code lists** with numerous phenotypes (e.g. Ho et al.) require a lot of JSON to be generated. See the "Ho generate JSON" section in `process_codes_WP.ipynb` for example code that generates it.
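For orientation, generating the file entries programmatically might look like the following. This is an independent sketch and not the code from the notebook; the helper `build_file_entries`, the concept set names, and the file paths are hypothetical.

```python
import json


def build_file_entries(phenotype_files, code_type="read2_code", code_column="code"):
    """Build one "files" entry per (concept set name, file path) pair."""
    return [
        {
            "file": path,
            "columns": {code_type: code_column, "metadata": ["description"]},
            "concept_set": [name],
        }
        for name, path in phenotype_files.items()
    ]


# Hypothetical concept set names and file paths.
phenotype_files = {
    "EXAMPLE_SET_A": "example/set_a.csv",
    "EXAMPLE_SET_B": "example/set_b.csv",
}

entries = build_file_entries(phenotype_files)
generated = json.dumps(entries, indent=2)
print(generated)
```

The printed JSON can then be pasted into the `"files"` list of the relevant folder object.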
## Usage
......