Commit 38fc682b authored by Jakub Dylag

Update README

parent 55bbd31a
...@@ -86,32 +86,111 @@ Processed tables will be saved as `.parquet` files in the `maps/processed/` dire ...@@ -86,32 +86,111 @@ Processed tables will be saved as `.parquet` files in the `maps/processed/` dire
- Install vocabularies using: - Install vocabularies using:
`python omop_api.py --install <PATH_TO_DOWNLOADED_FILES>` `python omop_api.py --install <PATH_TO_DOWNLOADED_FILES>`
## Configuration ## Configuration
The mappings from imported code lists to the output MELD-B concept code lists are defined in JSON format in `PHEN_assign_v3.json`. The JSON configuration file specifies how input codes are grouped into **concept sets**, which are collections of related codes used for defining phenotypes or other data subsets. The configuration is divided into two main components: the `"concept_sets"` object and the `"codes"` list. The `"codes"` list specifies the input code files: their file paths, column names and code types, as well as any formatting actions that may be necessary. The `"concept_sets"` object defines the concept sets to which the input codes will be assigned. All files must be formatted as shown below.
```json
{
"concept_sets": {
},
"codes":[
]
}
```
> **_EXAMPLE:_** Configuration file used in the MELD-B project: https://git.soton.ac.uk/meldb/concepts/-/blob/main/PHEN_assign_v3.json?ref_type=heads
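As a rough illustration of the two-component layout described above, the following sketch parses a configuration string and checks that both top-level components are present with the expected JSON types. The function name `load_config` and the error messages are hypothetical and are not taken from the MELD-B scripts.

```python
import json


def load_config(text):
    """Parse a configuration string and check its two top-level components.

    Hypothetical helper for illustration; not part of the project code.
    """
    config = json.loads(text)
    if not isinstance(config, dict):
        raise ValueError("configuration must be a JSON object")
    if not isinstance(config.get("concept_sets"), dict):
        raise ValueError('"concept_sets" must be a JSON object')
    if not isinstance(config.get("codes"), list):
        raise ValueError('"codes" must be a JSON list')
    return config


# Minimal configuration matching the empty skeleton shown above
example = '{"concept_sets": {}, "codes": []}'
config = load_config(example)
print(sorted(config))
```

A check like this fails early on a malformed file instead of partway through processing.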
### Folder and File Definitions
The `"codes"` section defines the location and description of all input files required for processing. Each folder is defined as an object within the `"codes"` list. Similarly, each file is an object within that folder's `"files"` list.
- **`folder`**: Specifies the directory containing the input files.
- **`description`**: Provides a brief summary of the content or purpose of the files, often including additional context such as the date the data was downloaded.
- **`files`**: Lists the files within the specified folder. Each file is represented as an object with the key `"file"` and the file name as its value. Definitions of the columns in each file are detailed below.
### Folder and File Definitions:
```json ```json
"folder":"codes/Medication code source", "codes":[
"description":"Medication Codes - downloaded 15/12/23", {
"files": [ "folder": "codes/Medication code source",
{ "description": "Medication Codes - downloaded 15/12/23",
"file":"WP02_SAIL_WILK_matched_drug_codes_with_categories.xlsx" "files": [
} {
"file": "WP02_SAIL_WILK_matched_drug_codes_with_categories.xlsx"
}
]
}
] ]
``` ```
### Columns in Files: ### Column Definitions in Files
The `"columns"` property within a file object specifies the type and corresponding names of columns in the input file. Each key in the object represents a column type, while the associated value denotes the name of the column in the input file.
The supported column types include:
- **`read2_code`**: Read Version 2 codes
- **`read3_code`**: Read Version 3 codes
- **`icd10_code`**: International Classification of Diseases, 10th Revision
- **`snomed_code`**: SNOMED-CT codes
- **`opcs4_code`**: OPCS Classification of Interventions and Procedures, Version 4
- **`atc_code`**: Anatomical Therapeutic Chemical classification codes
Additionally, the `"metadata"` key lists any remaining columns that are not covered by the supported column types but should be preserved in the output file. It is specified as an array of column names, which are copied to the output unchanged.
```json ```json
"columns":{ "files": [
"read2_code":"READCODE", {
"metadata":["DESCRIPTION"] "file":"WP02_SAIL_WILK_matched_drug_codes_with_categories.xlsx",
}, "columns": {
"read2_code": "READCODE",
"metadata": ["DESCRIPTION"]
}
}
]
``` ```
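One way to read a `"columns"` entry is to separate the code-type keys from the `"metadata"` list. The sketch below does this using the six column types listed above. The helper name `split_columns` is hypothetical, and returning unrecognised keys separately (rather than rejecting them) is an assumption, since the tool may accept further keys not described in this section.

```python
# Column types listed in this README
SUPPORTED_CODE_TYPES = {
    "read2_code", "read3_code", "icd10_code",
    "snomed_code", "opcs4_code", "atc_code",
}


def split_columns(columns):
    """Split a file's "columns" entry into code columns, metadata, and other keys.

    Hypothetical helper for illustration; not part of the project code.
    """
    code_columns = {}   # code type -> column name in the input file
    other_keys = []     # keys this sketch does not recognise
    for key, name in columns.items():
        if key == "metadata":
            continue
        if key in SUPPORTED_CODE_TYPES:
            code_columns[key] = name
        else:
            other_keys.append(key)
    metadata_columns = list(columns.get("metadata", []))
    return code_columns, metadata_columns, other_keys


columns = {"read2_code": "READCODE", "metadata": ["DESCRIPTION"]}
code_columns, metadata_columns, other_keys = split_columns(columns)
print(code_columns, metadata_columns, other_keys)
```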
### Concept Set Assignment ### Concept Set Assignment
The `"concept_sets"` object defines the structure and rules for grouping input codes into concept sets based on a source CSV file. Key elements include:
- **`file`**: Specifies the CSV file used as the input for defining concept sets.
- **`version`**: Identifies the version of the concept set definitions being used. This can help track changes over time.
- **`columns`**: Maps column names in the CSV file to attributes of the concept sets. Supported keys are:
  - **`concept_set_name`**: Maps to the column specifying the name of the concept set.
  - **`concept_set_status`**: Maps to the column indicating the status of the concept set. Only concept sets with the **"AGREED"** status will be output.
  - **`metadata`**: A list of additional columns in the CSV file that should be copied to the output for descriptive or contextual purposes.
The `"codes"` object specifies the source files containing input codes and assigns them to the corresponding concept sets through the `"meldb_phenotypes"` field.
- **`meldb_phenotypes`**: Lists the concept sets to which all codes within this file will be assigned.
```json ```json
"meldb_phenotypes": ["ALL_MEDICATIONS"] {
"concept_sets": {
"file":"PHEN_summary_working.csv",
"version":"3.2.10",
"columns":{
"concept_set_name":"CONCEPT NAME ",
"concept_set_status":"AGREED",
"metadata":["CONCEPT TYPE"]
}
},
"codes":[
{
"folder": "codes/Medication code source",
"description": "Medication Codes - downloaded 15/12/23",
"files": [
{
"file": "WP02_SAIL_WILK_matched_drug_codes_with_categories.xlsx",
"meldb_phenotypes": ["ALL_MEDICATIONS"]
}
]
}
]
}
``` ```
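To check which input files feed each concept set, the configuration can be walked folder by folder. This is a minimal sketch, assuming that a file's path is its `"folder"` joined with its `"file"` name (the README does not state how the tool resolves paths). The function name `files_by_concept_set` is hypothetical.

```python
import json
from collections import defaultdict
from pathlib import PurePosixPath


def files_by_concept_set(config):
    """Map each concept set name to the input files assigned to it.

    Hypothetical helper for illustration; not part of the project code.
    """
    assignments = defaultdict(list)
    for folder in config["codes"]:
        for entry in folder["files"]:
            # Assumption: file paths are relative to the folder entry
            path = str(PurePosixPath(folder["folder"]) / entry["file"])
            for name in entry.get("meldb_phenotypes", []):
                assignments[name].append(path)
    return dict(assignments)


# Trimmed copy of the example configuration above
config = json.loads("""
{
  "concept_sets": {"file": "PHEN_summary_working.csv", "version": "3.2.10"},
  "codes": [
    {
      "folder": "codes/Medication code source",
      "description": "Medication Codes - downloaded 15/12/23",
      "files": [
        {
          "file": "WP02_SAIL_WILK_matched_drug_codes_with_categories.xlsx",
          "meldb_phenotypes": ["ALL_MEDICATIONS"]
        }
      ]
    }
  ]
}
""")
result = files_by_concept_set(config)
print(result)
```

Files without a `"meldb_phenotypes"` field are skipped in this sketch, so they do not appear under any concept set.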
### Additional preprocessing (if required): ### Additional preprocessing (if required):
...@@ -152,9 +231,8 @@ Script preprocess code lists and to map to given concept/phenotype ...@@ -152,9 +231,8 @@ Script preprocess code lists and to map to given concept/phenotype
### Execute Command Line ### Execute Command Line
Execute via shell with customizable parameters: Execute via shell with customizable parameters:
```bash ```bash
python main.py [OPTIONS] mapping_file python main.py [-h] [-r2] [-r3] [-i] [-s] [-o] [-a] [--no-translate] [--no-verify] [--output] [--error-log] mapping_file
``` ```
usage: `python main.py [-h] [-r2] [-r3] [-i] [-s] [-o] [-a] [--no-translate] [--no-verify] [--output] [--error-log] mapping_file`
**Required Arguments:** **Required Arguments:**
- `mapping_file` Concept/Phenotype Assignment File (json) - `mapping_file` Concept/Phenotype Assignment File (json)
......