From 38fc682b472cdac04a0698982092e03a6388930c Mon Sep 17 00:00:00 2001
From: Jakub Dylag <jjd1c23@soton.ac.uk>
Date: Fri, 24 Jan 2025 10:56:47 +0000
Subject: [PATCH] Update README

---
 README.md | 112 +++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 95 insertions(+), 17 deletions(-)

diff --git a/README.md b/README.md
index f0c7138..c267558 100644
--- a/README.md
+++ b/README.md
@@ -86,32 +86,111 @@ Processed tables will be saved as `.parquet` files in the `maps/processed/` dire
    - Install vocabularies using:  
      `python omop_api.py --install <PATH_TO_DOWNLOADED_FILES>`
 
-## Configuration
+## Configuration 
 
-The mappings from imported code lists to outputted MELD-B concept code lists are defined in JSON format in `PHEN_assign_v3.json`.
+The JSON configuration file specifies how input codes are grouped into **concept sets**, which are collections of related codes used to define phenotypes or other data subsets. The configuration has two main components: the `"concept_sets"` object and the `"codes"` list. The `"codes"` list specifies the input codes: their file paths, column names, and code types, as well as any formatting actions that may be necessary. The `"concept_sets"` object defines the concept sets to which the input codes are assigned. Every configuration file must follow the structure shown below.
+```json
+{
+	"concept_sets": {
+	},
+	"codes":[
+	]
+}
+```
+
+> **_EXAMPLE:_**  Configuration file used in the MELD-B project: https://git.soton.ac.uk/meldb/concepts/-/blob/main/PHEN_assign_v3.json?ref_type=heads 
+
+
+### Folder and File Definitions
+
+The `"codes"` section defines the location and description of all input files required for processing. Each folder is defined as an object within the `"codes"` list. Similarly, each file is an object within that folder's `"files"` list.
+
+- **`folder`**: Specifies the directory containing the input files.  
+- **`description`**: Provides a brief summary of the content or purpose of the files, often including additional context such as the date the data was downloaded.  
+- **`files`**: Lists the files within the specified folder. Each file is represented as an object with the key `"file"` and the file name as its value. Definitions of the columns in each file are detailed below.
 
-### Folder and File Definitions:
 ```json
-"folder":"codes/Medication code source",
-"description":"Medication Codes - downloaded 15/12/23",
-"files": [
-		{
-			"file":"WP02_SAIL_WILK_matched_drug_codes_with_categories.xlsx"
-		}
+"codes":[
+	{
+		"folder": "codes/Medication code source",
+		"description": "Medication Codes - downloaded 15/12/23",
+		"files": [
+			{
+				"file": "WP02_SAIL_WILK_matched_drug_codes_with_categories.xlsx"
+			}
+		]
+	}
 ]
 ```
 
-### Columns in Files:
+### Column Definitions in Files
+The `"columns"` property within a file object specifies the type and corresponding names of columns in the input file. Each key in the object represents a column type, while the associated value denotes the name of the column in the input file. 
+
+The supported column types include:
+- **`read2_code`**: Read Version 2 codes  
+- **`read3_code`**: Read Version 3 codes  
+- **`icd10_code`**: International Classification of Diseases, 10th Revision  
+- **`snomed_code`**: SNOMED-CT codes  
+- **`opcs4_code`**: OPCS Classification of Interventions and Procedures, Version 4  
+- **`atc_code`**: Anatomical Therapeutic Chemical classification codes  
+
+Additionally, the `"metadata"` object ensures that any remaining columns not explicitly categorized by the supported column types are preserved in the output file. These columns are specified as an array of column names to be copied directly.
+
 ```json
-"columns":{
-	"read2_code":"READCODE",
-	"metadata":["DESCRIPTION"]
-},
+"files": [
+	{
+		"file":"WP02_SAIL_WILK_matched_drug_codes_with_categories.xlsx",
+		"columns": {
+			"read2_code": "READCODE",
+			"metadata": ["DESCRIPTION"]
+		}
+	}
+]
 ```
 
+
+
 ### Concept Set Assigment
+
+The `"concept_sets"` object defines the structure and rules for grouping input codes into concept sets based on a source CSV file. Key elements include:
+
+- **`file`**: Specifies the CSV file used as the input for defining concept sets. 
+
+- **`version`**: Identifies the version of the concept set definitions being used. This can help track changes over time.  
+
+- **`columns`**: Describes the mapping of specific column names in the CSV file to attributes of the concept sets. Supported keys are:
+  - **`concept_set_name`**: Maps to the column specifying the name of the concept set.  
+  - **`concept_set_status`**: Maps to the column indicating the status of the concept set. Only concept sets with the **"AGREED"** status are written to the output.
+  - **`metadata`**: A list of additional columns in the CSV file that should be copied to the output for descriptive or contextual purposes.
+
+The `"codes"` list specifies the source files containing input codes and assigns them to the corresponding concept sets through the `"meldb_phenotypes"` field.
+
+ - **`meldb_phenotypes`**: Lists the concept sets to which all codes within this file will be assigned.
+
 ```json
-"meldb_phenotypes": ["ALL_MEDICATIONS"]
+{
+	"concept_sets": {
+		"file":"PHEN_summary_working.csv",
+		"version":"3.2.10",
+		"columns":{
+			"concept_set_name":"CONCEPT NAME ",
+			"concept_set_status":"AGREED",
+			"metadata":["CONCEPT TYPE"]
+		}
+	},
+	"codes":[
+		{
+			"folder": "codes/Medication code source",
+			"description": "Medication Codes - downloaded 15/12/23",
+			"files": [
+				{
+					"file": "WP02_SAIL_WILK_matched_drug_codes_with_categories.xlsx",
+					"meldb_phenotypes": ["ALL_MEDICATIONS"]
+				}
+			]
+		}
+	]
+}
 ```
 
 ### Additional preprocessing (if required):
@@ -152,9 +231,8 @@ Script preprocess code lists and to map to given concept/phenotype
 ### Execute Command Line
 Execute via shell with customizable parameters:
 ```bash
-python main.py [OPTIONS] mapping_file
+python main.py [-h] [-r2] [-r3] [-i] [-s] [-o] [-a] [--no-translate] [--no-verify] [--output] [--error-log] mapping_file
 ```
-usage: `python main.py [-h] [-r2] [-r3] [-i] [-s] [-o] [-a] [--no-translate] [--no-verify] [--output] [--error-log] mapping_file`
 
 **Required Arguments:**
   - `mapping_file`         Concept/Phenotype Assignment File (json)
-- 
GitLab