From 31a8ef935ed63e0261dbc31e75c6aafe0ad76f0c Mon Sep 17 00:00:00 2001
From: Michael Boniface <m.j.boniface@soton.ac.uk>
Date: Tue, 25 Feb 2025 12:37:14 +0000
Subject: [PATCH] docs: added directory docs and yaml config doc file

---
 docs/index.md | 48 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 47 insertions(+), 1 deletion(-)

diff --git a/docs/index.md b/docs/index.md
index ae1a347..19af06c 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -13,7 +13,7 @@
   - [Change Log](./changelog.md)
   - [Troubleshooting](./troubleshooting.md)
 
-### Overview
+## Overview
 
 ### Supported Medical Coding Standards
 
@@ -39,5 +39,51 @@ The tool supports verification and mapping across diagnostic coding formats belo
 
 *Note: NHS TRUD provides one-way mappings. To reverse mappings, duplicate the `.parquet` file and reverse the filename (e.g., `read2_code_to_snomed_code.parquet` to `snomed_code_to_read2_code.parquet`).*
 
+## Phenotype Definition
+
+### **Phenotype Directory Structure**
+
+A phenotype is defined in a directory with the following default layout:
+
+```
+workspace/                          # Default workspace directory
+└── phen/                           # Default phenotype directory
+    ├── codes/                      # Source concept code lists
+    ├── concept-set/                # Processed phenotype concept sets in CSV format
+    ├── map/                        # Processed mappings from source to target code types
+    │   └── errors/                 # Errors recorded during mapping
+    ├── omop/                       # Processed phenotype concept sets as OMOP database CSV files
+    ├── config.yaml                 # Phenotype configuration file
+    └── vocab_versions.yaml         # Versions of the vocabularies used to generate concept sets
+```
+
+### **Configuration File**
+
+Phenotype configuration is stored in `config.yaml` at the root of the phenotype directory. The file is in YAML format.
+
+#### **Root Element**  
+- `phenotype`: **(object)** The root element containing all phenotype-related concept sets and metadata.
+
+#### **Phenotype Attributes**  
+- `version`: **(string)** Specifies the version of the phenotype definition.  
+- `omop`: **(object)** Metadata related to OMOP vocabulary.  
+  - `vocabulary_id`: **(string)** Identifier for the vocabulary.  
+  - `vocabulary_name`: **(string)** Human-readable name of the vocabulary.  
+  - `vocabulary_reference`: **(string, URL)** A reference URL for the vocabulary source.  
+
+#### **Concept Sets**  
+- `concept_sets`: **(array)** A list of concept set definitions, where each item has the following attributes:  
+  - `name`: **(string)** Unique name of the concept set.  
+  - `file`: **(object)** Contains file-related metadata.  
+    - `path`: **(string, file path)** Path to the source concept code list file, relative to `<phen_directory>/codes`.  
+    - `columns`: **(object)** Key-value pairs mapping column names in the file to coding list types.  
+  - `category`: **(optional, string)** A categorical identifier for processing files containing multiple concept sets.  
+  - `actions`: **(optional, object)** Additional transformations on data.  
+    - `divide_col`: **(string)** Specifies a column name in the source concept file to group on.  
+  - `metadata`: **(object)** Reserved for additional metadata.  
+
+
+
+
 
 
-- 
GitLab