diff --git a/docs/index.md b/docs/index.md
index ae1a3478131d698e6083c46fa053bbe8f7c83bca..19af06c7b34d650090844d40f4b89bea7619d48f 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -13,7 +13,7 @@
 - [Change Log](./changelog.md)
 - [Troubleshooting](./troubleshooting.md)
 
-### Overview
+## Overview
 
 ### Supported Medical Coding Standards
 
@@ -39,5 +39,51 @@ The tool supports verification and mapping across diagnostic coding formats belo
 
 *Note: NHS TRUD provides one-way mappings. To reverse mappings, duplicate the `.parquet` file and reverse the filename (e.g., `read2_code_to_snomed_code.parquet` to `snomed_code_to_read2_code.parquet`).*
 
+## Phenotype Definition
+
+### **Phenotype directory structure
+
+```markdown
+```
+workspace/                      # Default workspace directory
+├── phen/                       # Default phenotype directory
+│   ├── codes/                  # Phenotype source concept code lists directory
+│   ├── concept-set/            # Processed phenotype concept sets in CSV format
+│   ├── map/                    # Process mapping from source to target code types
+│   │   ├── errors/             # Errors recorded during mapping
+│   ├── omop/                   # Processed phenotype concept sets in OMOP database CSV files
+│   ├── config.yaml             # Phenotype configuration file
+│   ├── vocab_versions.yaml     # Versions file for vocabularies used to generate concept sets
+```
+```
+
+### **Configuration File**
+
+Phenotype configuration is stored in the root of the phenotype directory in `config.yaml`. The file is yaml format.
+
+#### **Root Element**
+- `phenotype`: **(object)** The root element containing all phenotype-related concept sets and metadata.
+
+#### **Phenotype Attributes**
+- `version`: **(string)** Specifies the version of the phenotype definition.
+- `omop`: **(object)** Metadata related to OMOP vocabulary.
+  - `vocabulary_id`: **(string)** Identifier for the vocabulary.
+  - `vocabulary_name`: **(string)** Human-readable name of the vocabulary.
+  - `vocabulary_reference`: **(string, URL)** A reference URL for the vocabulary source.
+
+#### **Concept Sets**
+- `concept_sets`: **(array)** A list of concept set definitions, where each item has the following attributes:
+  - `name`: **(string)** Unique name of the concept set.
+  - `file`: **(object)** Contains file-related metadata.
+    - `path`: **(string, file path)** Relative path to the source concepts coding list file, relative to `<phen_directory>/codes`
+    - `columns`: **(object)** Key-value pairs mapping column names in the file to coding list types
+    - `category` **(optional, string)** A categorical identifier for processing files containing multiple concept sets.
+    - `actions` **(optional, object)** Additional transformations on data.
+      - `divide_col`: **(string)** Specifies a column name in the source concept file to group on.
+  - `metadata`: **(object)** Reserved for additional metadata.
+
+
+
+
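
To make the schema documented in the added `Configuration File` section more concrete, here is a minimal Python sketch (not part of this change) that checks an already-parsed `config.yaml` against the attributes listed above. It assumes the YAML has been loaded into a plain dict and that `concept_sets` sits under the `phenotype` root. The function name `validate_phenotype_config` and every value in `example_config` are hypothetical. Only attributes the docs do not mark as optional are checked.

```python
def validate_phenotype_config(config: dict) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    phenotype = config.get("phenotype")
    if not isinstance(phenotype, dict):
        return ["missing root element 'phenotype'"]

    errors = []
    if not isinstance(phenotype.get("version"), str):
        errors.append("'version' must be a string")

    omop = phenotype.get("omop")
    if not isinstance(omop, dict):
        errors.append("'omop' must be an object")
    else:
        for key in ("vocabulary_id", "vocabulary_name", "vocabulary_reference"):
            if not isinstance(omop.get(key), str):
                errors.append(f"'omop.{key}' must be a string")

    concept_sets = phenotype.get("concept_sets")
    if not isinstance(concept_sets, list):
        errors.append("'concept_sets' must be an array")
        return errors

    seen_names = set()
    for i, concept_set in enumerate(concept_sets):
        if not isinstance(concept_set, dict):
            errors.append(f"concept_sets[{i}] must be an object")
            continue
        name = concept_set.get("name")
        if not isinstance(name, str):
            errors.append(f"concept_sets[{i}].name must be a string")
        elif name in seen_names:
            # The docs describe `name` as unique across concept sets.
            errors.append(f"duplicate concept set name: {name}")
        else:
            seen_names.add(name)

        file_info = concept_set.get("file")
        if not isinstance(file_info, dict):
            errors.append(f"concept_sets[{i}].file must be an object")
            continue
        if not isinstance(file_info.get("path"), str):
            errors.append(f"concept_sets[{i}].file.path must be a string")
        if not isinstance(file_info.get("columns"), dict):
            errors.append(f"concept_sets[{i}].file.columns must be an object")
    return errors


# Hypothetical configuration; all values are made up for illustration.
example_config = {
    "phenotype": {
        "version": "0.0.1",
        "omop": {
            "vocabulary_id": "EXAMPLE",
            "vocabulary_name": "Example vocabulary",
            "vocabulary_reference": "https://example.org/vocab",
        },
        "concept_sets": [
            {
                "name": "EXAMPLE_SET",
                "file": {
                    "path": "example/codes.csv",  # relative to <phen_directory>/codes
                    "columns": {"read2": "code"},
                },
                "metadata": {},
            }
        ],
    }
}

print(validate_phenotype_config(example_config))  # prints []
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in a configuration file at once.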