Commit e350060b authored by mjbonifa's avatar mjbonifa

(docs) testing mkdocs and readthedocs integration

parent d366cf07
# ACMC Documentation
# User Guide
## Contents
- [Phenotype Workflow](#phenotype-workflow)
- [Phenotype Definition](#phenotype-definition)
- [Version Control](#version-control)
- [Installation](./installation.md)
- [User Guide](./user-guide.md)
- [Usage](./usage.md)
- Tutorials
- [Example 1 - Basic local phenotype](./tutorials/example1.md)
- [Example 2 - More complex local phenotype](./tutorials/example2.md)
- [Example 3 - Using a remote git repository](./tutorials/example3.md)
- [API Reference](./api/acmc.html)
- [Change Log](./changelog.md)
- [Troubleshooting](./troubleshooting.md)
## Phenotype Workflow
ACMC has a five-step workflow for creating a phenotype: initialise, validate, map, publish and export.
### Step 1: Initialise
The `initialise` step creates a phenotype directory within the ACMC workspace. The outcome is a directory with all required subdirectories and files; see [directory structure](#phenotype-directory-structure).
```bash
acmc phen init
```
### Step 2: Validate
The `validate` step checks that the phenotype configuration is valid. It verifies the configuration file against the schema and checks consistency between concept sets and source concept coding lists. The outcome is a set of notifications reporting the validity of the phenotype configuration.
```bash
acmc phen validate
```
### Step 3: Map
The `map` step translates codes between source and target coding types. The outcome is a collection of concept sets defined for the target coding types and stored in CSV files.
```bash
acmc phen map
```
### Step 4: Publish
The `publish` step commits the phenotype to a Git repo and increments the version number. The outcome is a published phenotype at the next version number.
```bash
acmc phen publish
```
### Step 5: Export
The `export` step creates an OMOP database for the phenotype. The outcome is an OMOP database, including concept sets for all target coding types, exported as CSV files.
```bash
acmc phen export
```
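The five steps above can be chained in a small wrapper script. The following is a minimal sketch, not part of ACMC: the function name `run_phen_workflow` is hypothetical, and it assumes that `acmc` is on your `PATH` and returns a non-zero exit status when a step fails.

```bash
#!/usr/bin/env bash
# Sketch only: run the five documented workflow steps in order and stop at
# the first step that fails. Assumes `acmc` exits non-zero on failure.
run_phen_workflow() {
  local step
  for step in init validate map publish export; do
    echo "Running: acmc phen ${step}"
    if ! acmc phen "${step}"; then
      echo "Step '${step}' failed; stopping." >&2
      return 1
    fi
  done
}
```

Call `run_phen_workflow` from the directory where you would normally run the individual `acmc phen` commands.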
## Phenotype Definition
### **Phenotype directory structure**
```
workspace/ # Default workspace directory
├── phen/ # Default phenotype directory
│ ├── concepts/ # Phenotype source concept code lists directory
│ ├── concept-sets/ # Processed phenotype concept sets
│ │ ├── csv/ # Processed phenotype concept sets in ACMC CSV format
│ │ ├── omop/ # Processed phenotype concept sets in OMOP CDM database exported as CSV files
│ ├── map/ # Process mapping from source to target code types
│ │ ├── errors/ # Errors recorded during mapping process
│ ├── config.yaml # Phenotype configuration file
│ ├── vocab_versions.yaml # Versions file for vocabularies used to generate concept sets
```
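To check that a phenotype directory matches the layout above, a quick existence check can help. This is a sketch under assumptions: the function name `check_phen_dir` is hypothetical, and which entries exist immediately after `init` (rather than only after `map` or `export`) is not stated in this guide, so treat a reported missing entry as a prompt to investigate rather than a definite error.

```bash
#!/usr/bin/env bash
# Sketch only: report which top-level entries of the documented layout are
# missing from a phenotype directory (default: workspace/phen).
check_phen_dir() {
  local phen_dir="${1:-workspace/phen}" missing=0 entry
  for entry in concepts concept-sets map config.yaml vocab_versions.yaml; do
    if [ ! -e "${phen_dir}/${entry}" ]; then
      echo "missing: ${phen_dir}/${entry}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_phen_dir workspace/phen` returns 0 when every listed entry is present and 1 otherwise.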
### **Configuration File**
Phenotype configuration is stored in `config.yaml` in the root of the phenotype directory. The file is in YAML format.
#### **Root Phenotype Element**
- `phenotype`: **(object)** The root element containing all phenotype-related concept sets and metadata.
#### **Phenotype Attributes**
- `version`: **(string)** Specifies the version of the phenotype definition.
- `omop`: **(object)** Metadata related to OMOP vocabulary.
  - `vocabulary_id`: **(string)** Identifier for the vocabulary.
  - `vocabulary_name`: **(string)** Human-readable name of the vocabulary.
  - `vocabulary_reference`: **(string, URL)** A reference URL for the vocabulary source.
#### **Concept Sets**
- `concept_sets`: **(array)** A list of concept set definitions, where each item has the following attributes:
  - `name`: **(string)** Unique name of the concept set.
  - `file`: **(object)** Contains file-related metadata.
    - `path`: **(string, file path)** Path to the source concepts coding list file, relative to `<phen_directory>/concepts`.
    - `columns`: **(object)** Key-value pairs mapping column names in the file to coding list types.
    - `category`: **(optional, string)** A categorical identifier for processing files containing multiple concept sets.
    - `actions`: **(optional, object)** Additional transformations on data.
      - `divide_col`: **(string)** Specifies a column name in the source concept file to group on.
  - `metadata`: **(object)** Reserved for additional metadata.
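The attributes above can be put together into a small `config.yaml`. The sketch below writes one from the shell. Every value in it is a hypothetical placeholder (vocabulary identifiers, concept set name, file path, initial version), and both the nesting of `path` and `columns` under `file` and the key/value direction used under `columns` are assumptions; check them against the configuration schema before relying on this.

```bash
#!/usr/bin/env bash
# Sketch only: write an illustrative config.yaml to the given path
# (default: ./config.yaml). All values are hypothetical placeholders.
write_example_config() {
  local target="${1:-config.yaml}"
  cat > "$target" <<'EOF'
phenotype:
  version: "0.0.0"                      # assumed starting version
  omop:
    vocabulary_id: "EXAMPLE_VOCAB"
    vocabulary_name: "Example phenotype vocabulary"
    vocabulary_reference: "https://example.org/example-phenotype"
  concept_sets:
    - name: "EXAMPLE_CONCEPT_SET"
      file:
        path: "example/codes.csv"       # relative to <phen_directory>/concepts
        columns:
          read2: "code"                 # assumed direction: coding type -> column name
      metadata: {}
EOF
}
```

After writing the file, `acmc phen validate` is the documented way to check it against the schema.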
## **Version Control**
ACMC uses [Git](https://git-scm.com/) to support versioning of phenotypes. Git is a version control system that tracks changes in documents such as source coding lists, coding maps or configuration files. Using Git allows ACMC to track versions and changes.
When a phenotype is initialised, the directory is configured as a Git repository. ACMC then provides a simplified interaction with Git through a specific workflow of ACMC commands, including integration with remote Git services such as [GitLab](https://about.gitlab.com/) or [GitHub](https://github.com/).
ACMC does not currently support merging contributions from multiple collaborators on a phenotype through ACMC commands. This has to be done using existing Git tools.
### **Version Numbers**
ACMC uses [semantic versioning](https://semver.org/) to version phenotypes. Semantic versioning uses three numbers, MAJOR.MINOR.PATCH, where each number is incremented depending on the significance of the change. Although semantic versioning is designed for software, the idea of major, minor and patch changes is retained for phenotypes as follows:
- MAJOR version when you make changes to concept sets
- MINOR version when you make changes to coding list concepts
- PATCH version when you make other minor changes such as documentation
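The increment rules can be illustrated with a small helper. This is a sketch of standard semantic-versioning arithmetic, not ACMC's own implementation: the function name `bump_version` is hypothetical, and resetting the lower-order numbers to zero follows the semver convention.

```bash
#!/usr/bin/env bash
# Sketch only: compute the next MAJOR.MINOR.PATCH version.
# Usage: bump_version <current-version> [major|minor|patch]
bump_version() {
  local version="$1" part="${2:-patch}"
  local major minor patch rest
  major="${version%%.*}"   # text before the first dot
  rest="${version#*.}"     # text after the first dot
  minor="${rest%%.*}"
  patch="${rest#*.}"
  case "$part" in
    major) major=$((major + 1)); minor=0; patch=0 ;;
    minor) minor=$((minor + 1)); patch=0 ;;
    patch) patch=$((patch + 1)) ;;
    *) echo "unknown increment: ${part}" >&2; return 1 ;;
  esac
  echo "${major}.${minor}.${patch}"
}

# bump_version 0.0.1        # prints 0.0.2
# bump_version 0.0.1 minor  # prints 0.1.0
# bump_version 0.0.1 major  # prints 1.0.0
```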
### **Workflows**
ACMC supports both local and remote repositories.
#### **Local Workflow**
A local phenotype is only stored within a directory on a filesystem. The following command will create a git repository with the initial phenotype directory structure and make a commit to the git repository.
```bash
acmc phen init
```
You can then configure your phenotype and generate maps to other coding types as required. When you are ready to publish a version of your phenotype, run the following command:
```bash
acmc phen publish
```
This commits the changes to the Git repository and generates a new version number. If this is the first publish, the initial version will be `0.0.1`. You can tell ACMC how to increment the version using the `-i` argument with either `major`, `minor` or `patch`. The default is a patch change, i.e. incrementing the patch number. The following command creates a major release, `1.0.0`:
```bash
acmc phen publish -i major
```
#### **Remote Workflow**
A remote phenotype is stored on a central server that others can access. Common services include GitHub and GitLab (public or private). You can connect your local phenotype to a remote repository during initialisation or publication. Connect to an empty repo without any previous commits: do not initialise it with a readme.md file, which is often the default. If the remote already has commits, you will need to resolve the conflicts manually before ACMC will work.
To initialise a phenotype with a remote Git repository, use the following command, replacing the Git URL with the URL of your remote repo:
```bash
acmc phen init -r https://git.soton.ac.uk/meldb/remote-phenotype.git
```
If you have a local phenotype and later want to connect it to a remote repository, you can do this when you publish it:
```bash
acmc phen publish -r https://git.soton.ac.uk/meldb/remote-phenotype.git
```
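Because the remote repo should be empty before you connect to it, it can be worth checking first. The sketch below uses the standard `git ls-remote` command, which lists the refs on a remote and prints nothing for a repo with no commits. The function name `remote_is_empty` is hypothetical and not an ACMC command.

```bash
#!/usr/bin/env bash
# Sketch only: succeed (exit 0) if the remote has no refs, i.e. no commits,
# branches or tags. Exit 1 if it has refs, 2 if the remote cannot be read.
remote_is_empty() {
  local url="$1" refs
  refs="$(git ls-remote "$url")" || return 2
  [ -z "$refs" ]
}
```

For example, run `remote_is_empty https://git.soton.ac.uk/meldb/remote-phenotype.git && acmc phen init -r https://git.soton.ac.uk/meldb/remote-phenotype.git`, substituting the URL of your own remote repo.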
#### **Fork Remote Workflow**
If there is an existing published remote phenotype that you want to use as a starting point, you can fork the upstream repo and create a new phenotype. The following command creates a fork of the remote repo in a local directory:
```bash
acmc fork -u https://git.soton.ac.uk/meldb/forked-phenotype.git -v 1.0.0
```
If you want to fork the repo and connect the fork to a remote repo of your own, run the following command:
```bash
acmc fork -u https://git.soton.ac.uk/meldb/forked-phenotype.git -v 1.0.0 -r https://git.soton.ac.uk/meldb/remote-phenotype.git
```
Alternatively, you can connect the remote repo later when you publish:
```bash
acmc phen publish -r https://git.soton.ac.uk/meldb/remote-phenotype.git
```
### Supported Medical Coding Standards
ACMC supports verification and mapping across the medical coding formats below:
| Medical Code | Verification | Translation to |
|---------------|--------------|-----------------------------------|
| Readv2 | NHS TRUD | Readv3, SNOMED, ICD10, OPCS4, ATC |
| Readv3 (CTV3) | NHS TRUD | Readv3, SNOMED, ICD10, OPCS4 |
| ICD10 | NHS TRUD | None |
| SNOMED | NHS TRUD | None |
| OPCS4 | NHS TRUD | None |
| ATC | None | None |
- [**Read V2:**](https://digital.nhs.uk/services/terminology-and-classifications/read-codes) NHS clinical terminology standard used in primary care and replaced by SNOMED-CT in 2018; still supported by some data providers because it remains widely used in primary care data, e.g. [SAIL Databank](https://saildatabank.com/).
- [**SNOMED-CT:**](https://icd.who.int/browse10/2019/en) International standard for clinical terminology in electronic healthcare records, adopted by the NHS in 2018; mappings to Read codes are partially provided by [Clinical Practice Research Datalink (CPRD)](https://www.cprd.com/) and [NHS Technology Reference Update Distribution (TRUD)](https://isd.digital.nhs.uk/trud).
- [**ICD-10:**](https://icd.who.int/browse10/2019/en) International Classification of Diseases (ICD) is a medical classification list from the World Health Organization (WHO), widely used in hospital settings, e.g. Hospital Episode Statistics (HES).
- [**ATC Codes:**](https://www.who.int/tools/atc-ddd-toolkit/atc-classification) Anatomical Therapeutic Chemical (ATC) Classification is a drug classification list from the World Health Organization (WHO).
## Notes
Processed resources will be saved in the `build/maps/processed/` directory.
*Note: NHS TRUD provides one-way mappings. To reverse mappings, duplicate the `.parquet` file and reverse the filename (e.g., `read2_code_to_snomed_code.parquet` to `snomed_code_to_read2_code.parquet`).*
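The renaming described in the note can be scripted. This is a minimal sketch that follows the note literally: it copies the file under the reversed name and does not alter the file's contents. The function names `reverse_map_name` and `reverse_map_file` are hypothetical, and the sketch assumes filenames of the form `<source>_to_<target>.parquet`.

```bash
#!/usr/bin/env bash
# Sketch only: derive the reversed filename for a mapping file.
reverse_map_name() {
  local name="${1%.parquet}"   # strip the extension
  local src="${name%%_to_*}"   # text before the first "_to_"
  local dst="${name#*_to_}"    # text after the first "_to_"
  printf '%s_to_%s.parquet\n' "$dst" "$src"
}

# Sketch only: copy a mapping file to its reversed name in the same directory.
reverse_map_file() {
  local path="$1" dir base
  dir="$(dirname "$path")"
  base="$(basename "$path")"
  cp -- "$path" "${dir}/$(reverse_map_name "$base")"
}

# reverse_map_name read2_code_to_snomed_code.parquet
# prints snomed_code_to_read2_code.parquet
```

For example, `reverse_map_file build/maps/processed/read2_code_to_snomed_code.parquet` creates `snomed_code_to_read2_code.parquet` alongside the original.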
......@@ -7,11 +7,10 @@ theme:
nav:
- Home: index.html
- Installation: installation.html
- User Guide: user-guide.md
- Usage: usage.md
- Tutorial 1 Basic local phenotype: tutorials/example1.md
- Tutorial 2 More complex local phenotype: tutorials/example2.md
- Tutorial 3 Using a remote git repository: tutorials/example3.md
- Command Line Reference: usage.md
- API Reference: api/acmc.html
- Change Log: changelog.md
- Troubleshooting: troubleshooting.md
......