diff --git a/.gitignore b/.gitignore
index 69fafb5db55832f139fe4cec5d3941698d4630c0..2424433f464d54ba4b0b92d79bfcdc2c346d6161 100644
--- a/.gitignore
+++ b/.gitignore
@@ -13,6 +13,7 @@ __pycache__
 ~$*
 
 # Build
+build/*
 output/
 concepts-output/
 archive/
diff --git a/README.md b/README.md
index 358b96de7f5980df141f322362534fd78cc342e2..955bb3b7e6cf367e0ce23f11e10fa83e526ece97 100644
--- a/README.md
+++ b/README.md
@@ -3,7 +3,7 @@
   <img src="img/swansea-university-logo-vector.png" height="100" />
 </center>
 
-# A Tool for Automating the Curation of Medical Concepts derived from Coding Lists
+# A Tool for Automating the Curation of Medical Concepts derived from Coding Lists (ACMC)
 
 ### Jakub J. Dylag <sup>1</sup>, Roberta Chiovoloni <sup>3</sup>, Ashley Akbari <sup>3</sup>, Simon D. Fraser <sup>2</sup>, Michael J. Boniface <sup>1</sup>
 
@@ -16,60 +16,72 @@
 ### Citation
 > Dylag JJ, Chiovoloni R, Akbari A, Fraser SD, Boniface MJ. A Tool for Automating the Curation of Medical Concepts derived from Coding Lists. GitLab [Internet]. May 2024. Available from: https://git.soton.ac.uk/meldb/concepts-processing
 
+## Introduction
 
-## Introduction 
-This tool automates the verification, translation and organisation of medical coding lists defining cohort phenotypes for inclusion criteria. By processing externally sourced clinical inclusion criteria into actionable code lists, this tool ensures consistent and efficient curation of cohort definitions. These code lists can be subsequently used by data providers (e.g. SAIL) to construct study cohorts.
+This tool automates the verification, translation and organisation of medical coding lists defining phenotypes for inclusion criteria in cohort analysis. By processing externally sourced clinical inclusion criteria into actionable code lists, this tool ensures consistent and efficient curation of cohort definitions. These code lists can be subsequently used by data providers to construct study cohorts.
 
+## Overview
 
-## Methods
+### Workflow
 
-### Workflow Overview
-1. Approved concept sets are outlined in a CSV spreadsheet (e.g., `PHEN_summary_working.csv`).
-2. Imported code lists in the `/src` directory are validated against NHS TRUD-registered codes.
-3. Mappings from imported code lists to outputted concept sets are defined in the `PHEN_assign_v3.json` file.
-	- See "JSON Phenotype Mapping" section for more details 
-4. The process is executed via command-line. Refer to the "Usage" section for execution instructions.
-5. Outputted concept set codes lists are saved to the `/concepts` Git repository, with all changes tracked.
-6. The code lists can be exported to SAIL or any other Data Bank.
+The high-level steps for using the tool are outlined below:
+
+**1. Define concept sets:** A domain expert defines a list of [concept sets](#concept-set-assigment) for each observable characteristic of the phenotype in CSV format (e.g., `PHEN_concept_sets.csv`).
+
+**2. Define code lists for concept sets:** A domain expert defines [code lists](#???) for each concept set within the phenotype using supported coding list formats and stores them in the `/src` directory.
+
+**3. Define mapping from code lists to concept sets:** A domain expert defines a [phenotype mapping](#???) that maps code lists to concept sets in JSON format (`PHEN_assign_v3.json`).
+
+**4. Generate versioned phenotype coding lists and translations:** A domain expert or data engineer processes the phenotype mappings [using the command line tool](#usage) to validate code lists against NHS TRUD-registered codes and mappings, and to generate versioned concept set code lists with translations between coding standards; the sketch after this list illustrates the two input files defined in steps 1 and 3.
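+
+As an illustration, the two input files can be inspected with a few lines of Python before running the tool (a minimal sketch: the file names come from the steps above, but the columns and fields shown are hypothetical, not a required schema):
+
+```python
+import json
+
+import pandas as pd
+
+# Step 1 input: concept set definitions in CSV format
+concept_sets = pd.read_csv("PHEN_concept_sets.csv")
+print(concept_sets.head())
+
+# Step 3 input: phenotype mapping from code lists to concept sets
+# (assuming a JSON object at the top level)
+with open("PHEN_assign_v3.json") as f:
+    mapping = json.load(f)
+print(list(mapping.keys()))
+```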
 
 ### Supported Medical Coding Standards
-The tool supports verification and mapping across various diagnostic coding formats:
+
+The tool supports verification and mapping across the diagnostic coding formats listed below:
 
 | Medical Code  | Verification | Translation to                    |
 |---------------|--------------|-----------------------------------|
 | Readv2        | NHS TRUD     | Readv3, SNOMED, ICD10, OPCS4, ATC |
 | Readv3 (CTV3) | NHS TRUD     | Readv3, SNOMED, ICD10, OPCS4      |
-| ICD10         | NHS TRUD     |                                   |
-| SNOMED        | NHS TRUD     |                                   |
-| OPCS4         | NHS TRUD     |                                   |
-| ATC           | None         |                                   |
+| ICD10         | NHS TRUD     | None                              |
+| SNOMED        | NHS TRUD     | None                              |
+| OPCS4         | NHS TRUD     | None                              |
+| ATC           | None         | None                              |
 
-#### Notes on Code Systems:
-- **Read V2:** Replaced by SNOMED-CT in 2018, but still supported by SAIL (restricted to five-character codes).
-- **SNOMED-CT:** Adopted widely by the NHS in 2018; mappings to Read codes are partially provided by CPRD and NHS TRUD.
-- **ICD-10:** Widely used in hospital settings and critical for HES-linked datasets.
-- **ATC Codes:** Maintained by WHO and used internationally for medication classification.
+- [**Read V2:**](https://digital.nhs.uk/services/terminology-and-classifications/read-codes) NHS clinical terminology standard replaced by SNOMED-CT in 2018; still supported by some data providers, e.g. [SAIL Databank](https://saildatabank.com/), because of its widespread use in primary care.
+- [**SNOMED-CT:**](https://digital.nhs.uk/services/terminology-and-classifications/snomed-ct) International standard for clinical terminology in Electronic Healthcare Records, adopted by the NHS in 2018; mappings to Read codes are partially provided by the [Clinical Practice Research Datalink (CPRD)](https://www.cprd.com/) and [NHS Technology Reference data Update Distribution (TRUD)](https://isd.digital.nhs.uk/trud).
+- [**ICD-10:**](https://icd.who.int/browse10/2019/en) The International Classification of Diseases (ICD) is a medical classification list from the World Health Organization (WHO) and is widely used in hospital settings, e.g. Hospital Episode Statistics (HES).
+- [**ATC Codes:**](https://www.who.int/tools/atc-ddd-toolkit/atc-classification) The Anatomical Therapeutic Chemical (ATC) Classification is a drug classification list from the World Health Organization (WHO).
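+
+For example, once the mapping tables have been generated (see [Installation](#installation)), a translation such as Read v2 to SNOMED is a join against the corresponding `.parquet` file. A minimal sketch, assuming the `build/maps/processed/` output described below and using made-up Read v2 codes:
+
+```python
+import pandas as pd
+
+# Hypothetical Read v2 codes to translate
+codes = pd.DataFrame({"read2_code": ["C10..", "H33.."]})
+
+# One-way mapping table produced by trud_api.py
+mapping = pd.read_parquet("build/maps/processed/read2_code_to_snomed_code.parquet")
+
+translated = codes.merge(mapping, on="read2_code", how="left")
+print(translated)
+```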
 
 ## Installation
-1. **Setup Conda Enviroment:** Download and Install Python Enviroment. Follow insturctions to install minicoda from [https://docs.conda.io/en/latest/miniconda.html](https://docs.conda.io/en/latest/miniconda.html). 
-	- Run the following command to recreate the environment: `conda env create -f conda.yaml`.
-	- Activate the environment: `conda activate base`
 
-2. **Sign Up:** Register at [NHS TRUD](https://isd.digital.nhs.uk/trud/user/guest/group/0/account/form) and accept the following licenses:
+1. **Set up Conda environment:** Download and install a Python environment. Follow the instructions to install Miniconda from [https://docs.conda.io/en/latest/miniconda.html](https://docs.conda.io/en/latest/miniconda.html).
+
+ - Run the following command to recreate the environment: `conda env create -f conda.yaml`.
+ - Activate the environment: `conda activate acmc`
+
+2. **Sign Up:** Register at [NHS TRUD](https://isd.digital.nhs.uk/trud/user/guest/group/0/account/form)
+
+3. **Subscribe** and accept the following licenses:
+
    - [NHS Read Browser](https://isd.digital.nhs.uk/trud/users/guest/filters/2/categories/9/items/8/releases)
    - [NHS Data Migration](https://isd.digital.nhs.uk/trud/users/guest/filters/0/categories/9/items/9/releases)
+   - <https://isd.digital.nhs.uk/trud/users/authenticated/filters/0/categories/8/items/9/releases>
    - [ICD10 Edition 5 XML](https://isd.digital.nhs.uk/trud/users/authenticated/filters/0/categories/28/items/259/releases)
    - [OPCS-4.10 Data Files](https://isd.digital.nhs.uk/trud/users/guest/filters/0/categories/10/items/119/releases)
    	<!-- - [BNF/Snomed Mapping data.xlsx](https://www.nhsbsa.nhs.uk/prescription-data/understanding-our-data/bnf-snomed-mapping) -->
+
+Each data file has a "Subscribe" link that takes you to the licence. You will need to complete the "Tell us about your subscription request" field, summarising why you need access to the data. Your subscription will not be approved immediately and will remain "pending" until it is reviewed, usually within 24 hours.
 	
-3. **Obtain API Key:** Retrieve your API key from [NHS TRUD Account Management](https://isd.digital.nhs.uk/trud/users/authenticated/filters/0/account/manage).
+4. **Get API Key:** Retrieve your API key from [NHS TRUD Account Management](https://isd.digital.nhs.uk/trud/users/authenticated/filters/0/account/manage).
+
+5. **Install TRUD:** Download and install NHS TRUD medical code resources.
+
+Execute the script with the command: `python trud_api.py --key <API_KEY>`.
 
-4. **Install TRUD:** Download and Install NHS TRUD medical code resources. 
-Executing the script using the command: `python trud_api.py --key <API_KEY>`. 
-Processed tables will be saved as `.parquet` files in the `maps/processed/` directory.
+Processed tables will be saved as `.parquet` files in the `build/maps/processed/` directory.
 	- *Note: NHS TRUD defines one-way mappings and does <b>NOT ADVISE</b> reversing the mappings. If you still wish to reverse these into two-way mappings, duplicate the given `.parquet` table and reverse the filename (e.g. `read2_code_to_snomed_code.parquet` to `snomed_code_to_read2_code.parquet`)*
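+
+If you do choose to reverse a mapping, the duplication described in the note above can be scripted rather than done by hand (a sketch of that manual step, not a feature of the tool):
+
+```python
+import pandas as pd
+
+# Load the one-way mapping and swap the column order to express the reverse direction
+df = pd.read_parquet("build/maps/processed/read2_code_to_snomed_code.parquet")
+df = df[["snomed_code", "read2_code"]]
+df.to_parquet("build/maps/processed/snomed_code_to_read2_code.parquet", index=False)
+```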
 
-5. ***Optional: Install OMOP Database:** Download and install OMOP vocabularies from [Athena OHDSI](https://athena.ohdsi.org/vocabulary/list). 
+6. **Optional: Install OMOP Database:** Download and install OMOP vocabularies from [Athena OHDSI](https://athena.ohdsi.org/vocabulary/list).
 	- Required vocabularies include:
    		- 1) SNOMED
 		- 2) ICD9CM
diff --git a/conda.yaml b/conda.yaml
index cc563eaa7013f3cee3142089ab0655e91d5dd6be..e261a9027ebc65e80efabe6b53607fb9ca925927 100644
--- a/conda.yaml
+++ b/conda.yaml
@@ -1,4 +1,4 @@
-name: base
+name: acmc
 channels:
   - conda-forge
 dependencies:
diff --git a/trud_api.py b/trud_api.py
index a7eade3d5ebf2a54509ece2c6cd4d62870ae4b9e..815403136e754cb96781752d3082354be932d37f 100644
--- a/trud_api.py
+++ b/trud_api.py
@@ -3,6 +3,7 @@ import sys
 import requests
 import json
 import argparse
+import shutil
 from pathlib import Path
 
 from base import bcolors
@@ -29,9 +30,10 @@ def get_releases(item_id, API_KEY, latest=False):
     url = f"https://{FQDN}/trud/api/v1/keys/{API_KEY}/items/{item_id}/releases"
     if latest:
         url += "?latest"
+
     response = requests.get(url)
     if response.status_code != 200:
-        error_exit(f"Failed to fetch releases for item {item_id}. Status code: {response.status_code}")
+        error_exit(f"Failed to fetch releases for item {item_id}. Status code: {response.status_code}, error {response.json()['message']}. If no releases found for API key, please ensure you are subscribed to the data release and that it is not pending approval")
 
     data = response.json()
     if data.get("message") != "OK":
@@ -39,7 +41,7 @@ def get_releases(item_id, API_KEY, latest=False):
 
     return data.get("releases", [])
 
-def download_release_file(item_id, release_ordinal, release, file_json_prefix, file_type=None, items_folder="maps"):
+def download_release_file(item_id, release_ordinal, release, file_json_prefix, file_type=None, items_folder="build/maps/downloads"):
     """Download specified file type for a given release of an item."""
     file_type = file_type or file_json_prefix
     file_url = release.get(f"{file_json_prefix}FileUrl")
@@ -49,7 +51,7 @@ def download_release_file(item_id, release_ordinal, release, file_json_prefix, f
     if not file_url or not file_name:
         error_exit(f"Missing {file_type} file information for release {release_ordinal} of item {item_id}.")
 
-    print(f"Downloading item {item_id} {file_type} file: {file_name}")
+    print(f"Downloading item {item_id} {file_type} file: {file_name} from {file_url} to {file_destination}")
     response = requests.get(file_url, stream=True)
     
     if response.status_code == 200:
@@ -68,137 +70,176 @@ def validate_download_hash(file_destination:str, item_hash:str):
     else:
         error_exit(f"Could not validate origin of {file_destination}. The SHA-256 hash should be: {item_hash}, but got {hash} instead")
 
-def unzip_download(file_destination:str, items_folder="maps"):
+def unzip_download(file_destination:str, items_folder="build/maps/downloads"):
     with zipfile.ZipFile(file_destination, 'r') as zip_ref:
         zip_ref.extractall(items_folder)
 
 def extract_icd10():
     #ICD10_edition5
-    df = pd.read_xml("maps/ICD10_Edition5_XML_20160401/Content/ICD10_Edition5_CodesAndTitlesAndMetadata_GB_20160401.xml",)
+    file_path = Path('build') / 'maps' / 'downloads' / 'ICD10_Edition5_XML_20160401' / 'Content' / 'ICD10_Edition5_CodesAndTitlesAndMetadata_GB_20160401.xml'
+
+    df = pd.read_xml(file_path)
     df = df[["CODE", "ALT_CODE", "DESCRIPTION"]]
     df = df.rename(columns={"CODE":"icd10_code",
                             "ALT_CODE":"icd10_alt_code",
                             "DESCRIPTION":"description"
                         })
-    df.to_parquet("maps/processed/icd10_code.parquet", index=False)
-    print("Extracted ", "maps/processed/icd10_code.parquet")
+    df.to_parquet("build/maps/processed/icd10_code.parquet", index=False)
+    print("Extracted ", "build/maps/processed/icd10_code.parquet")
 
 def extract_opsc4():
-    df = pd.read_csv("maps/OPCS410 Data files txt/OPCS410 CodesAndTitles Nov 2022 V1.0.txt", sep='\t', dtype=str, header=None)
+    file_path = Path('build') / 'maps' / 'downloads' / 'OPCS410 Data files txt' / 'OPCS410 CodesAndTitles Nov 2022 V1.0.txt'
+    
+    df = pd.read_csv(file_path, sep='\t', dtype=str, header=None)
     df = df.rename(columns={0:"opcs4_code", 1:"description"})
-    df.to_parquet("maps/processed/opcs4_code.parquet", index=False)
-    print("Extracted ", "maps/processed/opcs4_code.parquet")
+    df.to_parquet("build/maps/processed/opcs4_code.parquet", index=False)
+    print("Extracted ", "build/maps/processed/opcs4_code.parquet")
 
 def extract_nhs_data_migrations():
     #NHS Data Migrations
+    file_path = Path('build') / 'maps' / 'downloads' / 'Mapping Tables' / 'Updated' / 'Clinically Assured' / 'sctcremap_uk_20200401000001.txt'
 
     #snomed only
-    df = pd.read_csv('maps/Mapping Tables/Updated/Clinically Assured/sctcremap_uk_20200401000001.txt', sep='\t')
+    df = pd.read_csv(file_path, sep='\t')
     df = df[["SCT_CONCEPTID"]]
     df = df.rename(columns={"SCT_CONCEPTID":"snomed_code"})
     df = df.drop_duplicates()
     df = df.astype(str)
-    df.to_parquet("maps/processed/snomed_code.parquet", index=False)
-    print("Extracted ", "maps/processed/snomed_code.parquet")
+    df.to_parquet("build/maps/processed/snomed_code.parquet", index=False)
+    print("Extracted ", "build/maps/processed/snomed_code.parquet")
 
     #r2 -> r3
-    df = pd.read_csv('maps/Mapping Tables/Updated/Clinically Assured/rctctv3map_uk_20200401000001.txt', sep='\t')
+    file_path = Path('build') / 'maps' / 'downloads' / 'Mapping Tables' / 'Updated' / 'Clinically Assured' / 'rctctv3map_uk_20200401000001.txt'
+
+    df = pd.read_csv(file_path, sep='\t')
     df = df[["V2_CONCEPTID", "CTV3_CONCEPTID"]]
     df = df.rename(columns={"V2_CONCEPTID":"read2_code",
                             "CTV3_CONCEPTID":"read3_code"})
-    df.to_parquet("maps/processed/read2_code_to_read3_code.parquet", index=False)
-    print("Extracted ", "maps/processed/read2_code_to_read3_code.parquet")
+    df.to_parquet("build/maps/processed/read2_code_to_read3_code.parquet", index=False)
+    print("Extracted ", "build/maps/processed/read2_code_to_read3_code.parquet")
 
     #r3->r2
-    df = pd.read_csv('maps/Mapping Tables/Updated/Clinically Assured/ctv3rctmap_uk_20200401000002.txt', sep='\t')
+    file_path = Path('build') / 'maps' / 'downloads' / 'Mapping Tables' / 'Updated' / 'Clinically Assured' / 'ctv3rctmap_uk_20200401000002.txt'
+
+    df = pd.read_csv(file_path, sep='\t')
     df = df[["CTV3_CONCEPTID", "V2_CONCEPTID"]]
     df = df.rename(columns={"CTV3_CONCEPTID":"read3_code", 
                             "V2_CONCEPTID":"read2_code"})
     df = df.drop_duplicates()
     df = df[~df["read2_code"].str.match("^.*_.*$")] #remove r2 codes with '_'
-    df.to_parquet("maps/processed/read3_code_to_read2_code.parquet", index=False)
-    print("Extracted ", "maps/processed/read3_code_to_read2_code.parquet")
+    df.to_parquet("build/maps/processed/read3_code_to_read2_code.parquet", index=False)
+    print("Extracted ", "build/maps/processed/read3_code_to_read2_code.parquet")
 
     #r2 -> snomed
-    df = pd.read_csv('maps/Mapping Tables/Updated/Clinically Assured/rcsctmap2_uk_20200401000001.txt', sep='\t', dtype=str)
+    file_path = Path('build') / 'maps' / 'downloads' / 'Mapping Tables' / 'Updated' / 'Clinically Assured' / 'rcsctmap2_uk_20200401000001.txt'
+
+    df = pd.read_csv(file_path, sep='\t', dtype=str)
     df = df[["ReadCode", "ConceptId"]]
     df = df.rename(columns={"ReadCode":"read2_code",
                             "ConceptId":"snomed_code"})
-    df.to_parquet("maps/processed/read2_code_to_snomed_code.parquet", index=False)
-    print("Extracted ", "maps/processed/read2_code_to_snomed_code.parquet")
+    df.to_parquet("build/maps/processed/read2_code_to_snomed_code.parquet", index=False)
+    print("Extracted ", "build/maps/processed/read2_code_to_snomed_code.parquet")
 
     #r3->snomed
-    df = pd.read_csv('maps/Mapping Tables/Updated/Clinically Assured/ctv3sctmap2_uk_20200401000001.txt', sep='\t')
+    file_path = Path('build') / 'maps' / 'downloads' / 'Mapping Tables' / 'Updated' / 'Clinically Assured' / 'ctv3sctmap2_uk_20200401000001.txt'
+
+    df = pd.read_csv(file_path, sep='\t')
     df = df[["CTV3_TERMID", "SCT_CONCEPTID"]]
     df = df.rename(columns={"CTV3_TERMID":"read3_code",
                             "SCT_CONCEPTID":"snomed_code"})
     df["snomed_code"] = df["snomed_code"].astype(str)
     df = df[~df["snomed_code"].str.match("^.*_.*$")] #remove snomed codes with '_'
-    df.to_parquet("maps/processed/read3_code_to_snomed_code.parquet", index=False)
-    print("Extracted ", "maps/processed/read3_code_to_snomed_code.parquet")
+    df.to_parquet("build/maps/processed/read3_code_to_snomed_code.parquet", index=False)
+    print("Extracted ", "build/maps/processed/read3_code_to_snomed_code.parquet")
 
 def extract_nhs_read_browser():
     #r2 only
-    df = simpledbf.Dbf5('maps/Standard/V2/ANCESTOR.DBF').to_dataframe()
+    df = simpledbf.Dbf5('build/maps/downloads/Standard/V2/ANCESTOR.DBF').to_dataframe()
     df = pd.concat([df['READCODE'], df['DESCENDANT']])
     df = pd.DataFrame(df.drop_duplicates())
     df = df.rename(columns={0:"read2_code"})
-    df.to_parquet("maps/processed/read2_code.parquet", index=False)
-    print("Extracted ", "maps/processed/read2_code.parquet")
+    df.to_parquet("build/maps/processed/read2_code.parquet", index=False)
+    print("Extracted ", "build/maps/processed/read2_code.parquet")
 
     #r2 -> atc
-    df = simpledbf.Dbf5('maps/Standard/V2/ATC.DBF').to_dataframe()
+    df = simpledbf.Dbf5('build/maps/downloads/Standard/V2/ATC.DBF').to_dataframe()
     df = df[["READCODE", "ATC"]]
     df = df.rename(columns={"READCODE":"read2_code", "ATC":"atc_code"})
-    df.to_parquet("maps/processed/read2_code_to_atc_code.parquet", index=False)
-    print("Extracted ", "maps/processed/read2_code_to_atc_code.parquet")
+    df.to_parquet("build/maps/processed/read2_code_to_atc_code.parquet", index=False)
+    print("Extracted ", "build/maps/processed/read2_code_to_atc_code.parquet")
 
     #r2 -> icd10
-    df = simpledbf.Dbf5('maps/Standard/V2/ICD10.DBF').to_dataframe()
+    df = simpledbf.Dbf5('build/maps/downloads/Standard/V2/ICD10.DBF').to_dataframe()
     df = df[["READ_CODE", "TARG_CODE"]]
     df = df.rename(columns={"READ_CODE":"read2_code", "TARG_CODE":"icd10_code"})
     df = df[~df["icd10_code"].str.match("^.*-.*$")] #remove codes with '-'
     df = df[~df["read2_code"].str.match("^.*-.*$")] #remove codes with '-'
-    df.to_parquet("maps/processed/read2_code_to_icd10_code.parquet", index=False)
-    print("Extracted ", "maps/processed/read2_code_to_icd10_code.parquet")
+    df.to_parquet("build/maps/processed/read2_code_to_icd10_code.parquet", index=False)
+    print("Extracted ", "build/maps/processed/read2_code_to_icd10_code.parquet")
 
     #r2 -> opcs4
-    df = simpledbf.Dbf5('maps/Standard/V2/OPCS4V3.DBF').to_dataframe()
+    df = simpledbf.Dbf5('build/maps/downloads/Standard/V2/OPCS4V3.DBF').to_dataframe()
     df = df[["READ_CODE", "TARG_CODE"]]
     df = df.rename(columns={"READ_CODE":"read2_code", "TARG_CODE":"opcs4_code"})
     df = df[~df["opcs4_code"].str.match("^.*-.*$")] #remove codes with '-'
     df = df[~df["read2_code"].str.match("^.*-.*$")] #remove codes with '-'
-    df.to_parquet("maps/processed/read2_code_to_opcs4_code.parquet", index=False)
-    print("Extracted ", "maps/processed/read2_code_to_opcs4_code.parquet")
+    df.to_parquet("build/maps/processed/read2_code_to_opcs4_code.parquet", index=False)
+    print("Extracted ", "build/maps/processed/read2_code_to_opcs4_code.parquet")
 
     #r3 only
-    df = simpledbf.Dbf5('maps/Standard/V3/ANCESTOR.DBF').to_dataframe()
+    df = simpledbf.Dbf5('build/maps/downloads/Standard/V3/ANCESTOR.DBF').to_dataframe()
     df = pd.concat([df['READCODE'], df['DESCENDANT']])
     df = pd.DataFrame(df.drop_duplicates())
     df = df.rename(columns={0:"read3_code"})
-    df.to_parquet("maps/processed/read3_code.parquet", index=False)
-    print("Extracted ", "maps/processed/read3_code.parquet")
+    df.to_parquet("build/maps/processed/read3_code.parquet", index=False)
+    print("Extracted ", "build/maps/processed/read3_code.parquet")
 
     #r3 -> icd10
-    df = simpledbf.Dbf5('maps/Standard/V3/ICD10.DBF').to_dataframe()
+    df = simpledbf.Dbf5('build/maps/downloads/Standard/V3/ICD10.DBF').to_dataframe()
     df = df[["READ_CODE", "TARG_CODE"]]
     df = df.rename(columns={"READ_CODE":"read3_code", "TARG_CODE":"icd10_code"})
     df = df[~df["icd10_code"].str.match("^.*-.*$")] #remove codes with '-'
     df = df[~df["read3_code"].str.match("^.*-.*$")] #remove codes with '-'
-    df.to_parquet("maps/processed/read3_code_to_icd10_code.parquet", index=False)
-    print("Extracted ", "maps/processed/read3_code_to_icd10_code.parquet")
+    df.to_parquet("build/maps/processed/read3_code_to_icd10_code.parquet", index=False)
+    print("Extracted ", "build/maps/processed/read3_code_to_icd10_code.parquet")
 
     #r3 -> icd9
-    # dbf = simpledbf.Dbf5('maps/Standard/V3/ICD9V3.DBF')
+    # dbf = simpledbf.Dbf5('build/maps/downloads/Standard/V3/ICD9V3.DBF')
 
     #r3 -> opcs4
-    df = simpledbf.Dbf5('maps/Standard/V3/OPCS4V3.DBF').to_dataframe()
+    df = simpledbf.Dbf5('build/maps/downloads/Standard/V3/OPCS4V3.DBF').to_dataframe()
     df = df[["READ_CODE", "TARG_CODE"]]
     df = df.rename(columns={"READ_CODE":"read3_code", "TARG_CODE":"opcs4_code"})
     df = df[~df["opcs4_code"].str.match("^.*-.*$")] #remove codes with '-'
     df = df[~df["read3_code"].str.match("^.*-.*$")] #remove codes with '-'
-    df.to_parquet("maps/processed/read3_code_to_opcs4_code.parquet", index=False)
-    print("Extracted ", "maps/processed/read3_code_to_opcs4_code.parquet")
+    df.to_parquet("build/maps/processed/read3_code_to_opcs4_code.parquet", index=False)
+    print("Extracted ", "build/maps/processed/read3_code_to_opcs4_code.parquet")
+
+def create_build_directories(build_dir='build'):
+    """Create the build directory tree, optionally recreating it from scratch."""
+    build_path = Path(build_dir)
+
+    # If the build directory already exists, ask whether to delete and recreate it
+    create_build_dirs = False
+    if build_path.exists() and build_path.is_dir():
+        user_input = input(f"The build directory {build_path} already exists. Do you want to delete and recreate all data? (y/n): ").strip().lower()
+        if user_input == "y":
+            # delete all build files
+            shutil.rmtree(build_path)
+            create_build_dirs = True
+    else:
+        create_build_dirs = True
+
+    if create_build_dirs:
+        # create the maps download and processed directories
+        # (mkdir(parents=True) also creates build/ and build/maps/)
+        (build_path / 'maps' / 'downloads').mkdir(parents=True, exist_ok=True)
+        (build_path / 'maps' / 'processed').mkdir(parents=True, exist_ok=True)
 
 def main():
     parser = argparse.ArgumentParser(
@@ -214,8 +255,10 @@ def main():
     
     args = parser.parse_args()
 
+    create_build_directories()
+
     items_latest = True
-    items_folder = "maps"
+    items_folder = "build/maps/downloads"
     items = [
         {
             "id": 259,