The MELDB 3.2.10 phenotype and concept sets need to be upgraded using the refactored acmc concept processing tool published on PyPI, and then republished at v4.0.0.
Task list:
check latest tag in repo is 3.2.10, acmc now updated to deal with last tags beginning with or without a v
Checkout latest concepts used for SAIL into workspace/phen directory
Manually create the acmc directory structure
rename codes to concepts
add concept-sets directory
Write converter from concepts configuration file to acmc configuration file config.yaml
need to split excel sheets for drugs as not supported by acmc
need to duplicate hanlon for 1-1 mapping between concept-set and codes
Add desired maps to config.yaml, read2, read3, snomed
validate phenotype: acmc phen validate
map phenotype: acmc map
check that the v3.2.10 concepts are consistent with the new maps, ideally using: acmc phen diff. This will require restructuring the 3.2.10 directories, but the maps should be the same
delete all of the old files from 3.2.10
publish phen as major release, should increment from last v3.2.10 tag to 4.0.0 : acmc phen publish -i major
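The tag handling described in the task list (accepting a last tag with or without a leading v, then incrementing the major version) can be sketched as follows. This is only an illustration of the idea, not the acmc implementation: the function names `parse_tag` and `bump` are hypothetical, and returning the new version without a v prefix is an assumption.

```python
import re


def parse_tag(tag: str) -> tuple[int, int, int]:
    """Parse '3.2.10' or 'v3.2.10' into (major, minor, patch)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"Not a version tag: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def bump(tag: str, increment: str) -> str:
    """Return the next version string for a 'major', 'minor' or 'patch' bump."""
    major, minor, patch = parse_tag(tag)
    if increment == "major":
        return f"{major + 1}.0.0"
    if increment == "minor":
        return f"{major}.{minor + 1}.0"
    if increment == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown increment: {increment!r}")


# Both tag spellings give the same next major version.
print(bump("3.2.10", "major"))
print(bump("v3.2.10", "major"))
```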
7 of 10 checklist items completed
mjbonifa changed title from Upgrade to pypi acmc v0.1.1 to Upgrade to pypi acmc v0.1.2 (latest)
Jakub Dylag marked the checklist item "check latest tag in repo is 3.2.10, acmc now updated to deal with last tags beginning with or without a v" as completed
Jakub Dylag marked the checklist item "Checkout latest concepts used for SAIL into workspace/phen directory" as completed
Jakub Dylag marked the checklist item "Manually create the acmc directory structure" as completed
Jakub Dylag marked the checklist item "Write converter from concepts configuration file to acmc configuration file config.yaml" as completed
Jakub Dylag marked the checklist item "validate phenotype: acmc phen validate" as completed
@mjbonifa I have restructured the MELDB files into acmc 0.1.12 format.
So far my conversion script from the JSON file has only been able to map 97/161 concept sets successfully.
I found the following issues stopping further conversions:
Duplicate Concept Set definitions are needed, which fails validation
Multiple files are listed under a single concept set
Error: all codes fail QA, resulting in an empty concept set
Had to remove the “PLASMACELL” concept set
Error log:
Error log
[INFO] - Processing read2 codes for ..\workspace\phen\concepts\GitHub_TG_repository\plasmacell_neoplasm_birm_cam\plasmacell_neoplasm_birm_cam_IMRD.csv
[WARNING] - Codes validation failed with 1 errors
[INFO] - Converting to target code type read2
[INFO] - Converting to target code type read2
[ERROR] - The map processing has 52 errors
Traceback (most recent call last):
File "", line 198, in _run_module_as_main
File "", line 88, in _run_code
File "C:\Users\jjd1c23\AppData\Local\hatch\env\virtual\acmc\9dLazAmo\acmc\Scripts\acmc.exe\__main__.py", line 7, in
File "C:\Users\jjd1c23\Documents\MELDB\concepts-processing\acmc\main.py", line 336, in main
args.func(args)
File "C:\Users\jjd1c23\Documents\MELDB\concepts-processing\acmc\main.py", line 61, in _phen_map
phen.map(args.phen_dir, args.target_coding)
File "C:\Users\jjd1c23\Documents\MELDB\concepts-processing\acmc\phen.py", line 801, in map
_map_target_code_type(phen_path, phenotype, target_code_type)
File "C:\Users\jjd1c23\Documents\MELDB\concepts-processing\acmc\phen.py", line 930, in _map_target_code_type
final_out = pd.concat(result_list, ignore_index=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\jjd1c23\AppData\Local\hatch\env\virtual\acmc\9dLazAmo\acmc\Lib\site-packages\pandas\core\reshape\concat.py", line 382, in concat
op = _Concatenator(
^^^^^^^^^^^^^^
File "C:\Users\jjd1c23\AppData\Local\hatch\env\virtual\acmc\9dLazAmo\acmc\Lib\site-packages\pandas\core\reshape\concat.py", line 445, in __init__
objs, keys = self._clean_keys_and_objs(objs, keys)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\jjd1c23\AppData\Local\hatch\env\virtual\acmc\9dLazAmo\acmc\Lib\site-packages\pandas\core\reshape\concat.py", line 507, in _clean_keys_and_objs
raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate
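The final ValueError in the log is what pandas raises when pd.concat is given an empty list, which matches the case where every code in a concept set fails QA. A minimal sketch of one way to guard against it is below. The helper `concat_or_empty` and its `columns` parameter are hypothetical and not part of acmc; whether an empty concept set should be skipped, reported, or treated as a hard error is a separate design decision.

```python
import pandas as pd


def concat_or_empty(frames: list[pd.DataFrame], columns: list[str]) -> pd.DataFrame:
    """Concatenate frames, returning an empty frame if there is nothing to join."""
    if not frames:
        # pd.concat([]) raises ValueError("No objects to concatenate"),
        # so return an empty frame with the expected columns instead.
        return pd.DataFrame(columns=columns)
    return pd.concat(frames, ignore_index=True)


# Hypothetical result list for a concept set whose codes all failed QA.
result_list: list[pd.DataFrame] = []
final_out = concat_or_empty(result_list, columns=["code"])
print(final_out.empty)
```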
Above I think you are referring to Concept Set definitions and not Concept definitions.
We explicitly do not want duplicate concept-set definitions. This is confusing for users, so the change you have made is incorrect. We need the validation re-added.
This means we need to understand what you want to do and how you are mapping concepts to concept sets in the coding files.
You can see that the concept coding file is duplicated, but the category is used to select the concept codes to map to the concept set. Previously, you would have had hanlon/Read_codes_for_diagnoses.csv defined once in the codes block, with a mapping of categories to multiple concept sets.
This duplicates a few elements for hanlon but it is easier to understand the definition with this 1-1 mapping.
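As an illustration of the category-based selection described above, the sketch below picks the codes for one concept set out of a shared coding file by filtering on a category value. The column names `code` and `category`, the sample rows, and the function `codes_for_category` are all assumptions made for the example and do not reflect the actual hanlon file layout or the acmc code.

```python
def codes_for_category(
    rows: list[dict[str, str]],
    category: str,
    code_column: str = "code",
    category_column: str = "category",
) -> list[str]:
    """Return the codes from rows whose category matches the given value."""
    return [row[code_column] for row in rows if row.get(category_column) == category]


# Hypothetical rows from one shared coding file covering two categories.
rows = [
    {"code": "X1", "category": "A"},
    {"code": "X2", "category": "B"},
    {"code": "X3", "category": "A"},
]

# Each concept set names the same file but a different category (1-1 mapping).
print(codes_for_category(rows, "A"))
print(codes_for_category(rows, "B"))
```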
@jjd1c23 and @mjbonifa discussed the issue. The scenario is that a concept set can have multiple source concept coding lists. This is not a scenario supported by acmc but was supported by the initial protocol. We agreed to maintain the single concept set but to allow for multiple files within a concept set.
@mjbonifa I believe the best solution would be to change the "file" object from a dict to list. This would also require "metadata" to be moved inside of the columns object. This may require major code changes!
This way multiple source concepts, in addition to multiple categories within source concept code list, can map to a single concept set. In effect recreating the old protocol, but turning it from file centric json to concept set centric yaml.
@jjd1c23 Yes, file should be a list. The metadata object is not used and should be deleted from the example. Right now acmc includes all of the columns that are not explicit code columns in the result. So major code changes might not be required.
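A minimal sketch of how moving "file" from a dict to a list could be handled while still reading older single-file definitions is shown below. The function `files_for` and the keys `name` and `path` are hypothetical; only the "file" key comes from the discussion above, and the real acmc schema may differ.

```python
from typing import Any


def files_for(concept_set: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the file entries of a concept set as a list.

    Accepts either the older single-dict form or the proposed list form.
    """
    entry = concept_set.get("file")
    if entry is None:
        return []
    if isinstance(entry, dict):
        return [entry]
    if isinstance(entry, list):
        return list(entry)
    raise TypeError(f"'file' must be a dict or a list, got {type(entry).__name__}")


# Hypothetical concept sets: one in the old form, one with multiple source files.
old_style = {"name": "SET_A", "file": {"path": "a.csv"}}
new_style = {"name": "SET_B", "file": [{"path": "b1.csv"}, {"path": "b2.csv"}]}

for concept_set in (old_style, new_style):
    print(concept_set["name"], [f["path"] for f in files_for(concept_set)])
```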
@jjd1c23 the metadata element has now been removed from the example config.yml, as it was legacy and not used in the map process. concepts-processing#58 (closed) committed and merged into dev on acmc
@jjd1c23 I’m not sure why you have removed this code. We still keep the metadata. I think you’ll need to put it back.
Your first implementation was used to specify the metadata fields to keep. It added those selected fields to the outputs.
The current implementation assumes that all fields in the file that are not defined as codes are kept in the output. It’s still called metadata, but there’s no need to specify it explicitly because the columns are simply retained in the output.
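The implicit metadata rule described above (every column that is not declared as a code column is carried through to the output) can be sketched as below. The function `split_columns` and the example column names are hypothetical and are not taken from acmc.

```python
def split_columns(
    all_columns: list[str], code_columns: list[str]
) -> tuple[list[str], list[str]]:
    """Split a file's columns into (code columns, metadata columns).

    Metadata is simply every column that is not declared as a code column,
    kept in the file's original order.
    """
    declared = set(code_columns)
    codes = [column for column in all_columns if column in declared]
    metadata = [column for column in all_columns if column not in declared]
    return codes, metadata


# Hypothetical header row of a source coding file.
header = ["read2", "description", "category", "snomed"]
codes, metadata = split_columns(header, code_columns=["read2", "snomed"])
print(codes)
print(metadata)
```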