@mjbonifa Allowing multiple files per concept causes an issue with the diff command in phen.py. DeepDiff uses dictionaries to compare the old and new phenotypes, and Python does not allow duplicate keys in a dictionary. To store the concept set name to file path relation, we must use a list of tuples instead.
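To illustrate the duplicate-key problem, here is a minimal sketch. The concept set names and file paths are made up for the example and are not taken from phen.py.

```python
# Hypothetical concept set names and file paths, for illustration only.
pairs = [
    ("CONCEPT_A", "codes/a_read2.csv"),
    ("CONCEPT_A", "codes/a_icd10.csv"),  # second file for the same concept set
    ("CONCEPT_B", "codes/b_read2.csv"),
]

# Building a dict from the pairs silently keeps only the last path
# for a repeated key, so one file for CONCEPT_A is lost.
as_dict = dict(pairs)

# The list of tuples keeps every (name, path) pair, including repeats.
paths_for_a = [path for name, path in pairs if name == "CONCEPT_A"]

print(as_dict["CONCEPT_A"])
print(paths_for_a)
```

The dict ends up with two entries, while the list of tuples still holds all three relations.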
@jjd1c23 I have updated branch 57-fix-convertion-of-meldb-sail-phenotypes-to-acmc-0-1-12 with the following:
- Merged dev into 57 so that it has some of the latest changes to the tests
- Updated phen.py def extract_concepts... so that it works with a list of paths
- Ran all tests; they pass as previously defined (note that the testing is limited and many things are not checked)
You should pull the branch before you start again. This is good practice anyway as you'll get the latest changes.
We need to talk about the metadata support, as you've commented out a lot of code during the implementation of the path list. It will be easier to chat for 15 minutes to understand what we have both done and ensure we're on the same page.
Thanks @mjbonifa. I have deleted the duplicate 57-fix-convertion-of-meldb-sail-phenotypes-to-acmc-0-1-12-2, as all commits have been merged.
The commented-out metadata code didn't seem to do much: the results of the for loop were overwritten and only the last element was actually used, because of the missing indentation. However, going by commit df2271e2, I understand that metadata shouldn't be included in the example YAMLs?
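For anyone reading later, this is the general shape of that indentation problem. It is a hypothetical reduction and not the actual code from phen.py; the variable names are invented.

```python
values = ["first", "second", "third"]

# Buggy shape: the append is dedented, so it runs once after the loop
# has finished and only sees the value from the last iteration.
buggy = []
for v in values:
    current = v.upper()
buggy.append(current)

# Intended shape: the append is inside the loop body, so every
# element is kept.
fixed = []
for v in values:
    current = v.upper()
    fixed.append(current)

print(buggy)
print(fixed)
```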
From my understanding, the "metadata" was a static property for every concept set, e.g. the tracking columns from the Excel sheet. I have created a simple add_metadata() function (commit: 62c4b649), which duplicates these for all rows in the concept set, just after translation.
There may also be a need for a "metacolumn", where the values are not constant across rows and are copied from the source concept tables, e.g. the "WILK_READCODE" column for the ALL_MEDICATIONS concept, for the workaround WP3 wanted.
Happy to discuss further in Friday's meeting if need be.
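As a rough sketch of the idea (not the code in commit 62c4b649), copying static metadata onto every row could look like this. The signature, the plain list-of-dicts row format and the example column names are all assumptions made for illustration.

```python
def add_metadata(rows, metadata):
    """Return new rows with every static metadata key/value copied onto each row.

    Assumed shapes: `rows` is a list of dicts (one per translated code) and
    `metadata` is a dict of values that are constant for the concept set.
    On a name clash the metadata value overwrites the row value.
    """
    return [{**row, **metadata} for row in rows]


# Hypothetical translated rows and tracking metadata.
rows = [{"code": "A1"}, {"code": "B2"}]
metadata = {"source": "sheet1"}

result = add_metadata(rows, metadata)
print(result)
```

The input rows are left unchanged, so the step can be skipped or repeated without side effects.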
"metadata" may clutter the output -> add a flag to exclude metadata from the output CSV
"metacolumn" requires reimplementation. It is difficult to keep metacolumn names consistent in the output (e.g. "Description" and "descriptions" metacolumns). Would this require merging with a database? -> deemed out of scope; remove the implementation.
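One possible way to wire up such a flag, sketched with the standard csv module. The function name, the `include_metadata` parameter and the column names are hypothetical and do not reflect the existing acmc output code.

```python
import csv
import io


def rows_to_csv(rows, metadata_columns, include_metadata=True):
    """Serialise rows to CSV text, optionally dropping the metadata columns.

    `rows` is assumed to be a list of dicts sharing the same keys, and
    `metadata_columns` the set of column names that count as metadata.
    """
    if not rows:
        return ""
    columns = [
        col for col in rows[0]
        if include_metadata or col not in metadata_columns
    ]
    buffer = io.StringIO()
    # extrasaction="ignore" lets DictWriter skip the excluded columns.
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical rows with one metadata column.
example_rows = [
    {"code": "A1", "source": "sheet1"},
    {"code": "B2", "source": "sheet1"},
]

with_metadata = rows_to_csv(example_rows, {"source"})
without_metadata = rows_to_csv(example_rows, {"source"}, include_metadata=False)
print(without_metadata)
```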