Phecode 1.2 match with diseaseIds

Is there a lack of data connecting Phecode 1.2 and disease IDs?

Hi @hanl3,

Please could you provide some more information about your issue? For example, which datasets are you using, and what are you trying to do?

Best wishes,

Helena

Thanks, Helena. I currently have GWAS results for a subset of phecode 1.2 codes. To identify novel GWAS signals, I matched these phecodes to EFO terms or traitFromSource entries. Based on online searches for mappings between diseaseIDs and phecode 1.2 codes, I noticed that some traitFromSource phecode strings do not match the corresponding phecode 1.2 codes. Should I remove these mismatched pairs from the OTG-derived subset, or is there a better approach for mapping phecode 1.2 codes to diseaseIDs?
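One way to make that comparison explicit is to separate pairs into "matched" and "mismatched" instead of deleting anything. The sketch below assumes the phecode appears in traitFromSource as a number such as "008" or "250.2", and that you have loaded the phecode 1.2 definitions into a dict of code → description. Both the pattern and the function names are hypothetical, not part of any Open Targets API.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical pattern: assumes the phecode appears in traitFromSource as a
# number such as "008" or "250.2" (3 digits, optionally 1-2 decimals).
PHECODE_RE = re.compile(r"\b(\d{3}(?:\.\d{1,2})?)\b")


def extract_phecode(trait_from_source: str) -> Optional[str]:
    """Return the first phecode-like token in a traitFromSource string, or None."""
    match = PHECODE_RE.search(trait_from_source)
    return match.group(1) if match else None


def split_matches(
    pairs: List[Tuple[str, str]], phecode_labels: Dict[str, str]
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Split (diseaseId, traitFromSource) pairs into matched and mismatched.

    phecode_labels maps phecode 1.2 codes to their descriptions (assumed to be
    loaded from a phecode 1.2 definitions file). A pair counts as matched only
    if its code exists in phecode 1.2 AND the 1.2 description appears in the
    traitFromSource string (case-insensitive).
    """
    matched, mismatched = [], []
    for disease_id, trait in pairs:
        code = extract_phecode(trait)
        label = phecode_labels.get(code) if code else None
        if label and label.lower() in trait.lower():
            matched.append((disease_id, code))
        else:
            mismatched.append((disease_id, trait))
    return matched, mismatched
```

Keeping the mismatched pairs in their own list lets you review them by hand; some may describe the same phenotype under a different phecode version rather than being wrong.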

Hello @hanl3, the traitFromSource column contains the molecular or phenotypic trait analysed in the study, as reported by the source. Phecodes found in that field therefore reflect only the subset of GWAS studies that used phecodes for phenotype annotation. I checked, and this narrows the scope to a few biobanks (including UKBB and MVP), covered by these publications (publicationTitle values):

Multi-domain rule-based phenotyping algorithms enable improved GWAS signal.
A generalized linear mixed model association tool for biobank-scale data.
Pan-UK Biobank genome-wide association analyses enhance discovery and resolution of ancestry-enriched effects.
Diversity and scale: Genetic architecture of 2068 traits in the VA Million Veteran Program.
Diversity and longitudinal records: Genetic architecture of disease associations and polygenic risk in the Taiwanese Han population.
Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies.

Depending on what you want to do, I would first look up the publications above and check which phecode version the authors used, to verify the mapping.
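Once each paper has been checked, the result can be recorded and used to keep only studies from publications confirmed to use phecode 1.2. This is a minimal sketch: every version below is a placeholder (None means "not yet verified"), and the study records are assumed to be dicts with studyId and publicationTitle keys.

```python
from typing import Dict, List, Optional

# Phecode version used by each publication. These values are placeholders
# (None = not yet verified) to be filled in by hand after reading each paper.
PHECODE_VERSION_BY_PUBLICATION: Dict[str, Optional[str]] = {
    "Multi-domain rule-based phenotyping algorithms enable improved GWAS signal.": None,
    "A generalized linear mixed model association tool for biobank-scale data.": None,
    "Pan-UK Biobank genome-wide association analyses enhance discovery and resolution of ancestry-enriched effects.": None,
    "Diversity and scale: Genetic architecture of 2068 traits in the VA Million Veteran Program.": None,
    "Diversity and longitudinal records: Genetic architecture of disease associations and polygenic risk in the Taiwanese Han population.": None,
    "Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies.": None,
}


def studies_using_version(
    studies: List[dict],
    versions: Dict[str, Optional[str]],
    wanted: str = "1.2",
) -> List[dict]:
    """Keep study records whose publication has been verified to use `wanted`.

    Each study is assumed to be a dict with 'studyId' and 'publicationTitle'
    keys; publications missing from `versions` or still None are excluded.
    """
    return [s for s in studies if versions.get(s.get("publicationTitle")) == wanted]
```

Excluding unverified publications by default means a study only enters the phecode 1.2 subset after someone has actually confirmed the version.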

The other option is to search the dbXRefs field in the disease dataset, which maps EFO terms to other ontologies (ICD, MEDGEN, etc.).
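A sketch of pulling the ICD cross-references out of the disease records. It assumes each record has an id and a dbXRefs list of "PREFIX:code" strings; the exact ICD prefixes listed here are assumptions and should be checked against the release you download.

```python
from typing import Dict, Iterable, Set

# Assumed xref prefixes; check the actual values present in your release.
ICD_PREFIXES = ("ICD9:", "ICD9CM:", "ICD10:", "ICD10CM:")


def icd_xrefs(diseases: Iterable[dict]) -> Dict[str, Set[str]]:
    """Map each disease id to the ICD cross-references in its dbXRefs list.

    Each disease is assumed to be a dict with an 'id' key and an optional
    'dbXRefs' list of 'PREFIX:code' strings. Diseases without ICD xrefs
    are omitted.
    """
    result: Dict[str, Set[str]] = {}
    for disease in diseases:
        codes = {
            xref
            for xref in disease.get("dbXRefs") or []
            if xref.upper().startswith(ICD_PREFIXES)
        }
        if codes:
            result[disease["id"]] = codes
    return result
```

The prefix is kept on each code so that ICD-9 and ICD-10 codes are not confused later.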

Do you think dbXRefs includes phecode 1.2? I know it contains ICD codes.

No, as far as I know there is no direct mapping between EFO and phecodes in the disease dataset. It should be possible to build a two-step mapping, EFO → ICD → phecode. Example mappings between ICD codes and phecodes can be found in the PhecodeX repo.
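The two-step mapping amounts to composing two lookups. This sketch assumes the ICD → phecode table has been loaded as (system, icd_code, phecode) rows; that layout is an assumption, not the format of any specific file. Dots are stripped so "E11.9" and "E119" compare equal, and the ICD version is kept so ICD-9 and ICD-10 codes with the same characters stay distinct.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def normalise_icd(xref: str) -> Tuple[str, str]:
    """'ICD10CM:E11.9' -> ('ICD10', 'E119').

    Keeps the ICD version so ICD-9 'V10' and ICD-10 'V10' stay distinct, but
    treats ICD-10 and ICD-10-CM (and ICD-9/ICD-9-CM) as one system, which is a
    simplification.
    """
    system, _, code = xref.partition(":")
    system = system.upper()
    if system.endswith("CM"):
        system = system[:-2]
    return system, code.replace(".", "").upper()


def efo_to_phecode(
    efo_to_icd: Dict[str, Set[str]],
    icd_to_phecode: Iterable[Tuple[str, str, str]],
) -> Dict[str, Set[str]]:
    """Compose EFO -> ICD and ICD -> phecode into EFO -> phecode.

    icd_to_phecode rows are (system, icd_code, phecode), an assumed layout.
    EFO ids with no matching ICD code are left out of the result.
    """
    lookup: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for system, icd_code, phecode in icd_to_phecode:
        lookup[normalise_icd(f"{system}:{icd_code}")].add(phecode)

    result: Dict[str, Set[str]] = {}
    for efo_id, xrefs in efo_to_icd.items():
        phecodes: Set[str] = set()
        for xref in xrefs:
            phecodes |= lookup.get(normalise_icd(xref), set())
        if phecodes:
            result[efo_id] = phecodes
    return result
```

This uses exact code matches only. EFO cross-references are often at a coarser level (for example a three-character ICD-10 category) than the rows in a phecode map, so some terms will not match; a prefix-based fallback would recover more of them at the cost of over-mapping.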

In the OTG dataset, only 5,593 diseaseIDs are represented in the diseaseID–ICD-9/10 mappings, whereas the diseaseID–variantID component contains 10,515 distinct diseaseIDs. As a result, many standard ICD-9/10 codes cannot be linked to corresponding diseaseIDs. Do you know of any reliable approaches for constructing additional ICD–diseaseID pairs?
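The comparison above can be reproduced with a set intersection, which also yields the list of associated diseaseIDs that have no ICD mapping (the ones that would need another route, such as text-based mapping). A minimal sketch; the function name is hypothetical and the inputs are plain iterables of diseaseIDs.

```python
from typing import Iterable, List, Tuple


def icd_coverage(
    association_disease_ids: Iterable[str], icd_mapped_ids: Iterable[str]
) -> Tuple[int, int, List[str]]:
    """Count how many associated diseaseIds have at least one ICD mapping.

    Returns (n_mapped, n_associated, sorted list of unmapped diseaseIds).
    Duplicate ids are counted once.
    """
    associated = set(association_disease_ids)
    mapped = associated & set(icd_mapped_ids)
    return len(mapped), len(associated), sorted(associated - mapped)
```

Intersecting first matters: some ICD-mapped diseaseIDs may never appear in the association data, so the two raw counts alone do not give the coverage.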

I do not know of any direct mapping for other phenotypes.

We ingest the disease cross-references from EFO. Since EFO does not currently provide phecodes, it would be useful if you could request that feature on the EFO issue tracker; once the EFO team adds it, we could ingest it into the Platform. The relevant issue tracker → GitHub

With kind regards
Szymon Szyszkowski

Hi @hanl3 ,

You could also try mapping the phecode phenotype descriptions (their text labels).
If you are familiar with Python, you can try OnToma, our Python package for entity mapping.
Otherwise, you can try ZOOMA, which has a web interface.
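ZOOMA can also be queried programmatically. This sketch assumes the v2 annotate endpoint, which takes the free text in a propertyValue parameter and returns a JSON list of annotations, each with a confidence (HIGH, GOOD, MEDIUM or LOW) and a list of semanticTags IRIs; please check the ZOOMA documentation before relying on it. It does not use OnToma.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Tuple

# ZOOMA v2 annotate endpoint (check the ZOOMA documentation for current usage).
ZOOMA_ANNOTATE = "https://www.ebi.ac.uk/spot/zooma/v2/api/services/annotate"
CONFIDENCE_RANK = {"LOW": 0, "MEDIUM": 1, "GOOD": 2, "HIGH": 3}


def zooma_query_url(text: str) -> str:
    """Build an annotate URL for one free-text phenotype description."""
    query = urllib.parse.urlencode({"propertyValue": text}, quote_via=urllib.parse.quote)
    return f"{ZOOMA_ANNOTATE}?{query}"


def ontology_hits(annotations: List[dict], min_confidence: str = "GOOD") -> List[Tuple[str, str]]:
    """Return (term id, confidence) pairs at or above `min_confidence`.

    Term ids are taken from the last path segment of each semantic tag IRI,
    e.g. '.../efo/EFO_0001360' -> 'EFO_0001360'.
    """
    threshold = CONFIDENCE_RANK[min_confidence]
    hits = []
    for ann in annotations:
        confidence = ann.get("confidence", "LOW")
        if CONFIDENCE_RANK.get(confidence, 0) < threshold:
            continue
        for tag in ann.get("semanticTags", []):
            hits.append((tag.rsplit("/", 1)[-1], confidence))
    return hits


def annotate(text: str, min_confidence: str = "GOOD") -> List[Tuple[str, str]]:
    """Query ZOOMA over the network and return the filtered hits."""
    with urllib.request.urlopen(zooma_query_url(text), timeout=30) as response:
        return ontology_hits(json.load(response), min_confidence)
```

For example, annotate("Type 2 diabetes") would return candidate term ids for one phecode description. Hits are not filtered to EFO_ ids only, because Open Targets diseaseIDs also include MONDO, HP and Orphanet terms; treat the results as suggestions to review, not final mappings.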

Best wishes,
Vivien