How is ClinVar evidence mapped to disease terms?

For THRB, one of the reported genetic associations is Glycogen storage disease due to glycogen branching enzyme deficiency, an association supported solely by ClinVar evidence.

However, checking the source reveals that the ClinVar variants refer to a different phenotype, thyroid hormone resistance, which makes more sense for this gene.

If we check the ClinVar evidence in the Platform, it indeed shows that the reported disease was Thyroid Hormone Resistance. So how is ClinVar evidence mapped to disease terms?

This question was sent to the Open Targets helpdesk.


This appears to be a genuine bug caused by an incorrect manual string-to-ontology mapping. It comes from the eva_clinvar curated dataset: https://ftp.ebi.ac.uk/pub/databases/eva/ClinVar/latest/eva_clinvar.txt. As far as I can see, the mistake dates back to at least August 2018 (possibly earlier).

We will discuss within the data team today how to proceed here. I will most likely just fix this particular error manually, but we need to decide what to do with EVA (and potentially other) curation sources in general to avoid this in the future.

After some internal discussion, we reached the following conclusion. For now, I have fixed the incorrect mapping manually. The fix should take effect from the next release.

In the future, we have some plans to increase the quality of manual string-to-ontology disease mappings, including subjecting them to periodic re-curation. This should improve reliability of those mappings in the long term.
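
As a rough illustration of what an automated sanity check on such mappings might look like, here is a minimal Python sketch. It is not the method used by Open Targets or EVA. It assumes each mapping is available as a (trait string, ontology URI, ontology label) triple; that layout, the `flag_suspicious` helper, the threshold, and the placeholder URIs are all hypothetical. It simply flags mappings where the source trait name and the mapped ontology label look very different, so that a curator can review them.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def flag_suspicious(mappings, threshold=0.5):
    """Return mappings whose trait string and ontology label differ a lot.

    `mappings` is an iterable of (trait, uri, label) triples. This layout
    is an assumption for illustration, not the real eva_clinvar.txt format.
    """
    flagged = []
    for trait, uri, label in mappings:
        score = similarity(trait, label)
        if score < threshold:
            flagged.append((trait, uri, label, score))
    return flagged


# Hypothetical example rows; the URIs are placeholders, not real term IDs.
mappings = [
    (
        "Thyroid hormone resistance",
        "http://example.org/ontology/TERM_1",
        "Glycogen storage disease due to glycogen branching enzyme deficiency",
    ),
    (
        "Thyroid hormone resistance",
        "http://example.org/ontology/TERM_2",
        "thyroid hormone resistance syndrome",
    ),
]

for trait, uri, label, score in flag_suspicious(mappings):
    print(f"Review: '{trait}' -> '{label}' ({uri}), similarity {score:.2f}")
```

A check like this would only be a first-pass filter: legitimate synonyms can have very different spellings, so flagged rows would still need manual review.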