Duplicate genes in the disease-association record with inconsistent scores


I'm opening a new post to highlight this comment from the other post I created. Basically, there are multiple association records for the same gene, and their scores do not agree with one another. I am assuming I can take the UNION of all of those records as the actual association (I only need genetic association for my current task). Please correct me if this understanding is wrong, or let me know if this situation is by design.

The Open Targets Platform includes all genes that are in the Ensembl canonical build (GRCh38), as well as a small list of genes that are not in the canonical build but whose existence is supported by enough evidence at the protein level (Documentation). As you noticed, some approved symbols are shared by more than one gene.

If we take MKKS as a reference, there are 3 targets in the Platform:

linked to the respective Ensembl genes:

Regarding the availability of evidence for these 3 targets, you are right to assume it’s a complicated matter. Some of the Platform data sources depend on chromosomal locations, which might disambiguate the genes, but in most cases (e.g. literature mining) we won’t be able to tell which of the alternative genes the evidence refers to.

It would be good to have some feedback on some individual cases and whether you thought that the union of evidence was the most appropriate answer for your hypothesis.
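
For anyone trying the union approach discussed above, here is a minimal sketch of one way to do it. It is not how the Platform computes anything: the record layout, the placeholder IDs (`TARGET_A`, `DISEASE_1`, and so on), the scores, and the choice to keep the maximum score per disease are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical association records, one per (target, disease) pair:
# (target_id, approved_symbol, disease_id, genetic_association_score).
# All IDs and scores are placeholders, not real Platform data.
records = [
    ("TARGET_A", "MKKS", "DISEASE_1", 0.80),
    ("TARGET_B", "MKKS", "DISEASE_1", 0.35),
    ("TARGET_B", "MKKS", "DISEASE_2", 0.10),
    ("TARGET_C", "MKKS", "DISEASE_3", 0.55),
]


def union_by_symbol(records):
    """Merge records of all targets that share an approved symbol.

    For each symbol, every disease seen for any of its targets is kept
    (the union). Where targets disagree on the score, the maximum is kept
    (an assumption), along with the set of contributing targets.
    """
    merged = defaultdict(dict)
    for target_id, symbol, disease_id, score in records:
        entry = merged[symbol].setdefault(
            disease_id, {"max_score": score, "targets": set()}
        )
        entry["max_score"] = max(entry["max_score"], score)
        entry["targets"].add(target_id)
    return merged


merged = union_by_symbol(records)
for disease_id, entry in sorted(merged["MKKS"].items()):
    print(disease_id, entry["max_score"], sorted(entry["targets"]))
```

Keeping the set of contributing targets next to each merged score makes it easy to go back and check the individual cases where the targets disagree.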
