Opentargets.targets.homologues fields description

Hello, may I ask for descriptions of the fields of homologues? Where do they come from (data source), and what do querypercentageidentity and targetpercentageidentity mean?

Hi @koryclick,

This refers to the Comparative Genomics information we have in our target profile pages (see below, example for HCN1).

This is information we pull from Ensembl Compara. From the Ensembl Compara documentation:

the percent of identical amino acids in the paralogue compared with the gene of interest (Target %ID). The identity of the gene of interest when compared with the paralogue is the query %ID.

Also, as a side note:

We’re in the process of annotating the datasets with descriptions for the columns, among other metadata. We’ll make sure provenance information is there.

If you feel you’d benefit from any other metadata, let us know!

Sure, many thanks! It would also be great to know about open.targets.class. I will raise another ticket to address the docs for that one.

Hello again @hcornu and @Javi. I was going through the documentation, and the linked page (Help - Homo_sapiens - Ensembl genome browser 113) describes “paralogue” homologues, but I see there are also “orthologue” homologues. Do their columns have the same description?
I mean:

  • querypercentageidentity: the identity of the gene of interest when compared with the paralogue. (If my gene of interest is human, I should not retrieve non-human paralogues, so this should be 0, right?)
  • targetpercentageidentity: the percent of identical amino acids in the paralogue compared with the gene of interest. (The gene of interest is human, so is this the percentage of amino acids in the “orthologue” that are identical?)

In the case of orthologues, do I have to look at another piece of documentation? Also, what exactly do “priority”, “isHighConfidence” and “targetGeneId” mean? Could I assume that targetGeneId is the homologue gene that matches my initial search by gene symbol (a human gene) in Open Targets?

E.g. I look for HCN1:
→ I get a targetGeneId EG:… for a paralogue homologue, with Query %ID 90 and Target %ID 90.
→ I also get another row with a non-human targetGeneId EM.:…, an orthologue homologue, with Query %ID 50 and Target %ID - (this should be empty, because my target gene is human and I am looking for human-human paralogues or human-non-human orthologues)

I think I found part of my answer here: Homology types. I understand that there are between-species paralogues too.
With those descriptions in mind, here is my interpretation of the example above:
E.g. I look for HCN1:
→ I get a targetGeneId EG:00001 for a paralogue homologue, with Query %ID 90 and Target %ID 80.

  • The QueryPercentageIdentity means that “there is 90% identity of the human gene HCN1 when compared to the human paralogue EG:00001”.
  • The TargetPercentageIdentity means that “80% of the amino acids in the human paralogue are identical to those in the human gene HCN1”. (How could we get the lengths of the amino acid chains compared? I assume in this case the chains are the same length, right?)

→ I also get another row with a non-human targetGeneId EM.:000001, an orthologue homologue, with Query %ID 50 and Target %ID 70.

  • The QueryPercentageIdentity means that “there is 50% identity of the human gene HCN1 when compared to the non-human orthologue EM:00001”.
  • The TargetPercentageIdentity means that “70% of the amino acids in the non-human orthologue are identical to those in the human gene HCN1”. (How could we get the lengths of the amino acid chains compared?)

Hi @koryclick,

Paralogues and orthologues are different types of homologues. The difference is their origin: orthologues are genes separated by speciation, paralogues are genes separated by gene duplication events. The columns have the same description.

The Comparative Genomics widget is used to answer two questions:

  • Is there a similar gene in a species that I can study? (other species orthologues)
  • Is there a similar gene in humans that could be a safety liability if I choose this gene as a target? (human paralogues)
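
These two questions correspond to splitting the homologues rows by homology type. Below is a minimal sketch, assuming each row has `homologyType` and `targetGeneId` fields and that paralogue types contain the substring "paralog" (for example `within_species_paralog`), following Ensembl Compara naming. The field values, the percentages and the gene IDs are hypothetical placeholders, not real data:

```python
from collections import defaultdict

# Hypothetical rows shaped like entries of the homologues field.
# IDs and percentages are made up for illustration only.
rows = [
    {"targetGeneId": "GENE_A", "homologyType": "within_species_paralog",
     "queryPercentageIdentity": 90.0, "targetPercentageIdentity": 80.0},
    {"targetGeneId": "GENE_B", "homologyType": "ortholog_one2one",
     "queryPercentageIdentity": 50.0, "targetPercentageIdentity": 70.0},
    {"targetGeneId": "GENE_C", "homologyType": "other_paralog",
     "queryPercentageIdentity": 40.0, "targetPercentageIdentity": 45.0},
]

def group_by_homology(rows):
    """Group target gene IDs into paralogues and orthologues.

    Assumption: any homologyType containing "paralog" is a paralogue;
    everything else is treated as an orthologue.
    """
    groups = defaultdict(list)
    for row in rows:
        kind = "paralogue" if "paralog" in row["homologyType"] else "orthologue"
        groups[kind].append(row["targetGeneId"])
    return dict(groups)

print(group_by_homology(rows))
```

Both groups carry the same columns, so the two percentage fields are read the same way for either kind of row.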

The widget doesn’t have data on the amino acid chain sizes, but that’s something you should be able to get from Ensembl. I don’t believe that the chains are necessarily of the same length.

Target %ID and Query %ID differ because each one depends on which homologue you use as the base for comparison.
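
To make that concrete, here is a small sketch of how the two values can diverge when the proteins differ in length. It assumes each percentage is the number of identical aligned positions divided by the ungapped length of that gene's own protein, which is my reading of the Ensembl Compara definition rather than something confirmed in this thread. The function name and the toy sequences are made up:

```python
def percent_identity(aligned_query: str, aligned_target: str) -> tuple[float, float]:
    """Return (query %ID, target %ID) for two gapped aligned sequences.

    Assumption: each percentage uses the ungapped length of its own
    sequence as the denominator. "-" marks an alignment gap.
    """
    if len(aligned_query) != len(aligned_target):
        raise ValueError("aligned sequences must have the same length")
    # Count columns where both sequences carry the same residue (not a gap).
    identical = sum(
        1 for q, t in zip(aligned_query, aligned_target)
        if q == t and q != "-"
    )
    query_len = sum(1 for c in aligned_query if c != "-")
    target_len = sum(1 for c in aligned_target if c != "-")
    return 100 * identical / query_len, 100 * identical / target_len

# Toy alignment: the query has 10 residues, the target only 8.
query_pid, target_pid = percent_identity("MKTAYIAKQR", "MKTA--AKQR")
print(query_pid, target_pid)  # 80.0 100.0
```

Here the same 8 identical positions give 80% relative to the longer query and 100% relative to the shorter target, which is why the two columns need not match.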

Let me know if you have further questions or a specific example that we can look at.