How to find targets associated with a disease using the new GraphQL API or Google BigQuery

Problem

The Open Targets helpdesk recently received the following question:

I have used the older version of the Open Targets Platform (21.02), and the previous API technique, to get targets associated with disease and their overall scores.

But now after updating (to version 21.04) I can’t do it with the new GraphQL API technique. Could you help me with this? How can I get associated targets with disease and their overall scores using the new GraphQL API?

To make it easier, I need to get the first two columns of the table on, for example, targets associated with COVID, but without downloading it, just getting info from API inside Python script.

Using the GraphQL API

To get the targets associated with a specific disease, start your query at the disease, identified by its disease ID, and then use the associatedTargets field to access the relevant target data. For example, to get the first 10 targets associated with COVID (MONDO_0100096), run the following query:

query targetsAssociatedWithCOVID {
  disease(efoId: "MONDO_0100096") {
    id
    name
    associatedTargets(page: { index: 0, size: 10 }) {
      count
      rows {
        target {
          id
          approvedSymbol
        }
        score
      }
    }
  }
}

You can try this query in our GraphQL API playground by pressing the triangle play button.

Please note that the GraphQL API returns results in pages, so your code needs to step through the index parameter (and, if you like, adjust size) to retrieve all of the data. By default the API returns only 25 entries per page: index: 0, size: 25 returns the first 25 entries, index: 1, size: 25 returns the next 25 entries, and so on. The count field in the response gives the total number of associated targets, which tells you when to stop.
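Since the original question asked for a Python script, here is a minimal sketch of the paging loop described above. It is an illustration rather than an official client: the endpoint URL is assumed to be the public Platform GraphQL endpoint, the variable types (String!, Int!) are assumptions about the schema, and the names post_query, fetch_all_targets and run_query are hypothetical helpers introduced for this example. Only the Python standard library is used.

```python
import json
import urllib.request

# Assumed public Open Targets Platform GraphQL endpoint; check the Platform docs.
API_URL = "https://api.platform.opentargets.org/api/v4/graphql"

# Same query as above, with the disease ID and page settings passed as variables.
# The variable types are assumptions about the schema.
QUERY = """
query targetsAssociatedWithDisease($efoId: String!, $index: Int!, $size: Int!) {
  disease(efoId: $efoId) {
    id
    name
    associatedTargets(page: { index: $index, size: $size }) {
      count
      rows {
        target {
          id
          approvedSymbol
        }
        score
      }
    }
  }
}
"""


def post_query(variables, url=API_URL):
    """POST the query with its variables and return the decoded JSON response."""
    body = json.dumps({"query": QUERY, "variables": variables}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_all_targets(disease_id, page_size=25, run_query=post_query):
    """Collect (approvedSymbol, score) pairs across all pages for one disease."""
    results = []
    index = 0
    while True:
        response = run_query({"efoId": disease_id, "index": index, "size": page_size})
        disease = response["data"]["disease"]
        if disease is None:
            raise ValueError(f"No disease found for ID {disease_id!r}")
        associated = disease["associatedTargets"]
        rows = associated["rows"]
        results.extend((row["target"]["approvedSymbol"], row["score"]) for row in rows)
        # Stop when a page comes back empty or every reported row has been collected.
        if not rows or len(results) >= associated["count"]:
            break
        index += 1
    return results
```

Calling fetch_all_targets("MONDO_0100096") should return the target symbol and overall score columns as a list of tuples. The run_query parameter is there only so the loop can be exercised with a stand-in function instead of a live request.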

Using BigQuery

To support more complex and systematic queries, we recommend that you use our BigQuery instance, open-targets-prod. To access the associations data using BigQuery, you can run the following SQL query against the associationByOverallDirect table in the platform dataset.

SELECT
  associations.diseaseId AS disease_id,
  diseases.name AS disease_name,
  associations.targetId AS target_id,
  targets.approvedSymbol AS target_approved_symbol,
  associations.score AS overall_association_score,
  associations.evidenceCount AS number_of_evidence_strings
FROM
  `open-targets-prod.platform.associationByOverallDirect` AS associations
JOIN
  `open-targets-prod.platform.diseases` AS diseases
ON
  associations.diseaseId = diseases.id
JOIN
  `open-targets-prod.platform.targets` AS targets
ON
  associations.targetId = targets.id
WHERE
  associations.diseaseId='MONDO_0100096'
ORDER BY
  associations.score DESC

You can access the query here and download the data in JSON or CSV format.
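If you would rather run the BigQuery query from a Python script than from the web console, the sketch below shows one way to do it. It assumes the google-cloud-bigquery package is installed and that your environment has Google Cloud credentials with a billing project configured. The SQL is a trimmed variant of the query above with the disease ID passed as a query parameter, and query_associations is a hypothetical helper name used only for illustration.

```python
# Trimmed variant of the query above; @disease_id is a BigQuery named parameter.
SQL = """
SELECT
  associations.targetId AS target_id,
  targets.approvedSymbol AS target_approved_symbol,
  associations.score AS overall_association_score
FROM
  `open-targets-prod.platform.associationByOverallDirect` AS associations
JOIN
  `open-targets-prod.platform.targets` AS targets
ON
  associations.targetId = targets.id
WHERE
  associations.diseaseId = @disease_id
ORDER BY
  associations.score DESC
"""


def query_associations(disease_id, project=None):
    """Run the query for one disease and return (symbol, score) pairs."""
    # Imported here so this module still loads when the package is not installed.
    from google.cloud import bigquery

    # project is your own billing project (an assumption about your setup);
    # None falls back to the default project from your credentials.
    client = bigquery.Client(project=project)
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("disease_id", "STRING", disease_id)
        ]
    )
    rows = client.query(SQL, job_config=job_config).result()
    return [
        (row["target_approved_symbol"], row["overall_association_score"])
        for row in rows
    ]
```

Passing the disease ID as a parameter rather than pasting it into the SQL string makes it easier to reuse the same query for other diseases.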

Please note that the association scores returned by the GraphQL API and those available in BigQuery (or via our data downloads) are known to differ because of a modified algorithm and harmonic sum strategy. This will be fixed in our upcoming release, scheduled for the end of June. See issue #1508 in our GitHub issue tracker for more information.

Other examples

For further information and sample scripts, take a look at the Platform documentation. We also have a few other example scripts from @irene and @ahercules: