Get marketed drugs for a set of targets with BigQuery

At Open Targets, we have several ways of accessing our data. In this post, we illustrate how to make the best use of it from the BigQuery workspace.

Batch search using BigQuery

This query is an example of how to extract the drugs that are on the market for a restricted list of targets.

SELECT 
targets.approvedSymbol,
targets.id,
ARRAY_AGG(drugs.id) AS drugIds
FROM `open-targets-prod.platform.molecule` AS drugs,
    UNNEST(linkedTargets.rows.list) AS linkedTarget
    JOIN `open-targets-prod.platform.targets` AS targets 
    ON targets.id = linkedTarget.element
WHERE drugs.maximumClinicalTrialPhase = 4 AND 
    targets.approvedSymbol IN (
        'ERBB3',
        'EGFR'
    )
GROUP BY targets.id, targets.approvedSymbol

This is an alternative to the batch search functionality of the previous version so that we allow our users to have more control and to elaborate more complex hypotheses with the vast amount of data available in the Platform.

If the query becomes more complex or the set of targets is very large, we encourage you to see how these same steps would be done using our data downloads in this post.

Confused? Let’s break it down:

Step 1: Select the datasets

We use 2 datasets to establish the relationship between the drug and the target:

  • Target, which contains the HGNC Symbols;
  • Molecule, with the annotation of the drug molecules.

Step 2: Join the data

The molecule dataset comprises in linkedTargets.rows all the targets (in the form of Ensembl gene IDs) on which a drug acts according to its mechanism of action.

When joining the two tables, the only thing we have to take into account is that this field is an array that we have to flatten with UNNEST to establish the pairwise relationship between each array element and the id from target.

Step 3: Select marketed drugs

We want to filter out drugs with a maximum phase observed in clinical trials below < 4 to perform the analysis only on those drugs that are available on the market.

Step 4: Aggregate results by target

Grouping first by the symbol and the ENSID of the target, we collect all the resulting drugs in the array drugIds.

Download the data

Both the query and our datasets are publicly available at this link.
To download the results simply click Run > Save Results under the Query results section.
Consult the BigQuery documentation if you have problems accessing the workspace.

1 Like