How to find known drugs for a given disease

Objective

We are currently working through a bug (#1565) that prevents users from downloading certain data tables from the user interface, such as known drugs for a given disease. We aim to have it resolved for our next release, at the end of June.

In the meantime, how else can you access this data?

Data downloads

One way to access the data is to use our data downloads, specifically the KnownDrugsAggregated dataset, available in JSON or Parquet format from the FTP. For more information, and for sample Python and R scripts, check out our dataset documentation.

BigQuery

You can also use our BigQuery instance — open-targets-prod — to retrieve the data using an SQL query.

Here is an example query for a single disease, EFO_1001947, which you can run directly here:

SELECT
targetId,
approvedSymbol,
approvedName,
diseaseId,
label,
drugId,
prefName,
drugType,
mechanismOfAction,
phase,
status,
urlList.element.niceName,
urlList.element.url,
FROM `open-targets-prod.platform.knownDrugsAggregated`,
    UNNEST (urls.list) as urlList
WHERE diseaseId='EFO_1001947'
ORDER BY phase desc

You’ll notice that the number returned by BigQuery (175) is different from the number returned on the childhood T acute lymphoblastic leukemia profile page (138). This is because in BigQuery, the script has been set to return separate rows for each source entry in the urls.list array, whereas in the web interface, we aggregate the data in the Source column (e.g. 2 references).

Further reading

@irene recently resolved a similar question, this time extracting the list of known drugs for multiple targets:

Get marketed drugs for a set of targets with BigQuery
Get marketed drugs for a a set of targets with the data downloads

1 Like