Counts of drugs and indicates for human genes

Dear Community,

I am trying to collect the counts of drugs and indicates for each human gene, as the below. Anyone can share a R-API or Python-API script to achieve it?



Hi Shicheng

The API does not support mass data queries. You’re best using Google Cloud’s Big Query (BQ) to run those kind of queries, or downloading the raw data and using a tool such as Spark. If you want to use BQ, the following query you give you what you need:

WITH drugs as (
  SELECT targetId, array_length(array_agg(DISTINCT drugId)) as d
  FROM `bigquery-public-data.open_targets_platform.knownDrugsAggregated`
  GROUP BY targetId
), indications AS (
  SELECT targetId, array_length(array_agg(DISTINCT diseaseId)) as i
  FROM `bigquery-public-data.open_targets_platform.knownDrugsAggregated`
  GROUP BY targetId
SELECT drugs.targetId, d, i FROM drugs 
JOIN indications 
ON drugs.targetId = indications.targetId

The raw data is available from EBI’s FTP server in Parquet and JSON formats.


Thank you @JarrodBaker. I’m going to mark this as resolved for now. @Shicheng_Guo please let us know if you have any further questions

1 Like