Hi @ncoish!
Welcome to the Open Targets Community!
Unfortunately, the targets GraphQL API endpoint is not actively maintained and will be deprecated in an upcoming release, as the API has been optimised for single-entity queries.
Instead, I would recommend that you use our data downloads or BigQuery to run a more performant and systematic query for all Known Drugs data.
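For context, a single-target lookup is still a good fit for the GraphQL API. Below is a minimal sketch of such a query for EGFR; the field names follow my reading of the current Platform API schema, so treat them as assumptions and check them in the API playground:

```python
# Minimal sketch: single-target Known Drugs query against the Platform
# GraphQL API. Field names are assumptions based on the current schema.
import json

API_URL = "https://api.platform.opentargets.org/api/v4/graphql"

query = """
query KnownDrugsForTarget($ensemblId: String!) {
  target(ensemblId: $ensemblId) {
    approvedSymbol
    knownDrugs {
      count
      rows {
        drug { id name }
        disease { id name }
        phase
        status
        mechanismOfAction
      }
    }
  }
}
"""

variables = {"ensemblId": "ENSG00000146648"}  # EGFR

try:
    import requests  # third-party; may not be installed
    response = requests.post(API_URL, json={"query": query, "variables": variables})
    print(json.dumps(response.json(), indent=2)[:500])
except Exception as exc:
    # Network or dependency unavailable; the query string above still
    # shows the shape of a single-target request.
    print(f"Request skipped: {exc}")
```

For a handful of targets this is fine, but it means one request per target, which is why the bulk options below scale better.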
Finding Known Drugs data for a list of targets with BigQuery
Using our open-targets-prod instance, you can query the knownDrugsAggregated
dataset for a list of Ensembl IDs and export the results as JSON, CSV, a Google Sheet, or a BigQuery table.
Please see below for a sample query for EGFR and ESR1:
SELECT
  targetId,
  approvedSymbol,
  approvedName,
  diseaseId,
  label,
  drugId,
  prefName,
  drugType,
  mechanismOfAction,
  phase,
  status,
  urlList.element.niceName,
  urlList.element.url
FROM
  `open-targets-prod.platform.knownDrugsAggregated`,
  UNNEST (urls.list) AS urlList
WHERE
  targetId IN (
    'ENSG00000146648',
    'ENSG00000091831')
ORDER BY
  phase DESC
Running this query returns a table with 4,484 rows.
Please note that the query returns more rows than you see in the user interface. This is expected behaviour: the query duplicates the row content for each source URL, so any row with more than one source in the user interface appears once per URL in the results.
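If you would rather run the query programmatically than in the BigQuery console, a sketch along the following lines should work with the google-cloud-bigquery client. It assumes the client library is installed and authenticated credentials are configured, and "your-billing-project" is a placeholder you must replace:

```python
# Sketch: run the Known Drugs query via the google-cloud-bigquery client.
# Assumes google-cloud-bigquery is installed, application-default
# credentials are set up, and "your-billing-project" is replaced with a
# real billing project ID.
sql = """
SELECT
  targetId,
  approvedSymbol,
  drugId,
  prefName,
  phase,
  status,
  urlList.element.niceName,
  urlList.element.url
FROM
  `open-targets-prod.platform.knownDrugsAggregated`,
  UNNEST (urls.list) AS urlList
WHERE
  targetId IN ('ENSG00000146648', 'ENSG00000091831')
ORDER BY
  phase DESC
"""

try:
    from google.cloud import bigquery
    client = bigquery.Client(project="your-billing-project")  # placeholder
    df = client.query(sql).to_dataframe()
    print(f"{len(df)} rows")
except Exception as exc:
    # Client library or credentials unavailable; the SQL above is the
    # same query you can paste into the BigQuery console.
    print(f"BigQuery client unavailable: {exc}")
```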
Finding Known Drugs data for a list of targets with our dataset downloads
Using our knownDrugsAggregated
dataset, available in JSON or Parquet format through our FTP service, you can access and query the data using your programming language of choice.
Please see below for a sample Python + PySpark script that will return a dataframe with data for ESR1 and EGFR:
# import relevant libraries
from pyspark.sql import SparkSession
# create Spark session
spark = (
    SparkSession.builder
    .master('local[*]')
    .getOrCreate()
)
# set location of known drugs dataset (Parquet format)
known_drugs_data_path = "path to local directory with downloaded files (e.g. /Users/Downloads/knownDrugsAggregated)"
# read known drugs dataset
known_drugs_data = spark.read.parquet(known_drugs_data_path)
# print known drugs dataset schema
known_drugs_data.printSchema()
# convert to Pandas dataframe
known_drugs_df = known_drugs_data.toPandas()
# declare list of targets
my_target_list = ['ENSG00000146648','ENSG00000091831']
# filter original dataframe with list of targets
known_drugs_df_subset = known_drugs_df[known_drugs_df['targetId'].isin(my_target_list)]
# explode list of urls so that each row contains one URL
known_drugs_df_subset = known_drugs_df_subset.explode('urls')
# print length of dataframe
print(len(known_drugs_df_subset))
As with the BigQuery example, the Python script generates a pandas dataframe with 4,484 rows, where each row has its own source URL.
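The explode step is what produces those extra rows. A toy pandas example (values and column names are illustrative, not real Platform data) shows the duplication:

```python
import pandas as pd

# Toy frame mimicking the known-drugs rows: two drug rows, the first
# with two source URLs and the second with one (illustrative values).
df = pd.DataFrame({
    "drugId": ["DRUG_A", "DRUG_B"],
    "urls": [
        [{"niceName": "FDA", "url": "https://example.org/a1"},
         {"niceName": "EMA", "url": "https://example.org/a2"}],
        [{"niceName": "FDA", "url": "https://example.org/b1"}],
    ],
})

# explode() emits one row per list element, repeating the other columns
exploded = df.explode("urls")
print(len(df), len(exploded))  # 2 3
```

The two-URL row becomes two rows after exploding, which is exactly why the exported table is longer than the interface view.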
I hope this helps — feel free to comment below if you have further questions!
Cheers,
Andrew