GraphQL Interface failing to fetch 'knownDrugs'

I’ve been exploring the GraphQL interface, trying to fetch information on drugs associated with particular targets. Yesterday I tried the following query:

  targets(ensemblIds: ["ENSG00000001561", "ENSG00000004142", "ENSG00000112164"]) {
    knownDrugs {

This worked fine for a while. Unfortunately, when I ran the same request later, the GraphQL browser ran for about 10 seconds before returning a JSON.parse error, which I think must be a mistake in the playground:

  "error": "JSON.parse: unexpected character at line 2 column 1 of the JSON data"

Additionally, when I tried to automate this request in Python (with a larger number of targets), I consistently got a 502 HTTP error after about a 10-second delay. Am I just being throttled by something?

Thank you for the great service, and please let me know if there’s anything I can do to address this problem.

Hi @ncoish! :wave:

Welcome to the Open Targets Community! :tada:

Unfortunately the targets GraphQL API endpoint is not actively maintained and will be deprecated in a subsequent release as the API has been optimised to work for a single query.
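If you would rather stay on the GraphQL API for a small number of targets, one option is to send one single-target request per Ensembl ID instead of a single plural targets request. The sketch below is an illustration under assumptions, not an official recipe: the endpoint URL, the target(ensemblId: ...) field, and the knownDrugs subfields are assumed from the public Platform schema and may differ between releases, and build_payload and fetch_known_drugs are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: public Open Targets Platform GraphQL endpoint; check the docs for the current URL.
API_URL = "https://api.platform.opentargets.org/api/v4/graphql"

# Assumption: single-target query shape; field names may differ between releases.
QUERY = """
query knownDrugsForTarget($ensemblId: String!) {
  target(ensemblId: $ensemblId) {
    id
    approvedSymbol
    knownDrugs {
      count
      rows { drugId prefName phase }
    }
  }
}
"""


def build_payload(ensembl_id):
    """Return the JSON body for one single-target request (hypothetical helper)."""
    return {"query": QUERY, "variables": {"ensemblId": ensembl_id}}


def fetch_known_drugs(ensembl_id, timeout=30):
    """POST one single-target request and return the decoded JSON response."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(ensembl_id)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_known_drugs once per ID in a loop keeps each request small. For long target lists, the bulk options described in this reply are likely to be the better fit.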

Instead, I would recommend that you use our data downloads or BigQuery to run a more performant and systematic query for all Known Drugs data.

Finding Known Drugs data for a list of targets with BigQuery

Using our open-targets-prod instance, you can query the knownDrugsAggregated dataset for a list of Ensembl IDs and export into JSON, CSV, Google Sheets, or BigQuery table format.

Please see below for a sample query for EGFR and ESR1:

  UNNEST (urls.list) AS urlList
  targetId IN (
  phase DESC

Run this query - returns a table with 4,484 rows

Please note that if you run the query, it will return a table with more rows than you see in the user interface. That is expected behaviour: the query duplicates the row content for each source URL whenever the original row has more than one source in the user interface.
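To make the row duplication concrete, here is a small pure-Python sketch of what unnesting a list column does. The records and URLs are made up for illustration, each URL is simplified to a plain string, and unnest_urls is a hypothetical helper, not part of any Open Targets tooling.

```python
# Toy rows standing in for knownDrugsAggregated records (all values are made up).
ui_rows = [
    {
        "targetId": "ENSG00000146648",
        "drugId": "DRUG_A",
        "urls": ["https://example.org/1", "https://example.org/2"],
    },
    {
        "targetId": "ENSG00000091831",
        "drugId": "DRUG_B",
        "urls": ["https://example.org/3"],
    },
]


def unnest_urls(rows):
    """Emit one output row per (row, url) pair, similar to UNNEST on a list column."""
    flattened = []
    for row in rows:
        for url in row["urls"]:
            # Copy every field except the list itself, then attach a single URL.
            flat = {key: value for key, value in row.items() if key != "urls"}
            flat["url"] = url
            flattened.append(flat)
    return flattened


table_rows = unnest_urls(ui_rows)
print(len(ui_rows), "rows in the UI-style view")
print(len(table_rows), "rows after unnesting")
```

The first record has two sources, so it appears twice after unnesting, and the row count grows from 2 to 3.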

Finding Known Drugs data for a list of targets with our dataset downloads

Using our knownDrugsAggregated dataset, available in JSON or Parquet formats through our FTP service, you can access and query the data using your programming language of choice.

Please see below for a sample Python + PySpark script that will return a dataframe with data for ESR1 and EGFR:

# import relevant libraries
from pyspark import SparkConf
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
import pandas as pd

# create Spark session (local mode, using all available cores)
spark = (
    SparkSession.builder
    .master('local[*]')
    .getOrCreate()
)

# set location of known drugs dataset (Parquet format)
known_drugs_data_path = "path to local directory with downloaded files (e.g. /Users/Downloads/knownDrugsAggregated)"

# read known drugs dataset
known_drugs_data = spark.read.parquet(known_drugs_data_path)

# print known drugs dataset schema
known_drugs_data.printSchema()

# convert to Pandas dataframe
known_drugs_df = known_drugs_data.toPandas()

# declare list of targets
my_target_list = ['ENSG00000146648','ENSG00000091831']

# filter original dataframe with list of targets
known_drugs_df_subset = known_drugs_df[known_drugs_df['targetId'].isin(my_target_list)]

# explode list of urls so that each row contains one URL
known_drugs_df_subset = known_drugs_df_subset.explode('urls')

# print length of dataframe
print(len(known_drugs_df_subset))

As with the BigQuery example, the Python script will generate a pandas dataframe with 4,484 rows, where each row will have its own source URL.
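If you downloaded the JSON version and would prefer not to set up Spark, a standard-library sketch along the following lines may be enough for a short target list. It assumes the download directory contains newline-delimited JSON files ending in .json, each record carrying the targetId and urls fields used above. The function names are hypothetical.

```python
import json
from pathlib import Path


def iter_known_drugs(data_dir):
    """Yield one dict per record from every *.json file in data_dir.

    Assumption: files are newline-delimited JSON (one record per line).
    """
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:
                    yield json.loads(line)


def known_drugs_for_targets(data_dir, target_ids):
    """Return one row per source URL for the requested targets."""
    wanted = set(target_ids)
    rows = []
    for record in iter_known_drugs(data_dir):
        # Filter first so only matching records are expanded.
        if record.get("targetId") not in wanted:
            continue
        for url in record.get("urls") or []:
            row = {key: value for key, value in record.items() if key != "urls"}
            row["url"] = url
            rows.append(row)
    return rows
```

For example, known_drugs_for_targets(known_drugs_data_path, my_target_list) would return a list of dictionaries that can be passed to pd.DataFrame if a dataframe is needed. Filtering while streaming avoids loading the full dataset into memory.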

I hope this helps — feel free to comment below if you have further questions!


Andrew :slight_smile:

Thank you so much for the help Andrew! I’ll give this a try.

All the best,