GraphQL Interface failing to fetch 'knownDrugs'

ncoish · 10 August 2021 18:41

I’ve been exploring the GraphQL interface, trying to fetch information on drugs associated with particular targets. Yesterday I tried the following query:

{
  targets(ensemblIds: ["ENSG00000001561", "ENSG00000004142", "ENSG00000112164"]) {
    id
    knownDrugs {
      uniqueDrugs
    }
  }
}

This worked fine for a while. Unfortunately, when I ran the same request later, the GraphQL browser ran for about 10 seconds before returning a JSON.parse error, which I think must be a mistake in the playground:

{
  "error": "JSON.parse: unexpected character at line 2 column 1 of the JSON data"
}

Additionally, when I tried to automate this request in Python (with a larger number of targets), I consistently got a 502 HTTP error after about a 10-second delay. Am I just being throttled by something?

Thank you for the great service, and please let me know if there’s anything I can do to address this problem.

ahercules · 10 August 2021 20:03

Hi @ncoish!

Welcome to the Open Targets Community!

Unfortunately, the targets GraphQL API endpoint is not actively maintained and will be deprecated in a subsequent release, as the API has been optimised to work for a single query.
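If you only need a short list of targets, one option is to send one single-target query per Ensembl ID rather than one multi-target request. The snippet here is a minimal sketch, not an official client: it assumes the public endpoint is https://api.platform.opentargets.org/api/v4/graphql and that the singular `target` field exposes `knownDrugs { uniqueDrugs }` in the same way as `targets` in your query. The helper names (`build_payload`, `fetch_known_drugs`, `fetch_many`) and the pause length are hypothetical.

```python
import json
import time
import urllib.request

# Assumed endpoint; check the API documentation for the current URL.
API_URL = "https://api.platform.opentargets.org/api/v4/graphql"

# One target per request, passed as a GraphQL variable.
QUERY = """
query knownDrugsForTarget($ensemblId: String!) {
  target(ensemblId: $ensemblId) {
    id
    knownDrugs {
      uniqueDrugs
    }
  }
}
"""


def build_payload(ensembl_id):
    """Return the JSON body for a single-target request."""
    return {"query": QUERY, "variables": {"ensemblId": ensembl_id}}


def fetch_known_drugs(ensembl_id, timeout=30):
    """POST one single-target query and return the decoded JSON response."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(ensembl_id)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def fetch_many(ensembl_ids, pause_seconds=0.5, fetch=fetch_known_drugs):
    """Query each target in turn, pausing briefly between requests."""
    results = {}
    for ensembl_id in ensembl_ids:
        results[ensembl_id] = fetch(ensembl_id)
        time.sleep(pause_seconds)
    return results
```

Calling `fetch_many(["ENSG00000001561", "ENSG00000004142"])` would send two requests in sequence. For anything beyond a handful of targets, the bulk routes are likely to be more reliable than looping over the API.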

Instead, I would recommend that you use our data downloads or BigQuery to run a more performant and systematic query for all Known Drugs data.

Finding Known Drugs data for a list of targets with BigQuery

Using our open-targets-prod instance, you can query the knownDrugsAggregated dataset for a list of Ensembl IDs and export into JSON, CSV, Google Sheets, or BigQuery table format.

Please see below for a sample query for EGFR and ESR1:

SELECT
  targetId,
  approvedSymbol,
  approvedName,
  diseaseId,
  label,
  drugId,
  prefName,
  drugType,
  mechanismOfAction,
  phase,
  status,
  urlList.element.niceName,
  urlList.element.url,
FROM
  `open-targets-prod.platform.knownDrugsAggregated`,
  UNNEST (urls.list) AS urlList
WHERE
  targetId IN (
    'ENSG00000146648',
    'ENSG00000091831')
ORDER BY
  phase DESC

Run this query - returns a table with 4,484 rows

Please note that if you run the query, it will return a table with more rows than you see in the user interface. This is expected behaviour: the query duplicates the row content for each source URL whenever the original row has more than one source in the user interface.
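If you would rather run a query like this from Python than from the BigQuery console, a rough sketch using the `google-cloud-bigquery` client could look like the following. It assumes the client library is installed, that you have Google Cloud credentials and a billing project configured, and that the table name is still current. The function names and the reduced column list are illustrative only. Note that this version does not unnest the URLs, so it returns one row per original record.

```python
# Table name taken from the sample query; it may differ in later releases.
TABLE = "open-targets-prod.platform.knownDrugsAggregated"


def build_sql(table=TABLE):
    """Return a parameterised query that filters on an array of Ensembl IDs."""
    return f"""
    SELECT targetId, approvedSymbol, drugId, prefName, phase, status
    FROM `{table}`
    WHERE targetId IN UNNEST(@target_ids)
    ORDER BY phase DESC
    """


def run_known_drugs_query(target_ids):
    """Run the query and return the result rows as a list of dicts."""
    # Imported here so the rest of the module loads without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses your default credentials and project
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ArrayQueryParameter("target_ids", "STRING", list(target_ids))
        ]
    )
    rows = client.query(build_sql(), job_config=job_config).result()
    return [dict(row) for row in rows]
```

Passing the IDs as an array parameter avoids building the `IN (...)` list by hand, which is convenient when the list of targets is long.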

Finding Known Drugs data for a list of targets with our dataset downloads

Using our knownDrugsAggregated dataset, available in JSON or Parquet formats through our FTP service, you can access and query the dataset using your programming language of choice.

Please see below for a sample Python + PySpark script that will return a dataframe with data for ESR1 and EGFR:

# import relevant libraries
from pyspark import SparkConf
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
import pandas as pd

# create Spark session
spark = (
    SparkSession.builder
    .master('local[*]')
    .getOrCreate()
)

# set location of known drugs dataset (Parquet format)
known_drugs_data_path = "path to local directory with downloaded files (e.g. /Users/Downloads/knownDrugsAggregated)"

# read known drugs dataset
known_drugs_data = spark.read.parquet(known_drugs_data_path)

# print known drugs dataset schema
known_drugs_data.printSchema()

# declare list of targets
my_target_list = ['ENSG00000146648','ENSG00000091831']

# filter the Spark dataframe by target first, so only matching rows are loaded into pandas
known_drugs_subset = known_drugs_data.filter(F.col('targetId').isin(my_target_list))

# convert the filtered subset to a Pandas dataframe
known_drugs_df_subset = known_drugs_subset.toPandas()

# explode list of urls so that each row contains one URL
known_drugs_df_subset = known_drugs_df_subset.explode('urls')

# print length of dataframe
print(len(known_drugs_df_subset))

As with the BigQuery example, the Python script will generate a pandas dataframe with 4,484 rows, where each row will have its own source URL.
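To make the row count less surprising, here is a small plain-Python illustration of what exploding the URL list does. The records are made up for the example and are not real Open Targets data, and `explode_urls` is a hypothetical helper.

```python
def explode_urls(records):
    """Return one output row per (record, url) pair."""
    rows = []
    for record in records:
        for url in record["urls"]:
            # Copy every field except the list, then attach a single URL.
            row = {key: value for key, value in record.items() if key != "urls"}
            row["url"] = url
            rows.append(row)
    return rows


# Made-up records for illustration only.
records = [
    {"targetId": "T1", "drugId": "D1", "urls": ["u1", "u2", "u3"]},
    {"targetId": "T1", "drugId": "D2", "urls": ["u4"]},
]

exploded = explode_urls(records)
print(len(records), len(exploded))  # prints: 2 4
```

Two original records become four rows because the first has three source URLs. The same effect explains why the exploded table is larger than what the user interface shows.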

I hope this helps — feel free to comment below if you have further questions!

Cheers,

Andrew

ncoish · 20 August 2021 15:36

Thank you so much for the help Andrew! I’ll give this a try.

All the best,
Nick
