Hi @ayush_gener8tor!
Welcome to the Open Targets Community!
Our GraphQL API is optimised for single entity or target-disease association queries. For more systematic queries (e.g. for hundreds of entities), you should use either our BigQuery instance or dataset downloads.
How do I use BigQuery to find drug information for a list of drug trade names?
Using our open-targets-prod instance, you can query the molecule
dataset with your list of drug trade names and retrieve data that can be exported in JSON, CSV, Google Sheets, or BigQuery table formats.
Please see below for a sample query for three specific drug trade names — ‘Premarin’, ‘Calcium disodium versenate’, ‘Keytruda’.
DECLARE
my_drug_list ARRAY<STRING>;
SET
my_drug_list = [ 'Premarin',
'Calcium disodium versenate',
'Keytruda' ];
SELECT
id AS drug_id,
name AS drug_chembl_name,
tradeNameList.element as drug_trade_name,
drugType AS drug_type,
isApproved AS drug_is_approved,
blackBoxWarning AS drug_blackbox_warning,
hasBeenWithdrawn AS drug_withdrawn,
FROM
`open-targets-prod.platform.molecule`,
UNNEST (tradeNames.list) AS tradeNameList
WHERE
(tradeNameList.element) IN UNNEST(my_drug_list)
Run this query in BigQuery
How do I use downloadable datasets to find drug information for a list of drug trade names?
Using our molecule
dataset available via in JSON or Parquet formats through our FTP service , you can access and query the dataset using your programming language of choice.
Please see below for a sample Python + PySpark script that uses our Parquet files and returns a dataframe with data for ‘Premarin’, ‘Calcium disodium versenate’, ‘Keytruda’.
# import relevant libraries
from pyspark import SparkConf
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
import pandas as pd
# create Spark session
spark = (
SparkSession.builder
.master('local[*]')
.getOrCreate()
)
# set location of drugs dataset (Parquet format)
drugs_data_path = "path to local directory with downloaded files (e.g. /Users/Downloads/molecule)"
# read drugs dataset
drugs_data = spark.read.parquet(drugs_data_path)
# print drugs dataset schema
drugs_data.printSchema()
# convert to Pandas dataframe
drugs_df = drugs_data.toPandas()
# declare list of drug trade names
my_drug_list = ['Premarin', 'Calcium disodium versenate', 'Keytruda']
# explode tradeNames column in original data frame
drugs_df = drugs_df.explode('tradeNames')
# filter tradeNames column by my_drug_list
drugs_df = drugs_df[drugs_df['tradeNames'].isin(my_drug_list)]
# print length of dataframe
print(len(drugs_df))
# print first 5 rows of dataframe
drugs_df.head(5)
What else can I do?
Once you have queried the molecule
dataset with your list of drug trade names and have access to the ChEMBL ID — found in the id
field — you can use many of our other datasets to enrich your analyses:
mechanismOfAction
indications
knownDrugsAggregated
drugWarnings
faersSignificant
These datasets mirror what is available on the drug profile page (e.g. Premarin - CHEMBL1201649).
Let me know if this helps — and feel free to respond below with any further questions.
Cheers,
Andrew