How to query API by drug trade name?

Hi all,

I have a list of several hundred drug trade names (e.g., “PREMARIN,” “CALCIUM
DISODIUM VERSENATE,” etc.) and would like to retrieve the Open Targets records for each drug.

I can make a successful query using the ChEMBL IDs:

query drug{
  drug(chemblId:"CHEMBL1201649") {
    name
    synonyms
    tradeNames
    description
  }
}

Since I am new to GraphQL, I am wondering whether the same query can be made using (in pseudocode):

query drug{
  drug(tradeNames CONTAINS "Premarin") {
    description
    ...
  }
}

I would be grateful for any guidance on how to do this. Thank you!

Hi @ayush_gener8tor! :wave:

Welcome to the Open Targets Community! :tada:

Our GraphQL API is optimised for single entity or target-disease association queries. For more systematic queries (e.g. for hundreds of entities), you should use either our BigQuery instance or dataset downloads.
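
For a single drug, the ChEMBL ID query from the original post can also be sent from a script. Below is a minimal Python sketch using only the standard library. The endpoint URL and the helper names (build_payload, fetch_drug) are assumptions for illustration and are not taken from this thread, so please check them against the API documentation before relying on them.

```python
import json
import urllib.request

# Assumption: public GraphQL endpoint of the Open Targets Platform
# (not stated in this thread; verify against the API documentation).
API_URL = "https://api.platform.opentargets.org/api/v4/graphql"

# Same fields as the query in the original post, with the ChEMBL ID
# passed as a GraphQL variable instead of being hard-coded.
DRUG_QUERY = """
query drug($chemblId: String!) {
  drug(chemblId: $chemblId) {
    name
    synonyms
    tradeNames
    description
  }
}
"""


def build_payload(chembl_id):
    """Return the JSON body for a single-drug GraphQL request."""
    return {"query": DRUG_QUERY, "variables": {"chemblId": chembl_id}}


def fetch_drug(chembl_id, url=API_URL):
    """POST the query and return the 'drug' object from the response."""
    body = json.dumps(build_payload(chembl_id)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["data"]["drug"]
```

Calling fetch_drug("CHEMBL1201649") needs network access and makes one request per drug, which is why the options below are a better fit for a list of several hundred names.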

How do I use BigQuery to find drug information for a list of drug trade names?

Using our open-targets-prod instance, you can query the molecule dataset with your list of drug trade names and retrieve data that can be exported in JSON, CSV, Google Sheets, or BigQuery table formats.

Please see below for a sample query for three specific drug trade names — ‘Premarin’, ‘Calcium disodium versenate’, ‘Keytruda’.

DECLARE
  my_drug_list ARRAY<STRING>;
SET
  my_drug_list = [ 'Premarin',
  'Calcium disodium versenate',
  'Keytruda' ];
SELECT
  id AS drug_id,
  name AS drug_chembl_name,
  tradeNameList.element AS drug_trade_name,
  drugType AS drug_type,
  isApproved AS drug_is_approved,
  blackBoxWarning AS drug_blackbox_warning,
  hasBeenWithdrawn AS drug_withdrawn
FROM
  `open-targets-prod.platform_21_06.molecule`,
  UNNEST (tradeNames.list) AS tradeNameList
WHERE
  (tradeNameList.element) IN UNNEST(my_drug_list)

Run this query in BigQuery
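
If you would rather run the query from a script than from the BigQuery console, something like the following Python sketch may work. It assumes the google-cloud-bigquery client library (plus pandas) is installed and that you have a Google Cloud project and credentials configured, none of which is covered in this thread. The helper names are hypothetical. Because the original list is in upper case ("PREMARIN") while the sample query uses 'Premarin', this version compares upper-cased values on both sides, which is a change from the query above.

```python
def normalise_trade_names(names):
    """Strip whitespace, upper-case, and drop blanks and duplicates (order kept)."""
    seen = set()
    result = []
    for name in names:
        cleaned = name.strip().upper()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


# Same table and UNNEST as the sample query, but the list is supplied
# as a query parameter and matched case-insensitively via UPPER().
SQL = """
SELECT
  id AS drug_id,
  name AS drug_chembl_name,
  tradeNameList.element AS drug_trade_name
FROM
  `open-targets-prod.platform_21_06.molecule`,
  UNNEST(tradeNames.list) AS tradeNameList
WHERE
  UPPER(tradeNameList.element) IN UNNEST(@my_drug_list)
"""


def run_query(names):
    """Run the query and return a pandas DataFrame (needs credentials)."""
    # Imported here so the helper above can be used without the library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ArrayQueryParameter(
                "my_drug_list", "STRING", normalise_trade_names(names)
            )
        ]
    )
    return client.query(SQL, job_config=job_config).to_dataframe()
```

Passing the list as an array parameter avoids pasting several hundred quoted names into the SQL text by hand.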

How do I use downloadable datasets to find drug information for a list of drug trade names?

Using our molecule dataset, available in JSON or Parquet formats through our FTP service, you can access and query the data using your programming language of choice.

Please see below for a sample Python + PySpark script that uses our Parquet files and returns a dataframe with data for ‘Premarin’, ‘Calcium disodium versenate’, ‘Keytruda’.

# import relevant libraries
from pyspark.sql import SparkSession
import pandas as pd  # not used directly, but must be installed for toPandas()

# create Spark session
spark = (
    SparkSession.builder
    .master('local[*]')
    .getOrCreate()
)

# set location of drugs dataset (Parquet format)
drugs_data_path = "path to local directory with downloaded files (e.g. /Users/Downloads/molecule)"

# read drugs dataset
drugs_data = spark.read.parquet(drugs_data_path)

# print drugs dataset schema
drugs_data.printSchema()

# convert to Pandas dataframe
drugs_df = drugs_data.toPandas()

# declare list of drug trade names
my_drug_list = ['Premarin', 'Calcium disodium versenate', 'Keytruda']

# explode tradeNames column in original data frame
drugs_df = drugs_df.explode('tradeNames')

# filter tradeNames column by my_drug_list
drugs_df = drugs_df[drugs_df['tradeNames'].isin(my_drug_list)]

# print length of dataframe
print(len(drugs_df))

# print first 5 rows of dataframe
drugs_df.head(5)
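
If you downloaded the JSON files and do not want to set up Spark, a plain Python sketch like the one below may be enough for a dataset of this size. It assumes each file in the molecule directory ends in .json and holds one JSON record per line, and that records carry the id, name and tradeNames fields used in the BigQuery example; please check this against your download. The function names are hypothetical, and matching is case-insensitive, unlike the script above.

```python
import json
from pathlib import Path


def iter_json_lines(directory):
    """Yield raw lines from every *.json file in the directory."""
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            yield from handle


def find_drugs_by_trade_name(lines, trade_names):
    """Return one row per (drug, matching trade name), ignoring case."""
    wanted = {name.casefold() for name in trade_names}
    matches = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # tradeNames may be missing or null for some molecules.
        for trade_name in record.get("tradeNames") or []:
            if trade_name.casefold() in wanted:
                matches.append(
                    {
                        "drug_id": record["id"],
                        "drug_name": record.get("name"),
                        "trade_name": trade_name,
                    }
                )
    return matches
```

For example, find_drugs_by_trade_name(iter_json_lines("molecule"), my_drug_list) would scan a local molecule directory, where "molecule" is a placeholder for wherever you saved the files.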

What else can I do?

Once you have queried the molecule dataset with your list of drug trade names and have access to the ChEMBL ID — found in the id field — you can use many of our other datasets to enrich your analyses:

  • mechanismOfAction
  • indications
  • knownDrugsAggregated
  • drugWarnings
  • faersSignificant

These datasets mirror what is available on the drug profile page (e.g. Premarin - CHEMBL1201649).
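
As a rough illustration of that enrichment step, the sketch below keeps only the records of another dataset whose ChEMBL ID matches one of the IDs you found. The field that holds the ChEMBL ID differs between datasets and may be a single string or a list, so the id_field values shown here are assumptions; print the schema of each dataset to confirm the real field name.

```python
def filter_by_chembl_id(records, chembl_ids, id_field="id"):
    """Keep records whose id_field (a string or a list) matches any ChEMBL ID."""
    wanted = set(chembl_ids)
    matches = []
    for record in records:
        value = record.get(id_field)
        if value is None:
            continue
        # Normalise to a list so both layouts are handled the same way.
        ids = value if isinstance(value, list) else [value]
        if wanted.intersection(ids):
            matches.append(record)
    return matches
```

Here records is any iterable of dictionaries, for example rows parsed from one of the downloaded JSON files.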

Let me know if this helps — and feel free to respond below with any further questions.

Cheers,

Andrew :slight_smile:

Hi Andrew (@ahercules), this is remarkably helpful - thanks for your guidance. I’m having some trouble downloading the dataset from the data downloads page on a Windows machine. The solutions in the tutorial seem to be Unix specific. For example, running:

wget --recursive --no-parent --no-host-directories --cut-dirs 8 ftp://ftp.ebi.ac.uk/pub/databases/opentargets/platform/21.06/output/etl/json/molecule

returns the following:

Invoke-WebRequest : A positional parameter cannot be found that accepts argument '--no-parent'.
At line:1 char:1
+ wget --recursive --no-parent --no-host-directories --cut-dirs 8 ftp:/ ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidArgument: (:) [Invoke-WebRequest], ParameterBindingException
    + FullyQualifiedErrorId : PositionalParameterNotFound,Microsoft.PowerShell.Commands.InvokeWebRequestCommand

Any recommendations for a good Windows utility for this?

Hi @ayush_gener8tor!

To access our FTP using a Windows machine, I would recommend an FTP client such as FileZilla.

You can also use Google Cloud and download directly from our Cloud Storage buckets.

Hi @ahercules,

Thanks for your guidance. I tried FileZilla but was unable to install due to permission errors. I then tried Google Cloud but realized it is paywalled.

Finally, I tried ftp as follows:

PS C:\Users\ayush> ftp
ftp> open ftp.ebi.ac.uk
Connected to ftp.g.ebi.ac.uk.
220-Welcome to ftp.ebi.ac.uk
220
200 Always in UTF8 mode.
User (ftp.g.ebi.ac.uk:(none)): get pub/databases/opentargets/platform/21.06/output/etl/json/molecule
530 This FTP server is anonymous only.
Login failed.
ftp>

Am I doing something wrong here?

Hi @ayush_gener8tor!

The EBI public FTP is anonymous only, and it looks like the get command was typed at the User prompt, so it was sent to the server as a username:

User (ftp.g.ebi.ac.uk:(none)): get pub/databases/opentargets/platform/21.06/output/etl/json/molecule

The username should be blank, which could be why you see the error message "530 This FTP server is anonymous only." See point #2 in this blog post by Sijin George for bobcares.com for more information.

Can you please try to log in again, but leave the username blank and just press “Enter”? If prompted for a password, leave that blank too and press “Enter”.

If it works, you should then be able to run cd /pub/databases/opentargets/platform/21.06/output/etl/json/molecule followed by ls to see all of the files available for download in the molecule directory. (Note that lcd changes the local working directory, not the remote one.)

Good luck! :crossed_fingers:

Hi @ahercules, when running the ftp command, I wasn’t explicitly prompted for a username/password, so not sure how I passed a username. I did, however, find a workaround. For future Windows users, perhaps the simplest approach is to enter the FTP address ftp://ftp.ebi.ac.uk/pub/databases/opentargets/platform/21.06/output/etl/json/molecule/ in the File Explorer search bar, which will open an FTP connection in File Explorer (essentially turning File Explorer into an FTP client).
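
For Windows users who prefer a script, Python's built-in ftplib module may be another option, since it needs no installation beyond Python itself. The sketch below is only that: download_directory is a hypothetical helper, and it assumes the remote directory is flat (files only, no subdirectories), which I have not verified for the molecule directory.

```python
import ftplib
from pathlib import Path


def download_directory(ftp, remote_dir, local_dir):
    """Download every entry listed in remote_dir (non-recursive) into local_dir."""
    target = Path(local_dir)
    target.mkdir(parents=True, exist_ok=True)
    ftp.cwd(remote_dir)
    downloaded = []
    for name in ftp.nlst():
        # Stream each remote file straight into a local file of the same name.
        with (target / name).open("wb") as handle:
            ftp.retrbinary("RETR " + name, handle.write)
        downloaded.append(name)
    return downloaded


# Usage (needs network access):
# with ftplib.FTP("ftp.ebi.ac.uk") as ftp:
#     ftp.login()  # no arguments: logs in as user "anonymous"
#     download_directory(
#         ftp,
#         "/pub/databases/opentargets/platform/21.06/output/etl/json/molecule",
#         "molecule",
#     )
```

Calling login() with no arguments performs an anonymous login, which is what the "530 This FTP server is anonymous only." message is asking for.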

Thanks again for all your help, @ahercules. I will post on this thread again if I have any additional questions.

Hi @ayush_gener8tor!

That’s great news, and I'm glad to hear that you can access the FTP using Windows. I have updated our dataset downloads documentation with a link to this thread in case other users want to access our FTP service with a Windows machine.

Cheers,

Andrew :slight_smile: