How to select all diseases using the Open Targets Platform GraphQL API?

To get specific diseases, I run the following query:

query Query($diseasesEfoIds: [String!]!) {
diseases(efoIds: "MONDO_0100096") {

This query works fine.
If I want to get all diseases, I try to run something like

query Query($diseasesEfoIds: [String!]!) {
diseases(efoIds: '*') {

But this query fails with an error. Please help me construct a query that returns all of the disease data.
Thanks and best regards

Hi @mose_rab! :wave:

Welcome to the Open Targets Community! :tada:

Unfortunately, our GraphQL API does not allow you to access data about all of the diseases/phenotypes contained in the Platform in a single query. Instead, you will need to use BigQuery or our dataset downloads.
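
For a known list of IDs, the API can still be used. The sketch below builds a request that passes the IDs through the declared query variable rather than hard-coding them. The endpoint URL, the selected fields (id, name, description), and the helper names `build_payload` and `fetch_diseases` are assumptions for illustration; please check them against the API playground before relying on them.

```python
import json
import urllib.request

# Assumed public endpoint of the Platform GraphQL API.
API_URL = "https://api.platform.opentargets.org/api/v4/graphql"

# The variable declared in the query header is actually used in the field call.
QUERY = """
query Diseases($efoIds: [String!]!) {
  diseases(efoIds: $efoIds) {
    id
    name
    description
  }
}
"""


def build_payload(efo_ids):
    """Return the JSON-serialisable body for a GraphQL POST request."""
    return {"query": QUERY, "variables": {"efoIds": list(efo_ids)}}


def fetch_diseases(efo_ids, url=API_URL):
    """POST the query and return the decoded JSON response (needs network access)."""
    body = json.dumps(build_payload(efo_ids)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `fetch_diseases(["MONDO_0100096"])` would send the same single-disease request as in the question; a wildcard such as `'*'` is not a valid ID, which is why the second query fails.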

Below, I have included instructions on how to use BigQuery and our dataset downloads. If you use our data, please cite our latest publication: Ochoa, D. et al., 2021.


Andrew :slight_smile:

Accessing disease/phenotype data with BigQuery

Using our BigQuery instance - open-targets-prod - you can generate an export of disease data by querying our diseases dataset with the following query:


After running the query, you can export the results in JSON or CSV format, or import them into another BigQuery instance or a Google Sheets file.
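
As a rough sketch, a query of this kind could also be built and run from Python. The table name `open-targets-prod.platform.diseases`, the column names, and the helper names `build_disease_query` and `run_disease_query` are assumptions based on the fields used elsewhere in this post, so please verify them in the BigQuery console. Running the query requires the `google-cloud-bigquery` package and your own billing project.

```python
# Assumed location of the diseases table in the open-targets-prod instance.
DEFAULT_TABLE = "open-targets-prod.platform.diseases"


def build_disease_query(table=DEFAULT_TABLE, columns=("id", "name", "description")):
    """Return a SQL string selecting the given columns for every disease."""
    column_list = ", ".join(columns)
    return f"SELECT {column_list} FROM `{table}`"


def run_disease_query(billing_project):
    """Run the query and return a pandas DataFrame (needs Google Cloud credentials)."""
    # Imported here so the query builder works without the client library installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=billing_project)
    return client.query(build_disease_query()).to_dataframe()
```

The resulting DataFrame can then be written out with `to_csv` or `to_json` instead of exporting from the console.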

Accessing disease/phenotype data with Platform dataset downloads

Using our FTP server, you can download our diseases dataset in either Parquet or JSON format.

Once you have downloaded the files, you can then parse using the programming language and libraries of your choice.

Please see below for an example using Python, PySpark, and pandas.

# import relevant libraries
from pyspark import SparkConf
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
import pandas as pd

# create Spark session (local mode is an assumption; adjust for your setup)
spark = (
    SparkSession.builder
    .master("local[*]")
    .getOrCreate()
)

# set location of diseases dataset downloaded in Parquet format
disease_data_path = "/Users/amh/Downloads/platform-data-analysis/data/diseases"

# read diseases dataset
disease_data = spark.read.parquet(disease_data_path)

# print diseases dataset schema
disease_data.printSchema()

# generate subset of diseases dataset with relevant fields
disease_data_subset = disease_data.select(
    F.col("id").alias("disease_id"), "name", "description"
)

# convert to Pandas dataframe
disease_df = disease_data_subset.toPandas()

# print first 5 rows of disease dataframe
print(disease_df.head())
Our dataset downloads documentation also includes a sample sparklyr script that you can use to access and parse the diseases dataset.
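
If you would rather not set up Spark, the JSON download can be parsed with the Python standard library alone. This is a minimal sketch that assumes the download is a directory of newline-delimited JSON part files (one disease record per line) containing the id, name, and description fields; the helper name `read_diseases` is hypothetical.

```python
import glob
import json
import os


def read_diseases(json_dir, fields=("id", "name", "description")):
    """Yield one dict per disease from every *.json part file in json_dir."""
    for path in sorted(glob.glob(os.path.join(json_dir, "*.json"))):
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # skip blank lines between records
                record = json.loads(line)
                # Missing fields are returned as None rather than raising.
                yield {field: record.get(field) for field in fields}
```

Because the function is a generator, records are read one at a time, so `list(read_diseases(path))` is only needed when you want everything in memory at once.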