How to select all diseases using the Open Targets Platform GraphQL API?

Hi,
In order to get some diseases, I would run the query:

query Query {
  diseases(efoIds: ["MONDO_0100096"]) {
    id
    name
    description
  }
}

This query works fine.
If I want to get all diseases, I try to run something like:

query Query($diseasesEfoIds: [String!]!) {
  diseases(efoIds: '*') {
    id
    name
    description
  }
}

But this query returns an error. Please help me construct my query so that I can access all of the data (diseases).
Thanks and best regards
Tatjana

Hi @mose_rab! :wave:

Welcome to the Open Targets Community! :tada:

Unfortunately, our GraphQL API does not allow you to access data about all of the diseases/phenotypes contained in the Platform. Instead, you will need to use BigQuery or our dataset downloads.
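When you already know which disease IDs you want, the diseases query from the question can still be used, and the IDs can be passed through the declared $diseasesEfoIds variable instead of being inlined. Below is a minimal sketch using only Python's standard library; the endpoint URL and the helper names build_payload and fetch_diseases are assumptions for illustration, not something stated in this thread.

```python
import json
import urllib.request

# Assumed public endpoint of the Open Targets Platform GraphQL API (v4).
API_URL = "https://api.platform.opentargets.org/api/v4/graphql"

# The list of IDs is supplied through the $diseasesEfoIds variable.
DISEASES_QUERY = """
query Query($diseasesEfoIds: [String!]!) {
  diseases(efoIds: $diseasesEfoIds) {
    id
    name
    description
  }
}
"""


def build_payload(efo_ids):
    """Build the JSON request body (hypothetical helper)."""
    return {"query": DISEASES_QUERY, "variables": {"diseasesEfoIds": list(efo_ids)}}


def fetch_diseases(efo_ids, url=API_URL):
    """POST the query and return the list under data.diseases (hypothetical helper)."""
    body = json.dumps(build_payload(efo_ids)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # GraphQL reports problems in an "errors" array rather than via HTTP status alone.
    if "errors" in result:
        raise RuntimeError(result["errors"])
    return result["data"]["diseases"]
```

Calling fetch_diseases(["MONDO_0100096"]) should return the same records as the working query in the question. It still needs an explicit list of IDs, so on its own it cannot enumerate every disease; the options below can.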

Below, I have included instructions on how to use BigQuery and our dataset downloads. If you use our data, please cite our latest publication: Ochoa, D. et al., 2021.

Cheers,

Andrew :slight_smile:

Accessing disease/phenotype data with BigQuery

Using our BigQuery instance, open-targets-prod, you can generate an export of disease data by querying the diseases table in the open-targets-prod.platform_21_06 dataset with the following query:

SELECT
  id,
  name,
  description
FROM
  `open-targets-prod.platform_21_06.diseases`

After running the query, you can export the results in JSON or CSV format, or save them to another BigQuery table or a Google Sheets file.
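The same query can also be run from a script instead of the BigQuery console. This is a minimal sketch assuming the google-cloud-bigquery client library (installed with its pandas extra, e.g. pip install "google-cloud-bigquery[pandas]") and Google Cloud credentials for a billing project of your own. The helper names and the release parameter are illustrative only; platform_21_06 is the dataset used above.

```python
def build_disease_query(release="platform_21_06", project="open-targets-prod"):
    """Return the SQL shown above for a given project and release dataset."""
    return (
        "SELECT id, name, description\n"
        f"FROM `{project}.{release}.diseases`"
    )


def export_diseases_csv(csv_path, release="platform_21_06"):
    """Run the query with the BigQuery client and write the results to CSV.

    The import is local so build_disease_query() works without the
    google-cloud-bigquery package installed.
    """
    from google.cloud import bigquery

    # Uses your default credentials and billing project.
    client = bigquery.Client()
    query_job = client.query(build_disease_query(release=release))
    # Waits for the job to finish and loads the rows into a pandas DataFrame.
    disease_df = query_job.to_dataframe()
    disease_df.to_csv(csv_path, index=False)
    return len(disease_df)
```

For example, export_diseases_csv("diseases.csv") writes one row per disease with the id, name, and description columns and returns the number of rows written.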

Accessing disease/phenotype data with Platform dataset downloads

Using our FTP server, you can download our diseases dataset in either Parquet or JSON format.

Once you have downloaded the files, you can parse them using the programming language and libraries of your choice.
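If you download the JSON version and do not want to install Spark, the files can also be read line by line with Python's standard library. This sketch assumes the JSON download is a directory of newline-delimited JSON part files (one disease object per line, the layout Spark produces) with id, name, and description fields. read_disease_json is a hypothetical helper name, and you may need to adjust the glob pattern if your files are named or compressed differently.

```python
import json
from pathlib import Path


def read_disease_json(data_dir, fields=("id", "name", "description")):
    """Yield one dict per disease from a directory of JSON-lines files.

    Assumes every *.json file holds one JSON object per line; fields
    missing from a record are returned as None.
    """
    for part_file in sorted(Path(data_dir).glob("*.json")):
        with part_file.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                record = json.loads(line)
                yield {field: record.get(field) for field in fields}
```

For example, list(read_disease_json("path/to/diseases")) returns a list of small dictionaries that can be written to CSV or passed to pandas.DataFrame.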

Please see below for an example using Python, PySpark, and pandas.

# import relevant libraries
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
import pandas as pd  # not called directly, but toPandas() requires pandas

# create a local Spark session using all available cores
spark = (
    SparkSession.builder
    .master('local[*]')
    .getOrCreate()
)

# set location of diseases dataset downloaded in Parquet format
# (replace with the path to your local copy)
disease_data_path = "/Users/amh/Downloads/platform-data-analysis/data/diseases"

# read diseases dataset
disease_data = spark.read.parquet(disease_data_path)

# print diseases dataset schema
disease_data.printSchema()

# generate subset of diseases dataset with relevant fields
disease_data_subset = disease_data.select(
    F.col("id").alias("disease_id"),
    "name",
    "description",
)

# convert to pandas dataframe
disease_df = disease_data_subset.toPandas()

# print first 5 rows of disease dataframe
print(disease_df.head(5))

Our dataset downloads documentation also includes a sample sparklyr script that you can use to access and parse the diseases dataset.