How to select all diseases using the Open Targets Platform GraphQL API?

mose_rab · 9 August 2021 12:02

Hi,
To get specific diseases, I run the following query:

query Query($diseasesEfoIds: [String!]!) {
  diseases(efoIds: "MONDO_0100096") {
    id
    name
    description
  }
}

This query works fine.
If I want to get all diseases, I try to run something like

query Query($diseasesEfoIds: [String!]!) {
  diseases(efoIds: '*') {
    id
    name
    description
  }
}

But this query returns an error. Please help me construct a query that accesses all of the disease data.
Thanks and best regards
Tatjana

ahercules · 9 August 2021 12:26

Hi @mose_rab!

Welcome to the Open Targets Community!

Unfortunately, our GraphQL API does not allow you to access data about all of the diseases/phenotypes contained in the Platform. Instead, you will need to use BigQuery or our dataset downloads.
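For context, the `diseases` field takes an explicit list of IDs, so a wildcard has nothing to match against. Below is a minimal sketch of sending such a list as a GraphQL variable from Python. The endpoint URL and the helper names `build_payload` and `fetch_diseases` are assumptions made for illustration and should be checked against the current API documentation.

```python
import json
import urllib.request

# Assumed public endpoint of the Platform GraphQL API; verify before relying on it.
API_URL = "https://api.platform.opentargets.org/api/v4/graphql"

# Same fields as the query in the question, but the IDs are passed as a variable.
QUERY = """
query Diseases($diseasesEfoIds: [String!]!) {
  diseases(efoIds: $diseasesEfoIds) {
    id
    name
    description
  }
}
"""


def build_payload(efo_ids):
    """Return the JSON body for a request that lists explicit disease IDs."""
    if not efo_ids:
        raise ValueError("efoIds must contain at least one explicit ID")
    return {"query": QUERY, "variables": {"diseasesEfoIds": list(efo_ids)}}


def fetch_diseases(efo_ids):
    """POST the query and return the list under data.diseases (needs network access)."""
    body = json.dumps(build_payload(efo_ids)).encode("utf-8")
    request = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["data"]["diseases"]
```

Calling `fetch_diseases(["MONDO_0100096"])` would send the ID from the question; every disease you want must be named in the list.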

Below, I have included instructions on how to use BigQuery and our dataset downloads. If you use our data, please cite our latest publication: Ochoa, D. et al., 2021.

Cheers,

Andrew

Accessing disease/phenotype data with BigQuery

Using our BigQuery instance - open-targets-prod - you can generate an export of disease data by querying our diseases dataset with the following query:

SELECT
  id,
  name,
  description,
FROM
  `open-targets-prod.platform.diseases`

After running the query, you can export the results in JSON or CSV format, or import them into another BigQuery instance or a Google Sheets file.
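If you would rather script the export than use the BigQuery console, the same query can be run from Python. This is a rough sketch that assumes the `google-cloud-bigquery` package is installed and that Google Cloud credentials are already configured; the helper names `build_disease_query` and `export_diseases_to_csv` are hypothetical.

```python
import csv


def build_disease_query(fields=("id", "name", "description"),
                        table="open-targets-prod.platform.diseases"):
    """Build the SELECT statement for the requested columns."""
    columns = ",\n  ".join(fields)
    return f"SELECT\n  {columns}\nFROM\n  `{table}`"


def export_diseases_to_csv(csv_path, project=None):
    """Run the query and write every row to a CSV file.

    Assumes google-cloud-bigquery is installed and credentials are set up;
    `project` is the Google Cloud project that the query is billed to.
    """
    from google.cloud import bigquery  # third-party dependency, imported lazily

    client = bigquery.Client(project=project)
    rows = client.query(build_disease_query()).result()
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        # header row taken from the result schema
        writer.writerow([field.name for field in rows.schema])
        for row in rows:
            writer.writerow(row.values())
```

For example, `export_diseases_to_csv("diseases.csv")` would write one row per disease, with a header taken from the result schema.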

Accessing disease/phenotype data with Platform dataset downloads

Using our FTP server, you can download our diseases dataset in either Parquet or JSON format.

Once you have downloaded the files, you can parse them using the programming language and libraries of your choice.

Please see below for an example using Python, PySpark, and pandas.

# import relevant libraries
from pyspark import SparkConf
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
import pandas as pd

# create Spark session
spark = (
    SparkSession.builder
    .master('local[*]')
    .getOrCreate()
)

# set location of diseases dataset downloaded in Parquet format
disease_data_path = "/Users/amh/Downloads/platform-data-analysis/data/diseases"

# read diseases dataset
disease_data = spark.read.parquet(disease_data_path)

# print diseases dataset schema
disease_data.printSchema()

# generate subset of diseases dataset with relevant fields, renaming "id" to "disease_id"
disease_data_subset = disease_data.select(
    F.col("id").alias("disease_id"), "name", "description"
)

# convert to pandas DataFrame
disease_df = disease_data_subset.toPandas()

# print first 5 rows of disease dataframe
print(disease_df.head(5))

Our dataset downloads documentation also includes a sample sparklyR script that you can use to access and parse the diseases dataset.
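If you downloaded the JSON format and do not want to set up Spark, the files can also be read with the Python standard library alone. The sketch below assumes that the download is a directory of `*.json` part files holding one JSON object per line; that layout and the helper name `read_disease_records` are assumptions to check against the files you actually downloaded.

```python
import json
from pathlib import Path


def read_disease_records(dataset_dir, fields=("id", "name", "description")):
    """Yield one dict per disease from every *.json part file in dataset_dir."""
    for part_file in sorted(Path(dataset_dir).glob("*.json")):
        with part_file.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                record = json.loads(line)
                # fields missing from a record (e.g. no description) become None
                yield {field: record.get(field) for field in fields}
```

Iterating over `read_disease_records("path/to/diseases")`, with the path replaced by your own download location, yields dictionaries containing only the three selected fields.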
