Returning all associations data using the Platform API

Thank you for submitting your question about how to access associations data using our GraphQL API. I have responded below and also included two other ways of accessing the data for more systematic and comprehensive queries.

How do I access associations data using the Open Targets Platform GraphQL API?

To access all associations data for a given target or disease/phenotype through our GraphQL API, you will need to iterate through each page of results using the associatedDiseases field and the page argument.

associatedDiseases(page: { size: x, index: y })

Using your example above (diseases associated with CAV1, ENSG00000105974), your first API query would be:

query targetDiseaseAssociations {
  target(ensemblId: "ENSG00000105974") {
    associatedDiseases(page: { size: 50, index: 0 }) {
      rows {
        disease { id name }  # example field selection
        score
      }
    }
  }
}

This query will return the first 50 results that you see in the web interface associations page.

To access the next 50 results, update your query and set index to 1:

query targetDiseaseAssociations {
  target(ensemblId: "ENSG00000105974") {
    associatedDiseases(page: { size: 50, index: 1 }) {
      rows {
        disease { id name }  # example field selection
        score
      }
    }
  }
}

You would need to keep incrementing index until you have retrieved all results. Because our API returns a maximum of 50 records for a given query, you would need to run the query 19 times (947 / 50, rounded up, so index 0 to 18) to get all 947 results.
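To make the page arithmetic concrete, here is a minimal Python sketch. It only builds the query text for each page and does not send anything to the API. The helper names page_count and build_query, and the selection of returned fields (id, name, score), are illustrative assumptions rather than anything defined by the Platform.

```python
import math

PAGE_SIZE = 50  # page size used in the queries above


def page_count(total: int, size: int = PAGE_SIZE) -> int:
    """Hypothetical helper: number of pages needed to cover `total` results."""
    return math.ceil(total / size)


def build_query(ensembl_id: str, index: int, size: int = PAGE_SIZE) -> str:
    """Hypothetical helper: query text for one page of associated diseases."""
    return f"""
query targetDiseaseAssociations {{
  target(ensemblId: "{ensembl_id}") {{
    associatedDiseases(page: {{ size: {size}, index: {index} }}) {{
      rows {{
        disease {{ id name }}
        score
      }}
    }}
  }}
}}
""".strip()


# 947 associations for CAV1 at 50 per page
pages = page_count(947)
queries = [build_query("ENSG00000105974", i) for i in range(pages)]
print(pages)  # 19
```

The last query in the list uses index 18, since page indices start at 0.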

As noted in our API documentation, the GraphQL API is optimised for querying a single target, disease/phenotype, drug, or target-disease association. It is not suitable for running for loops to iterate through pages and pages of results.

Instead, for the more programmatic, systematic, and comprehensive use case that you have, I would strongly recommend using our associations datasets - associationByOverallDirect, associationByOverallIndirect, associationByDatasourceDirect and associationByDatasourceIndirect. These datasets can be accessed using our BigQuery instance - open-targets-prod - or our dataset downloads.

How do I access associations data using BigQuery?

In our BigQuery open-targets-prod instance, you will find different associations files for direct and indirect associations and by overall and datasource scores. You can query these datasets using SQL.

For example, using our associationByOverallDirect dataset, you can run the following query and return all 947 associations and the overall target-disease association scores:

SELECT
  associations.targetId AS target_id,
  targets.approvedSymbol AS target_approved_symbol,
  associations.diseaseId AS disease_id,
  diseases.name AS disease_name,
  associations.score AS overall_association_score
FROM `open-targets-prod.platform.associationByOverallDirect` AS associations
JOIN `open-targets-prod.platform.diseases` AS diseases
  ON associations.diseaseId = diseases.id
JOIN `open-targets-prod.platform.targets` AS targets
  ON associations.targetId = targets.id
WHERE associations.targetId = 'ENSG00000105974'
ORDER BY associations.score DESC

You can then export these results to CSV, JSON, or Google Sheets formats or import into your own BigQuery table.
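If it helps to see what the joins in the query above do before running anything in BigQuery, the same shape of query can be tried locally. This is only a sketch using Python's built-in sqlite3 module with made-up rows; the table and column names mirror the ones in the query, but the identifiers and scores are placeholders, not real Open Targets data, and SQLite is standing in for BigQuery here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE associations (targetId TEXT, diseaseId TEXT, score REAL);
CREATE TABLE diseases (id TEXT, name TEXT);
CREATE TABLE targets (id TEXT, approvedSymbol TEXT);
-- Made-up rows for illustration only; not real Open Targets data.
INSERT INTO associations VALUES
  ('TARGET_A', 'DISEASE_1', 0.2),
  ('TARGET_A', 'DISEASE_2', 0.7),
  ('TARGET_B', 'DISEASE_1', 0.9);
INSERT INTO diseases VALUES
  ('DISEASE_1', 'disease one'),
  ('DISEASE_2', 'disease two');
INSERT INTO targets VALUES
  ('TARGET_A', 'GENE_A'),
  ('TARGET_B', 'GENE_B');
""")

# Same structure as the BigQuery query: two joins, a filter on one target,
# and results sorted by score from highest to lowest.
rows = conn.execute("""
SELECT a.targetId, t.approvedSymbol, a.diseaseId, d.name, a.score
FROM associations AS a
JOIN diseases AS d ON a.diseaseId = d.id
JOIN targets AS t ON a.targetId = t.id
WHERE a.targetId = ?
ORDER BY a.score DESC
""", ("TARGET_A",)).fetchall()

for row in rows:
    print(row)
```

Only the two TARGET_A rows come back, each carrying the gene symbol and disease name picked up from the joined tables.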

How do I access associations data using dataset downloads?

Using our FTP service, you can download our datasets in either JSON or Parquet formats and use your programming language of choice to query and analyse the data.

For example, to generate a CSV with all 947 associations, you can use PySpark, Python, and pandas to process the associationByOverallDirect and diseases datasets.

# import relevant libraries
from pyspark import SparkConf
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
import pandas as pd

# create Spark session (local mode is an assumption; adjust for your setup)
spark = (
    SparkSession.builder
    .master("local[*]")
    .getOrCreate()
)

# set location of associations dataset (Parquet format)
associations_data_path = "path to directory with associations dataset files"

# read associations dataset
associations_data = spark.read.parquet(associations_data_path)

# print associations dataset schema
# associations_data.printSchema()

# create subset with relevant fields
associations_data_subset = (associations_data
              .select("targetId", "diseaseId", F.col("score").alias("overallAssociationScore")))

# set location of diseases dataset (Parquet format)
disease_data_path = "path to directory with diseases dataset files"

# read diseases dataset
disease_data = spark.read.parquet(disease_data_path)

# print diseases dataset schema
# disease_data.printSchema()

# create subset with relevant fields
disease_data_subset = (disease_data
              .select(F.col("id").alias("diseaseId"), F.col("name").alias("diseaseLabel")))

# merge associations and diseases data
output = (associations_data_subset
              .join(disease_data_subset, on="diseaseId", how="inner"))

# show output of merged data
output.show()

# convert output to pandas dataframe
output_df = output.toPandas()

# filter dataframe for CAV1 (ENSG00000105974)
output_df = output_df[output_df["targetId"] == "ENSG00000105974"]

# print length of filtered dataframe
print(len(output_df))

# export dataframe to CSV (the output filename is a placeholder)
output_df.to_csv("cav1_associations.csv", index=False)
Apologies for such a long reply :sweat_smile: but I wanted to show you that you can also answer your research question using BigQuery or our dataset downloads.

Good luck :crossed_fingers: and feel free to comment below if you have any further questions!


