Accessing locus-to-gene (L2G) and colocalisation data from the Genetics Portal by querying the API using Python

A user recently got in touch with the helpdesk to find out how they could obtain locus-to-gene (L2G) and colocalisation data for a specific gene in the Open Targets Genetics Portal.

Here we present two ways of obtaining the data displayed on the gene profile page, using NFAT5 as an example.

Using the GraphQL API

To access the data, you will need to send a query to our GraphQL API endpoint. The query is built from the fields, sub-fields, and arguments described in our schema.

The Genetics Portal GraphQL playground is a good place to try out different queries. For a step-by-step walkthrough of the GraphQL API playground, check out @JarrodBaker’s post on the Open Targets blog: Accessing Open Targets Genetics using GraphQL.

To obtain the data in the “Associated studies: locus-to-gene pipeline” table, you will need to query the ‘studiesAndLeadVariantsForGeneByL2G’ field and sub-fields. Below is a sample query for NFAT5, which you can also try directly in the playground by pressing the play button.

query GenePageQuery {
  geneInfo(geneId:"ENSG00000102908") {
    id
    symbol
  }
  studiesAndLeadVariantsForGeneByL2G(geneId: "ENSG00000102908") {
    pval
    yProbaModel
    study {
      studyId
      traitReported
      pubAuthor
      pubDate
      pmid
      nInitial
      nReplication
      hasSumsStats
    }
    variant {
      rsId
      id
    }
    odds{
      oddsCI
      oddsCILower
      oddsCIUpper
    }
    beta{
      betaCI
      betaCILower
      betaCIUpper
      direction
    }
  }
}

Similarly, to access the data in the “Associated studies: Colocalisation analysis” table, you will need to query the ‘colocalisationsForGene’ field and its sub-fields. Below is a sample query using NFAT5, which you can also try out directly in the playground.

query GenePageQuery {
  geneInfo(geneId:"ENSG00000102908") {
    id
    symbol
  }
  colocalisationsForGene(geneId: "ENSG00000102908") {
    leftVariant {
      id
      rsId
    }
    study {
      studyId
      traitReported
      pubAuthor
      pubDate
      pmid
      hasSumsStats
    }
    qtlStudyId
    phenotypeId
    tissue {
      id
      name
    }
    h3
    h4
    log2h4h3
  }
}

Accessing the same data using Python

The same queries can be constructed in Python. Below are sample scripts that build the query string, pass the NFAT5 Ensembl gene ID as a query variable, execute the query using ‘requests’, and print the first element of the response data.

Sample script for querying L2G data:

#import libraries to test solution
import requests
import json 

def associated_studies_l2g_query(gene_id):

  #construct query string and declare variables that will be sent in query
  api_query = """
    query GenePageQuery($geneId: String!) {
      geneInfo(geneId: $geneId) {
        id
        symbol
      }
      studiesAndLeadVariantsForGeneByL2G(geneId: $geneId) {
        pval
        yProbaModel
        study {
          studyId
          traitReported
          pubAuthor
          pubDate
          pmid
          nInitial
          nReplication
          hasSumsStats
        }
        variant {
          rsId
          id
        }
        odds{
          oddsCI
          oddsCILower
          oddsCIUpper
        }
        beta{
          betaCI
          betaCILower
          betaCIUpper
          direction
        }
      }
    }
  """ 

  #set base_url for Open Targets Genetics Portal API
  base_url = "http://genetics-api.opentargets.io/graphql"

  #set variables object
  variables = {"geneId": gene_id}

  #perform API call using query string and variables object 
  r = requests.post(base_url, json={"query": api_query, "variables": variables})

  #check status code of GraphQL API response and print a message for any non-200 code
  if r.status_code != 200:
    print(f"{gene_id} query status code: {r.status_code}")

  #transform API response into JSON 
  api_response_as_json = json.loads(r.text)
  
  #print first element of JSON response data
  print(api_response_as_json["data"]["studiesAndLeadVariantsForGeneByL2G"][0])

  #return entire JSON response data
  # return api_response_as_json



# execute function with sample gene - NFAT5 (ENSG00000102908)
associated_studies_l2g_query("ENSG00000102908")
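
The script above indexes straight into the response data, which raises an exception if the API reports a GraphQL error or returns no rows for a gene. As a minimal sketch (not part of the original script), the helper below checks for the top-level "errors" list that GraphQL responses conventionally carry, then sorts the rows by yProbaModel, which we assume here holds the L2G score. The function names extract_rows and top_l2g_rows are hypothetical, and the sample response uses made-up placeholder values rather than real Portal data.

```python
def extract_rows(response, field):
    """Return the list under response["data"][field], raising on GraphQL errors."""
    # GraphQL servers conventionally report query problems in a top-level "errors" list
    errors = response.get("errors")
    if errors:
        messages = "; ".join(
            str(e.get("message", e)) if isinstance(e, dict) else str(e)
            for e in errors
        )
        raise ValueError(f"GraphQL query failed: {messages}")
    # "data" or the field itself may be missing or null; fall back to an empty list
    data = response.get("data") or {}
    return data.get(field) or []


def top_l2g_rows(response, n=5):
    """Return the n rows with the highest yProbaModel value."""
    rows = extract_rows(response, "studiesAndLeadVariantsForGeneByL2G")
    return sorted(rows, key=lambda row: row["yProbaModel"], reverse=True)[:n]


# Placeholder response with made-up values, shaped like the API output
sample_response = {
    "data": {
        "studiesAndLeadVariantsForGeneByL2G": [
            {"yProbaModel": 0.2, "study": {"studyId": "STUDY_A"}},
            {"yProbaModel": 0.9, "study": {"studyId": "STUDY_B"}},
            {"yProbaModel": 0.5, "study": {"studyId": "STUDY_C"}},
        ]
    }
}

for row in top_l2g_rows(sample_response, n=2):
    print(row["study"]["studyId"], row["yProbaModel"])
```

To use this with the script above, you could uncomment its return line so that the function returns the parsed response, and pass that result to top_l2g_rows.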

Sample script for querying colocalisation data:

#import libraries to test solution
import requests
import json 

def associated_studies_coloc_query(gene_id):

  #construct query string and declare variables that will be sent in query
  api_query = """
    query GenePageQuery($geneId: String!) {
      geneInfo(geneId: $geneId) {
        id
        symbol
      }
      colocalisationsForGene(geneId: $geneId) {
        leftVariant {
          id
          rsId
        }
        study {
          studyId
          traitReported
          pubAuthor
          pubDate
          pmid
          hasSumsStats
        }
        qtlStudyId
        phenotypeId
        tissue {
          id
          name
        }
        h3
        h4
        log2h4h3
      }
    }
  """ 

  #set base_url for Open Targets Genetics Portal API
  base_url = "http://genetics-api.opentargets.io/graphql"

  #set variables object
  variables = {"geneId": gene_id}

  #perform API call using query string and variables object 
  r = requests.post(base_url, json={"query": api_query, "variables": variables})

  #check status code of GraphQL API response and print a message for any non-200 code
  if r.status_code != 200:
    print(f"{gene_id} query status code: {r.status_code}")

  #transform API response into JSON 
  api_response_as_json = json.loads(r.text)
  
  #print first element of JSON response data
  print(api_response_as_json["data"]["colocalisationsForGene"][0])

  #return entire JSON response data
  # return api_response_as_json



# execute function with sample gene - NFAT5 (ENSG00000102908)
associated_studies_coloc_query("ENSG00000102908")

Sample scripts courtesy of @ahercules.
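
The colocalisation query can return many rows for a single gene, so it may help to narrow the list before inspecting it. The sketch below is not part of the sample scripts above. It flattens each row into a simple dictionary and keeps only rows whose h4 value meets a threshold; in the coloc method, H4 is the posterior probability that the two traits share a single causal variant. The function name filter_colocs, the default threshold of 0.8, and the sample rows are assumptions for illustration, and the values are placeholders rather than real Portal data.

```python
def filter_colocs(rows, min_h4=0.8):
    """Flatten colocalisation rows and keep those with h4 >= min_h4, highest first."""
    flat = []
    for row in rows:
        # Skip rows with weak evidence for a shared causal variant
        if row["h4"] < min_h4:
            continue
        flat.append({
            "variant": row["leftVariant"]["id"],
            "studyId": row["study"]["studyId"],
            "trait": row["study"]["traitReported"],
            "qtlStudyId": row["qtlStudyId"],
            "tissue": row["tissue"]["name"],
            "h4": row["h4"],
        })
    return sorted(flat, key=lambda r: r["h4"], reverse=True)


# Placeholder rows with made-up values, shaped like the colocalisationsForGene output
sample_rows = [
    {
        "leftVariant": {"id": "VARIANT_1", "rsId": None},
        "study": {"studyId": "STUDY_A", "traitReported": "Trait A"},
        "qtlStudyId": "QTL_STUDY_X",
        "tissue": {"id": "TISSUE_1", "name": "Tissue one"},
        "h4": 0.85,
    },
    {
        "leftVariant": {"id": "VARIANT_2", "rsId": None},
        "study": {"studyId": "STUDY_B", "traitReported": "Trait B"},
        "qtlStudyId": "QTL_STUDY_Y",
        "tissue": {"id": "TISSUE_2", "name": "Tissue two"},
        "h4": 0.30,
    },
    {
        "leftVariant": {"id": "VARIANT_3", "rsId": None},
        "study": {"studyId": "STUDY_C", "traitReported": "Trait C"},
        "qtlStudyId": "QTL_STUDY_X",
        "tissue": {"id": "TISSUE_1", "name": "Tissue one"},
        "h4": 0.95,
    },
]

for entry in filter_colocs(sample_rows):
    print(entry["studyId"], entry["tissue"], entry["h4"])
```

The list passed in would be the value stored under "colocalisationsForGene" in the parsed response data. The threshold is a judgment call, so adjust min_h4 to suit your analysis.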