Variant to Gene Query for multiple variants in OT Genetics

Dear Open Target team,

First of all, thank you very much for your valuable work.

I need to perform a Variant to Gene search for multiple variants (>200).

I am trying to write a script that generates a table of results containing the same fields I see when I search an rsID individually in the browser, but I have not been able to find how to do this in the documentation.

Any guidance would be much appreciated.

Thank you very much!

Paloma

Thanks for your question, Paloma. For a moderate number of variants, you could use the GraphQL API, which can be accessed via a browser or programmatically.

For example, a query in the browser might look something like this:
[Screenshot: GraphQL variant-to-gene query]

You can use a language of your choice to run the query programmatically, e.g. Python or R. Here is an example in R:

# Load the library for HTTP requests (install it first with install.packages("httr"))
library(httr)

# Set the variant ID (chromosome_position_ref_alt)
variantId <- "17_44352876_C_T"

# Build the query string
query_string <- "
query v2g($variantId: String!) {
  genesForVariant(variantId: $variantId) {
    gene {
      id
    }
    variant
    overallScore
    distances {
      sourceId
      aggregatedScore
      tissues {
        distance
      }
    }
  }
}"

# Set base URL of GraphQL API endpoint
base_url <- "https://api.genetics.opentargets.org/graphql"

# Set variables object of arguments to be passed to endpoint
variables <- list("variantId" = variantId)

# Construct POST request body object with query string and variables
post_body <- list(query = query_string, variables = variables)

# Perform POST request and stop if the server returns an HTTP error
r <- POST(url = base_url, body = post_body, encode = "json")
stop_for_status(r)

# Parse the JSON response once and extract the V2G results
list_result <- content(r)$data$genesForVariant

# Print the first V2G entry to the console
head(list_result, 1)

# Flatten the nested fields of each entry, then bind them into a data frame
library(rlist)
library(dplyr)
x <- lapply(list_result, list.flatten)
df <- bind_rows(x)
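To cover a list of several hundred variants, one option is to wrap the same request in a loop and stack the per-variant results. This is a sketch, not an official client: `fetch_v2g()`, `genes_to_df()` and `v2g_for_variants()` are hypothetical helper names, the 0.5 s pause is a courtesy rather than a documented rate limit, and the IDs must use the same chromosome_position_ref_alt form as `17_44352876_C_T` (converting rsIDs is not covered here). The fetch function is passed in as an argument so the looping logic can be tried without network access.

```r
# Hypothetical helpers: run the single-variant V2G query for many variants.
# fetch_v2g() reuses the endpoint and request shape from the example above.
fetch_v2g <- function(variant_id, query,
                      url = "https://api.genetics.opentargets.org/graphql") {
  r <- httr::POST(url = url,
                  body = list(query = query,
                              variables = list(variantId = variant_id)),
                  encode = "json")
  httr::stop_for_status(r)
  parsed <- httr::content(r)
  # GraphQL servers often report query errors in the body with HTTP 200
  if (!is.null(parsed$errors)) stop(parsed$errors[[1]]$message)
  parsed$data$genesForVariant
}

# One row per gene: requested ID, gene ID, returned variant and overall score
genes_to_df <- function(query_id, genes) {
  if (length(genes) == 0) return(NULL)
  do.call(rbind, lapply(genes, function(g) data.frame(
    query_id = query_id,
    gene_id = g$gene$id,
    variant = g$variant,
    overall_score = g$overallScore,
    stringsAsFactors = FALSE
  )))
}

# Loop over the IDs; failed requests give a warning and are skipped
v2g_for_variants <- function(variant_ids, fetch, pause = 0.5) {
  tables <- lapply(variant_ids, function(v) {
    genes <- tryCatch(fetch(v), error = function(e) {
      warning("Query failed for ", v, ": ", conditionMessage(e), call. = FALSE)
      NULL
    })
    Sys.sleep(pause)  # space out requests to the public endpoint
    genes_to_df(v, genes)
  })
  tables <- Filter(Negate(is.null), tables)
  if (length(tables) == 0) return(data.frame())
  do.call(rbind, tables)
}

# Usage (needs network access and the query_string defined above; ids is hypothetical):
# ids <- c("17_44352876_C_T")
# v2g <- v2g_for_variants(ids, fetch = function(v) fetch_v2g(v, query_string))
```

Because skipped variants simply contribute no rows, comparing the `query_id` column with your input list shows which variants returned nothing.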

Note that if you want information on individual QTL associations, you will need to do a bit more work to flatten the resulting nested lists.
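One way to do that extra flattening without `rlist` is to walk the nested lists directly and emit one row per gene, distance source and tissue entry. This sketch assumes the field names requested in `query_string` above (`gene { id }`, `variant`, `overallScore`, `distances { sourceId aggregatedScore tissues { distance } }`); a QTL block would need the same treatment with its own field names. `flatten_distances()` is a hypothetical helper, and the input at the bottom is made-up data with the same shape as the response.

```r
# Hypothetical helper: flatten genesForVariant (as parsed by httr::content())
# into one row per gene x distance source x tissue entry.

# Replace JSON nulls (NULL in R) with NA so data.frame() gets one value per field
or_na <- function(x) if (is.null(x)) NA else x

flatten_distances <- function(genes_for_variant) {
  rows <- list()
  for (g in genes_for_variant) {
    for (d in g$distances) {
      # A source without tissue entries still contributes one row (distance = NA)
      tissues <- if (length(d$tissues) > 0) d$tissues else list(list())
      for (tissue in tissues) {
        rows[[length(rows) + 1]] <- data.frame(
          gene_id = or_na(g$gene$id),
          variant = or_na(g$variant),
          overall_score = or_na(g$overallScore),
          source_id = or_na(d$sourceId),
          aggregated_score = or_na(d$aggregatedScore),
          distance = or_na(tissue$distance),
          stringsAsFactors = FALSE
        )
      }
    }
  }
  # Genes with an empty distances list contribute no rows
  if (length(rows) == 0) return(data.frame())
  do.call(rbind, rows)
}

# Made-up input with the same shape as the API response (not real data)
mock <- list(
  list(gene = list(id = "GENE_A"), variant = "17_44352876_C_T", overallScore = 0.5,
       distances = list(list(sourceId = "example_source", aggregatedScore = 0.3,
                             tissues = list(list(distance = 1000),
                                            list(distance = 2000))))),
  list(gene = list(id = "GENE_B"), variant = "17_44352876_C_T", overallScore = 0.1,
       distances = list())
)
long_df <- flatten_distances(mock)
```

With a real response you would call `flatten_distances(content(r)$data$genesForVariant)` instead of using the mock.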

Thank you very much for your detailed and very helpful response Jeremy! It worked!

Hi JeremyS, thank you for your valuable information!

How should I change “query_string” to get the data shown below?

I tried the full query from your example; however, I got a lot of QTLs and did not see anything like PCHi-C, DHS-promoter correlation or VEP. Maybe I missed it.

I also tried to find documentation on the data structure, without success. Maybe I missed that too.

Hi llg, you will find this information in the v2g_scored file. Each row represents one evidence source for a given variant-to-gene pair (as indicated by the type_id and source_id columns), so if you are particularly interested in PCHi-C evidence, you can filter the v2g_scored file on the type_id column.

Hope this helps,
Xiangyu

root
 |-- chr_id: string (nullable = true)
 |-- position: long (nullable = true)
 |-- ref_allele: string (nullable = true)
 |-- alt_allele: string (nullable = true)
 |-- gene_id: string (nullable = true)
 |-- feature: string (nullable = true)
 |-- type_id: string (nullable = true)
 |-- source_id: string (nullable = true)
 |-- fpred_labels: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- fpred_scores: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- fpred_max_label: string (nullable = true)
 |-- fpred_max_score: double (nullable = true)
 |-- qtl_beta: double (nullable = true)
 |-- qtl_se: double (nullable = true)
 |-- qtl_pval: double (nullable = true)
 |-- qtl_score: double (nullable = true)
 |-- interval_score: double (nullable = true)
 |-- qtl_score_q: double (nullable = true)
 |-- interval_score_q: double (nullable = true)
 |-- d: long (nullable = true)
 |-- distance_score: double (nullable = true)
 |-- distance_score_q: double (nullable = true)
 |-- overall_score: double (nullable = true)
 |-- source_list: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- source_score_list: array (nullable = true)
 |    |-- element: double (containsNull = true)
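If you have downloaded the v2g_scored file, that filtering can also be restricted to your own variant list. A minimal sketch, assuming the file has been loaded into an R data frame with the columns in the schema above, that PCHi-C rows carry the type_id value `pchic`, and that variant IDs follow the chr_id_position_ref_allele_alt_allele pattern used by the API (e.g. `17_44352876_C_T`). `select_v2g()` is a hypothetical helper, and the loading line is only an untested suggestion with a placeholder path.

```r
# Hypothetical helper: keep v2g_scored rows for chosen variants and evidence types.
# Column names follow the schema above; the "pchic" default is an assumption.
select_v2g <- function(v2g, variant_ids, types = "pchic") {
  # Rebuild chr_pos_ref_alt IDs; sprintf avoids "1e+05"-style positions
  ids <- paste(v2g$chr_id, sprintf("%.0f", as.numeric(v2g$position)),
               v2g$ref_allele, v2g$alt_allele, sep = "_")
  keep <- ids %in% variant_ids & v2g$type_id %in% types
  data.frame(variant_id = ids[keep],
             v2g[keep, c("gene_id", "type_id", "source_id", "overall_score")],
             row.names = NULL, stringsAsFactors = FALSE)
}

# Loading depends on the downloaded format; for Parquet, one option (untested here) is
# v2g <- as.data.frame(dplyr::collect(arrow::open_dataset("path/to/v2g_scored")))
# pchic_hits <- select_v2g(v2g, c("17_44352876_C_T"))
```

Setting `types = unique(v2g$type_id)` keeps every evidence type, which is also a quick way to check which type_id labels the file actually uses.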