Downloading variant information

Gpathak · 16 June 2022 14:14

Hi!
I used to be able to run the following in R to download details on variants. It doesn’t work with the new API link.

I also cannot find the following in the schema.

library(ghql)      # GraphqlClient, Query
library(jsonlite)  # fromJSON

# GraphQL client (assumed: endpoint URL as used elsewhere in this thread)
cli <- GraphqlClient$new(url = "https://api.genetics.opentargets.org/graphql")
qry <- Query$new()

id <- "rs12345" # variant ID

# build the query, embedding the rsID in the search string
qry$query('my_query', paste0('{search(queryString:"', id, '") {
  variants {
    id
    rsId
    chromosome
    position
    refAllele
    altAllele
    nearestGeneDistance
    nearestCodingGeneDistance
  }
}}'))

# run the query and format the results
res1 <- fromJSON(cli$exec(qry$queries$my_query), flatten = TRUE)$data$search$variants

irene · 17 June 2022 12:06

Hi @Gpathak and welcome to the Community!

There have not been any changes to the endpoints you are describing. I am not familiar with the process of inserting a subquery inside a query, and I wonder if the problem is coming from the R library that is handling this type of query.

You can obtain the same result if you make 2 consecutive queries. This is a snippet of how I would do the process in Python:

import requests

url = 'https://api.genetics.opentargets.org/graphql'

search_query = """
query searchRsId($rsId: String!) {
  search(queryString: $rsId) {
    variants {
      id
    }
  }
}
"""

variant_query = """
query variantInfo($variantId: String!) {
  variantInfo(variantId: $variantId) {
    id
    rsId
    chromosome
    position
    refAllele
    altAllele
    nearestGeneDistance
    nearestCodingGeneDistance
  }
}
"""

variables = {'rsId': 'rs10469840'} # Your rsID of interest

search_result = requests.post(url, json={'query': search_query, 'variables':variables}).json()
# This returns: {'data': {'search': {'variants': [{'id': '2_102476784_T_C'}]}}}

variables.update({'variantId': search_result['data']['search']['variants'][0]['id']})

variant_result = requests.post(url, json={'query': variant_query, 'variables':variables}).json()
'''
This returns: {'data': {'variantInfo': {'id': '2_102476784_T_C',
   'rsId': 'rs10469840',
   'chromosome': '2',
   'position': 102476784,
   'refAllele': 'T',
   'altAllele': 'C',
   'nearestGeneDistance': 3558,
   'nearestCodingGeneDistance': 3558}}}
'''
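
The snippet above indexes `[0]` into the search results, which fails when the search finds no variant for an rsID. A minimal sketch of a more defensive version of that step could look like the following. The helper name `first_variant_id` is hypothetical, and the check relies on the GraphQL convention that servers report problems under a top-level `errors` key.

```python
def first_variant_id(search_result):
    """Return the first variant ID from a `search` response, or None if there is none."""
    # GraphQL servers report query problems under a top-level "errors" key.
    if search_result.get("errors"):
        raise RuntimeError(f"GraphQL error: {search_result['errors']}")
    data = search_result.get("data") or {}
    variants = (data.get("search") or {}).get("variants") or []
    return variants[0]["id"] if variants else None


# Response shape taken from the example above; `missing` mimics an rsID with no match.
found = {"data": {"search": {"variants": [{"id": "2_102476784_T_C"}]}}}
missing = {"data": {"search": {"variants": []}}}

print(first_variant_id(found))    # 2_102476784_T_C
print(first_variant_id(missing))  # None
```

Returning `None` instead of raising lets a loop over many rsIDs skip the ones without a match.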

Can you try a similar approach in R and let me know if it works?

At the same time, I also want to let you know that if you need to process multiple rsIDs, it is worth checking out our variant dataset, downloadable from EBI’s FTP at /pub/databases/opentargets/genetics/latest/variant-index/

This dataset is the one feeding our API, so you can expect it to contain the same information.

Thank you for your question!
Irene

Gpathak · 17 June 2022 15:02

Thank you so much Irene, this works. I didn’t understand the use of variantInfo.
I used the following for another query and it doesn’t work again. How do I find the top-level schema for such queries?

query variantInfo {
  indexVariantsAndStudiesForTagVariant(variantId: "1_46810098_T_C") {
    associations {
      study {
        studyId
        traitReported
        traitCategory
        pmid
        pubAuthor
        pubDate
      }
      indexVariant {
        id
        rsId
        refAllele
        altAllele
        nearestGeneDistance
        nearestGene {
          symbol
        }
      }
      pval
      nTotal
      nCases
      overallR2
      afr1000GProp
      amr1000GProp
      eas1000GProp
      eur1000GProp
      sas1000GProp
      log10Abf
      posteriorProbability
      pvalMantissa
      pvalExponent
      oddsRatio
      oddsRatioCILower
      oddsRatioCIUpper
      beta
      direction
      betaCILower
      betaCIUpper
    }
  }
}

dsuveges · 17 June 2022 21:49

Hi @Gpathak ,

You can explore the queries and the schema on the GraphQL playground. See this example query on variantInfo.

On the GraphQL playground page, you can click the docs in the upper-left corner and explore the available objects.
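
The same schema information that the playground docs show can usually be fetched programmatically with a standard GraphQL introspection query. The sketch below assumes the endpoint used earlier in this thread is reachable and has introspection enabled; `fetch_schema` and `top_level_fields` are hypothetical helper names.

```python
import json
import urllib.request

# Endpoint taken from the Python snippet earlier in this thread.
URL = "https://api.genetics.opentargets.org/graphql"

# Standard GraphQL introspection: name and argument names of each top-level query field.
INTROSPECTION_QUERY = """
query {
  __schema {
    queryType {
      fields {
        name
        args { name }
      }
    }
  }
}
"""


def fetch_schema(url=URL):
    """POST the introspection query and return the decoded JSON response."""
    body = json.dumps({"query": INTROSPECTION_QUERY}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def top_level_fields(schema_response):
    """Map each top-level query field name to the list of its argument names."""
    fields = schema_response["data"]["__schema"]["queryType"]["fields"]
    return {field["name"]: [arg["name"] for arg in field["args"]] for field in fields}


# Usage (performs a network request):
# for name, args in sorted(top_level_fields(fetch_schema()).items()):
#     print(f"{name}({', '.join(args)})")
```

Each printed line would be one query that can be placed at the top level, such as `search` or `variantInfo`, together with the arguments it accepts.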

Dimitris_Zisis · 4 April 2023 15:22

I tried to run the indexVariantsAndStudiesForTagVariant query and it doesn’t work for my input (1_17500056_C_T) either. It works only for the sample query in the GraphiQL API.
How did you manage to make it work in R?
I need to create a query with a variant ID and get the study IDs in order to use them as input in another query for genePrioritisationUsingL2G.

dsuveges · 14 April 2023 13:48

The query works. Your variant might return an empty array, though. If you visit the relevant page on OT Genetics, you’ll find that the variant of interest is not a tag or lead variant of any peak.

You can wrap this query in R, similarly to the other requests you have been working on.
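
Because an empty `associations` array is a valid answer, any wrapper should treat it as "no studies" and not as a failure. A minimal Python sketch of pulling study IDs out of such a response is shown below. The function name is hypothetical and the study IDs in the sample data are made up for illustration; the field names come from the query posted earlier in this thread.

```python
def study_ids_for_tag_variant(response):
    """Collect unique study IDs, in order, from an indexVariantsAndStudiesForTagVariant response."""
    data = response.get("data") or {}
    result = data.get("indexVariantsAndStudiesForTagVariant") or {}
    associations = result.get("associations") or []
    study_ids = []
    for association in associations:
        study_id = association["study"]["studyId"]
        if study_id not in study_ids:  # the same study can appear for several index variants
            study_ids.append(study_id)
    return study_ids


# Hypothetical responses: one with associations, one for a variant that tags no peak.
with_hits = {"data": {"indexVariantsAndStudiesForTagVariant": {"associations": [
    {"study": {"studyId": "STUDY_A"}},
    {"study": {"studyId": "STUDY_B"}},
    {"study": {"studyId": "STUDY_A"}},
]}}}
no_hits = {"data": {"indexVariantsAndStudiesForTagVariant": {"associations": []}}}

print(study_ids_for_tag_variant(with_hits))  # ['STUDY_A', 'STUDY_B']
print(study_ids_for_tag_variant(no_hits))    # []
```

The resulting list can then be fed, one study ID at a time, into a follow-up query.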

Dimitris_Zisis · 20 April 2023 17:40

Thank you for your reply. Yes, I am trying queries with different variants related to rsIDs, and all return empty arrays. For example, if I have variants like 1_17500056_C_T and 1_2909753_G_A and I want to find the Ensembl gene ID or rsID related to each variant, what kind of query do I have to use?
In general, I want to use rsIDs as input and get the maximum annotation for them from the query, so I was thinking of using search as I did for rsIDs and then using the information from search for each rsID (variant ID or nearest gene with Ensembl ID) in other kinds of queries. Is there any query or combination of queries with which I can go from an rsID to annotation information such as enhancers or promoters?
Thank you in advance for your help.

irene · 2 May 2023 10:14

Hi @Dimitris_Zisis,

In your case, if your starting point is a list of rsIDs, I would do 2 queries:

1. Use the search endpoint to find the variant ID associated with your rsID. Something like:

query searchTerm($rsId: String!) {
  search(queryString: $rsId) {
    variants {
      id
    }
  }
}

2. Use the variantInfo endpoint to get the variant-related information you need. For example, for the closest gene to a variant:

query Variant($variantId: String!) {
  variantInfo(variantId: $variantId) {
    id
    rsId
    nearestGene {
      id
      symbol
    }
  }
}
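
The two steps can be chained for a list of rsIDs roughly as in the sketch below. It assumes the endpoint from earlier in this thread, declares the `$rsId` and `$variantId` variables explicitly, and gives `nearestGene` a sub-selection (`symbol`, as in the query posted earlier in the thread). `post_graphql` and `annotate_rsids` are hypothetical helper names.

```python
import json
import urllib.request

# Endpoint taken from the Python snippet earlier in this thread.
URL = "https://api.genetics.opentargets.org/graphql"

SEARCH_QUERY = """
query searchTerm($rsId: String!) {
  search(queryString: $rsId) {
    variants { id }
  }
}
"""

VARIANT_QUERY = """
query Variant($variantId: String!) {
  variantInfo(variantId: $variantId) {
    id
    rsId
    nearestGene { symbol }
  }
}
"""


def post_graphql(query, variables, url=URL):
    """Send one GraphQL request and return the decoded JSON response."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def annotate_rsids(rs_ids, post=post_graphql):
    """Return {rsID: variantInfo dict, or None when the search finds no variant}."""
    annotations = {}
    for rs_id in rs_ids:
        hits = post(SEARCH_QUERY, {"rsId": rs_id})["data"]["search"]["variants"]
        if not hits:
            annotations[rs_id] = None
            continue
        # Use the first search hit, as in the earlier example in this thread.
        info = post(VARIANT_QUERY, {"variantId": hits[0]["id"]})
        annotations[rs_id] = info["data"]["variantInfo"]
    return annotations


# Usage (performs two network requests per rsID):
# annotate_rsids(["rs10469840"])
```

The `post` parameter exists only so the transport can be swapped out, for example for offline testing.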

Alternatively, if your list of rsIDs is large, I’d suggest you operate with our bulk datasets; the approach would be to use only the variant index dataset.

Best,
Irene
