Downloading variant information

Hi!
I used to be able to run the following in R to download details on variants. It doesn’t work with the new API link.

I also cannot find the following in the schema.

library(ghql)      # provides Query
library(jsonlite)  # provides fromJSON

# `cli` is assumed to be an existing ghql GraphqlClient pointing at the API endpoint
qry <- Query$new()
id <- "rs12345" # variant ID

# build query, e.g.:
qry$query('my_query', paste0('{search(queryString: "', id, '") {
  variants {
    id
    rsId
    chromosome
    position
    refAllele
    altAllele
    nearestGeneDistance
    nearestCodingGeneDistance
  }
}}'))

# run query and format results:
res1 <- fromJSON(cli$exec(qry$queries$my_query), flatten = TRUE)$data$search$variants

Hi @Gpathak and welcome to the Community!

There have not been any changes to the endpoints you are describing. I am not familiar with the process of inserting a subquery inside a query, and I wonder if the problem is coming from the R library that is handling this type of query.

You can obtain the same result if you make 2 consecutive queries. This is a snippet of how I would do the process in Python:

import requests

url = 'https://api.genetics.opentargets.org/graphql'

search_query = """
  query searchRsId($rsId: String!) {
    search(queryString: $rsId) {
      variants {
        id
      }
    }
  }
"""

variant_query = """
  query variantInfo($variantId: String!) {
    variantInfo(variantId: $variantId) {
      id
      rsId
      chromosome
      position
      refAllele
      altAllele
      nearestGeneDistance
      nearestCodingGeneDistance
    }
  }
"""

variables = {'rsId': 'rs10469840'} # Your rsID of interest

search_result = requests.post(url, json={'query': search_query, 'variables':variables}).json()
# This returns: {'data': {'search': {'variants': [{'id': '2_102476784_T_C'}]}}}

variables.update({'variantId': search_result['data']['search']['variants'][0]['id']})

variant_result = requests.post(url, json={'query': variant_query, 'variables':variables}).json()
'''
This returns: {'data': {'variantInfo': {'id': '2_102476784_T_C',
   'rsId': 'rs10469840',
   'chromosome': '2',
   'position': 102476784,
   'refAllele': 'T',
   'altAllele': 'C',
   'nearestGeneDistance': 3558,
   'nearestCodingGeneDistance': 3558}}}
''' 
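
For repeated lookups, the two steps above can be wrapped in one function. The following is only a minimal sketch, not an official client: it uses the standard library's urllib in place of requests so it has no third-party dependencies, and the names post_graphql and variant_info_for_rsid are hypothetical helpers made up for this illustration. It returns None when the search step finds no variant for the rsID.

```python
import json
import urllib.request

# Endpoint taken from the snippet above.
API_URL = "https://api.genetics.opentargets.org/graphql"

SEARCH_QUERY = """
query searchRsId($rsId: String!) {
  search(queryString: $rsId) {
    variants { id }
  }
}
"""

VARIANT_QUERY = """
query variantInfo($variantId: String!) {
  variantInfo(variantId: $variantId) {
    id
    rsId
    chromosome
    position
    refAllele
    altAllele
    nearestGeneDistance
    nearestCodingGeneDistance
  }
}
"""


def post_graphql(query, variables, url=API_URL):
    """POST one GraphQL request and return the decoded JSON body (hypothetical helper)."""
    payload = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def variant_info_for_rsid(rs_id, post=post_graphql):
    """Return the variantInfo dict for an rsID, or None if the search finds nothing.

    `post` can be swapped for a stub, so the logic can be checked offline.
    """
    search = post(SEARCH_QUERY, {"rsId": rs_id})
    hits = ((search.get("data") or {}).get("search") or {}).get("variants") or []
    if not hits:
        return None
    # Like the snippet above, use the first variant returned by the search.
    info = post(VARIANT_QUERY, {"variantId": hits[0]["id"]})
    return (info.get("data") or {}).get("variantInfo")
```

Calling variant_info_for_rsid("rs10469840") should give the same variantInfo dictionary as the two separate requests above, provided the endpoint is reachable.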

Can you try a similar approach in R and let me know if it works?

At the same time, I also want to let you know that if you need to process multiple rsIDs, it is worth checking out our variant dataset, downloadable from EBI’s FTP at /pub/databases/opentargets/genetics/latest/variant-index/

This dataset is the one feeding our API, so you can expect it to contain the same information.
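
As a rough illustration of the bulk route, here is a sketch of how rows from the variant index could be filtered for a set of rsIDs. The column names (rs_id, chr_id, position, ref_allele, alt_allele) are assumptions and should be checked against the files you download; the function name variant_ids_by_rsid is also made up for this example. One rsID may map to several variant IDs, so the values are lists.

```python
from typing import Dict, Iterable, List, Mapping, Set

# ASSUMED column names; verify them against the downloaded variant index.
ID_COLUMNS = ("chr_id", "position", "ref_allele", "alt_allele")


def variant_ids_by_rsid(rows: Iterable[Mapping], wanted: Set[str]) -> Dict[str, List[str]]:
    """Map each wanted rsID to the variant IDs (chr_pos_ref_alt) found in `rows`."""
    found: Dict[str, List[str]] = {}
    for row in rows:
        rs_id = row.get("rs_id")
        if rs_id in wanted:
            # Build the same chr_pos_ref_alt form the API returns, e.g. 2_102476784_T_C.
            variant_id = "_".join(str(row[col]) for col in ID_COLUMNS)
            found.setdefault(rs_id, []).append(variant_id)
    return found
```

The rows can be any iterable of dict-like records, so the function does not depend on how the files are read.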

Thank you for your question!
Irene

Thank you so much Irene, this works. I didn’t understand the use of variantInfo.
I used the following for another query and again it doesn’t work. How do I find the top-level schema for such queries?

query variantInfo {
  indexVariantsAndStudiesForTagVariant(variantId: "1_46810098_T_C") {
    associations {
      study {
        studyId
        traitReported
        traitCategory
        pmid
        pubAuthor
        pubDate
      }
      indexVariant {
        id
        rsId
        refAllele
        altAllele
        nearestGeneDistance
        nearestGene {
          symbol
        }
      }
      pval
      nTotal
      nCases
      overallR2
      afr1000GProp
      amr1000GProp
      eas1000GProp
      eur1000GProp
      sas1000GProp
      log10Abf
      posteriorProbability
      pvalMantissa
      pvalExponent
      oddsRatio
      oddsRatioCILower
      oddsRatioCIUpper
      beta
      direction
      betaCILower
      betaCIUpper
    }
  }
}

Hi @Gpathak ,

You can explore the queries and the schema on the GraphQL playground. See this example query on variantInfo.

On the GraphQL playground page, you can click the docs on the upper-left corner and explore the available objects.
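
The top-level queries shown in the docs panel can usually also be listed programmatically with a standard GraphQL introspection query. This is a sketch that assumes introspection is enabled on the server; the helper name top_level_queries is hypothetical. The query string would be posted to the endpoint like any other query, and the function below only reshapes the response.

```python
# Standard GraphQL introspection: list every top-level query and its arguments.
INTROSPECTION_QUERY = """
{
  __schema {
    queryType {
      fields {
        name
        args { name }
      }
    }
  }
}
"""


def top_level_queries(introspection_response):
    """Return {query name: [argument names]} from an introspection response."""
    fields = introspection_response["data"]["__schema"]["queryType"]["fields"]
    return {field["name"]: [arg["name"] for arg in field["args"]] for field in fields}


# Hand-written sample in the shape an introspection response takes (not real API output).
sample = {
    "data": {
        "__schema": {
            "queryType": {
                "fields": [
                    {"name": "search", "args": [{"name": "queryString"}]},
                    {"name": "variantInfo", "args": [{"name": "variantId"}]},
                ]
            }
        }
    }
}
print(top_level_queries(sample))
```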

I tried to run the indexVariantsAndStudiesForTagVariant query and it doesn’t work for my input (1_17500056_C_T) either. It works only for the sample query in the GraphiQL API.
How did you manage to make it work in R?
I need to create a query with a variant ID and get the study IDs in order to use them as input in another query for genePrioritisationUsingL2G.

The query works. Your variant might return an empty array, though. If you visit the relevant page on OT Genetics, you’ll find the variant of interest is not a tag or lead variant of any peak.

You can wrap this query in R, similarly to the other requests you have been working on.
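
Because an empty associations array is a valid answer, any code that consumes this query should treat it as "no studies" and not as an error. A minimal Python sketch, assuming the response has the shape requested in the query above (the function name study_ids_from_associations is made up for this example):

```python
def study_ids_from_associations(response):
    """Collect unique study IDs, in first-seen order, from an
    indexVariantsAndStudiesForTagVariant response.

    Returns an empty list when the variant has no associations.
    """
    data = response.get("data") or {}
    block = data.get("indexVariantsAndStudiesForTagVariant") or {}
    study_ids = []
    for association in block.get("associations") or []:
        study_id = (association.get("study") or {}).get("studyId")
        if study_id and study_id not in study_ids:
            study_ids.append(study_id)
    return study_ids
```

The resulting list can then be looped over to feed each study ID into a follow-up query.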

Thank you for your reply. Yes, I am trying queries with different variants related to rsIDs, and all of them return empty arrays. For example, if I have variants like 1_17500056_C_T or 1_2909753_G_A and I want to look up the Ensembl gene ID or rsID related to them, what kind of query do I have to use?
In general, I want to use rsIDs as input and get as much annotation as possible from the query. I was thinking of using search, as I did for rsIDs, and then using the information returned for each rsID (variant ID, or nearest gene with its Ensembl ID) in other kinds of queries. Is there any query or combination of queries with which I can go from an rsID to annotation information such as enhancers or promoters?
Thank you in advance for your help.

Hi @Dimitris_Zisis,

In your case, if your starting point is a list of rsIDs, I would do 2 queries:

  1. Use the search endpoint to find the variant ID associated with your rsID. Something like:

     query searchTerm($rsId: String!) {
       search(queryString: $rsId) {
         variants {
           id
         }
       }
     }

  2. Use the variantInfo endpoint to get the variant-related information you need. For example, for the closest gene to a variant:

     query Variant($variantId: String!) {
       variantInfo(variantId: $variantId) {
         id
         rsId
         nearestGene {
           id
           symbol
         }
       }
     }

Alternatively, if your list of rsIDs is large, I’d suggest you operate with our bulk datasets; the approach would be to use only the variant index dataset.

Best,
Irene