Updating API code query from OTG to OpenTargets Platform

This is something of a follow-up to an old (and resolved) question I asked on here over a year ago: List of commands and options available in OTG GraphQL for running PheWAS

Essentially, I had been using the OTG API from an R session for searching for variants I had in a GWAS, obtaining information about those variants and then conducting an OTG PheWAS.

I have already figured out that I needed to update the old code to now have the new API location:

base_url = "https://api.platform.opentargets.org/api/v4/graphql"

However, here is how I had previously been conducting the API queries for my SNPs:

id_query_string = "query useSearchToConvertRSIDIntoIDFormat($query_rsID: String!) {
								search(queryString: $query_rsID) {
									variant {
										id
										rsId
										nearestGene {
											id
											start
											symbol
											tss
											description
											chromosome
											exons
										}
										nearestGeneDistance
									}
								}
							}"

### Set variables object of arguments to be passed to endpoint
id_variables = list("query_rsID" = query_rsID)

### Construct POST request body object with query string and variables
id_search_body = list(query = id_query_string, variables = id_variables)

### Perform OpenTargets search request
id_search_out = POST(url = base_url, body = id_search_body, encode = "json")

Specifically, I had been searching the platform by rsID (query_rsID was the string for the rsID, e.g. “rs9273078”). However, when I look at content(id_search_out), it now contains the following error:

[1] "Cannot query field 'variant' on type 'SearchResults'. (line 3, column 10):\n\t\t\t\t\t\t\t\t\tvariant {\n         ^"

It seems to me that the command “useSearchToConvertRSIDIntoIDFormat” must be the problem now, but I can’t see how to fix this in the OpenTargets playground. Any help resolving this would be greatly appreciated!

Hi,

The recent rewrite and integration of genetics data into the Platform required a redesign of all API endpoints, so any tools built on top of the old API need to be changed as well. Also, due to the massive increase in the size of the genetics data, we currently cannot support PheWAS queries as we used to on the old Genetics Portal. However, you can retrieve all credible sets that contain your variant of interest.

Variant metadata, such as rsId, is separated from association data, which means GWAS credible sets cannot be queried directly by rsIDs. Instead, you can use rsIDs to retrieve variant IDs [1]; the variant IDs can then be used to resolve credible sets [2] with single point statistics.

I hope this helps. Please let me know if you need further help.

[1] Finding variant identifiers for rsIds:

query SearchQuery($queryString: String!, $index: Int!, $entityNames: [String!]!) {
  search(
    queryString: $queryString
    entityNames: $entityNames
    page: {index: $index, size: 10}
  ) {
    total
    hits {
      id
      highlights
      object {
        ... on Variant {
          id
          variantDescription
          referenceAllele
          alternateAllele
          rsIds
          __typename
        }
      }
    }
  }
}

Variables:

{
    "queryString": "rs12354",
    "index": 0,
    "entityNames": ["Variant"]
}

It is very important to note that this query also returns targets overlapping with the variants, so you’ll need to filter for objects with “Variant” as their __typename. Also, one rsId might return more than one variant.

[2] Finding credible sets based on variant identifier:

query GWASCredibleSetsQuery($variantId: String!, $size: Int!, $index: Int!) {
  variant(variantId: $variantId) {
    id
    referenceAllele
    alternateAllele
    gwasCredibleSets: credibleSets(studyTypes: [gwas], page: { size: $size, index: $index }) {
      count
      rows {
        studyLocusId
        pValueMantissa
        pValueExponent
        beta
        finemappingMethod
        confidence
        variant {
          id
          chromosome
          position
          referenceAllele
          alternateAllele
        }
        study {
          traitFromSource
          id
          diseases {
            name
            id
          }
        }
        locus(variantIds: [$variantId]) {
          rows {
            posteriorProbability
            pValueExponent
            pValueMantissa
            beta
            is95CredibleSet
            is99CredibleSet
            variant {
              id
              rsIds
            }
          }
        }
      }
    }
  }
}

Variables:

{
  "variantId": "12_6390939_T_G",
  "size": 500,
  "index": 0
}

Depending on how busy the locus is, the number of returned credible sets might be high and you might need to paginate over the results. If you need more information on the variant, the credible sets or the study, I highly recommend exploring and optimising your queries in our API playground, which provides examples and documentation on the available fields, and lets you try the queries immediately.
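As a rough illustration of the pagination mentioned above, here is a minimal R sketch. It assumes a hypothetical helper `fetch_page(index, size)` (not part of any library) that runs query [2] for a single page and returns the `count` and `rows` of that page; the loop simply keeps requesting pages until all reported rows have been collected.

```r
# Sketch: collect all rows of a paginated field by requesting successive pages.
# `fetch_page` is a hypothetical function(index, size) that runs the query for
# one page and returns list(count = <total rows>, rows = <list of rows>).
collect_all_rows <- function(fetch_page, size = 500, max_pages = 100) {
  all_rows <- list()
  index <- 0
  while (index < max_pages) {
    page <- fetch_page(index, size)
    all_rows <- c(all_rows, page$rows)
    total <- page$count
    # Stop when the page is empty, the total is unknown,
    # or every reported row has been collected
    if (length(page$rows) == 0 || is.null(total) || length(all_rows) >= total) {
      break
    }
    index <- index + 1
  }
  all_rows
}
```

With httr, `fetch_page` could POST query [2] with the variables `variantId`, `size` and `index`, and return `content(response, "parsed")$data$variant$gwasCredibleSets`. The `max_pages` cap is only a safety net against an endless loop.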


Hi Daniel,

Thanks for getting back to me about this. Unfortunately, this isn’t really clear to me in terms of my usage of the API here.

As I mentioned, I am doing this via R, so I have to save the search query as a string (so I can perform it recursively) and then perform the API search with this code:

### Set variables object of arguments to be passed to endpoint
id_variables = list("query_rsID" = query_rsID)

### Construct POST request body object with query string and variables
id_search_body = list(query = id_query_string, variables = id_variables)

### Perform OpenTargets search request
id_search_out = POST(url = base_url, body = id_search_body, encode = "json")

As far as I understand, this remains completely unchanged. But what’s not clear to me is what string to use here. I have tried now to use this command as you provided it:

id_query_string = "query SearchQuery($queryString: String!, $index: Int!, $entityNames: [String!]!) {
								search(
								queryString: $queryString
								entityNames: $entityNames
								page: {index: $index, size: 10}
								) {
									total
									hits {
										id
										highlights
										object {
										... on Variant {
											id
											variantDescription
											referenceAllele
											alternateAllele
											rsIds
											__typename
										}
										}
									}
									}
								}"

However, obviously that doesn’t work.

As I am also always looking for variants and I’m not sure what the “index” is searching for, I tried the following as well:

id_query_string = "query SearchQuery($queryString: String!, $index: Int!, $entityNames: [String!]!) {
								search(
								queryString: $query_rsID
								entityNames: ['String']
								page: {index: 0, size: 10}
								) {
									total
									hits {
										id
										highlights
										object {
										... on Variant {
											id
											variantDescription
											referenceAllele
											alternateAllele
											rsIds
											__typename
										}
										}
									}
									}
								}"

Essentially, you provided a set of variables in a piece of code like this:

{
    "queryString": "rs12354",
    "index": 0,
    "entityNames": ["Variant"]
}

But to be blunt, I’m not sure how to actually apply that to what I was doing in R. I actually don’t know where you are supposed to add these in even when using the API playground as well (which I also could not get to work).

I’m not sure how to actually apply that to what I was doing in R. I actually don’t know where you are supposed to add these in even when using the API playground as well (which I also could not get to work).

So in the old script you had a list of rsIDs: `id_variables = list("query_rsID" = query_rsID)`. Now you cannot retrieve data for a list of rsIDs; you need to iterate over all rsIDs in your list and map each of them to a variant identifier.

For each rsId, you need to submit a query and parse the result like this:

library(httr)
library(jsonlite)

# Your GraphQL query
query <- "
    query SearchQuery($queryString: String!, $index: Int!, $entityNames: [String!]!) {
    search(
        queryString: $queryString
        entityNames: $entityNames
        page: {index: $index, size: 10}
    ) {
        total
        hits {
        id
        highlights
        object {
            ... on Variant {
            id
            variantDescription
            referenceAllele
            alternateAllele
            rsIds
            __typename
            }
        }
        }
    }
    }
"


# Variables as a named list -> one request for each rsID!
variables <- list(
  queryString = "rs12354",
  index = 0,
  # list() keeps entityNames a JSON array even with auto_unbox = TRUE
  entityNames = list("Variant")
)
# Prepare the request body
body <- list(
  query = query,
  variables = variables
)

# Make the POST request
response <- POST(
  url = "https://api.platform.opentargets.org/api/v4/graphql",
  body = toJSON(body, auto_unbox = TRUE),
  add_headers("Content-Type" = "application/json"),
  encode = "raw"
)

# Parse the response
result <- content(response, "parsed")

Does it make more sense?


Hi Daniel,

Okay, so conveniently my lack of understanding of the previous version of the API has worked in my favour here, because I was never providing a list of rsIDs, I was only ever searching for one rsID at a time and iterating that search over my list of variants. So in fact, the code doesn’t need to change a lot.

Providing this code has worked almost perfectly! I did make the change of:

id_search_out = POST(url = base_url, body = id_search_body, encode = "json")

To your version:

response <- POST(
  url = "https://api.platform.opentargets.org/api/v4/graphql",
  body = toJSON(body, auto_unbox = TRUE),
  add_headers("Content-Type" = "application/json"),
  encode = "raw"
)

And honestly I don’t really get what the differences are here, but it worked so I won’t question that! :joy:

Where I do have a couple of follow-up questions, though, is in the output. Specifically, I’ve just done a test with one SNP that isn’t in OpenTargets (rs9273078) and one that is (rs1064173). With the SNP that is there, as you previously said, it provided me with a list of SNPs which are sort of “close” to that query? This is a tad confusing to me: is there really no option with the API to get the results for JUST that variant?

Yes, you are right, search returns non-exact matches as well. In your case the queried rs1064173 is not on the Platform but rs10641736 is; the difference is only one character at the end. This means that when you are processing the output, you need to check whether the match is exact. I’ll check if it is possible to narrow down the returned data to exact matches only. I’ll get back to you.
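A minimal R sketch of that exact-match check might look like the following. It assumes `result` is the list returned by `content(response, "parsed")` for the search query shown earlier; the function name is made up for illustration.

```r
# Sketch: keep only search hits whose rsIds contain the queried rsID exactly.
# `result` is assumed to be the parsed response of the search query.
extract_exact_variant_ids <- function(result, rsid) {
  hits <- result[["data"]][["search"]][["hits"]]
  ids <- character(0)
  for (hit in hits) {
    obj <- hit[["object"]]
    type_name <- obj[["__typename"]]
    # Skip hits that are not variants (for example overlapping targets)
    if (is.null(type_name) || type_name != "Variant") {
      next
    }
    # Exact string comparison, so "rs10641736" does not match "rs1064173"
    if (rsid %in% unlist(obj[["rsIds"]])) {
      ids <- c(ids, obj[["id"]])
    }
  }
  ids
}
```

The function returns a character vector because one rsID can map to more than one variant ID, and an empty vector when the rsID is not found at all.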


Oh! I hadn’t even realised that the example SNP I gave wasn’t actually on the platform either and it was ONLY closely related SNPs.

Okay, I’ve made an adjustment so that I perform the API search to get all the different IDs and then extract the correct SNP ID from the result.

This is, however, only halfway to the actual results I needed here. The reason I was performing this search in the first place is that, for each variant ID I found, I previously performed the following two API searches based on the variant ID I extracted:

v2g_query_string = "query v2g($variantId: String!) {
									genesForVariant(variantId: $variantId) {
										gene {
											id
											symbol
										}
										variant
										overallScore
										functionalPredictions {
											typeId
											sourceId
											aggregatedScore
										}
										intervals {
											typeId
											sourceId
											aggregatedScore
										}
									}
									variantInfo(variantId: $variantId) {
										mostSevereConsequence
									}
								}"

And:

v2d_query_string = "query pheWASsearch($variantId: String!) {
								pheWAS(variantId: $variantId) {
									associations{
										studyId
										eaf
										beta
										pval
										nTotal
										nCases
										study {
											traitReported
											traitCategory
											pmid
										}
										oddsRatio
									}
								}
							}"

Essentially I was, for each SNP, extracting the V2G scores from the Platform in one search. And in a second search I was extracting a list of known associations with that SNP.

I suspect both of these will need a complete re-write to extract the data I’m hoping to find.

You mentioned, however, that PheWAS queries are no longer supported? Is that really the case for individual SNPs? I wouldn’t be able to extract the associations with a given SNP ID from the API?

Or did you mean that one couldn’t perform a PheWAS with a large list of SNPs?

Also, thank you for the assistance so far @dsuveges, this has been pretty confusing for me because of my unfamiliarity with this language.

Okay, I believe I’m getting closer to figuring aspects of this out. For the V2G thing I was using previously I note that this is now deprecated and you use a “Locus-2-Gene” instead. This is what I’ve currently set up.

query QTLCredibleSetsQuery {
  variant(variantId: "16_57054953_C_T") {
    id
    transcriptConsequences{
      distanceFromFootprint
      target {
        id
        approvedSymbol
      }   
    }
    mostSevereConsequence {
      label
    }
    qtlCredibleSets: credibleSets(
      studyTypes: [gwas]
    ) {
      rows {
        l2GPredictions {
          rows {
            score
            target {
              id
              approvedSymbol
            }
          }
        }
      }
    }
  }
}

There were four pieces of information I really needed here for each variant:

  1. Most severe consequence of the variant
  2. V2G gene
  3. V2G score
  4. Closest gene

I believe this accounts nicely for 1-3 but for 4 I couldn’t see an option for that, and it seems like the only option is to get a list of every transcript near that variant and then pick the one with the lowest distanceFromFootprint (which feels like a rather awkward way to do this, I have to say).

I still haven’t even begun to approach the PheWAS though, so I would appreciate some feedback on that. I believe it would have to be another version of the credibleSets query?

hi,

the only option is to get a list of every transcript near that variant and then pick the one with the lowest distanceFromFootprint (which feels like a rather awkward way to do this, I have to say).

You are right, this is the way to go. The argument behind this design is that the locus-to-gene prediction is a sophisticated method that links GWAS signals to genes based on several lines of underlying evidence. That’s why there’s a dedicated endpoint, which returns not only the L2G predictions but also the normalised features that were considered by the model. On the other hand, the “closest gene” is a simple feature of the variant among many others, which is why we keep distance information as variant metadata in the variant index. If you need to retrieve variant, association and study metadata in bulk, I would suggest using the downloadable files. It might be more straightforward to join the different pieces of data together.
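For illustration, picking the closest gene from the parsed response could be sketched in R as below. It assumes `variant` is `result$data$variant` from a query that requests `transcriptConsequences { distanceFromFootprint target { id approvedSymbol } }`, as in the query above; the function name is hypothetical.

```r
# Sketch: choose the target with the smallest distanceFromFootprint.
# `variant` is assumed to be result$data$variant from the parsed response.
closest_gene <- function(variant) {
  consequences <- variant[["transcriptConsequences"]]
  if (length(consequences) == 0) {
    return(NULL)
  }
  distances <- vapply(
    consequences,
    function(tc) {
      d <- tc[["distanceFromFootprint"]]
      # Treat a missing distance as NA so it is never chosen
      if (is.null(d)) NA_real_ else as.numeric(d)
    },
    numeric(1)
  )
  if (all(is.na(distances))) {
    return(NULL)
  }
  best_index <- which.min(distances)  # ignores NA; first one wins on ties
  best <- consequences[[best_index]]
  list(
    id = best[["target"]][["id"]],
    symbol = best[["target"]][["approvedSymbol"]],
    distance = distances[best_index]
  )
}
```

If two transcripts are equally close, this returns the first one listed, which may or may not be the one you want.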

I suspect both of these will need a complete re-write to extract the data I’m hoping to find.

I’m afraid so: those API endpoints don’t exist anymore and the data model has changed completely.

You mentioned, however, that PheWAS queries are no longer supported? Is that really the case for individual SNPs? I wouldn’t be able to extract the associations with a given SNP ID from the API?

You can extract single point statistics for variants only if the variant is part of a credible set! This is a very serious limitation compared to the previous version of the OTG. Previously, you were able to get association statistics from ALL summary statistics if the association reached a p-value of 1e-4, regardless of whether the variant was part of any credible set at all. In the current case it would mean we would need to provide all variants from hundreds of thousands of summary statistics plus the metadata of all these variants (we are talking about many terabytes more data to serve, most of which no one would ever retrieve).

There’s a way around this, however: you can access all our summary statistics on Google Cloud, load the entire dataset and filter for your variant of interest, but you need to pay for the compute and/or for moving data out of Google infrastructure.

Best,

Daniel

Thanks again @dsuveges, good to have confirmation of these things.

Can I confirm that the code I have sent above would work as another API request via R? It worked in the playground, but I did mention previously having issues translating between the two.

Specifically I have:

v2g_query_string = "query QTLCredibleSetsQuery {
									variant(variantId: $variantId) {
										id
										transcriptConsequences {
											distanceFromFootprint
											target {
												id
												approvedSymbol
											}   
										}
										mostSevereConsequence {
											label
										}
										qtlCredibleSets: credibleSets(studyTypes: [gwas]) {
											rows {
											l2GPredictions {
												rows {
													score
												target {
													id
													approvedSymbol
												}
											}
										}
									}
								}"

But does it also need to have the search ID $variantId named in the first line in some way?

In terms of the “PheWAS”, I actually think that credible sets may be even better for my purposes. As much as having all associations with a P < 1e-4 might be useful, it’s also a lot of potentially unnecessary data.

However, once more, it would be amazing if you could help crafting the API command? Here is what I have currently:

query QTLCredibleSetsQuery {
  variant(variantId: "16_57054953_C_T") {
    credibleSets(studyTypes: [gwas]) {
      rows {
        study {
          nSamples
          nCases
          nControls
          id
        }
        beta
        pValueExponent
        studyLocusId
        studyId
      }
    }
  }
}

However, rather importantly, this is missing the traitReported and traitCategory fields (and also eaf and oddsRatio), which were previously present in the “PheWAS” options. Do you know what they have been replaced with?

Hi @dsuveges. I was wondering if you would possibly be able to have a look at the recent code I sent you for extracting the credible sets as we had discussed relatively recently.

I was just wondering if you could possibly help out with getting the variable names I was trying to find (or equivalents)?

Hi again,

I actually think I need further help with my L2G query as well because the code I shared which I thought worked:

v2g_query_string = "query QTLCredibleSetsQuery {
									variant(variantId: $variantId) {
										id
										transcriptConsequences {
											distanceFromFootprint
											target {
												id
												approvedSymbol
											}   
										}
										mostSevereConsequence {
											label
										}
										qtlCredibleSets: credibleSets(studyTypes: [gwas]) {
											rows {
											l2GPredictions {
												rows {
													score
												target {
													id
													approvedSymbol
												}
											}
										}
									}
								}"

Does not work and throws an error:

[1] "Syntax error while parsing GraphQL query. Invalid input \"]) {\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\trows {\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tl2GPredictions {\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\trows {\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tscore\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\ttarget {\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tid\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tapprovedSymbol\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t}\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t}\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t}\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t}\\n\\t\\t\\t\\t\\t\\t\\t\\t}\", expected NameChar, Value, Argument or Directives (line 14, column 58):\n\t\t\t\t\t\t\t\t\t\tqtlCredibleSets: credibleSets(studyTypes: [gwas]) {\n

This seems to be an error in the qtlCredibleSets: credibleSets(studyTypes: [gwas]) line which is difficult because that is also the line I understand the least and least know how to fix.

Edit: Also, I feel I should mention that the Playground seems to be very broken in a few ways, making it extremely difficult to test this stuff

Hi @Sabor117,

We are aware of the issues with the playground, they will be fixed in the next release! It’s incredibly frustrating to work with at the moment, I agree :smiley:

I had a quick look at the docs and I think the error is coming from the fact that ``qtlCredibleSets`` is not a query option.

Does this give you the answer you are looking for?

query {
  variant(variantId: "4_1804392_G_A") {
    id
    transcriptConsequences {
      distanceFromFootprint
      target {
        id
        approvedSymbol       
      }
    }
    mostSevereConsequence {
      label
    }
    credibleSets(studyTypes: [gwas]) {
      rows {
        l2GPredictions {
          rows {
            score
            target {
              id
              approvedSymbol
            }
          }
        }
      }
    }
  }
}

@hcornu ah! Yes, this seems to now actually not be throwing any errors in terms of the query at least! Thanks very much! I have no idea why I thought that would work, looking back on it…

However, I have been constructing the query like this:

v2g_query_string = "query {
									variant(variantId: $variantId) {
										id
									    transcriptConsequences {
									    	distanceFromFootprint
									    	target {
									        	id
									        	approvedSymbol       
									    	}
									    }
									    mostSevereConsequence {
									    	label
									    }
									    credibleSets(studyTypes: [gwas]) {
									    	rows {
									    		l2GPredictions {
									    			rows {
									    				score
									    				target {
									    					id
									    					approvedSymbol
									    				}
													}
												}
											}
										}
									}
								}"
					
			### Set variables object of arguments to be passed to V2G
			### Where variantID == a string in format "chr_pos_a1_a2"

			v2g_variables = list("variantId" = variantID)

			### Construct POST request body object with query string and variables

			v2g_post_body = list(query = v2g_query_string, variables = v2g_variables) 

And it seems I am passing the variables (the query SNP ID) to the API request incorrectly (I am getting an error that $variantId is not defined). Do you know how I can fix this?

When working with variables in GraphQL, you need to:

  1. Replace the static value in the query with $variableName

  2. Declare $variableName as one of the variables accepted by the query

  3. Pass variableName: value in the separate, transport-specific (usually JSON) variables dictionary

See the GraphQL documentation page “Queries | GraphQL”.

query($variantId: String!) {
  variant(variantId: $variantId) {
    id
    transcriptConsequences {
      distanceFromFootprint
      target {
        id
        approvedSymbol
      }
    }
    mostSevereConsequence {
      label
    }
    credibleSets(studyTypes: [gwas]) {
      rows {
        l2GPredictions {
          rows {
            score
            target {
              id
              approvedSymbol
            }
          }
        }
      }
    }
  }
}
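On the R side, the three steps could be sketched as follows. This is only a sketch under assumptions: the operation name `L2GQuery` and both helper functions are made up for illustration, and the query is shortened to two fields. The second helper is a crude text check for `$variables` that are used in the query but never declared in its first line, which is the mistake behind the "$variantId is not defined" error.

```r
# Step 1 and 2: the query uses $variantId and declares it in the header
l2g_query <- "query L2GQuery($variantId: String!) {
  variant(variantId: $variantId) {
    id
    mostSevereConsequence {
      label
    }
  }
}"

# Step 3: the value travels separately, in the `variables` part of the body
build_body <- function(query, variant_id) {
  list(query = query, variables = list(variantId = variant_id))
}

# Crude check: variables used anywhere but not declared before the first "{"
undeclared_variables <- function(query) {
  pattern <- "\\$[A-Za-z_][A-Za-z0-9_]*"
  used <- unique(regmatches(query, gregexpr(pattern, query))[[1]])
  brace <- regexpr("{", query, fixed = TRUE)
  header <- substr(query, 1, brace - 1)
  declared <- regmatches(header, gregexpr(pattern, header))[[1]]
  setdiff(used, declared)
}

body <- build_body(l2g_query, "16_57054953_C_T")
# With httr, as earlier in the thread, this could then be sent as:
# response <- POST(url = base_url, body = body, encode = "json")
```

Calling `undeclared_variables()` on a query string that starts with a bare `query {` but uses `$variantId` returns `"$variantId"`, which flags the missing declaration before the request is sent.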


All of our front-end widgets are constructed with GraphQL queries, and you can see the queries used to build them by clicking on the API Query button.

This is a good starting place to figure out how to build your queries.


Ahh, I knew it would be something like this! @dsuveges did demonstrate how I should be doing a similar search (adding in variables, I mean), but I wasn’t sure if I needed an additional command after the query part.

I will admit, I hadn’t considered looking up the actual GraphQL docs to solve this. Thanks again for the help! I was just now able to test this and extract what I wanted for the L2G analysis.

I’m really sorry to keep bothering you as well, but if possible could you also have a look at the query/code I shared regarding the PheWAS/credible sets thing I was asking about?

Specifically, I think I have constructed the query correctly, but there are certain pieces of data I would just need to know where to find them and couldn’t.

Again, I really appreciate the help from both of you with all of this.

Hi @hcornu and @dsuveges I just wanted to bump this thread as I still never really figured out the specific GraphQL commands required for an OTG “PheWAS” and so haven’t been able to utilise the API as fully as I would like. Apologies for tagging you both directly.

I just wanted to check if this was something you think you might be able to help me with again?

Daniel can correct me, but I don’t think we support a PheWAS analysis, so there isn’t a straightforward way to do this with the API.

Daniel had said previously that the new API did not include a “PheWAS” in the previous sense, but one could get all “credible sets” above a certain threshold with the new API?

What I was missing from that specifically was that I couldn’t see a way to define study IDs, trait categories and stuff like that which were in the previous PheWAS options.
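For reference, query [2] earlier in this thread already requests `study { id traitFromSource diseases { name id } }` on each credible set, so those fields may be the nearest counterparts to the old study ID and traitReported; nothing in the thread identifies counterparts for traitCategory, eaf or oddsRatio. A minimal R sketch for flattening such rows into a table follows. The function names are hypothetical, and the p-value reconstruction assumes it is stored as mantissa times ten to the power of the exponent.

```r
# Sketch: flatten credible-set rows (as requested by query [2]) into a data frame.
null_to_na <- function(x) if (is.null(x)) NA else x

credible_sets_to_df <- function(rows) {
  records <- lapply(rows, function(row) {
    mantissa <- as.numeric(null_to_na(row[["pValueMantissa"]]))
    exponent <- as.numeric(null_to_na(row[["pValueExponent"]]))
    data.frame(
      studyLocusId = as.character(null_to_na(row[["studyLocusId"]])),
      studyId = as.character(null_to_na(row[["study"]][["id"]])),
      trait = as.character(null_to_na(row[["study"]][["traitFromSource"]])),
      beta = as.numeric(null_to_na(row[["beta"]])),
      # Assumption: the p-value is stored as mantissa * 10^exponent
      pval = mantissa * 10^exponent,
      stringsAsFactors = FALSE
    )
  })
  # Returns NULL when there are no rows
  do.call(rbind, records)
}
```

Here `rows` would be `result$data$variant$gwasCredibleSets$rows` from the parsed response of query [2]; missing values come through as NA rather than causing an error.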