Hello @jsg201717 and welcome to the Open Targets community.
Thank you for reporting the issue with the GraphQL API. Let me also answer your questions and guide you through the upcoming release, which will break the functionality you mention.
Failing queries in OTG
We experienced downtime on the ClickHouse database, which should now be fixed. Please try running your queries again. Below are some examples that I have tested:
library(ghql)
library(jsonlite)
con <- ghql::GraphqlClient$new('https://api.genetics.opentargets.org/graphql')
qry <- ghql::Query$new()
qry$query(
"search_query",
"query searchRsId($rsId: String!) {
search(queryString: $rsId) {
variants {
id
}
}
}"
)
variables <- list(rsId = 'rs789012')
res <- con$exec(qry$queries$search_query, variables) |> jsonlite::fromJSON()
# $data
# $data$search
# $data$search$variants
# id
# 1 12_28719649_G_A
qry2 <- ghql::Query$new()
qry2$query(
"v2g",
"query v2g($variantId: String!) {
genesForVariant(variantId: $variantId) {
gene {
symbol
id
}
variant
overallScore
distances {
sourceId
aggregatedScore
tissues {
distance
}
}
}
}"
)
# Reuse the variant ID returned by the search query above
variables2 <- list(variantId = res$data$search$variants$id[1])
res2 <- con$exec(qry2$queries$v2g, variables2) |> jsonlite::fromJSON()
# $data
# $data$genesForVariant
# gene.symbol gene.id variant overallScore
# 1 FAR2 ENSG00000064763 12_28719649_G_A 0.006639839
# distances
# 1 canonical_tss, 0.1, 429367
Querying variants in OTP
As you might know, the Open Targets Genetics portal will be deprecated in a few weeks.
Its replacement (Genetics merged into the Open Targets Platform) now links credible sets, rather than individual variants, to the affected gene, which effectively means that we no longer maintain the v2g dataset. There are a few main reasons for this:
- The majority of variants in summary statistics and population variation databases (e.g. gnomAD) do not carry enough information to be influential on a given phenotype, so we remove the irrelevant variants to speed up computation and reduce cost.
- We keep only the subset of variants that are most likely to be causal for a given phenotype, out of the full range of variants in the summary statistics. This is achieved through the introduction of fine-mapping. See the fine-mapping documentation.
To query a particular variant in the Open Targets Platform (mimicking the functionality of search_query), you will now have to use the mapIds query:
query rsIDMapping {
mapIds (queryTerms:"rs7412" entityNames:["variant"]) {
mappings {
hits {
id
}
}
}
}
This results in the following output:
{
"data": {
"mapIds": {
"mappings": [
{
"hits": [
{
"id": "19_44908822_C_T"
}
]
}
]
}
}
}
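If you run this query from R, the hits still need to be pulled out of the nested response. Below is a minimal sketch of that step. It assumes the response has already been fetched as JSON text (here the output above is pasted in as a literal string), and `extract_hit_ids` is a hypothetical helper name, not part of any Open Targets package.

```r
library(jsonlite)

# JSON text as returned by the mapIds query above (pasted here as a literal).
resp_json <- '{"data": {"mapIds": {"mappings": [
  {"hits": [{"id": "19_44908822_C_T"}]}
]}}}'

# Keep the nested list structure so that empty hit lists are easy to handle.
resp <- jsonlite::fromJSON(resp_json, simplifyVector = FALSE)

# Hypothetical helper: collect every hit id across all mappings.
extract_hit_ids <- function(resp) {
  mappings <- resp$data$mapIds$mappings
  ids <- unlist(lapply(mappings, function(m) {
    vapply(m$hits, function(h) h$id, character(1))
  }))
  # unlist() returns NULL when there are no mappings at all.
  if (is.null(ids)) character(0) else unname(ids)
}

variant_ids <- extract_hit_ids(resp)
print(variant_ids)
```

An empty character vector means the rsID was not found in the Platform variant dataset.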
Note that this may not be an exhaustive solution, because:
- The Platform variant dataset now only contains variants linked to disease or molecular phenotypes (see the documentation). This effectively means that many variants that were found in OTG (variants present in summary statistics without a strong influence on the trait of interest) will not be in the OTP variant dataset.
- The Platform search was not designed for the use case of mapping arbitrary rsIDs to variantIds. It works well when the variant is present in one of the variant data sources described in the documentation. For indels, the reported variantId will not be relevant, as we use in-house identifiers that contain a hash of the inserted/deleted sequence.
To sum up, there are at least two better ways to make sure your rsIDs are mapped correctly.
Mapping using Ensembl REST API
To map rsIDs to variants, we encourage users to consider the Ensembl REST API as a possible source of mappings. Below is an example of how to map rsIDs using the API:
#' Fetch Variant Information from Ensembl by rsID
#'
#' This function retrieves variant information from the Ensembl REST API using a list of rsIDs.
#' @param rsids A character vector of rsIDs to fetch variant information for.
#' @param endpoint The URL of the Ensembl REST API endpoint. Default is set to the human variation endpoint.
#' @return A data frame containing variant information including start, end, allele strings, ancestral alleles, location, assembly name, sequence region name, and strand.
#' @examples
#' \dontrun{
#' rsids <- c("rs789012", "rs74733149")
#' variants <- ensembl_fetch_variant(rsids)
#' variants
#' A tibble: 2 × 10
#' seq_region_name strand assembly_name end ancestral_allele location coord_system allele_string start rsid
#' <chr> <int> <chr> <int> <chr> <chr> <chr> <chr> <int> <chr>
#' 1 12 1 GRCh38 28719649 G 12:28719649-28719649 chromosome G/A 28719649 rs789012
#' 2 1 1 GRCh38 65609903 A 1:65609903-65609903 chromosome A/G 65609903 rs74733149
#' }
#' @export
ensembl_fetch_variant <- function(
rsids,
endpoint = "http://rest.ensembl.org/variation/homo_sapiens"
) {
cli::cli_alert_info(sprintf("Fetching rsIds from %s", endpoint))
httr::POST(
endpoint,
httr::content_type("application/json"),
httr::accept("application/json"),
body = jsonlite::toJSON(list(ids = rsids))
) |>
httr::stop_for_status() |>
httr::content() |>
jsonlite::toJSON() |>
jsonlite::fromJSON() |>
ensembl_parse_response()
}
#' Parse API Response into a Data Frame
#'
#' Extracts the `mapping` element from each object in a response list and
#' converts the result into a data frame with an added `rsid` column.
#'
#' @param resp A named list where each element is expected to contain a `$mapping` component.
#'
#' @return A `data.frame` with one row per item in `resp`, including the extracted
#' `mapping` fields and an `rsid` column containing the names of the original list.
#'
#' @examples
#' \dontrun{
#' resp <- list(
#' rs123 = list(mapping = list(seq_region_name = 1, allele_string = 'G/C/T'))
#' )
#' ensembl_parse_response(resp)
#' }
ensembl_parse_response <- function(resp) {
cli::cli_alert_info("Parsing response from Ensembl API")
if (length(resp) == 0) {
cli::cli_alert_danger("No variants found for the provided rsIDs.")
return(data.frame())
}
purrr::map(resp, function(obj) as.data.frame(obj$mapping)) |>
dplyr::bind_rows() |>
tidyr::unnest(cols = dplyr::everything()) |>
dplyr::mutate(
rsid = names(resp)
)
}
res <- ensembl_fetch_variant(c("rs7412"))
print(res)
This results in:
ℹ Fetching rsIds from http://rest.ensembl.org/variation/homo_sapiens
ℹ Parsing response from Ensembl API
> print(res)
# A tibble: 1 × 10
coord_system end assembly_name start seq_region_name strand
<chr> <int> <chr> <int> <chr> <int>
1 chromosome 44908822 GRCh38 44908822 19 1
# ℹ 4 more variables: ancestral_allele <chr>, location <chr>,
# allele_string <chr>, rsid <chr>
You can find the whole code in the repo.
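The Ensembl result gives a position and an allele string rather than an Open Targets variant ID. As a rough sketch, the two examples in this post (12_28719649_G_A and 19_44908822_C_T) suggest that SNV identifiers can be assembled as chromosome_position_ref_alt. The helper below, `build_variant_ids`, is a hypothetical name, and it assumes that the first allele in `allele_string` is the reference allele. It deliberately returns nothing for indels, since those use in-house identifiers as noted above.

```r
# Hypothetical helper: build chromosome_position_ref_alt identifiers for SNVs.
# Assumes the first allele in `allele_string` is the reference allele.
build_variant_ids <- function(seq_region_name, start, allele_string) {
  alleles <- strsplit(allele_string, "/", fixed = TRUE)[[1]]
  ref <- alleles[1]
  alts <- alleles[-1]

  # Only single-nucleotide alleles are handled; "-" marks an indel in Ensembl.
  is_snv <- nchar(ref) == 1 &&
    all(nchar(alts) == 1) &&
    !any(c(ref, alts) == "-")
  if (!is_snv || length(alts) == 0) {
    return(character(0))
  }

  # One identifier per alternative allele (multi-allelic sites yield several).
  paste(seq_region_name, format(start, scientific = FALSE), ref, alts, sep = "_")
}

build_variant_ids("19", 44908822L, "C/T")
build_variant_ids("12", 28719649L, "G/A")
```

For multi-allelic sites this yields one candidate ID per alternative allele, so you may still need to check which of them exists in the Platform.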
Querying gene-variant genetic associations
As for the second query, we strongly encourage users to move from v2g to l2g:
query QTLCredibleSetsQuery {
variant(variantId: "16_57054953_C_T") {
id
qtlCredibleSets: credibleSets(
studyTypes: [gwas]
) {
rows {
l2GPredictions {
rows {
score
target {
id
approvedSymbol
}
}
}
}
}
}
}
The above query lists the Locus2Gene (L2G) predictions that link credible sets containing the queried variant to target genes (targetId).
The result of the query is:
{
"data": {
"variant": {
"id": "16_57054953_C_T",
"qtlCredibleSets": {
"rows": [
{
"l2GPredictions": {
"rows": [
{
"score": 0.11556569731866531,
"target": {
"id": "ENSG00000140853",
"approvedSymbol": "NLRC5"
}
},
{
"score": 0.07723600997802375,
"target": {
"id": "ENSG00000140848",
"approvedSymbol": "CPNE2"
}
}
]
}
}
]
}
}
}
}
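To work with these predictions in R, the nested response can be flattened into a table. This is only a sketch: it assumes the response has the shape shown above (pasted here as a literal string), and `flatten_l2g` is a hypothetical helper name.

```r
library(jsonlite)

# JSON text as returned by the query above (pasted here as a literal).
resp_json <- '{"data": {"variant": {"id": "16_57054953_C_T",
  "qtlCredibleSets": {"rows": [{"l2GPredictions": {"rows": [
    {"score": 0.11556569731866531,
     "target": {"id": "ENSG00000140853", "approvedSymbol": "NLRC5"}},
    {"score": 0.07723600997802375,
     "target": {"id": "ENSG00000140848", "approvedSymbol": "CPNE2"}}
  ]}}]}}}}'

resp <- jsonlite::fromJSON(resp_json, simplifyVector = FALSE)

# Hypothetical helper: one row per (credible set, predicted gene) pair.
flatten_l2g <- function(resp) {
  credible_sets <- resp$data$variant$qtlCredibleSets$rows
  rows <- list()
  for (i in seq_along(credible_sets)) {
    for (p in credible_sets[[i]]$l2GPredictions$rows) {
      rows[[length(rows) + 1]] <- data.frame(
        credible_set = i,          # index of the credible set in the response
        gene_id = p$target$id,
        symbol = p$target$approvedSymbol,
        score = p$score,
        stringsAsFactors = FALSE
      )
    }
  }
  # No variant or no predictions: return an empty data frame.
  if (length(rows) == 0) {
    return(data.frame())
  }
  out <- do.call(rbind, rows)
  # Highest L2G score first.
  out[order(-out$score), , drop = FALSE]
}

l2g <- flatten_l2g(resp)
print(l2g)
```

The `credible_set` column is just the position of the credible set in the response; add the credible set's own identifier to the query if you need to tell them apart reliably.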
I hope this answers your questions. Please take a look at the OTP GraphQL API examples, and if you have more questions, feel free to post them.