Snp to gene query

Hello, I have been using Open Targets Genetics to map SNPs to genes with a SNP-to-gene query. However, perhaps because the service was updated, my old code no longer seems to work. Below is part of the query.

search_query <- "
query searchRsId($rsId: String!) {
search(queryString: $rsId) {
variants {
id
}
}
}
"

query_string <- "
query v2g($variantId: String!) {
genesForVariant(variantId: $variantId) {
gene {
symbol
id
}
variant
overallScore
distances {
sourceId
aggregatedScore
tissues {
distance
}
}
}
}"

I wonder whether this query is no longer available or whether the API has changed.
Thank you.

Hello @jsg201717 and welcome to the Open Targets community.

Thank you for reporting the issue with the GraphQL API. Let me also answer your questions and guide you through the next release, which will break the functionality you mention.

Failing queries in OTG

We experienced downtime on the ClickHouse database, which should now be fixed. Please try running your queries again. Below are some examples that I have tested:

library(ghql)
library(jsonlite)
con <- ghql::GraphqlClient$new('https://api.genetics.opentargets.org/graphql')
qry <- ghql::Query$new()
qry$query(
    "search_query",
    "query searchRsId($rsId: String!) {
        search(queryString: $rsId) {
            variants {
                id
            }
        }
    }"
)
variables <- list(rsId = 'rs789012')

res <- con$exec(qry$queries$search_query, variables) |> jsonlite::fromJSON()
# $data
# $data$search
# $data$search$variants
#                id
# 1 12_28719649_G_A

qry2 <- ghql::Query$new()
qry2$query(
    "v2g",
    "query v2g($variantId: String!) {
        genesForVariant(variantId: $variantId) {
            gene {
                symbol
                id
            }
            variant
            overallScore
            distances {
                sourceId
                aggregatedScore
                tissues {
                    distance
                }
            }
        }
    }"
)

# Use the variant ID returned by the first query (stored in `res`)
variables2 <- list(variantId = res$data$search$variants$id[1])
res2 <- con$exec(qry2$queries$v2g, variables2) |> jsonlite::fromJSON()
# $data
# $data$genesForVariant
#   gene.symbol         gene.id         variant overallScore
# 1        FAR2 ENSG00000064763 12_28719649_G_A  0.006639839
#                    distances
# 1 canonical_tss, 0.1, 429367

Querying variants in OTP

As you might know, the Open Targets Genetics portal will be deprecated in a few weeks.

Its replacement (Genetics merged into the Open Targets Platform) now links credible sets, instead of variants, to the affected gene, which effectively means that we no longer maintain the v2g dataset. This is due to two main reasons:

  • Most variants in summary statistics and population variation databases (e.g. gnomAD) carry too little evidence of an effect on a given phenotype, so we remove these irrelevant variants to speed up computation and reduce cost.
  • We keep only the subset of variants most likely to be causal for a given phenotype, out of the full range of variants in the summary statistics. This is achieved through fine-mapping; see the fine-mapping documentation.

To query a particular variant in the Open Targets Platform (mimicking the functionality of search_query), you will now have to use the mapIds query:

query rsIDMapping {
  mapIds (queryTerms:"rs7412" entityNames:["variant"]) {
    mappings {
      hits {
        id
      }
    }
  }
}

This results in the following output:

{
  "data": {
    "mapIds": {
      "mappings": [
        {
          "hits": [
            {
              "id": "19_44908822_C_T"
            }
          ]
        }
      ]
    }
  }
}
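
The same lookup can be sketched in R. All helper names below are hypothetical, and the endpoint URL is assumed to be the public Open Targets Platform GraphQL endpoint. The live call needs network access plus the httr and jsonlite packages, so the example at the end runs offline on a hand-built list that mirrors the response above.

```r
# Sketch only: helper names are hypothetical; the URL is assumed to be the
# public Open Targets Platform GraphQL endpoint.
platform_url <- "https://api.platform.opentargets.org/api/v4/graphql"

# Build the mapIds query from this thread for a single rsID.
build_map_ids_query <- function(rsid) {
  # Accept only well-formed rsIDs, since the value is pasted into the query text.
  stopifnot(length(rsid) == 1, grepl("^rs[0-9]+$", rsid))
  sprintf(
    'query rsIDMapping {
  mapIds(queryTerms: "%s" entityNames: ["variant"]) {
    mappings { hits { id } }
  }
}',
    rsid
  )
}

# Send a query to the Platform API and return the parsed response as nested lists.
run_platform_query <- function(query, url = platform_url) {
  resp <- httr::POST(url, body = list(query = query), encode = "json")
  httr::stop_for_status(resp)
  jsonlite::fromJSON(
    httr::content(resp, as = "text", encoding = "UTF-8"),
    simplifyVector = FALSE
  )
}

# Collect every hit ID from a parsed mapIds response.
extract_mapped_ids <- function(parsed) {
  ids <- unlist(lapply(parsed$data$mapIds$mappings, function(m) {
    vapply(m$hits, function(h) h$id, character(1))
  }))
  if (is.null(ids)) character(0) else ids
}

# Offline example mirroring the response shown above.
example <- list(data = list(mapIds = list(mappings = list(
  list(hits = list(list(id = "19_44908822_C_T")))
))))
extract_mapped_ids(example)

# Live usage (not run here):
# extract_mapped_ids(run_platform_query(build_map_ids_query("rs7412")))
```

An empty result from `extract_mapped_ids()` means the rsID was not found in the Platform variant dataset.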

!! This may not be an exhaustive solution, because !!:

  • The Platform variant dataset now only contains variants linked to disease or molecular phenotypes (see documentation). This effectively means that many variants that were found in OTG (variants present in summary statistics without a strong influence on the trait of interest) will not be in the OTP variant dataset.
  • The Platform search was not designed for mapping arbitrary rsIDs to variantIds. It works well when the variant is present in any of the variant data sources described in the documentation. For indels, the reported variantId will not be relevant, as we use in-house identifiers containing a hash of the inserted/deleted sequence.

To sum up, there are at least two better ways to make sure your rsIDs are mapped correctly.

Mapping using the Ensembl REST API

To map rsIDs to variants, we encourage users to look at the Ensembl REST API as a possible source of mappings via API calls. Below we provide an example of how to map rsIDs using the API:

#' Fetch Variant Information from Ensembl by rsID
#'
#' This function retrieves variant information from the Ensembl REST API using a list of rsIDs.
#' @param rsids A character vector of rsIDs to fetch variant information for.
#' @param endpoint The URL of the Ensembl REST API endpoint. Default is set to the human variation endpoint.
#' @return A data frame containing variant information including start, end, allele strings, ancestral alleles, location, assembly name, sequence region name, and strand.
#' @examples
#' \dontrun{
#' rsids <- c("rs789012", "rs74733149")
#' variants <- ensembl_fetch_variant(rsids)
#' variants
#' A tibble: 2 × 10
#'   seq_region_name strand assembly_name      end ancestral_allele location             coord_system allele_string    start rsid
#'   <chr>            <int> <chr>            <int> <chr>            <chr>                <chr>        <chr>            <int> <chr>
#' 1 12                   1 GRCh38        28719649 G                12:28719649-28719649 chromosome   G/A           28719649 rs789012
#' 2 1                    1 GRCh38        65609903 A                1:65609903-65609903  chromosome   A/G           65609903 rs74733149
#' }
#' @export
ensembl_fetch_variant <- function(
  rsids,
  endpoint = "http://rest.ensembl.org/variation/homo_sapiens"
) {
  cli::cli_alert_info(sprintf("Fetching rsIds from %s", endpoint))
  httr::POST(
    endpoint,
    httr::content_type("application/json"),
    httr::accept("application/json"),
    body = jsonlite::toJSON(list(ids = rsids))
  ) |>
    httr::stop_for_status() |>
    httr::content() |>
    jsonlite::toJSON() |>
    jsonlite::fromJSON() |>
    ensembl_parse_response()
}


#' Parse API Response into a Data Frame
#'
#' Extracts the `mapping` element from each object in a response list and
#' converts the result into a data frame with an added `rsid` column.
#'
#' @param resp A named list where each element is expected to contain a `$mapping` component.
#'
#' @return A `data.frame` with one row per item in `resp`, including the extracted
#' `mapping` fields and an `rsid` column containing the names of the original list.
#'
#' @examples
#' \dontrun{
#' resp <- list(
#'   rs123 = list(mapping = list(seq_region_name = 1, allele_string = 'G/C/T'))
#' )
#' ensembl_parse_response(resp)
#' }
ensembl_parse_response <- function(resp) {
  cli::cli_alert_info("Parsing response from Ensembl API")
  if (length(resp) == 0) {
    cli::cli_alert_danger("No variants found for the provided rsIDs.")
    return(data.frame())
  }
  purrr::map(resp, function(obj) as.data.frame(obj$mapping)) |>
    dplyr::bind_rows() |>
    tidyr::unnest(cols = dplyr::everything()) |>
    dplyr::mutate(
      rsid = names(resp)
    )
}

res <- ensembl_fetch_variant(c("rs7412"))
print(res)

This results in

ℹ Fetching rsIds from http://rest.ensembl.org/variation/homo_sapiens
ℹ Parsing response from Ensembl API
> print(res)
# A tibble: 1 × 10
  coord_system      end assembly_name    start seq_region_name strand
  <chr>           <int> <chr>            <int> <chr>            <int>
1 chromosome   44908822 GRCh38        44908822 19                   1
# ℹ 4 more variables: ancestral_allele <chr>, location <chr>,
#   allele_string <chr>, rsid <chr>

You can find the whole code in the repo.
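
The Ensembl mapping returns chromosome, position and alleles separately, while the Platform queries expect an identifier of the form `chromosome_position_ref_alt` (as in `19_44908822_C_T` above). Below is a minimal sketch of that conversion. It is not part of the linked repo: `ensembl_to_variant_ids` is a hypothetical helper, and it assumes that the first allele in `allele_string` is the reference allele on the forward strand. It handles single-nucleotide variants only, because indel identifiers in the Platform are hashed in-house.

```r
# Hypothetical helper (not from the linked repo): build variant IDs of the form
# chromosome_position_ref_alt from one row of the Ensembl mapping.
# Assumes the first allele in `allele_string` is the reference allele.
ensembl_to_variant_ids <- function(seq_region_name, start, allele_string) {
  stopifnot(length(allele_string) == 1)
  alleles <- strsplit(allele_string, "/", fixed = TRUE)[[1]]
  ref <- alleles[1]
  alts <- alleles[-1]
  # Keep single-nucleotide variants only; indels return an empty vector.
  is_snv <- length(alts) > 0 && all(c(ref, alts) %in% c("A", "C", "G", "T"))
  if (!is_snv) {
    return(character(0))
  }
  # as.integer() avoids scientific notation such as "1e+05" in the position.
  paste(seq_region_name, as.integer(start), ref, alts, sep = "_")
}

ensembl_to_variant_ids("19", 44908822, "C/T")
ensembl_to_variant_ids("1", 100000, "G/C/T")

# With the tibble returned by ensembl_fetch_variant() (not run here):
# unlist(Map(ensembl_to_variant_ids, res$seq_region_name, res$start, res$allele_string))
```

A multi-allelic site yields one candidate ID per alternative allele, so it is worth checking each candidate against the Platform before relying on it.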

Querying gene-variant genetic associations

As for the second query, we strongly encourage users to move from v2g to L2G (Locus2Gene):

query QTLCredibleSetsQuery {
  variant(variantId: "16_57054953_C_T") {
    id
    qtlCredibleSets: credibleSets(
      studyTypes: [gwas]
    ) {
      rows {
        l2GPredictions {
          rows {
            score
            target {
              id
              approvedSymbol
            }
          }
        }
      }
    }
  }
}

The above query lists the Locus2Gene (L2G) predictions that link credible sets containing the queried variant to a target gene.

The result of the query is

{
  "data": {
    "variant": {
      "id": "16_57054953_C_T",
      "qtlCredibleSets": {
        "rows": [
          {
            "l2GPredictions": {
              "rows": [
                {
                  "score": 0.11556569731866531,
                  "target": {
                    "id": "ENSG00000140853",
                    "approvedSymbol": "NLRC5"
                  }
                },
                {
                  "score": 0.07723600997802375,
                  "target": {
                    "id": "ENSG00000140848",
                    "approvedSymbol": "CPNE2"
                  }
                }
              ]
            }
          }
        ]
      }
    }
  }
}
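
To work with these predictions in R, the nested response can be flattened into a data frame. Below is a minimal sketch, assuming the response was parsed into nested lists (for example with `jsonlite::fromJSON(txt, simplifyVector = FALSE)`) and that every prediction carries a target. `flatten_l2g` is a hypothetical helper name, and the example list is hand-built to mirror the result above.

```r
# Hypothetical helper: flatten a parsed L2G response into one row per
# (credible set, gene) prediction, sorted by descending score.
flatten_l2g <- function(parsed) {
  rows <- list()
  for (cs in parsed$data$variant$qtlCredibleSets$rows) {
    for (pred in cs$l2GPredictions$rows) {
      rows[[length(rows) + 1]] <- data.frame(
        gene_id = pred$target$id,
        symbol = pred$target$approvedSymbol,
        score = pred$score,
        stringsAsFactors = FALSE
      )
    }
  }
  if (length(rows) == 0) {
    return(data.frame(
      gene_id = character(0), symbol = character(0), score = numeric(0)
    ))
  }
  out <- do.call(rbind, rows)
  out <- out[order(out$score, decreasing = TRUE), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Offline example mirroring the response shown above (order deliberately shuffled).
example <- list(data = list(variant = list(
  id = "16_57054953_C_T",
  qtlCredibleSets = list(rows = list(
    list(l2GPredictions = list(rows = list(
      list(score = 0.07723600997802375,
           target = list(id = "ENSG00000140848", approvedSymbol = "CPNE2")),
      list(score = 0.11556569731866531,
           target = list(id = "ENSG00000140853", approvedSymbol = "NLRC5"))
    )))
  ))
)))
flatten_l2g(example)
```

A gene that appears in several credible sets is kept once per credible set, so the output may contain repeated gene IDs with different scores.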

I hope this answers your questions. Please take a look at the OTP GraphQL API examples. If you have more questions, feel free to post them.

Thank you for your response!