GWAS lead variants via API

pietznerm · 7 January 2022 17:43

Hi OpenTargets Team,

I am a huge fan of your resource. I am currently working on a script to query GWAS variant associations for a set of genetic variants. I managed to do so for PheWAS associations. However, since most of those are likely due to LD confounding, I am most interested in the link between the queried variant and genuine lead signals at published GWAS loci, basically the panel called "GWAS lead variants". It would be great to get a sample script using R.

Best,
Maik

ahercules · 11 January 2022 12:18

Hello @pietznerm!

Welcome to the Open Targets Community!

If you want to recreate the “GWAS Lead Variants” table for a number of variants, I would recommend that you use our v2d dataset.

What is the Open Targets Genetics `v2d` dataset?

The v2d dataset is a series of JSON lines files, where each individual JSON line entry includes data on the tag and lead variants for a given study.

{"study_id":"GCST90012110","lead_chrom":"1","lead_pos":1574655,"lead_ref":"GGC","lead_alt":"G","direction":"+","beta":0.00657455,"beta_ci_lower":0.00476262212,"beta_ci_upper":0.00838647788,"pval_mantissa":4.3,"pval_exponent":-13,"pval":4.3E-13,"pmid":"PMID:32042192","pub_date":"2020-02-10","pub_journal":"Nat Med","pub_title":"Using human genetics to understand the disease impacts of testosterone in men and women.","pub_author":"Ruth KS","trait_reported":"Sex hormone-binding globulin levels adjusted for BMI","ancestry_initial":["European=368929"],"ancestry_replication":[],"n_initial":368929,"num_assoc_loci":815,"has_sumstats":true,"source":"GCST","trait_efos":["EFO_0004696"],"trait_category":"measurement","tag_chrom":"1","tag_pos":1537493,"tag_ref":"T","tag_alt":"A","overall_r2":0.868737707721,"AFR_1000G_prop":0.0,"AMR_1000G_prop":0.0,"EAS_1000G_prop":0.0,"EUR_1000G_prop":1.0,"SAS_1000G_prop":0.0,"log10_ABF":19.755002754225544,"posterior_prob":0.016564937297166}

How do I query the `v2d` dataset?

To query the dataset and find the GWAS lead variant information for a given variant, you will need to write an R script that queries the dataset for the following fields: tag_chrom, tag_pos, tag_ref, and tag_alt.

For example, using variant 1_1313807_G_A, the R script would need to query for:

tag_chrom == "1"
tag_pos == 1313807
tag_ref == "G"
tag_alt == "A"

The script would return 6 JSON objects, including:

{"study_id":"GCST007430","lead_chrom":"1","lead_pos":1384749,"lead_ref":"C","lead_alt":"G","direction":"-","beta":-0.0194,"beta_ci_lower":-0.025672,"beta_ci_upper":-0.013128,"pval_mantissa":1.98,"pval_exponent":-9,"pval":1.98E-9,"pmid":"PMID:30804560","pub_date":"2019-02-25","pub_journal":"Nat Genet","pub_title":"New genetic signals for lung function highlight pathways and chronic obstructive pulmonary disease associations across multiple ancestries.","pub_author":"Shrine N","trait_reported":"Peak expiratory flow","ancestry_initial":["European=321047"],"ancestry_replication":["European=24218"],"n_initial":321047,"n_replication":24218,"num_assoc_loci":265,"has_sumstats":true,"source":"GCST","trait_efos":["EFO_0009718"],"trait_category":"measurement","tag_chrom":"1","tag_pos":1313807,"tag_ref":"G","tag_alt":"A","overall_r2":0.757951324816,"AFR_1000G_prop":0.0,"AMR_1000G_prop":0.0,"EAS_1000G_prop":0.0,"EUR_1000G_prop":1.0,"SAS_1000G_prop":0.0}

Within each of the 6 JSON objects, you will see the published lead variant information in the lead_chrom, lead_pos, lead_ref, and lead_alt fields along with relevant study fields (e.g. study_id).

For example, using the above JSON object, the lead variant is 1_1384749_C_G and it was identified in the GWAS Catalog study GCST007430.
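
As a rough illustration of the filtering step described above, here is a minimal sketch in Python rather than R (the same logic carries over). The function name `lead_variants_for_tag` is hypothetical, and the two sample records are the examples from this post trimmed to a handful of fields. It assumes each line of a v2d file is one JSON object with the field names shown above.

```python
import json


def lead_variants_for_tag(lines, chrom, pos, ref, alt):
    """Yield (lead_variant_id, study_id, trait_reported) for rows whose tag variant matches."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        # Note: tag_chrom is a string in the JSON, tag_pos is an integer
        if (
            row.get("tag_chrom") == chrom
            and row.get("tag_pos") == pos
            and row.get("tag_ref") == ref
            and row.get("tag_alt") == alt
        ):
            lead_id = "_".join(
                [row["lead_chrom"], str(row["lead_pos"]), row["lead_ref"], row["lead_alt"]]
            )
            yield lead_id, row["study_id"], row.get("trait_reported")


# The two example records from this thread, trimmed to the fields used here
SAMPLE_LINES = [
    '{"study_id":"GCST90012110","lead_chrom":"1","lead_pos":1574655,"lead_ref":"GGC",'
    '"lead_alt":"G","trait_reported":"Sex hormone-binding globulin levels adjusted for BMI",'
    '"tag_chrom":"1","tag_pos":1537493,"tag_ref":"T","tag_alt":"A"}',
    '{"study_id":"GCST007430","lead_chrom":"1","lead_pos":1384749,"lead_ref":"C",'
    '"lead_alt":"G","trait_reported":"Peak expiratory flow",'
    '"tag_chrom":"1","tag_pos":1313807,"tag_ref":"G","tag_alt":"A"}',
]

hits = list(lead_variants_for_tag(SAMPLE_LINES, "1", 1313807, "G", "A"))
print(hits)  # [('1_1384749_C_G', 'GCST007430', 'Peak expiratory flow')]
```

For a real run, an open file handle for a downloaded v2d file can be passed in place of `SAMPLE_LINES`, since the function only iterates over lines.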

How can I access the `v2d` dataset?

The Open Targets Genetics v2d dataset is available in JSON format from our FTP server:

ftp.ebi.ac.uk/pub/databases/opentargets/genetics/latest/v2d/

Alternatively, you can also access the dataset in our BigQuery open-targets-genetics instance and use SQL to access the relevant fields:

Find GWAS lead variant, study, and reported trait information for a given tag variant with BigQuery

I hope this helps. Feel free to post any follow-up questions below!

Cheers,

~ Andrew

pietznerm · 12 January 2022 17:01

Thank you, Andrew!! This was really helpful and easy to implement. However, some overall_r2 entries contain missing values, and I was wondering how the connection between the lead signal and the tag variant was made in those cases.

Best,
Maik

ahercules · 18 February 2022 17:00

Hello @pietznerm!

My sincere apologies for not responding to your follow-up question earlier.

Our Variant2Disease pipeline relies on different methods of expanding lead variants to tag variants, including LD expansion and fine mapping expansion. The FinnGen data that we have integrated also has its own method of lead variant to tag variant expansion.

For more information, please see our Variant2Disease pipeline documentation, which details the different methods we use.
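
To see how often overall_r2 is missing in your own extract, one option is to tally which tag-to-lead evidence fields are populated per record. The sketch below is in Python and uses hypothetical helper names (`evidence_fields`, `tally_evidence`). It only reports which of `overall_r2` and `posterior_prob` are present. Mapping those patterns onto a specific expansion method is an assumption that should be checked against the pipeline documentation.

```python
import json
from collections import Counter


def evidence_fields(row):
    """Label a v2d record by which tag-to-lead evidence fields are populated."""
    has_r2 = row.get("overall_r2") is not None
    has_pp = row.get("posterior_prob") is not None
    if has_r2 and has_pp:
        return "overall_r2 and posterior_prob"
    if has_r2:
        return "overall_r2 only"
    if has_pp:
        return "posterior_prob only"
    return "neither"


def tally_evidence(lines):
    """Count records per evidence label across an iterable of JSON lines."""
    return Counter(evidence_fields(json.loads(line)) for line in lines if line.strip())


# The two example records from this thread, trimmed to the relevant fields
SAMPLE_LINES = [
    '{"study_id":"GCST90012110","overall_r2":0.868737707721,'
    '"log10_ABF":19.755002754225544,"posterior_prob":0.016564937297166}',
    '{"study_id":"GCST007430","overall_r2":0.757951324816}',
]

counts = tally_evidence(SAMPLE_LINES)
print(dict(counts))
```

Records labelled "posterior_prob only" or "neither" are the ones with a missing overall_r2, so they are the rows worth comparing against the expansion methods described in the documentation.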

alessandro.testori · 24 October 2024 07:59

Hello! I am trying to get lead variants and studies via API given a tag variant, but I am afraid I was only able to find the code to get tag variants given a lead variant, as in the example you provided at Open Targets Genetics:

query Variant {
  tagVariantsAndStudiesForIndexVariant(variantId: "1_55058182_G_A") {
    associations {
      tagVariant {
        id
        rsId
        __typename
      }
      study {
        studyId
        traitReported
        pmid
        pubDate
        pubAuthor
        __typename
      }
      pval
      nTotal
      overallR2
      posteriorProbability
      __typename
    }
    __typename
  }
}

Could you please provide a sample query to get lead variants and studies/traits, given a tag variant?
Thank you!

Xiangyu · 24 October 2024 10:07

Welcome to Open Targets, Alessandro!

I’ve written below an example query for lead variants and studies/traits, given a tag variant of interest. Please let us know how it goes!

Best wishes,
Xiangyu Jack

query LeadVariantsForTagVariant {
  indexVariantsAndStudiesForTagVariant(variantId: "1_55058182_G_A") {
    associations {
      indexVariant {
        id
        rsId
      }
      study {
        studyId
        traitReported
        pmid
        pubDate
        pubAuthor
      }
      pval
      nTotal
      overallR2
      posteriorProbability
    }
  }
}
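
If you want to run the query above from a script, the following is a minimal sketch in Python using only the standard library. The endpoint URL is an assumption (it is not stated in this thread), and the helper names `build_payload`, `fetch`, and `flatten` are hypothetical. The variant ID is inserted into the query text as a quoted string, so the sketch does not need to assume the argument's GraphQL type.

```python
import json
import urllib.request

# Assumed endpoint for the Open Targets Genetics GraphQL API; not given in this thread.
API_URL = "https://api.genetics.opentargets.org/graphql"

QUERY_TEMPLATE = """
query LeadVariantsForTagVariant {
  indexVariantsAndStudiesForTagVariant(variantId: __VARIANT_ID__) {
    associations {
      indexVariant { id rsId }
      study { studyId traitReported pmid pubDate pubAuthor }
      pval
      nTotal
      overallR2
      posteriorProbability
    }
  }
}
"""


def build_payload(variant_id):
    """Return the JSON request body (bytes) for one tag variant."""
    # json.dumps adds the surrounding double quotes and escapes the value
    query = QUERY_TEMPLATE.replace("__VARIANT_ID__", json.dumps(variant_id))
    return json.dumps({"query": query}).encode("utf-8")


def fetch(variant_id, url=API_URL):
    """POST the query and return the decoded JSON response (needs network access)."""
    request = urllib.request.Request(
        url,
        data=build_payload(variant_id),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def flatten(response):
    """Turn the nested response into one flat dict per lead variant and study."""
    data = response.get("data") or {}
    node = data.get("indexVariantsAndStudiesForTagVariant") or {}
    rows = []
    for assoc in node.get("associations") or []:
        rows.append(
            {
                "lead_variant": assoc["indexVariant"]["id"],
                "study_id": assoc["study"]["studyId"],
                "trait": assoc["study"]["traitReported"],
                "pval": assoc["pval"],
                "overall_r2": assoc["overallR2"],
            }
        )
    return rows
```

Calling `flatten(fetch("1_55058182_G_A"))` would then give one row per lead variant and study. Looping over a list of tag variants is a matter of calling `fetch` once per ID.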

alessandro.testori · 24 October 2024 14:42

Thank you Xiangyu, it works!
