Hello @pietznerm!
Welcome to the Open Targets Community!
If you want to recreate the “GWAS Lead Variants” table for a number of variants, I would recommend that you use our v2d dataset.
What is the Open Targets Genetics v2d dataset?
The v2d dataset is a series of JSON Lines files, where each line is a JSON object with data on the tag and lead variants for a given study. For example:
{"study_id":"GCST90012110","lead_chrom":"1","lead_pos":1574655,"lead_ref":"GGC","lead_alt":"G","direction":"+","beta":0.00657455,"beta_ci_lower":0.00476262212,"beta_ci_upper":0.00838647788,"pval_mantissa":4.3,"pval_exponent":-13,"pval":4.3E-13,"pmid":"PMID:32042192","pub_date":"2020-02-10","pub_journal":"Nat Med","pub_title":"Using human genetics to understand the disease impacts of testosterone in men and women.","pub_author":"Ruth KS","trait_reported":"Sex hormone-binding globulin levels adjusted for BMI","ancestry_initial":["European=368929"],"ancestry_replication":[],"n_initial":368929,"num_assoc_loci":815,"has_sumstats":true,"source":"GCST","trait_efos":["EFO_0004696"],"trait_category":"measurement","tag_chrom":"1","tag_pos":1537493,"tag_ref":"T","tag_alt":"A","overall_r2":0.868737707721,"AFR_1000G_prop":0.0,"AMR_1000G_prop":0.0,"EAS_1000G_prop":0.0,"EUR_1000G_prop":1.0,"SAS_1000G_prop":0.0,"log10_ABF":19.755002754225544,"posterior_prob":0.016564937297166}
How do I query the v2d dataset?
To query the dataset and find the GWAS lead variant information for a given variant, you will need to write an R script that filters the dataset on the following fields: tag_chrom, tag_pos, tag_ref, and tag_alt.
For example, using variant 1_1313807_G_A, the R script would need to query for:
tag_chrom == "1"
tag_pos == 1313807
tag_ref == "G"
tag_alt == "A"
Note that tag_chrom is stored as a string in the dataset, while tag_pos is an integer.
The script would return 6 JSON objects, including:
{"study_id":"GCST007430","lead_chrom":"1","lead_pos":1384749,"lead_ref":"C","lead_alt":"G","direction":"-","beta":-0.0194,"beta_ci_lower":-0.025672,"beta_ci_upper":-0.013128,"pval_mantissa":1.98,"pval_exponent":-9,"pval":1.98E-9,"pmid":"PMID:30804560","pub_date":"2019-02-25","pub_journal":"Nat Genet","pub_title":"New genetic signals for lung function highlight pathways and chronic obstructive pulmonary disease associations across multiple ancestries.","pub_author":"Shrine N","trait_reported":"Peak expiratory flow","ancestry_initial":["European=321047"],"ancestry_replication":["European=24218"],"n_initial":321047,"n_replication":24218,"num_assoc_loci":265,"has_sumstats":true,"source":"GCST","trait_efos":["EFO_0009718"],"trait_category":"measurement","tag_chrom":"1","tag_pos":1313807,"tag_ref":"G","tag_alt":"A","overall_r2":0.757951324816,"AFR_1000G_prop":0.0,"AMR_1000G_prop":0.0,"EAS_1000G_prop":0.0,"EUR_1000G_prop":1.0,"SAS_1000G_prop":0.0}
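If it helps, here is a minimal sketch of that filter. It is written in Python rather than R, purely as an illustration, and the same logic carries over to R directly. The function names and the trimmed-down sample records are my own for the example. Only the field names and values come from the JSON objects shown in this post.

```python
import io
import json

# Two records from this post, trimmed to the fields needed for the example.
SAMPLE_LINES = (
    '{"study_id":"GCST90012110","lead_chrom":"1","lead_pos":1574655,'
    '"lead_ref":"GGC","lead_alt":"G","tag_chrom":"1","tag_pos":1537493,'
    '"tag_ref":"T","tag_alt":"A"}\n'
    '{"study_id":"GCST007430","lead_chrom":"1","lead_pos":1384749,'
    '"lead_ref":"C","lead_alt":"G","tag_chrom":"1","tag_pos":1313807,'
    '"tag_ref":"G","tag_alt":"A"}\n'
)


def parse_variant_id(variant_id):
    """Split a 'chrom_pos_ref_alt' ID into its four components."""
    chrom, pos, ref, alt = variant_id.split("_")
    # In the sample records the chromosome is a string and the position an integer.
    return chrom, int(pos), ref, alt


def find_tag_rows(lines, variant_id):
    """Return every JSON object whose tag variant matches variant_id."""
    chrom, pos, ref, alt = parse_variant_id(variant_id)
    matches = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        if (
            row.get("tag_chrom") == chrom
            and row.get("tag_pos") == pos
            and row.get("tag_ref") == ref
            and row.get("tag_alt") == alt
        ):
            matches.append(row)
    return matches


# Any iterable of lines works here, including an open file handle.
for row in find_tag_rows(io.StringIO(SAMPLE_LINES), "1_1313807_G_A"):
    print(row["study_id"])
```

To run this against the real dataset, you would pass an open file handle for each downloaded v2d file in place of the in-memory sample.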
Within each of the 6 JSON objects, you will see the published lead variant information in the lead_chrom, lead_pos, lead_ref, and lead_alt fields along with relevant study fields (e.g. study_id).
For example, using the above JSON object, the lead variant is 1_1384749_C_G and it was identified in the GWAS Catalog study GCST007430.
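As a rough sketch of that last step, the snippet below rebuilds the lead variant ID from the four lead_ fields and pairs it with the study fields. It is in Python for illustration only. The helper names are hypothetical, and the record is the one quoted above, trimmed to the relevant fields.

```python
# The JSON object from this post, trimmed to the fields used here.
record = {
    "study_id": "GCST007430",
    "trait_reported": "Peak expiratory flow",
    "lead_chrom": "1",
    "lead_pos": 1384749,
    "lead_ref": "C",
    "lead_alt": "G",
}


def lead_variant_id(row):
    """Join the lead_ fields into a 'chrom_pos_ref_alt' variant ID."""
    return "{}_{}_{}_{}".format(
        row["lead_chrom"], row["lead_pos"], row["lead_ref"], row["lead_alt"]
    )


def summarise_leads(rows):
    """Build one table row per JSON object: lead variant, study, trait."""
    return [
        {
            "lead_variant": lead_variant_id(row),
            "study_id": row["study_id"],
            "trait_reported": row.get("trait_reported"),
        }
        for row in rows
    ]


for entry in summarise_leads([record]):
    print(entry["lead_variant"], entry["study_id"], entry["trait_reported"])
```

Applied to all of the objects returned for a tag variant, this gives one row per lead variant and study, which is the information you would need for the table.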
How can I access the v2d dataset?
The Open Targets Genetics v2d dataset is available in JSON format from our FTP server:
ftp.ebi.ac.uk/pub/databases/opentargets/genetics/latest/v2d/
Alternatively, you can access the dataset in our BigQuery open-targets-genetics instance and use SQL to query the relevant fields:
Find GWAS lead variant, study, and reported trait information for a given tag variant with BigQuery
I hope this helps – feel free to post any follow-up questions below!
Cheers,
~ Andrew