Dump of harmonised GWAS sumstats


I noticed that the BigQuery “studies” table contains 50719 studies, of which 8316 have sumstats available. I assume all of these studies have been passed through the sumstats-harmonizer. Is the harmonized output available for all variants from each GWAS study? It seems the BQ database only contains the subset needed to provide results to the portal, which is completely understandable.




Hi Vince,

Thank you for your query. Yes, that is correct: all GWAS results integrated into the Genetics Portal go through parts of the harmonisation pipeline (which parts depends on whether they fall into the summary statistics or lead variants subtype).
We also apply a number of filtering steps to the associations prior to integration (e.g. in Open Targets Genetics we include GWAS Catalog curated associations with p≤5e−8). Therefore, as you have pointed out, the BQ database does not contain harmonised outputs for all variants from each GWAS study.
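
As a rough illustration of why a significance filter leaves only a subset of variants, here is a minimal Python sketch of a p≤5e−8 cut-off. It is not the actual pipeline code: the function name, the record layout (`variant_id`, `pval`) and the example values are all made up for illustration.

```python
# Illustrative only: not the Open Targets Genetics pipeline.
# The record fields and values below are hypothetical.

GENOME_WIDE_SIGNIFICANCE = 5e-8  # threshold quoted above (p <= 5e-8)


def filter_significant(associations, threshold=GENOME_WIDE_SIGNIFICANCE):
    """Keep only associations whose p-value is at or below the threshold."""
    return [a for a in associations if a["pval"] <= threshold]


# Made-up associations from a single hypothetical study.
associations = [
    {"variant_id": "1_1000_A_G", "pval": 3e-9},   # kept
    {"variant_id": "1_2000_C_T", "pval": 5e-8},   # kept (boundary is inclusive)
    {"variant_id": "2_3000_G_A", "pval": 1e-4},   # dropped
]

kept = filter_significant(associations)
print([a["variant_id"] for a in kept])
```

Any variant above the threshold is dropped before integration, which is why the integrated tables hold fewer rows than the full per-study summary statistics.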

You can find more info on our documentation pages.

I hope this helps! Feel free to reach out if you have any further questions on this.