Proposal: Adding a genome-wide genetic correlation matrix to OpenTargets?

Background

The OpenTargets Platform currently provides powerful tools for understanding locus-level relationships between studies, such as colocalisation, which identifies shared causal variants at specific genomic loci. However, there is no study-level metric capturing genome-wide genetic correlation (rg), which would be useful in many contexts.

The Opportunity

OpenTargets has backend access to GWAS summary statistics for the studies flagged with hasSumstats = true in the study table. Although these summary statistics aren’t publicly hosted on the Platform, they could be used internally to compute a genetic correlation matrix with LD Score Regression (LDSC; Bulik-Sullivan et al., Nat Genet 2015).
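To make the LDSC idea concrete, here is a minimal sketch of cross-trait LD Score Regression. Under the LDSC model, the product of two studies' z-scores at a SNP is expected to grow linearly with that SNP's LD score, with a slope proportional to the genetic covariance; each study's own squared z-scores give its SNP heritability in the same way, and rg is the covariance scaled by both heritabilities. The function name `ldsc_rg_sketch` and its parameters are hypothetical, and this is deliberately simplified: it uses unweighted least squares and no standard errors, whereas the reference ldsc software (run with its `--rg` option) uses regression weights and block-jackknife standard errors. The usage part simulates z-scores with a known rg (assuming no sample overlap and no LD between SNPs) to check that the sketch recovers it.

```python
import numpy as np


def ldsc_rg_sketch(z1, z2, ld_scores, n1, n2, m_snps):
    """Simplified cross-trait LD Score Regression (unweighted OLS, no jackknife).

    Hypothetical helper for illustration; NOT the API of the ldsc software.
    z1, z2    : per-SNP z-scores for two studies, aligned to the same SNPs
                and the same effect allele
    ld_scores : per-SNP LD scores from a matched reference panel
    n1, n2    : GWAS sample sizes; m_snps: number of SNPs behind the LD scores
    """
    z1, z2, ld = (np.asarray(a, dtype=float) for a in (z1, z2, ld_scores))

    # Univariate LDSC: E[z^2] = (N * h2 / M) * l + intercept, so h2 = slope * M / N.
    slope1, _ = np.polyfit(ld, z1**2, 1)
    slope2, _ = np.polyfit(ld, z2**2, 1)
    h2_1 = slope1 * m_snps / n1
    h2_2 = slope2 * m_snps / n2

    # Cross-trait LDSC: E[z1 * z2] = (sqrt(N1 * N2) * rho_g / M) * l + intercept.
    # The free intercept absorbs the effect of any sample overlap.
    slope12, intercept12 = np.polyfit(ld, z1 * z2, 1)
    gcov = slope12 * m_snps / np.sqrt(n1 * n2)

    # Genetic correlation: covariance scaled by both SNP heritabilities
    # (returns nan if either heritability estimate is negative).
    rg = gcov / np.sqrt(h2_1 * h2_2)
    return {"h2_1": h2_1, "h2_2": h2_2, "gcov": gcov,
            "gcov_intercept": intercept12, "rg": rg}


# Usage: simulate z-scores under the LDSC model with a known rg, then
# check that the sketch recovers it.
rng = np.random.default_rng(0)
M, N1, N2, H2_1, H2_2, RG = 200_000, 50_000, 50_000, 0.3, 0.3, 0.5
ld = rng.uniform(1.0, 200.0, size=M)
var1 = 1 + N1 * H2_1 * ld / M
var2 = 1 + N2 * H2_2 * ld / M
cov = np.sqrt(N1 * N2) * RG * np.sqrt(H2_1 * H2_2) * ld / M
e1, e2 = rng.standard_normal(M), rng.standard_normal(M)
sim_z1 = np.sqrt(var1) * e1
sim_z2 = cov / np.sqrt(var1) * e1 + np.sqrt(var2 - cov**2 / var1) * e2
est = ldsc_rg_sketch(sim_z1, sim_z2, ld, N1, N2, M)
print(round(est["rg"], 2))  # should land close to the simulated 0.5
```

Keeping the cross-trait intercept free, rather than fixing it at zero, is what lets LDSC tolerate overlapping samples between studies such as those from the same biobank.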

Why OpenTargets is Uniquely Positioned for This

  1. Systematic Harmonization: The Platform already harmonizes GWAS summary statistics across diverse sources (GWAS Catalog, FinnGen, UK Biobank, etc.) into a unified schema with consistent disease mappings (diseaseIds → EFO/MONDO) and quality control (sumstatQCValues). This harmonization infrastructure removes one of the main barriers preventing researchers from computing genetic correlations at scale.

  2. Scalable Compute Infrastructure: Computing pairwise genetic correlations across hundreds of GWAS studies requires significant computational resources and optimized workflows, because the number of study pairs grows quadratically (n studies give n(n-1)/2 pairs). OpenTargets’ backend infrastructure can efficiently perform LD Score Regression on all studies with hasSumstats = true in a single batch computation, something individual research groups struggle to do.

  3. Complex Data Summarization: The Platform excels at distilling complex genetic evidence into actionable insights (L2G scores, colocalisation probabilities, credible sets). A genetic correlation matrix continues this philosophy, transforming millions of SNP-level associations into interpretable study-level relationships that inform drug discovery decisions.
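The batch computation described above can be sketched as follows. With n studies there are n(n-1)/2 pairs (500 studies already give 124,750), but each study's univariate LDSC slope only needs computing once, so the quadratic part of the work is one cross-trait regression per pair. Under a simplified unweighted-OLS version of LDSC, the sample sizes and SNP count cancel and rg reduces to b12 / sqrt(b1 * b2), where b1 and b2 are the univariate slopes and b12 the cross-trait slope. The function `rg_matrix_sketch` and the study IDs are hypothetical, and the sketch assumes every study's z-scores are already harmonized to the same SNPs and effect alleles, which is exactly what a unified schema would provide. The sign-flipped copy in the usage shows why allele harmonization matters: the same study with its effect allele reversed comes out with rg = -1.

```python
import itertools

import numpy as np


def rg_matrix_sketch(zscores, ld_scores):
    """Pairwise genetic-correlation matrix via simplified (unweighted) LDSC.

    Hypothetical helper for illustration. `zscores` maps a study ID to a 1-D
    array of z-scores; all arrays must already be aligned to the same SNPs and
    effect alleles, in the same order as `ld_scores`.
    """
    ids = list(zscores)
    ld = np.asarray(ld_scores, dtype=float)
    z = {s: np.asarray(zscores[s], dtype=float) for s in ids}

    # Univariate LDSC slopes: one regression per study (n regressions in total).
    slope = {s: np.polyfit(ld, z[s] ** 2, 1)[0] for s in ids}

    rg = np.eye(len(ids))  # each study is perfectly correlated with itself
    # Cross-trait slopes: one regression per unordered pair, n(n-1)/2 in total.
    for (i, a), (j, b) in itertools.combinations(enumerate(ids), 2):
        cross = np.polyfit(ld, z[a] * z[b], 1)[0]
        # With unweighted OLS, N and M cancel: rg = b12 / sqrt(b1 * b2).
        rg[i, j] = rg[j, i] = cross / np.sqrt(slope[a] * slope[b])
    return ids, rg


# Usage: one heritable study, an exact copy, a sign-flipped copy, and an
# independent heritable study.
rng = np.random.default_rng(1)
M = 100_000
ld = rng.uniform(1.0, 200.0, size=M)
scale = np.sqrt(1 + 0.05 * ld)  # heritable signal: variance grows with LD score
za = scale * rng.standard_normal(M)
zd = scale * rng.standard_normal(M)
ids, rg = rg_matrix_sketch({"A": za, "A_copy": za, "A_flipped": -za, "D": zd}, ld)
print(ids)
print(np.round(rg, 2))
```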

Community Question

Would this be a valuable addition to the Platform? Happy to discuss implementation details, use cases, or schema design refinements!

Dear @shastvx1
We are exploring the possibility of running LDSC on the summary statistics, although I cannot yet guarantee when this will happen.

With kind regards,
Szymon Szyszkowski

Thanks for your reply @Szymon_Szyszkowski!