Background
The OpenTargets Platform currently provides powerful tools for understanding locus-level relationships between studies; these identify shared causal variants at specific genomic loci. However, there is no study-level metric capturing global genetic correlation (rg), the genome-wide measure of shared genetic effects between two traits, which would be useful across many contexts.
The Opportunity
OpenTargets has backend access to GWAS summary statistics for studies flagged hasSumstats = true in the study table. While these summary statistics aren’t publicly hosted on the Platform, they could be used internally to compute a genetic correlation matrix using LD Score Regression (LDSC; Bulik-Sullivan et al., Nat Genet 2015).
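To make the idea concrete, here is a minimal sketch of the cross-trait LDSC estimate for one pair of studies. It assumes aligned z-scores and precomputed LD scores are already in memory, and it uses plain unweighted least squares; the published method uses weighted regression with block-jackknife standard errors, so treat this as an illustration of the principle rather than a replacement for the LDSC software. The function names `ldsc_fit` and `genetic_correlation` are hypothetical.

```python
import numpy as np

def ldsc_fit(y, ld_scores):
    """Unweighted OLS of y on LD scores; returns (slope, intercept)."""
    ld_scores = np.asarray(ld_scores, dtype=float)
    X = np.column_stack([ld_scores, np.ones_like(ld_scores)])
    slope, intercept = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)[0]
    return slope, intercept

def genetic_correlation(z1, z2, ld_scores, n1, n2):
    """Sketch of rg from two aligned z-score vectors (same SNPs, same alleles).

    Model (Bulik-Sullivan et al. 2015):
      E[z1_j**2]    = (N1 * h2_1 / M) * l_j + intercept
      E[z1_j*z2_j]  = (sqrt(N1*N2) * rho_g / M) * l_j + intercept
    Assumption: M is taken as the number of SNPs passed in. M cancels in rg.
    """
    z1, z2 = np.asarray(z1, dtype=float), np.asarray(z2, dtype=float)
    m = len(ld_scores)
    h2_1 = ldsc_fit(z1 ** 2, ld_scores)[0] * m / n1
    h2_2 = ldsc_fit(z2 ** 2, ld_scores)[0] * m / n2
    rho_g = ldsc_fit(z1 * z2, ld_scores)[0] * m / np.sqrt(n1 * n2)
    # Undefined (nan) if either heritability estimate is not positive.
    return rho_g / np.sqrt(h2_1 * h2_2)
```

Because the intercept is fitted rather than fixed, sample overlap between two studies is absorbed there instead of biasing the slope, which matters when many studies draw on the same biobank.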
Why OpenTargets is Uniquely Positioned for This
- Systematic Harmonization: The Platform already harmonizes GWAS summary statistics across diverse sources (GWAS Catalog, FinnGen, UK Biobank, etc.) into a unified schema with consistent disease mappings (diseaseIds → EFO/MONDO) and quality control (sumstatQCValues). This harmonization infrastructure eliminates the primary barrier preventing researchers from computing genetic correlations at scale.
- Scalable Compute Infrastructure: Computing pairwise genetic correlations across hundreds of GWAS studies requires significant computational resources and optimized workflows. OpenTargets’ backend infrastructure can efficiently perform LD Score Regression on all studies with hasSumstats = true in a single batch computation, something individual research groups struggle to do.
- Complex Data Summarization: The Platform excels at distilling complex genetic evidence into actionable insights (L2G scores, colocalisation probabilities, credible sets). A genetic correlation matrix continues this philosophy, transforming millions of SNP-level associations into interpretable study-level relationships that inform drug discovery decisions.
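As a rough illustration of the batch step, the sketch below enumerates the study pairs such a job would need to process. The record shape is an assumption for illustration: hasSumstats comes from the study table as described above, while the studyId key and the example IDs are hypothetical placeholders.

```python
from itertools import combinations

def eligible_pairs(studies):
    """Return every unordered pair of study IDs where both have summary statistics."""
    ids = sorted(s["studyId"] for s in studies if s.get("hasSumstats"))
    return list(combinations(ids, 2))

# Hypothetical records; real rows would come from the study table.
studies = [
    {"studyId": "STUDY_A", "hasSumstats": True},
    {"studyId": "STUDY_B", "hasSumstats": False},
    {"studyId": "STUDY_C", "hasSumstats": True},
    {"studyId": "STUDY_D", "hasSumstats": True},
]

pairs = eligible_pairs(studies)
print(len(pairs), pairs)
```

The number of pairs grows as n(n-1)/2, so a few hundred eligible studies already means tens of thousands of regressions, which is why a single centralized batch run is attractive.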
Community Question
Would this be a valuable addition to the Platform? Happy to discuss implementation details, use cases, or schema design refinements!