Differences in "L2G share" as computed from v8 (Oct 2022) and v25.12?

Hello,

We’re interested in computing “L2G share” scores as defined in Minikel et al. 2024 using L2G values from the v25.12 release of Open Targets. Per their publication, they used v8 (Oct 2022), with the final share scores available here: genetic_support/data/assoc.tsv.gz at main · ericminikel/genetic_support · GitHub. In our case, we use the GraphQL API (credibleSet → l2GPredictions → rows) to retrieve these values and keep everything else the same (as closely as we could replicate it!).
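For reference, the query we issue looks roughly like the sketch below. Field names follow our reading of the credibleSet → l2GPredictions → rows path in the current schema, and the studyLocusId variable is a placeholder, not a real identifier:

```python
# Sketch of the GraphQL query used to pull L2G rows for one credible set.
# Field names are our assumption from the v25.12 schema; $studyLocusId is
# a placeholder to be filled with a real credible-set identifier.
L2G_QUERY = """
query L2GRows($studyLocusId: String!) {
  credibleSet(studyLocusId: $studyLocusId) {
    l2GPredictions {
      rows {
        score
        target { id approvedSymbol }
      }
    }
  }
}
"""
```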

Overall, there is broad agreement, but we notice some key systematic differences. Across the ~9.3k genes we have in common over 250 traits, we observe a Spearman ρ of 0.74 between the L2G share values and an ~80% overlap between genes that pass a ≥0.5 threshold (though the total fraction of genes passing this threshold is similar: 42% in v25.12 vs 39% in v8). Importantly, the discordance at the 0.5 threshold is asymmetric: more than 3× as many gene–study pairs pass in v25.12 but fail in v8 (~1k) as vice versa (~300), suggesting the v25.12 model assigns systematically higher shares to genes at the same loci. More broadly, do you have an intuition as to which changes in the model between versions could explain the moderate correlation (ρ = 0.74) on identical gene × study pairs and the shift in the distribution of L2G shares?
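For concreteness, our share computation is essentially a per-locus normalisation of raw L2G scores. A minimal sketch with hypothetical gene names, assuming the Minikel et al. definition of share as a gene’s L2G divided by the total over genes at that association:

```python
def l2g_shares(l2g_by_gene):
    """Normalise raw L2G scores at one locus into shares that sum to 1."""
    total = sum(l2g_by_gene.values())
    if total == 0:
        return {gene: 0.0 for gene in l2g_by_gene}
    return {gene: score / total for gene, score in l2g_by_gene.items()}

# Hypothetical genes at a single credible set / locus.
shares = l2g_shares({"GENE_A": 0.8, "GENE_B": 0.2})
# -> {"GENE_A": 0.8, "GENE_B": 0.2}; here the raw scores already sum to 1.0
```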

Thanks, happy to clarify any questions!

Gene prioritisation by L2G is a continuous effort that depends on the availability of high-confidence GWAS and functional genomics data. As more data become available, predictions are expected to evolve. The Open Targets Platform recalculates L2G scores with every quarterly release, incorporating the latest GWAS studies (which tend to be better powered), updated LD references, colocalising QTL studies, and the effector-gene gold standards used for training. See the latest blog post by @irene for more context.

When Open Targets Genetics was merged into the Open Targets Platform, the entire pipeline was rebuilt from scratch using the latest available data. Several steps changed materially: SuSiE is now used for fine-mapping in preference to COJO conditional analysis wherever summary statistics are available; colocalisation is now based on credible sets; and the L2G model was retrained with a revised feature set that improves performance across available benchmarks. You can read more about the rationale behind this migration in the Open Targets blog.

Given this context, it is not surprising to find differences in L2G results between these two time points. The 3× asymmetry you observe (more genes gaining than losing at the ≥0.5 threshold) is consistent with what you’d expect from higher-resolution fine-mapping. SuSiE can resolve independent signals within the same locus that were previously collapsed together. Where evidence was previously attributed to a single gene, it may now be distributed across multiple independent signals — each prioritised to its own gene. This redistribution can alter scores for several genes around a locus simultaneously, driving the upward shift. That said, this is one of several possible explanations — changes to the feature set and training data likely also contribute.
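As a toy illustration of that redistribution (hypothetical genes and scores; shares computed as L2G normalised over the genes at a signal):

```python
def shares(l2g_by_gene):
    """Share = a gene's L2G score divided by the total over genes at the signal."""
    total = sum(l2g_by_gene.values())
    return {gene: score / total for gene, score in l2g_by_gene.items()}

# One collapsed signal (v8-style): three candidate genes split the evidence.
collapsed = shares({"GENE_A": 0.6, "GENE_B": 0.5, "GENE_C": 0.3})
# max share is 0.6 / 1.4 ≈ 0.43, so no gene clears a ≥0.5 threshold

# Two resolved signals (SuSiE-style): each gene dominates its own credible set.
signal_1 = shares({"GENE_A": 0.6, "GENE_C": 0.1})  # GENE_A: 0.6 / 0.7 ≈ 0.86
signal_2 = shares({"GENE_B": 0.5, "GENE_C": 0.2})  # GENE_B: 0.5 / 0.7 ≈ 0.71
# the same genes now clear the threshold: more gains than losses, as observed
```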

If you have a specific locus or gene example in mind, we’d be happy to dig into it in more detail. Please feel free to ask any follow-up questions!


Thanks for your detailed response @ochoa! This is indeed very helpful.