Why does Open Targets Genetics display partial L2G scores on Locus pages when there is no evidence of colocalization?

JeremyS · 3 November 2021 12:08

The Open Targets Genetics “Locus” page shows which genes are prioritised by our Locus-to-Gene (L2G) scoring model. In this table, you can see the L2G score from the full model as well as “Partial L2G scores” from models that were built using only one category of predictors, such as distance only, colocalisation only, or QTL colocalisation only. These partial L2G scores can be used to assess how strong the evidence is from each of these categories for a given gene.

One of our users recently noticed that the partial L2G scores for “QTL colocalisation” are present even for studies without summary statistics. How can this be?

For example, on the Locus page for the locus around 11_61830500_A_G (rs1535) for LDL cholesterol in Willer CJ (2013), the L2G gene prioritisation table displays partial L2G scores for Variant Pathogenicity, Distance, QTL coloc, and Chromatin Interaction, even though not all of the prioritised genes have evidence of colocalisation.

For studies without summary statistics, we use an alternative approximate colocalisation method. Briefly, we run the PICS method (Farh et al. 2015) to estimate the probability for each SNP to be causal at the study locus, and then use the CLPP method (Hormozdiari et al. 2016) to estimate colocalisation with QTLs. This information is used to generate input features for the L2G model, but these approximate colocalisations aren’t exposed anywhere else in the Genetics portal.
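To make the CLPP step more concrete, here is a minimal sketch of the general idea: the per-SNP causal probabilities from the GWAS locus (for example, from PICS) are multiplied by the per-SNP causal probabilities from the QTL, and the products are summed over the shared SNPs. The function name, variant IDs, and probabilities below are made up for illustration; this is not the portal's pipeline code, and the real implementation may differ in details such as how variants are matched or filtered.

```python
def clpp(gwas_posteriors, qtl_posteriors):
    """Approximate locus-level colocalisation posterior probability.

    Both arguments map a variant ID to the estimated probability that
    the variant is causal in that study (hypothetical inputs).
    """
    # Only variants with a probability in both studies can contribute.
    shared = gwas_posteriors.keys() & qtl_posteriors.keys()
    # Sum, over shared variants, of the product of the two probabilities.
    return sum(gwas_posteriors[v] * qtl_posteriors[v] for v in shared)


# Illustrative values only, not real data.
gwas = {"var_a": 0.6, "var_b": 0.3, "var_c": 0.1}
qtl = {"var_a": 0.5, "var_b": 0.1, "var_d": 0.4}

score = clpp(gwas, qtl)  # 0.6 * 0.5 + 0.3 * 0.1
```

A high value means the same SNPs are likely to be causal in both the GWAS and the QTL, which is what the L2G features built from this method try to capture.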

So, studies with summary statistics use colocalisation information from running the coloc method (Giambartolomei et al. 2014), while those without summary statistics use the alternative method.

The “Evidence of colocalisation” column shows “Yes” only for studies with summary statistics where the coloc posterior probability of a shared causal variant is greater than 0.8 (PP.H4 > 0.8). However, smaller values of the colocalisation probability can still be used within the L2G model. The feature that carries coloc information into the L2G score uses continuous values of PP.H4/PP.H3 (the probability of a shared causal variant relative to the probability of distinct causal variants), so values below 0.8 can still be partly predictive.
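The difference between the displayed flag and the model input can be sketched as follows. This is only an illustration of the logic described above, not the portal's code: the function names are hypothetical, and the exact transformation applied to PP.H4/PP.H3 before it enters the model is not stated here, so the plain ratio is an assumption.

```python
H4_DISPLAY_THRESHOLD = 0.8  # threshold quoted above for the "Yes" flag


def shows_coloc_evidence(has_sumstats, pp_h4):
    """Hypothetical version of the 'Evidence of colocalisation' column."""
    # "Yes" needs both summary statistics and PP.H4 above the threshold.
    return bool(has_sumstats and pp_h4 is not None
                and pp_h4 > H4_DISPLAY_THRESHOLD)


def coloc_ratio_feature(pp_h4, pp_h3):
    """Continuous PP.H4/PP.H3 value (plain ratio assumed for illustration)."""
    return pp_h4 / pp_h3


# Illustrative values: PP.H4 = 0.6 is below the display threshold,
# so the column would not show "Yes", yet the ratio is still above 1.
flag = shows_coloc_evidence(True, 0.6)
feature = coloc_ratio_feature(0.6, 0.2)
```

In this example the gene would have no “Yes” in the column, but the continuous feature still tells the model that a shared causal variant is more likely than distinct ones.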

While it would be useful to be able to see which variants or data contributed to the predictions for L2G, such as variant pathogenicity and chromatin interaction, this isn’t currently possible with the way the model is implemented.
