Why does the Open Targets Genetics portal display partial L2G scores on Locus pages when there is no evidence of colocalisation?

The Open Targets Genetics “Locus” page shows which genes are prioritised by our Locus-to-Gene (L2G) scoring model. In this table, you can see the L2G score from the full model as well as “Partial L2G scores” from models that were built using only one category of predictors, such as distance only, colocalisation only, or QTL colocalisation only. These partial L2G scores can be used to assess how strong the evidence is from each of these categories for a given gene.

One of our users recently noticed that the partial L2G scores for “QTL colocalisation” are present even for studies without summary statistics. How can this be?


In this screenshot of the Locus page for the locus around 11_61830500_A_G (rs1535) for LDL cholesterol in Willer CJ (2013), the L2G gene prioritisation table displays partial L2G scores for Variant Pathogenicity, Distance, QTL coloc, and Chromatin Interaction, even though not all of the prioritised genes have evidence of colocalisation.

For studies without summary statistics, we use an alternative approximate colocalisation method. Briefly, we run the PICS method (Farh et al. 2015) to estimate the probability for each SNP to be causal at the study locus, and then use the CLPP method (Hormozdiari et al. 2016) to estimate colocalisation with QTLs. This information is used to generate input features for the L2G model, but these approximate colocalisations aren’t exposed anywhere else in the Genetics portal.
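To make the idea concrete, here is a minimal sketch of a CLPP-style calculation. It assumes the per-SNP causal probabilities have already been estimated (for example by PICS for the study locus, and by fine-mapping for the QTL), and it uses one common formulation of a locus-level score: the sum, over SNPs present in both sets, of the product of the two probabilities. The function name, variant IDs, and probability values are all hypothetical, and this is an illustration rather than the pipeline's actual code.

```python
def clpp(gwas_pp, qtl_pp):
    """Locus-level CLPP-style score (illustrative formulation).

    gwas_pp and qtl_pp map variant IDs to the estimated probability
    that the variant is causal in the GWAS and in the QTL study.
    Only variants present in both dictionaries contribute.
    """
    shared = gwas_pp.keys() & qtl_pp.keys()
    return sum(gwas_pp[v] * qtl_pp[v] for v in shared)


# Hypothetical probabilities, not real data.
gwas_pp = {"rs_a": 0.6, "rs_b": 0.3, "rs_c": 0.1}
qtl_pp = {"rs_a": 0.5, "rs_b": 0.5}

score = clpp(gwas_pp, qtl_pp)  # 0.6*0.5 + 0.3*0.5, approximately 0.45
print(f"CLPP-style score: {score:.2f}")
```

The score is high only when the same SNPs carry most of the probability in both studies, which is why it can stand in for colocalisation when summary statistics are unavailable.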

So, studies with summary statistics use colocalisation information from running the coloc method (Giambartolomei et al. 2014), while those without summary statistics use the alternative method.

The “Evidence of colocalisation” column shows “Yes” only for studies with summary statistics where the colocalisation probability is greater than 0.8 (PP.H4 > 0.8). However, smaller colocalisation probabilities can still be used within the L2G model: the feature that carries coloc information into the L2G score uses continuous values of PP.H4/PP.H3, so values below 0.8 can still contribute to the prediction.
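The difference between the displayed flag and what the model can use can be sketched as follows. This is a simplified illustration of the display rule described above, not the portal's code; the function name and the example values are hypothetical.

```python
def shows_coloc_evidence(has_summary_stats, pp_h4, threshold=0.8):
    """Return True when the display column would read "Yes" under the rule above."""
    # Studies without summary statistics never show "Yes", whatever
    # approximate colocalisation was computed for them.
    if not has_summary_stats or pp_h4 is None:
        return False
    # Strictly greater than the threshold (PP.H4 > 0.8).
    return pp_h4 > threshold


# Hypothetical examples, not real data.
examples = [
    ("study with summary statistics, strong coloc", True, 0.93),
    ("study with summary statistics, weaker coloc", True, 0.55),
    ("study without summary statistics", False, None),
]
for label, has_sumstats, pp_h4 in examples:
    flag = "Yes" if shows_coloc_evidence(has_sumstats, pp_h4) else "No"
    print(f"{label}: evidence of colocalisation = {flag}")
```

In the second example the column would read “No”, yet the underlying continuous value is still available to the model as a feature, which is how a gene can have a non-zero partial score for QTL colocalisation without the flag being set.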

While it would be useful to see which variants or data contributed to each partial L2G prediction, such as those for variant pathogenicity and chromatin interaction, this isn’t currently possible with the way the model is implemented.