Understanding GWAS association generation and score calculation in the Platform

Dear Open Targets team,

We have been benchmarking the GWAS-derived genetic associations in the Open Targets Platform against those we can obtain directly from the GWAS Catalog. Based on that, we have some specific questions:

1. How are GWAS-derived associations generated in Open Targets? Is the data reprocessed?

During our review of associations for IL13 and other genes we encountered several cases where the Platform reports a GWAS-based genetic association, but the referenced GWAS Catalog study does not contain any variant directly associated with the gene. We would like to understand the intended data flow here.

Two examples for IL13:

  1. Actinic keratosis: The association is supported by studies GCST90480468 (30 reported associations) and GCST90476204 (66 associations). Neither study lists any association with IL13 or nearby genes such as TH2LCRR, IL4, IL5, RAD50, or KIF3A. On the Open Targets pages for these studies, I cannot find IL13 in the GWAS credible sets section either.
  2. Abnormal thrombosis: Similarly, the referenced GWAS studies (GCST90044359, GCST90044364) do not appear to report any associations with IL13 in the GWAS Catalog. In these two cases, I do find IL13 among the L2G scores.

2. Clarifying the genetics score calculation:

The Platform documentation describes the genetics datasource score as a normalised harmonic sum of L2G scores across credible sets, which we find clearly explained. However, we noticed a numerical discrepancy that we cannot account for with that formula alone.

For the IL13 – adult-onset asthma association, a single credible set is present with L2G = 0.384, yet the Platform reports a genetics score of 0.233.

Thank you for building and maintaining such an open and well-documented resource, and thank you in advance for your help with these specific questions.

Best regards,

Bruna

Hi @Bruna_almirall, welcome to the Open Targets Community, and thank you for your kind words about the Platform! We’re glad it’s useful :tada:

To answer your questions:

1. How are GWAS-derived associations generated in Open Targets? Is the data reprocessed?

In the GWAS Catalog associations page you will find the reported leads from the study authors, and I believe the GWAS Catalog uses the nearest gene for mapping.

When we ingest studies into the Open Targets Platform, we reprocess the data from the GWAS Catalog. Where summary statistics are available, we perform harmonisation, fine-mapping, colocalisation, and Locus-to-Gene analysis.

The Locus-to-Gene (L2G) analysis, our in-house pipeline, uses 28 features to rank the most likely causal genes for a credible set and might prioritise genes that were not listed in the original study. Any prioritised gene with an L2G score > 0.05 is included in the Platform as evidence for the target-disease association.
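
As a rough illustration of the inclusion rule above, here is a minimal sketch that keeps genes with an L2G score above 0.05 and ranks them. The function name, the gene labels, and the scores are made up for the example and do not come from the Platform:

```python
# Threshold quoted above: genes with L2G > 0.05 become evidence.
L2G_THRESHOLD = 0.05


def genes_with_evidence(l2g_scores, threshold=L2G_THRESHOLD):
    """Return (gene, score) pairs above the threshold, highest score first.

    `l2g_scores` is a hypothetical dict of gene label -> L2G score
    for a single credible set.
    """
    kept = {gene: score for gene, score in l2g_scores.items() if score > threshold}
    return sorted(kept.items(), key=lambda item: item[1], reverse=True)


# Illustrative values only, not real Platform data.
example_credible_set = {"GENE_A": 0.71, "GENE_B": 0.12, "GENE_C": 0.03}
ranked = genes_with_evidence(example_credible_set)
print(ranked)
```

In this toy case GENE_B would still appear as evidence even though GENE_A ranks well above it, which is the same pattern as a lower-ranked gene showing up for a credible set.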

However, if you look at the credible sets for the studies you mentioned, you’ll see that other genes are ranked higher by the L2G algorithm and are therefore more likely than IL13 to be the causal gene, e.g. in the credible set for actinic keratosis from GCST90480468.

The L2G widget shows you which features contributed most to the L2G score, and you can explore the contribution of each feature in more detail.

In general, we would recommend exploring the supporting evidence for the associations, and seeing whether there is any other corroborating evidence.

You can find out more about our processing of the data in the documentation, e.g. the study documentation, the credible set documentation, and the GWAS and functional genomics section.

2. Clarifying the genetics score calculation:

Data source scores are indeed calculated using a harmonic sum, but the harmonic sum is then normalised by dividing the result by the maximum theoretical harmonic sum, which approximates to 1.644.

With a single credible set, the harmonic sum is just its L2G score, so:

0.384 / 1.644 ≈ 0.233
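
If it helps to reproduce this, here is a minimal sketch of the calculation. It assumes the usual harmonic-sum weighting, where scores are sorted in descending order and the i-th score is divided by i squared, and it uses the 1.644 constant quoted above. The function names are my own, not the Platform's code:

```python
# Normalisation constant quoted in this reply (close to pi^2 / 6).
MAX_HARMONIC_SUM = 1.644


def harmonic_sum(scores):
    """Sum of scores sorted in descending order, the i-th divided by i**2."""
    ordered = sorted(scores, reverse=True)
    return sum(score / (rank ** 2) for rank, score in enumerate(ordered, start=1))


def datasource_score(scores, max_sum=MAX_HARMONIC_SUM):
    """Harmonic sum divided by the maximum theoretical harmonic sum."""
    return harmonic_sum(scores) / max_sum


# IL13 and adult-onset asthma: a single credible set with L2G = 0.384.
il13_asthma = datasource_score([0.384])
print(f"{il13_asthma:.4f}")
```

For the single L2G value of 0.384 this gives roughly 0.2336, in line with the 0.233 shown on the Platform. With several credible sets, the lower-ranked ones would contribute progressively less because of the 1/i² weighting.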

I hope that helps! Let us know if you need more information.

Thank you very much for the clarification and your quick reply. That helps a lot.