Dear Open Targets team,
I am currently working with the GWAS associations and L2G datasets from the Open Targets Platform and have a few questions regarding score aggregation.
- How is the final GWAS association score calculated from multiple L2G predictions across different credible sets or GWAS studies?
- How are GWAS studies annotated with multiple diseases/traits handled?
  - For example, if one GWAS study is linked to several EFO terms, how is the GWAS association assigned to each disease?
- I also noticed that the L2G score shown in the main table sometimes differs slightly from the score displayed in the detailed SHAP contribution widget (e.g., 0.919 vs 0.913). Could you clarify the reason for this difference?
Thank you for your help!
Hi,
Evidence scores from each data source are aggregated into a data-source-specific association score. We use a normalised harmonic sum for this aggregation. You can find more details on the process in the documentation here.
We do this for every data source; each data source has its own way of defining its evidence scores. For evidence derived from GWAS credible sets, the evidence score is the locus-to-gene (L2G) score, as that value captures our confidence in the link between the genetic signal and the gene (docs are here). In this context, the strength of the association and the effect size (however important they are) do not really matter.
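As a rough illustration of the aggregation described above, here is a minimal Python sketch of a normalised harmonic sum over L2G scores. It assumes the common form in which scores are sorted in descending order and the i-th score is weighted by 1/i², and that the result is divided by the harmonic sum of `max_terms` perfect scores of 1.0. The cap of 100 and the function names are illustrative assumptions; the Platform's actual implementation and normalisation constant may differ, so the linked documentation is the authority.

```python
from typing import Iterable


def harmonic_sum(scores: Iterable[float], max_terms: int = 100) -> float:
    """Sort scores descending and weight the i-th (1-based) score by 1 / i**2.

    Only the top `max_terms` scores contribute (assumed cap, illustrative only).
    """
    ranked = sorted(scores, reverse=True)[:max_terms]
    return sum(score / (rank ** 2) for rank, score in enumerate(ranked, start=1))


def normalised_harmonic_sum(scores: Iterable[float], max_terms: int = 100) -> float:
    """Scale the harmonic sum into [0, 1].

    Assumption: the theoretical maximum is the harmonic sum obtained when
    `max_terms` pieces of evidence all score 1.0.
    """
    theoretical_max = harmonic_sum([1.0] * max_terms, max_terms)
    return harmonic_sum(scores, max_terms) / theoretical_max


# Hypothetical L2G scores for one target-disease pair, e.g. from three credible sets.
l2g_scores = [0.2, 0.9, 0.5]
print(normalised_harmonic_sum(l2g_scores))
```

Because of the 1/i² weighting, the strongest L2G prediction dominates the result and each additional credible set adds progressively less. This is why many weak predictions cannot outweigh a single confident one.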
Please let us know if you have further questions.
Hi,
Thanks for the quick response, really helpful!
Best,
Mujisaka
It was a very quick but very incomplete reply! Sorry, I missed your last two questions:
- Sometimes the experimental design of a GWAS study implies multiple assigned diseases, e.g. when the genomes of patients with several types of cancer are compared with healthy genomes. You can see this reflected in the study schema. Associated loci from these studies are exploded, providing evidence for every assigned disease.
- We are looking into it. I believe those numbers should be the same.
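The "explode" step for studies with several assigned diseases can be sketched in plain Python as follows. This is a minimal illustration, assuming each study carries a list of disease IDs and each L2G prediction references its study. The class names, field names and the `explode_to_diseases` helper are hypothetical and do not reproduce the Platform's actual schema or pipeline.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Study:
    study_id: str
    disease_ids: Tuple[str, ...]  # hypothetical field: all diseases assigned to the study


@dataclass(frozen=True)
class L2GPrediction:
    study_id: str
    target_id: str
    l2g_score: float


def explode_to_diseases(studies: List[Study], predictions: List[L2GPrediction]) -> List[Dict]:
    """Emit one evidence row per (prediction, assigned disease) pair.

    The same L2G score is copied unchanged to every disease of the study.
    """
    diseases_by_study = {s.study_id: s.disease_ids for s in studies}
    evidence = []
    for p in predictions:
        for disease_id in diseases_by_study.get(p.study_id, ()):
            evidence.append(
                {"studyId": p.study_id, "targetId": p.target_id,
                 "diseaseId": disease_id, "score": p.l2g_score}
            )
    return evidence


# Hypothetical multi-cancer study with one locus-to-gene prediction.
studies = [Study("study_1", ("cancer_type_A", "cancer_type_B"))]
predictions = [L2GPrediction("study_1", "gene_X", 0.8)]
for row in explode_to_diseases(studies, predictions):
    print(row)
```

In this sketch each resulting row then feeds the per-disease aggregation independently, so one locus contributes the same L2G score to the association score of every assigned disease.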
Thanks. I have checked the source data: the number shown in the SHAP widget matches the one in the Locus-to-Gene (L2G) prediction file (Open Targets Platform). I'm not sure whether I did something wrong, but thanks for checking!