Has OpenTargets significantly changed the number of high associations (e.g. overall association > 0.6) in the last 3 years?

evad · 15 September 2022 11:12

Has your algorithm for calculating overall association scores between targets and diseases changed a lot over the last 3 years or so?

I have some old data pulled down, and when I compare it to the scores on the website today, they are very different. Do you have any overall statistics for the DB on how many scores have changed, and by how much?

hcornu · 15 September 2022 16:30

Hi @evad, and welcome to the Open Targets Community!

The algorithm to calculate the overall association scores between targets and diseases has not changed. To my knowledge, we have always used a harmonic sum to perform this calculation.
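As a rough illustration of what a harmonic sum does, here is a minimal Python sketch. It assumes the evidence scores for one association are ranked from highest to lowest, that the i-th score is divided by i², and that the total is normalised by the largest value attainable under a hypothetical cap of 100 terms. The Platform's exact cap, normalisation and per-source weights are not described in this thread, so treat those details as assumptions rather than the production implementation.

```python
def harmonic_sum(scores, max_terms=100):
    """Normalised harmonic sum of evidence scores, in [0, 1].

    Assumptions (not confirmed here): scores are ranked in descending
    order, the i-th score (1-based) is divided by i**2, only the top
    `max_terms` scores are used, and the result is divided by the sum
    obtained if all of those `max_terms` scores were 1.0.
    """
    ranked = sorted(scores, reverse=True)[:max_terms]
    total = sum(s / i ** 2 for i, s in enumerate(ranked, start=1))
    max_total = sum(1 / i ** 2 for i in range(1, max_terms + 1))
    return total / max_total


# Usage: three hypothetical evidence scores for one target-disease pair.
print(round(harmonic_sum([0.9, 0.5, 0.5]), 3))
```

Because each additional item is divided by the square of its rank, the strongest few pieces of evidence dominate the result.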

However, there are a couple of reasons why the score may have changed between releases.

- New data was ingested. This is particularly relevant for associations which did not previously have much evidence to support them. For associations with a lot of evidence, additional evidence is unlikely to make a difference to the score.
  - We may also have removed outdated sources of evidence.
- The data scoring was modified. We may revise the weight or scoring of the evidence itself. For example, data from Open Targets Genetics used to be scored based on the p-value of the GWAS association. This was updated when we introduced the Locus-to-Gene model, which is now the basis for Open Targets Genetics evidence scores in the Platform.
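The point about well-supported associations follows from the shape of the harmonic sum. The sketch below uses an assumed definition (descending ranking, 1/i² weighting, normalisation by a hypothetical cap of 100 terms) and adds the same made-up evidence score to a sparsely supported and a richly supported association. All scores are invented for illustration.

```python
def harmonic_sum(scores, max_terms=100):
    """Normalised harmonic sum: the i-th highest score is divided by i**2.

    Assumed normalisation: divide by the sum obtained if the top
    `max_terms` scores were all 1.0, so the result lies in [0, 1].
    """
    ranked = sorted(scores, reverse=True)[:max_terms]
    total = sum(s / i ** 2 for i, s in enumerate(ranked, start=1))
    return total / sum(1 / i ** 2 for i in range(1, max_terms + 1))


# Made-up evidence scores for two hypothetical associations.
sparse = [0.5]        # a single, moderate piece of evidence
rich = [0.9] * 30     # thirty strong pieces of evidence
new_item = 0.8        # the same new evidence arriving in a later release

gain_sparse = harmonic_sum(sparse + [new_item]) - harmonic_sum(sparse)
gain_rich = harmonic_sum(rich + [new_item]) - harmonic_sum(rich)
print(f"sparse: +{gain_sparse:.4f}, rich: +{gain_rich:.6f}")
```

In the rich case the new item lands at rank 31 and is divided by 31², so its contribution is tiny. Re-scoring a source (the second point) is different: it changes the inputs themselves, including the top-ranked ones, so in this toy model it can move even well-supported associations.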

We don’t track how much the scores change between releases.
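Since release-to-release changes are not tracked, one way to get rough numbers yourself is to compare an old export with a current one. The sketch below assumes you have already loaded each release into a dictionary mapping a (target ID, disease ID) pair to its overall association score; how you extract that from the downloads is left out. The function name `compare_releases` and the toy IDs and scores are hypothetical, and the 0.6 threshold is taken from the question.

```python
from typing import Dict, Tuple

# (target_id, disease_id) -> overall association score for one release.
Scores = Dict[Tuple[str, str], float]


def compare_releases(old: Scores, new: Scores, threshold: float = 0.6) -> dict:
    """Summarise how overall scores moved between two release snapshots."""
    old_high = {pair for pair, s in old.items() if s > threshold}
    new_high = {pair for pair, s in new.items() if s > threshold}
    shared = old.keys() & new.keys()
    deltas = [new[pair] - old[pair] for pair in shared]
    return {
        "high_old": len(old_high),
        "high_new": len(new_high),
        "became_high": len(new_high - old_high),
        # Includes pairs that fell below the threshold or vanished entirely.
        "no_longer_high": len(old_high - new_high),
        "only_old": len(old.keys() - new.keys()),
        "only_new": len(new.keys() - old.keys()),
        "mean_abs_change": (sum(abs(d) for d in deltas) / len(deltas)) if deltas else 0.0,
    }


# Toy snapshots with hypothetical IDs; replace with your own exports.
OLD = {("T1", "D1"): 0.7, ("T1", "D2"): 0.4, ("T2", "D1"): 0.8, ("T3", "D3"): 0.3}
NEW = {("T1", "D1"): 0.5, ("T1", "D2"): 0.65, ("T2", "D1"): 0.82, ("T4", "D4"): 0.9}

result = compare_releases(OLD, NEW)
print(result)
```

Counting pairs that appear in only one release separately matters here, because associations can also be added or dropped between releases rather than just re-scored.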

It’s worth noting that the association score reflects the amount of evidence available for a particular association, rather than the confidence we have in this association.

I hope this helps!

Helena

ochoa · 16 September 2022 08:35

There have been some subtle changes to the algorithm as a result of bug fixes.

Another fundamental change, introduced with the release of the rewritten application, was the way evidence is propagated through the ontology. In the past, evidence was duplicated from node to node to ensure that every individual piece of evidence was present in all ancestor EFO terms. As a result, indirect associations were most often driven by terms high in the ontology (e.g. neoplasm). Since the launch of the rewritten Platform in 2021, the indirect associations for a node are based on the evidence for that node and all the evidence for its descendants, without duplicating any individual piece of evidence. This may have an overall effect on scores.
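To make the propagation difference concrete, here is a small Python sketch of one reading of the description above, on a made-up ontology (the term names are hypothetical, not real EFO IDs). The "old" style copies each evidence item to every ancestor along every upward path, so a term reachable by two paths receives two copies; the "new" style gathers the evidence for a node and its descendants, each item once. The unnormalised harmonic sum used to compare them is an assumption for illustration only.

```python
from collections import defaultdict

# Hypothetical toy ontology (child -> parents); not real EFO terms.
PARENTS = {
    "lung_adenocarcinoma": ["lung_carcinoma", "adenocarcinoma"],
    "lung_carcinoma": ["carcinoma"],
    "adenocarcinoma": ["carcinoma"],
    "carcinoma": ["neoplasm"],
    "neoplasm": [],
}

# Hypothetical direct evidence: (evidence_id, disease_term, score).
EVIDENCE = [("ev1", "lung_adenocarcinoma", 0.8), ("ev2", "lung_carcinoma", 0.5)]


def upward_paths(term):
    """Yield each ancestor once per distinct upward path (repeats kept)."""
    for parent in PARENTS[term]:
        yield parent
        yield from upward_paths(parent)


def old_style_propagation():
    """Copy each evidence item to its term and to every ancestor on every path."""
    by_term = defaultdict(list)
    for ev_id, term, score in EVIDENCE:
        by_term[term].append((ev_id, score))
        for ancestor in upward_paths(term):
            by_term[ancestor].append((ev_id, score))
    return by_term


def descendants(term):
    """All terms below `term`, each counted once."""
    children = defaultdict(list)
    for child, parents in PARENTS.items():
        for parent in parents:
            children[parent].append(child)
    seen, stack = set(), [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def new_style_indirect(term):
    """Evidence for `term` plus its descendants, each item included once."""
    scope = {term} | descendants(term)
    return [(ev_id, score) for ev_id, t, score in EVIDENCE if t in scope]


def harmonic_sum(scores):
    """Unnormalised harmonic sum (ranked descending, i-th score / i**2)."""
    ranked = sorted(scores, reverse=True)
    return sum(s / i ** 2 for i, s in enumerate(ranked, start=1))


old = old_style_propagation()
for term in ("carcinoma", "neoplasm"):
    old_items, new_items = old[term], new_style_indirect(term)
    print(term, [e for e, _ in old_items], [e for e, _ in new_items],
          round(harmonic_sum([s for _, s in old_items]), 3),
          round(harmonic_sum([s for _, s in new_items]), 3))
```

For `carcinoma`, the old-style list contains `ev1` twice (once via `lung_carcinoma`, once via `adenocarcinoma`), while the new-style list contains `ev1` and `ev2` once each, so the two schemes give different scores for the same underlying evidence.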

Apart from these details, the scoring remains conceptually the same, but the evidence data has changed significantly within the last 3 years.

Some clues about differences in data and scores between recent releases can be found in this app, developed by @irene, @dsuveges and @Kirill_Tsukanov:
https://opentargets-ot-release-metrics-app-9fkuxk.streamlitapp.com
Keep in mind this is for internal use and it’s not documented.
