Has OpenTargets significantly changed the number of high associations (e.g. overall association > 0.6) in the last 3 years?

Has your algorithm for calculating overall association scores between targets and diseases changed a lot over the last 3 years or so?

I have some old data pulled down, and when I compare it to the scores on the website today, they are very different. Do you have any stats on how many scores in the DB have changed overall, and by how much?

The algorithm to calculate the overall association scores between targets and diseases has not changed. To my knowledge, we have always used a harmonic sum to perform this calculation.
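To make the harmonic sum concrete, here is a minimal sketch in Python. It assumes each evidence score lies between 0 and 1, sorts the scores in descending order, and divides the i-th score by i squared. Treat the normalisation by the theoretical maximum (pi^2 / 6) as an assumption of this sketch. The function name is hypothetical, and per-data-source weights are left out.

```python
import math


def harmonic_sum(scores, normalise=True):
    """Harmonic sum of evidence scores (each assumed to be in [0, 1])."""
    # Strongest evidence first, so it receives the largest weight.
    ordered = sorted(scores, reverse=True)
    total = sum(s / (i ** 2) for i, s in enumerate(ordered, start=1))
    if normalise:
        # Assumption: scale by the theoretical maximum, sum(1 / i^2) = pi^2 / 6.
        total /= math.pi ** 2 / 6
    return total


# One extra piece of evidence matters a lot when there is little evidence...
few = harmonic_sum([0.5])
few_plus = harmonic_sum([0.5, 0.5])

# ...and very little when there is already a lot.
many = harmonic_sum([0.5] * 50)
many_plus = harmonic_sum([0.5] * 51)

print(f"sparse association gain: {few_plus - few:.4f}")
print(f"dense association gain:  {many_plus - many:.6f}")
```

Because the i-th score is divided by i squared, the 51st piece of evidence contributes far less than the 2nd, which is why well-supported associations barely move when new data arrives.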

However, there are a couple of reasons why the score may have changed between releases.

  1. New data was ingested. This is particularly relevant for associations which did not previously have much evidence to support them. For associations with a lot of evidence, additional evidence is unlikely to make a difference to the score.
    We may also have removed outdated sources of evidence.

  2. The data scoring was modified. We may revise the weight or scoring of the evidence itself. For example, data from Open Targets Genetics used to be scored based on the p-value of the GWAS association. This was updated when we introduced the Locus-to-Gene model, which is now the basis for Open Targets Genetics evidence scores in the Platform.

We don’t track how much the scores change between releases.
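If you have two sets of overall scores on hand, you can get a rough picture yourself. The sketch below assumes you have already loaded each release into a dictionary keyed by (target, disease) pairs. The function name, the identifiers, and the score values are made up for illustration and are not real Platform data.

```python
def score_changes(old, new, threshold=0.6):
    """Summarise how overall scores differ between two releases."""
    shared = old.keys() & new.keys()
    deltas = [new[key] - old[key] for key in shared]
    return {
        "shared": len(shared),
        "only_old": len(old.keys() - new.keys()),
        "only_new": len(new.keys() - old.keys()),
        # Mean absolute change over associations present in both releases.
        "mean_abs_change": sum(abs(d) for d in deltas) / len(deltas) if deltas else 0.0,
        # Number of "high" associations in each release.
        "high_old": sum(1 for s in old.values() if s > threshold),
        "high_new": sum(1 for s in new.values() if s > threshold),
    }


# Illustrative placeholder data only.
old_release = {
    ("TARGET_A", "DISEASE_X"): 0.70,
    ("TARGET_B", "DISEASE_X"): 0.20,
    ("TARGET_C", "DISEASE_Y"): 0.65,
}
new_release = {
    ("TARGET_A", "DISEASE_X"): 0.55,
    ("TARGET_B", "DISEASE_X"): 0.45,
    ("TARGET_D", "DISEASE_Y"): 0.80,
}

summary = score_changes(old_release, new_release)
print(summary)
```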

It’s worth noting that the association score reflects the amount of evidence available for a particular association, rather than the confidence we have in this association.

There have been some subtle changes to the algorithm to account for bug fixes.

Another fundamental change with the release of the rewritten application was the way evidence is propagated in the ontology. In the past, evidence was duplicated from node to node to ensure that any individual piece of evidence was present in all ancestor EFO terms. As a result, indirect associations were most often driven by terms that were high in the ontology (e.g. neoplasm). With the launch of the rewritten Platform in 2021, the indirect associations for a node are based on the evidence for that node and all the evidence in its descendants, without duplicating any individual piece of evidence. This might have an overall effect on scores.
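As a rough illustration of the current propagation idea (not the Platform's actual pipeline), the sketch below collects the evidence for a term and all of its descendants into a set, so each piece of evidence is counted once even when a descendant is reachable through more than one parent. The toy ontology, the evidence IDs, and the function names are hypothetical.

```python
def descendants(term, children):
    """Return the term itself plus every descendant in the ontology DAG."""
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, []))
    return seen


def indirect_evidence(term, children, direct):
    """Evidence for a term and its descendants, with no duplicates."""
    evidence_ids = set()
    for node in descendants(term, children):
        evidence_ids |= direct.get(node, set())
    return evidence_ids


# Toy ontology: "lung carcinoma" has two parents.
children = {
    "neoplasm": ["carcinoma", "lung neoplasm"],
    "carcinoma": ["lung carcinoma"],
    "lung neoplasm": ["lung carcinoma"],
}
# Direct evidence attached to each term (hypothetical IDs).
direct = {
    "lung carcinoma": {"ev1", "ev2"},
    "carcinoma": {"ev3"},
}

neoplasm_evidence = indirect_evidence("neoplasm", children, direct)
print(sorted(neoplasm_evidence))
```

Using a set means "ev1" and "ev2" appear once for "neoplasm", even though "lung carcinoma" is reached via both "carcinoma" and "lung neoplasm".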

Apart from these details, the scoring remains conceptually the same, but the evidence data has changed significantly within the last 3 years.

Some clues on differences in data and scores between recent releases can be found here as developed by @irene @dsuveges and @Kirill_Tsukanov:
