23.02 Platform release now live!

We have just released the latest update to the Open Targets Platform — 23.02.

Key highlights for this release:

This release integrates 14,611,717 evidence strings to build 6,960,486 target-disease associations between 22,274 diseases or phenotypes and 62,678 targets from the following 22 public resources:

Additionally, the Platform now allows users to explore data on 12,854 drugs or compounds.

For more details, read the 23.02 blog post.

The evidence from Europe PMC seems to be half that from the previous release, in spite of adding patents. What caused this change? It would be useful to put a percentage change for evidence count in each of these release notes.

Hi @Pankaj_Agarwal!

The drop in evidence from Europe PMC is due to a known bug in the pipeline. Because of this bug, we are not processing all the publications that we should be. We are actively working with Europe PMC to resolve this.

However, the drop in associations is less drastic, which suggests that we are not losing crucial evidence.

Thank you for your feedback about including percentage changes. We provide these metrics to give a sense of the amount of data and its distribution, but we do not want to place too much emphasis on the numbers, since we value the quality of the data over its quantity. The amount of evidence will fluctuate over time depending on our data sources and the way we process the data, particularly with a data source like this one. In fact, we are currently introducing some changes to the pipeline that will cause the amount of evidence to drop.

Out of curiosity, would you be willing to share how you use these metrics? Thank you!

Thanks, @hcornu. Can you provide a little more detail on the number of publications not being processed? Should we continue to use the previous release for the Europe PMC data until this bug is fixed, or a union of the two releases?

I agree with your comment about quality, not quantity, but I have found that quantity is important for QC reasons. I use it to check that, when I postprocess the data, nothing has changed that makes me lose a significant amount of it.
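As a rough illustration of that kind of QC check, here is a minimal Python sketch that compares per-source evidence counts between two releases and flags large drops. The function names, the input dictionaries, and the 25% threshold are all hypothetical and are not part of the Platform or its release notes; the counts would have to come from your own tally of each release.

```python
def percent_change(previous, current):
    """Return the percentage change in evidence count for each data source.

    `previous` and `current` map a data source name to its evidence count.
    A source with no evidence in the previous release has no defined
    percentage change, so it maps to None.
    """
    changes = {}
    for source in sorted(set(previous) | set(current)):
        old = previous.get(source, 0)
        new = current.get(source, 0)
        if old == 0:
            changes[source] = None  # new source: change is undefined
        else:
            changes[source] = 100.0 * (new - old) / old
    return changes


def flag_large_drops(changes, threshold=-25.0):
    """List the sources whose count fell by at least |threshold| percent."""
    return [
        source
        for source, pct in changes.items()
        if pct is not None and pct <= threshold
    ]
```

Calling `flag_large_drops(percent_change(old_counts, new_counts))` after each download gives a short list of sources to look at before running any postprocessing.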

I noticed that one of the input files for evidence in the 23.02 release is bzip2-compressed (evidence-files/atlas.json.bz2), while all the other files are gzipped. The platform-etl-backend scripts and reference.conf do not specify that it is bzip2, so the pipeline fails to process this file. I think it tries to load it as a text file, since all evidence files are just “globbed up”.
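For anyone reading these input files outside the pipeline, one possible workaround is to pick the decompressor from the file's leading bytes rather than from its name. The sketch below is plain Python, not the platform-etl-backend code, and the helper names are hypothetical. It also assumes the evidence files hold one JSON object per line, which I have not verified for every file.

```python
import bz2
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream
BZIP2_MAGIC = b"BZh"      # first three bytes of every bzip2 stream


def open_evidence(path):
    """Open a file as UTF-8 text, decompressing gzip or bzip2 if detected."""
    with open(path, "rb") as fh:
        head = fh.read(3)
    if head.startswith(GZIP_MAGIC):
        return gzip.open(path, "rt", encoding="utf-8")
    if head.startswith(BZIP2_MAGIC):
        return bz2.open(path, "rt", encoding="utf-8")
    # Neither signature matched: treat the file as uncompressed text.
    return open(path, "rt", encoding="utf-8")


def read_json_lines(path):
    """Parse every non-empty line of the file as a JSON object."""
    with open_evidence(path) as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Checking the magic bytes means a mixed directory of .gz and .bz2 files can be read with the same loop, whatever extension each file happens to carry.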