Older versions of OpenTargets data

sanjayb100 · 1 September 2024 14:16

A question regarding archived versions of OpenTargets data. The oldest version of OpenTargets data that I can see is from July 2019:
https://ftp.ebi.ac.uk/pub/databases/opentargets/platform/
I wondered if there were older versions of OpenTargets data available elsewhere or on request, and if so, what is the oldest available collated data?

Many thanks,

Sanjay

dsuveges · 2 September 2024 09:06

Hi Sanjay, the FTP address you pasted holds the complete collection of Open Targets Platform data releases. The earliest release is 16.04. However, keep in mind that the data model has changed significantly over the years, which can make a systematic comparison relatively complicated.

Best,
Daniel

ochoa · 2 September 2024 09:18

@sanjayb100 you might want to look at how @eczech and colleagues performed a temporal analysis on the Open Targets data.

eczech · 2 September 2024 16:59

you might want to look at how @eczech and colleagues performed a temporal analysis on the Open Targets data.

It wasn’t in that paper, but we have built some consolidated datasets across OT versions going all the way back to 16.04 (i.e. from 2016). We weren’t merging schemas for raw evidence across years extensively, so we were able to avoid most of the problems @dsuveges mentioned. We created a simple, merged view like this: ot_version, gene_id, disease_id, datasource_id, score. Here is an example of the schemas/data we were merging:

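# Jupyter cell: download a 1,000-line sample of the 16.04 associations file
! curl http://ftp.ebi.ac.uk/pub/databases/opentargets/platform/16.04/16.04_association_data.json.gz \
| gzip -dc | head -n 1000 > /tmp/16.04_association_data.sample.json

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
(
    spark.read.json("/tmp/16.04_association_data.sample.json")
    # Flatten the nested ids and expand one column per datasource
    .select(
        F.col("target.id").alias("gene_id"),
        F.col("disease.id").alias("disease_id"),
        F.col("association_score.datasources.*")
    )
    # Pack the per-datasource columns into an array of (datasource_id, score) structs
    .transform(lambda df: (
        df.select(
            "gene_id", "disease_id",
            F.array(*[
                F.struct(F.lit(c).alias("datasource_id"), F.col(c).alias("score"))
                for c in df.columns if c not in {"gene_id", "disease_id"}
            ]).alias("scores")
        )
    ))
    # Explode to one row per (gene, disease, datasource)
    .select("*", F.explode("scores").alias("score"))
    .select("gene_id", "disease_id", "score.datasource_id", "score.score")
    .printSchema()
)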
root
 |-- gene_id: string (nullable = true)
 |-- disease_id: string (nullable = true)
 |-- datasource_id: string (nullable = false)
 |-- score: double (nullable = true)

Here are a few other things to watch out for in doing this:

Older datasets have an is_direct flag that distinguishes direct associations from those obtained through EFO ancestors. Keeping only the rows where the flag is true gives you the equivalent of the more recent “direct” datasets at Open Targets Downloads. Keeping the rows where it is either true or false gives you the equivalent of the “indirect” datasets in the more recent downloads. That might not be obvious at first, so it’s worth knowing if you’re trying to work across versions.
Some of the folder structures change over time. Starting with version 18.12, the data appears under output directories, so you’ll need a switch on directory layout to handle that.
I believe older datasets included associations with a score of 0 while newer ones don’t, so you’ll likely want to filter out 0 scores or somehow deal with that inconsistency.
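A minimal plain-Python sketch of how those three points might be handled when building the merged view. The function names (association_url, merged_rows) are hypothetical, the file name pattern under output is an assumption that should be checked against the FTP listing for each release, and the top-level position of is_direct in each record is also assumed rather than confirmed here.

```python
from typing import Dict, Iterable, Iterator, Tuple

BASE_URL = "http://ftp.ebi.ac.uk/pub/databases/opentargets/platform"


def version_key(version: str) -> Tuple[int, int]:
    """Turn a release name such as '18.12' into a sortable (year, month) pair."""
    year, month = version.split(".")
    return int(year), int(month)


def association_url(version: str) -> str:
    """Build the associations file URL for a release (hypothetical helper).

    Assumption: from 18.12 onward the file sits under an 'output'
    subdirectory and keeps the same file name pattern as 16.04.
    """
    filename = f"{version}_association_data.json.gz"
    if version_key(version) >= (18, 12):
        return f"{BASE_URL}/{version}/output/{filename}"
    return f"{BASE_URL}/{version}/{filename}"


def merged_rows(
    records: Iterable[Dict], ot_version: str, direct_only: bool = True
) -> Iterator[Dict]:
    """Yield rows of the merged view: ot_version, gene_id, disease_id, datasource_id, score."""
    for rec in records:
        # Assumption: is_direct is a top-level boolean; a missing flag is treated as not direct.
        if direct_only and not rec.get("is_direct", False):
            continue
        for datasource_id, score in rec["association_score"]["datasources"].items():
            # Drop missing and zero scores so older releases line up with newer ones.
            if score is None or score <= 0:
                continue
            yield {
                "ot_version": ot_version,
                "gene_id": rec["target"]["id"],
                "disease_id": rec["disease"]["id"],
                "datasource_id": datasource_id,
                "score": score,
            }
```

Calling merged_rows with direct_only=True corresponds to the “direct” datasets, and direct_only=False to the “indirect” ones. The same filters translate directly to Spark as a where clause on is_direct and on score > 0.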

sanjayb100 · 10 September 2024 17:15

Thanks all! I hadn’t seen that preprint, so I’ll be reading it in detail. Thanks also for the pointers on some of the practical challenges in this approach.
