Help understanding the data contained in the Platform data downloads

koryclick · 9 September 2024 15:46

Hello OT team!
This follows on from the “clarifying scores of evidence” thread, but I am posting it here because it concerns how “Evidence Count” is defined.
I extracted the “score” and “evidenceCount” fields for “diseaseId” = EFO_0000565 and “targetId” = ENSG00000157764 (leukemia and BRAF).
The data come from the "Associations - direct (overall score)" and "Associations - direct (by data source)" JSON files.
The evidenceCount for this association is the same (4) in both files.
Each file returns a single record (dictionary) for this pair:

associationByDatasourceDirect:
"score": 0.16253705352810702,
"datasourceId": "chembl",
"evidenceCount": 4,
"diseaseId": "EFO_0000565"

associationByOverallDirect_view:
"score": 0.09881128059278485,
"targetId": "ENSG00000157764",
"diseaseId": "EFO_0000565",
"evidenceCount": 4
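A minimal Python sketch of the extraction described above. It assumes each download is a directory of newline-delimited JSON part files (one record per line), and that the per-datasource records also carry a targetId field (not shown in the excerpt above). The helper name `find_pair` and the directory names in the comments are hypothetical.

```python
import json
from pathlib import Path


def find_pair(directory, disease_id, target_id):
    """Return every record in the JSON-lines files under `directory`
    whose diseaseId and targetId match the requested pair.

    Assumption: files are newline-delimited JSON named *.json
    (e.g. Spark part files); adjust the glob if your download differs.
    """
    matches = []
    for part in sorted(Path(directory).glob("*.json")):
        with part.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                record = json.loads(line)
                if (record.get("diseaseId") == disease_id
                        and record.get("targetId") == target_id):
                    matches.append(record)
    return matches


# Hypothetical local paths to the unpacked downloads:
# overall = find_pair("associationByOverallDirect", "EFO_0000565", "ENSG00000157764")
# by_source = find_pair("associationByDatasourceDirect", "EFO_0000565", "ENSG00000157764")
```

Scanning line by line keeps memory use low; for a whole release, a dataframe library that reads JSON lines directly would be faster.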

Where in the documentation is the calculation of evidenceCount explained? Am I right that it relates to the ontology evidence? Should I read evidenceCount as meaning that the association between “targetId” and “diseaseId” appears in ONLY 1 data source out of the total of 23 (in OT_24.06), and that this source contributes 4 pieces of evidence? Or does it mean that the string appears 4 times in the association calculation algorithm (which I guess is somewhere in the data processing)?
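One way to test the first reading, sketched in Python under an assumption the quoted documentation does not confirm: that the overall evidenceCount of a direct association is the sum of the per-datasource evidenceCounts for the same pair. The function name `compare_counts` is hypothetical; the input values are copied from the two records above.

```python
def compare_counts(overall, by_datasource):
    """Compare an overall association record with its per-datasource records.

    Assumption (not confirmed by the documentation): the overall
    evidenceCount equals the sum of the per-datasource evidenceCounts.
    """
    per_source = {r["datasourceId"]: r["evidenceCount"] for r in by_datasource}
    total = sum(per_source.values())
    return {
        "datasources": sorted(per_source),
        "sum_of_datasources": total,
        "overall": overall["evidenceCount"],
        "consistent": total == overall["evidenceCount"],
    }


# Values copied from the two records in this post.
overall = {"targetId": "ENSG00000157764", "diseaseId": "EFO_0000565",
           "score": 0.09881128059278485, "evidenceCount": 4}
by_datasource = [{"datasourceId": "chembl", "diseaseId": "EFO_0000565",
                  "score": 0.16253705352810702, "evidenceCount": 4}]

print(compare_counts(overall, by_datasource))
# {'datasources': ['chembl'], 'sum_of_datasources': 4, 'overall': 4, 'consistent': True}
```

With a single ChEMBL record, as here, the check is trivially consistent; it only becomes informative for pairs supported by several data sources.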

Each unique target-disease pair in the Open Targets Platform is defined as an association. For example, while there might be several pieces of evidence referring to CFTR and Cystic fibrosis from multiple sources, one single association contextualises all this information within the Platform. [1]
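The idea in the quoted passage, many evidence records collapsing into one association per target-disease pair, can be sketched as a simple group-by in Python. This assumes evidenceCount is just the number of evidence records behind the pair (overall and per data source); it ignores scoring and the ontology propagation used for indirect associations. The function name and the records below are hypothetical toy data, not Platform output.

```python
from collections import defaultdict


def build_associations(evidence):
    """Collapse evidence records into one association per (targetId, diseaseId).

    Assumption: evidenceCount = number of evidence records for the pair,
    counted overall and per datasourceId. Scores are not computed here.
    """
    grouped = defaultdict(lambda: defaultdict(int))
    for ev in evidence:
        grouped[(ev["targetId"], ev["diseaseId"])][ev["datasourceId"]] += 1

    associations = {}
    for pair, per_source in grouped.items():
        associations[pair] = {
            "evidenceCount": sum(per_source.values()),
            "byDatasource": dict(per_source),
        }
    return associations


# Toy evidence with made-up IDs: three records for one pair from two sources.
evidence = [
    {"targetId": "CFTR", "diseaseId": "cystic_fibrosis", "datasourceId": "source_a"},
    {"targetId": "CFTR", "diseaseId": "cystic_fibrosis", "datasourceId": "source_a"},
    {"targetId": "CFTR", "diseaseId": "cystic_fibrosis", "datasourceId": "source_b"},
    {"targetId": "BRAF", "diseaseId": "leukemia", "datasourceId": "source_a"},
]

for pair, assoc in build_associations(evidence).items():
    print(pair, assoc)
# ('CFTR', 'cystic_fibrosis') {'evidenceCount': 3, 'byDatasource': {'source_a': 2, 'source_b': 1}}
# ('BRAF', 'leukemia') {'evidenceCount': 1, 'byDatasource': {'source_a': 1}}
```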

I just want to be sure I understand the numbers behind these fields and when to use each of them.

Many thanks again!

UPDATE:

I think I found part of my answer in this passage:
" Target - disease evidence

Every event or set of events pinpointing a target as a potential causal gene or protein for a disease, represents the unit of information, most often referred as evidence. Within Open Targets, a series of pipelines ensure information is retrieved from their sources and standardised in a way that can be immediately applied to answer drug development queries."

I was able to reconcile the score and evidenceCount shown in the Platform with those in the downloaded JSON files. Many thanks!
