Help understanding the data contained in the Platform data downloads

imane · 14 March 2022 12:53

I would appreciate it if you could tell me whether metadata is available for the data in the data downloads, because I can’t quite understand some column names or the meaning of some rows. For instance, the diseases table has a column named “sko”; I can’t work out what it means, and I couldn’t find it in the documentation. I also have a hard time understanding the values in the dbXRefs column, for example.

So my request is: could you please point me to documentation that explains what the data is about (the meaning of the columns, for example), since I couldn’t find it anywhere, and guide me to a better understanding of the data available in the Open Targets Platform?

This question was sent to the Open Targets helpdesk and has been posted here so that the answers can benefit the whole Community of users.

hcornu · 15 March 2022 12:48

Hi @imane, and thank you for your question!

I have had a chat with the team and we agree that having metadata would help to make our data more accessible, but we don’t currently have any metadata for specific datasets nor the capacity to create it.

sko is an inherited field name from other datasets, while dbXRefs are database cross-references that we use to help us join our data to other input sources.

For other fields, we suggest taking a look at the GraphQL API schema endpoint, which includes some documentation. For example, disease in the API schema shows:

"Disease or phenotype entity"
type Disease {
  "Open Targets disease id"
  id: String!

  "Disease name"
  name: String!

  "Disease description"
  description: String

  "List of external cross reference IDs"
  dbXRefs: [String!]

  "List of direct location Disease terms"
  directLocationIds: [String!]

  "List of indirect location Disease terms"
  indirectLocationIds: [String!]

  "List of obsolete diseases"
  obsoleteTerms: [String!]

  "Disease synonyms"
  synonyms: [DiseaseSynonyms!]
  ancestors: [String!]!
  descendants: [String!]!
}

You can play around with the API in our GraphQL Playground.

However, it is important to note that not all fields are annotated, and the API names don’t always match the names used in the raw data.

We’re working on documenting our endpoints, but I’m sorry I couldn’t be more helpful at this time!
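For readers who prefer a script to the Playground, here is a minimal sketch of requesting a few of the Disease fields listed above. It assumes the public endpoint URL shown in the code and a `disease(efoId: ...)` query; neither is stated in this thread, so please check both against the current API schema. The helper names `build_payload` and `fetch_disease` are made up for illustration.

```python
import json
import urllib.request

# Assumed public GraphQL endpoint; verify against the Platform documentation.
API_URL = "https://api.platform.opentargets.org/api/v4/graphql"

# Request a handful of the documented Disease fields.
DISEASE_QUERY = """
query diseaseInfo($efoId: String!) {
  disease(efoId: $efoId) {
    id
    name
    description
    dbXRefs
  }
}
"""


def build_payload(efo_id: str) -> dict:
    """Build the JSON body for a GraphQL POST request (hypothetical helper)."""
    return {"query": DISEASE_QUERY, "variables": {"efoId": efo_id}}


def fetch_disease(efo_id: str) -> dict:
    """Send the query and return the decoded JSON response (needs network access)."""
    body = json.dumps(build_payload(efo_id)).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `fetch_disease("EFO_0000565")` should return the selected fields for that disease ID, provided the endpoint and query shape above are still current.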

imane · 25 March 2022 13:25

Could you please help me understand how the “score” in the “associationByOverallDirect” table is calculated and what it means?

I would also appreciate it if you could explain what “evidenceCount” means in the same table.

hcornu · 25 March 2022 15:31

Hi @imane!

associationByOverallDirect contains the overall association score calculated using only direct evidence, and evidenceCount is the number of evidence strings that support that association.

For more information about how the scoring is calculated, we recommend you take a look at the Platform documentation: Target - disease associations - Open Targets Platform Documentation

Helena
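As a rough illustration of how an overall score can relate to per-data-source scores, here is a sketch that assumes the harmonic-sum scheme described in the Platform documentation: weighted data source scores are sorted in descending order, each is divided by the square of its rank, and the total is divided by a maximum theoretical harmonic sum. The ChEMBL weight of 1.0 and the 100,000-term normalisation are assumptions made for this example and are not confirmed anywhere in this thread. The input value is the ChEMBL score for EFO_0000565 and ENSG00000157764 quoted later in this thread.

```python
def harmonic_sum(scores, weights=None):
    """Sum weighted scores in descending order, dividing each by rank squared."""
    if weights is None:
        weights = [1.0] * len(scores)  # assumption: every source has weight 1.0
    weighted = sorted((s * w for s, w in zip(scores, weights)), reverse=True)
    return sum(value / (rank ** 2) for rank, value in enumerate(weighted, start=1))


def max_harmonic_sum(n_terms=100_000):
    """Harmonic sum of n_terms scores all equal to 1 (n_terms is an assumption)."""
    return sum(1.0 / (rank ** 2) for rank in range(1, n_terms + 1))


def overall_score(datasource_scores, weights=None, n_terms=100_000):
    """Normalise the harmonic sum by the assumed maximum theoretical value."""
    return harmonic_sum(datasource_scores, weights) / max_harmonic_sum(n_terms)


# Single data source (ChEMBL) score reported later in this thread.
chembl_score = 0.16253705352810702
estimate = overall_score([chembl_score])
print(round(estimate, 4))
```

Under these assumptions the estimate comes out at roughly 0.0988, which is in line with the overall score reported for the same pair later in the thread. Treat this as a consistency check only; it is no substitute for the scoring documentation.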

koryclick · 21 August 2024 11:37

Hello!

I have a question about disease > ontology > leaf. What does “leaf” mean?

irene · 21 August 2024 13:37

Leaves are disease terms at the very bottom of a branch, i.e. terms without children. Cowpox is a leaf term; syphilis is not.
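The definition above can be sketched in a few lines. The hierarchy below is toy data invented for illustration, not the real ontology; only the leaf status of cowpox and syphilis comes from the answer above.

```python
# Toy parent -> children mapping; hypothetical and NOT the real disease ontology.
children_by_term = {
    "infectious disease": ["cowpox", "syphilis"],
    "syphilis": ["neurosyphilis"],
    "cowpox": [],
    "neurosyphilis": [],
}


def is_leaf(term, hierarchy):
    """A term is a leaf when it has no children in the hierarchy."""
    return not hierarchy.get(term)


def leaf_terms(hierarchy):
    """Return every leaf term, including terms that only appear as children."""
    all_terms = set(hierarchy)
    for kids in hierarchy.values():
        all_terms.update(kids)
    return sorted(term for term in all_terms if is_leaf(term, hierarchy))


print(is_leaf("cowpox", children_by_term))    # True: no children
print(is_leaf("syphilis", children_by_term))  # False: has a child term
print(leaf_terms(children_by_term))
```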

koryclick · 9 September 2024 15:46

Hello OT team!
This is along the same lines as “clarifying scores of evidence”, but I am writing here because “evidenceCount” is defined in this thread.
I extracted the “score” and the “evidenceCount” for “diseaseId”=EFO_0000565 and “targetId”=ENSG00000157764 (leukemia and BRAF).
The information comes from the “Associations - direct (overall score)” and “Associations - direct (by data source)” JSON files.
The evidenceCount for this association is the same in both JSON files (4).
The result is just one dictionary for each JSON file.

associationByDatasourceDirect:
"score": 0.16253705352810702,
"datasourceId": "chembl",
"evidenceCount": 4,
"diseaseId": "EFO_0000565",

associationByOverallDirect_view:
"score": 0.09881128059278485,
"targetId": "ENSG00000157764",
"diseaseId": "EFO_0000565",
"evidenceCount": 4

Which part of the documentation explains how evidenceCount is calculated? I understand it is the ontology evidence? Should I assume that evidenceCount means the ontological association between “targetId” and “diseaseId” appears in ONLY 1 data source out of the total of 23 (in OT_24.06), with a count of 4? Or does the string appear 4 times in the ontology association calculation algorithm (which I guess is somehow related to the data processing)?

Each unique target-disease pair in the Open Targets Platform is defined as an association. For example, while there might be several pieces of evidence referring to CFTR and Cystic fibrosis from multiple sources, one single association contextualises all this information within the Platform. [1]

Just to be sure I understand the numbers behind and when to use them.

Many thanks again!

UPDATE:

I think I found part of my answer in this piece of text:
"Target - disease evidence

Every event or set of events pinpointing a target as a potential causal gene or protein for a disease, represents the unit of information, most often referred as evidence. Within Open Targets, a series of pipelines ensure information is retrieved from their sources and standardised in a way that can be immediately applied to answer drug development queries."

I was able to establish the comparison between the score and evidenceCount in the platform and the downloaded JSON files. Many thanks!
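For anyone wanting to repeat this kind of lookup, here is a minimal sketch of pulling the records for one target-disease pair out of the downloaded files. It assumes the JSON downloads are newline-delimited (one record per line), which is not stated in this thread. The first record in each sample mirrors the values quoted above (with targetId added to the by-data-source record, since it was one of the filter values); the second record in each sample is a made-up placeholder.

```python
import io
import json


def find_association(lines, disease_id, target_id):
    """Return all records matching the given disease and target IDs."""
    matches = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("diseaseId") == disease_id and record.get("targetId") == target_id:
            matches.append(record)
    return matches


# In-memory stand-ins for the downloaded files (assumed newline-delimited JSON).
overall_file = io.StringIO(
    '{"score": 0.09881128059278485, "targetId": "ENSG00000157764", '
    '"diseaseId": "EFO_0000565", "evidenceCount": 4}\n'
    '{"score": 0.5, "targetId": "ENSG_PLACEHOLDER", '
    '"diseaseId": "EFO_PLACEHOLDER", "evidenceCount": 1}\n'
)
by_datasource_file = io.StringIO(
    '{"score": 0.16253705352810702, "datasourceId": "chembl", "evidenceCount": 4, '
    '"diseaseId": "EFO_0000565", "targetId": "ENSG00000157764"}\n'
    '{"score": 0.5, "datasourceId": "placeholder_source", "evidenceCount": 1, '
    '"diseaseId": "EFO_PLACEHOLDER", "targetId": "ENSG_PLACEHOLDER"}\n'
)

overall = find_association(overall_file, "EFO_0000565", "ENSG00000157764")
by_source = find_association(by_datasource_file, "EFO_0000565", "ENSG00000157764")

# Compare the overall evidence count with the sum across data sources.
total_by_source = sum(record["evidenceCount"] for record in by_source)
print(overall[0]["evidenceCount"], total_by_source)
```

With real downloads, an open file handle for each part file can be passed in place of the `io.StringIO` objects.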
