Help understanding the data contained in the Platform data downloads

I would appreciate it if you could tell me whether metadata is available for the data in the data downloads, because I can’t quite understand some of the column names or the meaning of some rows. For instance, the diseases table has a column named “sko”; I can’t work out what it means, and I couldn’t find anything about it in the documentation. I also have a hard time understanding the values in the dbXRefs column, for example.

So my request is: could you please point me to documentation that explains the data (what the columns mean, for example), since I couldn’t find it anywhere, and guide me to a better understanding of the data available in the Open Targets Platform?

This question was sent to the Open Targets helpdesk and has been posted here so that the answers can benefit the whole Community of users.

Hi @imane, and thank you for your question!

I have had a chat with the team and we agree that having metadata would help to make our data more accessible, but we don’t currently have any metadata for specific datasets nor the capacity to create it.

sko is a field name inherited from other datasets, while dbXRefs are database cross-references that we use to help us join our data to other input sources.

For other fields, we suggest taking a look at the GraphQL API schema endpoint, which includes some documentation. For example, disease in the API schema shows:

"Disease or phenotype entity"
type Disease {
  "Open Targets disease id"
  id: String!

  "Disease name"
  name: String!

  "Disease description"
  description: String

  "List of external cross reference IDs"
  dbXRefs: [String!]

  "List of direct location Disease terms"
  directLocationIds: [String!]

  "List of indirect location Disease terms"
  indirectLocationIds: [String!]

  "List of obsolete diseases"
  obsoleteTerms: [String!]

  "Disease synonyms"
  synonyms: [DiseaseSynonyms!]
  ancestors: [String!]!
  descendants: [String!]!

You can play around with the API in our GraphQL Playground.
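If you would rather query the API from a script than from the Playground, here is a minimal Python sketch that requests a few of the Disease fields shown above. It makes some assumptions that are not stated in this thread: the endpoint URL, the `disease(efoId: ...)` query argument, and the example ID `EFO_0000270` should all be checked against the live schema before you rely on them. The function names are just illustrative.

```python
import json
import urllib.request

# Assumption: public GraphQL endpoint; confirm the current URL in the Platform docs.
API_URL = "https://api.platform.opentargets.org/api/v4/graphql"

# Requests a subset of the Disease fields listed in the schema excerpt.
# Assumption: the root query field is `disease` and takes an `efoId` argument.
DISEASE_QUERY = """
query diseaseInfo($efoId: String!) {
  disease(efoId: $efoId) {
    id
    name
    description
    dbXRefs
  }
}
"""


def build_payload(efo_id):
    """Build the JSON body for a GraphQL POST request."""
    return {"query": DISEASE_QUERY, "variables": {"efoId": efo_id}}


def fetch_disease(efo_id, url=API_URL):
    """POST the query and return the decoded JSON response (needs network access)."""
    body = json.dumps(build_payload(efo_id)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Example call (uncomment to run against the live API):
# print(json.dumps(fetch_disease("EFO_0000270"), indent=2))
```

The response should contain a `data.disease` object with the requested fields, which lets you compare, for instance, the dbXRefs values returned by the API with the ones in the downloaded diseases table.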

However, it is important to note that not all fields are annotated, and the API names don’t always match the names used in the raw data.
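To see which fields of a type are annotated, standard GraphQL introspection can help. The sketch below is only an illustration: the query uses the `__type` introspection field from the GraphQL specification, the helper name is hypothetical, and the sample response is hand-written from the schema excerpt above rather than fetched from the API.

```python
# Standard GraphQL introspection query: lists each field of a type with its description.
INTROSPECTION_QUERY = """
query typeDocs($typeName: String!) {
  __type(name: $typeName) {
    name
    description
    fields {
      name
      description
    }
  }
}
"""


def split_by_annotation(introspection_response):
    """Return (documented, undocumented) field names from an introspection response."""
    type_info = (introspection_response.get("data") or {}).get("__type") or {}
    documented, undocumented = [], []
    for field in type_info.get("fields") or []:
        # A missing, null, or blank description counts as "not annotated".
        if (field.get("description") or "").strip():
            documented.append(field["name"])
        else:
            undocumented.append(field["name"])
    return documented, undocumented


# Hand-written sample mirroring part of the Disease excerpt (not a real API response).
sample_response = {
    "data": {
        "__type": {
            "name": "Disease",
            "description": "Disease or phenotype entity",
            "fields": [
                {"name": "id", "description": "Open Targets disease id"},
                {"name": "synonyms", "description": "Disease synonyms"},
                {"name": "ancestors", "description": None},
                {"name": "descendants", "description": None},
            ],
        }
    }
}

documented, undocumented = split_by_annotation(sample_response)
print("Annotated:", documented)
print("Not annotated:", undocumented)
```

Sending `INTROSPECTION_QUERY` with `{"typeName": "Disease"}` as variables to the API should return a response of the same shape, assuming introspection is enabled on the endpoint.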

We’re working on documenting our endpoints, but I’m sorry I couldn’t be more helpful at this time!

Could you please help me understand how the “score” in the “associationByOverallDirect” table is calculated and what it means?

I would also appreciate it if you could explain what the “evidenceCount” in the same table is.

Hi @imane!

associationByOverallDirect contains the overall association score calculated using only direct evidence, and evidenceCount is the number of evidence strings that support that association.

For more information about how the scoring is calculated, we recommend you take a look at the Platform documentation: Target - disease associations - Open Targets Platform Documentation
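As a rough illustration of how the two columns can be used together, the Python sketch below ranks association rows by score and drops those supported by very few evidence strings. Only `score` and `evidenceCount` come from this thread; the `diseaseId` and `targetId` column names, the placeholder IDs, and all the numbers are made up for the example, so please check them against the actual table.

```python
def top_associations(rows, min_evidence_count=1, limit=10):
    """Keep rows with enough supporting evidence, ranked by descending score."""
    kept = [row for row in rows if row["evidenceCount"] >= min_evidence_count]
    return sorted(kept, key=lambda row: row["score"], reverse=True)[:limit]


# Made-up rows with placeholder IDs; real rows would come from the downloaded table.
example_rows = [
    {"diseaseId": "EFO_A", "targetId": "ENSG_B", "score": 0.35, "evidenceCount": 3},
    {"diseaseId": "EFO_A", "targetId": "ENSG_A", "score": 0.82, "evidenceCount": 140},
    {"diseaseId": "EFO_A", "targetId": "ENSG_C", "score": 0.61, "evidenceCount": 1},
]

for row in top_associations(example_rows, min_evidence_count=2):
    print(row["targetId"], row["score"], row["evidenceCount"])
```

Here the row for ENSG_C is dropped because it is backed by a single evidence string, even though its score is higher than that of ENSG_B.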

Helena