Data statistics of Open Targets

I’m interested in some general data statistics for Open Targets. I’m looking for the two types of counts described below.

  • How can I get the counts of the different data types (genes, variants, proteins, drugs, diseases, etc.) that are curated in Open Targets?
  • How can I get the counts of associations between these different data types (gene-variant, drug-protein, disease-drug, etc.)?

Hey @vthawfeek! Welcome to the Open Targets Community :tada:

  • You can find the counts of different data types in the Open Targets Platform in our release notes, which are posted on the Community, blog, and documentation for each release.

Here are the counts from the latest release — 22.06:

  • For the counts of associations, most of this information should be available in the evidence files from our data downloads. You can check the evidence schema to find the fields you are interested in, but the main ones are probably:

    • variantId (same data format as Open Targets Genetics)
    • targetId (Ensembl gene ID)
    • drugId (ChEMBL ID) or drugFromSource (string)
    • diseaseFromSource (string) or diseaseId (EFO)
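
As a rough sketch of how those fields could be turned into counts, assuming the evidence has been downloaded as JSON-lines files (one record per line) and that each record carries the field names listed above. The helper names `read_json_lines` and `count_distinct` and the sample records are made up for illustration; they are not part of any Open Targets tooling, and the identifiers are placeholders rather than real data.

```python
import json
from typing import Iterable, Iterator, Tuple


def read_json_lines(path: str) -> Iterator[dict]:
    """Yield one evidence record per non-empty line of a JSON-lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def count_distinct(records: Iterable[dict], fields: Tuple[str, ...]) -> int:
    """Count distinct combinations of `fields`.

    Records where any of the fields is missing or null are skipped,
    since not every evidence record carries every identifier.
    """
    seen = set()
    for record in records:
        values = tuple(record.get(field) for field in fields)
        if all(value is not None for value in values):
            seen.add(values)
    return len(seen)


# Placeholder records, for illustration only (not real Open Targets data).
sample = [
    {"targetId": "ENSG_A", "diseaseId": "EFO_1", "drugId": "CHEMBL_X"},
    {"targetId": "ENSG_A", "diseaseId": "EFO_1", "variantId": "1_100_A_G"},
    {"targetId": "ENSG_B", "diseaseId": "EFO_1"},
]

print(count_distinct(sample, ("targetId",)))              # distinct targets
print(count_distinct(sample, ("targetId", "diseaseId")))  # target-disease pairs
print(count_distinct(sample, ("drugId", "diseaseId")))    # drug-disease pairs
```

For a real download you would pass `read_json_lines("<path to one evidence file>")` instead of `sample`, and combine the results across all files. The full evidence set is large, so a dataframe or Spark-based approach may be more practical than plain Python.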

I hope this helps!


Thanks @hcornu! This is indeed very helpful.