Hi,
I can confirm, the Platform dataset has been updated on BigQuery! This query:
SELECT
  evidence.datasourceId,
  COUNT(*) AS record_count
FROM
  `bigquery-public-data.open_targets_platform.evidence_*` AS evidence
WHERE
  evidence.targetId = "ENSG00000139618"
GROUP BY
  evidence.datasourceId
ORDER BY
  record_count DESC;
gives this output:
1 eva 71084
2 europepmc 21709
3 gene_burden 609
4 gwas_credible_sets 463
5 cancer_gene_census 246
6 eva_somatic 118
7 genomics_england 106
8 uniprot_variants 96
9 impc 67
10 intogen 19
11 cancer_biomarkers 12
12 expression_atlas 11
13 reactome 8
14 uniprot_literature 7
15 orphanet 5
16 gene2phenotype 3
17 clingen 2
18 crispr 1
19 crispr_screen 1
Each evidence source is now stored in its own table, named evidence_{sourceId}. If you want to read all evidence, you’ll need to read every table with a wildcard: FROM `bigquery-public-data.open_targets_platform.evidence_*` AS evidence. From now on, if you want to work with evidence from a single source, you can read that source’s table directly instead of filtering on datasourceId.
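As a minimal sketch of reading a single source directly: the table name evidence_eva below is an assumption, derived from the evidence_{sourceId} pattern and the eva datasourceId in the output above, so please check it against the dataset before relying on it.

```sql
-- Assumption: the EVA evidence table is named evidence_eva,
-- following the evidence_{sourceId} naming pattern.
SELECT
  COUNT(*) AS record_count
FROM
  `bigquery-public-data.open_targets_platform.evidence_eva` AS evidence
WHERE
  evidence.targetId = "ENSG00000139618";
```

No datasourceId filter is needed here, because the table only holds evidence from that one source.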
One caveat: for some reason, Google syncs the data without dropping tables that no longer exist in our datasets, so the old evidence table is still there. Please don’t use it, and don’t read all evidence as evidence* (without the underscore), because that wildcard also matches the stale table. We’ll reach out to Google to make sure this table is not propagated in the next release.
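If you want to check which evidence tables are currently present, including the leftover one, a query along these lines against BigQuery’s INFORMATION_SCHEMA.TABLES view should list them. This is only a sketch, and it assumes you have permission to read the dataset’s metadata.

```sql
-- List every table in the dataset whose name starts with "evidence".
-- The plain "evidence" table (no suffix) is the stale one to avoid.
SELECT
  table_name
FROM
  `bigquery-public-data.open_targets_platform.INFORMATION_SCHEMA.TABLES`
WHERE
  STARTS_WITH(table_name, "evidence")
ORDER BY
  table_name;
```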
Please let us know if you still find anything strange.
Best,
Daniel
