Need help with my Google BigQuery - It appears that europepmc evidence now returns zero records

The association_by_datasource_direct table returns data for datasourceId='europepmc', but the underlying evidence table has no corresponding records for the same gene and datasource. This prevents us from retrieving literature PMIDs associated with disease-gene relationships.

Affected Gene (Example)

  • Gene: BRCA2
  • Ensembl ID: ENSG00000139618

Query 1: Association Table (WORKS - Returns 611 rows)

SELECT DISTINCT
targets.approvedSymbol AS gene,
diseases.name AS disease,
associations.score AS score
FROM
`bigquery-public-data.open_targets_platform.association_by_datasource_direct` AS associations
JOIN
`bigquery-public-data.open_targets_platform.disease` AS diseases
ON associations.diseaseId = diseases.id
JOIN
`bigquery-public-data.open_targets_platform.target` AS targets
ON associations.targetId = targets.id
WHERE
associations.targetId = 'ENSG00000139618'
AND associations.datasourceId = 'europepmc'
ORDER BY score DESC
LIMIT 10;

Result: ✅ Returns 611 rows with associations


Query 2: Evidence Table (FAILS - Returns 0 rows)

SELECT
targets.approvedSymbol AS gene,
diseases.name AS disease,
evidence.literature.list AS pmids
FROM
`bigquery-public-data.open_targets_platform.evidence` AS evidence
JOIN
`bigquery-public-data.open_targets_platform.target` AS targets
ON evidence.targetId = targets.id
JOIN
`bigquery-public-data.open_targets_platform.disease` AS diseases
ON evidence.diseaseId = diseases.id
WHERE
evidence.targetId = 'ENSG00000139618'
AND evidence.datasourceId = 'europepmc'
LIMIT 10;

Result: ❌ Returns 0 rows


Query 3: What datasources ARE available in evidence table?

SELECT
evidence.datasourceId,
COUNT(*) AS record_count
FROM
`bigquery-public-data.open_targets_platform.evidence` AS evidence
WHERE
evidence.targetId = 'ENSG00000139618'
GROUP BY
evidence.datasourceId
ORDER BY
record_count DESC;

Result:

datasourceId record_count
cancer_biomarkers 12

Note: europepmc is completely absent from the evidence table for this gene.


Expected Behavior

If association_by_datasource_direct contains aggregated scores from europepmc for a gene, we expect the underlying evidence records to be available in the evidence table with the same datasourceId.

Actual Behavior

  • association_by_datasource_direct has 611 europepmc records for BRCA2
  • evidence table has 0 europepmc records for BRCA2
  • Only cancer_biomarkers (12 records) exists in the evidence table
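
The same mismatch can be checked for every datasource in one pass. This is a minimal sketch, assuming both tables expose `targetId` and `datasourceId` as in the queries above; the CTE names `assoc` and `ev` are illustrative only.

```sql
-- Row counts per datasource in the association table for BRCA2.
WITH assoc AS (
  SELECT datasourceId, COUNT(*) AS association_rows
  FROM `bigquery-public-data.open_targets_platform.association_by_datasource_direct`
  WHERE targetId = 'ENSG00000139618'
  GROUP BY datasourceId
),
-- Row counts per datasource in the evidence table for the same gene.
ev AS (
  SELECT datasourceId, COUNT(*) AS evidence_rows
  FROM `bigquery-public-data.open_targets_platform.evidence`
  WHERE targetId = 'ENSG00000139618'
  GROUP BY datasourceId
)
-- FULL OUTER JOIN keeps datasources that appear in only one of the tables.
SELECT
  COALESCE(assoc.datasourceId, ev.datasourceId) AS datasourceId,
  IFNULL(assoc.association_rows, 0) AS association_rows,
  IFNULL(ev.evidence_rows, 0) AS evidence_rows
FROM assoc
FULL OUTER JOIN ev
  ON assoc.datasourceId = ev.datasourceId
ORDER BY association_rows DESC;
```

Any datasource with a non-zero association_rows and an evidence_rows of 0 has associations but no underlying evidence records in the dataset.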

Questions:

  1. Is europepmc evidence data intentionally excluded from the BigQuery public dataset?
  2. Is there an alternative table or method to retrieve the underlying literature PMIDs?
  3. Was this data available previously and deprecated?

Hi bhattu,

Thank you for reporting this issue! With 25.12, we changed how we represent evidence: the previously unified dataset was split into datasource-specific evidence datasets, each with its own schema. Because of this change, each upload overwrote the evidence dataset in BigQuery in turn, leaving an incomplete dataset that contains only one datasource (the last one uploaded). We are updating our datasets in BigQuery to reflect these changes. I'll keep you posted once the process is finished.
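
In the meantime, one way to see which evidence tables are currently visible in the public dataset is to query its INFORMATION_SCHEMA. This is only a sketch: it assumes the datasource-specific tables share an "evidence" name prefix, which is a guess and may not match the final naming.

```sql
-- List tables in the public dataset whose names start with "evidence".
-- Assumption: the datasource-specific tables use an "evidence" prefix;
-- adjust the filter if the naming turns out to be different.
SELECT
  table_name,
  table_type
FROM
  `bigquery-public-data.open_targets_platform.INFORMATION_SCHEMA.TABLES`
WHERE
  STARTS_WITH(table_name, 'evidence')
ORDER BY
  table_name;
```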

Best,
Daniel

Hi bhattu,

We have fixed the issue and pushed the updated evidence dataset to BigQuery. However, Google only syncs data on the first day of each month, so the update will not be public until 1 February. Sorry for the inconvenience.

Best,
Daniel