The association_by_datasource_direct table returns rows for datasourceId = 'europepmc', but the underlying evidence table has no records for the same gene and datasource. As a result, we cannot retrieve the literature PMIDs behind these disease-gene associations.
Affected Gene (Example)
- Gene: BRCA2
- Ensembl ID: ENSG00000139618
Query 1: Association Table (WORKS - Returns 611 rows)
SELECT DISTINCT
targets.approvedSymbol AS gene,
diseases.name AS disease,
associations.score AS score
FROM
`bigquery-public-data.open_targets_platform.association_by_datasource_direct` AS associations
JOIN
`bigquery-public-data.open_targets_platform.disease` AS diseases
ON associations.diseaseId = diseases.id
JOIN
`bigquery-public-data.open_targets_platform.target` AS targets
ON associations.targetId = targets.id
WHERE
associations.targetId = 'ENSG00000139618'
AND associations.datasourceId = 'europepmc'
ORDER BY score DESC
LIMIT 10;
Result:
Returns 611 rows with associations
Query 2: Evidence Table (FAILS - Returns 0 rows)
SELECT
targets.approvedSymbol AS gene,
diseases.name AS disease,
evidence.literature.list AS pmids
FROM
`bigquery-public-data.open_targets_platform.evidence` AS evidence
JOIN
`bigquery-public-data.open_targets_platform.target` AS targets
ON evidence.targetId = targets.id
JOIN
`bigquery-public-data.open_targets_platform.disease` AS diseases
ON evidence.diseaseId = diseases.id
WHERE
evidence.targetId = 'ENSG00000139618'
AND evidence.datasourceId = 'europepmc'
LIMIT 10;
Result:
Returns 0 rows
Query 3: Which datasources ARE available in the evidence table?
SELECT
evidence.datasourceId,
COUNT(*) AS record_count
FROM
`bigquery-public-data.open_targets_platform.evidence` AS evidence
WHERE
evidence.targetId = 'ENSG00000139618'
GROUP BY
evidence.datasourceId
ORDER BY
record_count DESC;
Result:
| datasourceId | record_count |
|---|---|
| cancer_biomarkers | 12 |
Note: europepmc is completely absent from the evidence table for this gene.
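A further check that may help narrow this down is to drop the gene filter and list every datasourceId in the evidence table. This is only a sketch reusing the table and columns from Query 3; I have not included its output here. If europepmc is missing from this list as well, the gap is dataset-wide rather than specific to BRCA2.

```sql
-- Sketch: same aggregation as Query 3, but without the targetId filter,
-- to see whether europepmc appears anywhere in the evidence table.
SELECT
  evidence.datasourceId,
  COUNT(*) AS record_count
FROM
  `bigquery-public-data.open_targets_platform.evidence` AS evidence
GROUP BY
  evidence.datasourceId
ORDER BY
  record_count DESC;
```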
Expected Behavior
If association_by_datasource_direct contains aggregated scores from europepmc for a gene, we expect the underlying evidence records to be available in the evidence table with the same datasourceId.
Actual Behavior
- association_by_datasource_direct has 611 europepmc records for BRCA2
- evidence table has 0 europepmc records for BRCA2
- Only cancer_biomarkers (12 records) exists in the evidence table
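The mismatch summarized above could also be shown in a single result set by counting rows per datasource in both tables and joining the counts. The following is a minimal sketch under the assumption that both tables expose targetId and datasourceId as used in Queries 1 to 3; the aliases assoc, ev, association_rows, and evidence_rows are illustrative names, not part of the dataset.

```sql
-- Sketch: per-datasource row counts for BRCA2 in both tables, side by side.
-- A datasource with association_rows > 0 and evidence_rows = 0 reproduces the issue.
WITH assoc AS (
  SELECT
    datasourceId,
    COUNT(*) AS association_rows
  FROM
    `bigquery-public-data.open_targets_platform.association_by_datasource_direct`
  WHERE
    targetId = 'ENSG00000139618'
  GROUP BY
    datasourceId
),
ev AS (
  SELECT
    datasourceId,
    COUNT(*) AS evidence_rows
  FROM
    `bigquery-public-data.open_targets_platform.evidence`
  WHERE
    targetId = 'ENSG00000139618'
  GROUP BY
    datasourceId
)
SELECT
  assoc.datasourceId,
  assoc.association_rows,
  IFNULL(ev.evidence_rows, 0) AS evidence_rows  -- 0 when no evidence rows exist
FROM
  assoc
LEFT JOIN
  ev
  ON assoc.datasourceId = ev.datasourceId
ORDER BY
  assoc.association_rows DESC;
```

The LEFT JOIN keeps every datasource present in the association table, so datasources without evidence rows still appear instead of being dropped.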
Questions:
- Is europepmc evidence data intentionally excluded from the BigQuery public dataset?
- Is there an alternative table or method to retrieve the underlying literature PMIDs?
- Was this data previously available and later removed?