Why are there some inconsistencies in evidence count?

charlie · 6 September 2023 08:10

Hello Open Targets,

My main goal is to find all evidence that is used to calculate the association score. For this purpose, I encountered two related problems listed as follows.

Inconsistency between Webpage and BigQuery

On the webpage entitled “Evidence for JAK2 in myeloproliferative disorder”

Link: Open Targets Platform

Figure 1:

There are 11 entries and 8 entries annotated under the Cancer Gene Census and the ClinVar (somatic) respectively. (I box them in red)

However, if I use Google Cloud BigQuery (codes in Figure 2) to see evidenceCount, there is one inconsistent part: there are only 7 pieces of evidence under the “eva_somatic” (Figure 3) instead of 8 entries shown on the web page. Could you point out there are any mistakes I made in the query or any misunderstanding to this issue?

Figure 2:

Figure 3:

How to obtain indirect evidence via BigQuery?

As you pointed out that indirect association scores are calculated from the indirect evidence in the issue below, I tried to find indirect evidence via BigQuery.

But, I only found “open-targets-prod.platform.evidence” stored on the cloud. If I use the following codes in Figure 4, there are only five pieces of evidence in total. I know these pieces of evidence are direct. Thus, my question is where I can find indirect evidence on the Cloud so that I can obtain the same number of evidence listed on the webpage.

Figure 4
evidence_code

Charlie

ochoa · 6 September 2023 11:00

Your assumptions are correct. The indirect evidence is not available in a dataset, but you can easily reproduce it using the evidence dataset + the disease dataset. You will first need to explode the diseases ancestors field to get all the paths from all the EFOs and the root. Once you have a 2 column dataset with the EFO id and the EFO id of the ontology propagated term, you can perform a JOIN operation.

It requires a lit bit of extra SQL skills, but you can always ask the community if you get stuck. Good luck!

charlie · 7 September 2023 08:51

Thanks for the reply. After using the following query, I indeed get a list of EFO and GO items. But I have no idea as how to get all the paths from all the EFOs and the root in BigQuery. May you tell me a clear query command? Thanks.

SELECT
ancestors
FROM
open-targets-prod.platform.diseases

Charlie

ochoa · 7 September 2023 10:54

You can find more help working with nested structures in BigQuery here:

The next example will give you the ancestors:

SELECT
  id,
  ancestor
FROM `bigquery-public-data.open_targets_platform.diseases` AS d
CROSS JOIN UNNEST(d.ancestors.list) AS ancestor;

When doing the subsequent join beware not to drop the direct evidence. You can also explore using a LEFT JOIN instead of the CROSS JOIN depending what you want to achieve.

Topic		Replies	Views
The number of evidences from "open-targets-prod.platform.evidence" does not match "evidenceCount" Google BigQuery/Cloud	6	280	11 August 2023
Discrepancy in Evidence Counts Between GraphQL API and BigQuery for enableIndirect: true Google BigQuery/Cloud data	5	67	14 February 2025
Has OpenTargets significantly changed the number of high associations (e.g. overall association > 0.6) in the last 3 years? Technical Support data	2	266	16 September 2022
How does the Platform display direct and indirect evidence? Frequently Asked Questions	7	863	13 May 2022
Evidence missing from JSON full data download Data Access	4	93	3 June 2024

Why are there some inconsistencies in evidence count?

Related topics