Why are there some inconsistencies in evidence count?

Hello Open Targets,

My main goal is to find all evidence that is used to calculate the association score. For this purpose, I encountered two related problems listed as follows.

  1. Inconsistency between Webpage and BigQuery

On the webpage entitled “Evidence for JAK2 in myeloproliferative disorder”

Link: Open Targets Platform

Figure 1:

There are 11 entries and 8 entries annotated under the Cancer Gene Census and the ClinVar (somatic) respectively. (I box them in red)

However, if I use Google Cloud BigQuery (codes in Figure 2) to see evidenceCount, there is one inconsistent part: there are only 7 pieces of evidence under the “eva_somatic” (Figure 3) instead of 8 entries shown on the web page. Could you point out there are any mistakes I made in the query or any misunderstanding to this issue?

Figure 2:

Figure 3:

  1. How to obtain indirect evidence via BigQuery?

As you pointed out that indirect association scores are calculated from the indirect evidence in the issue below, I tried to find indirect evidence via BigQuery.

But, I only found “open-targets-prod.platform.evidence” stored on the cloud. If I use the following codes in Figure 4, there are only five pieces of evidence in total. I know these pieces of evidence are direct. Thus, my question is where I can find indirect evidence on the Cloud so that I can obtain the same number of evidence listed on the webpage.

Figure 4
evidence_code

Charlie

Your assumptions are correct. The indirect evidence is not available in a dataset, but you can easily reproduce it using the evidence dataset + the disease dataset. You will first need to explode the diseases ancestors field to get all the paths from all the EFOs and the root. Once you have a 2 column dataset with the EFO id and the EFO id of the ontology propagated term, you can perform a JOIN operation.

It requires a lit bit of extra SQL skills, but you can always ask the community if you get stuck. Good luck!

1 Like

Thanks for the reply. After using the following query, I indeed get a list of EFO and GO items. But I have no idea as how to get all the paths from all the EFOs and the root in BigQuery. May you tell me a clear query command? Thanks.

SELECT
ancestors
FROM
open-targets-prod.platform.diseases

Charlie

You can find more help working with nested structures in BigQuery here:

The next example will give you the ancestors:

SELECT
  id,
  ancestor
FROM `bigquery-public-data.open_targets_platform.diseases` AS d
CROSS JOIN UNNEST(d.ancestors.list) AS ancestor;

When doing the subsequent join beware not to drop the direct evidence. You can also explore using a LEFT JOIN instead of the CROSS JOIN depending what you want to achieve.