There are 11 entries and 8 entries annotated under the Cancer Gene Census and the ClinVar (somatic) respectively. (I box them in red)
However, if I use Google Cloud BigQuery (codes in Figure 2) to see evidenceCount, there is one inconsistent part: there are only 7 pieces of evidence under the “eva_somatic” (Figure 3) instead of 8 entries shown on the web page. Could you point out there are any mistakes I made in the query or any misunderstanding to this issue?
As you pointed out that indirect association scores are calculated from the indirect evidence in the issue below, I tried to find indirect evidence via BigQuery.
But, I only found “open-targets-prod.platform.evidence” stored on the cloud. If I use the following codes in Figure 4, there are only five pieces of evidence in total. I know these pieces of evidence are direct. Thus, my question is where I can find indirect evidence on the Cloud so that I can obtain the same number of evidence listed on the webpage.
Your assumptions are correct. The indirect evidence is not available in a dataset, but you can easily reproduce it using the evidence dataset + the disease dataset. You will first need to explode the diseases ancestors field to get all the paths from all the EFOs and the root. Once you have a 2 column dataset with the EFO id and the EFO id of the ontology propagated term, you can perform a JOIN operation.
It requires a lit bit of extra SQL skills, but you can always ask the community if you get stuck. Good luck!
Thanks for the reply. After using the following query, I indeed get a list of EFO and GO items. But I have no idea as how to get all the paths from all the EFOs and the root in BigQuery. May you tell me a clear query command? Thanks.