Is the TCGA dataset in the Open Targets Platform?

I am wondering how the TCGA dataset is integrated or used in the Open Targets Platform, including its mutation, methylation, miRNA, expression and structural variation data.



Several layers of information from TCGA, or more generally from the PCAWG consortium, cascade into the Open Targets Platform. However, a significant volume of that information is not yet included, either directly or indirectly.

Some of the indirect evidence is captured in our somatic data sources. IntOGen post-processes PCAWG and other cohorts to find cancer driver genes using several driver-calling methods. More info on our blog. Similarly, Cancer Gene Census and ClinVar (somatic) are likely to benefit from the genes and variants associated with the tumour samples from these large sequencing efforts.

However, we recognise that these post-processed outputs are just a fraction of everything these projects produce. We have historically de-prioritised the inclusion of other layers of information because it is hard to establish strong causal links between the genes and the disease. My interpretation is that some of the data is more useful for exploration than for establishing causality, but it is still meaningful. For example, we are in the process of integrating the baseline RNA expression of the tumour samples through a collaboration with Expression Atlas.

We would be very interested to find out what other layers of information the community find relevant to assist the drug discovery process.