It is interesting that for many genes whose association with a disease is based on “drugs” evidence, there is NO other evidence, not even text mining, that associates the gene (i.e. the target) with the disease (for example, TUBB*, POLE*, etc. in Targets associated with T-cell non-Hodgkin lymphoma). This is counterintuitive: a paper should exist that captures the drug-target interaction (and should therefore be picked up by text mining), otherwise there would be no way for a drug to mediate the disease-target association.
I checked a random selection of diseases and the source data at ChEMBL, and found that all of the genes I looked at that show this behavior correspond to protein subunits. I believe the cause is that the “drugs” evidence for a given [subunit, disease] pair is inherited from complex-level evidence, i.e. the paper only shows the drug interacting with the protein complex, with no resolution at the subunit level, but ALL subunits are still considered to interact with the drug.
Please correct me if my guess is wrong. If it is indeed the case, would it be possible to improve this procedure so that, say, some weight is applied to this kind of inheritance, and text mining also includes such inherited evidence?
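To make the weighting idea concrete, here is a rough sketch in Python. It is not how the platform actually scores evidence. The class, the function names, the placeholder subunit names and the equal-share rule are all my own assumptions, only meant to illustrate the suggestion.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ComplexDrugEvidence:
    """Hypothetical record: a drug shown to act on a whole protein complex."""
    drug: str
    subunits: List[str]   # gene symbols of the complex members (placeholders here)
    score: float          # complex-level evidence score


def inherit_to_subunits(
    evidence: ComplexDrugEvidence,
    weight_fn: Callable[[ComplexDrugEvidence], float],
) -> Dict[str, float]:
    """Copy complex-level evidence to each subunit, scaled by a weight."""
    return {
        gene: evidence.score * weight_fn(evidence)
        for gene in evidence.subunits
    }


def full_weight(evidence: ComplexDrugEvidence) -> float:
    # What I suspect happens now: every subunit gets the full score.
    return 1.0


def equal_share(evidence: ComplexDrugEvidence) -> float:
    # One possible alternative (an assumption): split the score evenly.
    return 1.0 / len(evidence.subunits)


ev = ComplexDrugEvidence(
    drug="hypothetical_drug",
    subunits=["SUBUNIT_A", "SUBUNIT_B", "SUBUNIT_C", "SUBUNIT_D"],
    score=1.0,
)

unweighted = inherit_to_subunits(ev, full_weight)
weighted = inherit_to_subunits(ev, equal_share)
print(unweighted)
print(weighted)
```

With an equal-share weight, each of the four placeholder subunits receives a quarter of the complex-level score instead of the full score. Any other weighting rule could be passed in the same way.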
Thanks,
Y