I’ve been looking at the parquet files (25.03 release) for the ‘Known drug’ and ‘Drug - indications’ datasets, and I am a bit confused about which one to use for drug indication data. From what I can see, the ‘Known drug’ dataset also includes indication data (the columns ‘diseaseId’, ‘phase’, ‘status’ and ‘urls’), yet these do not match exactly what is listed in the ‘Drug - indications’ dataset. Could you clarify what the differences are, and the reasoning behind providing both?
This is indeed a confusing topic. If you are interested in drug/indication pairs, I’d rely on the Drug - indications dataset. We are currently working towards merging these two sections into one, to avoid cases where ChEMBL has curated a drug/indication pair but the clinical precedence doesn’t reflect it.
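If it helps to see where the two datasets disagree, here is a minimal sketch of one way to compare the drug/indication pairs. It is not an official recipe. It assumes the Known drug rows carry a `drugId` column next to `diseaseId`, and that each Drug - indications row has an `id` plus a nested `indications` list whose items have a `disease` field; please check both against the schema of the release you downloaded. The IDs in the toy rows are made up for illustration.

```python
from typing import Dict, Iterable, Mapping, Set, Tuple

Pair = Tuple[str, str]  # (drug ID, disease ID)


def known_drug_pairs(rows: Iterable[Mapping]) -> Set[Pair]:
    """Collect (drug, disease) pairs from Known drug rows.

    Assumes columns named 'drugId' and 'diseaseId' (verify against your files).
    """
    return {(r["drugId"], r["diseaseId"]) for r in rows if r.get("diseaseId")}


def indication_pairs(rows: Iterable[Mapping]) -> Set[Pair]:
    """Collect (drug, disease) pairs from Drug - indications rows.

    Assumes one row per drug with 'id' and a nested 'indications' list
    whose items carry a 'disease' field (verify against your files).
    """
    pairs: Set[Pair] = set()
    for r in rows:
        indications = r.get("indications")
        if indications is None:
            continue
        for ind in indications:
            pairs.add((r["id"], ind["disease"]))
    return pairs


def compare(known_rows: Iterable[Mapping],
            indication_rows: Iterable[Mapping]) -> Dict[str, Set[Pair]]:
    """Split the pairs into shared ones and those found in one dataset only."""
    known = known_drug_pairs(known_rows)
    indic = indication_pairs(indication_rows)
    return {
        "both": known & indic,
        "only_known_drug": known - indic,
        "only_indications": indic - known,
    }


# Toy rows with made-up IDs. With the real files, the rows could be loaded
# with, for example, pandas.read_parquet(<path>).to_dict("records").
known_rows = [
    {"drugId": "DRUG_A", "diseaseId": "DISEASE_1", "phase": 4},
    {"drugId": "DRUG_A", "diseaseId": "DISEASE_2", "phase": 2},
]
indication_rows = [
    {"id": "DRUG_A",
     "indications": [{"disease": "DISEASE_1"}, {"disease": "DISEASE_3"}]},
]

result = compare(known_rows, indication_rows)
for label, pairs in result.items():
    print(label, sorted(pairs))
```

The pairs that land in `only_indications` are the kind of case mentioned above, where a curated drug/indication pair is not reflected in the Known drug data.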