Downloadable drug indication data - 'Known drug' vs. 'Drug - indications'

Hi,

I’ve been looking at the parquet files (25.03 release) for the ‘Known drug’ and ‘Drug - indications’ datasets, and I am a bit confused about which one to use for drug indication data. As far as I can see, the ‘Known drug’ dataset also carries drug indication data, i.e. the columns ‘diseaseId’, ‘phase’, ‘status’ and ‘urls’, yet these do not match exactly what is listed in the ‘Drug - indications’ dataset. Would it be possible to clarify what the differences are, and the rationale for providing both?

kind regards,
Sigve

Hi @sigven,

Thank you for your question! You can find an explanation of the differences between the datasets in this other thread: "Clinical precedence not capturing entire data" (#2 by irene).

This is indeed a confusing topic. If you are interested in drug/indication pairs, I’d rely on the ‘Drug - indications’ dataset. We are currently working towards merging these two sections into one, to avoid cases where ChEMBL has curated a drug/indication pair but the clinical precedence doesn’t reflect it.
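If you want to see where the two datasets disagree, a rough sketch along these lines may help. It only compares sets of (drug, disease) pairs; the function name `compare_pairs` and the IDs below are made-up placeholders rather than real records, and how you extract the pairs from each parquet file is left to you, since it depends on the schema of the release you downloaded.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (drug ID, disease ID)


def compare_pairs(
    known_drug: Iterable[Pair], indications: Iterable[Pair]
) -> Dict[str, Set[Pair]]:
    """Split drug/disease pairs by which dataset(s) they appear in.

    Hypothetical helper for illustration; not part of any Open Targets tooling.
    """
    kd = set(known_drug)      # duplicates collapse (e.g. one row per trial)
    ind = set(indications)
    return {
        "both": kd & ind,
        "known_drug_only": kd - ind,
        "indications_only": ind - kd,
    }


# Toy placeholder rows, NOT real Open Targets data.
known_drug_pairs = [
    ("DRUG_A", "DISEASE_1"),
    ("DRUG_A", "DISEASE_1"),  # repeated pair, as can happen with multiple rows
    ("DRUG_B", "DISEASE_2"),
]
indication_pairs = [
    ("DRUG_A", "DISEASE_1"),
    ("DRUG_A", "DISEASE_3"),
]

result = compare_pairs(known_drug_pairs, indication_pairs)
for name, pairs in result.items():
    print(name, sorted(pairs))
```

With the real files, you would build the two pair lists from the parquet data first (for example with pandas or pyarrow) and then pass them in. The pairs that land in "indications_only" are the kind of cases mentioned above, where a curated drug/indication pair is not reflected on the other side.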

Hope this is helpful!
Irene

Thanks! I probably should have gone through the other threads more carefully :smiley: