Differences in number of unique drugs per clinical precedence

Karl_Gemayel · 15 May 2023 14:02

I’m trying to build a dataframe with all clinical data from the JSON/Parquet files. Specifically, I care about drugId, targetId, diseaseId, and maxPhaseForIndication.

It seems this data is present in the ‘molecule’, ‘indication’, and ‘knownDrugsAggregated’ files. However, the number of unique drugs I get from each is very different.

Can someone explain why? And perhaps suggest the best way to get this information?

irene · 16 May 2023 16:03

Hi @Karl_Gemayel and welcome to our Community!

In your case I would use the evidence Parquet files. It should be straightforward to extract a dataframe with the fields you need directly from there, and you wouldn’t need to perform any joins.

As you know, ChEMBL is a provider of target-disease evidence. ChEMBL evidence represents any target-disease relationship that can be explained by an approved or clinical candidate drug that targets the gene product and is indicated for the disease.

If you download the ChEMBL evidence, your fields of interest are: drugId, clinicalPhase, targetId, diseaseId. Note that each evidence record represents one study, so to get the max phase for an indication you’d need to aggregate the data and take the maximum clinical phase. I hope this is helpful.
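As a rough sketch of that aggregation in pandas: the rows and IDs below are made up for illustration, and the output column name maxClinicalPhase is just a label chosen here. With the real download you would load the ChEMBL evidence Parquet files (for example with pd.read_parquet) in place of the toy dataframe.

```python
import pandas as pd

# Toy stand-in for ChEMBL evidence rows (one row per study).
# All IDs are hypothetical; replace this with the real evidence
# dataframe loaded from the downloaded Parquet files.
evidence = pd.DataFrame(
    {
        "drugId": ["DRUG_A", "DRUG_A", "DRUG_A", "DRUG_B"],
        "targetId": ["TARGET_1", "TARGET_1", "TARGET_2", "TARGET_1"],
        "diseaseId": ["DISEASE_X", "DISEASE_X", "DISEASE_X", "DISEASE_Y"],
        "clinicalPhase": [2, 4, 1, 3],
    }
)

# Collapse the studies to one row per drug-target-disease triplet,
# keeping the highest clinical phase seen for that triplet.
max_phase = (
    evidence.groupby(["drugId", "targetId", "diseaseId"], as_index=False)["clinicalPhase"]
    .max()
    .rename(columns={"clinicalPhase": "maxClinicalPhase"})
)

print(max_phase)
```

Grouping by drugId and diseaseId only would instead give one maximum per drug-indication pair, regardless of target.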

Could you please elaborate on the differences in numbers you are seeing?
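If it helps when reporting the numbers, here is one possible way to compare the sets of unique drug IDs between two tables. The two dataframes are toy examples, and the column name drugId is an assumption; the ID column may be named differently depending on the dataset.

```python
import pandas as pd

# Toy stand-ins for two datasets that both list drugs (hypothetical IDs).
# The ID column name is an assumption and may differ per dataset.
table_a = pd.DataFrame({"drugId": ["DRUG_A", "DRUG_B", "DRUG_C"]})
table_b = pd.DataFrame({"drugId": ["DRUG_A", "DRUG_A", "DRUG_B"]})

# Unique drug IDs in each table, ignoring missing values.
ids_a = set(table_a["drugId"].dropna())
ids_b = set(table_b["drugId"].dropna())

# Counts per table, plus the IDs that appear in only one of them.
summary = {
    "unique_in_a": len(ids_a),
    "unique_in_b": len(ids_b),
    "shared": len(ids_a & ids_b),
}
only_in_a = sorted(ids_a - ids_b)
only_in_b = sorted(ids_b - ids_a)

print(summary, only_in_a, only_in_b)
```

Looking at the IDs that appear in only one table is usually more informative than the raw counts alone.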

Best,
Irene
