Clinical precedence not capturing entire data

Karl_Gemayel · 16 May 2023 09:23

Hello, I’ve come across a large number of drugs with “no data” for Clinical Precedence but with at least one indication with clinical phase data. Can someone clarify the distinction?

Adding an example to illustrate what I mean.

Thanks!

irene · 16 May 2023 21:23

Hi again @Karl_Gemayel,

I guess these are the inconsistencies you mentioned in the other thread.

You are seeing discrepancies because the clinical precedence dataset and the indications dataset draw from two different sources:

Mechanism of Action / Indication / Drug Warnings: these are reproductions of the widgets you can find on ChEMBL. We download them directly from their database.
Clinical Precedence / Known Drugs: these are datasets that we derive with an ad-hoc pipeline that generates the disease/target evidence data from ChEMBL. In some cases the pipeline extends the data present in ChEMBL with extra annotation for us, such as new mechanisms of action.

Clindamycin palmitate is not in the evidence set because its target is not human, so it falls out of the Clinical Precedence dataset as well. You can find similar cases when we are missing the mechanism of action annotation, as for Tozinameran.
Clinical precedence is derived from the evidence mainly for historical reasons (drug annotations were incorporated later). We will discuss in the team how we want to scope this task and I will keep you posted.
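
To illustrate the point above, here is a minimal pandas sketch of why a drug with an indication can still be absent from the evidence. It is not the actual pipeline: the toy tables, the column names (`drugId`, `diseaseId`, `targetId`, `isHumanTarget`) and all IDs are made up for the example. The only idea it shows is that a drug/disease pair becomes target/disease evidence only when the drug has a mechanism of action pointing to a human target.

```python
import pandas as pd

# Toy stand-ins for the two sources; all names and values are hypothetical.
indications = pd.DataFrame({
    "drugId": ["DRUG_A", "DRUG_B", "DRUG_C"],
    "diseaseId": ["DISEASE_1", "DISEASE_2", "DISEASE_3"],
    "phase": [4, 4, 4],
})
mechanisms = pd.DataFrame({
    "drugId": ["DRUG_A", "DRUG_B"],          # DRUG_C has no mechanism of action
    "targetId": ["HUMAN_TARGET_1", "BACTERIAL_TARGET_1"],
    "isHumanTarget": [True, False],          # DRUG_B acts on a non-human target
})

# Evidence needs a target: keep only mechanisms with a human target,
# then inner-join so that drugs without such a mechanism drop out.
human_moa = mechanisms[mechanisms["isHumanTarget"]]
evidence = indications.merge(
    human_moa[["drugId", "targetId"]], on="drugId", how="inner"
)

# Drugs that have an indication but never reach the evidence table.
dropped = sorted(set(indications["drugId"]) - set(evidence["drugId"]))
print(evidence)
print(dropped)
```

In this toy data `DRUG_B` plays the role of clindamycin palmitate (non-human target) and `DRUG_C` the role of Tozinameran (no mechanism of action annotation).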

Thanks for reporting it!
Irene

Karl_Gemayel · 17 May 2023 08:02

Thanks @irene and sorry for posting twice.

Perhaps the example I chose wasn’t the best. What about drugs such as: CHEMBL1200680, CHEMBL1200522, CHEMBL203266?
I’ve found 1015 such drugs that have a phase 4 status in the indication data file, but are not present in evidence (unless my download is messed up). All the ones I’ve checked online did not have Clinical precedence data. Am I missing something?

Code to reproduce:

from pyspark.sql import functions as F

# `evidence` and `indication` are Spark DataFrames already loaded from the downloads
df_chembl = evidence.filter("sourceId == 'chembl'").select("drugId").toPandas()

# one row per (drug, indication), keeping only the drug ID and the indication phase
indication = indication.withColumnRenamed("id", "drugId")
indication = indication.withColumn("indications", F.explode("indications"))
indication = indication.withColumn("phase", F.col("indications.maxPhaseForIndication"))
indication = indication.select("drugId", "phase").toPandas()

# compare IDs for phase 4 in indication
indication_ids = set(indication[indication.phase == 4].drugId)
chembl_ids = set(df_chembl.drugId)
leftover = indication_ids - chembl_ids

# len(leftover) == 1015

Thanks again,
Karl

irene · 17 May 2023 13:46

First of all, thank you for providing a reproducible example!

The molecules you mention are further examples of the Tozinameran case that I mentioned yesterday. These are drugs for which we don’t know the mechanism of action, either because it is unknown or because there is a curation gap. If we have no annotation for the target these drugs modulate, we cannot build a target/disease relationship. Consequently, data will be missing in the Clinical Precedence widget.
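
One way to check this explanation against a list such as `leftover` is to split it by whether each drug has any mechanism of action annotation. The sketch below runs on toy data. The column names `chemblIds` and `targets` are assumptions about the layout of the mechanism of action download, and all IDs are placeholders, so adjust them to whatever your files actually contain.

```python
import pandas as pd

# Placeholder for the set of drug IDs with a phase 4 indication but no evidence.
leftover = {"DRUG_X", "DRUG_Y", "DRUG_Z"}

# Toy mechanism of action table; `chemblIds` and `targets` are assumed column names.
moa = pd.DataFrame({
    "chemblIds": [["DRUG_X"], ["DRUG_Q", "DRUG_R"]],
    "targets": [["TARGET_1"], ["TARGET_2"]],
})

# One row per drug, then keep only rows that name at least one target.
moa_long = moa.explode("chemblIds").rename(columns={"chemblIds": "drugId"})
moa_long = moa_long[moa_long["targets"].map(len) > 0]
annotated = set(moa_long["drugId"])

# Drugs with no annotated target cannot produce target/disease evidence.
without_moa = sorted(leftover - annotated)
# Drugs that do have a target but are still missing are worth a closer look,
# for example because the target is not human.
with_moa = sorted(leftover & annotated)

print(len(without_moa), "without mechanism of action:", without_moa)
print(len(with_moa), "with mechanism of action:", with_moa)
```

If the explanation holds for the real data, nearly all of the leftover drugs should end up in `without_moa`.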

I hope that answers your question!
