Hello, I’ve come across a large number of drugs with “no data” for Clinical Precedence but with at least one indication with clinical phase data. Can someone clarify the distinction?
Adding an example to illustrate what I mean.
Thanks!
Hello, I’ve come across a large number of drugs with “no data” for Clinical Precedence but with at least one indication with clinical phase data. Can someone clarify the distinction?
Adding an example to illustrate what I mean.
Thanks!
Hi again @Karl_Gemayel,
I guess these are the inconsistencies you commented on the other thread
The reason you are seeing discrepancies is because the clinical precedence dataset and the indications dataset basically draw from two different sources:
The reason why clindamycin palmitate is not in the evidence set is because the target is not human, and therefore it falls out of the Clinical Precedence dataset as well. You can find similar cases also when we are missing the mechanism of action annotation, as for Tozinameran.
Clinical precedence is derived from the evidence mainly for historical reasons (drug annotations were incorporated later). We will discuss in the team how we want to scope this task and I will keep you posted.
Thanks for reporting it!
Irene
Thanks @irene and sorry for posting twice.
indication
data file, but are not present in evidence
(unless my download is messed up). All the ones I’ve checked online did not have Clinical precedence data. Am I missing something?Code to reproduce:
chembl = evidence.filter("sourceId == 'chembl'").select(drugId).toPandas()
indication = indication.withColumnRenamed("id", "drugId")
indication = indication.withColumn("indications", F.explode("indications"))
indication = indication.withColumn("phase", F.col("indications.maxPhaseForIndication"))
indication = indication.select("drugId", "phase").toPandas()
# compare IDs for phase 4 in indication
indication_ids = set(indication[indication.apply(lambda x: x.phase == 4, axis=1)].drugId)
chembl_ids = set(df_chembl.drugId)
leftover = indication_ids - chembl_ids
# len(leftover) == 1015
Thanks again,
Karl
First of all, thank you for providing a reproducible example!
The molecules you mention are different examples of the case with Tozinameran that I was commenting yesterday. These are drugs for which we don’t know their mechanism of action, either because it is unknown or because there is a curation gap. If we don’t have annotation on the target these drugs are modulating, we cannot therefore build a target/disease relationship. Consequently, data will be missing in the Clinical Precedence widget.
I hope that answers your question!