Clinical vs. pre-clinical safety liability data

Hello, I would like to understand the safety liability data within the target dataset. Specifically, I would like a way to distinguish safety liability entries that originate from preclinical vs. clinical measurements. I have plotted below the distribution of datasource for all safety liability entries. Is it easy to tell preclinical from clinical entries based on the data source or some other attribute?

Furthermore, I would like to understand what it means for an entry to originate from one of the referenced publications. Have these papers shared the data publicly?

Hi @agamemnonc and welcome to our Community!

You can distinguish the data provenance by filtering on the type of study that produced the evidence for each target safety event. This information is available in the safetyLiabilities object of the target dataset.

This is the breakdown by type of study and datasource:

+-------------+---------------------+-----+
|studyType    |datasource           |count|
+-------------+---------------------+-----+
|patient-level|PharmGKB             |2507 |
|cell-based   |ToxCast              |537  |
|biochemical  |ToxCast              |194  |
|clinical     |Brennan et al. (2024)|194  |
|NULL         |Force et al. (2011)  |47   |
|NULL         |Lamore et al. (2017) |30   |
|preclinical  |Brennan et al. (2024)|16   |
|NULL         |ToxCast              |1    |
+-------------+---------------------+-----+
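# PySpark query that produced the table above (assumes target is the target dataset loaded as a DataFrame):
# import pyspark.sql.functions as f
# (
#     target.filter(f.size("safetyLiabilities") > 0)
#     .select("id", f.explode("safetyLiabilities").alias("sl"))
#     .select("id", "sl.datasource", f.explode("sl.studies.type").alias("studyType"))
#     .groupBy("studyType", "datasource")
#     .count()
#     .orderBy(f.col("count").desc())
#     .show(truncate=False)
# )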

The study type annotation for Lamore et al. (2017) and Force et al. (2011) is currently incomplete, which is why it shows as NULL in the table, and we are working to address this soon. However, I’d say that these entries can generally be considered preclinical.

Broadly, the only clinical data we have that is not based on assays comes from PharmGKB, which reports toxic drug responses associated with patient pharmacogenetics. We also include curated data from a recent publication by Brennan et al., which focuses on secondary pharmacology. You can learn more about each data source in the Target safety page of our documentation.
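
For anyone who wants to turn the breakdown above into a single preclinical/clinical label, here is a minimal sketch in plain Python. It is only an illustration, not an official rule: the mapping from study type to label is my reading of this thread (in particular, treating the cell-based and biochemical ToxCast assays as preclinical is an assumption), the names classify and label_target are hypothetical helpers, and the target IDs are made-up placeholders. The records are plain dicts shaped like the safetyLiabilities object (datasource plus a studies list with a type field).

```python
from collections import Counter

# Assumed mapping from studies.type to a coarse label.
# This is an interpretation of the thread, not an official classification.
STUDY_TYPE_LABEL = {
    "patient-level": "clinical",
    "clinical": "clinical",
    "preclinical": "preclinical",
    "cell-based": "preclinical",   # assumption: in vitro assay
    "biochemical": "preclinical",  # assumption: in vitro assay
}

# Sources whose study type is currently missing; the reply above suggests
# they can generally be considered preclinical.
PRECLINICAL_WHEN_UNTYPED = {"Force et al. (2011)", "Lamore et al. (2017)"}


def classify(study_type, datasource):
    """Return 'clinical', 'preclinical' or 'unknown' for one study."""
    if study_type in STUDY_TYPE_LABEL:
        return STUDY_TYPE_LABEL[study_type]
    if study_type is None and datasource in PRECLINICAL_WHEN_UNTYPED:
        return "preclinical"
    return "unknown"


def label_target(target):
    """Yield (target id, datasource, study type, label) for each study."""
    for liability in target.get("safetyLiabilities") or []:
        datasource = liability["datasource"]
        for study in liability.get("studies") or []:
            study_type = study.get("type")
            yield (target["id"], datasource, study_type,
                   classify(study_type, datasource))


# Hypothetical records; the IDs are placeholders, not real targets.
example_targets = [
    {"id": "TARGET_A", "safetyLiabilities": [
        {"datasource": "PharmGKB", "studies": [{"type": "patient-level"}]},
        {"datasource": "ToxCast",
         "studies": [{"type": "cell-based"}, {"type": None}]},
    ]},
    {"id": "TARGET_B", "safetyLiabilities": [
        {"datasource": "Force et al. (2011)", "studies": [{"type": None}]},
        {"datasource": "Brennan et al. (2024)",
         "studies": [{"type": "clinical"}]},
    ]},
    {"id": "TARGET_C", "safetyLiabilities": []},
]

rows = [row for target in example_targets for row in label_target(target)]
label_counts = Counter(label for _, _, _, label in rows)
print(label_counts)
```

Entries with a missing study type from any other source are deliberately left as unknown rather than guessed, so they can be reviewed by hand.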

I hope this is helpful!
Irene

Thank you very much @irene