Accessing trial stop reasons in .parquet tables


I was hoping to access the predicted trial termination reasons announced in the 22.04 release. I’m seeing these fields in the GraphQL API, but are they also accessible in the GCS .parquet dumps? I thought these would’ve been added to “knownDrugsAggregated”.

Sean Hackett
Associate Director of Data Science
Calico Life Sciences LLC

Hi Sean, the data you’re looking for is in the evidence files. These can be queried most easily through Google BigQuery, where the data is available as a free public dataset. For instance:

SELECT COUNT(*)
FROM `bigquery-public-data.open_targets_platform.evidence`
WHERE datatypeId = 'known_drug'
  AND studyStopReason IS NOT NULL

If you want the data in a more raw form, the ChEMBL evidence file is available from the EBI FTP.
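
If you would rather work from a local copy of the evidence files than from BigQuery, the same filter can be sketched in plain Python. This is only a minimal sketch: it assumes the evidence rows have already been loaded into dictionaries by a Parquet reader of your choice, and it relies only on the datatypeId and studyStopReason fields used in the query above. The helper names and the sample rows (including the reason strings) are made up for illustration.

```python
from collections import Counter
from typing import Any, Iterable, Iterator, Mapping

Record = Mapping[str, Any]


def stopped_trial_evidence(records: Iterable[Record]) -> Iterator[Record]:
    """Yield known_drug evidence rows that carry a study stop reason.

    Mirrors: WHERE datatypeId = 'known_drug' AND studyStopReason IS NOT NULL
    """
    for rec in records:
        if rec.get("datatypeId") == "known_drug" and rec.get("studyStopReason") is not None:
            yield rec


def count_stop_reasons(records: Iterable[Record]) -> Counter:
    """Count how many matching evidence rows share each stop reason."""
    return Counter(rec["studyStopReason"] for rec in stopped_trial_evidence(records))


# Hypothetical sample rows standing in for rows read from the evidence files.
sample = [
    {"datatypeId": "known_drug", "studyStopReason": "Insufficient enrollment"},
    {"datatypeId": "known_drug", "studyStopReason": "Insufficient enrollment"},
    {"datatypeId": "known_drug", "studyStopReason": "Sponsor decision"},
    {"datatypeId": "known_drug", "studyStopReason": None},
    {"datatypeId": "literature", "studyStopReason": None},
]

if __name__ == "__main__":
    for reason, n in count_stop_reasons(sample).most_common():
        print(f"{n}\t{reason}")
```

Summing the counter's values gives the same figure as the COUNT(*) query, while the per-reason breakdown shows which stop reasons dominate.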


We will eventually work towards including the reasons in the knownDrugsAggregated dataset, but at the moment a single entry in that dataset is a list of clinical trials, which makes things more complicated.

As @JarrodBaker mentioned, the ChEMBL evidence will have all the stop reason predictions, with the caveat that you will only find clinical trials studying drugs with a known mechanism of action.


Thanks @JarrodBaker and @ochoa!