Re-writing the drug index

JarrodBaker · 5 May 2021 12:31

We recently restructured the Open Targets Platform’s drug index, making it more flexible and easier for stakeholders to expand with their own data.

The Platform retrieves raw inputs from ChEMBL using an Elasticsearch instance. Our pipeline then winnows the approximately 2 million compounds available down to almost 12 000 drugs.

We define a drug to be any molecule that meets at least one of the following criteria:

There is at least 1 known indication (disease);
There is at least 1 known mechanism of action (targets); or
The ChEMBL ID can be mapped to a DrugBank ID.
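
As a rough illustration of the three criteria above, the filter can be sketched in plain Python. This is not the ETL's actual code: the `Molecule` class, its field names, and the placeholder IDs are assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Molecule:
    """Hypothetical, simplified molecule record (not the real ETL schema)."""
    chembl_id: str
    indications: List[str] = field(default_factory=list)
    mechanisms: List[str] = field(default_factory=list)
    drugbank_id: Optional[str] = None


def is_drug(molecule: Molecule) -> bool:
    """Return True if the molecule meets at least one of the three criteria."""
    has_indication = len(molecule.indications) >= 1
    has_mechanism = len(molecule.mechanisms) >= 1
    has_drugbank_id = molecule.drugbank_id is not None
    return has_indication or has_mechanism or has_drugbank_id


# Placeholder records; the IDs and values are invented for illustration only.
example_molecules = [
    Molecule("CHEMBL_A", indications=["DISEASE_1"]),
    Molecule("CHEMBL_B", mechanisms=["MECHANISM_1"]),
    Molecule("CHEMBL_C", drugbank_id="DB_PLACEHOLDER"),
    Molecule("CHEMBL_D"),  # meets none of the criteria, so it is dropped
]

example_drugs = [m for m in example_molecules if is_drug(m)]
```

Here `example_drugs` keeps the first three records and drops `CHEMBL_D`, which has no indication, no mechanism of action, and no DrugBank mapping.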

The new data structure is broken down into three broad tranches: molecules, mechanisms of action, and indications. They can be combined as necessary using a ChEMBL ID as a linking field.
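
To show what combining the three tranches on a ChEMBL ID might look like, here is a minimal sketch using plain Python dictionaries. The field names (`id`, `name`, `mechanism`, `disease`), the function name, and the placeholder values are assumptions for this example, not the Platform's actual schema.

```python
from collections import defaultdict


def combine_by_chembl_id(molecules, mechanisms, indications):
    """Attach mechanism and indication rows to each molecule via its ChEMBL ID."""
    mechanisms_by_id = defaultdict(list)
    for row in mechanisms:
        mechanisms_by_id[row["id"]].append(row)

    indications_by_id = defaultdict(list)
    for row in indications:
        indications_by_id[row["id"]].append(row)

    # Molecules with no matching rows get empty lists rather than being dropped.
    return [
        {
            **molecule,
            "mechanisms": mechanisms_by_id.get(molecule["id"], []),
            "indications": indications_by_id.get(molecule["id"], []),
        }
        for molecule in molecules
    ]


# Placeholder data; all IDs and values are invented for illustration only.
molecule_rows = [
    {"id": "CHEMBL_A", "name": "example molecule A"},
    {"id": "CHEMBL_B", "name": "example molecule B"},
]
mechanism_rows = [
    {"id": "CHEMBL_A", "mechanism": "hypothetical inhibitor", "targets": ["TARGET_1"]},
]
indication_rows = [
    {"id": "CHEMBL_A", "disease": "DISEASE_1"},
    {"id": "CHEMBL_A", "disease": "DISEASE_2"},
]

combined_rows = combine_by_chembl_id(molecule_rows, mechanism_rows, indication_rows)
```

Keeping the three tranches separate and joining only when needed means a user adding, say, extra indications only has to supply the ChEMBL ID and the indication fields, not a full drug record.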

Users who run their own instances of the ETL and Platform can now also add their own data from external files. The new structure simplifies this process, since fewer fields must be supplied to add new data. For detailed instructions on the required fields and formats, please refer to the ETL pipeline configuration’s readme.

This post is based on our recent blog update, where you can find a more detailed explanation of the reasons and process behind the drug index restructuring.
