What is the backend database system used for Open Targets data?

Hello, Open Targets Community,

I am pretty new to Open Targets. I am fascinated by how Open Targets can host such big data, growing in both size and variety, while still providing fast queries. I am very interested in knowing more about the platform infrastructure. I found some information here, Platform infrastructure - Open Targets Platform Documentation, but it looks somewhat abstract to me. Is the backend a mixed implementation of Elasticsearch and ClickHouse? And how do GraphQL and Google BigQuery interact with the backend? I have some familiarity with Elasticsearch, but none with ClickHouse. Is there a good introductory tutorial with more details on how the system works and how the infrastructure is set up? Thanks a lot!

Hello @wenliang! :wave:

Welcome to the Open Targets Community! :tada:

My apologies for the delay in responding to your post.

Yes, we use both Elasticsearch and ClickHouse to power the Open Targets Platform. Elasticsearch holds the data for our main entities (targets, diseases/phenotypes, and drugs) along with our target-disease evidence. ClickHouse holds our target-disease association data, which allows us to explore opportunities to provide on-the-fly scoring for target-disease associations.

Our GraphQL API exposes various endpoints that are used by our front-end web interface.
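To make this a bit less abstract, here is a minimal sketch of calling the GraphQL API from Python with only the standard library. Please treat the details as assumptions, since none of them are spelled out in this thread: the endpoint URL, the `target` query, and the `approvedSymbol` field are taken from my understanding of the public schema, and `build_request` and `fetch_target` are just illustrative helper names. Check the API documentation for the current schema before relying on it.

```python
import json
import urllib.request

# Assumption: public GraphQL endpoint; it is not stated in this thread.
GRAPHQL_URL = "https://api.platform.opentargets.org/api/v4/graphql"

# Assumption: query and field names follow the public schema.
TARGET_QUERY = """
query target($ensemblId: String!) {
  target(ensemblId: $ensemblId) {
    id
    approvedSymbol
  }
}
"""


def build_request(ensembl_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a single target."""
    payload = json.dumps(
        {"query": TARGET_QUERY, "variables": {"ensemblId": ensembl_id}}
    ).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_target(ensembl_id: str, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(ensembl_id), timeout=timeout) as resp:
        return json.load(resp)


# Usage (needs network access), with an Ensembl gene ID such as BRAF:
# fetch_target("ENSG00000157764")
```

Keeping the request construction separate from the network call makes it easy to inspect the payload before anything is sent.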

We also host our data in Google BigQuery to support users that want to use SQL to answer more complex and systematic queries.
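As a rough illustration of the kind of systematic query BigQuery suits, the sketch below builds a SQL statement that ranks targets for one disease by association score. The table name and the `targetId`, `diseaseId`, and `score` columns are assumptions on my part and may differ between data releases, so please confirm them in the BigQuery console first. The helper names are hypothetical, and running the query requires the `google-cloud-bigquery` package plus your own Google Cloud credentials.

```python
# Assumption: table and column names; they are not given in this thread and
# may differ between data releases.
ASSOCIATIONS_TABLE = "open-targets-prod.platform.associationByOverallDirect"


def build_top_targets_sql(limit: int = 10) -> str:
    """Return SQL ranking targets for one disease by association score."""
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT targetId, score\n"
        f"FROM `{ASSOCIATIONS_TABLE}`\n"
        "WHERE diseaseId = @disease_id\n"
        "ORDER BY score DESC\n"
        f"LIMIT {limit}"
    )


def top_targets(disease_id: str, limit: int = 10) -> list:
    """Run the query and return (targetId, score) pairs."""
    # Imported lazily so the SQL builder works without the client installed.
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client()  # uses your default Google Cloud credentials
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("disease_id", "STRING", disease_id)
        ]
    )
    rows = client.query(build_top_targets_sql(limit), job_config=job_config).result()
    return [(row["targetId"], row["score"]) for row in rows]
```

The disease identifier is passed as a query parameter (`@disease_id`) rather than pasted into the SQL string, which avoids quoting problems when the value comes from user input.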

You can learn more about the technical aspects of the Platform in our infrastructure documentation which contains relevant links to our GitHub repositories.

Feel free to comment below if you have any further questions.

Cheers,

~ Andrew :slight_smile: