What happened to the batch search functionality?

With the release of the next-generation Platform on 29 April 2021, we made the decision to deprecate the batch search functionality while we work on a replacement approach.

We made this decision because the existing batch search API endpoint and user interface were not optimised to support complex queries and on-the-fly association scoring.

While we work on a new approach, users can perform batch search queries using our comprehensive list of datasets available for download. For more information on how to use the Platform data downloads, including sample Python and R scripts, please visit our data downloads documentation page.
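As a rough illustration of running a batch search over the downloads, the sketch below assumes association rows have been exported as JSON lines with `targetId`, `diseaseId` and `score` fields. These field names are an assumption here; check the data downloads documentation for the actual schema and file layout. `batch_filter` is a hypothetical helper, not part of any Open Targets tooling.

```python
import json
from collections.abc import Iterable


def batch_filter(lines: Iterable[str], target_ids: set[str], min_score: float = 0.0) -> list[dict]:
    """Keep association rows whose targetId is in target_ids, highest score first.

    Assumes one JSON object per line with (hypothetical) keys
    "targetId", "diseaseId" and "score".
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines between records
            continue
        row = json.loads(line)
        if row.get("targetId") in target_ids and row.get("score", 0.0) >= min_score:
            hits.append(row)
    # Sort so the strongest associations for the gene list come first.
    hits.sort(key=lambda r: r["score"], reverse=True)
    return hits
```

In practice `lines` could be an open file handle over one downloaded part file, called once per file; filtering line by line avoids loading a whole dataset into memory.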

The Platform team have started to work on a replacement batch search functionality and we want to know what data Platform users would like to see in a batch search view.

Previously, our batch search view included:

  • List of associated diseases
  • List of enriched pathways
  • List of enriched Gene Ontology terms
  • Tractability analysis for targets
  • List of drugs
  • Interactions between genes

What data and/or analysis would be useful in a batch search view to support target identification and prioritisation?

Hi Hercules, and thanks again for your great work.
In order of use, we relied on:
1. Gene interactions
2. Pathways
3. Associated diseases
4. Drugs

We are now also looking at mRNA, siRNA, etc. to increase complexity, starting of course from the gene symbol (siRNA for partial similarity). I'll keep you posted.

Many of us who do not know how to program hope that the batch search functionality will return soon. This type of search was not only fundamental but one of the most useful tools available on the web. Its ease of use, combined with the enormous amount of information (especially clinical and disease-related information), made it irreplaceable. Today that value is lost for us. I hope you can make this a priority.

Hi @angrist, we are working on different options for batch search as we know it is very popular.

Could you please rank which sections of the batch search were most useful to your work and explain why?

  • Gene symbol mapping
  • Associated diseases
  • Pathways
  • Gene Ontology
  • Tractability
  • Drugs
  • Interactions between targets

Are there other types of analyses that would support your target identification and prioritisation research questions?

Thank you :grinning:

Dear Hercules,

1. Gene mapping
2. Pathways
3. Drugs and diseases

Do you have a date for when batch search will be available again?

Thank you,
D

@DavideG, thank you for your insights into how you use batch search. We do not have a confirmed timeline for the return of batch search, but will post more details when they become available.

In the meantime, my colleague @irene has written a post with a BigQuery script that you can use and adapt to replicate batch search. Please see "Get marketed drugs for a set of targets with BigQuery".

Hello Andrew,

The original intent of the batch search was to aggregate information and extract patterns for the routine questions that non-data scientists repeatedly ask in early drug discovery.

Usually, investigators come with a list of genes and don't have time to look at the platform gene by gene. Lists can reach several hundred genes. Typical questions are:

  • Are any of the genes associated with tool compounds?
  • Are there any launched drugs associated with my list of genes? Are there any clinical trials?
  • Are there any safety concerns with targeting these genes?
  • Where are my genes expressed? (baseline expression in tissues and cell types)
  • Do I see any significant enrichment in:
    a) specific disease areas
    b) pathways/GO terms
    c) specific types of evidence from the platform
  • From my list of genes, which ones have been highly cited, normally cited, or rarely/never cited in the literature?

In short, these are similar questions to those answered by the platform's gene profile page, but aggregated and analysed across the whole list for a better overview.
More advanced questions could also be answered.
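For the enrichment question about pathways/GO terms, one common approach (not necessarily what the old batch search used) is an over-representation test based on the hypergeometric distribution. The sketch below assumes you already have a background gene universe size and a mapping from each term to its member genes; `enrichment_pvalue` and `rank_terms` are hypothetical helper names.

```python
from math import comb


def enrichment_pvalue(overlap: int, universe: int, annotated: int, query: int) -> float:
    """One-sided hypergeometric tail probability P(X >= overlap).

    universe  -- genes in the background (M)
    annotated -- background genes annotated to the term (n)
    query     -- genes in the user's list (N)
    overlap   -- list genes that carry the annotation (k)
    """
    total = comb(universe, query)
    upper = min(annotated, query)
    # Sum the probability of seeing `overlap` or more annotated genes by chance.
    tail = sum(
        comb(annotated, i) * comb(universe - annotated, query - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


def rank_terms(gene_list, term_to_genes, universe_size):
    """Return (term, overlap, p-value) tuples, most significant first."""
    genes = set(gene_list)
    results = []
    for term, members in term_to_genes.items():
        k = len(genes & members)
        if k == 0:  # terms with no overlap cannot be enriched
            continue
        p = enrichment_pvalue(k, universe_size, len(members), len(genes))
        results.append((term, k, p))
    results.sort(key=lambda r: r[2])
    return results
```

This returns raw p-values only. With hundreds of terms tested at once, a multiple-testing correction (for example Benjamini-Hochberg) would be needed before calling any term significantly enriched.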

Best regards,

Hi @ahercules,

Has there been any progress on bringing back the batch search feature? If not, is there a particular reason, e.g. a technical challenge that is difficult to solve in the GraphQL backend? Otherwise, would you be open to contributions from the community?

Best,
Roman

Hi @roman-hillje!

I have recently left the Open Targets team and so will tag @ochoa and @tsirigos who can provide more up-to-date information on plans for batch search.

Cheers,

Andrew :slight_smile:

Oh, I didn’t know about that. Thanks for tagging David and Kostas and good luck on your next adventure!

We at The Hyve are looking into rebuilding the batch search feature, at least to some extent. We would love to see the feature re-introduced into the public version later through a PR. What do you think? If you are interested, we would like to hear about requirements from your side before we start implementation.

Hi @roman-hillje.

We are most definitely interested in this functionality. It’s a complex matter though.

Based on user feedback, we have noticed that users expect batch search to cover different scopes:

  1. Some users understand batch search as an enrichment tool for target information. This is closer to the pre-existing batch search and to @gkosbio's comment.

  2. Other users are more interested in building association pages for sets of genes or sets of diseases. For example, prioritising targets by collating all evidence related to a set of diseases linked to the same process (e.g. ageing). Alternatively, prioritising diseases based on a list of targets obtained through a previous analytical or experimental process. Our API is actually prepared for this functionality, although the web application does not expose the option.
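To make the second scenario concrete, here is a minimal sketch of prioritising targets against a set of diseases, assuming you already have (target, disease, score) association rows, one row per pair. The ranking rule (best score within the set, then number of diseases in the set) is an illustrative assumption and does not reproduce the Platform's own association scoring; `prioritise_targets` is a hypothetical name.

```python
from collections import defaultdict


def prioritise_targets(associations, disease_set, top_n=10):
    """Rank targets by their best score across disease_set.

    Ties are broken by how many diseases in the set the target is
    associated with (assumes one row per target-disease pair).
    """
    best = defaultdict(float)
    n_diseases = defaultdict(int)
    for target_id, disease_id, score in associations:
        if disease_id not in disease_set:
            continue  # ignore diseases outside the process of interest
        best[target_id] = max(best[target_id], score)
        n_diseases[target_id] += 1
    ranked = sorted(best, key=lambda t: (best[t], n_diseases[t]), reverse=True)
    return [(t, best[t], n_diseases[t]) for t in ranked[:top_n]]
```

The reverse case (prioritising diseases from a list of targets) is the same function with the target and disease columns swapped.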

Give it a thought and drop me a line if you guys want to look at this.

Hi David,

Thank you for the explanation, and apologies for the delayed response. Regarding the second scenario, are you referring to the targets and diseases objects in the GraphQL API that allow retrieving associated diseases/targets for multiple targets/diseases, respectively? I tested those but ran into performance issues when submitting lists of 50+ targets/diseases (I think this is because each target/disease is handled internally as a separate query to Elasticsearch, but I'm not 100% sure).

The use case we looked into was retrieving association scores for multiple targets instead of just one, with support for up to ~250 targets. Our solution was to load data from the associationBy* datasets into a dedicated ClickHouse table and implement a query in the GraphQL API backend that takes a list of targets to filter the table by. This solves the performance problem, but it means association scores are no longer calculated on the fly. This might be an acceptable side effect in some cases, but I assume not in others.
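The precomputed-table design above can be sketched as follows. Python's built-in `sqlite3` stands in for ClickHouse here, and the table and column names are hypothetical. The point is that a list of targets becomes a single `IN (...)` filter over already-scored rows rather than one backend query per target.

```python
import sqlite3


def build_table(rows):
    """Load precomputed (target_id, disease_id, score) rows into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE association (target_id TEXT, disease_id TEXT, score REAL)")
    conn.executemany("INSERT INTO association VALUES (?, ?, ?)", rows)
    # Index the filter column so lookups stay fast for lists of a few hundred targets.
    conn.execute("CREATE INDEX idx_association_target ON association (target_id)")
    return conn


def scores_for_targets(conn, target_ids):
    """Fetch all stored scores for a list of targets in one query, best first."""
    ids = list(target_ids)
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)  # one bound parameter per target
    sql = (
        "SELECT target_id, disease_id, score FROM association "
        f"WHERE target_id IN ({placeholders}) "
        "ORDER BY score DESC, target_id, disease_id"
    )
    return conn.execute(sql, ids).fetchall()
```

As noted above, the trade-off is that the scores are a static snapshot of whatever was loaded, so any on-the-fly rescoring (for example reweighting data sources) would not be reflected without recomputing the table.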

I’d be curious to know what you think about this.

Best,
Roman