With the release of the next-generation Platform on 29 April 2021, we made the decision to deprecate the batch search functionality while we work on a replacement approach.
We made this decision because the existing batch search API endpoint and user interface were not optimised to support complex queries and on-the-fly association scoring.
The Platform team have started work on replacement batch search functionality, and we want to know what data Platform users would like to see in a batch search view.
Previously, our batch search view included:
List of associated diseases
List of enriched pathways
List of enriched Gene Ontology terms
Tractability analysis for targets
List of drugs
Interactions between genes
What data and/or analysis would be useful in a batch search view to support target identification and prioritisation?
Many of us who do not know how to program hope that batch search functionality will return soon. This type of search was not only fundamental but also one of the most useful things that could be found on the web. Its ease of use, combined with the enormous amount of information, especially clinical and disease-related information, made it unfortunately irreplaceable. For us, that value is now lost. I hope you can focus on this as soon as possible.
@DavideG, thank you for your insights into how you use batch search. We do not have a confirmed timeline for the return of batch search, but will post more details when they become available.
The original intent of the batch search was to aggregate information and extract patterns for repeated routine questions that non-data scientists have in early drug discovery.
Usually, investigators will come with a list of genes and won’t have time to look at the platform gene by gene. List size can scale up to several hundred genes:
Are any of the genes associated with tool compounds?
Are any launched drugs associated with my list of genes? Are there any clinical trials?
Are there any safety concerns in targeting these genes?
Where are my genes expressed? (baseline expression in tissues and cell types)
Do I see any significant enrichment in:
a) specific disease areas
b) pathways/GO terms
c) specific type of evidence from the platform
From my list of genes, which ones have been highly cited, normally cited, or rarely / not at all cited in the literature?
So, similar questions to the gene profile page of the platform but aggregated and analysed for a better overview.
More advanced questions could be answered.
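The enrichment questions above (disease areas, pathways/GO terms, evidence types) are commonly answered with an over-representation test. The following is a minimal sketch of a one-sided hypergeometric test, not the method used by the previous batch search. The function name `enrichment_p_value` and the gene labels are hypothetical, and a real analysis would also correct for multiple testing across terms.

```python
from math import comb


def enrichment_p_value(gene_list, annotated, background):
    """One-sided hypergeometric test for over-representation.

    gene_list:  genes submitted by the user
    annotated:  genes annotated with the term being tested (pathway, GO term, ...)
    background: all genes that could have been submitted
    Returns (overlap, p_value), where p_value = P(X >= overlap).
    """
    background = set(background)
    genes = set(gene_list) & background        # ignore genes outside the background
    annotated = set(annotated) & background
    overlap = len(genes & annotated)

    n, K, N = len(genes), len(annotated), len(background)
    # Sum the probabilities of seeing `overlap` or more annotated genes
    # when drawing n genes from the background without replacement.
    tail = sum(
        comb(K, k) * comb(N - K, n - k)
        for k in range(overlap, min(n, K) + 1)
    )
    return overlap, tail / comb(N, n)


# Toy data: 20 background genes, a 5-gene list, a 4-gene pathway.
background = [f"G{i}" for i in range(1, 21)]
my_genes = ["G1", "G2", "G3", "G4", "G5"]
pathway = {"G1", "G2", "G3", "G10"}

overlap, p_value = enrichment_p_value(my_genes, pathway, background)
print(f"overlap={overlap}, p={p_value:.4f}")
```

The same function can be applied to each disease area, pathway, or evidence type in turn, using the set of genes annotated with that term as `annotated`.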
Has there been any progress on bringing back the batch search feature? If not, is there a particular reason, e.g. a technical challenge that is difficult to solve in the GraphQL backend? Otherwise, would you be open to contributions from the community?
Oh, I didn’t know about that. Thanks for tagging David and Kostas and good luck on your next adventure!
We @ The Hyve are looking into re-building the batch search feature, at least to some extent. We would love to see the feature re-introduced into the public version later through a PR. What do you think? If you are interested, we would like to hear about requirements from your side before we start implementation.
We are most definitely interested in this functionality. It’s a complex matter though.
Based on user feedback, we have noticed that users differ in what they expect batch search to be used for:
There are users who understand batch search as an enrichment tool for target information. This is closer to the pre-existing batch search and to @gkosbio's comment.
Other users are instead more interested in building association pages for sets of genes or sets of diseases. For example, prioritising targets by collating all evidence related to a set of diseases related to the same process (e.g. ageing). Alternatively, prioritise diseases based on a list of targets obtained through any previous analytical or experimental process. Our API is actually prepared for this functionality, although the web application does not show the option.
Give it a thought and drop me a line if you guys want to look at this.
Thank you for the explanation, and apologies for the delayed response. Regarding the second scenario, are you referring to the targets and diseases objects in the GraphQL API that allow retrieving associated diseases/targets for multiple targets/diseases, respectively? I tested those but ran into performance issues when submitting lists with 50+ targets/diseases (I think it’s because internally every target/disease is handled as a separate query to Elasticsearch, but not 100% sure about it).
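For readers following along, this is roughly what such a multi-target request could look like. It is only a sketch: the query shape (`targets(ensemblIds: ...)`, `associatedDiseases`, `rows { disease score }`) is my reading of the public GraphQL schema and should be checked against the current schema before use. The helper names `chunked` and `build_payloads` are hypothetical, and splitting the list into smaller requests is an untested idea, not a confirmed fix for the slowdown.

```python
import json

# Assumed query shape; verify field and argument names against the live schema.
BATCH_QUERY = """
query batchAssociations($ids: [String!]!, $size: Int!) {
  targets(ensemblIds: $ids) {
    id
    approvedSymbol
    associatedDiseases(page: {index: 0, size: $size}) {
      count
      rows { disease { id name } score }
    }
  }
}
"""


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_payloads(ensembl_ids, chunk_size=25, page_size=10):
    """Build one GraphQL request body per chunk of target IDs."""
    return [
        {"query": BATCH_QUERY, "variables": {"ids": chunk, "size": page_size}}
        for chunk in chunked(list(ensembl_ids), chunk_size)
    ]


# Placeholder IDs for illustration only; these are not real Ensembl gene IDs.
ids = [f"ENSG{i:011d}" for i in range(60)]
payloads = build_payloads(ids)
print(len(payloads), "requests")
print(json.dumps(payloads[0]["variables"])[:80])
```

Each payload can then be sent as the JSON body of a POST request to the GraphQL endpoint with whichever HTTP client you prefer.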
The use case we looked into was retrieving association scores for multiple targets instead of just one, with support for up to ~250 targets. Our solution was to load data from the associationBy* datasets into a dedicated ClickHouse table and implement a query in the GraphQL API backend that takes a list of targets to filter the table by. This solves the performance problem, but it means association scores are no longer calculated on the fly. This might be an acceptable side effect in some cases, but I assume not in others.
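To illustrate the idea of filtering a precomputed association table by a list of targets, here is a minimal sketch. It uses Python's built-in `sqlite3` as a stand-in for ClickHouse so that it runs anywhere. The table name, column names, IDs, and scores are all made up for the example and do not reflect our actual schema or real data.

```python
import sqlite3


def build_demo_table():
    """Create an in-memory table of precomputed scores (toy data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE associations (target_id TEXT, disease_id TEXT, score REAL)"
    )
    conn.executemany(
        "INSERT INTO associations VALUES (?, ?, ?)",
        [
            ("T1", "D1", 0.9),
            ("T1", "D2", 0.4),
            ("T2", "D1", 0.7),
            ("T3", "D2", 0.6),
        ],
    )
    return conn


def associations_for_targets(conn, target_ids):
    """Return (target, disease, score) rows for the given targets, best first."""
    target_ids = list(target_ids)
    if not target_ids:
        return []
    # One placeholder per ID keeps the query parameterised.
    placeholders = ",".join("?" for _ in target_ids)
    sql = (
        "SELECT target_id, disease_id, score FROM associations "
        f"WHERE target_id IN ({placeholders}) ORDER BY score DESC"
    )
    return conn.execute(sql, target_ids).fetchall()


conn = build_demo_table()
for row in associations_for_targets(conn, ["T1", "T3"]):
    print(row)
```

Because the scores are read from the table as stored, any change to datasource weights would require reloading the table, which is the trade-off described above.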