Hi @nabuan, and welcome to the Open Targets Community!
Thanks for your question. The datasets that we use for L2G are all available, but getting them and training the L2G model from scratch will be quite complicated.
We’re currently working on a new version of the Open Targets Genetics pipelines and have just released the alpha version of Open Targets Gentropy, a Python package to facilitate the interpretation and analysis of GWAS and functional genomic studies for target identification, including the L2G model. We hope this will make it easier for users to run their own analyses.
If you are able to share, I would be very interested to know how you are thinking of using L2G, and which features of Open Targets Genetics are most useful to you (or that you would most like to see).
Thanks a ton for the heads-up on Open Targets Gentropy! Super excited about the alpha release. I couldn’t spot it on PyPI, though - is it ready for a test drive?
I’m a huge fan of Open Targets and use your resources in a bunch of drug-discovery projects, mostly for target prioritization and deep dives into variant exploration. I’m particularly interested in using the L2G pipeline to score variants that are not in the ~350k-row L2G table.
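For the custom-loci use case above, one practical first step is to check which of your loci already have scores in the published L2G table, so only the remainder needs a fresh run. A minimal stdlib sketch - the locus identifiers and column layout below are made up for illustration, not the real table schema:

```python
import csv
import io

# Toy stand-in for an export of the published L2G table
# (the real table has ~350k study-locus rows; these columns are illustrative).
l2g_export = io.StringIO(
    "study_locus_id,gene_id,l2g_score\n"
    "GCST001_1_55505647,ENSG00000169174,0.91\n"
    "GCST002_2_21224301,ENSG00000084674,0.78\n"
)

# Collect the identifiers that already have a score.
scored = {row["study_locus_id"] for row in csv.DictReader(l2g_export)}

# Hypothetical custom loci of interest.
custom_loci = ["GCST001_1_55505647", "GCST999_9_12345678"]

# Loci absent from the table are the ones that would need an L2G run.
missing = [locus for locus in custom_loci if locus not in scored]
print(missing)  # → ['GCST999_9_12345678']
```

The same split works whatever the real identifier scheme is: score lookups for known loci are a table join, and only the leftover set needs the full pipeline.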
Thanks so much for this info.
I am trying to figure out how to run the L2G model to predict for a custom set of loci.
I was looking at the L2G benchmarking notebook to better understand the functionality and noticed that it is pulling data from the following Google Cloud Storage buckets:
As @Annalisa_Buniello and @hcornu mentioned, this package is still under active development. It is almost feature complete, but the generated data is still under review and quality control. As such, we are not ready to release the generated data just yet, and I cannot provide a timeline for when that will happen. But at some point, the data will certainly be shared with the scientific community.
In the meantime, you can generate the datasets yourself, as the entire pipeline depends on publicly available sources (gnomAD, Open Targets Platform, FinnGen, eQTL Catalogue). Some of these datasets are quite large, and the process can be computationally expensive, so the best approach is to use a cloud compute provider.
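On the "quite large" point: whichever source you pull, streaming files record by record rather than loading them whole keeps memory use flat, which helps even before you reach for cloud compute. A generic stdlib sketch - the file layout, column names, and significance filter here are hypothetical stand-ins, not any specific dataset's schema:

```python
import csv
import os
import tempfile

# Write a toy summary-statistics file standing in for a large download.
path = os.path.join(tempfile.mkdtemp(), "sumstats.csv")
with open(path, "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["variant_id", "pval"])
    for i in range(10_000):
        writer.writerow([f"1_{i}_A_G", 10 ** -(i % 12)])

# Stream the file row by row, keeping only genome-wide significant hits,
# so memory stays bounded no matter how large the input grows.
significant = []
with open(path, newline="") as fh:
    for row in csv.DictReader(fh):
        if float(row["pval"]) < 5e-8:
            significant.append(row["variant_id"])

print(len(significant))  # → 3332
```

The same pattern (iterate, filter, write out a reduced file) applies per chromosome or per study, which is also how you would shard the work across cloud workers.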