L2G Scoring Pipeline Implementation

I am reaching out to inquire about using the L2G (locus-to-gene) scoring pipeline (GitHub - opentargets/genetics-l2g-scoring: Locus-to-gene scoring pipeline) to map new loci that are not currently included in the L2G table.

**Datasets for Model Training:** Does Open Targets provide access to all of the datasets needed to train the L2G model from scratch?

**Pre-Trained Models:** Alternatively, does Open Targets offer pre-trained L2G models that can be used directly for scoring?

Any information/resources/guidance in this regard is much appreciated!


Hi @nabuan, and welcome to the Open Targets Community! :tada:

Thanks for your question. The datasets that we use for L2G are all available, but getting them and training the L2G model from scratch will be quite complicated.

We’re currently working on a new version of the Open Targets Genetics pipelines and have just released the alpha version of Open Targets Gentropy, a Python package that facilitates the interpretation and analysis of GWAS and functional genomic studies for target identification, and which includes the L2G model. We hope this will make it easier for users to run their own analyses.
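If you want to check whether the alpha is already available in your environment, a minimal stdlib-only sketch (assuming the distribution and import name are both `gentropy`; check the project repo for the exact name) is:

```python
# Hedged sketch: probe for a local Gentropy install without importing it.
# The package name "gentropy" is an assumption based on the project name.
import importlib.util

spec = importlib.util.find_spec("gentropy")
if spec is None:
    print("gentropy not found; try: pip install gentropy")
else:
    print("gentropy available at", spec.origin)
```

This only inspects the local environment; it does not confirm whether the package has been published to PyPI yet.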

If you are able to share, I would be very interested to know how you are thinking of using L2G, and which features of Open Targets Genetics are most useful to you (or that you would most like to see).

Best wishes,



Hey Helena,

Thanks a ton for the heads-up on Open Targets Gentropy! Super excited about the alpha release. Didn’t spot it on PyPI though - is it ready for a test drive?

I’m a huge fan of Open Targets and use your resources in a number of drug-discovery projects, mostly for target prioritisation and deep dives into variant exploration. I’m particularly interested in using the L2G pipeline to score variants that are not in the ~350k-row L2G table.


Hi Naba,
thanks very much for sharing your use case with our Community!

Yes, Gentropy is definitely ready for a test run - you can find more info here.

Please do let us know if you have any further questions or comments for us once you take a look at our new code.

Best wishes,


Hi Annalisa,

Thanks so much for this info.
I am trying to figure out how to run the L2G model to predict for a custom set of loci.
I was looking at the L2G benchmarking notebook to better understand the functionality, and noticed that it pulls data from the following Google Cloud Storage buckets:

credible_set_path = "gs://genetics_etl_python_playground/output/python_etl/parquet/XX.XX/credible_set"
predictions_path = "gs://genetics_etl_python_playground/output/python_etl/parquet/XX.XX/l2g_predictions"
old_predictions_path = "gs://genetics-portal-dev-data/22.09.1/outputs/l2g"
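One way to probe whether a bucket like these allows anonymous reads is to hit the public GCS JSON API listing endpoint without credentials. The helper below is my own sketch (the function name is hypothetical, and the `XX.XX` version placeholder in the paths above would need to be a real release); it only builds the URL, which you can then fetch with curl or a browser:

```python
# Hedged sketch: turn a gs:// URI into a GCS JSON API listing URL that
# can be requested without credentials. An HTTP 200 with an object list
# means anonymous access works; 401/403 means credentials are required.
from urllib.parse import quote, urlsplit

def gcs_public_list_url(gs_uri: str) -> str:
    """Build an unauthenticated objects.list URL for a gs://bucket/prefix URI."""
    parts = urlsplit(gs_uri)                 # scheme=gs, netloc=bucket, path=/prefix
    bucket, prefix = parts.netloc, parts.path.lstrip("/")
    return (f"https://storage.googleapis.com/storage/v1/b/{bucket}/o"
            f"?prefix={quote(prefix, safe='')}")

print(gcs_public_list_url("gs://genetics-portal-dev-data/22.09.1/outputs/l2g"))
```

Whether the specific buckets in this thread are public is exactly the open question here; this sketch just gives a quick way to check.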

Are these buckets publicly accessible?

Any advice/pointers in this regard would be much appreciated!



Hi Naba,

As @Annalisa_Buniello and @hcornu mentioned, this package is still under active development. It is almost feature complete, but the generated data is still under review and quality control. As such, we are not ready to release the generated data just yet, and I cannot provide a timeline for when that will happen. At some point, though, the data will certainly be shared with the scientific community.

In the meantime, you can generate the datasets yourself, as the entire pipeline depends on publicly available sources (gnomAD, the Open Targets Platform, FinnGen, the eQTL Catalogue). Some of these datasets are quite large, and the process can be computationally expensive, so the best approach is to use a cloud compute provider’s service.

Please keep an eye on our comms channels for updates.

