Best approach for L2G mapping with GWAS variant lists for a specific indication

jkozlowska · 4 November 2025 12:00

Hi there!

I have long lists of GWAS-associated variants and would like to obtain L2G scores to identify likely causal genes. I’d appreciate guidance on the best approach as it’s currently unclear. I’ve prepared my data in parquet/CSV files with these columns:

- `variant_id`: chr:pos:ref:alt (e.g., “4:79957426:C:T”)

- `rsid`: dbSNP identifier (e.g., “rs10000339”)

- `chromosome`, `position`, `reference_allele`, `alternate_allele`: Genomic coordinates (GRCh38)

- `gwas_hit`: Lead variant for the locus

- `p_value`: Association p-value from GWAS

- `indication`: Disease indication (e.g., “uc”)

These are GWAS variants plus variants in LD (not fine-mapped credible sets).

My Questions

Can I use existing Open Targets L2G scores?

Is there a way to query/download L2G scores for my specific variant list? I see L2G data in the Open Targets Genetics portal, but I’m unclear on:

Can I batch query with rsIDs or variant IDs?
What’s the recommended approach: API, GraphQL, FTP download?

Do I need to/can I run L2G locally via gentropy?

I’ve looked at the gentropy CLI documentation for locus_to_gene step, but I notice it expects:

Credible sets
Feature matrices
Reference datasets

Would L2G work on raw GWAS variants or is finemapping necessary for the pipeline to work?

What’s the recommended workflow?

Given my starting point (GWAS variants + p-values only), what would you recommend:

Option A: Query existing Open Targets L2G scores

Map my variants to OT study loci
Download pre-computed L2G predictions
Filter for my indications

Option B: Run fine-mapping first, then L2G

Use SuSiE/FINEMAP to create credible sets
Run gentropy L2G locally
More rigorous but time-intensive

Or is there something else I’m not considering?

Any guidance on the best approach would be greatly appreciated!

Szymon_Szyszkowski · 10 November 2025 13:22

Welcome to the community @jkozlowska.
Before I answer your (many) questions, let’s set out the background and summarise how L2G is currently implemented on the Platform.

L2G steps breakdown

As you have found, to obtain the L2G scores we use gentropy steps. The whole process of obtaining the scores is divided into 3 parts:

Building L2G Feature Matrix
Training model
Predicting scores

I have sketched the process in the diagram below.

All dataset paths are written in red and are relative to the https://ftp.ebi.ac.uk/pub/databases/opentargets/platform/latest release.
This architecture is written in our [Unified Pipeline configuration](

Answers

Given the background above, I can now answer your questions directly.

Can I use existing Open Targets L2G scores?

Yes, you can use the GraphQL API to retrieve scores for a specific variant.

{ 
  variant(variantId: "4_79957426_C_T") {
    id
    GWASCredibleSets: credibleSets(
      studyTypes: [gwas]
    ) {
      count
      rows {
        studyLocusId
        studyId
        l2GPredictions {
          rows {
            score
            target {
              id
            }
          }
        }
      }
    }
  }
}

This query should return the variant you were looking for and its L2G scores across all GWAS credible sets that contain it.
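For a list of variants, the same query can be sent once per variant from a script. Below is a minimal Python sketch, not an official client. It assumes the public endpoint `https://api.platform.opentargets.org/api/v4/graphql`, that the `variantId` argument is typed `String!`, and the colon-separated `variant_id` format from the question. The helper names (`to_ot_variant_id`, `build_payload`, `fetch`, `flatten`) are illustrative.

```python
import json
import urllib.request

# Assumed public endpoint of the Open Targets Platform GraphQL API.
API_URL = "https://api.platform.opentargets.org/api/v4/graphql"

# Same query as above, with the variant ID passed as a variable.
QUERY = """
query VariantL2G($variantId: String!) {
  variant(variantId: $variantId) {
    id
    GWASCredibleSets: credibleSets(studyTypes: [gwas]) {
      count
      rows {
        studyLocusId
        studyId
        l2GPredictions {
          rows {
            score
            target { id }
          }
        }
      }
    }
  }
}
"""


def to_ot_variant_id(variant_id: str) -> str:
    """Convert 'chr:pos:ref:alt' to the 'chr_pos_ref_alt' form used in the query."""
    return variant_id.replace(":", "_")


def build_payload(variant_id: str) -> dict:
    """Build the JSON body for one variant."""
    return {"query": QUERY, "variables": {"variantId": to_ot_variant_id(variant_id)}}


def fetch(variant_id: str) -> dict:
    """POST the query for one variant and return the decoded JSON response."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(variant_id)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def flatten(response: dict) -> list:
    """Turn one response into flat rows, one per (credible set, gene) pair."""
    variant = (response.get("data") or {}).get("variant")
    if not variant:  # variant not found
        return []
    rows = []
    for credible_set in variant["GWASCredibleSets"]["rows"]:
        predictions = (credible_set.get("l2GPredictions") or {}).get("rows") or []
        for prediction in predictions:
            rows.append(
                {
                    "variantId": variant["id"],
                    "studyLocusId": credible_set["studyLocusId"],
                    "studyId": credible_set["studyId"],
                    "geneId": prediction["target"]["id"],
                    "score": prediction["score"],
                }
            )
    return rows
```

Calling `flatten(fetch("4:79957426:C:T"))` would give one row per credible set and gene. For long lists, the downloaded datasets described further down are likely the better route.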

If you want the scores for loci associated with your disease of interest (assuming uc is ulcerative colitis), you need to search for studies linked to the disease:

{
  studies(diseaseIds: "EFO_0000729") {
    rows {
      id
    }
  }
}

Finally, filter the results of the first query (all credible sets) by the study IDs returned by the second query (not shown).

Note that an L2G score describes a whole credible set, not an individual variant. For the score to be a good proxy for your variant, make sure the variant has a high posterior inclusion probability (PIP > 0.9) within that credible set.
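The two filters just described (keep only credible sets from the disease's studies, and only those in which your variant carries most of the posterior probability) could be sketched as follows. The record layout is hypothetical: it assumes you have already collected, for each credible set, its `studyId` and a `locus` list holding each member variant's `variantId` and `posteriorProbability`.

```python
PIP_THRESHOLD = 0.9  # threshold suggested in the note above


def select_credible_sets(credible_sets, disease_study_ids, variant_id,
                         pip_threshold=PIP_THRESHOLD):
    """Return studyLocusIds of credible sets that pass both filters.

    A credible set is kept if it belongs to one of the disease's studies and
    `variant_id` has a PIP above `pip_threshold` within it.
    """
    disease_study_ids = set(disease_study_ids)
    selected = []
    for credible_set in credible_sets:
        if credible_set["studyId"] not in disease_study_ids:
            continue
        # PIP of the variant of interest in this credible set (0.0 if absent).
        pip = next(
            (member["posteriorProbability"]
             for member in credible_set["locus"]
             if member["variantId"] == variant_id),
            0.0,
        )
        if pip > pip_threshold:
            selected.append(credible_set["studyLocusId"])
    return selected
```

Credible sets in which the variant is only a low-PIP member are dropped, because their L2G score says little about that particular variant.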

Is there a way to query/download L2G scores for my specific variant list? I see L2G data in the Open Targets Genetics portal, but I’m unclear on:

Can I batch query with rsIDs or variant IDs?

What’s the recommended approach: API, GraphQL, FTP download?

The preferred way to query in batch is to download the datasets:

predictions to find the studyLocusId and L2G scores
credible_set to bring in the variantIds that belong to each studyLocusId
study to filter the studyLocusIds down to the relevant disease

Alternatively, you can use BigQuery to run SQL directly on the datasets.
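Once the three datasets are downloaded, the batch lookup is essentially a pair of joins. Here is a minimal sketch in plain Python over lists of records. The column names (`studyLocusId`, `studyId`, `geneId`, `score`, `diseaseIds`, and a `locus` array holding `variantId` and `posteriorProbability`) are assumptions about the release schema and should be checked against the files you download. Loading the parquet files into records (for example with pyarrow or pandas) is left out.

```python
from collections import defaultdict


def l2g_for_variants(variant_ids, credible_sets, predictions, studies, disease_id):
    """Join the three datasets and return L2G rows for the given variants.

    All inputs are iterables of dicts; the column names are assumed, see above.
    """
    wanted = set(variant_ids)

    # study: keep only the studies annotated with the disease of interest.
    disease_studies = {
        study["studyId"]
        for study in studies
        if disease_id in (study.get("diseaseIds") or [])
    }

    # predictions: index (geneId, score) pairs by studyLocusId.
    scores = defaultdict(list)
    for prediction in predictions:
        scores[prediction["studyLocusId"]].append(
            (prediction["geneId"], prediction["score"])
        )

    # credible_set: link each wanted variant to its credible sets and scores.
    results = []
    for credible_set in credible_sets:
        if credible_set["studyId"] not in disease_studies:
            continue
        for member in credible_set.get("locus") or []:
            if member["variantId"] not in wanted:
                continue
            for gene_id, score in scores.get(credible_set["studyLocusId"], []):
                results.append(
                    {
                        "variantId": member["variantId"],
                        "pip": member["posteriorProbability"],
                        "studyLocusId": credible_set["studyLocusId"],
                        "studyId": credible_set["studyId"],
                        "geneId": gene_id,
                        "score": score,
                    }
                )
    return results
```

Keeping the PIP in the output makes it easy to apply the > 0.9 filter afterwards. At full release scale the same joins are better done in Spark, pandas, or SQL.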

Do I need to/can I run L2G locally via gentropy?

If you cannot find your variants after querying, you will need to run fine-mapping (ideally with in-sample LD) to obtain credible sets before you can get L2G scores for them.

To obtain predictions for new credible sets, you need to:

Transform your credible sets to the StudyLocus format.
Run the L2G Feature Matrix step to build the feature matrix for those credible sets.
Run the L2G step with the generated feature matrix and credible sets, using the pre-trained model.

Building Feature Matrix

As you may have seen from the diagram above, to generate the L2G feature matrix one needs to provide:

colocalisation dataset that contains the results of colocalising your GWAS credible sets with QTL credible sets
variant index that contains all variants from your GWAS credible sets, annotated with VEP
study index dataset that contains information about the molQTL studies (the geneId column is used to determine the molecular feature affected by the credible set linked to the study)
target index dataset, which can be used directly from the Platform output without any modifications

Would L2G work on raw GWAS variants or is finemapping necessary for the pipeline to work?

No, L2G will not work on raw GWAS variants. The feature matrix step requires a PIP for each individual variant in the locus because the PIP is used as a feature weight, so if you only have variants you need to run fine-mapping before building the L2G feature matrix.
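To illustrate why the PIP is indispensable, here is a toy sketch of a PIP-weighted locus-level feature. This is not gentropy's implementation; the function name and the per-variant `score` field are hypothetical. It only shows the general idea of weighting a per-variant value by its posterior probability.

```python
def pip_weighted_feature(locus):
    """Sum a per-variant score over a locus, weighting each variant by its PIP.

    `locus` is a list of dicts with 'variantId', 'score' and
    'posteriorProbability' (illustrative layout, not the gentropy schema).
    """
    total = 0.0
    for variant in locus:
        pip = variant.get("posteriorProbability")
        if pip is None:
            # Raw GWAS variants end up here: without fine-mapping there is no weight.
            raise ValueError(
                f"{variant['variantId']} has no PIP; fine-map the locus first"
            )
        total += pip * variant["score"]
    return total
```

With only p-values and LD there is nothing to use as the weight, which is why a variant list alone cannot be turned into a feature matrix.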

What’s the recommended workflow?
Given my starting point (GWAS variants + p-values only), what would you recommend:
Option A: Query existing Open Targets L2G scores

Map my variants to OT study loci

Download pre-computed L2G predictions

Filter for my indications
Option B: Run fine-mapping first, then L2G

Use SuSiE/FINEMAP to create credible sets

Run gentropy L2G locally

More rigorous but time-intensive

As mentioned above, I would start by querying the existing predictions.

If you do not find the variants you are looking for, I would suggest running fine-mapping, building the feature matrix, and then running the L2G model with both the credible sets and the feature matrix.

Looking at your workflow ideas, I think you already know most of this. If anything is unclear, feel free to ask.

With kind regards,
Szymon Szyszkowski
