Accessing L2G scores through API

Hello, I am struggling to formulate a GraphQL query that fetches L2G scores for a given SNP-study association, such as those displayed under "Gene prioritisation using locus-to-gene pipeline" in Open Targets Genetics. If that is not available, I would at least like the highest-scoring gene for a given SNP-study association. I can only find the ManhattanAssociation type, which should somehow be able to produce bestLocus2Genes, but I am unable to formulate a successful query for it.

Hello @marynias!

Welcome, and thank you for joining the Open Targets Community! :slight_smile:

How do I access Locus2Gene data for a specific study and locus?

To access the Locus2Gene data for a specific study and locus (represented by the lead variant ID) through the Genetics Portal GraphQL API, use the studyLocus2GeneTable query field and pass studyId and variantId as arguments.

For example, to access the Locus2Gene data for Crohn’s disease (GCST003044) and the locus around 1_67215986_T_G, you would use the following query:

query getL2GScoresForAStudyVariantPair {
  studyLocus2GeneTable(studyId:"GCST003044", variantId:"1_67215986_T_G") {
    rows {
      gene {
        symbol
        id
      }
      yProbaModel
      yProbaDistance
      yProbaInteraction
      yProbaMolecularQTL
      yProbaPathogenicity
      hasColoc
      distanceToLocus
    }
  }
}

Link to run GraphQL API query
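If you would rather run the query from a script than from the browser, a minimal Python sketch using only the standard library could look like the following. The endpoint URL is an assumption (it is not stated in this thread), and the helper names `build_request` and `fetch_l2g_rows` are hypothetical; the query text itself is the one shown above, with the two IDs inlined as string literals.

```python
import json
import urllib.request

# Assumed endpoint for the Genetics Portal GraphQL API; check the API docs.
API_URL = "https://api.genetics.opentargets.org/graphql"

# Same fields as the query above; %s placeholders receive quoted string literals.
QUERY_TEMPLATE = """
query getL2GScoresForAStudyVariantPair {
  studyLocus2GeneTable(studyId: %s, variantId: %s) {
    rows {
      gene {
        symbol
        id
      }
      yProbaModel
      yProbaDistance
      yProbaInteraction
      yProbaMolecularQTL
      yProbaPathogenicity
      hasColoc
      distanceToLocus
    }
  }
}
"""


def build_request(study_id, variant_id, url=API_URL):
    """Build (but do not send) the HTTP POST request carrying the query."""
    # json.dumps produces a double-quoted, escaped literal for each ID.
    query = QUERY_TEMPLATE % (json.dumps(study_id), json.dumps(variant_id))
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_l2g_rows(study_id, variant_id, url=API_URL):
    """Send the query and return the list under studyLocus2GeneTable.rows."""
    request = build_request(study_id, variant_id, url)
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    if body.get("errors"):
        raise RuntimeError(body["errors"])
    return body["data"]["studyLocus2GeneTable"]["rows"]
```

Calling `fetch_l2g_rows("GCST003044", "1_67215986_T_G")` should then return one dictionary per gene at that locus, assuming the endpoint above is reachable.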

Within the API response, the following fields map to the columns of the Locus2Gene data table shown in the Genetics Portal:

| Column label | API response field |
| --- | --- |
| Gene | rows.gene.symbol |
| Overall L2G score | rows.yProbaModel |
| Variant Pathogenicity | rows.yProbaPathogenicity |
| Distance | rows.yProbaDistance |
| QTL Coloc | rows.yProbaMolecularQTL |
| Chromatin interaction | rows.yProbaInteraction |
| Distance to locus (bp) | rows.distanceToLocus |
| Evidence of colocalisation | rows.hasColoc |
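To get from the raw response to the highest-scoring gene asked about in the original question, the mapping above can be applied in a few lines of Python. This is only a sketch: the function names are hypothetical, it assumes the standard GraphQL response envelope (`{"data": {...}}`), and the gene symbols and scores in the usage example are made-up placeholders rather than real Open Targets results.

```python
def to_table_rows(response):
    """Relabel API fields with the portal's column labels, best score first."""
    rows = response["data"]["studyLocus2GeneTable"]["rows"]
    table = [
        {
            "Gene": row["gene"]["symbol"],
            "Overall L2G score": row["yProbaModel"],
            "Variant Pathogenicity": row["yProbaPathogenicity"],
            "Distance": row["yProbaDistance"],
            "QTL Coloc": row["yProbaMolecularQTL"],
            "Chromatin interaction": row["yProbaInteraction"],
            "Distance to locus (bp)": row["distanceToLocus"],
            "Evidence of colocalisation": row["hasColoc"],
        }
        for row in rows
    ]
    return sorted(table, key=lambda r: r["Overall L2G score"], reverse=True)


def best_gene(response):
    """Return the symbol of the gene with the highest overall L2G score."""
    table = to_table_rows(response)
    return table[0]["Gene"] if table else None


# Placeholder response with invented values, only to show the expected shape.
example_response = {
    "data": {
        "studyLocus2GeneTable": {
            "rows": [
                {
                    "gene": {"symbol": "GENE_A", "id": "ID_A"},
                    "yProbaModel": 0.10,
                    "yProbaDistance": 0.20,
                    "yProbaInteraction": 0.05,
                    "yProbaMolecularQTL": 0.05,
                    "yProbaPathogenicity": 0.01,
                    "hasColoc": False,
                    "distanceToLocus": 50000,
                },
                {
                    "gene": {"symbol": "GENE_B", "id": "ID_B"},
                    "yProbaModel": 0.90,
                    "yProbaDistance": 0.70,
                    "yProbaInteraction": 0.60,
                    "yProbaMolecularQTL": 0.80,
                    "yProbaPathogenicity": 0.30,
                    "hasColoc": True,
                    "distanceToLocus": 1200,
                },
            ]
        }
    }
}

print(best_gene(example_response))
```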

Are there other ways to access Locus2Gene data for more systematic queries?

The GraphQL API approach is a good way to find Locus2Gene data for a single study-locus pair.

However, for more systematic analyses involving multiple studies or loci, I would recommend that you use our BigQuery instance, open-targets-genetics.

For example, the BigQuery equivalent of the GraphQL API query above for Crohn’s disease (GCST003044) and the locus around 1_67215986_T_G would be:

SELECT 
    study_id,
    gene_id,
    y_proba_full_model,
    y_proba_logi_distance,
    y_proba_logi_interaction,
    y_proba_logi_molecularQTL,
    y_proba_logi_pathogenicity
FROM `open-targets-genetics.200201.locus2gene` 
WHERE 
    study_id='GCST003044' 
    AND pos=67215986 
    AND chrom='1'
    AND ref='T'
    AND alt='G' 
ORDER BY y_proba_full_model desc 
# LIMIT 100

Link to run the BigQuery script
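Note that the GraphQL API takes a single variantId, whereas the BigQuery table stores the variant as separate chrom, pos, ref and alt columns. When looping over many study-locus pairs, a small helper can do the conversion and fill in the query. The sketch below is illustrative only: the function names are hypothetical, and the variant ID pattern (chrom_pos_ref_alt) is inferred from the single example in this thread, so it may need loosening for other variants.

```python
import re

# Pattern inferred from the example ID "1_67215986_T_G"; an assumption.
VARIANT_ID_PATTERN = re.compile(r"([0-9A-Za-z]+)_(\d+)_([ACGT]+)_([ACGT]+)")

# Table name taken from the query above.
L2G_TABLE = "open-targets-genetics.200201.locus2gene"


def parse_variant_id(variant_id):
    """Split a chrom_pos_ref_alt variant ID into its four components."""
    match = VARIANT_ID_PATTERN.fullmatch(variant_id)
    if match is None:
        raise ValueError(f"Unexpected variant ID format: {variant_id!r}")
    chrom, pos, ref, alt = match.groups()
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt}


def build_l2g_sql(study_id, variant_id, table=L2G_TABLE):
    """Render the BigQuery SQL above for one study-variant pair."""
    # Reject anything that is not a plain identifier before inlining it.
    if re.fullmatch(r"[A-Za-z0-9_]+", study_id) is None:
        raise ValueError(f"Unexpected study ID format: {study_id!r}")
    v = parse_variant_id(variant_id)
    return (
        "SELECT study_id, gene_id, y_proba_full_model, "
        "y_proba_logi_distance, y_proba_logi_interaction, "
        "y_proba_logi_molecularQTL, y_proba_logi_pathogenicity\n"
        f"FROM `{table}`\n"
        f"WHERE study_id='{study_id}' AND pos={v['pos']} "
        f"AND chrom='{v['chrom']}' AND ref='{v['ref']}' AND alt='{v['alt']}'\n"
        "ORDER BY y_proba_full_model DESC"
    )


print(build_l2g_sql("GCST003044", "1_67215986_T_G"))
```

The resulting string can be pasted into the BigQuery console or passed to whichever BigQuery client you use.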

Results of the query can be exported in CSV or JSON format, or imported into your own BigQuery instance or Google Sheets.

Alternatively, you can also download all of the Locus2Gene data in Parquet format using our FTP service.

I hope this helps answer your question! Please feel free to comment below if you have further questions about accessing the data! :slight_smile:

Thank you @ahercules for the clear and prompt recipe! I would have never figured that one out on my own :).
