Hello, I am struggling to formulate a GraphQL query that fetches the L2G scores for a given SNP-study association, as displayed in Open Targets Genetics under "Gene prioritisation using locus-to-gene pipeline". If that is not available, I would at least like the highest-scoring gene for a given SNP-study association. I can only find the ManhattanAssociation type, which should somehow be able to produce bestLocus2Genes, but I am unable to formulate a successful query for it.
Hello @marynias!
Welcome, and thank you for joining the Open Targets Community!
How do I access Locus2Gene data for a specific study and locus?
To access the Locus2Gene data for a specific study and locus (represented by the lead variant ID) using the Genetics Portal GraphQL API, you will need to use the studyLocus2GeneTable query and pass the studyId and variantId as arguments.
For example, to access the Locus2Gene data for Crohn’s disease (GCST003044) and the locus around 1_67215986_T_G, you would use the following query:
query getL2GScoresForAStudyVariantPair {
  studyLocus2GeneTable(studyId: "GCST003044", variantId: "1_67215986_T_G") {
    rows {
      gene {
        symbol
        id
      }
      yProbaModel
      yProbaDistance
      yProbaInteraction
      yProbaMolecularQTL
      yProbaPathogenicity
      hasColoc
      distanceToLocus
    }
  }
}
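Since the original question also asked for the highest-scoring gene, here is a minimal Python sketch of how the query could be sent and the top row picked out. The endpoint URL and the helper names (build_payload, fetch, top_gene) are my assumptions and not part of the API itself, so please check the URL against the API documentation before relying on it.

```python
import json
import urllib.request

# Assumption: public Genetics Portal GraphQL endpoint; verify before use.
API_URL = "https://api.genetics.opentargets.org/graphql"


def build_payload(study_id, variant_id):
    """Build the JSON body for a studyLocus2GeneTable request."""
    # json.dumps adds the surrounding double quotes and escapes the value.
    query = (
        "query getL2GScoresForAStudyVariantPair {"
        f" studyLocus2GeneTable(studyId: {json.dumps(study_id)},"
        f" variantId: {json.dumps(variant_id)}) {{"
        " rows { gene { symbol id } yProbaModel } } }"
    )
    return {"query": query}


def fetch(payload, url=API_URL):
    """POST the payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def top_gene(response):
    """Return (symbol, yProbaModel) for the highest-scoring row, or None."""
    rows = response["data"]["studyLocus2GeneTable"]["rows"]
    if not rows:
        return None
    best = max(rows, key=lambda row: row["yProbaModel"])
    return best["gene"]["symbol"], best["yProbaModel"]
```

With network access, top_gene(fetch(build_payload("GCST003044", "1_67215986_T_G"))) should give the gene with the highest overall L2G score for that study-locus pair.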
Within the API response, the following fields map to the columns in the Locus2Gene data table seen in the screenshot below:
Column label | API response field |
---|---|
Gene | rows.gene.symbol |
Overall L2G score | rows.yProbaModel |
Variant Pathogenicity | rows.yProbaPathogenicity |
Distance | rows.yProbaDistance |
QTL Coloc | rows.yProbaMolecularQTL |
Chromatin interaction | rows.yProbaInteraction |
Distance to locus (bp) | rows.distanceToLocus |
Evidence of colocalisation | rows.hasColoc |
Are there other ways to access Locus2Gene data for more systematic queries?
The GraphQL API approach is a good way to find Locus2Gene data for a single study-locus pair.
However, for more systematic analyses involving multiple studies or loci, I would recommend that you use our BigQuery instance, open-targets-genetics.
For example, the BigQuery equivalent of the GraphQL query above, for Crohn’s disease (GCST003044) and the locus around 1_67215986_T_G, would be:
SELECT
  study_id,
  gene_id,
  y_proba_full_model,
  y_proba_logi_distance,
  y_proba_logi_interaction,
  y_proba_logi_molecularQTL,
  y_proba_logi_pathogenicity
FROM `open-targets-genetics.200201.locus2gene`
WHERE
  study_id = 'GCST003044'
  AND pos = 67215986
  AND chrom = '1'
  AND ref = 'T'
  AND alt = 'G'
ORDER BY y_proba_full_model DESC
# LIMIT 100
Link to run the BigQuery script
Results of the query can be exported in CSV or JSON format, or imported into your own BigQuery instance or Google Sheets.
Alternatively, you can also download all of the Locus2Gene data in Parquet format using our FTP service.
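For the BigQuery route with several study-locus pairs, the variant ID has to be split into the chrom, pos, ref and alt columns used in the WHERE clause. Below is a rough Python sketch that does the split and assembles the SQL text. The helper names (parse_variant_id, build_l2g_sql) are hypothetical, and the accepted chromosome labels and study ID characters are assumptions of mine, so adjust them to your data.

```python
import re

# Assumption: IDs look like 1_67215986_T_G, with chromosomes 1-22, X, Y or MT.
VARIANT_ID = re.compile(r"^([0-9]{1,2}|X|Y|MT)_([0-9]+)_([ACGT]+)_([ACGT]+)$")
# Assumption: study IDs only contain letters, digits and underscores.
STUDY_ID = re.compile(r"^[A-Za-z0-9_]+$")


def parse_variant_id(variant_id):
    """Split chrom_pos_ref_alt into the four locus2gene columns."""
    match = VARIANT_ID.match(variant_id)
    if match is None:
        raise ValueError(f"Unexpected variant ID format: {variant_id!r}")
    chrom, pos, ref, alt = match.groups()
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt}


def build_l2g_sql(pairs, table="open-targets-genetics.200201.locus2gene"):
    """Build one query covering several (study_id, variant_id) pairs."""
    if not pairs:
        raise ValueError("At least one (study_id, variant_id) pair is needed")
    conditions = []
    for study_id, variant_id in pairs:
        # Validate before interpolating, as the values end up in the SQL text.
        if STUDY_ID.match(study_id) is None:
            raise ValueError(f"Unexpected study ID format: {study_id!r}")
        locus = parse_variant_id(variant_id)
        chrom, pos = locus["chrom"], locus["pos"]
        ref, alt = locus["ref"], locus["alt"]
        conditions.append(
            f"(study_id = '{study_id}' AND chrom = '{chrom}' AND pos = {pos}"
            f" AND ref = '{ref}' AND alt = '{alt}')"
        )
    where = "\n  OR ".join(conditions)
    return (
        "SELECT study_id, chrom, pos, ref, alt, gene_id, y_proba_full_model\n"
        f"FROM `{table}`\n"
        f"WHERE {where}\n"
        "ORDER BY study_id, chrom, pos, y_proba_full_model DESC"
    )
```

The returned string can be pasted into the BigQuery console. The format checks are there because the IDs are interpolated directly into the query text.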
I hope this helps answer your question! Please feel free to comment below if you have further questions about accessing the data!
Thank you @ahercules for the clear and prompt recipe! I would have never figured that one out on my own :).
Hello,
I would like to do the exact same thing, but access the “Overall V2G” scores instead. Which table do you suggest? To be more specific, I want to access the columns in this table through the API.
Thank you!
Hi!
For the overall V2G score for a given variant (1_67215986_T_G in this case), you can use the following query in the GraphQL API playground:
query getV2GScoresForAStudyVariantPair {
  genesForVariant(variantId: "1_67215986_T_G") {
    gene {
      symbol
      id
    }
    overallScore
  }
}
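If you want the genes ranked by their overall V2G score once you have the JSON response, something like the Python sketch below could work. The function name rank_v2g_genes is hypothetical, and it assumes the response has already been decoded into a dictionary that mirrors the query above.

```python
def rank_v2g_genes(response, top_n=5):
    """Return up to top_n (symbol, overallScore) pairs, highest score first."""
    entries = response["data"]["genesForVariant"]
    ranked = sorted(entries, key=lambda entry: entry["overallScore"], reverse=True)
    return [(entry["gene"]["symbol"], entry["overallScore"]) for entry in ranked[:top_n]]
```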
Thanks,
Xiangyu