Searching credibleSets by region overlap

Flizzy · 23 December 2025 14:30

I’m trying to retrieve all credible sets (study-loci) that overlap a genomic interval (e.g., a ±500kb window around a gene). I expected the regions argument of credibleSets to behave like an interval/overlap filter, but it seems to behave as an exact string match on a pre-defined region identifier.

Example:

credibleSets(regions:["chr18:59371918-61371918"]) returns 4 rows
credibleSets(regions:["chr18:59371917-61371918"]) returns 0 rows

This suggests the API is not doing coordinate overlap but requires an exact “region” key.

Is this exact-match behavior intentional?

And if yes, is there any supported way in v4 to query all credible sets overlapping a given region (“chr:start-end”), without knowing the exact pre-defined region strings in advance?

Daniel_Considine · 23 December 2025 16:25

Hi @Flizzy

You’re correct that the region column is a string. It records which region each SuSiE fine-mapped credible set was derived from, so it isn’t well suited to filtering by overlap. Note also that this field will be NULL for credible sets derived from PICS fine-mapping.

Unfortunately I’m not very familiar with using the API or BigQuery for this type of data manipulation, and I’m not certain our data allows for this kind of API query. I will double-check with someone in the team after Christmas, but in the meantime I’ll show you how I would do this locally in PySpark. Credible sets can be downloaded here: Open Targets Platform

First, find the minimum and maximum variant positions among the variantIds in each credible set’s locus array (the StudyLocus/credible set locus object). VariantIds always have the format 18_59371918_ref_alt, so we can split on ‘_’ and take the second element:

import pyspark.sql.functions as f
from gentropy.common.session import Session

session = Session()

cs = (
    session.spark.read.parquet("/users/dc16/data/releases/25.12/credible_set")
    .withColumn(
        "locusStart",
        f.array_min(
            f.transform(
                f.col("locus.variantId"),
                lambda v: f.split(v, "_").getItem(1).cast("int"),
            )
        ),
    )
    .withColumn(
        "locusEnd",
        f.array_max(
            f.transform(
                f.col("locus.variantId"),
                lambda v: f.split(v, "_").getItem(1).cast("int"),
            )
        ),
    )
)
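
For anyone without a Spark session to hand, the same start/end derivation can be sketched in plain Python on a single credible set. The helper name locus_span and the example variantIds are made up for illustration; only the 18_59371918_ref_alt layout comes from the description above.

```python
def locus_span(variant_ids):
    """Return (start, end) positions spanned by a list of variantIds.

    Hypothetical helper. Assumes every ID looks like "18_59371918_A_G",
    i.e. the position is the second underscore-separated field.
    """
    positions = [int(v.split("_")[1]) for v in variant_ids]
    return min(positions), max(positions)


# Made-up variantIds, for illustration only
example_locus = ["18_59371918_A_G", "18_59400000_C_T", "18_59380123_G_GA"]
span = locus_span(example_locus)
print(span)
```

This mirrors the array_min/array_max over the transformed locus.variantId column, just for one row at a time.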

Then use the locusStart and locusEnd columns, together with the chromosome column, to filter the credible set rows. For a credible set to overlap your query region, locusStart must be less than or equal to the region end, and locusEnd must be greater than or equal to the region start:

cs_filtered = cs.filter(f.col("chromosome") == "18").filter(
    (f.col("locusStart") <= 61371918) & (f.col("locusEnd") >= 59371918)
)
cs_filtered.count()  # 2161

This returns 2,161 credible sets in total.
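
The overlap condition is the standard closed-interval test, and it can be checked in isolation before running it on the full dataset. Below is a minimal sketch in plain Python, assuming a query string in the chr:start-end form used in the original question. parse_region and overlaps are hypothetical helper names, not part of Gentropy or the API.

```python
def parse_region(region):
    """Split "chr18:59371918-61371918" into ("18", 59371918, 61371918)."""
    chrom, coords = region.split(":")
    start, end = coords.split("-")
    # Drop a leading "chr" so the value matches the chromosome column ("18")
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return chrom, int(start), int(end)


def overlaps(locus_start, locus_end, query_start, query_end):
    """Closed-interval overlap: True if the ranges share at least one position."""
    return locus_start <= query_end and locus_end >= query_start


chrom, q_start, q_end = parse_region("chr18:59371918-61371918")

# A locus ending exactly on the query start still counts as overlapping
touching = overlaps(59000000, 59371918, q_start, q_end)
# A locus ending one base earlier does not
outside = overlaps(58000000, 59371917, q_start, q_end)
print(chrom, touching, outside)
```

The parsed values can then be passed to the Spark filter in place of the hard-coded coordinates.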

Apologies again that I can’t give you a more comprehensive answer for the API query. Our Gentropy Python package documentation might also be useful: Open Targets Gentropy

Flizzy · 26 December 2025 12:19

Hey Daniel,

Thanks a lot for the quick reply, that makes it much clearer.

I have looked into Gentropy before, but I was hoping for something that also handles the “getting the data” part automatically, since I’m trying to plug this into a reusable pipeline. With Gentropy, it sounds like I’d first have to download the release datasets myself and then filter/join locally. In contrast, with the GraphQL API, I can compose the response I need across datasets without managing multiple local joins.

That said, your message makes me think I might have to rethink my workflow philosophy a bit and accept a manual download step, then use Gentropy/Spark for the region-type filtering.

Also, I didn’t realize you offer a BigQuery option. That might actually combine the upsides of the API and Gentropy, but I don’t yet know whether BigQuery is something we can use at work.

Thanks again, and I hope you have a relaxing holiday break!
