How are rsIDs mapped to unique variants in Open Targets Genetics?


In the Genetics portal FAQs you mention, regarding mapping conflicts:

Many multi-allelic sites can be assigned a single rsID, and some rsIDs can point to different positions in the genome.

This means that rsIDs are not unique to a single variant. We have mapped all rsIDs from GWAS Catalog to unique variants.

A small minority of rsIDs will map to multiple variant IDs (approximately 0.6% of lead variants). When this occurs, variants will be duplicated in the portal.

Where does the mapping from rsID to unique variants happen? I’ve looked at the variant annotation on GitHub, but I can't see an explicit disambiguation step there.

To be clear, my question is: when a given rsID maps to more than one locus, how do you pick the single locus to which this rsID is mapped in the variant-index dataset?

This question was sent to the Open Targets helpdesk and has been anonymised.


This mapping actually happens in the GWAS Catalog, not directly in Open Targets. When we import GWAS studies with summary statistics from the GWAS Catalog, the data have already been harmonised so that the effect allele is clear, the alleles are given with respect to the forward strand, and each row is unique on (chromosome, position, effect allele, other allele). In Open Targets Genetics, we use this information to display all effect sizes (beta or OR) with the alternative (non-reference) allele as the effect allele.
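Displaying effect sizes relative to the alt allele amounts to re-orienting any row whose effect allele is the reference: the alleles swap, a beta changes sign and an odds ratio is replaced by its reciprocal. The sketch below illustrates only that idea. The `Association` schema and the `orient_to_alt` helper are hypothetical names chosen for illustration; this is not the GWAS Catalog harmonisation code or the Open Targets pipeline.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Association:
    """One harmonised summary-statistics row (hypothetical schema)."""
    chrom: str
    pos: int
    effect_allele: str
    other_allele: str
    beta: Optional[float] = None
    odds_ratio: Optional[float] = None

    def key(self):
        # Harmonised rows are unique on (chromosome, position, effect allele, other allele).
        return (self.chrom, self.pos, self.effect_allele, self.other_allele)


def orient_to_alt(assoc: Association, ref: str, alt: str) -> Association:
    """Return the association with the alt (non-reference) allele as the effect allele."""
    if (assoc.effect_allele, assoc.other_allele) == (alt, ref):
        # Already expressed relative to the alt allele: return it unchanged.
        return assoc
    if (assoc.effect_allele, assoc.other_allele) == (ref, alt):
        # Effect allele is the reference: swap alleles, negate beta, reciprocate OR.
        return replace(
            assoc,
            effect_allele=alt,
            other_allele=ref,
            beta=None if assoc.beta is None else -assoc.beta,
            odds_ratio=None if assoc.odds_ratio is None else 1.0 / assoc.odds_ratio,
        )
    # Alleles do not match this ref/alt pair (e.g. a different alt at a multi-allelic site).
    raise ValueError(f"alleles {assoc.key()} do not match ref={ref}, alt={alt}")
```

For example, a hypothetical row reported with `A` as the effect allele, `beta=0.2` and `odds_ratio=2.0` at a site where `A` is the reference and `G` the alt comes back with `G` as the effect allele, `beta=-0.2` and `odds_ratio=0.5`. Raising an error on an allele mismatch, rather than guessing, keeps each alt allele at a multi-allelic site separate.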

This FAQ entry is simply to point out that a single rsID may correspond to more than one variant as defined by chr:pos:ref:alt, since there may be more than one alt allele at a single position. This doesn’t mean that a single rsID maps to more than one locus. (We don’t use data from alternative contigs in our pipelines.)
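To make the distinction concrete, here is a minimal sketch that groups chr:pos:ref:alt variant IDs under their rsID and then collapses them to distinct (chromosome, position) loci. The helper names and the input records are hypothetical: `rs_example` is a placeholder, not a real rsID, and the coordinates are illustrative only.

```python
from collections import defaultdict


def variants_by_rsid(records):
    """Group chr:pos:ref:alt variant IDs under the rsID they are annotated with.

    `records` is an iterable of (rsid, chrom, pos, ref, alt) tuples (hypothetical input).
    """
    grouped = defaultdict(list)
    for rsid, chrom, pos, ref, alt in records:
        grouped[rsid].append(f"{chrom}:{pos}:{ref}:{alt}")
    return dict(grouped)


def loci(variant_ids):
    """Collapse chr:pos:ref:alt variant IDs to the distinct (chrom, pos) loci they sit on."""
    return {tuple(v.split(":")[:2]) for v in variant_ids}


# Hypothetical multi-allelic site: one rsID, two alt alleles at the same position.
records = [
    ("rs_example", "1", 1000, "A", "C"),
    ("rs_example", "1", 1000, "A", "T"),
]
grouped = variants_by_rsid(records)
print(grouped["rs_example"])        # ['1:1000:A:C', '1:1000:A:T']
print(loci(grouped["rs_example"]))  # {('1', '1000')}
```

The rsID yields two variant IDs (so the portal would show two variants), yet both share one locus, which is the situation the FAQ entry describes.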