Discrepancy in variant reference allele from Open Targets and gnomAD

I am looking at variant 5_177322357_TTA_T, but when I click on the gnomAD link, the page reports a different reference allele (T/A) than Open Targets does (TTA/T). The dbSNP link shows the same discrepancy.

Could you please clarify where Open Targets got the reference allele information from (TTA)?

Additionally, the variant can’t be found when clicking on the Ensembl link on the variant page from OT, so there is also an issue there.


Hi @gerijs ,

The reference variant set on the Genetics Portal is sourced from gnomAD 2.1, which was originally built on GRCh37. The links are generated from the variant rsId, and the same rsId is associated with different alleles in the newer dataset:

  • gnomAD 2.1: rs1222165268, 5-176749358-TTA-T (GRCh37)
  • gnomAD 3.1: rs1222165268, 5-177322357-T-A (GRCh38)

This discrepancy is going to be resolved soon, as we are currently in the process of updating our variant index to gnomAD 3.1. At this point, unfortunately I cannot provide a solid timeline. Please watch out for announcements.
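
To illustrate the mismatch, here is a minimal Python sketch that parses the two records quoted above and compares their alleles. The `Variant` type and the `parse_variant` and `same_alleles` helpers are hypothetical names made up for this example; they are not part of any Open Targets or gnomAD tooling.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_variant(identifier: str) -> Variant:
    """Parse 'chrom-pos-ref-alt' or 'chrom_pos_ref_alt' into a Variant.

    Hypothetical helper: assumes the chromosome name itself contains
    neither '-' nor '_'.
    """
    chrom, pos, ref, alt = identifier.replace("-", "_").split("_")
    return Variant(chrom, int(pos), ref, alt)


def same_alleles(a: Variant, b: Variant) -> bool:
    """True when both records carry the same ref and alt alleles."""
    return (a.ref, a.alt) == (b.ref, b.alt)


# The two records quoted in this thread for rs1222165268.
gnomad_2_1 = parse_variant("5-176749358-TTA-T")  # GRCh37
gnomad_3_1 = parse_variant("5-177322357-T-A")    # GRCh38

# Same rsId, but the alleles differ, so a link built from the rsId alone
# can land on a record with different alleles than the portal shows.
print(same_alleles(gnomad_2_1, gnomad_3_1))
```

Comparing on the full set of position and alleles, rather than on the rsId, makes this kind of mismatch visible.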


Hi @dsuveges,

Thanks for your answer to this question. I happened to see it and would like to clarify this with you. I have downloaded the OpenTargets V2D .json files from your FTP site and now have a large data set containing GWAS study IDs, significant lead variants, and tag variants, among other things in that dataset.

The example above is confusing to me because the genomic coordinates are mapped to GRCh38 (5_177322357), but the reference and alternate alleles are apparently mapped to GRCh37 (TTA_T). With respect to the dataset I referenced above, is there also some discrepancy between GRCh37 and GRCh38? For example, if I have a lead variant identified in some study and am interested in investigating the tag variants linked to that lead variant, can I assume that the genomic coordinates and the reference and alternate alleles are all correctly mapped to GRCh38, or is there a discrepancy in this dataset as well?

Thanks so much in advance for your help.

Hi @shirondru ,

These datasets are generated in a self-consistent way, e.g. all the lead variants and tag variants come from the same study, so there should be no inconsistency. The inconsistency above arises only at the interface between gnomAD 2 and gnomAD 3. However, as part of the release process, we join all variant data with the gnomAD 2.1-based variant index on chr:pos_ref_alt.
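
As a rough illustration of what joining on chromosome, position, reference allele and alternate allele means, here is a small Python sketch using plain dictionaries. It is not the actual release pipeline: the `variant_key` and `join_with_index` functions, the field names and the toy index are all assumptions made for the example.

```python
def variant_key(chrom, pos, ref, alt):
    """Build a join key from chromosome, position, ref and alt alleles."""
    return (str(chrom), int(pos), ref, alt)


def join_with_index(rows, index):
    """Split rows into those found in the variant index and those not found.

    Matched rows are returned merged with their index annotation.
    """
    matched, unmatched = [], []
    for row in rows:
        key = variant_key(row["chrom"], row["pos"], row["ref"], row["alt"])
        if key in index:
            matched.append({**row, **index[key]})
        else:
            unmatched.append(row)
    return matched, unmatched


# Toy variant index holding the single variant discussed in this thread.
index = {
    variant_key("5", 177322357, "TTA", "T"): {"rsid": "rs1222165268"},
}

rows = [
    {"chrom": "5", "pos": 177322357, "ref": "TTA", "alt": "T"},
    {"chrom": "5", "pos": 177322357, "ref": "T", "alt": "A"},
]

matched, unmatched = join_with_index(rows, index)
print(len(matched), len(unmatched))
```

Because the key includes both alleles, a row with the same position but different alleles does not match the index entry.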

Please let me know if my answer didn’t fully cover your question.
