Discrepancy in variant reference allele from Open Targets and gnomAD

I am looking at variant 5_177322357_TTA_T, but when I click on the gnomAD link, gnomAD reports different alleles (T/A) from those in Open Targets (TTA/T). dbSNP shows the same alleles as gnomAD.

Could you please clarify where Open Targets got the reference allele information from (TTA)?

Additionally, the variant can’t be found when clicking on the Ensembl link on the variant page from OT, so there is also an issue there.

Cheers,
Geri


Hi @gerijs,

The reference variant set on the Genetics Portal is sourced from gnomAD 2.1, which is on GRCh37. The links are generated from the variant's rsID, and in the newer gnomAD release the same rsID has different alleles:

  • gnomAD 2.1: rs1222165268, 5-176749358-TTA-T (GRCh37)
  • gnomAD 3.1: rs1222165268, 5-177322357-T-A (GRCh38)
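To compare identifiers like these side by side, a minimal Python sketch can parse both styles: Open Targets joins the fields with underscores (5_177322357_TTA_T), gnomAD with dashes. The helper names are hypothetical and not part of any Open Targets or gnomAD API.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_variant_id(variant_id: str) -> Variant:
    """Parse 'chr_pos_ref_alt' (Open Targets style) or 'chr-pos-ref-alt' (gnomAD style)."""
    sep = "_" if "_" in variant_id else "-"
    chrom, pos, ref, alt = variant_id.split(sep)
    return Variant(chrom, int(pos), ref, alt)


def to_gnomad_id(v: Variant) -> str:
    return f"{v.chrom}-{v.pos}-{v.ref}-{v.alt}"


def to_ot_id(v: Variant) -> str:
    return f"{v.chrom}_{v.pos}_{v.ref}_{v.alt}"


ot = parse_variant_id("5_177322357_TTA_T")  # Open Targets Genetics
g2 = parse_variant_id("5-176749358-TTA-T")  # gnomAD 2.1 (GRCh37)
g3 = parse_variant_id("5-177322357-T-A")    # gnomAD 3.1 (GRCh38)

# Alleles agree with gnomAD 2.1, position agrees with gnomAD 3.1.
print((ot.ref, ot.alt) == (g2.ref, g2.alt))  # True
print(ot.pos == g3.pos)                       # True
```

This makes the mismatch in the thread explicit: the Open Targets identifier combines the GRCh38 position with the gnomAD 2.1 alleles.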

This discrepancy will be resolved soon, as we are currently updating our variant index to gnomAD 3.1. Unfortunately, I cannot provide a firm timeline at this point, so please watch out for announcements.


Hi @dsuveges,

Thanks for your answer to this question. I happened to see it and would like to clarify something with you. I have downloaded the Open Targets V2D .json files from your FTP site, so I now have a large dataset containing GWAS study IDs, significant lead variants and tag variants, among other things.

The example above is confusing to me because the genomic coordinates are mapped to GRCh38 (5_177322357), but the reference and alternate alleles are apparently taken from GRCh37 (TTA_T). Is there a similar discrepancy between GRCh37 and GRCh38 in the dataset I described above? For example, if I have a lead variant identified in some study and want to investigate the tag variants linked to it, can I assume that the genomic coordinates and the reference and alternate alleles are all correctly mapped to GRCh38, or is there a discrepancy here as well?

Thanks so much in advance for your help.


Hi @shirondru,

The generation of these datasets is self-consistent, e.g. all the lead variants and tag variants come from the same study, so there should be no inconsistency within them. The inconsistency above only arises at the interface between gnomAD 2 and gnomAD 3. However, as part of the release process we join all variant data with the gnomAD 2.1-based variant index on chr:pos_ref_alt.
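A minimal sketch of what a join on chr:pos_ref_alt means in practice, assuming plain dict records. The field names (chrom, pos, ref, alt, rsid, study_id) and the study IDs are hypothetical placeholders, not the actual Open Targets schema. In this sketch, a record whose alleles do not exactly match the index (here, the gnomAD 3.1 alleles at the same position) drops out of the inner join.

```python
from typing import Dict, List


def variant_key(chrom: str, pos: int, ref: str, alt: str) -> str:
    # Key in the chr:pos_ref_alt form described above.
    return f"{chrom}:{pos}_{ref}_{alt}"


def join_with_index(records: List[dict], variant_index: List[dict]) -> List[dict]:
    """Inner join: keep only records whose key exists in the variant index."""
    by_key: Dict[str, dict] = {
        variant_key(v["chrom"], v["pos"], v["ref"], v["alt"]): v for v in variant_index
    }
    joined = []
    for rec in records:
        match = by_key.get(variant_key(rec["chrom"], rec["pos"], rec["ref"], rec["alt"]))
        if match is not None:
            # Merge the index annotation (e.g. rsid) into the study record.
            joined.append({**rec, **match})
    return joined


variant_index = [{"chrom": "5", "pos": 177322357, "ref": "TTA", "alt": "T", "rsid": "rs1222165268"}]
records = [
    {"study_id": "STUDY_A", "chrom": "5", "pos": 177322357, "ref": "TTA", "alt": "T"},
    {"study_id": "STUDY_B", "chrom": "5", "pos": 177322357, "ref": "T", "alt": "A"},
]
joined = join_with_index(records, variant_index)
print([r["study_id"] for r in joined])  # ['STUDY_A']
```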

Please let me know if my answer didn’t fully cover your question.


Hi @dsuveges, I wanted to follow up on this using a different variant as an example.

The variant 6_26745970_C_T seems to have a similar problem, and I am not sure how to interpret its genomic coordinates and ref/alt alleles.

If I follow the rsID that Open Targets supplies for this variant (rs13213200) to gnomAD, I see the following:

  • gnomAD 2.1: 6-26755915-C-T. Here, the chromosome and the ref and alt alleles are the same as in Open Targets Genetics, but the position is different (26755915 vs 26745970).
  • gnomAD 3.1: 6-26745970-G-A. Here, the chromosome and position are the same as in Open Targets Genetics, but the ref and alt alleles are different (G>A vs C>T).

Further, if I search for position chr6:26745970 on the UCSC genome browser using both GRCh37 and GRCh38 assemblies, the reference allele is G in both cases, never C.
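One quick check for a case like this is to compare the ref allele in the identifier both with the reference base at that position and with its reverse complement; note that C>T and G>A are reverse complements of each other. The helper below is hypothetical, the reference base must be looked up separately (e.g. in the UCSC browser as above), and it is only meant for SNVs, since indels need proper normalization.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def check_ref_allele(ref: str, genome_ref: str) -> str:
    """Compare a variant's ref allele with the reference base(s) at its position."""
    if ref == genome_ref:
        return "match"
    if reverse_complement(ref) == genome_ref:
        return "reverse-complement match (possible strand flip)"
    return "mismatch"


# Ref allele C from 6_26745970_C_T against the G reported by UCSC.
print(check_ref_allele("C", "G"))
```

A reverse-complement match does not prove a strand issue by itself, but it separates "the alleles are on the other strand" from "the alleles are simply wrong".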

All this leads me to think that 6_26745970_C_T is not a real variant in either GRCh37 or GRCh38. Is that fair? I think I can resolve this discrepancy by following the rsID to gnomAD and taking the chrom-pos-ref-alt for my assembly of choice; do you agree? Are you aware of a faster, less manual way of doing this (perhaps using the Open Targets Genetics FTP)?

Thanks again for your help.

Hi @shirondru,

Regarding your example (6_26745970_C_T), it is consistent with the way we generate variant information for the Genetics Portal:

  • The list of variants is generated from gnomAD 2.
  • The gnomAD 2 dataset provides the GRCh37 coordinates and the alleles.
  • We then lift over the GRCh37 coordinates to the newer GRCh38 build. However, at this step we neither realign the alleles nor perform any checks against the reference sequence.
  • The variant identifiers are then generated from the GRCh38 coordinates and the alleles provided by gnomAD 2. (That’s why you will see G as the reference allele if you look up the GRCh37 coordinates on the UCSC browser: chr6:26,755,915.)
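The steps above can be sketched in Python as follows. The liftover function is a hypothetical stand-in that only knows the two GRCh37-to-GRCh38 position mappings quoted in this thread; a real pipeline would use a chain-file-based tool such as UCSC liftOver. The point of the sketch is that only the position changes, while the alleles are copied from gnomAD 2 unchanged.

```python
from typing import Callable, NamedTuple, Tuple


class Gnomad2Record(NamedTuple):
    chrom: str  # GRCh37 chromosome
    pos: int    # GRCh37 position
    ref: str    # ref allele as reported by gnomAD 2
    alt: str    # alt allele as reported by gnomAD 2


def make_variant_id(rec: Gnomad2Record,
                    liftover: Callable[[str, int], Tuple[str, int]]) -> str:
    """Build an ID from lifted-over GRCh38 coordinates plus the unmodified gnomAD 2 alleles."""
    chrom38, pos38 = liftover(rec.chrom, rec.pos)
    # No allele realignment and no check against the GRCh38 reference sequence.
    return f"{chrom38}_{pos38}_{rec.ref}_{rec.alt}"


# Hypothetical liftover table holding only the mappings quoted in this thread.
_KNOWN = {
    ("6", 26755915): ("6", 26745970),    # rs13213200
    ("5", 176749358): ("5", 177322357),  # rs1222165268
}


def toy_liftover(chrom: str, pos: int) -> Tuple[str, int]:
    return _KNOWN[(chrom, pos)]


print(make_variant_id(Gnomad2Record("6", 26755915, "C", "T"), toy_liftover))
# 6_26745970_C_T
```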

There certainly is a discrepancy, however, and it will hopefully be resolved once we have migrated our variant annotation and variant index to the newer third release of gnomAD. This update is expected to resolve a large number of issues in the Portal, and you can expect it to be released sometime in Q4 of this year.

You can regenerate the variant identifiers for the gnomAD 2 dataset from the GRCh37 coordinates of each variant (we store this information in the variant index).
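A minimal sketch of regenerating a GRCh37-based (gnomAD 2 style) identifier from one variant index row, already loaded as a dict. The field names (chr_id_b37, position_b37, ref_allele, alt_allele) are an assumption; check them against the schema of the files you downloaded before relying on this.

```python
from typing import Mapping

# ASSUMPTION: illustrative field names for the GRCh37 columns of the variant index.
B37_FIELDS = ("chr_id_b37", "position_b37", "ref_allele", "alt_allele")


def grch37_variant_id(row: Mapping, sep: str = "_") -> str:
    """Join the GRCh37 chromosome, position and alleles of one index row into an ID."""
    chrom, pos, ref, alt = (row[field] for field in B37_FIELDS)
    return sep.join([str(chrom), str(pos), ref, alt])


row = {
    "chr_id": "6", "position": 26745970,           # GRCh38 coordinates
    "chr_id_b37": "6", "position_b37": 26755915,  # GRCh37 coordinates
    "ref_allele": "C", "alt_allele": "T", "rs_id": "rs13213200",
}
print(grch37_variant_id(row, sep="-"))  # 6-26755915-C-T
```

With `sep="-"` the result matches the gnomAD 2.1 identifier for rs13213200 quoted earlier in this thread.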

Best,
Daniel