Difference between the variant and variant_gene files

I was wondering how to explain the main difference between the variants and v2g files.
The variants file contains geneIDs, most_severe_consequence and gene distance fields; could you explain what those genes and distances are?
How are those genes related to the variants? Is it just the nearest protein-coding gene based on distance, or is there another reason for associating each variant with each gene in the variant file? I understand that this is completely different from the v2g file, in which genes are retrieved after applying the v2g pipeline.

Hi Dimitris,

The geneIDs and most_severe_consequence in the variants table are derived from the Variant Effect Predictor (VEP). Most of the time, the genes listed directly overlap the SNPs, and the effects are in-silico predictions of the consequences of the SNPs. These predictions are based entirely on the DNA sequence and do not take into account any additional functional data (which is what v2g does).
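
As a rough illustration of that difference, here is a small Python sketch with toy data. The variant ID, gene IDs, field names and evidence labels are all made up for the example and are not the actual schema of either file; the only point is that the variants table carries a sequence-based annotation, while v2g can link the same variant to further genes through other evidence.

```python
# Toy rows only: IDs, field names and evidence labels are hypothetical,
# not the real schema of the variants or v2g files.
variant_row = {
    "variant_id": "1_1000_A_G",
    "gene_id": "GENE_A",  # gene annotated from the sequence-based prediction
    "most_severe_consequence": "missense_variant",
}

# In this sketch, v2g holds one row per (variant, gene, evidence type).
v2g_rows = [
    {"variant_id": "1_1000_A_G", "gene_id": "GENE_A", "evidence": "vep"},
    {"variant_id": "1_1000_A_G", "gene_id": "GENE_B", "evidence": "qtl"},
    {"variant_id": "1_1000_A_G", "gene_id": "GENE_B", "evidence": "distance"},
]


def genes_from_variant_table(row):
    """Genes linked to the variant by the sequence-based annotation alone."""
    return {row["gene_id"]}


def genes_from_v2g(rows, variant_id):
    """Genes linked to the variant by any evidence type in the toy v2g rows."""
    return {r["gene_id"] for r in rows if r["variant_id"] == variant_id}


seq_only = genes_from_variant_table(variant_row)
all_linked = genes_from_v2g(v2g_rows, variant_row["variant_id"])

# Genes that appear only once the additional evidence is considered.
functional_only = all_linked - seq_only
print(sorted(functional_only))
```

In this toy case GENE_B shows up only in the v2g rows, because nothing in the sequence-based annotation points to it.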

Best wishes,
Xiangyu

Thank you very much for the clarification.