Variant table and VEP annotations

I want to prioritize the VEP annotations in the variant table, with the final goal of having only one associated gene/transcript combination for each variant. The definitions of the table columns are not very clear. Is there any way for me to understand which gene/transcript pair is the most important combination according to OT’s criteria?

When you run the VEP command in a terminal, you can add flags (e.g. --pick --pick_order tsl,appris,rank) to prioritize the annotated gene/transcript. However, I cannot find which flags were originally used to build OT’s variant table, and the column definitions are very concise. There is a column called “transcriptConsequences”, and within it there is “transcriptIndex”, which I initially thought of as a priority index, but the definition is ambiguous.

I would appreciate your input and help with this, thanks!

Hi @Ehsan_Khajouei, and welcome to our Community!

Here is the VEP query we use to populate our variant index. The transcript information found in transcriptConsequences is sorted based on two factors: the predicted functional impact (from most to least severe) and the distance from the gene’s footprint. This means the first item in the list corresponds to the closest gene with the most severe predicted consequence. If you’re looking to extract one gene per variant and this criterion fits your needs, you can use this first index to filter the object.
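As a rough illustration of the "use the first index" suggestion, here is a minimal Python sketch. It assumes each variant row has been loaded as a dictionary whose transcriptConsequences field is a list of dictionaries, and that the lowest transcriptIndex marks the top-ranked entry. The function name pick_top_transcript, the example record, and the targetId/transcriptId keys are illustrative assumptions, so please check them against the actual schema.

```python
from typing import Any, Optional


def pick_top_transcript(variant: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the top-ranked transcript consequence of one variant row.

    Assumes the entry with the lowest transcriptIndex is the first item
    of the sorted list; returns None when there are no consequences.
    """
    consequences = variant.get("transcriptConsequences") or []
    if not consequences:
        return None
    return min(consequences, key=lambda tc: tc["transcriptIndex"])


# Hypothetical record; identifiers are placeholders, not real data.
example_variant = {
    "variantId": "1_12345_A_G",
    "transcriptConsequences": [
        {"transcriptIndex": 2, "targetId": "ENSG_B", "transcriptId": "ENST_B1"},
        {"transcriptIndex": 1, "targetId": "ENSG_A", "transcriptId": "ENST_A1"},
    ],
}

top = pick_top_transcript(example_variant)
print(top["targetId"], top["transcriptId"])
```

Taking the minimum index rather than hard-coding a value avoids relying on whether the index starts at 0 or 1.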

Thank you for your feedback! We’ll update the column description to be more informative. I hope this is helpful!


FYI the consequence scores are manually defined by us in Gentropy: gentropy/src/gentropy/config.py at 161159fe4aa087d9b8f9b03169aa979462fde7d9 · opentargets/gentropy · GitHub

Hi @irene and thanks very much for your reply!

Just to make sure: if I use the first index within the expanded column “transcriptconsequences_transcriptIndex”, do I get the transcript with the most severe consequence that is also closest to the gene’s footprint?

Thanks for sharing the links as well.

The exact behaviour is that transcripts are ordered by consequence score first; if several transcripts have the same score, they are then compared by distance to the gene’s footprint. The approach is, therefore, to prioritise functional impact over proximity.
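To make the tie-breaking concrete, here is a small Python sketch of that ordering. This is not the Gentropy implementation, only an illustration under some assumptions: the keys consequenceScore and distanceFromFootprint are assumed names, a higher score is assumed to mean a more severe consequence, and the example values are made up.

```python
def order_transcripts(consequences):
    """Sort by consequence score (most severe first), then by distance.

    Assumes a higher consequenceScore means a more severe consequence and
    a smaller distanceFromFootprint means a closer gene.
    """
    return sorted(
        consequences,
        key=lambda tc: (-tc["consequenceScore"], tc["distanceFromFootprint"]),
    )


# Hypothetical transcripts; scores and distances are invented for illustration.
transcripts = [
    {"transcriptId": "ENST_A", "consequenceScore": 0.66, "distanceFromFootprint": 500},
    {"transcriptId": "ENST_B", "consequenceScore": 1.0, "distanceFromFootprint": 2000},
    {"transcriptId": "ENST_C", "consequenceScore": 0.66, "distanceFromFootprint": 0},
]

ordered_ids = [tc["transcriptId"] for tc in order_transcripts(transcripts)]
print(ordered_ids)
```

In this example ENST_B comes first despite being the farthest away, because its score is highest; ENST_C then precedes ENST_A because the two share a score and ENST_C is closer.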
