V2g data downloads

Hi,

Hope you are doing well. I’ve recently started exploring the data download section, and so far I’ve downloaded the 22.09 v2g dataset. It contains approximately 1 billion rows and 22 columns: ‘chr_id’, ‘position’, ‘ref_allele’, ‘alt_allele’, ‘gene_id’, ‘feature’, ‘type_id’, ‘source_id’, ‘fpred_labels’, ‘fpred_scores’, ‘fpred_max_label’, ‘fpred_max_score’, ‘qtl_beta’, ‘qtl_se’, ‘qtl_pval’, ‘qtl_score’, ‘interval_score’, ‘qtl_score_q’, ‘interval_score_q’, ‘d’, ‘distance_score’, ‘distance_score_q’.

If I understand correctly, this dataset would allow me to link genetic variants to genes through QTL, fpred, interval and distance scores. However, some aspects are unclear to me, for example:

1 - The QTL scores refer to which type of assay: eQTLs, pQTLs or sQTLs?

2 - What’s the biological meaning of fpred, interval and distance scores?

3 - What’s the threshold to keep statistically significant associations between variants and QTL scores?

Best regards,
Felipe

Hi Felipe,

Thank you for using the Open Targets resource!

1 - The QTL score is an aggregate across all three types of QTLs, weighted accordingly.

2 - “fpred” is the in-silico functional prediction derived from the VEP score, which indicates the translational/transcriptional consequences of the given variant.

The interval score is derived from the physical 3D chromatin interactions between a variant and its assigned gene.

In addition, it also takes into account the correlation of regulatory activity between the variant and the TSS of its assigned gene.

The biological meaning of this can be interpreted as evidence that the variant is influencing the expression of a target gene via enhancer-promoter interactions.

Finally, the distance score is based on the linear genomic distance between the variant and the gene. Biologically, non-coding variants often affect the expression of the closest gene in linear distance, but not always.

3 - For all QTLs, we keep variants that pass the following threshold:
p ≤ 0.05 / (number of variants tested for the gene)
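
To make the threshold concrete, here is a minimal Python sketch of the per-gene cutoff above. It only illustrates the arithmetic and is not the pipeline's actual code: the function names, the toy rows and the `n_variants_tested` value are assumptions for the example (the number of variants tested per gene is not one of the columns in the v2g dataset listed above).

```python
def per_gene_threshold(n_variants_tested, alpha=0.05):
    """Cutoff from the rule above: alpha / number of variants tested for the gene."""
    if n_variants_tested <= 0:
        raise ValueError("n_variants_tested must be a positive integer")
    return alpha / n_variants_tested


def passes_threshold(qtl_pval, n_variants_tested, alpha=0.05):
    """True if the QTL p-value is at or below the per-gene cutoff."""
    return qtl_pval <= per_gene_threshold(n_variants_tested, alpha)


# Toy rows (hypothetical values): (gene_id, qtl_pval, n_variants_tested)
rows = [
    ("GENE_A", 1e-6, 1000),  # cutoff 0.05 / 1000 = 5e-5, p is below it
    ("GENE_A", 1e-4, 1000),  # same cutoff, p is above it
    ("GENE_B", 1e-4, 100),   # cutoff 0.05 / 100 = 5e-4, p is below it
]

# Keep only the rows whose p-value passes the cutoff for their gene.
kept = [row for row in rows if passes_threshold(row[1], row[2])]
```

Note that the same p-value (1e-4) is dropped for the gene with 1000 tested variants and kept for the gene with 100, because the cutoff depends on how many variants were tested for each gene.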

Hope this clears things up! There are further details in our documentation, but please follow up here too if you have additional questions!

Xiangyu
