Access to FTP JSON files of colocalisation analysis

Hello,

I was wondering where I can find the JSON files for the colocalisation analysis in OT Genetics.

Thank you in advance :slight_smile:

Hi,

The colocalization datasets are available in Parquet format and can be downloaded from here: ftp://ftp.ebi.ac.uk/pub/databases/opentargets/genetics/latest/v2d_coloc

The schema of the dataset looks like this:

root
 |-- coloc_n_vars: integer (nullable = true)
 |-- coloc_h0: double (nullable = true)
 |-- coloc_h1: double (nullable = true)
 |-- coloc_h2: double (nullable = true)
 |-- coloc_h3: double (nullable = true)
 |-- coloc_h4: double (nullable = true)
 |-- left_type: string (nullable = true)
 |-- left_study: string (nullable = true)
 |-- left_chrom: string (nullable = true)
 |-- left_pos: integer (nullable = true)
 |-- left_ref: string (nullable = true)
 |-- left_alt: string (nullable = true)
 |-- right_type: string (nullable = true)
 |-- right_study: string (nullable = true)
 |-- right_bio_feature: string (nullable = true)
 |-- right_phenotype: string (nullable = true)
 |-- right_chrom: string (nullable = true)
 |-- right_pos: integer (nullable = true)
 |-- right_ref: string (nullable = true)
 |-- right_alt: string (nullable = true)
 |-- coloc_h4_h3: double (nullable = true)
 |-- coloc_log2_h4_h3: double (nullable = true)
 |-- is_flipped: boolean (nullable = true)
 |-- right_gene_id: string (nullable = true)
 |-- left_var_right_study_beta: double (nullable = true)
 |-- left_var_right_study_se: double (nullable = true)
 |-- left_var_right_study_pval: double (nullable = true)
 |-- left_var_right_isCC: boolean (nullable = true)

As the dataset is specific to colocalization, you might need to join the table with the association- and study-level information stored under https://ftp.ebi.ac.uk/pub/databases/opentargets/genetics/latest/v2d/.
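A minimal sketch of such a join, using plain Python dicts so that it runs without any data download. The study table columns `study_id` and `trait_reported`, and all the values, are assumptions made for illustration; check the actual schema of the files under v2d before relying on them.

```python
# Toy rows standing in for the coloc table (values are made up).
coloc_rows = [
    {"left_study": "STUDY_A", "right_study": "STUDY_B", "coloc_h4": 0.95},
    {"left_study": "STUDY_C", "right_study": "STUDY_B", "coloc_h4": 0.10},
]

# Toy rows standing in for the study-level table.
# `study_id` and `trait_reported` are assumed column names.
studies = [
    {"study_id": "STUDY_A", "trait_reported": "Trait A"},
    {"study_id": "STUDY_B", "trait_reported": "Trait B"},
]

# Index the study table by its key for constant-time lookups.
study_by_id = {study["study_id"]: study for study in studies}


def annotate(row):
    """Left join: attach the left study's trait, or None if the study is unknown."""
    left = study_by_id.get(row["left_study"], {})
    return {**row, "left_trait_reported": left.get("trait_reported")}


annotated = [annotate(row) for row in coloc_rows]
for row in annotated:
    print(row["left_study"], row["left_trait_reported"], row["coloc_h4"])
```

The same logic maps onto a dataframe join on `left_study` (and again on `right_study`) if you load the Parquet files with a dataframe library.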

Please let us know if there’s anything else we can help with!

Best,
Daniel


Hi Daniel,

This is a great table. Would love to know what “is_flipped” means? And can I assume that “left_var_right_study_beta” is the beta for the left-hand variant, but for the ‘right_study’, as found in the summary stats table? Or is there some flipping of alleles going on there?

Thanks,
Clare.

Huh, this is a bit of black magic. To make the colocalisation process efficient, each comparison is done only once (A vs B is computed, but not B vs A). However, this yields only half of the matrix of colocalising peaks. To complete the matrix, the computed rows are “flipped”. The code for the process can be found here.

There is a rather obscure resource with some explanation on the columns here. It says:

  • is_flipped: Only the upper triangle of the pairwise matrix is calculated. This field shows whether the row comes from a reflection of that matrix or whether it is the original estimate.
  • left_var_right_study_beta: Beta estimated for the left variant in the right study

If there’s anything to clear up, let us know.

Hi Daniel,

Thanks very much for this, that’s helpful!

Presumably, as the colocalisation is a pairwise procedure, the ‘reflection’ always yields the same colocalisation result (i.e. the coloc_h4 is the same, regardless of whether it was A vs. B or B vs. A)? But I understand then that the ‘left_var’ could be a different variant as that depends on which is the ‘left_study’.

Am I also correct in assuming that the variant for the ‘left_study’ corresponds to a ‘lead_variant_id’ in the credible set table?

Cheers
Clare.

Presumably, as the colocalisation is a pairwise procedure, the ‘reflection’ always yields the same colocalisation result (i.e. the coloc_h4 is the same, regardless of whether it was A vs. B or B vs. A)? But I understand then that the ‘left_var’ could be a different variant as that depends on which is the ‘left_study’.

Yes, this is all correct! The comparison is pairwise, and the flipping procedure is as simple as swapping the left and right column names. The colocalisation statistics remain unchanged. The colocalization is also a study-aware process, as study/locus pairs are compared, so the studies are swapped as well when flipping.
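To illustrate the idea, here is a rough sketch of the flip on a single row. It is not the pipeline code, just a toy version under simplifying assumptions: only the columns that exist with both a left_ and a right_ prefix in the schema are swapped, the study names and values are made up, and the left_var_right_study_* columns are left out because they depend on which variant ends up on the left.

```python
# Column suffixes that exist with both a left_ and a right_ prefix in the schema.
SWAPPED_SUFFIXES = ("type", "study", "chrom", "pos", "ref", "alt")


def flip_row(row):
    """Return the reflected row: left/right swapped, coloc statistics untouched."""
    flipped = dict(row)
    for suffix in SWAPPED_SUFFIXES:
        flipped[f"left_{suffix}"] = row[f"right_{suffix}"]
        flipped[f"right_{suffix}"] = row[f"left_{suffix}"]
    # Toggle the marker so a reflected row can be told apart from the original.
    flipped["is_flipped"] = not row["is_flipped"]
    return flipped


# Made-up example row (A vs B).
original = {
    "left_type": "gwas", "left_study": "STUDY_A",
    "left_chrom": "1", "left_pos": 1000, "left_ref": "A", "left_alt": "G",
    "right_type": "gwas", "right_study": "STUDY_B",
    "right_chrom": "1", "right_pos": 1500, "right_ref": "C", "right_alt": "T",
    "coloc_h3": 0.02, "coloc_h4": 0.95, "is_flipped": False,
}

reflected = flip_row(original)  # B vs A
print(reflected["left_study"], reflected["right_study"], reflected["coloc_h4"])
```

Flipping the reflected row again gives back the original, which is a quick way to check that nothing other than the left/right labels changed.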

Am I also correct in assuming that the variant for the ‘left_study’ corresponds to a ‘lead_variant_id’ in the credible set table?

Yes, that’s also correct. I hope it all makes sense.
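If you want to match rows against the credible set table, a small helper like the one below can build a variant identifier from the left_* columns. This is only a sketch: the `chrom_pos_ref_alt` identifier format is an assumption here, and the example values are made up, so please compare against a few real ‘lead_variant_id’ values first.

```python
def left_variant_id(row):
    """Build an identifier from the left variant columns.

    Assumes the chrom_pos_ref_alt format; verify against the credible set table.
    """
    fields = ("chrom", "pos", "ref", "alt")
    return "_".join(str(row[f"left_{field}"]) for field in fields)


# Made-up example row with only the columns the helper needs.
row = {"left_chrom": "1", "left_pos": 1000, "left_ref": "A", "left_alt": "G"}
variant_id = left_variant_id(row)
print(variant_id)
```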

Great, thank you. Yes, that all makes sense (and it is a fantastic resource!).

Clare.
