Distance-based clustering in gwas-catalog data

Dear Team,

I am not clear about how distance-based clustering is applied to the GWAS Catalog data. Could you explain the step below in more detail?

    • Identify studies with an abnormally large number of reported loci compared to number of loci after distance-based clustering (±500kb). Criteria for “abnormal” are set to number of loci > 10, and decrease in loci count after clustering of >10%. For “abnormal” studies (N≈120), apply distance based clustering.


Thanks.

Shicheng

Hi Shicheng,

The GWAS Catalog can contain non-independent loci when summary statistics have been bulk-imported from supplementary tables. To correct for this, the distance-based clustering step is used to remove redundant variants.
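
For illustration only, here is a minimal sketch of what a distance-based clustering step and the "abnormal study" check from the quoted criteria could look like. It is not the pipeline's actual script. It assumes a greedy approach (keep the most significant association, drop everything within ±500 kb of it on the same chromosome, repeat), and the names `Association`, `distance_clump` and `is_abnormal` are hypothetical.

```python
from typing import List, NamedTuple


class Association(NamedTuple):
    """One reported association (hypothetical minimal record)."""
    study_id: str
    chrom: str
    pos: int      # base-pair position
    pval: float


WINDOW = 500_000  # +/-500 kb, as in the quoted criteria


def distance_clump(assocs: List[Association], window: int = WINDOW) -> List[Association]:
    """Greedy distance-based clustering (assumed approach, not the v2d code).

    Walk through associations from most to least significant and keep one
    only if it is more than `window` bp away from every lead already kept
    on the same chromosome.
    """
    leads: List[Association] = []
    for a in sorted(assocs, key=lambda x: x.pval):
        if all(a.chrom != l.chrom or abs(a.pos - l.pos) > window for l in leads):
            leads.append(a)
    return leads


def is_abnormal(assocs: List[Association], window: int = WINDOW,
                min_loci: int = 10, min_decrease: float = 0.10) -> bool:
    """Flag a study with > min_loci reported loci whose locus count drops
    by more than min_decrease (as a fraction) after clustering."""
    n_before = len(assocs)
    if n_before <= min_loci:
        return False
    n_after = len(distance_clump(assocs, window))
    return (n_before - n_after) / n_before > min_decrease
```

In this sketch, clustering would only be applied to the studies for which `is_abnormal` returns `True`; all other studies would keep their reported loci unchanged.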

Xiangyu


Thank you @Xiangyu. I’m going to mark this as resolved for now. @Shicheng_Guo, please let us know if you have any further questions.

Thank you Xiangyu and Kirill. Could you share the script that performs this step, or is it already available on the GitHub page?
Thanks,
Shicheng

Hi Xiangyu and Kirill,

Has there been any internal discussion about my previous post?

Thanks.

Shicheng

@Kirill_Tsukanov @Xiangyu

I’ll let @Xiangyu get back to you about where to find the relevant code. Unmarking as resolved for now.

I’ll get back to you separately on the other post about mendelian randomisation.

Hi Shicheng,

Yes, the distance-based clumping is part of the v2d processing pipeline, and the script for it is here.

Best wishes,
Xiangyu
