Distance-based clustering in gwas-catalog data

Shicheng_Guo · 4 May 2022 02:33

Dear Team,

I am not entirely clear about the distance-based clustering applied to GWAS Catalog data. Could you explain it further?

- Identify studies with an abnormally large number of reported loci compared to the number of loci after distance-based clustering (±500kb). Criteria for “abnormal” are set to number of loci > 10, and decrease in loci count after clustering of >10%. For “abnormal” studies (N≈120), apply distance-based clustering.

Thanks.

Shicheng

Xiangyu · 13 May 2022 11:41

Hi Shicheng,

The GWAS Catalog can contain non-independent loci where summary statistics are bulk-imported from supplementary tables. To correct this, the distance-based clustering step is used to remove redundant variants.
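As a rough illustration only (this is not the actual pipeline code), a greedy distance-based clustering could look like the sketch below. It assumes each reported association has a chromosome, a position and a p-value. The names `Hit`, `distance_clump` and `is_abnormal` are hypothetical, and treating a hit at exactly 500kb as part of the same cluster is an assumption. The thresholds (±500kb, more than 10 loci, more than 10% decrease) are the ones quoted in the question.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One reported association (hypothetical structure for illustration)."""
    chrom: str
    pos: int
    pval: float


def distance_clump(hits: List[Hit], window: int = 500_000) -> List[Hit]:
    """Greedy distance-based clustering.

    Walk through hits from most to least significant and keep a hit only
    if no already-kept lead lies within +/- window bp on the same chromosome.
    """
    leads: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.pval):
        independent = all(
            hit.chrom != lead.chrom or abs(hit.pos - lead.pos) > window
            for lead in leads
        )
        if independent:
            leads.append(hit)
    return leads


def is_abnormal(n_reported: int, n_clustered: int,
                min_loci: int = 10, min_decrease: float = 0.10) -> bool:
    """Flag a study with > min_loci reported loci whose locus count
    drops by more than min_decrease after clustering."""
    if n_reported <= min_loci:
        return False
    return (n_reported - n_clustered) / n_reported > min_decrease


# Toy data (invented for the example, not real GWAS results).
hits = [
    Hit("1", 1_000_000, 1e-20),
    Hit("1", 1_200_000, 1e-10),  # within 500kb of the first hit: redundant
    Hit("1", 1_600_000, 1e-9),   # 600kb from the first hit: kept
    Hit("2", 1_100_000, 1e-8),   # different chromosome: kept
]
leads = distance_clump(hits)
```

In this toy example three of the four hits survive as lead variants. A study would only have the clustering applied if `is_abnormal` returned `True` for its reported and clustered locus counts.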

Xiangyu

Kirill_Tsukanov · 17 May 2022 08:18

Thank you @Xiangyu. I’m going to mark this as resolved for now. @Shicheng_Guo, please let us know if you have any further questions.

Shicheng_Guo · 17 May 2022 20:32

Thank you Xiangyu and Kirill. Could you share the script used to perform this step? Or is it available on the GitHub page?
Thanks
Shicheng

Shicheng_Guo · 17 May 2022 21:54

Hi Xiangyu and Kirill,

Has there been any internal discussion about my previous post?

Thanks.

Shicheng

@Kirill_Tsukanov @Xiangyu

Kirill_Tsukanov · 18 May 2022 06:57

I’ll let @Xiangyu get back to you about where to find the relevant code. Unmarking as resolved for now.

I’ll get back to you separately on the other post about Mendelian randomisation.

Xiangyu · 18 May 2022 08:58

Hi Shicheng,

Yes, the distance-based clumping is part of the v2d processing pipeline, and the script for it is here.

Best wishes,
Xiangyu
