I am not clear on how distance-based clustering is applied to GWAS Catalog data. Can you explain it further?
Identify studies with an abnormally large number of reported loci compared to the number of loci after distance-based clustering (±500kb). Criteria for “abnormal” are set to number of loci > 10, and a decrease in loci count after clustering of >10%. For “abnormal” studies (N≈120), apply distance-based clustering.
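To make the two criteria concrete, here is a minimal sketch of the check. The function name `is_abnormal` and its parameters are hypothetical, not taken from any pipeline; only the thresholds (more than 10 loci, more than a 10% decrease) come from the text above.

```python
def is_abnormal(n_reported: int, n_clustered: int,
                min_loci: int = 10, min_decrease: float = 0.10) -> bool:
    """Flag a study whose locus count drops sharply after clustering.

    n_reported:  number of loci reported for the study
    n_clustered: number of loci left after ±500kb distance-based clustering
    """
    # Criterion 1: the study must report more than `min_loci` loci.
    if n_reported <= min_loci:
        return False
    # Criterion 2: clustering must remove more than `min_decrease` of them.
    decrease = (n_reported - n_clustered) / n_reported
    return decrease > min_decrease
```

For example, a study with 50 reported loci that collapse to 30 after clustering (a 40% decrease) would be flagged, while a study with 20 reported loci and 19 after clustering (5%) would not.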
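The GWAS Catalog can contain non-independent loci when summary statistics are bulk-imported from supplementary tables. To correct this, a distance-based clustering step is used to remove redundant variants, i.e. variants that lie within ±500kb of another reported variant and therefore probably tag the same signal.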
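One common way to do this kind of clustering is a greedy pass over the associations, sketched below. The source only states the ±500kb window; ranking by p-value, keeping the most significant variant as the lead, and the names `Hit` and `distance_cluster` are assumptions made for illustration, so treat this as a sketch rather than the actual implementation.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    chrom: str   # chromosome label, e.g. "1"
    pos: int     # base-pair position
    pval: float  # association p-value


def distance_cluster(hits: List[Hit], window: int = 500_000) -> List[Hit]:
    """Greedy distance-based clustering (assumed behaviour, see lead-in).

    Visit hits from most to least significant and keep a hit only if it is
    more than `window` bp away from every lead already kept on the same
    chromosome. Everything else is treated as redundant and dropped.
    """
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.pval):
        is_independent = all(
            hit.chrom != lead.chrom or abs(hit.pos - lead.pos) > window
            for lead in kept
        )
        if is_independent:
            kept.append(hit)
    return kept


hits = [
    Hit("1", 1_000_000, 1e-20),
    Hit("1", 1_200_000, 1e-9),   # within 500kb of the 1e-20 lead
    Hit("1", 1_450_000, 1e-12),  # within 500kb of the 1e-20 lead
    Hit("1", 2_000_000, 1e-10),  # 1Mb away: separate locus
    Hit("2", 1_100_000, 1e-8),   # different chromosome: separate locus
]
leads = distance_cluster(hits)
print(len(hits), "reported ->", len(leads), "after clustering")
```

In this toy example five reported associations collapse to three loci: the two weaker hits near position 1,000,000 on chromosome 1 are absorbed by the strongest one, which is the kind of redundancy that bulk-imported supplementary tables tend to introduce.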