P-values and standard errors of credible set variants

Hello

Having downloaded the credible set parquet files, I noticed that only the lead variant in a credible set seems to have a p-value, and none of the variants have a standard error (while all variants have a beta). Am I missing something, or will p-values and standard errors be made available for all credible set variants?

Thanks!

Hello @Juha_Karjalainen,

Thank you for exploring our datasets.

We are aware of some of the problems you mentioned with the statistics in the locus object of the credible_set dataset. We are working to resolve them, so for the time being we encourage users to take the values directly from the original summary statistics.

As you may know, credible sets come from multiple sources like:

  • GWAS Catalog
  • UKBB PPP
  • eQTL Catalogue
  • FinnGen

We fine-map the credible sets for the GWAS Catalog and UKBB PPP datasources ourselves, using the SuSiE-inf and PICS methods implemented in gentropy, depending on the source and the conditions described in the fine-mapping pipelines. For the other datasources we rely on the fine-mapping results they provide.

Missing p-values in credible set locus

Unfortunately, as of the June release we do not persist the original p-values from the summary statistics when the credible sets result from our own fine-mapping. This includes the credible sets from SuSiE-inf and PICS. The p-values should be available when the fine-mapping was done by the datasource itself, i.e. by FinnGen or the eQTL Catalogue (SuSie).

finemappingMethod  numberOfEmptyPvalVariants  variantCount
SuSie              0                          55400481
SuSiE-inf          9445449                    9445449
PICS               9027469                    9327232
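Expressed as a share of each method's variants, the counts in the table work out as follows. The snippet only repeats the arithmetic on the numbers posted in this thread; it does not read the dataset, and `share_missing` is a hypothetical helper.

```python
# Counts copied from the table above: (variants with empty p-value, total variants)
empty_pval_counts = {
    "SuSie": (0, 55_400_481),
    "SuSiE-inf": (9_445_449, 9_445_449),
    "PICS": (9_027_469, 9_327_232),
}


def share_missing(empty: int, total: int) -> float:
    """Fraction of credible set variants that have no p-value."""
    return empty / total if total else 0.0


for method, (empty, total) in empty_pval_counts.items():
    print(f"{method:10s} {share_missing(empty, total):7.2%} missing, "
          f"{total - empty:,} with a p-value")
```

For PICS this is about 96.8% of variants without a p-value, while for SuSiE-inf no variant carries one.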

We are exploring possible workarounds, but for now the best way to recover the p-values for the affected variants is to go back to the original summary statistics on the datasource providers' portals.
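Going back to the summary statistics amounts to a join on variant ID between a credible set's locus and the summary statistics of the same study. A minimal sketch in plain Python on toy records follows. The field names (`variantId`, `standardError`, `pValueMantissa`, `pValueExponent`) are assumptions modelled on the credible set locus schema, `backfill_locus` is a hypothetical helper, and with the real files you would do the same join with Spark or pandas.

```python
from typing import Optional, TypedDict


class LocusVariant(TypedDict):
    """One entry of a credible set locus (field names are assumptions)."""
    variantId: str
    beta: Optional[float]
    standardError: Optional[float]
    pValueMantissa: Optional[float]
    pValueExponent: Optional[int]


# Statistics taken from the summary statistics when missing in the locus.
# beta is left out on purpose: for some methods the locus beta is a posterior
# mean, and overwriting only some rows would mix two different quantities.
BACKFILL_FIELDS = ("standardError", "pValueMantissa", "pValueExponent")


def backfill_locus(locus: list[LocusVariant], sumstats: dict[str, dict]) -> list[dict]:
    """Return a copy of `locus` with missing statistics filled from `sumstats`.

    `sumstats` maps variantId -> {field: value} and must come from the SAME
    study as the credible set. Values already present in the locus are kept.
    """
    filled = []
    for variant in locus:
        row = dict(variant)  # copy, so the input locus is not modified
        source = sumstats.get(variant["variantId"], {})
        for field in BACKFILL_FIELDS:
            if row.get(field) is None and source.get(field) is not None:
                row[field] = source[field]
        filled.append(row)
    return filled


# Toy example with made-up values (hypothetical variant and statistics)
locus = [
    {"variantId": "1_1000_A_G", "beta": 0.12, "standardError": None,
     "pValueMantissa": None, "pValueExponent": None},
]
sumstats = {
    "1_1000_A_G": {"standardError": 0.02, "pValueMantissa": 2.0, "pValueExponent": -9},
}
print(backfill_locus(locus, sumstats))
```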

Beta and standard error

Unfortunately, this is where it gets more confusing.

In the case of beta, the value depends on the fine-mapping method used:

  • SuSie (FinnGen): beta is reported as the posterior mean from the FinnGen SuSiE SNP files, see ingestion-step.
  • SuSiE-inf: beta represents the unscaled μ (posterior mean), see fine-mapper.
  • PICS (GWAS Catalog) and SuSie (eQTL Catalogue): beta is the original beta from the source. For PICS it is reported only for the lead variants, and only if the source provides it (some lead variants fine-mapped with PICS from GCCA, the GWAS Catalog Curated Associations, may lack it).
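Because the meaning of beta differs by method, a p-value can only be recomputed from beta when beta is the marginal estimate and its standard error is available, for example after retrieving both from the original summary statistics. A minimal sketch, assuming the usual two-sided Wald test, p = 2 * (1 - Φ(|β / SE|)), rather than whatever test the original study used; `wald_p_value` is a hypothetical helper, not part of gentropy.

```python
import math


def wald_p_value(beta: float, standard_error: float) -> float:
    """Two-sided Wald p-value, 2 * (1 - Phi(|beta / SE|)).

    Only meaningful for a MARGINAL effect estimate and its standard error,
    not for a posterior mean such as the SuSiE / SuSiE-inf mu.
    """
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    z = abs(beta / standard_error)
    # 2 * (1 - Phi(z)) == erfc(z / sqrt(2)); erfc avoids the cancellation in
    # 1 - Phi(z) for large z, though it still underflows to 0.0 for extreme z
    return math.erfc(z / math.sqrt(2))


# Toy values: beta = 0.12 with SE = 0.02 gives z of about 6
print(wald_p_value(0.12, 0.02))
```

Using `math.erfc` directly keeps small p-values accurate, which matters for genome-wide significant lead variants.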

For standardError, all values are taken from the summary statistics (where available).

finemappingMethod  numberOfEmptyBetaVariants  numberOfEmptySEVariants  leadVariantCount  variantCount
SuSiE-inf          0                          9445449                  501136            9445449
SuSie              0                          0                        2182186           55400481
PICS               9079730                    0                        299763            9327232
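Reading the two tables together gives a quick consistency check for PICS. The arithmetic below uses only the counts posted in this thread and assumes, as described in the PICS bullet, that beta is reported only for lead variants:

```python
# PICS counts copied from the two tables in this thread
pics_total = 9_327_232
pics_leads = 299_763
pics_empty_pval = 9_027_469
pics_empty_beta = 9_079_730

with_pval = pics_total - pics_empty_pval  # variants that kept a p-value
with_beta = pics_total - pics_empty_beta  # variants that kept a beta

# Consistent with p-values being kept only for the lead variants
assert with_pval == pics_leads

# Assumes (per the thread) that every PICS variant with a beta is a lead
leads_without_beta = pics_leads - with_beta
print(f"PICS: {with_pval:,} with p-value, {with_beta:,} with beta, "
      f"{leads_without_beta:,} leads lacking beta "
      f"({leads_without_beta / pics_leads:.1%})")
```

So the PICS variants that keep a p-value are exactly as many as the PICS lead variants, and 52,261 of those leads (about 17%) have no beta.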

In light of the above, we are planning to revamp the StudyLocus dataset so that it distinguishes between the original statistics and those resulting from fine-mapping.
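As an illustration only (this is not the planned StudyLocus schema, and every class and field name below is hypothetical), keeping the two kinds of statistics in separate nested records makes it impossible to read a posterior mean as a marginal beta:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MarginalStats:
    """Statistics copied unchanged from the original summary statistics."""
    beta: Optional[float] = None
    standard_error: Optional[float] = None
    p_value: Optional[float] = None


@dataclass(frozen=True)
class FinemappingStats:
    """Outputs of the fine-mapping method (e.g. SuSiE-inf or PICS)."""
    method: str
    posterior_probability: float
    posterior_mean: Optional[float] = None


@dataclass(frozen=True)
class CredibleSetVariant:
    variant_id: str
    marginal: MarginalStats
    finemapping: FinemappingStats

    def has_marginal_stats(self) -> bool:
        """True when beta, standard error and p-value are all present."""
        m = self.marginal
        return None not in (m.beta, m.standard_error, m.p_value)


# Toy record with made-up values
v = CredibleSetVariant(
    "1_1000_A_G",
    MarginalStats(beta=0.12, standard_error=0.02, p_value=2e-9),
    FinemappingStats(method="SuSiE-inf", posterior_probability=0.8, posterior_mean=0.10),
)
print(v.has_marginal_stats())
```

With this split, a method that yields no marginal statistics simply leaves `marginal` empty instead of reusing the same column for a different quantity.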

I will post back as soon as these issues get resolved.

With kind regards,
Szymon Szyszkowski

Thank you for the explanation Szymon!

I see. I think it would be fantastic if the original p-values and standard errors from the summary statistics were joined to the fine-mapping results (where the summary statistics are available, such as in the case of SuSiE-inf). This way, the statistics of any fine-mapped variant would be directly accessible without having to download the full original summary statistics and look them up there. Thanks for pointing out that you use the posterior mean for FinnGen. I'm from the FinnGen community, and in downstream tools we actually report the original marginal beta for fine-mapped variants, but we have both the marginal beta and the posterior mean available. Happy to chat if helpful!