Hello @Juha_Karjalainen,
Thank you for exploring our datasets.
We are aware of some of the problems you mentioned when attempting to use the statistics from the locus object of the credible_set dataset. We are working to resolve them; for the time being, we encourage users to take the actual values directly from the summary statistics source.
As you may know, credible sets come from multiple sources like:
- GWAS Catalog
- UKBB PPP
- eQTL Catalogue
- FinnGen
We fine-map the credible sets for the GWAS Catalog and UKBB PPP data sources ourselves, using the SuSiE-inf and PICS methods implemented in gentropy; which method is applied depends on the source and on the conditions described in fine-mapping pipelines. For the other data sources we rely on the fine-mapping results they provide.
Missing p-values in credible set locus
Unfortunately, as of the June release we do not persist the original p-values from the summary statistics when the credible sets are the result of our own fine-mapping methods. This includes credible sets from SuSiE-inf and PICS. The p-values should, however, be accessible when the fine-mapping was done by the data source itself, i.e. FinnGen or eQTL Catalogue (SuSie).
| finemappingMethod | numberOfEmptyPvalVariants | variantCount |
| --- | --- | --- |
| SuSie | 0 | 55400481 |
| SuSiE-inf | 9445449 | 9445449 |
| PICS | 9027469 | 9327232 |
We are exploring possible workarounds for this discrepancy, but for now the best way to recover the p-values missing from the locus is to go back to the original summary statistics on the data source providers' portals.
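As a rough illustration of that workaround, the sketch below fills missing locus p-values from summary statistics that have already been downloaded and keyed by variant. It is only a minimal sketch: the helper `fill_missing_pvalues`, the field names `variantId`, `pValueMantissa` and `pValueExponent`, and the toy variants are assumptions made for the example, so please check them against the schema of the release you are using.

```python
def fill_missing_pvalues(locus, sumstats_by_variant):
    """Return a copy of `locus` with missing p-values filled from summary statistics.

    Hypothetical helper. Field names are assumed, not taken from a specific release.
    """
    filled = []
    for variant in locus:
        row = dict(variant)  # copy so the input locus is left untouched
        missing = row.get("pValueMantissa") is None or row.get("pValueExponent") is None
        source = sumstats_by_variant.get(row["variantId"])
        if missing and source is not None:
            row["pValueMantissa"] = source["pValueMantissa"]
            row["pValueExponent"] = source["pValueExponent"]
        filled.append(row)
    return filled


# Toy locus: two variants without p-values, one with.
locus = [
    {"variantId": "1_1000_A_G", "pValueMantissa": None, "pValueExponent": None},
    {"variantId": "1_2000_C_T", "pValueMantissa": 3.2, "pValueExponent": -9},
    {"variantId": "1_3000_G_A", "pValueMantissa": None, "pValueExponent": None},
]

# Toy summary statistics, keyed by variant identifier.
sumstats_by_variant = {
    "1_1000_A_G": {"pValueMantissa": 1.5, "pValueExponent": -12},
}

filled_locus = fill_missing_pvalues(locus, sumstats_by_variant)
```

Variants that are absent from the summary statistics keep their empty p-values, so it is still visible which ones could not be recovered.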
Beta and standard error
Unfortunately, it gets more confusing here.
The meaning of beta depends on the fine-mapping method used:
- SuSie FinnGen - beta is reported as the mean from the FinnGen SuSiE SNP files, see ingestion-step.
- SuSiE-inf - beta represents the unscaled μ (mean), see fine-mapper.
- PICS GWAS Catalog and SuSie eQTL Catalogue - beta represents the original beta. In the case of PICS it is only reported for the lead variants, and only if provided by the source (some lead variants fine-mapped with PICS from GCCA, GWAS Catalog Curated Associations, may lack it).

For standardError, all values are taken from the summary statistics (if available).
| finemappingMethod | numberOfEmptyBetaVariants | numberOfEmptySEVariants | leadVariantCount | variantCount |
| --- | --- | --- | --- | --- |
| SuSiE-inf | 0 | 9445449 | 501136 | 9445449 |
| SuSie | 0 | 0 | 2182186 | 55400481 |
| PICS | 9079730 | 0 | 299763 | 9327232 |
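
If you want to check how many locus variants lack a beta or standard error in your own copy of the data, counts of this kind can be tallied per fine-mapping method along the following lines. This is a minimal sketch on toy records, not the query we ran: the function `count_empty_stats` is hypothetical, and the record layout (a `finemappingMethod` field plus a `locus` list with `beta` and `standardError`) is an assumption for the example.

```python
from collections import defaultdict


def count_empty_stats(credible_sets):
    """Count locus variants with empty beta / standardError per fine-mapping method.

    Hypothetical helper. The record layout is assumed for illustration.
    """
    counts = defaultdict(
        lambda: {
            "numberOfEmptyBetaVariants": 0,
            "numberOfEmptySEVariants": 0,
            "variantCount": 0,
        }
    )
    for credible_set in credible_sets:
        per_method = counts[credible_set["finemappingMethod"]]
        for variant in credible_set["locus"]:
            per_method["variantCount"] += 1
            if variant.get("beta") is None:
                per_method["numberOfEmptyBetaVariants"] += 1
            if variant.get("standardError") is None:
                per_method["numberOfEmptySEVariants"] += 1
    return dict(counts)


# Toy credible sets, invented for the example.
credible_sets = [
    {
        "finemappingMethod": "PICS",
        "locus": [
            {"variantId": "1_1000_A_G", "beta": 0.12, "standardError": 0.03},
            {"variantId": "1_1500_T_C", "beta": None, "standardError": 0.04},
        ],
    },
    {
        "finemappingMethod": "SuSiE-inf",
        "locus": [
            {"variantId": "2_5000_G_A", "beta": 0.4, "standardError": None},
        ],
    },
]

empty_counts = count_empty_stats(credible_sets)
```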
Related to the above, we are planning to revamp the StudyLocus dataset so that it clearly distinguishes between the original statistics and those resulting from fine-mapping.
I will post back as soon as these issues get resolved.
With kind regards,
Szymon Szyszkowski