Fine-mapping using the GCTA-COJO software

Hi, and thank you for developing this platform. I would like to apply the COJO pipeline to my own data, as Open Targets has done here. I have read the documentation page and the GitHub repository, but it is still unclear to me how exactly the pipeline was applied. As I understand from these sources, the COJO --cojo-slct step was run on the summary statistics of each locus, in the region surrounding the top SNPs. The SNPs reported in the resulting .jma.cojo output were then used as input to a --cojo-cond run, which conditions the summary statistics on the variants identified by --cojo-slct, and the conditioned results were ultimately used to construct credible sets.
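For readers trying to reproduce the two-step procedure described above, here is a minimal sketch. It assumes GCTA is on the PATH as `gcta64`, and the file names (`ref_panel`, `locus.ma`) and helper functions are hypothetical. It only builds the argument lists for a --cojo-slct run and a follow-up --cojo-cond run, reads the SNP column of a .jma.cojo table, and writes the SNP-list file that --cojo-cond takes; it is not the Open Targets pipeline itself.

```python
from pathlib import Path
from typing import List


def slct_args(bfile: str, ma_file: str, chrom: str, out: str) -> List[str]:
    """Arguments for a GCTA-COJO stepwise conditional selection run."""
    return ["gcta64", "--bfile", bfile, "--chr", chrom,
            "--cojo-file", ma_file, "--cojo-slct", "--out", out]


def cond_args(bfile: str, ma_file: str, chrom: str,
              cond_snplist: str, out: str) -> List[str]:
    """Arguments for a GCTA-COJO run conditioning on the SNPs listed in cond_snplist."""
    return ["gcta64", "--bfile", bfile, "--chr", chrom,
            "--cojo-file", ma_file, "--cojo-cond", cond_snplist, "--out", out]


def selected_snps(jma_text: str) -> List[str]:
    """Return the SNP IDs in a .jma.cojo table (whitespace-delimited, with a header row)."""
    lines = [line for line in jma_text.splitlines() if line.strip()]
    snp_col = lines[0].split().index("SNP")
    return [line.split()[snp_col] for line in lines[1:]]


def write_snplist(snps: List[str], path: str) -> None:
    """Write one SNP ID per line, the plain list format passed to --cojo-cond."""
    Path(path).write_text("\n".join(snps) + "\n")
```

Each argument list could be passed to `subprocess.run(args, check=True)`; after the selection run, `selected_snps(Path("locus_chr1.jma.cojo").read_text())` would give the IDs to write into a conditioning file.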

I have three questions:

  1. First, is the conditioning step also performed on loci where the --cojo-slct algorithm identifies only a single independent signal, in order to test whether a credible set (or “signal”) contains multiple causal variants?
  2. Second, if the answer to the above question is no, how do we construct credible sets for loci in which --cojo-slct identifies a single independent SNP, given that in fine-mapping a signal may consist of a single causal variant (CCV) or of several?
  3. Third, when conditioning with --cojo-cond, how do we construct a separate credible set for each independent signal in a locus? E.g. if --cojo-slct identifies 10 independent signals in a locus, how do we construct 10 credible sets if we run --cojo-cond only once, conditioning the summary statistics on all 10 SNPs?

I really appreciate any input. Thank you.

Welcome to the OTG community and thank you for your question! In response to your questions:

  1. No, the conditioning step is only performed when there are multiple independent signals within a window (2 Mb in our implementation).
  2. If there is a single independent, significant SNP at a locus, and it is not in LD with any other SNPs in the region, then the credible set would consist of just the one SNP, with posterior probability of 1.
  3. Separate credible sets are computed for each set of conditionally independent summary stats. In your example, there would be 10 credible sets, each would be conditioned on the other 9 independent signals.
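The decision rule in answers 1 and 3 can be sketched as follows. This is a minimal illustration, not the Open Targets code: the `Signal` type and `conditioning_sets` function are hypothetical, and reading "within a 2 Mb window" as "within 1 Mb on either side of the signal, on the same chromosome" is an assumption. Each independent signal is paired with the other signals it should be conditioned on; an empty list means no conditioning is needed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Signal:
    snp: str
    chrom: str
    bp: int


def conditioning_sets(signals: List[Signal],
                      half_window_bp: int = 1_000_000) -> Dict[str, List[str]]:
    """Map each signal to the other signals it should be conditioned on.

    Assumption: another signal counts as "in the window" if it is on the same
    chromosome and within half_window_bp of this signal. An empty list means
    the signal is isolated and its marginal statistics can be used directly.
    """
    sets: Dict[str, List[str]] = {}
    for s in signals:
        sets[s.snp] = [
            o.snp for o in signals
            if o.snp != s.snp
            and o.chrom == s.chrom
            and abs(o.bp - s.bp) <= half_window_bp
        ]
    return sets
```

Each non-empty list would go into its own SNP-list file and a separate --cojo-cond run, so 10 signals in one window give 10 conditioned datasets, and hence 10 credible sets, rather than a single run conditioned on all 10 SNPs.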

Best wishes,
Xiangyu

Thank you for your reply, I really appreciate it.

As I understand it, the --cojo-slct algorithm is applied first, and the resulting independent variant IDs are then used in the --cojo-cond step. Regarding your third answer: if I condition each signal on the remaining signals in the locus, isn't it possible that this would produce overlapping credible sets?
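Whether overlaps actually occur depends on the data, but they are straightforward to check once the per-signal credible sets have been built. A minimal sketch with a hypothetical helper, `pairwise_overlaps`, which takes credible sets keyed by their lead SNP and reports the variants shared by each overlapping pair:

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def pairwise_overlaps(credible_sets: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Return the shared variants for every pair of credible sets that overlap."""
    overlaps: Dict[Tuple[str, str], Set[str]] = {}
    # Sort by lead SNP so each pair is reported once, in a stable order.
    for (a, set_a), (b, set_b) in combinations(sorted(credible_sets.items()), 2):
        shared = set_a & set_b
        if shared:
            overlaps[(a, b)] = shared
    return overlaps
```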

Something else is still unclear to me. For loci with multiple independent signals, the ABF can be used to construct the constituent credible sets, which I understand. However, in cases where the --cojo-slct algorithm identifies only a single independent variant per locus, how do I proceed with constructing the credible set for this signal? You mentioned LD, but how do I obtain this information? Could you elaborate on the steps that follow the identification of a single independent variant by --cojo-slct?
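For a single-signal locus, one standard approach is Wakefield's approximate Bayes factor (ABF), computed from each variant's marginal beta and standard error under the assumption of one causal variant and an equal prior probability per variant. This is a minimal sketch, not necessarily what Open Targets ran: the prior standard deviation of 0.15 on the effect size is an assumed value, and the function names are hypothetical. LD enters only implicitly here: variants in strong LD with the lead variant have similar z-scores and therefore share its posterior probability, whereas an isolated lead variant ends up with a posterior probability close to 1.

```python
import math
from typing import List, Tuple


def log_abf(beta: float, se: float, prior_sd: float = 0.15) -> float:
    """Wakefield's log approximate Bayes factor (association vs. no association)."""
    v = se ** 2          # sampling variance of the effect estimate
    w = prior_sd ** 2    # prior variance of the true effect (assumed value)
    r = w / (v + w)
    z = beta / se
    return 0.5 * math.log(1 - r) + 0.5 * r * z ** 2


def credible_set(variants: List[Tuple[str, float, float]],
                 coverage: float = 0.95) -> List[Tuple[str, float]]:
    """Smallest set of variants whose posterior probabilities sum to >= coverage.

    variants: (snp_id, beta, se) tuples for one locus or one conditioned signal.
    Assumes a single causal variant and a uniform prior over variants.
    """
    labf = [log_abf(beta, se) for _, beta, se in variants]
    top = max(labf)
    weights = [math.exp(x - top) for x in labf]  # subtract the max to avoid overflow
    total = sum(weights)
    ranked = sorted(
        ((snp, wgt / total) for (snp, _, _), wgt in zip(variants, weights)),
        key=lambda item: item[1],
        reverse=True,
    )
    selected, cumulative = [], 0.0
    for snp, pp in ranked:
        selected.append((snp, pp))
        cumulative += pp
        if cumulative >= coverage:
            break
    return selected
```

For a signal that was conditioned on others, the same function would be applied to the conditional estimates (the bC and bC_se columns of the .cma.cojo output) instead of the marginal ones.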

Thank you, and apologies for the confusion. It seems I am missing some of the steps of the pipeline.