I noticed that the new release has fewer studies available than the previous Open Targets Genetics release. Specifically, my dataset included about 11,000 studies from the earlier version, many of which were derived from NEALE2 and SAIGE. I can’t find these studies in the new release.
Could you please clarify:
Whether NEALE2 and SAIGE studies were entirely excluded from the new release?
If so, what was the rationale for their removal?
My analysis depends on the availability of these studies because I’m using them to map EFO traits to gene causality, not for drug discovery purposes.
This is correct: NEALE2 and SAIGE studies were entirely excluded from the 25.03 release, because they have been, or will be, superseded by GWAS from the GWAS Catalog (e.g. the fastGWA database and pan-UKBB, which will be added in the future).
Those studies were never part of the GWAS Catalog; they were ingested specifically for Open Targets Genetics. Since there are more recent studies that supersede them, we chose not to include them when we added our statistical genetics analyses to the Open Targets Platform.
I would also note that the updated Open Targets Platform doesn’t contain fewer studies than the last release of Open Targets Genetics. Can I ask how you made that calculation?
Dear Helena,
I downloaded all studies from Open Targets Genetics and kept those with at least one causal gene (a causal gene has an L2G score > 0.5). I also excluded studies with no EFO term representing the tested trait. This gives me a total of 17,000 studies. When I compare these with the studies in the new release, I find that ~12,000 of them are no longer there. The source composition of the missing studies is below, and I can send you the full list. In fact, the majority have GCST identifiers. I would highly appreciate your help, thanks.
StudID_prefix  Count
NEALE2  926
FINNGEN  981
GCST  9022
SAIGE  273
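For anyone wanting to reproduce this kind of breakdown, the comparison can be sketched roughly as follows. This is only a minimal illustration: the function names and the example study IDs are made up, and it assumes that non-GCST identifiers carry their source prefix before the first underscore. It also assumes that identifiers are formatted identically in both releases, which is worth checking before trusting the counts.

```python
from collections import Counter


def study_prefix(study_id: str) -> str:
    """Return the source prefix of a study ID.

    Assumption: GWAS Catalog IDs start with "GCST"; other sources put
    their prefix before the first underscore (e.g. "NEALE2_...").
    """
    if study_id.startswith("GCST"):
        return "GCST"
    return study_id.split("_", 1)[0]


def missing_by_prefix(old_ids, new_ids) -> Counter:
    """Count studies present in the old release but absent from the new one."""
    missing = set(old_ids) - set(new_ids)
    return Counter(study_prefix(study_id) for study_id in missing)


# Hypothetical IDs, for illustration only.
old_release = ["GCST000001", "GCST000002", "NEALE2_0001", "SAIGE_0001", "FINNGEN_0001"]
new_release = ["GCST000001"]

for prefix, count in missing_by_prefix(old_release, new_release).most_common():
    print(prefix, count)
```

Matching on exact ID strings is the simplest choice, but it will report a study as missing if the same study was re-ingested under a different identifier.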
Quite a lot has changed in the new pipeline compared to Open Targets Genetics (OTG), and we believe it gives a more trustworthy representation of GWAS associations across the universe of publicly available information. Our current pipeline is not perfect, but we know that in most cases it should provide a more precise landscape than OTG.
One example of what has changed is the p-value threshold for GWAS significance, which has moved from 5e-8 to 1e-8. According to our internal benchmarks, this is the main reason why associations that only just passed the significance threshold in OTG are not replicated in the new Platform.
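To get a feel for how much of a missing-study list this threshold change could explain, a rough check along these lines may help. It is a sketch under assumptions: the function names and sample data are hypothetical, the inclusive/exclusive handling of the boundaries is a guess, and it takes plain floating-point p-values per study as input rather than any particular dataset schema.

```python
OLD_THRESHOLD = 5e-8  # significance threshold used in OTG
NEW_THRESHOLD = 1e-8  # significance threshold in the new Platform


def lost_to_threshold(p_value: float) -> bool:
    """True if a p-value passes the old threshold but not the new one."""
    return NEW_THRESHOLD < p_value <= OLD_THRESHOLD


def studies_lost_to_threshold(p_values_by_study: dict) -> list:
    """Return studies whose most significant association falls between the thresholds.

    Such a study had at least one significant hit under the old cut-off
    and would have none under the new one.
    """
    return sorted(
        study_id
        for study_id, p_values in p_values_by_study.items()
        if p_values and lost_to_threshold(min(p_values))
    )


# Hypothetical studies and p-values, for illustration only.
example = {
    "STUDY_A": [3e-8, 1e-7],  # best hit sits between the two thresholds
    "STUDY_B": [2e-9, 3e-8],  # still has a hit below 1e-8
    "STUDY_C": [1e-6],        # never significant under either threshold
}
print(studies_lost_to_threshold(example))
```

Studies that this check does not flag were dropped for some other reason, so they are the ones worth looking up individually.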
Another improvement in the new pipeline is that we are now much better at tracing why a GWAS association or study didn’t pass our internal QC. In the excluded FTP path, you can find the study and credible_set datasets, which report records that were present in the source but not included in the Platform. In both datasets, the qualityControls column flags potential concerns, a subset of which are reasons for exclusion.
If you want to share particular GWAS Catalog or FinnGen studies or associations that are no longer supported, we can try to help you find the reason why they are no longer there.
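As a rough illustration of how the qualityControls column could be used, the sketch below tallies the flags attached to a list of studies of interest. It assumes the excluded study dataset has already been loaded into a list of dictionaries, that each record has a studyId field alongside qualityControls, and that qualityControls holds a list of strings; the flag names and IDs shown are placeholders, not real values from the dataset.

```python
from collections import Counter


def qc_flag_counts(records, ids_of_interest) -> Counter:
    """Count qualityControls flags across the records matching the given study IDs.

    Assumption: each record is a dict with a "studyId" string and a
    "qualityControls" list of strings (possibly empty or missing).
    """
    wanted = set(ids_of_interest)
    counts = Counter()
    for record in records:
        if record["studyId"] in wanted:
            counts.update(record.get("qualityControls") or [])
    return counts


# Placeholder records; the flag names are invented for illustration.
excluded_studies = [
    {"studyId": "GCST000001", "qualityControls": ["FLAG_A", "FLAG_B"]},
    {"studyId": "GCST000002", "qualityControls": ["FLAG_A"]},
    {"studyId": "GCST000003", "qualityControls": None},
]

for flag, count in qc_flag_counts(excluded_studies, ["GCST000001", "GCST000002"]).most_common():
    print(flag, count)
```

Since only a subset of the flags are reasons for exclusion, a tally like this narrows down the candidates rather than giving a definitive answer for each study.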