Hi @hcornu, this is fantastic work. I am interested in downloading the harmonised sumstats (primarily from the GWAS Catalog, and possibly UKBB/FinnGen as well) so that I have a local-ish copy to use in some pipelines. Is this something that can be done through Google Cloud, and if so, how?
Hi @Kirill_Tsukanov, many thanks for the response. This has partly answered my question, but I am wondering how to download the copy from Google Cloud Storage to a local directory rather than simply ‘move’ it to another Google Cloud bucket. Is that something that can be done?
Hi @marcustutert, if you want to download the files locally, you first need to do the export as described in that answer (note that this doesn’t just move the data; it exports it from the BigQuery database into the Google Cloud Storage bucket). Once this is done, you will be able to download the data locally using:
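As a hedged alternative, the same download step can be scripted in Python. This is a minimal sketch, assuming the `google-cloud-storage` package is installed and Application Default Credentials are set up for the project that owns the bucket. The bucket name, prefix, and destination directory are placeholders, and `local_path_for` / `download_prefix` are hypothetical helper names, not part of any Open Targets tooling. Files downloaded out of Google Cloud this way may incur Google egress charges.

```python
from pathlib import Path


def local_path_for(blob_name: str, prefix: str, dest_dir: str) -> Path:
    """Map an object name under `prefix` to a path inside `dest_dir`,
    keeping the sub-directory layout below the prefix."""
    if blob_name.startswith(prefix):
        relative = blob_name[len(prefix):].lstrip("/")
    else:
        relative = blob_name
    return Path(dest_dir) / relative


def download_prefix(bucket_name: str, prefix: str, dest_dir: str) -> list[Path]:
    """Download every object under gs://<bucket_name>/<prefix> into dest_dir."""
    # Imported here so local_path_for can be used without the package installed.
    from google.cloud import storage

    client = storage.Client()  # uses Application Default Credentials
    written = []
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        if blob.name.endswith("/"):
            continue  # skip zero-byte "folder" placeholder objects
        target = local_path_for(blob.name, prefix, dest_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        blob.download_to_filename(str(target))
        written.append(target)
    return written
```

For example, `download_prefix("my-ot-export", "v2d/", "./ot_data")` (hypothetical names) would copy everything under `gs://my-ot-export/v2d/` into `./ot_data`, preserving any sub-directories below the prefix.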
Thanks again @Kirill_Tsukanov for the helpful reply. Previously, when I spoke to members of OTAR about downloading the data (locally, off the Google Cloud bucket), they informed me it would cost £ to do so. Is this still the case? Also, if I wanted to pull more up-to-date data from OTAR on Google Cloud, how would you suggest I do so? Is there a way to automate a sync that only collects the “diff” between the two files?
Egress charges don’t apply when you move data within the cloud, for example, when you export the BigQuery dataset to a cloud bucket using the command I described earlier.
However, egress charges will apply whenever you download the data from a cloud bucket to your local setup outside the Google Cloud infrastructure. This applies in either case, whether you’re downloading from our “open-targets-genetics-releases” bucket or from your own bucket to which you exported the BigQuery data.
It’s important to note that these charges are levied by Google, so we have no control over them (and none of that money goes to Open Targets either).
You can always fetch all of the data generated by Open Targets for free from the EMBL-EBI FTP server. Please see these links for further reference:
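If you would rather script the FTP download, Python’s standard-library `ftplib` is enough. This is a sketch under stated assumptions: `ftp.ebi.ac.uk` is the EMBL-EBI FTP host, but the remote directory is a placeholder you should take from the Open Targets documentation, and `files_to_fetch` / `mirror_ftp_dir` are hypothetical helpers. It fetches a single directory non-recursively and skips files that already exist locally by name only, so a partially written file from an interrupted run would need to be deleted by hand.

```python
import ftplib
from pathlib import Path

EBI_FTP_HOST = "ftp.ebi.ac.uk"


def files_to_fetch(remote_names: list[str], already_have: set[str]) -> list[str]:
    """Keep only remote names not already present locally (name match only)."""
    return [name for name in remote_names if name not in already_have]


def mirror_ftp_dir(remote_dir: str, local_dir: str, host: str = EBI_FTP_HOST) -> list[str]:
    """Anonymously download the files of one FTP directory (non-recursive)."""
    dest = Path(local_dir)
    dest.mkdir(parents=True, exist_ok=True)
    already_have = {p.name for p in dest.iterdir()}
    fetched = []
    with ftplib.FTP(host) as ftp:
        ftp.login()  # anonymous login
        ftp.cwd(remote_dir)
        for name in files_to_fetch(ftp.nlst(), already_have):
            target = dest / name
            try:
                with open(target, "wb") as fh:
                    ftp.retrbinary(f"RETR {name}", fh.write)
            except ftplib.error_perm:
                # Most likely a sub-directory rather than a file: drop the empty stub.
                target.unlink(missing_ok=True)
                continue
            fetched.append(name)
    return fetched
```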
Finally, to address your question about automating a sync: unfortunately, there isn’t a straightforward way to do this. The schema of the data may (and frequently does) change between releases, including the addition and reorganisation of certain data fields, so an incremental sync is not an easy task in this situation.
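To make the schema point concrete, here is a hypothetical pre-sync check in Python: before trying to merge a new release into a local copy, compare the column names and types of the old and new tables. The `diff_schemas` helper and the example column names are illustrative assumptions, not Open Targets code, and how you obtain each schema (for example from BigQuery table metadata or Parquet file footers) is left open.

```python
from dataclasses import dataclass


@dataclass
class SchemaDiff:
    added: set[str]
    removed: set[str]
    retyped: set[str]

    @property
    def compatible(self) -> bool:
        # Only an unchanged schema can be merged row-by-row into an existing
        # local copy; any other change calls for a full refresh instead.
        return not (self.added or self.removed or self.retyped)


def diff_schemas(old: dict[str, str], new: dict[str, str]) -> SchemaDiff:
    """Compare two {column_name: type_name} mappings."""
    shared = old.keys() & new.keys()
    return SchemaDiff(
        added=set(new.keys() - old.keys()),
        removed=set(old.keys() - new.keys()),
        retyped={col for col in shared if old[col] != new[col]},
    )
```

For example, `diff_schemas({"study_id": "STRING", "pval": "FLOAT64"}, {"study_id": "STRING", "pval": "STRING", "beta": "FLOAT64"})` reports `beta` as added and `pval` as retyped, so `compatible` is `False` and a full re-download would be the safer option.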
Please let me know if you have any further questions; I’ll be happy to help.
Note that Open Targets doesn’t provide harmonised sumstats (though we use them internally). For those, you would need to go to the original providers, namely the GWAS Catalog, FinnGen, and whichever source you want the UKB sumstats from. OT fine-mapping data is available, though (v2d_credset).