Create an OpenTargets fullstack clone

Hi, I run the DACC for the Impact of Genomic Variation on Function (IGVF) consortium.

It has been suggested that IGVF extend the Open Targets software rather than build our own giant genomics database from scratch.
In order to do this, we would need to create a full clone of your entire software stack, ideally on AWS.

Is there any way you can help us do this?

Hi Ben,

As far as I know, EBI only provides support for members of the consortium, but our company, The Hyve, is more than happy to help you with this. We have set up Open Targets for multiple clients. See https://www.thehyve.nl/services/open-targets for more information.

Good luck,

Sjoerd.

Hi @Ben_Hitz,

As @Sjoerd mentioned, we don’t provide support for particular instances. However, we want to make it as easy as possible for others to spin up their own versions, and we adapt our stack to make this possible. In the open-source spirit, we also welcome any contributions in the form of code or documentation.

If you want to share your particular use case, I’m sure the community could help you make some progress, and you can then decide whether you need additional services.

@thondeboer recently shared some success in creating a clone. CHOP has also recently forked the application and deployed it publicly: https://www.chop.edu/news/chop-helps-develop-platform-speed-drug-development-kids-cancer

I hope this helps.


Hi Ben,

Indeed, I was able to implement the OT Platform internally here, combining our own evidence data with the data provided by OT. The GitHub repos contain most of the information you need, although if you want to install OTP locally rather than on Google Cloud as OT does, you are going to have to do some wrangling, since there are a lot of assumptions that the deployment is on Google…

It can definitely be done (as @Sjoerd and his company The Hyve show), but it is not an easy lift… Let me know if you want to chat about what it will take and I can show you…

Thon


Maybe start by documenting which GitHub repositories are important for which functions; possibly this is buried in a help doc somewhere, but I found the global organization hard to follow.

Our use case is essentially a massive extension of what is in Open Targets Genetics. We would specifically add LD blocks for various ancestry groups and, most importantly, definitions of “cCREs” (candidate cis-regulatory elements), defined either computationally or via assays such as ATAC-seq + histone ChIP-seq, etc. These elements/genes/variants would then all be linked by other sets of experiments, including full gene regulatory networks.

Open Targets Genetics is a useful baseline for us (variants, genes, GWAS), but we would extend it to specific regulatory interactions and use it to house the experimental results and computational predictions from our IGVF consortium.

Hi @thondeboer,

Congratulations on deploying a private OT instance.
I am trying to do the same. When running the OT tractability pipeline from GitHub (chembl/tractability_pipeline_v2: pipeline for assessing the tractability of potential targets, starting from gene IDs), the pipeline is starving for memory.
I had 16 GB of RAM (13 GB available), and the error message said the PROTAC pipeline needs 13.7 GB. When I increased the RAM to 32 GB, the pipeline failed at the same stage and said it needs 71 GB of RAM.

By any chance, do you remember how much RAM you needed to run the OT tractability utility? It would be kind of you to share your suggestion regarding the memory needed to deploy the OT Platform on a personal computer.

I have scaled my input from 2 Ensembl IDs to 64 IDs, and the problem is the same. This means the input is not the issue; rather, the pipeline needs a little tweaking in my case.
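For what it’s worth, a quick pre-flight check of available RAM before launching a memory-hungry stage can save a failed run. A minimal sketch using only the Python standard library (Linux-only, since it relies on `sysconf`; the function names and the 14 GiB threshold are illustrative assumptions, not part of the tractability pipeline):

```python
import os

def available_ram_gb():
    """Estimate currently available physical RAM in GiB (Linux-only).

    Uses sysconf, so this is an approximation; `free` or psutil give
    more detail. This helper is illustrative, not pipeline code.
    """
    page_size = os.sysconf("SC_PAGE_SIZE")       # bytes per page
    avail_pages = os.sysconf("SC_AVPHYS_PAGES")  # pages currently free
    return page_size * avail_pages / 1024 ** 3

# Example: warn before starting a stage that reportedly needs ~14 GiB.
NEEDED_GB = 14  # assumption based on the 13.7 GB error message above
if available_ram_gb() < NEEDED_GB:
    print(f"Only {available_ram_gb():.1f} GiB free; "
          f"the PROTAC stage reportedly needs ~{NEEDED_GB} GiB.")
```

Running a check like this up front at least tells you whether a failure is a genuine memory shortfall or a pipeline bug.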

Thanks in advance,
Saurav

Hello @sauravsaha,

Have you been able to fix the problem? In case it helps, I want to let you know that tractability is run for each of our releases, and we make the dataset available in a Google Cloud Storage bucket. For example, you can find the latest one here: https://storage.googleapis.com/otar001-core/Tractability/23.02/Tractability_23.02_tractability_buckets_2022-01-13.tsv
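If the prebuilt dataset is enough for your use case, you can fetch and parse the release TSV directly rather than rerunning the pipeline. A rough sketch with the Python standard library (only the bucket URL comes from the post above; the helper names and the two sample column names are illustrative assumptions, as the real file's columns vary by release):

```python
import csv
import io
import urllib.request

# Release TSV from the Open Targets Google Storage bucket (URL from the post above).
TRACTABILITY_TSV = ("https://storage.googleapis.com/otar001-core/Tractability/"
                    "23.02/Tractability_23.02_tractability_buckets_2022-01-13.tsv")

def parse_tractability(text):
    """Parse the tab-separated buckets file into one dict per target row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

def fetch_tractability(url=TRACTABILITY_TSV):
    """Download the release TSV and parse it (hypothetical helper)."""
    with urllib.request.urlopen(url) as resp:
        return parse_tractability(resp.read().decode("utf-8"))

# Illustrative two-column sample just to show the parsing shape;
# the real release file has many more columns.
sample = "ensembl_gene_id\tsymbol\nENSG00000157764\tBRAF\n"
rows = parse_tractability(sample)
```

For anything beyond a quick look, loading the TSV into pandas or a database is probably more practical than dict rows.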

Thanks for noting that you are finding it hard to reproduce the tractability pipeline. I have passed this comment on to our colleagues to see what is going on.

Hi @irene,

Thank you for your response.

I was able to run the tractability pipeline, but only with much more RAM.
I am trying to build Open Targets from the command line using the scripts kept in GitHub for each pipeline.

I do not want to use the Google Storage bucket and would like to use a local server for development. The pipelines of interest to me are tractability, safety, chemical probes & TEPs, baseline expression, and molecular interactions, as I wish to customize the tool for our current use case.

The data used by the tractability pipeline on GitHub (tractability_pipeline_v2/ot_tractability_pipeline_v2/data at master · chembl/tractability_pipeline_v2 · GitHub) is not just a TSV file but multiple files. I believe downloading the data from Open Targets and then writing scripts to extract features from scratch is the most immediate way of deploying Open Targets. Any guidance/suggestions regarding this?

Thanks in advance.

Kind regards,
Saurav
