Can I create my own graphql database for use in the Platform UI from the data download?

Are there any docs or code to help me create my own GraphQL database from the data download? We would like to extend the information in the database and create our own version of the Platform for internal use, showing the new data.

I’ll give a broad overview here, but depending on your specific requirements you might need further assistance.

An overview

The GraphQL interface itself does not contain any data: it is a stateless service which queries the two databases that host the data for the Open Targets Platform (OTP); in our documentation we refer to it as the API. The two databases (one Elasticsearch, the other Clickhouse) are built from the outputs of the OTP ETL. The web application queries the API to provide a GUI for the OTP.
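To make this concrete, here is a minimal Python sketch of how any client (the web application included) talks to the API. The endpoint below is the current public one; a self-hosted deployment would expose the same GraphQL interface at its own address:

# Minimal sketch: the API is stateless, so any HTTP client can query it.
import requests

GRAPHQL_URL = "https://api.platform.opentargets.org/api/v4/graphql"

query = """
query targetInfo($ensemblId: String!) {
  target(ensemblId: $ensemblId) {
    approvedSymbol
    biotype
  }
}
"""

response = requests.post(
    GRAPHQL_URL,
    json={"query": query, "variables": {"ensemblId": "ENSG00000157764"}},
    timeout=30,
)
response.raise_for_status()
print(response.json()["data"]["target"])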

The databases are prepared using code from this repository. Within that repository, the terraform_create_images directory contains the Terraform code used to create the GCP VMs on which we run OTP.

The four components (the API, the web application, Elasticsearch, and Clickhouse) are deployed using this repository, which again uses Terraform to deploy to GCP.

We deploy our infrastructure on Google Cloud Platform, so much of the deployment code is tied quite tightly to that service. Each component is deployed as either a VM or a Docker image.

A rough guide to releasing

Typically our release process has the following steps:

  1. Collect all the necessary ETL inputs using the Platform Input Support repository. You don’t need to do this yourself; the outputs of this step are available from either Google Cloud (gs://open-targets-data-releases/22.09/input) or the EBI FTP (http://ftp.ebi.ac.uk/pub/databases/opentargets/platform/22.09/input/). A download sketch is given after this list.
  2. Using the outputs from step 1, run the ETL to create the datasets. If you are adding additional data, this is probably where you need to do it (see the enrichment sketch after this list). You can examine our ETL outputs from either Google Cloud (gs://open-targets-data-releases/22.09/output) or the EBI FTP (http://ftp.ebi.ac.uk/pub/databases/opentargets/platform/22.09/output/).
  3. Using the outputs from step 2, load the Clickhouse and Elasticsearch databases with the code in platform-output-support. You could also add additional data at this point, but there is a risk in doing so: the datasets interact through ID terms (a target might reference a drug, which in turn references a disease). Consider, for example, the query:
query target_to_drug_to_disease {
  target(ensemblId: "ENSG00000157764") {
    knownDrugs {
      rows {
        drugId
        disease {
          id
        }
      }
    }
  }
}

Because our inputs are produced in step 2, we can validate that each drug referenced by the target actually exists in the data, and likewise each disease referenced by the drug. If you add unrecognised entries, the API will not be able to return valid responses.
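To catch such problems early, you could cross-check ID references before loading anything. Below is a minimal Python sketch; the file layout and field names (molecule/*.json, id, drugId) are illustrative assumptions, so adjust them to the actual ETL output schema:

# Hedged sketch: verify that every drug referenced by extra evidence exists
# in the molecule dataset before loading it into Elasticsearch/Clickhouse.
import json
from pathlib import Path

def load_ids(directory, field):
    """Collect one field from every JSON-lines part file in a dataset."""
    ids = set()
    for part in Path(directory).glob("*.json"):
        with part.open() as handle:
            for line in handle:
                ids.add(json.loads(line)[field])
    return ids

known_drugs = load_ids("etl_output/molecule", "id")  # assumed dataset layout

# Every referenced drug must exist, otherwise the API cannot resolve it and
# queries like the one above will fail.
with open("my_extra_evidence.json") as handle:
    for line_no, line in enumerate(handle, start=1):
        record = json.loads(line)
        if record["drugId"] not in known_drugs:
            print(f"line {line_no}: unknown drug {record['drugId']}")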

  4. Deploy the created images using the terraform-google-opentargets-platform repository. You’ll likely have to update this quite significantly if you’re not deploying to GCP.
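As a sketch of step 1, here is one way to pull the release inputs from the public bucket using Python and the google-cloud-storage package (gsutil or an FTP client works just as well):

# Hedged sketch: download the ETL inputs for a release from the public GCS
# bucket using anonymous access (the bucket is publicly readable).
from pathlib import Path
from google.cloud import storage

client = storage.Client.create_anonymous_client()
bucket = client.bucket("open-targets-data-releases")

for blob in client.list_blobs(bucket, prefix="22.09/input/"):
    if blob.name.endswith("/"):  # skip directory placeholder objects
        continue
    destination = Path("input") / Path(blob.name).relative_to("22.09/input")
    destination.parent.mkdir(parents=True, exist_ok=True)
    blob.download_to_filename(str(destination))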
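And as a sketch of step 2’s extension point: join your own data onto an ETL output dataset so it flows through the rest of the pipeline like everything else. The ETL itself is Scala/Spark, so the same logic translates directly; the dataset and column names below (targets, internalScore) are placeholders rather than the real schema:

# Hedged sketch: enrich an ETL output dataset with internal annotations.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("extend-otp-targets").getOrCreate()

targets = spark.read.parquet("etl_output/targets")       # assumed path
extra = spark.read.json("my_internal_annotations.json")  # must carry an "id"

# A left join keeps every existing target and attaches the internal
# annotation where one exists; unmatched targets simply get nulls.
enriched = targets.join(extra.select("id", "internalScore"), on="id", how="left")
enriched.write.mode("overwrite").parquet("etl_output/targets_enriched")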

Thanks Jarod, this is indeed what I needed! I am getting the input files directly from the bucket and will add my evidence from there, so it gets validated through the pipeline as well. I will use the platform-etl-support Scala code to add my data into the process, then take it all the way to the Clickhouse/ES servers and use the platform-api to create the GraphQL API that the UI will then use… Quite the process, but I am getting there!

Thon