Documentation Clarity - Platform ETL

Cole_DeVries · 6 January 2023 00:13

I am trying to use AWS Elastic Map Reduce (EMR) Serverless to replicate the steps that are taken in the following repository: GitHub - opentargets/platform-etl-backend.

I want to override the reference.conf configuration so that the common input and output file locations point to S3 buckets rather than Google Cloud Storage.

Is this possible using the application.conf file? Or do you recommend updating the reference.conf file with the storage locations before running sbt assembly to build the fat jar? From the documentation in the README.md, it’s unclear which approach is recommended.

As a long shot, has anyone attempted to run the platform-etl-backend processes on AWS EMR?

Anything helps, and thank you for sharing such an amazing project!

mkarmona · 7 January 2023 11:53

The platform ETL uses PureConfig to manage configuration files. The configuration section of the Open Targets backend ETL README explains how to override only the parts of the configuration you need; everything else falls back to the reference configuration. To pass a custom configuration file containing just the customised sections, add this JVM option:

-Dconfig.file=custom.conf
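
As a rough sketch of what such a custom.conf might contain for the S3 case: the fragment below assumes the input and output locations live under a `common` block with `input` and `output` keys, as the question describes. Those key names and the bucket paths are assumptions for illustration, so check them against the reference.conf in the repository before relying on them.

```
# custom.conf: only the overridden keys are listed; everything else
# falls back to reference.conf.
# ASSUMPTION: the key names common.input / common.output mirror reference.conf.
# ASSUMPTION: the bucket names and prefixes are hypothetical placeholders.
common {
  input = "s3://my-example-bucket/platform/input"
  output = "s3://my-example-bucket/platform/output"
}
```

With this approach there should be no need to edit reference.conf or rebuild the fat jar when the storage locations change.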
