Documentation Clarity - Platform ETL

I am trying to use AWS Elastic MapReduce (EMR) Serverless to replicate the steps in the following repository: opentargets/platform-etl-backend on GitHub.

I want to override the reference.conf configuration so that the common input and output file locations point to S3 buckets rather than the Google Cloud Storage locations used by default.

Is this possible using an application.conf file? Or do you recommend updating the storage locations in reference.conf before running sbt assembly to build the fat jar? From the README.md, it is unclear which approach is recommended.

As a long shot: has anyone tried running the platform-etl-backend processes on AWS EMR?

Anything helps, and thank you for sharing such an amazing project!

The platform ETL uses PureConfig to manage configuration files. The configuration section of the Open Targets backend ETL README explains how to override only the parts of the configuration you need; everything you leave out falls back to the reference configuration. This fragment shows the JVM option for passing a custom configuration file containing your customised sections:

-Dconfig.file=custom.conf
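With Typesafe Config, which PureConfig builds on, the file named by -Dconfig.file is loaded in place of application.conf and layered over reference.conf: keys present in the custom file win, nested objects are merged key by key, and anything missing comes from the reference defaults. The sketch below only illustrates that fallback behaviour with plain Python dictionaries; it is not PureConfig code, and the keys ("common", "input", "output", "output-format") and bucket paths are hypothetical placeholders rather than the real contents of the ETL's reference.conf.

```python
# Illustrative sketch only: mimics how a partial custom config is layered over
# reference defaults. NOT PureConfig/Typesafe Config code; all keys and paths
# below are hypothetical placeholders.
from typing import Any, Dict

Config = Dict[str, Any]


def with_fallback(overrides: Config, defaults: Config) -> Config:
    """Merge two configs: keys in `overrides` win, every other key is taken
    from `defaults`, and nested objects are merged recursively."""
    merged: Config = dict(defaults)  # start from a copy of the defaults
    for key, value in overrides.items():
        default_value = defaults.get(key)
        if isinstance(value, dict) and isinstance(default_value, dict):
            # Both sides are objects: merge them key by key.
            merged[key] = with_fallback(value, default_value)
        else:
            # Scalars and lists in the override replace the default outright.
            merged[key] = value
    return merged


# Hypothetical stand-in for reference.conf (the defaults shipped in the jar).
reference = {
    "common": {
        "input": "gs://example-bucket/input",
        "output": "gs://example-bucket/output",
        "output-format": "parquet",
    },
}

# Hypothetical stand-in for custom.conf: only the values you want to change.
custom = {
    "common": {
        "input": "s3://my-bucket/input",
        "output": "s3://my-bucket/output",
    },
}

effective = with_fallback(custom, reference)
print(effective["common"]["input"])          # s3://my-bucket/input
print(effective["common"]["output-format"])  # parquet (from the reference)
```

In the real ETL the custom file would be written in HOCON and passed with the JVM option above, so the storage paths can change without editing reference.conf or rebuilding the jar.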