Good evening,
I’m curious whether anyone has a recommendation for a use case I’ve come across that relates to change data capture (CDC).
Let’s say I have stood up my infrastructure and loaded data from release X.0 into Elasticsearch and ClickHouse. When release X.1 is published, I only want to import the delta between release X.1 and release X.0:
Delta = X.1 - X.0
This would reduce the compute cost (and therefore the financial burden) on the data consumer and make it easier to refresh as new releases are published.
Are there any recommendations for how to approach this use case? Does BigQuery provide any tooling that could support it? Even something like transaction logs for the source table might help in identifying new or changed rows. (That makes me realize that the equation above doesn’t account for existing rows that are updated, if there is such a thing.)
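To make the idea concrete, here is a rough Python sketch of the delta I have in mind, including updated rows. It assumes each release can be read as a list of rows with a stable primary key. The `id` column, the sample rows, and the function names are all made up for illustration; this is not anything BigQuery provides.

```python
import hashlib
import json


def row_hash(row):
    """Fingerprint a row so changed content can be detected cheaply."""
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compute_delta(old_rows, new_rows, key="id"):
    """Compare two releases keyed by `key` (assumed stable across releases)."""
    old_hashes = {row[key]: row_hash(row) for row in old_rows}
    new_by_key = {row[key]: row for row in new_rows}

    # Rows whose key only exists in the newer release.
    inserted = [r for k, r in new_by_key.items() if k not in old_hashes]
    # Rows present in both releases whose content differs.
    updated = [
        r for k, r in new_by_key.items()
        if k in old_hashes and old_hashes[k] != row_hash(r)
    ]
    # Keys that disappeared in the newer release.
    deleted = [k for k in old_hashes if k not in new_by_key]

    return {"inserted": inserted, "updated": updated, "deleted": deleted}


# Hypothetical sample data standing in for releases X.0 and X.1.
release_x0 = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
]
release_x1 = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "B"},
    {"id": 4, "name": "d"},
]

delta = compute_delta(release_x0, release_x1)
```

Doing this client-side still means reading both releases in full, which is exactly the cost I would like to avoid, so I am hoping the comparison can happen at the source instead.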
Thanks for your help!
Cole