Login

I’ve been trying to implement a slowly changing dimension type 2 pattern in our data warehouse, but I keep running into issues with the historical records becoming misaligned after a batch job fails partway through. How do you all handle making the updates to the dimension table and the fact table joins truly atomic?

we run with a staging area and a single transactional swap. changes land in a staging table, then in one atomic operation we close the old SCD type 2 row and insert the new one for the dimension, and only after that do we point the fact to the new surrogate. if the batch dies midflight, nothing commits to production and the history stays intact. we also keep a small audit flag in the staging to prove what happened.

our warehouse doesn't support cross schema transactions, so we still see misalignment when a job crashes. we ended up with orphaned facts or half updated keys and added a nightly reconciliation and a cleanup job, which feels like patching a broken process. are you sure the batch boundary is the real problem?

we added a version column to both the dimension and the fact rows so the join resolves to the correct version; it helped a bit with drift when late arrivals showed up, but the performance cost was real and the gains were incremental.

i once chased a thread about atomicity and ended up realizing the data quality from the source was the real limiter; insisting on a perfect all at once update was chasing a mirage. we applied stricter batch windows and upstream validation instead, and the rest felt like a sidestep.

Login
Username:
Password:	Lost Password?
	Remember me