How can I track column-level changes in data lineage when scripts alter data?
#1
I’ve been trying to build a reliable data lineage map for our ETL pipelines, but I keep hitting a wall when transformations happen outside our core tools. How do you all handle tracking column-level changes when a script in a completely separate system alters the data mid-flow? It feels like the lineage breaks unless I manually document every single hop, which isn’t sustainable.
#2
I’ve bumped into this exact thing. When an external script touched a column, we tried adding a sidecar record that logs a hash of each column before and after the transform, plus the source and target table and a timestamp. It helped a little for small changes, but once the external job ran a bulk update or changed the schema without notice, the lineage would break unless we manually documented every hop. It felt like chasing ghosts after a while.
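For anyone who wants to try the sidecar idea, here is a minimal sketch of what such a record could look like. It assumes rows are available as lists of dicts before and after the external transform; `column_hashes` and `sidecar_record` are made-up names for illustration, not part of any lineage tool.

```python
import hashlib
from datetime import datetime, timezone


def column_hashes(rows, columns):
    """Return {column: sha256 hex digest} over each column's values, in row order."""
    hashes = {}
    for col in columns:
        h = hashlib.sha256()
        for row in rows:
            # repr() keeps None, "" and 0 distinct; a missing column reads as None.
            h.update(repr(row.get(col)).encode("utf-8"))
            # Unit separator so ["ab", "c"] and ["a", "bc"] hash differently.
            h.update(b"\x1f")
        hashes[col] = h.hexdigest()
    return hashes


def sidecar_record(source_table, target_table, before_rows, after_rows, columns):
    """Build one sidecar record describing which columns a transform changed."""
    before = column_hashes(before_rows, columns)
    after = column_hashes(after_rows, columns)
    return {
        "source_table": source_table,
        "target_table": target_table,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "before": before,
        "after": after,
        "changed_columns": sorted(c for c in columns if before[c] != after[c]),
    }


before = [{"id": 1, "email": "A@x.com"}, {"id": 2, "email": "b@x.com"}]
after = [{"id": 1, "email": "a@x.com"}, {"id": 2, "email": "b@x.com"}]
record = sidecar_record("staging.users", "core.users", before, after, ["id", "email"])
print(record["changed_columns"])
```

The hash depends on row order, so both snapshots need the same stable sort. It also only tells you that a column changed, not how, which is part of why it stops helping after a bulk update.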
#3
Maybe the problem isn’t the hops so much as the governance around who is trusted to mutate lineage. We started treating external transforms as untrusted by default and enforced a lightweight contract: if you touch data outside our tooling, you also push a lineage event, or your job doesn’t qualify for automatic map updates. It wasn’t easy to sell to the data team, but it cut down drift a bit.
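To make the contract concrete, here is a rough sketch of the kind of check that could gate automatic map updates. The field names (`job_name`, `column_mappings`, and so on) are assumptions for illustration, not an established schema, so adapt them to whatever your lineage store expects.

```python
# Hypothetical contract: fields every external job must send with its lineage event.
REQUIRED_FIELDS = {"job_name", "source_table", "target_table", "column_mappings", "emitted_at"}


def validate_lineage_event(event):
    """Return a list of problems; an empty list means the event meets the contract."""
    problems = []
    missing = REQUIRED_FIELDS - set(event)
    if missing:
        problems.append("missing fields: " + ", ".join(sorted(missing)))
    mappings = event.get("column_mappings", [])
    if not isinstance(mappings, list) or not mappings:
        problems.append("column_mappings must be a non-empty list")
    else:
        for i, mapping in enumerate(mappings):
            valid = (
                isinstance(mapping, dict)
                and mapping.get("target_column")
                and mapping.get("source_columns")
            )
            if not valid:
                problems.append(f"mapping {i} needs target_column and source_columns")
    return problems


def qualifies_for_auto_update(event):
    """Only events that pass validation are allowed to update the lineage map."""
    return not validate_lineage_event(event)


event = {
    "job_name": "nightly_email_cleanup",
    "source_table": "staging.users",
    "target_table": "core.users",
    "column_mappings": [{"target_column": "email", "source_columns": ["email"]}],
    "emitted_at": "2024-01-01T00:00:00+00:00",
}
print(qualifies_for_auto_update(event))
```

Events that fail the check would be logged and surfaced to the job owner rather than silently dropped, so the gap in the map stays visible.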
#4
Short version: we hacked together a log that captured the row-count delta and a small column snapshot whenever the external script ran. It bought us a sprint or two, then it started slipping as data changed in ways we couldn’t capture from outside. We dropped it because it felt brittle.
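For reference, the log entry was roughly this shape. This is a reconstruction for illustration only, assuming rows as lists of dicts; `run_log_entry` and `sample_size` are hypothetical names.

```python
def run_log_entry(rows_before, rows_after, column, sample_size=5):
    """Capture the row-count delta and a small sample of one column per run."""
    return {
        "row_count_delta": len(rows_after) - len(rows_before),
        "column": column,
        # Only the first few values are kept, so changes deeper in the table go unseen.
        "snapshot_before": [r.get(column) for r in rows_before[:sample_size]],
        "snapshot_after": [r.get(column) for r in rows_after[:sample_size]],
    }


rows_before = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
rows_after = [{"id": 1, "status": "done"}]
entry = run_log_entry(rows_before, rows_after, "status")
print(entry["row_count_delta"])
```

The brittleness shows up in the sample: anything the script changes outside the first few rows, or in a column you aren’t sampling, never reaches the log.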
#5
Another approach that sometimes helps is to stop insisting on column-level lineage for every pipeline. We map at a higher level, source to sink, with invariants, and use automated checks to flag when a column shows up with unexpected values. It isn’t complete lineage, but it keeps the map honest without drowning in external hops. Still not satisfied, though.
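A minimal sketch of the invariant checks, assuming the sink can be read as a list of dicts. The column names and rules below are made-up examples, and `check_invariants` is a hypothetical helper, not something from a specific data-quality library.

```python
def check_invariants(rows, invariants):
    """Return (row_index, column, value) for every value that breaks its invariant.

    `invariants` maps a column name to a predicate that returns True for valid values.
    """
    violations = []
    for i, row in enumerate(rows):
        for column, is_valid in invariants.items():
            value = row.get(column)  # a missing column is checked as None
            if not is_valid(value):
                violations.append((i, column, value))
    return violations


# Example invariants for a hypothetical orders sink.
invariants = {
    "status": lambda v: v in {"active", "closed"},
    "amount": lambda v: v is not None and v >= 0,
}

sink_rows = [
    {"status": "active", "amount": 10.0},
    {"status": "archived", "amount": 5.0},
    {"status": "closed", "amount": -1.0},
]
violations = check_invariants(sink_rows, invariants)
print(violations)
```

A violation doesn’t say which hop caused the change, only that something upstream did, which is the trade-off of mapping at the source-to-sink level.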