What’s the best balance between scd type 2 history and query performance?
#1
I'm trying to decide if we should implement a slowly changing dimension Type 2 for our customer region data, but I'm worried it will make our core fact table joins much more complex and slow down reporting. Has anyone found a practical balance between maintaining that historical accuracy and keeping query performance manageable?
Reply
#2
We tried slowly changing dimension Type 2 for the region and it did change the joins on the fact table. Our month end reports slowed a bit until we moved the history into a separate region history table and used a surrogate key in the fact table. It helped a few dashboards but there were still drift stories when the region moved and the history did not load everywhere.
Reply
#3
Another path we tested was keeping current region data in a lightweight dimension and putting most history in a dedicated history store that we joined only for the rare requests. Reports that need it got it on demand which saved most queries from dragging. The ETL logic got more involved though and we kept trimming what we load.
Reply
#4
I worry the problem might not be the history layer at all but the way we model the grain of the fact and the number of joins. We tried reducing joins in the BI layer and the numbers looked stable but audits felt less precise. Hard to tell if we are fooling ourselves.
Reply
#5
Have you measured how much time the joins add versus the ETL cost and whether the accuracy you gain justifies the extra complexity?
Reply


[-]
Quick Reply
Message
Type your reply to this message here.

Image Verification
Please enter the text contained within the image into the text box below it. This process is used to prevent automated spam bots.
Image Verification
(case insensitive)

Forum Jump: