Should I benchmark star vs snowflake schema for my data warehouse?
#1
I’m trying to decide between a star and snowflake schema for our new sales data warehouse, and I’m stuck on how much to normalize the dimension tables. Our product hierarchy is pretty complex, but I worry the snowflake might make our BI tools slower with all those joins. Has anyone actually benchmarked this for a similar use case?
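To make the comparison concrete, here's roughly what I mean by the two options. This is a toy sketch in SQLite via Python; all table and column names are made up, not our real model. Star keeps one wide product dimension; snowflake splits the hierarchy into product → subcategory → category.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Star option: one wide, denormalized product dimension.
cur.execute("""CREATE TABLE dim_product_star (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    subcategory  TEXT,   -- hierarchy flattened into the row
    category     TEXT)""")

# Snowflake option: the hierarchy normalized into lookup tables.
cur.execute("""CREATE TABLE dim_category (
    category_key INTEGER PRIMARY KEY,
    category     TEXT)""")
cur.execute("""CREATE TABLE dim_subcategory (
    subcategory_key INTEGER PRIMARY KEY,
    subcategory     TEXT,
    category_key    INTEGER REFERENCES dim_category)""")
cur.execute("""CREATE TABLE dim_product_snow (
    product_key     INTEGER PRIMARY KEY,
    product_name    TEXT,
    subcategory_key INTEGER REFERENCES dim_subcategory)""")

# The fact table looks the same either way.
cur.execute("""CREATE TABLE fact_sales (
    product_key INTEGER,
    amount      REAL)""")
```

With the star shape, a category rollup is one join from fact to dimension; with the snowflake it's three (fact → product → subcategory → category), which is exactly the join depth I'm worried about for BI queries.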
#2
I’ve lived this one. We started with a wide, denormalized product dimension and kept the other dimensions fairly flat. Dashboards and standard KPI paths loaded noticeably faster than when we added more normalization. The upside was simpler queries and shorter ETL cycles; the downside was more duplicated attributes and a headache keeping product data in sync during catalog changes.
#3
We did a quick bench once with a couple hundred thousand fact rows and a handful of dims. Read latency for the main KPIs tended to be better with the wide dimension, while ad hoc filters that touched the normalized joins slowed a bit. The normalized version reduced update risk but made common BI paths a few milliseconds slower due to extra lookups.
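If you want to repeat that kind of bench yourself, here's the shape of what we ran, shrunk down to a self-contained SQLite-in-Python sketch (made-up names, uniform fake data, and an in-memory database, so absolute numbers mean nothing — only the relative comparison on your own engine and data does):

```python
import random
import sqlite3
import time

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Star: category flattened into the product dimension.
cur.execute("CREATE TABLE dim_star (pk INTEGER PRIMARY KEY, category TEXT)")
# Snowflake: category factored out into an extra lookup table.
cur.execute("CREATE TABLE dim_cat (ck INTEGER PRIMARY KEY, category TEXT)")
cur.execute("CREATE TABLE dim_snow (pk INTEGER PRIMARY KEY, ck INTEGER)")
cur.execute("CREATE TABLE fact (pk INTEGER, amount REAL)")

cats = [f"cat{i}" for i in range(20)]
for ck, name in enumerate(cats):
    cur.execute("INSERT INTO dim_cat VALUES (?, ?)", (ck, name))
for pk in range(1000):
    cur.execute("INSERT INTO dim_star VALUES (?, ?)", (pk, cats[pk % 20]))
    cur.execute("INSERT INTO dim_snow VALUES (?, ?)", (pk, pk % 20))
rows = [(random.randrange(1000), random.random()) for _ in range(200_000)]
cur.executemany("INSERT INTO fact VALUES (?, ?)", rows)
con.commit()

def bench(sql, runs=5):
    """Best-of-N wall-clock time for one rollup query."""
    best = float("inf")
    for _ in range(runs):
        t0 = time.perf_counter()
        cur.execute(sql).fetchall()
        best = min(best, time.perf_counter() - t0)
    return best

star = bench("""SELECT d.category, SUM(f.amount)
                FROM fact f JOIN dim_star d USING (pk)
                GROUP BY d.category""")
snow = bench("""SELECT c.category, SUM(f.amount)
                FROM fact f JOIN dim_snow d USING (pk)
                JOIN dim_cat c USING (ck)
                GROUP BY c.category""")
print(f"star: {star:.4f}s  snowflake: {snow:.4f}s")
```

Caveats: a columnar warehouse engine will behave very differently from SQLite on this, and real dimension tables are much wider, which is part of why the wide star row costs something too. Treat it as a template, not a verdict.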
#4
I’ve seen teams rely on caching and materialized views to cover the same ground, sometimes more effectively than fighting with the schema. Our BI tool caches hot queries, and a nightly materialized summary helped a lot more than deeper normalization.
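In case it helps, the nightly-summary idea in miniature (SQLite via Python; SQLite has no materialized views, so the "refresh" is just rebuilding a summary table, which is morally what a nightly materialization job does — names here are invented for the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("""CREATE TABLE fact_sales (
    product_key INTEGER, sale_date TEXT, amount REAL)""")
cur.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(1, "2024-01-01", 10.0), (1, "2024-01-01", 5.0), (2, "2024-01-02", 7.5)],
)

def refresh_daily_summary():
    # Rebuild the precomputed rollup; run this on your nightly schedule.
    cur.execute("DROP TABLE IF EXISTS daily_sales_summary")
    cur.execute("""CREATE TABLE daily_sales_summary AS
                   SELECT sale_date, product_key, SUM(amount) AS total
                   FROM fact_sales
                   GROUP BY sale_date, product_key""")
    con.commit()

refresh_daily_summary()
# Dashboards read the small summary instead of scanning the fact table.
print(cur.execute(
    "SELECT * FROM daily_sales_summary ORDER BY sale_date").fetchall())
```

The point is that the dashboard query never touches the fact table or the dimension joins at all, so the star-vs-snowflake join cost stops mattering for the hot paths.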
#5
Are you sure the deep hierarchy is the real problem, or are data latency and stale dims the actual bottleneck?