Should I benchmark star vs snowflake schema for my data warehouse?
#1
I’m trying to decide between a star and snowflake schema for our new sales data warehouse, and I’m stuck on how much to normalize the dimension tables. Our product hierarchy is pretty complex, but I worry the snowflake might make our BI tools slower with all those joins. Has anyone actually benchmarked this for a similar use case?
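To make the comparison concrete, here's roughly what I mean by the two options. This is a toy sketch in SQLite via Python; all table and column names are made up, not our real model. Star keeps one wide product dimension; snowflake splits the hierarchy into product → subcategory → category.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Star option: one wide, denormalized product dimension.
cur.execute("""CREATE TABLE dim_product_star (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    subcategory  TEXT,   -- hierarchy flattened into the row
    category     TEXT)""")

# Snowflake option: the hierarchy normalized into lookup tables.
cur.execute("""CREATE TABLE dim_category (
    category_key INTEGER PRIMARY KEY,
    category     TEXT)""")
cur.execute("""CREATE TABLE dim_subcategory (
    subcategory_key INTEGER PRIMARY KEY,
    subcategory     TEXT,
    category_key    INTEGER REFERENCES dim_category)""")
cur.execute("""CREATE TABLE dim_product_snow (
    product_key     INTEGER PRIMARY KEY,
    product_name    TEXT,
    subcategory_key INTEGER REFERENCES dim_subcategory)""")

# The fact table looks the same either way.
cur.execute("""CREATE TABLE fact_sales (
    product_key INTEGER,
    amount      REAL)""")
```

With the star shape, a category rollup is one join from fact to dimension; with the snowflake it's three (fact → product → subcategory → category), which is exactly the join depth I'm worried about for BI queries.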
#2
I’ve lived this one. We started with a wide, denormalized product dimension and kept the other dimensions fairly flat. Dashboards and standard KPI paths loaded noticeably faster than when we added more normalization. The upside was simpler queries and shorter ETL cycles; the downside was more duplicated attributes and a headache keeping product data in sync during catalog changes.
#3
We did a quick bench once with a couple hundred thousand fact rows and a handful of dims. Read latency for the main KPIs tended to be better with the wide dimension, while ad hoc filters that touched the normalized joins slowed a bit. The normalized version reduced update risk but made common BI paths a few milliseconds slower due to extra lookups.
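If you want to repeat that kind of bench yourself, here's the shape of what we ran, shrunk down to a self-contained SQLite-in-Python sketch (made-up names, uniform fake data, and an in-memory database, so absolute numbers mean nothing — only the relative comparison on your own engine and data does):

```python
import random
import sqlite3
import time

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Star: category flattened into the product dimension.
cur.execute("CREATE TABLE dim_star (pk INTEGER PRIMARY KEY, category TEXT)")
# Snowflake: category factored out into an extra lookup table.
cur.execute("CREATE TABLE dim_cat (ck INTEGER PRIMARY KEY, category TEXT)")
cur.execute("CREATE TABLE dim_snow (pk INTEGER PRIMARY KEY, ck INTEGER)")
cur.execute("CREATE TABLE fact (pk INTEGER, amount REAL)")

cats = [f"cat{i}" for i in range(20)]
for ck, name in enumerate(cats):
    cur.execute("INSERT INTO dim_cat VALUES (?, ?)", (ck, name))
for pk in range(1000):
    cur.execute("INSERT INTO dim_star VALUES (?, ?)", (pk, cats[pk % 20]))
    cur.execute("INSERT INTO dim_snow VALUES (?, ?)", (pk, pk % 20))
rows = [(random.randrange(1000), random.random()) for _ in range(200_000)]
cur.executemany("INSERT INTO fact VALUES (?, ?)", rows)
con.commit()

def bench(sql, runs=5):
    """Best-of-N wall-clock time for one rollup query."""
    best = float("inf")
    for _ in range(runs):
        t0 = time.perf_counter()
        cur.execute(sql).fetchall()
        best = min(best, time.perf_counter() - t0)
    return best

star = bench("""SELECT d.category, SUM(f.amount)
                FROM fact f JOIN dim_star d USING (pk)
                GROUP BY d.category""")
snow = bench("""SELECT c.category, SUM(f.amount)
                FROM fact f JOIN dim_snow d USING (pk)
                JOIN dim_cat c USING (ck)
                GROUP BY c.category""")
print(f"star: {star:.4f}s  snowflake: {snow:.4f}s")
```

Caveats: a columnar warehouse engine will behave very differently from SQLite on this, and real dimension tables are much wider, which is part of why the wide star row costs something too. Treat it as a template, not a verdict.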
#4
I’ve seen teams rely on caching and materialized views to cover the same ground, sometimes more effectively than fighting with the schema. Our BI tool caches hot queries, and a nightly materialized summary helped a lot more than deeper normalization.
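In case it helps, the nightly-summary idea in miniature (SQLite via Python; SQLite has no materialized views, so the "refresh" is just rebuilding a summary table, which is morally what a nightly materialization job does — names here are invented for the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("""CREATE TABLE fact_sales (
    product_key INTEGER, sale_date TEXT, amount REAL)""")
cur.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(1, "2024-01-01", 10.0), (1, "2024-01-01", 5.0), (2, "2024-01-02", 7.5)],
)

def refresh_daily_summary():
    # Rebuild the precomputed rollup; run this on your nightly schedule.
    cur.execute("DROP TABLE IF EXISTS daily_sales_summary")
    cur.execute("""CREATE TABLE daily_sales_summary AS
                   SELECT sale_date, product_key, SUM(amount) AS total
                   FROM fact_sales
                   GROUP BY sale_date, product_key""")
    con.commit()

refresh_daily_summary()
# Dashboards read the small summary instead of scanning the fact table.
print(cur.execute(
    "SELECT * FROM daily_sales_summary ORDER BY sale_date").fetchall())
```

The point is that the dashboard query never touches the fact table or the dimension joins at all, so the star-vs-snowflake join cost stops mattering for the hot paths.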
#5
Are you sure the deep hierarchy is the real problem, or are data latency and stale dims the actual bottleneck?