Should I use star schema or snowflake for our sales data warehouse?
#1
I’m trying to decide between a star and snowflake schema for our new sales data warehouse, and I’m getting stuck on how much complexity we really need. Our fact table is pretty straightforward, but the product dimension has several slowly changing attributes that branch into multiple normalized tables. I worry the snowflake approach will make our BI team’s queries more difficult, even if it feels cleaner from a normalization standpoint.
Reply
#2
We went with the denormalized approach for dashboards because the BI team hates chasing joins. Our product dimension started as a heavily normalized setup and the dashboards slowed while the join chains loaded. We kept the core product as a wide dimension and only pulled in a couple of slowly changing attributes through separate lookups when needed. It felt faster for typical reports, though the ETL got messier.
Reply
#3
We tried the normalized path for the product dimension, and it made the ETL dance more complex. Surrogate keys, multiple batch processes, and more table hops; but we did gain data quality and easier versioning of attributes.
Reply
#4
Is the real problem the schema or is it data quality in the product attributes? I started with a pilot where I kept the wide product dimension and only normalized the parts that actually changed often, but I kept fallback to join to a single table for common queries.
Reply
#5
We did a quick pilot and measured query latency on 50 most common reports. The difference between the two approaches was within a few percent for most sales queries, not enough to justify the extra maintenance. Still, it feels risky to bake in too much denormalization if product attributes change.
Reply


[-]
Quick Reply
Message
Type your reply to this message here.

Image Verification
Please enter the text contained within the image into the text box below it. This process is used to prevent automated spam bots.
Image Verification
(case insensitive)

Forum Jump: