Login

I’m trying to decide between using a star schema or a single wide table for our new sales data mart, and I’m stuck on the performance implications for our specific queries. The joins in the star schema feel clean for analytics, but I worry about the query complexity slowing down our regular reporting.

I tried a star schema once for a sales data mart and the BI tools responded much faster when slicing by region and product. The clean joins are real, but the SQL they generate grows long once you filter on multiple dimensions. We solved it by adding a couple of aggregated fact tables and monthly summaries refreshed nightly. Without that, some dashboards crawled on big days.

It depends on what you actually run day to day. In our team the wide table version felt snappy for simple queries, but when users started adding more dimensions the queries got heavy and we started seeing NULLs and wide scans. The bottleneck turned out to be ETL and the load window more than the schema.

Action taken we compared two setups. With a star schema we added two aggregates such as month region product and kept the rest in the fact. The main sales dashboards dropped from about 12 seconds to 8 on typical queries, but ad hoc drill downs were slower unless you push predicates to the aggregates.

Do you actually need the multi dimensional filtering, or is the real bottleneck the ETL and data freshness?

Login
Username:
Password:	Lost Password?
	Remember me