What’s the best approach for a reporting layer: star schema or denormalized table?
#1
I’m trying to decide between using a star schema or a denormalized flat table for our new reporting layer, and I’m stuck. The flat table is simpler for our analysts to query directly, but I’m worried about data integrity and bloated storage as dimensions grow.
#2
Years ago we went with a star schema. It kept the data consistent as dimensions changed and made it easier for folks to understand where the numbers came from. The ETL was a bit more work, and we added slowly changing dimensions to handle updates. Storage wasn’t a disaster, and we relied on prebuilt aggregates to keep dashboards responsive.
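The slowly-changing-dimension handling mentioned above is usually type 2: instead of overwriting a dimension row, close it and append a new current version. A minimal sketch in plain Python (the column layout and names are illustrative, not from any real system):

```python
from datetime import date

# Type-2 SCD: (surrogate_key, product_id, category, valid_from, valid_to, is_current)
dim_product = [
    (1, 100, "Hardware", date(2020, 1, 1), None, True),
]

def apply_scd2_update(dim, product_id, new_category, as_of):
    """Close the current row for product_id and append a new current row."""
    next_key = max(row[0] for row in dim) + 1
    updated = []
    for key, pid, cat, frm, to, cur in dim:
        if pid == product_id and cur:
            # Close the old version instead of overwriting it.
            updated.append((key, pid, cat, frm, as_of, False))
        else:
            updated.append((key, pid, cat, frm, to, cur))
    updated.append((next_key, product_id, new_category, as_of, None, True))
    return updated

dim_product = apply_scd2_update(dim_product, 100, "Tools", date(2021, 6, 1))
# history is preserved: one closed row, one current row
```

Facts then join on the surrogate key, so old facts keep the category that was true when they happened.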
#3
Analysts here love a flat table for slicing and dicing, but we felt the pain as the dimensions grew. One product update would cascade into thousands of rows that needed refreshing. We rebuilt the flat table nightly, and it felt fast until someone asked for a new metric; then the table ballooned and storage and refresh times crept up.
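The cascade described above is easy to illustrate: one change to a dimension attribute touches every flat row that copied it, but only one row in a star-schema dimension. A toy sketch with made-up names and sizes:

```python
# Toy illustration of the update cascade. All names and sizes are invented.
fact_rows = [{"product_id": 1, "qty": q} for q in range(5000)]

# Flat layout: the category attribute is copied onto every fact row.
flat_rows = [{**r, "category": "Hardware"} for r in fact_rows]

# A single category rename then touches every copy...
touched_flat = sum(1 for r in flat_rows if r["category"] == "Hardware")
# ...but only one dim_product row in the star schema.
touched_star = 1

print(touched_flat, touched_star)  # 5000 vs 1
```

That 5000-to-1 ratio is exactly why the nightly rebuild kept growing with the fact table rather than with the dimension.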
#4
Maybe the problem isn’t the schema at all but what you’re actually trying to answer. I tried the flat approach for speed and ended up chasing data quality issues in the pipeline. I’d run a tiny pilot comparing latency and accuracy and ask whether the bottleneck is the data or the model.
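A tiny pilot along those lines can be as simple as timing the same aggregate on both shapes and checking the results agree. A sketch with synthetic data (not a real benchmark; schema and sizes are invented):

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE dim_p (pid INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_s (pid INTEGER, amount REAL);
CREATE TABLE flat_s (category TEXT, amount REAL);
""")
# Synthetic data: 10 categories, 100 products, 10k fact rows.
cur.executemany("INSERT INTO dim_p VALUES (?, ?)", [(i, f"c{i % 10}") for i in range(100)])
cur.executemany("INSERT INTO fact_s VALUES (?, ?)", [(i % 100, 1.0) for i in range(10000)])
cur.executemany("INSERT INTO flat_s VALUES (?, ?)", [(f"c{i % 10}", 1.0) for i in range(10000)])

def timed(sql):
    """Run a query and return (elapsed_seconds, rows)."""
    t0 = time.perf_counter()
    rows = cur.execute(sql).fetchall()
    return time.perf_counter() - t0, rows

star_t, star_rows = timed(
    "SELECT d.category, SUM(f.amount) FROM fact_s f "
    "JOIN dim_p d USING (pid) GROUP BY d.category")
flat_t, flat_rows = timed(
    "SELECT category, SUM(amount) FROM flat_s GROUP BY category")

print(f"star: {star_t:.4f}s  flat: {flat_t:.4f}s")
```

The accuracy check is that both result sets match; if they don't, the bottleneck really is the data, not the model.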
#5
When we split the load, we kept a lean normalized core and offered a denormalized surface for the common reports. It wasn’t perfect, but analysts could get their dashboards quickly while we kept the integrity in the back end. We refreshed both the flat layer and the core nightly and stitched them together in the BI tool.
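The refresh job in that hybrid setup is essentially a rebuild of the reporting surface from the core. A minimal sqlite3 sketch of the idea (table and column names are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
-- Lean normalized core, kept as the source of truth.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales (product_id INTEGER, qty INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Electronics');
INSERT INTO fact_sales VALUES (1, 2, 19.98), (2, 1, 49.99);
""")

def refresh_reporting_surface(cur):
    """Nightly job: rebuild the denormalized surface from the core."""
    cur.executescript("""
    DROP TABLE IF EXISTS rpt_sales;
    CREATE TABLE rpt_sales AS
    SELECT d.name AS product_name, d.category, f.qty, f.amount
    FROM fact_sales f JOIN dim_product d USING (product_id);
    """)

refresh_reporting_surface(cur)
rows = cur.execute(
    "SELECT product_name, category, amount FROM rpt_sales ORDER BY product_name"
).fetchall()
print(rows)
```

Analysts query only `rpt_sales`; integrity lives in the core, and a dimension change is picked up automatically on the next rebuild.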

