Star and snowflake are the same idea, a central fact table surrounded by dimensions, drawn at two different levels of normalization. Knowing which to pick is mostly knowing what you’re optimizing for.
Star schema
Dimensions are denormalized: one flat table per dimension, even if that means repeating values. dim_geography holds country, region, city together, with country = 'US' repeated across thousands of rows.
- Pro: queries join the fact to one table per dimension. Fewer joins, simpler SQL, faster reads.
- Con: redundancy. Updating a country name touches many rows; storage is larger.
Snowflake schema
Dimensions are normalized into sub-dimensions. dim_city → dim_region → dim_country, each referencing the next by key.
- Pro: no redundancy. A country name lives in exactly one row. Cleaner writes, smaller storage.
- Con: a query grouping by country now joins city→region→country, which means more joins, more complex SQL, and slower reads.
When to use each
Use a star when the workload is read-heavy analytics (it almost always is), dimensions are modest in size, and you want simple, fast queries. This is the default for product analytics, BI, and dashboards. Spectrum is a star for exactly this reason.
Use a snowflake when a dimension is genuinely large and hierarchical, redundancy is costly or update-prone, or you’re constrained on storage. In practice this is the exception.
Takeaway
Default to a star: read performance and query simplicity win for the analytics workloads most teams have. Normalize a dimension into a snowflake only when that specific dimension’s size or update pattern makes the redundancy a real problem. Don’t snowflake on instinct; denormalization is a feature here, not a smell.
