Snowflake Dimensions Pattern
What are Snowflake Dimensions?
A snowflake dimension is a set of normalized tables for a single business entity. Instead of storing all attributes in one denormalized table, you create a chain of related dimension tables, forming a snowflake-like structure when visualized.
For example, products can be classified by category and subcategory. In a snowflake design:
- Products are assigned to subcategories
- Subcategories are assigned to categories
This creates a hierarchy: Product → Subcategory → Category
Conceptual Visual:
If you imagine the normalized dimension tables positioned outwards from the fact table, they form a snowflake design - hence the name.
Snowflake vs Star: Key Trade-offs
| Aspect | Snowflake (Normalized) | Star (Denormalized) |
|---|---|---|
| Data redundancy | ✅ No redundancy | ❌ Category names repeated across products |
| Model complexity | ❌ More tables to manage | ✅ Simpler with fewer tables |
| Query performance | ❌ Longer relationship chains | ✅ Faster queries (fewer joins) |
| Hierarchy support | ❌ Cannot span multiple tables | ✅ Single-table hierarchies supported |
| Report author experience | ❌ More complex navigation | ✅ Easier to understand and use |
| Maintenance | ✅ Single source for hierarchy changes | ❌ Multiple records to update |
| Model size | ✅ Smaller storage | ❌ Potentially larger |
| Logical data model | ✅ Clearer structure | ⚠️ Less normalized |
Recommendation: In Holistics, we generally recommend denormalized (star) dimensions for better usability and performance, unless you have specific reasons to normalize.
Implementation Example: Product Hierarchy
Let's implement a snowflake dimension for products with category and subcategory classifications.
Normalized (Snowflake) Approach
Model dim_products {
type: 'table'
table_name: 'ecommerce.products'
data_source_name: 'your_datasource_name'
dimension id {
label: 'Product ID'
type: 'number'
primary_key: true
hidden: true
}
dimension name {
label: 'Product Name'
type: 'text'
}
dimension subcategory_id {
label: 'Subcategory ID'
type: 'number'
hidden: true
} // FK to subcategories
}
Model dim_subcategories {
type: 'table'
table_name: 'ecommerce.product_subcategories'
data_source_name: 'your_datasource_name'
dimension id {
label: 'Subcategory ID'
type: 'number'
primary_key: true
hidden: true
}
dimension name {
label: 'Subcategory'
type: 'text'
}
dimension category_id {
label: 'Category ID'
type: 'number'
hidden: true
} // FK to categories
}
Model dim_categories {
type: 'table'
table_name: 'ecommerce.product_categories'
data_source_name: 'your_datasource_name'
dimension id {
label: 'Category ID'
type: 'number'
primary_key: true
hidden: true
}
dimension name {
label: 'Category'
type: 'text'
}
}
// Dataset with snowflake product dimensions
Dataset product_sales {
data_source_name: 'your_datasource_name'
models: [fct_sales, dim_products, dim_subcategories, dim_categories]
relationships: [
// Fact to product
relationship(fct_sales.product_id > dim_products.id, true),
// Product hierarchy chain (snowflake)
relationship(dim_products.subcategory_id > dim_subcategories.id, true),
relationship(dim_subcategories.category_id > dim_categories.id, true)
]
}
Result: Users can analyze sales by category and subcategory, even though sales records only directly link to products.
When Should You Use Snowflake Dimensions?
There are valid scenarios for keeping snowflake dimensions:
1. Source Data is Already Normalized
If your source database uses normalized tables and you're using Holistics models directly on top (without a transformation layer), you may need to work with the snowflake structure as-is.
2. Frequently Changing Hierarchy Attributes
If category names or hierarchy attributes change often, and you need a single source of truth, normalized tables make updates easier.
3. Shared Across Many Facts
If the dimension hierarchy is used by many different fact tables, normalization ensures consistency and easier maintenance.