Snowflake Schema Pattern
This article talks about snowflake schema pattern, and not to be confused with Snowflake (the data warehouse company).
What is Snowflake Schema?
Snowflake schema is a data modeling pattern where dimensions are normalized into multiple related tables, creating a more complex structure than star schema. When visualized, the normalized dimension tables branch out from the fact table like points on a snowflake.
Unlike star schema where each dimension is a single denormalized table, snowflake schema breaks dimensions into hierarchical relationships. For example, instead of storing product categories directly in the products table, you create separate tables:
- Products table links to Subcategories table
- Subcategories table links to Categories table
This creates a normalized hierarchy: Fact → Product → Subcategory → Category
Conceptual Visual:
The branching structure of normalized dimensions creates the snowflake appearance - hence the name.
Snowflake Schema vs Star Schema
The key difference between these two patterns is how dimensions are structured. Here's a detailed comparison to help you choose the right approach:
| Aspect | Snowflake Schema | Star Schema |
|---|---|---|
| Dimension structure | ✅ Normalized (multiple tables) | ❌ Denormalized (single table) |
| Data redundancy | ✅ Minimal redundancy | ❌ Category names repeated across products |
| Schema complexity | ❌ More tables and joins | ✅ Simpler structure |
| Query performance | ❌ More joins required | ✅ Fewer joins, faster queries |
| Hierarchy support | ✅ Natively supports hierarchies across multiple tables | ✅ Single-table hierarchies supported |
| Report author experience | ❌ More complex navigation | ✅ Easier to understand and use |
| Maintenance | ✅ Single source for updates | ❌ Updates across many rows |
| Logical data model | ✅ Clearer normalized structure | ⚠️ Less normalized |
Recommendation: In Holistics, we generally recommend star schema (denormalized dimensions) for better usability and performance, unless you have specific reasons to normalize.
Step-by-Step Implementation
Let's build a complete snowflake schema for an e-commerce dataset with normalized product dimensions.
Step 1: Create the Fact Model
Start with your fact table - this is the center of your schema storing measurable events.
Model fct_sales {
type: 'table'
table_name: 'ecommerce.sales'
data_source_name: 'your_datasource_name'
dimension id {
label: 'Sale ID'
type: 'number'
primary_key: true
hidden: true
}
dimension product_id {
label: 'Product ID'
type: 'number'
hidden: true
}
dimension customer_id {
label: 'Customer ID'
type: 'number'
hidden: true
}
dimension sale_amount {
label: 'Sale Amount'
type: 'number'
}
dimension quantity {
label: 'Quantity'
type: 'number'
}
measure total_sales {
label: 'Total Sales'
type: 'number'
definition: @aql sum(fct_sales.sale_amount);;
}
measure total_quantity {
label: 'Total Quantity Sold'
type: 'number'
definition: @aql sum(fct_sales.quantity);;
}
}
Step 2: Create Normalized Product Dimensions
Now create the product dimension hierarchy - this is where the "snowflake" structure comes in.
Model dim_products {
type: 'table'
table_name: 'ecommerce.products'
data_source_name: 'your_datasource_name'
dimension id {
label: 'Product ID'
type: 'number'
primary_key: true
hidden: true
}
dimension name {
label: 'Product Name'
type: 'text'
}
dimension subcategory_id {
label: 'Subcategory ID'
type: 'number'
hidden: true
} // FK to subcategories
}
Model dim_subcategories {
type: 'table'
table_name: 'ecommerce.product_subcategories'
data_source_name: 'your_datasource_name'
dimension id {
label: 'Subcategory ID'
type: 'number'
primary_key: true
hidden: true
}
dimension name {
label: 'Subcategory'
type: 'text'
}
dimension category_id {
label: 'Category ID'
type: 'number'
hidden: true
} // FK to categories
}
Model dim_categories {
type: 'table'
table_name: 'ecommerce.product_categories'
data_source_name: 'your_datasource_name'
dimension id {
label: 'Category ID'
type: 'number'
primary_key: true
hidden: true
}
dimension name {
label: 'Category'
type: 'text'
}
}
Step 3: Create Customer Location Dimensions
Let's add another normalized dimension hierarchy for customer locations to create a more complete snowflake schema.
Model dim_customers {
type: 'table'
table_name: 'ecommerce.customers'
data_source_name: 'your_datasource_name'
dimension id {
label: 'Customer ID'
type: 'number'
primary_key: true
hidden: true
}
dimension name {
label: 'Customer Name'
type: 'text'
}
dimension city_id {
label: 'City ID'
type: 'number'
hidden: true
} // FK to cities
}
Model dim_cities {
type: 'table'
table_name: 'ecommerce.cities'
data_source_name: 'your_datasource_name'
dimension id {
label: 'City ID'
type: 'number'
primary_key: true
hidden: true
}
dimension name {
label: 'City'
type: 'text'
}
dimension country_id {
label: 'Country ID'
type: 'number'
hidden: true
} // FK to countries
}
Model dim_countries {
type: 'table'
table_name: 'ecommerce.countries'
data_source_name: 'your_datasource_name'
dimension id {
label: 'Country ID'
type: 'number'
primary_key: true
hidden: true
}
dimension name {
label: 'Country Name'
type: 'text'
}
}
Step 4: Build the Complete Snowflake Schema Dataset
Now connect everything together. Notice how we have two normalized hierarchies: products and customer locations.
Dataset ecommerce_snowflake {
label: 'E-commerce (Snowflake Schema)'
data_source_name: 'your_datasource_name'
models: [
fct_sales, // Fact table
dim_products, // Normalized product dimension
dim_subcategories, // Product hierarchy
dim_categories, // Product hierarchy
dim_customers, // Normalized customer dimension
dim_cities, // Customer location hierarchy
dim_countries // Customer location hierarchy
]
relationships: [
// Fact to dimensions
relationship(fct_sales.product_id > dim_products.id, true),
relationship(fct_sales.customer_id > dim_customers.id, true),
// Product hierarchy chain (snowflake)
relationship(dim_products.subcategory_id > dim_subcategories.id, true),
relationship(dim_subcategories.category_id > dim_categories.id, true),
// Customer location hierarchy chain (snowflake)
relationship(dim_customers.city_id > dim_cities.id, true),
relationship(dim_cities.country_id > dim_countries.id, true)
]
}
Result: You now have a complete snowflake schema with multiple normalized dimension hierarchies. Users can analyze sales by product category, subcategory, and customer location (city, country), even though the fact table only directly links to products and customers.
When to Use Snowflake Schema
Choose snowflake schema when you have these specific needs:
1. Source Data is Already Normalized
If your source database uses normalized tables and you're modeling directly on top without a transformation layer (like dbt), you may need to work with the snowflake structure as-is.
2. Multiple Dimension Hierarchies Exist
When you have several dimensions with natural hierarchies (products, geography, organizational structure) and want to maintain clear relationships.
3. Frequently Changing Hierarchy Attributes
If category names or hierarchy attributes change often and you need a single source of truth, normalized tables make updates easier.
4. Shared Across Many Facts
If the dimension hierarchy is used by many different fact tables, normalization ensures consistency and easier maintenance.