Skip to main content

Snowflake Schema Pattern

info

This article talks about snowflake schema pattern, and not to be confused with Snowflake (the data warehouse company).

What is Snowflake Schema?

Snowflake schema is a data modeling pattern where dimensions are normalized into multiple related tables, creating a more complex structure than star schema. When visualized, the normalized dimension tables branch out from the fact table like points on a snowflake.

Unlike star schema where each dimension is a single denormalized table, snowflake schema breaks dimensions into hierarchical relationships. For example, instead of storing product categories directly in the products table, you create separate tables:

  • Products table links to Subcategories table
  • Subcategories table links to Categories table

This creates a normalized hierarchy: Fact → Product → Subcategory → Category

Conceptual Visual:

The branching structure of normalized dimensions creates the snowflake appearance - hence the name.

Snowflake Schema vs Star Schema

The key difference between these two patterns is how dimensions are structured. Here's a detailed comparison to help you choose the right approach:

AspectSnowflake SchemaStar Schema
Dimension structure✅ Normalized (multiple tables)❌ Denormalized (single table)
Data redundancy✅ Minimal redundancy❌ Category names repeated across products
Schema complexity❌ More tables and joins✅ Simpler structure
Query performance❌ More joins required✅ Fewer joins, faster queries
Hierarchy support✅ Natively supports hierarchies across multiple tables✅ Single-table hierarchies supported
Report author experience❌ More complex navigation✅ Easier to understand and use
Maintenance✅ Single source for updates❌ Updates across many rows
Logical data model✅ Clearer normalized structure⚠️ Less normalized

Recommendation: In Holistics, we generally recommend star schema (denormalized dimensions) for better usability and performance, unless you have specific reasons to normalize.

Step-by-Step Implementation

Let's build a complete snowflake schema for an e-commerce dataset with normalized product dimensions.

Step 1: Create the Fact Model

Start with your fact table - this is the center of your schema storing measurable events.

fct_sales.model.aml
Model fct_sales {
type: 'table'
table_name: 'ecommerce.sales'
data_source_name: 'your_datasource_name'

dimension id {
label: 'Sale ID'
type: 'number'
primary_key: true
hidden: true
}

dimension product_id {
label: 'Product ID'
type: 'number'
hidden: true
}

dimension customer_id {
label: 'Customer ID'
type: 'number'
hidden: true
}

dimension sale_amount {
label: 'Sale Amount'
type: 'number'
}

dimension quantity {
label: 'Quantity'
type: 'number'
}

measure total_sales {
label: 'Total Sales'
type: 'number'
definition: @aql sum(fct_sales.sale_amount);;
}

measure total_quantity {
label: 'Total Quantity Sold'
type: 'number'
definition: @aql sum(fct_sales.quantity);;
}
}

Step 2: Create Normalized Product Dimensions

Now create the product dimension hierarchy - this is where the "snowflake" structure comes in.

dim_products.model.aml
Model dim_products {
type: 'table'
table_name: 'ecommerce.products'
data_source_name: 'your_datasource_name'

dimension id {
label: 'Product ID'
type: 'number'
primary_key: true
hidden: true
}

dimension name {
label: 'Product Name'
type: 'text'
}

dimension subcategory_id {
label: 'Subcategory ID'
type: 'number'
hidden: true
} // FK to subcategories
}
dim_subcategories.model.aml
Model dim_subcategories {
type: 'table'
table_name: 'ecommerce.product_subcategories'
data_source_name: 'your_datasource_name'

dimension id {
label: 'Subcategory ID'
type: 'number'
primary_key: true
hidden: true
}

dimension name {
label: 'Subcategory'
type: 'text'
}

dimension category_id {
label: 'Category ID'
type: 'number'
hidden: true
} // FK to categories
}
dim_categories.model.aml
Model dim_categories {
type: 'table'
table_name: 'ecommerce.product_categories'
data_source_name: 'your_datasource_name'

dimension id {
label: 'Category ID'
type: 'number'
primary_key: true
hidden: true
}

dimension name {
label: 'Category'
type: 'text'
}
}

Step 3: Create Customer Location Dimensions

Let's add another normalized dimension hierarchy for customer locations to create a more complete snowflake schema.

dim_customers.model.aml
Model dim_customers {
type: 'table'
table_name: 'ecommerce.customers'
data_source_name: 'your_datasource_name'

dimension id {
label: 'Customer ID'
type: 'number'
primary_key: true
hidden: true
}

dimension name {
label: 'Customer Name'
type: 'text'
}

dimension city_id {
label: 'City ID'
type: 'number'
hidden: true
} // FK to cities
}
dim_cities.model.aml
Model dim_cities {
type: 'table'
table_name: 'ecommerce.cities'
data_source_name: 'your_datasource_name'

dimension id {
label: 'City ID'
type: 'number'
primary_key: true
hidden: true
}

dimension name {
label: 'City'
type: 'text'
}

dimension country_id {
label: 'Country ID'
type: 'number'
hidden: true
} // FK to countries
}
dim_countries.model.aml
Model dim_countries {
type: 'table'
table_name: 'ecommerce.countries'
data_source_name: 'your_datasource_name'

dimension id {
label: 'Country ID'
type: 'number'
primary_key: true
hidden: true
}

dimension name {
label: 'Country Name'
type: 'text'
}
}

Step 4: Build the Complete Snowflake Schema Dataset

Now connect everything together. Notice how we have two normalized hierarchies: products and customer locations.

snowflake_schema.dataset.aml
Dataset ecommerce_snowflake {
label: 'E-commerce (Snowflake Schema)'
data_source_name: 'your_datasource_name'

models: [
fct_sales, // Fact table
dim_products, // Normalized product dimension
dim_subcategories, // Product hierarchy
dim_categories, // Product hierarchy
dim_customers, // Normalized customer dimension
dim_cities, // Customer location hierarchy
dim_countries // Customer location hierarchy
]

relationships: [
// Fact to dimensions
relationship(fct_sales.product_id > dim_products.id, true),
relationship(fct_sales.customer_id > dim_customers.id, true),

// Product hierarchy chain (snowflake)
relationship(dim_products.subcategory_id > dim_subcategories.id, true),
relationship(dim_subcategories.category_id > dim_categories.id, true),

// Customer location hierarchy chain (snowflake)
relationship(dim_customers.city_id > dim_cities.id, true),
relationship(dim_cities.country_id > dim_countries.id, true)
]
}

Result: You now have a complete snowflake schema with multiple normalized dimension hierarchies. Users can analyze sales by product category, subcategory, and customer location (city, country), even though the fact table only directly links to products and customers.

When to Use Snowflake Schema

Choose snowflake schema when you have these specific needs:

1. Source Data is Already Normalized

If your source database uses normalized tables and you're modeling directly on top without a transformation layer (like dbt), you may need to work with the snowflake structure as-is.

2. Multiple Dimension Hierarchies Exist

When you have several dimensions with natural hierarchies (products, geography, organizational structure) and want to maintain clear relationships.

3. Frequently Changing Hierarchy Attributes

If category names or hierarchy attributes change often and you need a single source of truth, normalized tables make updates easier.

4. Shared Across Many Facts

If the dimension hierarchy is used by many different fact tables, normalization ensures consistency and easier maintenance.

Additional Resources


Let us know what you think about this document :)