Star Schema Pattern
What is Star Schema?
Star schema is the most common and straightforward data modeling pattern where one fact table sits at the center, connected to multiple dimension tables radiating outward like points on a star.
Visual Structure:
When you run a query like "Total revenue by product category last month":
- Dimensions filter: Products (category), Dates (last month)
- Fact aggregates: Orders (sum revenue)
When to Use Star Schema
Star schema works best when you have:
- A single primary business process to analyze (like orders, page views, or transactions)
- Need for simple, fast queries with minimal joins
- Performance as a key priority
You might want a different pattern if you're dealing with multiple related business processes (Galaxy Schema is better for this) or have complex dimension hierarchies that benefit from normalization (Snowflake Dimensions).
Step-by-Step Implementation
Step 1: Identify Facts and Dimensions
Start by classifying your models:
Ask yourself:
- "What am I measuring?" → Fact models
- "What am I grouping by?" → Dimension models
Example Classification:
| Model | Type | Why? |
|---|---|---|
fct_orders | Fact | Measures order count, revenue, average order value |
dim_users | Dimension | Describes customers (name, email, signup date) |
dim_products | Dimension | Describes products (name, category, price) |
dim_dates | Dimension | Describes time periods (date, month, quarter, year) |
Step 2: Create the Fact Model
Your fact model is where the action happens - it stores the measurable events and metrics. Focus on defining clear measures and hiding technical fields like foreign keys that users don't need to see.
Model fct_orders {
type: 'table'
table_name: 'ecommerce.orders'
data_source_name: 'your_datasource_name'
dimension id {
label: 'Order ID'
type: 'number'
primary_key: true
hidden: true
}
// Foreign keys - link to dim_users model & hide from end users
dimension user_id {
label: 'User ID'
type: 'number'
hidden: true
}
dimension product_id {
label: 'Product ID'
type: 'number'
hidden: true
}
// Date field for time filtering
dimension created_at {
label: 'Order Date'
type: 'datetime'
}
// Status attribute for filtering
dimension status {
label: 'Order Status'
type: 'text'
}
// "Amount" dimension for aggregation
dimension amount {
label: 'Amount'
type: 'number'
}
// Measures belong in fact models
measure total_orders {
label: 'Total Orders'
type: 'number'
definition: @aql count(fct_orders.id) ;;
}
measure total_revenue {
label: 'Total Revenue'
type: 'number'
definition: @aql sum(fct_orders.amount);;
}
measure avg_order_value {
label: 'Average Order Value'
type: 'number'
definition: @aql avg(fct_orders.amount);;
}
}
Step 3: Create Dimension Models
Dimension models provide the context for your analysis - the attributes you'll filter and group by. Each dimension needs a unique primary key, and should contain descriptive attributes that make sense to your users.
Model dim_products {
type: 'table'
table_name: 'ecommerce.products'
data_source_name: 'your_datasource_name'
dimension id {
label: 'Product ID'
type: 'number'
primary_key: true // ✅ Must be unique!
hidden: true
}
dimension name {
label: 'Product Name'
type: 'text'
}
dimension category {
label: 'Category'
type: 'text'
}
dimension price {
label: 'List Price'
type: 'number'
}
}
Model dim_users {
type: 'table'
table_name: 'ecommerce.users'
data_source_name: 'your_datasource_name'
dimension id {
label: 'User ID'
type: 'number'
primary_key: true
hidden: true
}
dimension name {
label: 'Customer Name'
type: 'text'
}
dimension email {
label: 'Email'
type: 'text'
}
dimension created_at {
label: 'Signup Date'
type: 'datetime'
}
}
Step 4: Connect Models in a Dataset
Now bring everything together in a dataset by defining the relationships.
Dataset ecommerce {
label: 'E-commerce Analytics'
description: 'Core dataset for analyzing orders, users, and products using star schema design.'
data_source_name: 'your_datasource_name'
models: [
fct_orders, // Fact (center)
dim_users, // Dimension (point)
dim_products, // Dimension (point)
dim_dates // Dimension (point)
]
relationships: [
// Fact → Dimension (many-to-one)
relationship(fct_orders.user_id > dim_users.id, true),
relationship(fct_orders.product_id > dim_products.id, true),
relationship(fct_orders.created_at > dim_dates.date, true)
]
}
Key Rules for Star Schema
Relationship Rules
1. Always many-to-one (fact → dimension)
Relationships should always point from the fact table to the dimension table:
// Correct
relationship(fct_orders.user_id > dim_users.id, true)
// Wrong direction
relationship(dim_users.id > fct_orders.user_id, true)
2. The "one" side must have unique primary key
Make sure your dimension's primary key is actually unique:
Model dim_users {
...
dimension id {
label: 'User ID'
type: 'number'
primary_key: true // Must be unique!
}
}
3. Hide foreign keys
Technical fields like foreign keys should be hidden from end users:
Model fct_orders {
...
dimension user_id {
label: 'User ID'
type: 'number'
hidden: true // Hide technical fields
}
}
Example Queries
Once your star schema is set up, you can easily build queries:
Total revenue by product category:
explore {
dimensions {
dim_products.category
}
measures {
fct_orders.total_revenue
}
}
Monthly orders by customer:
explore {
dimensions {
dim_users.name,
fct_orders.created_at // Auto-groups by month
}
measures {
fct_orders.total_orders
}
}
Next Steps
Once you've mastered star schema, explore more advanced patterns:
- Galaxy Schema - Multiple facts sharing dimensions
- Snowflake Dimensions - Normalized dimension hierarchies
- Role-Playing Dimensions - Same dimension, multiple meanings