Skip to main content

Star Schema Pattern

What is Star Schema?

Star schema is the most common and straightforward data modeling pattern where one fact table sits at the center, connected to multiple dimension tables radiating outward like points on a star.

Visual Structure:

When you run a query like "Total revenue by product category last month":

  • Dimensions filter: Products (category), Dates (last month)
  • Fact aggregates: Orders (sum revenue)

When to Use Star Schema

Star schema works best when you have:

  • A single primary business process to analyze (like orders, page views, or transactions)
  • Need for simple, fast queries with minimal joins
  • Performance as a key priority

You might want a different pattern if you're dealing with multiple related business processes (Galaxy Schema is better for this) or have complex dimension hierarchies that benefit from normalization (Snowflake Dimensions).

Step-by-Step Implementation

Step 1: Identify Facts and Dimensions

Start by classifying your models:

Ask yourself:

  • "What am I measuring?" → Fact models
  • "What am I grouping by?" → Dimension models

Example Classification:

ModelTypeWhy?
fct_ordersFactMeasures order count, revenue, average order value
dim_usersDimensionDescribes customers (name, email, signup date)
dim_productsDimensionDescribes products (name, category, price)
dim_datesDimensionDescribes time periods (date, month, quarter, year)

Step 2: Create the Fact Model

Your fact model is where the action happens - it stores the measurable events and metrics. Focus on defining clear measures and hiding technical fields like foreign keys that users don't need to see.

fct_orders.model.aml
Model fct_orders {
type: 'table'
table_name: 'ecommerce.orders'
data_source_name: 'your_datasource_name'

dimension id {
label: 'Order ID'
type: 'number'
primary_key: true
hidden: true
}

// Foreign keys - link to dim_users model & hide from end users
dimension user_id {
label: 'User ID'
type: 'number'
hidden: true
}

dimension product_id {
label: 'Product ID'
type: 'number'
hidden: true
}

// Date field for time filtering
dimension created_at {
label: 'Order Date'
type: 'datetime'
}

// Status attribute for filtering
dimension status {
label: 'Order Status'
type: 'text'
}

// "Amount" dimension for aggregation
dimension amount {
label: 'Amount'
type: 'number'
}

// Measures belong in fact models
measure total_orders {
label: 'Total Orders'
type: 'number'
definition: @aql count(fct_orders.id) ;;
}

measure total_revenue {
label: 'Total Revenue'
type: 'number'
definition: @aql sum(fct_orders.amount);;
}

measure avg_order_value {
label: 'Average Order Value'
type: 'number'
definition: @aql avg(fct_orders.amount);;
}
}

Step 3: Create Dimension Models

Dimension models provide the context for your analysis - the attributes you'll filter and group by. Each dimension needs a unique primary key, and should contain descriptive attributes that make sense to your users.

dim_products.model.aml
Model dim_products {
type: 'table'
table_name: 'ecommerce.products'
data_source_name: 'your_datasource_name'

dimension id {
label: 'Product ID'
type: 'number'
primary_key: true // ✅ Must be unique!
hidden: true
}

dimension name {
label: 'Product Name'
type: 'text'
}

dimension category {
label: 'Category'
type: 'text'
}

dimension price {
label: 'List Price'
type: 'number'
}
}
dim_users.model.aml
Model dim_users {
type: 'table'
table_name: 'ecommerce.users'
data_source_name: 'your_datasource_name'

dimension id {
label: 'User ID'
type: 'number'
primary_key: true
hidden: true
}

dimension name {
label: 'Customer Name'
type: 'text'
}

dimension email {
label: 'Email'
type: 'text'
}

dimension created_at {
label: 'Signup Date'
type: 'datetime'
}
}

Step 4: Connect Models in a Dataset

Now bring everything together in a dataset by defining the relationships.

ecommerce.dataset.aml
Dataset ecommerce {
label: 'E-commerce Analytics'
description: 'Core dataset for analyzing orders, users, and products using star schema design.'

data_source_name: 'your_datasource_name'

models: [
fct_orders, // Fact (center)
dim_users, // Dimension (point)
dim_products, // Dimension (point)
dim_dates // Dimension (point)
]

relationships: [
// Fact → Dimension (many-to-one)
relationship(fct_orders.user_id > dim_users.id, true),
relationship(fct_orders.product_id > dim_products.id, true),
relationship(fct_orders.created_at > dim_dates.date, true)
]
}

Key Rules for Star Schema

Relationship Rules

1. Always many-to-one (fact → dimension)

Relationships should always point from the fact table to the dimension table:

// Correct
relationship(fct_orders.user_id > dim_users.id, true)

// Wrong direction
relationship(dim_users.id > fct_orders.user_id, true)

2. The "one" side must have unique primary key

Make sure your dimension's primary key is actually unique:

Model dim_users {
...
dimension id {
label: 'User ID'
type: 'number'
primary_key: true // Must be unique!
}
}

3. Hide foreign keys

Technical fields like foreign keys should be hidden from end users:

Model fct_orders {
...
dimension user_id {
label: 'User ID'
type: 'number'
hidden: true // Hide technical fields
}
}

Example Queries

Once your star schema is set up, you can easily build queries:

Total revenue by product category:

explore {
dimensions {
dim_products.category
}
measures {
fct_orders.total_revenue
}
}

Monthly orders by customer:

explore {
dimensions {
dim_users.name,
fct_orders.created_at // Auto-groups by month
}
measures {
fct_orders.total_orders
}
}

Next Steps

Once you've mastered star schema, explore more advanced patterns:

Additional Resources


Let us know what you think about this document :)