Skip to main content

Galaxy Schema (Fact Constellation)

What is Galaxy Schema?

Galaxy schema, also called fact constellation, is when you have multiple fact tables that share the same dimensions. Think of it as multiple star schemas connected through common dimensions.

Visual Structure:

When to Use Galaxy Schema

Galaxy schema is the right choice when you need to:

  • Analyze multiple business processes together (like orders and inventory)
  • Work with different facts that share common dimensions (products, dates, locations)
  • Build metrics that span multiple fact tables
  • Get a comprehensive view of your business

If you only have a single business process or your facts don't share dimensions, Star Schema is simpler and more appropriate.

Understanding Path Selection

When multiple facts connect to the same dimensions, there are multiple possible paths between models. Holistics automatically chooses the most analytically correct path using a ranking algorithm based on path tier, weight, and length.

Example: "Total quantity available by product"

When you query inventory metrics by product, Holistics intelligently selects the direct path:

  • Selected path: dim_productsfct_inventory

    • Pattern: Dimension to fact (one-to-many)
    • Tier: Tier 1 (Best - pure one-to-many relationships)
    • Length: 2 hops
  • Alternative path: dim_productsfct_ordersdim_datesfct_inventory

    • Pattern: Mixed - goes through multiple facts
    • Tier: Tier 4 (Mixed pattern not following standard analytics patterns)
    • Length: 4 hops

Holistics automatically selects the first path because Tier 1 ranks higher than Tier 4. The direct dimension-to-fact relationship follows the most common and reliable analytics pattern.

For more details on how path selection works, see Ambiguity in Join Paths.

Step-by-Step Implementation

Step 1: Create Your Fact Models

Start by defining each fact model for your different business processes. Each fact represents a distinct measurable event - in this example, we have orders and inventory.

fct_orders.model.aml
Model fct_orders {
type: 'table'
table_name: 'ecommerce.orders'
data_source_name: 'your_data_source_name'

dimension id {
label: 'Order ID'
type: 'number'
primary_key: true
hidden: true
}

dimension product_id {
label: 'Product ID'
type: 'number'
hidden: true
}

dimension created_at {
label: 'Created At'
type: 'datetime'
}

dimension item_value {
label: 'Item Value'
type: 'number'
}

measure total_gmv {
label: 'Total GMV'
type: 'number'
definition: @aql sum(fct_orders.item_value);;
}
}
fct_inventory.model.aml
Model fct_inventory {
type: 'table'
table_name: 'ecommerce.inventory'
data_source_name: 'your_data_source_name'

dimension id {
label: 'Inventory ID'
type: 'number'
primary_key: true
hidden: true
}

dimension product_id {
label: 'Product ID'
type: 'number'
hidden: true
}

dimension created_at {
label: 'Created At'
type: 'datetime'
}

dimension quantity_available {
label: 'Quantity Available'
type: 'number'
}

measure total_quantity {
label: 'Total Available Quantity'
type: 'number'
definition: @aql sum(fct_inventory.quantity_available);;
}
}

Step 2: Create Shared Dimension Models

These are the dimensions that both fact tables will connect to. Since they're shared across multiple facts, they act as the common ground for cross-process analysis.

dim_products.model.aml
Model dim_products {
type: 'table'
table_name: 'ecommerce.products'
data_source_name: 'your_data_source_name'

dimension id {
label: 'Product ID'
type: 'number'
primary_key: true
hidden: true
}

dimension name {
label: 'Product Name'
type: 'text'
}

dimension category {
label: 'Category'
type: 'text'
}
}
dim_dates.model.aml
Model dim_dates {
type: 'table'
table_name: 'ecommerce.dates'
data_source_name: 'your_data_source_name'

dimension date {
label: 'Date'
type: 'date'
primary_key: true
}

dimension month {
label: 'Month'
type: 'text'
}

dimension quarter {
label: 'Quarter'
type: 'text'
}
}

Step 3: Build Dataset with All Relationships

Now connect everything in a dataset. Notice that all relationships are active - Holistics will automatically choose the best path for each query based on what you're trying to analyze.

galaxy_schema.dataset.aml
Dataset galaxy_schema {
label: 'Orders and Inventory Analysis'
data_source_name: 'your_data_source_name'

models: [
fct_orders,
fct_inventory,
dim_products,
dim_dates
]

relationships: [
// Orders connections
relationship(fct_orders.product_id > dim_products.id, true),
relationship(fct_orders.created_at > dim_dates.date, true),

// Inventory connections (all active - Holistics handles ambiguity automatically)
relationship(fct_inventory.product_id > dim_products.id, true),
relationship(fct_inventory.created_at > dim_dates.date, true)
]

// Holistics automatically uses the direct path: fct_inventory → dim_products
metric total_available_products {
label: 'Total Available Quantity'
type: 'number'
definition: @aql
fct_inventory | sum(fct_inventory.quantity_available)
;;
}

// Uses the direct path: fct_orders → dim_products
metric total_gmv {
label: 'Total GMV'
type: 'number'
definition: @aql fct_orders | sum(fct_orders.item_value);;
}
}

Using Both Facts Together

explore {
dimensions {
dim_products.name
}
measures {
total_available_products,
total_gmv
}
filters {
dim_dates.date matches @(last 6 months)
}
}

Result:

Result Fact Constellation Schema

Next Steps


Additional Resources


Let us know what you think about this document :)