Skip to main content

Pre-aggregate Persistence

Aggregate Awareness requires a physical table in your data warehouse. You have two options for providing this table:

  • Built-in persistence: Holistics creates and manages the table for you
  • External persistence: You point Holistics to an existing table (from dbt, Airflow, etc.)

Built-in persistence

With built-in persistence, Holistics creates and refreshes pre-aggregated tables directly in your data warehouse. Use this when you don't have existing tables and want Holistics to handle everything.

Persistence types

Choose between full and incremental persistence based on your data volume and refresh needs.

FullPersistence

Rebuilds the entire table from scratch each time. Best for smaller datasets or when you need a clean slate on each refresh.

persistence: FullPersistence {
schema: 'persisted'
}

IncrementalPersistence

Appends new records and updates existing ones. Identifies new records using incremental_column and updates based on primary_key. Best for large, append-heavy datasets like time-series data.

persistence: IncrementalPersistence {
schema: 'persisted'
incremental_column: 'created_at_day'
primary_key: 'created_at_day'
}
tip

Always specify primary_key (typically the same as incremental_column) to ensure records are upserted correctly.

Incremental persistence comparison

Note: The column names in incremental_column and primary_key refer to columns in the PreAggregate, not the source models.

For full syntax details, see AML Persistence.


Setting up built-in persistence

Step 1: Add persistence config

Add a persistence block to your pre-aggregate definition:

Dataset ecommerce {
models: [transactions]

pre_aggregates: {
agg_transactions: PreAggregate {
dimension created_at_day {
for: r(transactions.created_at)
time_granularity: "day"
}
dimension status {
for: r(transactions.status)
}
dimension country {
for: r(transactions.country)
}
dimension city {
for: r(transactions.city)
}

measure count_transactions {
for: r(transactions.id)
aggregation_type: 'count'
}

persistence: FullPersistence {
schema: 'persisted'
}
}
}
}
Prerequisites
  • The schema (e.g., persisted) must exist in your data warehouse
  • Your Holistics connection needs write permissions to this schema

Step 2: Trigger table creation

In Holistics, navigate to your Dataset and find the Pre-aggregate section. Click Run on the relevant pre-aggregate.

Once created, the table is ready for Aggregate Awareness to use.


Refreshing pre-aggregated tables

Pre-aggregated tables need to stay in sync with your source data. You can refresh them in several ways:

Manual refresh: Click the Run button in the UI (shown in the video above).

Scheduled refresh: Add a PreAggregateSchedule to your schedules.aml file:

const schedules = [
// Refresh specific pre-aggregates daily at 8:15 PM
PreAggregateSchedule {
cron: '15 20 * * *'
object: ecommerce_dataset
pre_aggregates: ['agg_transactions']
}

// Or refresh all pre-aggregates in a dataset
PreAggregateSchedule {
cron: '0 2 * * *' // 2 AM daily
object: ecommerce_dataset
}
]

API refresh: Coming soon—integrate with your existing orchestration tools.


How Holistics identifies tables

Holistics generates a unique key for each pre-aggregate based on its semantics:

  • Dimension definitions (SQL/AQL)
  • Measure definitions (SQL/AQL)
  • Relevant relationships in the dataset
  • Data source connection

If you change any of these (e.g., modify a dimension's definition), the key changes and Holistics will no longer use the old persisted table—you'll need to run persistence again.

Cross-branch reuse: Pre-aggregates in different git branches can share the same persisted table if they have identical semantics.


Automatic cleanup

Unused tables are automatically deleted after 7 days. To adjust this:

  1. Go to Administration Settings
  2. Find Pre-Aggregate Persistence TTL
  3. Set your preferred retention period
Pre-aggregate persistence TTL setting

The countdown resets when:

  • The table data is updated
  • Someone queries the table

Learn more


External persistence

If you already have pre-aggregated tables in your warehouse (from dbt, Airflow, etc.), point Holistics to them with ExternalPersistence:

Dataset ecommerce {
models: [transactions]

pre_aggregates: {
agg_transactions: PreAggregate {
dimension created_at_day {
for: r(transactions.created_at)
time_granularity: "day"
}
dimension status {
for: r(transactions.status)
}
dimension country {
for: r(transactions.country)
}
dimension city {
for: r(transactions.city)
}

measure count_transactions {
for: r(transactions.id)
aggregation_type: 'count'
}

persistence: ExternalPersistence {
table_name: 'your_schema.agg_transactions'
}
}
}
}
Column names must match

The dimension and measure names in your pre-aggregate config must exactly match the column names in your external table.


Let us know what you think about this document :)