Skip to main content

Pre-aggregate Persistence

Introduction

When setting up aggregate awareness, you are required to specify an underlying table in your data warehouse for Holistics to pick up. These are called pre-aggregated tables.

If you don't have existing pre-aggregated tables, Holistics can create and manage them for you right in your data warehouse. We call this Built-in Persistence. Use this when:

  • You don't have existing pre-aggregated tables
  • You want Holistics to automatically manage table creation and updates
  • You have a writeable schema in your data warehouse

How it works

Step 1: Configure persistence setting

Just add a persistence config to your pre-aggregate:

Dataset ecommerce {
...

pre_aggregates: {
agg_transactions: PreAggregate {
// Your dimension and measure definitions
dimension created_at_day {
for: r(transactions.created_at),
time_granularity: "day"
}
measure count_transactions {
for: r(transactions.id)
aggregation_type: 'count'
}

// Configure Holistics to create the table
persistence: FullPersistence {
// The schema where Holistics will create the table.
// Make sure you grant database write-permission to Holistics
schema: 'persisted'
// For table names, Holistics will handle this automatically
}
}
}
}
Prerequisites
  • The schema (e.g., persisted) must exist in your data warehouse
  • Your Holistics connection needs write permissions to this schema

Step 2: Manually trigger table creation

Head to your Dataset in Holistics, find the Pre-aggregate section. Locate the relevant pre-aggregate, and click on Run button.

Step 3: You're all set!

Now it's all set up. This pre-aggregated table will be considered by Aggregate Awareness when a right combination of queried fields happens.

Rerunning your preaggregated tables

Pre-aggregated tables need to be refreshed when your source data changes. This can be triggered in a few ways.

  • Manual refresh: Analysts can manually persist tables by clicking a button in the UI (video above), making it easy to refresh data on demand.

  • Scheduled refresh: You can configure the preaggregated tables to refresh based on Holistics' built-in scheduling system. Modify the schedules.aml file in your project root (if it's not there, you can go ahead and create it) to add PreAggregateSchedule configs.

    const schedules = [
    // Refresh specific pre-aggregates daily at 8:15 PM
    PreAggregateSchedule {
    cron: '15 20 * * *'
    object: ecommerce_dataset
    pre_aggregates: ['agg_transactions']
    }

    // Or refresh all pre-aggregates in a dataset
    PreAggregateSchedule {
    cron: '0 2 * * *' // 2 AM daily
    object: ecommerce_dataset
    }
    ]
  • API refresh (to be implemented): For more advanced workflows, integrate persistence with your existing orchestration tools via our API trigger. This is currently not supported but it is to be implemented.

Persistence types

When configuring built-in persistence, you have 2 options: full and incremental.

  • FullPersistence: Rebuilds the whole table from scratch each time. Use this option when you have smaller datasets that refresh quickly, or data that needs to a clean slate each time.

    persistence: FullPersistence {
    schema: 'persisted'
    }
  • IncrementalPersistence: Pull in the new change (incremental upsert) based on a incremental column key. Use this when you have big datasets where full refreshes take forever, for example time-series data that grows daily.

    persistence: IncrementalPersistence {
    schema: 'persisted'
    incremental_column: 'created_at_day'
    }

Learn more about persistence syntax at AML Persistence.

Uniqueness of persisted tables

Holistics generates a key for each Pre-Aggregate based on its semantics. When running a Persistence, Holistics would associate that key to the persisted table.

The semantics of a Pre-Aggregate includes (but is not limited to):

  • Name
  • Dimensions (including their SQL/AQL definitions)
  • Measures (including their SQL/AQL definitions)
  • Dataset's Relationships (that are relevant to the Dimensions and Measures)
  • Dataset's Data Source

Therefore, if you make semantic changes to your Pre-Aggregate (for example, by changing a Dimension's definition in its Model), because the key is changed, Holistics will not use the old persisted tables of that Pre-Aggregate.

Additional notes: Pre-Aggregates defined in different git branches of your project can re-use the same persisted tables if (and only if) they have the same semantics.

Automated table cleanup

We'll clean up unused tables after 7 days (but you can change that):

  1. Pop into Administration Settings
  2. Look for Pre-Aggregate Persistence TTL
  3. Set it to whatever works for you
performance-pre-aggregate-persistence-ttl

The countdown resets whenever:

  • You update the table's data
  • Someone queries the table

Learn more


Let us know what you think about this document :)