Skip to main content

Quick Start

To set up Aggregate Awareness in Holistics, you can follow these two steps

  • Defining pre-aggregate in the dataset
  • Assigning the pre-aggregated tables by either:
    • Creating aggregated tables using Holistics Persistence, OR
    • Bringing in external pre-aggregated tables created in advance using data tools like dbt, Airflow, etc.

Define the pre-aggregate

In the dataset file, add the pre_aggregates config to map model dimensions and measures to the respective pre-aggregated table fields.

Dataset ecommerce {
//... other dataset settings

models: [transactions]

// Add pre-aggregate
pre_aggregates: {
agg_transactions: PreAggregate {
// This dimension is in the pre-aggregated table `agg_transactions`
dimension created_at_day {
// It references the field `created_at` in model `transactions`
for: ref('transactions', 'created_at'),
time_granularity: "day"
}
dimension status {
for: ref('transactions', 'status')
}
dimension country {
for: ref('transactions', 'country')
}
dimension city {
for: ref('transactions', 'city')
}
measure count_transactions {
for: ref('transactions', 'id')
aggregation_type: 'count'
}
},
}
}

Persist the aggregated table

The above setup only defines the mapping between your model fields and the pre-aggregate fields. In order for queries to use the pre-aggregate, an actual pre-aggregated table must be created/persisted in your Data Warehouse.

To define how the pre-aggregate is persisted, specify the persistence attribute in your PreAggregate.

Using Holistics built-in persistence

Holistics supports the creation of pre-aggregate tables through Holistics built-in Persistence:

  • You specify a writeable schema in the data warehouse.
  • Holistics handles persisting and refreshing of tables behind the scenes.

For example, you can use a FullPersistence config for your PreAggregate's persistence:

  pre_aggregates: {
transactions_agg: PreAggregate {
//... other pre-aggregate configs

// PERSISTENCE CONFIG
persistence: FullPersistence {
schema: 'persisted'
}
}
}

This config instructs Holistics to use Full Persistence to create pre-aggregated tables for transactions_agg in the schema named persisted in your Data Warehouse.

NOTES

Make sure that:

Then, you will need to run the persistence process.

Trigger persistence manually

To manually trigger the table creation (persistence process), go to Dataset’s Pre-aggregate section in the List tab to invoke the Run button and wait for it to finish.

Set up scheduled refreshes

When the original tables change, the pre-aggregate could become outdated. Instead of just using UI to manually trigger the persistence, you can use a schedule trigger to define the pre-aggregate cadence.

In Holistics, you can create a schedules.aml file at the root of your AML project:

const schedules = [
// Schedule for specific PreAggregates in a Dataset
PreAggregateSchedule {
cron: '15 20 * * *'
object: ecommerce_dataset
pre_aggregates: ['agg_transactions'] // persist the PreAggregate 'agg_transactions' only
}

// Schedule for all PreAggregates in a Dataset
PreAggregateSchedule {
cron: '15 20 * * *'
object: ecommerce_dataset
}
]

You can find the full syntax reference here.

Trigger via API (coming soon)

Alternatively, you can also trigger to create and refresh pre-aggregated tables using API (coming soon).

Learn more

Learn more about Holistics Built-in Pre-Aggregate Persistence here.

Using externally-built tables

If you prefer to build the aggregated tables outside of Holistics (with tools like Airflow, dbt, etc.), you can use ExternalPersistence to map those tables to your PreAggregate definitions.

For example, given that we have an externally-built pre-aggregated table named persisted.agg_transactions that contains 2 columns:

  • created_at_day
  • count_transactions

We can define the PreAggregate like this:

Dataset ecommerce {
// ... other settings
pre_aggregates: {
agg_transactions: PreAggregate {
dimension created_at_day { // This dimension name (`created_at_day`) must match the column name in the pre-aggregated table (`persisted.agg_transactions`)
for: ref('transactions', 'created_at'),
time_granularity: "day"
}
measure count_transactions { // This measure name (`count_transactions`) must match the column name in the pre-aggregated table (`persisted.agg_transactions`)
for: ref('transactions', 'id')
aggregation_type: 'count'
}

// Map to external pre-aggregated table using ExternalPersistence
persistence: ExternalPersistence {
table_name: 'persisted.agg_transactions'
}
},
}
}
NOTES
  • Ensure that the names of dimension and measure of your PreAggregate are the same as the column names in your external pre-aggregated tables.

Tip: Leverage AML Reusability

It is a common need to create different PreAggregates for different time granularities so that you can configure more efficient persistence pipelines.

To conveniently generate multiple PreAggregates for different time granularities, you can leverage AML Reusability.

Build multiple PreAggregates using AML Extend is a great example for you to get started!


Let us know what you think about this document :)