Quick Start

To set up Aggregate Awareness in Holistics, follow these two steps:

  • Define the pre-aggregate in the dataset.
  • Assign the pre-aggregated table by either:
    • Creating aggregated tables using Holistics Persistence, OR
    • Bringing in external pre-aggregated tables created in advance with data tools like dbt, Airflow, etc.

Define the pre-aggregate

In the dataset file, add the pre_aggregates config to map model dimensions and measures to the respective pre-aggregated table fields.

Dataset ecommerce {
  //... other dataset settings

  models: [transactions]

  // Add pre-aggregate
  pre_aggregates: {
    agg_transactions: PreAggregate {
      // This dimension is in the pre-aggregated table `agg_transactions`
      dimension created_at_day {
        // This reference is in the `transactions` model
        for: ref('transactions', 'created_at'),
        time_granularity: "day"
      }
      dimension status {
        for: ref('transactions', 'status')
      }
      dimension country {
        for: ref('transactions', 'country')
      }
      dimension city {
        for: ref('transactions', 'city')
      }
      measure count_transactions {
        for: ref('transactions', 'count_transactions')
        aggregation_type: 'count'
      }
      persistence: FullPersistence {
        schema: 'persisted'
      }
    },
  }
}
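
To make the mapping concrete, the following Python sketch (an illustration only, not Holistics code) builds the rows that a table like `agg_transactions` would hold from a handful of made-up transactions, then answers a coarser question from those rows. The sample data and helper names are hypothetical; the point is only that a `count` measure stored at day granularity can be rolled up to a coarser grouping by summing the stored counts.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw rows of the `transactions` model:
# (created_at, status, country, city)
transactions = [
    (datetime(2024, 1, 1, 9, 30), "paid", "SG", "Singapore"),
    (datetime(2024, 1, 1, 17, 5), "paid", "SG", "Singapore"),
    (datetime(2024, 1, 1, 11, 0), "refunded", "VN", "Hanoi"),
    (datetime(2024, 1, 2, 8, 45), "paid", "VN", "Hanoi"),
]


def build_pre_aggregate(rows):
    """Group by (created_at_day, status, country, city) and count rows."""
    counts = Counter()
    for created_at, status, country, city in rows:
        counts[(created_at.date(), status, country, city)] += 1
    return counts


def count_by_country(agg):
    """Answer a coarser query from the aggregate by summing stored counts."""
    totals = Counter()
    for (_day, _status, country, _city), n in agg.items():
        totals[country] += n
    return dict(totals)


agg_transactions = build_pre_aggregate(transactions)
per_country = count_by_country(agg_transactions)
print(per_country)  # {'SG': 2, 'VN': 2}
```

The four raw rows collapse into three aggregate rows, and the per-country totals come from those three rows without touching the raw data again.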

Persist the aggregated table

The setup above only declares the pre-aggregate; it assumes that no physical table exists in the data warehouse yet. For queries to use the aggregated table, an actual pre-aggregated table must be created.

Holistics supports the creation of this table through Holistics Persistence:

  • You specify a writeable schema in the data warehouse.
  • Holistics handles persisting and refreshing of tables behind the scenes.

pre_aggregates: {
  agg_transactions: PreAggregate {
    //... other pre-aggregate config

    // PERSISTENCE CONFIG
    persistence: FullPersistence {
      schema: 'persisted'
    }
  }
}

NOTES

Make sure that:

Create table manually

To trigger the table creation (the persistence process) manually, go to the dataset's Pre-aggregate section in the List tab, click the Run button, and wait for the run to finish.

Set up scheduled refreshes

When the original tables change, the pre-aggregate can become outdated. Instead of triggering the persistence manually from the UI each time, you can define a schedule that refreshes the pre-aggregate at a fixed cadence.

In Holistics, you can create a schedules.aml file at the root of your AML project:

const schedules = [
  // Schedule for specific PreAggregates in a Dataset
  PreAggregateSchedule {
    cron: '15 20 * * *'
    object: ecommerce_dataset
    pre_aggregates: ['agg_transactions'] // persist the PreAggregate 'agg_transactions' only
  }

  // Schedule for all PreAggregates in a Dataset
  PreAggregateSchedule {
    cron: '15 20 * * *'
    object: ecommerce_dataset
  }
]

You can find the full syntax reference here.
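
The `cron` value used in the schedules above, `'15 20 * * *'`, follows the standard five-field cron layout (minute, hour, day of month, month, day of week), so it fires once a day at 20:15. As a rough illustration in Python (not Holistics code), the hypothetical helper below computes the next run for this simple daily pattern. The time zone in which Holistics evaluates the expression is not stated here, so the sketch uses naive datetimes.

```python
from datetime import datetime, timedelta


def next_daily_run(cron: str, now: datetime) -> datetime:
    """Next run time for a cron of the form 'M H * * *' (daily at H:M).

    Only this simple daily shape is handled; anything else raises ValueError.
    """
    minute, hour, day_of_month, month, day_of_week = cron.split()
    if (day_of_month, month, day_of_week) != ("*", "*", "*"):
        raise ValueError("only daily 'M H * * *' expressions are supported")
    candidate = now.replace(
        hour=int(hour), minute=int(minute), second=0, microsecond=0
    )
    # If today's slot has already passed, the next run is tomorrow.
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


before = next_daily_run("15 20 * * *", datetime(2024, 3, 1, 9, 0))
after = next_daily_run("15 20 * * *", datetime(2024, 3, 1, 21, 0))
print(before)  # 2024-03-01 20:15:00
print(after)   # 2024-03-02 20:15:00
```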

Trigger via API (coming soon)

Alternatively, you will be able to create and refresh pre-aggregated tables through the API (coming soon).

Work with externally-built tables

If you have a pipeline that builds the aggregated tables outside of Holistics (with tools like Airflow, dbt, etc.), you only need to map the table name to the pre-aggregate definition.

Use ExternalPersistence to let Holistics know you’re using an external table.

Dataset ecommerce {
  // ... other settings
  pre_aggregates: {
    agg_transactions: PreAggregate {
      //... other settings
      dimension created_at_day
      measure count_transactions

      // Map external pre-aggregated table with this option
      persistence: ExternalPersistence {
        table_name: 'persisted.agg_transactions'
      }
    },
  }
}
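
For orientation, here is a small Python sketch of what an external pipeline might produce. It uses the standard-library `sqlite3` module as a stand-in for the data warehouse, and it assumes that the external table exposes one column per dimension and measure named in the pre-aggregate. The sample rows, the column names, and that naming assumption are illustrative only; check the Holistics reference for the exact requirements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second in-memory database so the table can live at
# `persisted.agg_transactions`, mirroring the table_name in the config.
conn.execute("ATTACH DATABASE ':memory:' AS persisted")

conn.execute(
    "CREATE TABLE transactions (created_at TEXT, status TEXT, country TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        ("2024-01-01 09:30:00", "paid", "SG", "Singapore"),
        ("2024-01-01 17:05:00", "paid", "SG", "Singapore"),
        ("2024-01-02 08:45:00", "paid", "VN", "Hanoi"),
    ],
)

# The step an external tool (dbt, Airflow, ...) would own: build the aggregate.
# Column names are assumed to match the pre-aggregate's dimensions and measure.
conn.execute(
    """
    CREATE TABLE persisted.agg_transactions AS
    SELECT date(created_at) AS created_at_day,
           status,
           country,
           city,
           COUNT(*) AS count_transactions
    FROM transactions
    GROUP BY date(created_at), status, country, city
    """
)

rows = conn.execute(
    "SELECT created_at_day, country, count_transactions "
    "FROM persisted.agg_transactions ORDER BY created_at_day"
).fetchall()
print(rows)  # [('2024-01-01', 'SG', 2), ('2024-01-02', 'VN', 1)]
```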

Tip: Leverage AML Reusability

You will often need different PreAggregates for different time granularities so that you can configure more efficient persistence pipelines.

To generate multiple PreAggregates for different time granularities conveniently, you can leverage AML Reusability.

The example "Build multiple PreAggregates using AML Extend" is a good place to get started.

