Quick Start
To set up Aggregate Awareness in Holistics, you can follow these two steps
- Defining pre-aggregate in the dataset
- Assigning the pre-aggregated tables by either:
- Creating aggregated tables using Holistics Persistence, OR
- Bringing in external pre-aggregated tables created in advance using data tools like dbt, Airflow, etc.
Define the pre-aggregate
In the dataset file, add the pre_aggregates
config to map model dimensions and measures to the respective pre-aggregated table fields.
Dataset ecommerce {
//... other dataset settings
models: [transactions]
// Add pre-aggregate
pre_aggregates: {
agg_transactions: PreAggregate {
// This dimension is in the pre-aggregated table `agg_transactions`
dimension created_at_day {
// It references the field `created_at` in model `transactions`
for: ref('transactions', 'created_at'),
time_granularity: "day"
}
dimension status {
for: ref('transactions', 'status')
}
dimension country {
for: ref('transactions', 'country')
}
dimension city {
for: ref('transactions', 'city')
}
measure count_transactions {
for: ref('transactions', 'id')
aggregation_type: 'count'
}
},
}
}
Persist the aggregated table
The above setup only defines the mapping between your model fields and the pre-aggregate fields. In order for queries to use the pre-aggregate, an actual pre-aggregated table must be created/persisted in your Data Warehouse.
To define how the pre-aggregate is persisted, specify the persistence
attribute in your PreAggregate.
Using Holistics built-in persistence
Holistics supports the creation of pre-aggregate tables through Holistics built-in Persistence:
- You specify a writeable schema in the data warehouse.
- Holistics handles persisting and refreshing of tables behind the scenes.
For example, you can use a FullPersistence
config for your PreAggregate's persistence
:
pre_aggregates: {
transactions_agg: PreAggregate {
//... other pre-aggregate configs
// PERSISTENCE CONFIG
persistence: FullPersistence {
schema: 'persisted'
}
}
}
This config instructs Holistics to use Full Persistence to create pre-aggregated tables for transactions_agg
in the schema named persisted
in your Data Warehouse.
Make sure that:
- The schema (e.g.
persisted
) exists in the data warehouse. - Holistics connection has proper permission to create a table and write data to this schema.
Then, you will need to run the persistence process.
Trigger persistence manually
To manually trigger the table creation (persistence process), go to Dataset’s Pre-aggregate section in the List tab to invoke the Run button and wait for it to finish.
Set up scheduled refreshes
When the original tables change, the pre-aggregate could become outdated. Instead of just using UI to manually trigger the persistence, you can use a schedule trigger to define the pre-aggregate cadence.
In Holistics, you can create a schedules.aml
file at the root of your AML project:
const schedules = [
// Schedule for specific PreAggregates in a Dataset
PreAggregateSchedule {
cron: '15 20 * * *'
object: ecommerce_dataset
pre_aggregates: ['agg_transactions'] // persist the PreAggregate 'agg_transactions' only
}
// Schedule for all PreAggregates in a Dataset
PreAggregateSchedule {
cron: '15 20 * * *'
object: ecommerce_dataset
}
]
You can find the full syntax reference here.
Trigger via API (coming soon)
Alternatively, you can also trigger to create and refresh pre-aggregated tables using API (coming soon).
Learn more about Holistics Built-in Pre-Aggregate Persistence here.
Using externally-built tables
If you prefer to build the aggregated tables outside of Holistics (with tools like Airflow, dbt, etc.),
you can use ExternalPersistence
to map those tables to your PreAggregate definitions.
For example, given that we have an externally-built pre-aggregated table named persisted.agg_transactions
that contains 2 columns:
created_at_day
count_transactions
We can define the PreAggregate like this:
Dataset ecommerce {
// ... other settings
pre_aggregates: {
agg_transactions: PreAggregate {
dimension created_at_day { // This dimension name (`created_at_day`) must match the column name in the pre-aggregated table (`persisted.agg_transactions`)
for: ref('transactions', 'created_at'),
time_granularity: "day"
}
measure count_transactions { // This measure name (`count_transactions`) must match the column name in the pre-aggregated table (`persisted.agg_transactions`)
for: ref('transactions', 'id')
aggregation_type: 'count'
}
// Map to external pre-aggregated table using ExternalPersistence
persistence: ExternalPersistence {
table_name: 'persisted.agg_transactions'
}
},
}
}
- Ensure that the names of
dimension
andmeasure
of your PreAggregate are the same as the column names in your external pre-aggregated tables.
Tip: Leverage AML Reusability
It is a common need to create different PreAggregates for different time granularities so that you can configure more efficient persistence pipelines.
To conveniently generate multiple PreAggregates for different time granularities, you can leverage AML Reusability.
Build multiple PreAggregates using AML Extend is a great example for you to get started!