Aggregate Awareness Capabilities
In this section, we will explore the capabilities of Holistics Aggregate Awareness in various scenarios.
Dimension Awareness
Being aware of the dimensions and their granularity used in your queries, Holistics can automatically substitute PreAggregates with accurate dimensions and granularity into your queries.
For example, Holistics will automatically use the pre-aggregated table agg_transactions
to answer all of these below queries instead of the raw_transactions
table.
- Count transactions by day and status
- Count transactions by day, status, and country
- Count transactions by day, status, country, and city
- etc.
Join Awareness (Smart Join Substitution)
Aggregate Awareness is also smart enough to perform aggregations with relationship joins if the Pre-Aggregate
- contains the foreign key of a Many-to-One relationship
- and is on the "Many" side of that relationship
For example, let's say in our Pre-Aggregate, we add the dimension transactions.user_id
which has a Many-To-One relationship to the users
model.
Dataset ecommerce {
//... other settings
models: [
transactions,
users
]
pre_aggregates: {
agg_transactions: PreAggregate {
//... other settings
dimension created_at_day {
for: ref('transactions', 'created_at'),
time_granularity: "day"
}
measure count_transactions {
for: ref('transactions', 'id')
aggregation_type: 'count'
}
// ADD THIS
dimension user_id {
for: ref('transactions', 'user_id')
}
persistence: FullPersistence {
schema: 'persisted'
}
},
}
}
//------------------------------------------------------------
// Existing setup
Model transactions {
//... other settings
table_name: 'raw_transactions'
}
Model users {
//... other settings
dimension gender
dimension age_group
table_name: 'users'
}
Relationship: transactions.user_id > users.id;
The following queries will automatically use the pre-aggregated table and perform a join with users
:
- Count transactions by gender
- Count transactions by country, age_group
even when your pre-aggregate does not contain gender, country, or age_group dimensions!
Time Granularity Awareness
Holistics can identify the pre-aggregated table at the right time granularity level based on the option that you configure in the pre-aggregate (minute, hour, day, week, month, quarter, year)
For example, in the quick example above, you can make use of the agg_transactions
at the day level to get the total transactions count at the week or month level, etc.
These queries would use pre-aggregated table agg_transactions
at day level and take additional time to aggregate from day to week or month (although it is still faster than using the raw table).
However, if you use data at the week level frequently and want to optimize the query time faster, try to create another pre-aggregated table at coarser-grained granularity (e.g., agg_transactions_week
)
Map it with the transactions model so that the engine will be aware of it when possible
Dataset ecommerce {
//... other settings
models: [
transactions
]
pre_aggregates: {
// BEGIN NEW CODE
agg_transactions_week: PreAggregate {
dimension created_at_week {
for: ref('transactions', 'created_at'),
time_granularity: "week"
}
//... other pre-aggregate settings
},
// END NEW CODE
agg_transactions: PreAggregate {
//... pre-aggregate settings
},
}
}
For now, when you query the total weekly transactions count, it will use the pre-aggregated table agg_transactions_week
leading to faster results than just using agg_transactions
table.
In the current version, if there are multiple pre-aggregates eligible for a query, the (eligible) pre-aggregate that is defined first in the dataset will be chosen to substitute into the query.
We recommend that you put the coarser-grained pre-aggregates before the finer-grained pre-aggregates in the dataset configuration. (e.g. “month” first, then “week”, then “day”)
Metric Awareness
Holistics is also aware of and leverages pre-aggregates to calculate more complex metrics.
We already have count_transactions: count(transactions.id)
pre-aggregated by week. Then, we want to explore the running count of transactions over the weeks to see the accumulative growth. In Holistics, you can simply define a metric by a single AQL formula:
count(transactions.id) | running_total(transactions.created_at)
Aggregate Awareness will use the pre-aggregated measure count_transactions
at week time granularity to calculate the running total. It saves us a significant amount of computational time.
Measures Types in Pre-aggregates
Learn more about Additive and Non-additive Measures
Depending on the measure type being used in pre-aggregate and whether or not you re-aggregate the measure in the current query, the pre-aggregated tables will be hit or missed.
Additive measures
Aggregate awareness can be used to re-aggregate with these types of additive measures
- sum
- count
- min
- max
For example, a pre-aggregate of "Sum Revenue by Month" can also be used for the query "Sum Revenue by Year".
Non-additive measures
- Pre-aggregated non-additive measures can only match the queries that use the exact same dimensions/granulariy. Here are some non-additive measures:
- count distinct
- median
- stddev
- variance
- avg
- For example, a pre-aggregate of "Count Distinct User by Month" cannot be used for the query "Count Distinct User by Year".
- In the case of
avg
, you can actually split (decompose) it into two additive measures:sum
andcount
. If your PreAggregate hassum
andcount
measures, Aggregate Awareness can automatically pick up those measures to calculate theavg
.
// For `avg`
// ❗ Instead of pre-aggregating `avg` directly
measure avg_a {
for: ref('model', 'a')
aggregation_type: 'avg'
}
// ✅ Pre-aggregate on `sum` and `count
measure sum_a {
for: ref('model', 'a')
aggregation_type: 'sum'
}
measure count_a {
for: ref('model', 'a')
aggregation_type: 'count'
}