Skip to main content

Aggregate Awareness Capabilities

In this section, we will explore the capabilities of Holistics Aggregate Awareness in various scenarios.

Dimension Awareness

info

Being aware of the dimensions and their granularity used in your queries, Holistics can automatically substitute PreAggregates with accurate dimensions and granularity into your queries.

For example, Holistics will automatically use the pre-aggregated table agg_transactions to answer all of these below queries instead of the raw_transactions table.

  • Count transactions by day and status
  • Count transactions by day, status, and country
  • Count transactions by day, status, country, and city
  • etc.
Awareness Capabilities - Dimension awareness

Join Awareness (Smart Join Substitution)

info

Aggregate Awareness is also smart enough to perform aggregations with relationship joins if the Pre-Aggregate

  • contains the foreign key of a Many-to-One relationship
  • and is on the "Many" side of that relationship

For example, let's say in our Pre-Aggregate, we add the dimension transactions.user_id which has a Many-To-One relationship to the users model.

Awareness Capabilities - Join awareness - Add a foreign key to the pre-aggregated table
Dataset ecommerce {
//... other settings
models: [
transactions,
users
]

pre_aggregates: {
agg_transactions: PreAggregate {
//... other settings
dimension created_at_day
measure count_transactions

// ADD THIS
dimension user_id {
for: ref('transactions', 'user_id')
}

persistence: FullPersistence {
schema: 'persisted'
}
},
}
}

//------------------------------------------------------------
// Existing setup

Model transactions {
//... other settings
table_name: 'raw_transactions'
}

Model users {
//... other settings
dimension gender
dimension age_group
table_name: 'users'
}

Relationship: transactions.user_id > users.id;

The following queries will automatically use the pre-aggregated table and perform a join with users:

  • Count transactions by gender
  • Count transactions by country, age_group

even when your pre-aggregate does not contain gender, country, or age_group dimensions!

Awareness Capabilities - Join awareness - Query with pre-aggregate and join with foreign key to extend the dimension awareness capability

Time Granularity Awareness

info

Holistics can identify the pre-aggregated table at the right time granularity level based on the option that you configure in the pre-aggregate (minute, hour, day, week, month, quarter, year)

For example, in the quick example above, you can make use of the agg_transactions at the day level to get the total transactions count at the week or month level, etc.

Awareness Capabilities - Time awareness - Re-aggregate from day to week

These queries would use pre-aggregated table agg_transactions at day level and take additional time to aggregate from day to week or month (although it is still faster than using the raw table).

However, if you use data at the week level frequently and want to optimize the query time faster, try to create another pre-aggregated table at coarser-grained granularity (e.g., agg_transactions_week)

Awareness Capabilities - Time awareness - Create pre-aggregated table at week

Map it with the transactions model so that the engine will be aware of it when possible

Awareness Capabilities - Time awareness - Update the pre-aggregate
Dataset ecommerce {
//... other settings
models: [
transactions
]

pre_aggregates: {
// BEGIN NEW CODE
agg_transactions_week: PreAggregate {
dimension created_at_week {
for: ref('transactions', 'created_at'),
time_granularity: "week"
}
//... other pre-aggregate settings
},
// END NEW CODE

agg_transactions: PreAggregate {
//... pre-aggregate settings
},
}
}

For now, when you query the total weekly transactions count, it will use the pre-aggregated table agg_transactions_week leading to faster results than just using agg_transactions table.

Awareness Capabilities - Time awareness- Query with pre-aggregate at week
NOTICE

In the current version, if there are multiple pre-aggregates eligible for a query, the (eligible) pre-aggregate that is defined first in the dataset will be chosen to substitute into the query.

We recommend that you put the coarser-grained pre-aggregates before the finer-grained pre-aggregates in the dataset configuration. (e.g. “month” first, then “week”, then “day”)

Metric Awareness

info

Holistics is also aware of and leverages pre-aggregates to calculate more complex metrics.

We already have count_transactions: count(transactions.id) pre-aggregated by week. Then, we want to explore the running count of transactions over the weeks to see the accumulative growth. In Holistics, you can simply define a metric by a single AQL formula:

count(transactions.id) | running_total(transactions.created_at)

Aggregate Awareness will use the pre-aggregated measure count_transactions at week time granularity to calculate the running total. It saves us a significant amount of computational time.

Awareness Capabilities - Metric Awareness

Measures Types in Pre-aggregates

KNOWLEDGE CHECKPOINT

Depending on the measure type being used in pre-aggregate and whether or not you re-aggregate the measure in the current query, the pre-aggregated tables will be hit or missed.

Additive measures

  • Aggregate awareness can be used to re-aggregate with these types of additive measures

    • sum
    • count
    • min
    • max
  • For example, a pre-aggregate of "Sum Revenue by Month" can also be used for the query "Sum Revenue by Year".

Non-additive measures

  • Pre-aggregated non-additive measures can only match the queries that use the exact same dimensions/granulariy. Here are some non-additive measures:
    • count distinct
    • median
    • stddev
    • variance
    • avg
  • For example, a pre-aggregate of "Count Distinct User by Month" cannot be used for the query "Count Distinct User by Year".
  • In the case of avg, you can actually split (decompose) it into two additive measures: sum and count. If your PreAggregate has sum and count measures, Aggregate Awareness can automatically pick up those measures to calculate the avg.
// For `avg`

// ❗ Instead of pre-aggregating `avg` directly
measure avg_a {
for: ref('model', 'a')
aggregation_type: 'avg'
}

// ✅ Pre-aggregate on `sum` and `count
measure sum_a {
for: ref('model', 'a')
aggregation_type: 'sum'
}
measure count_a {
for: ref('model', 'a')
aggregation_type: 'count'
}

Let us know what you think about this document :)