Create Nested Aggregation
A grasp of these concepts will help you understand this documentation better:
Introduction
Often times, an analytics problem does not stop at the first aggregation. For example, the following questions will require you to continue to aggregate the initial aggregation to a higher grain, for example:
- How does the average monthly signups change over the years?
- What is the maximum AOV of a customer in a particular country?
In this guide, we will walk through the steps to answer these questions using the Nested Aggregation feature in Holistics. As usual, we will work with the familiar Ecommerce dataset.
Example 1: Average of Monthly Signups over Years
Setup
You may see the use of measure
in code snippets throughout this article. In Holistics, metrics defined inside data models are syntactically referred to as measure
. We will use the word "metric" outside of code snippets to avoid confusions. Read more about Metrics here
In this first simple example, we will only work with the users
model in the ecommerce
dataset:
- Initial Setup
- Final Setup
// users.model.aml
Model users {
...
dimension id {...}
dimension sign_up_at {...}
}
DataSet e_commerce {
...
models: [users]
}
// users.model.aml
Model users {
...
dimension id {...}
dimension sign_up_at {...}
// Note that metrics defined in models are called measures
measure count_users {
label: 'Count Users'
type: 'number'
definition: @aql users | count(users.id) ;;
}
measure avg_of_monthly_count {
label: 'Avg. of Monthly Signups'
type: 'number'
definition: @aql users
| group(month(users.sign_up_at))
| select(users.count_users)
| average()
;;
}
}
DataSet e_commerce {
...
models: [users]
}
High-level flow
- Create the first aggregation: In the
users
model, create thecount_users
metric - Create the second aggregation: Continue to create the
avg_of_monthly_singups
metric that specify the grouping grain to sign up month, and average thecount_users
metric - Use the second aggregation with a higher grain: In the dataset exploration UI, pull in
avg_of_monthly_signups
and thesign_up_at
transformed to the Year grain.
Implementation
1. Create the first aggregation
In the users
model, create a simple count_users
metric:
Model users {
...
// Note that metrics defined in models are called measures
measure count_users {
label: 'Count Users'
type: 'number'
definition: @aql users | count(users.id) ;;
}
}
2. Create the second aggregation.
The logic of the avg_of_monthly_count
metric is as follows:
users | group(month(users.sign_up_at)) | select(users.count_users)
: This part is to specifyusers
as the root model to start the calculation from. Next, we select thecount_users
metric, and group it using the month grainaverage()
: The result of the previous part will be piped into this function and averaged.
In this new metric, the count_users
metric will always be a count by month, and then the monthly count will be averaged across the new dimension that we include when exploring data.
Model users {
...
dimension id {...}
dimension sign_up_at {...}
// Note that metrics defined in models are called measures
measure count_users {...}
measure avg_of_monthly_count {
label: 'Avg. of Monthly Signups'
type: 'number'
definition: @aql users
| group(month(users.sign_up_at))
| select(users.count_users)
| average()
;;
}
}
3. Use the second aggregation with a higher grain
To see the effect of your setup, we can pull in the avg_of_monthly_count
metric with the sign_up_at
dimension which was transformed to the Year grain:
Example 2: Maximum of User’s AOV in a Country
Setup
In this example, we will make use of the following models: order_items
, products
, users
, cities
- Initial Setup
- Final Setup
// order_items.model.aml
Model order_items {
...
dimension id {...}
dimension product_id {...}
dimension quantity {...}
}
// products.model.aml
Model products {
...
dimension id {...}
dimension price {...}
}
// users.model.aml
Model users {
...
dimension id {...}
dimension city_id {...}
}
// cities.model.aml
Model cities {
...
dimension id {...}
dimension country_name {...}
}
// e_commerce.dataset.aml
DataSet e_commerce {
...
models: [
order_items,
products,
users,
cities
]
relationships: [
relationship(order_items.user_id > users.id, true),
relationship(order_items.product_id > products.id, true),
relationship(users.city_id > cities.id, true)
]
}
// order_items.model.aml
Model order_items {
...
dimension id {...}
dimension product_id {...}
dimension quantity {...}
}
// products.model.aml
Model products {
...
dimension id {...}
dimension price {...}
}
// users.model.aml
Model users {
...
dimension id {...}
dimension city_id {...}
}
// cities.model.aml
Model cities {
...
dimension id {...}
dimension country_name {...}
}
// e_commerce.dataset.aml
DataSet e_commerce {
...
models: [
order_items,
products,
users,
cities
]
relationships: [
relationship(order_items.user_id > users.id, true),
relationship(order_items.product_id > products.id, true),
relationship(users.city_id > cities.id, true)
]
metric aov {
label: 'AOV'
type: 'number'
definition: @aql
sum(order_items, order_items.quantity * ecommerce_products.price) /
count_distinct(order_items.order_id);;
}
metric max_user_aov {
label: 'Max User AOV'
type: 'number'
definition: @aql order_items
| group(users.id)
| select(aov)
| max() ;;
}
}
High-level flow
- Create the first aggregation: We will create an
aov
metric that calculates the total value of orders divided by the number of orders - Create the second aggregation: We will create a
max_of_user_aov
metric that group theaov
metric by the user grain, and then find the maximum of the result. - Use the second aggregation with a higher grain: To see the effect of the nested aggregation, we use the
max_of_user_aov
with the Country dimension (which represents a coarser grain comparing to the users grain)
Implementation
1. Create the first aggregation
In the e_commerce
dataset, create an aov
metric that combine fields from the order_items
and products
model:
DataSet e_commerce {
...
metric aov {
label: 'AOV'
type: 'number'
definition: @aql sum(order_items, order_items.quantity * ecommerce_products.price) /
count_distinct(order_items.order_id);;
}
}
2. Create the second aggregation
After having the base aov
metric, we create the max_user_aov
metric that:
- Specify
order_items
as the root model that we start the calculation from - Select the
aov
metric we created earlier, and group it to theusers
grain - Finally, get the max value of the result
metric max_user_aov {
label: 'Max User AOV'
type: 'number'
definition: @aql order_items
| group(users.id)
| select(aov)
| max() ;;
}
For more details on how cross-model calculations works, please refer to Cross-Model Calculation.
3. Use the second aggregation with a higher grain
Finally, to see the effect of the second aggregation, we need to use it with a dimension of higher grain than the one that we used in the group()
function. In this case, we use the Country dimension to check which country has the highest possible maximum AOV: