Skip to main content

Create Nested Aggregation

Knowledge Checkpoint

A grasp of these concepts will help you understand this documentation better:

Introduction

Often times, an analytics problem does not stop at the first aggregation. For example, the following questions will require you to continue to aggregate the initial aggregation to a higher grain, for example:

  • How does the average monthly signups change over the years?
  • What is the maximum AOV of a customer in a particular country?

In this guide, we will walk through the steps to answer these questions using the Nested Aggregation feature in Holistics. As usual, we will work with the familiar Ecommerce dataset.

Example 1: Average of Monthly Signups over Years

Setup

Measures and Metrics

You may see the use of measure in code snippets throughout this article. In Holistics, metrics defined inside data models are syntactically referred to as measure. We will use the word "metric" outside of code snippets to avoid confusions. Read more about Metrics here

In this first simple example, we will only work with the users model in the ecommerce dataset:

// users.model.aml
Model users {
...
dimension id {...}
dimension sign_up_at {...}
}

DataSet e_commerce {
...
models: [users]
}

High-level flow

  1. Create the first aggregation: In the users model, create the count_users metric
  2. Create the second aggregation: Continue to create the avg_of_monthly_singups metric that specify the grouping grain to sign up month, and average the count_users metric
  3. Use the second aggregation with a higher grain: In the dataset exploration UI, pull in avg_of_monthly_signups and the sign_up_at transformed to the Year grain.

Implementation

1. Create the first aggregation

In the users model, create a simple count_users metric:

Model users {
...
// Note that metrics defined in models are called measures
measure count_users {
label: 'Count Users'
type: 'number'
definition: @aql users | count(users.id) ;;
}
}

2. Create the second aggregation.

The logic of the avg_of_monthly_count metric is as follows:

  • users | group(month(users.sign_up_at)) | select(users.count_users): This part is to specify users as the root model to start the calculation from. Next, we select the count_users metric, and group it using the month grain
  • average(): The result of the previous part will be piped into this function and averaged.

In this new metric, the count_users metric will always be a count by month, and then the monthly count will be averaged across the new dimension that we include when exploring data.

Model users {
...
dimension id {...}
dimension sign_up_at {...}

// Note that metrics defined in models are called measures
measure count_users {...}

measure avg_of_monthly_count {
label: 'Avg. of Monthly Signups'
type: 'number'
definition: @aql users
| group(month(users.sign_up_at))
| select(users.count_users)
| average()
;;
}
}

3. Use the second aggregation with a higher grain

To see the effect of your setup, we can pull in the avg_of_monthly_count metric with the sign_up_at dimension which was transformed to the Year grain:

Example 2: Maximum of User’s AOV in a Country

Setup

In this example, we will make use of the following models: order_items, products, users, cities

// order_items.model.aml
Model order_items {
...
dimension id {...}
dimension product_id {...}
dimension quantity {...}
}

// products.model.aml
Model products {
...
dimension id {...}
dimension price {...}
}

// users.model.aml
Model users {
...
dimension id {...}
dimension city_id {...}
}

// cities.model.aml
Model cities {
...
dimension id {...}
dimension country_name {...}
}

// e_commerce.dataset.aml
DataSet e_commerce {
...

models: [
order_items,
products,
users,
cities
]

relationships: [
relationship(order_items.user_id > users.id, true),
relationship(order_items.product_id > products.id, true),
relationship(users.city_id > cities.id, true)
]
}

High-level flow

  1. Create the first aggregation: We will create an aov metric that calculates the total value of orders divided by the number of orders
  2. Create the second aggregation: We will create a max_of_user_aov metric that group the aov metric by the user grain, and then find the maximum of the result.
  3. Use the second aggregation with a higher grain: To see the effect of the nested aggregation, we use the max_of_user_aov with the Country dimension (which represents a coarser grain comparing to the users grain)

Implementation

1. Create the first aggregation

In the e_commerce dataset, create an aov metric that combine fields from the order_items and products model:

DataSet e_commerce {
...
metric aov {
label: 'AOV'
type: 'number'
definition: @aql sum(order_items, order_items.quantity * ecommerce_products.price) /
count_distinct(order_items.order_id);;
}
}

2. Create the second aggregation

After having the base aov metric, we create the max_user_aov metric that:

  • Specifyorder_items as the root model that we start the calculation from
  • Select the aov metric we created earlier, and group it to the users grain
  • Finally, get the max value of the result
metric max_user_aov {
label: 'Max User AOV'
type: 'number'
definition: @aql order_items
| group(users.id)
| select(aov)
| max() ;;
}

For more details on how cross-model calculations works, please refer to Cross-Model Calculation.

3. Use the second aggregation with a higher grain

Finally, to see the effect of the second aggregation, we need to use it with a dimension of higher grain than the one that we used in the group() function. In this case, we use the Country dimension to check which country has the highest possible maximum AOV:


Let us know what you think about this document :)