Skip to main content

Level of Detail (LoD)

Knowledge Checkpoint

This documentation assumes you are familiar with the following concepts:

Introduction

Normally in Holistics, when you drag and drop dimensions and measures in the Data Exploration UI, the measures will be evaluated against the dimensions you included in the visualization:

However, there are many situations where you may want your metrics to be evaluated against a different set of dimensions than what you included in the visualization.

For example, to calculate and display the percentage of orders that each country contributed to the whole, we would want to divide each country’s orders count by the sum of all orders across all countries. The denominator will be the number of orders counted across all countries, in other words, it is not calculated against the Countries dimension that we have included in the visualization.

Or, you may want to calculate a metric of a certain entity (like the AOV of each customer) and analyze it as if it is a typical dimension.

For example, in this map we are showing the countries colored by the highest possible customer AOV in that country:

Map of countries and their highest possible customer AOV. AOV is a metric aggregated to the customer grain.

In this document, we will provide you with a high-level overview of how Holistics and AQL help you handle these use cases.

Some basic concepts

Before we dive into the details, we will need to establish some concepts and terminologies:

  • Level of Detail (LoD for short): This term expresses the property of your data, as in “how much detail your data provides”. For example, in an E-commerce dataset:
    • The order_items table has a high level of detail since it provides information down to the items level.
    • If you aggregate the data into a table called monthly_sales, the table has a low level of detail, since we will not know anything about each individual order from this table.
  • Visualization LoD: The level of details, or grains, that you include in your visualization/reports
  • Calculation LoD: The level of details, or grains, that you use to calculate your metrics. Normally it will be the same as the Visualization LoD, but in some cases, we would want the Calculation LoD to be different.

Currently, Holistics AQL supports the calculation on different LoD via the following features:

  • of_all() function: Used when your Calculation LoD is lower than Visualization LoD
  • Nested Aggregation: Used when your Calculation LoD is higher than Visualization LoD
  • exact_grains() function: Used when you want your metrics to always be evaluated against certain dimensions, and turn the metric into a dimension itself.

In the next part, we will go through the usage of each feature.

Real use cases

In this section, we will work on a hypothetical Ecommerce dataset with the following models:

Use case 1: Involving a lower LoD - of_all() function

To answer the question “What is the order contribution of each country toward the total orders count of the company?”, we will need to perform the following steps:

  1. Count the orders of each country:
  2. Count the orders of the whole company across all countries
  3. Divide the result of (1) by (2)

In this example, you have the same calculation (count of orders) aggregated at different levels of detail - higher LoD in (1), and lower LoD in (2). To calculate this in Holistics’s AQL, you will need to use the of_all() (alias: exclude_grains()) function to exclude the Country dimension from the calculation.

To achieve this, in the modeling layer, your orders model may look like this:

Model orders {
dimension id {}
...

measure total_orders {
label: 'Total Orders'
type: 'number'
definition: @aql count(orders.id) ;;
}

measure total_orders_of_all_countries {
label: 'Total Orders of all Countries'
type: 'number'
definition: @aql orders.total_orders | of_all(countries) ;;
}

measure country_orders_pct {
label: 'Country Orders Pct'
type: 'number'
definition: @aql orders.total_orders * 1.0 / orders.total_orders_of_all_countries ;;
}
}

And when visualizing via the Dataset Exploration UI, you just need to include the Country Name, Total Orders and Total Orders of all Countries fields.

If you write your exploration using AQL:

orders 
| group(countries.name)
| select(
country_name: countries.name,
total_orders: count(orders, orders.id),
total_orders_of_all_countries: count(orders, orders.id) | of_all(countries),
)
| select(
country_name,
total_orders,
total_orders_of_all_countries,
country_order_pct: total_orders * 1.0 / total_orders_of_all_countries
)

For a more detailed guide of how to use of_all() , please check out the Percent of Total guide.

Use case 2: Involving a higher LoD - Nested Aggregation

To answer the question “What is the maximum AOV of a customer in a particular country?”, we will need to perform the following calculations:

  1. Calculate the AOV (average order value) of each customer
  2. Find the max/min of that AOV value, grouped by country
Map of countries and their highest possible customer AOV. AOV is a metric aggregated to the customer grain.

Here we have to work with the metric AOV at different LoD: lower LoD in 1, and higher LoD in 2. (opposite to Use case 1). To solve this in AQL, we will use the Nested Aggregation feature:

In the users model, you can create an aov measure:

Model users {
label: 'Users'
...

measure aov {
label: 'Customer AOV'
type: 'number'
definition: @aql
sum(order_items, order_items.quantity * products.price) * 1.0 / count_distinct(orders.id)
;;
}
}

And in the Ecommerce dataset, you create a max_user_aov metric that has the measure users.aov nested inside:

Dataset ecommerce {
label: 'Ecommerce'
...

metric max_user_aov {
label: 'Max User AOV'
type: 'number'
definition: @aql users | group(users.id) | select(user_aov: users.aov) | max() ;;
}
}

In the Dataset Exploration UI, when you include the Country Name and the Max User AOV fields, the following things happen:

  1. The AOV value is calculated against the users.id dimension
  2. The max() function will find the maximum AOV, and this is evaluated against the Country Name dimension in the visualization.

You can express this nested aggregation logic in AQL like below:

order_items
| group(countries.name)
| select(
country_name: countries.name,
max_customer_aov:
max(
order_items
| group(users.id)
| select(
aov: sum(order_items, order_items.quantity * products.price) * 1.0
/ count_distinct(order_items, order_items.order_id)
)
)
)

Use case 3: Involving a fixed LoD - exact_grains()

There are cases when you have a metric that will always be calculated against some specific dimensions, no matter what other dimensions you may include in the visualization. You may want to use that metric as a dimension, or as a “property” of a certain grain. For example:

  • The Average Order Value of a customer can be a dimension by itself
  • The last-three-month return rate of a certain product
  • The “cohort size”, or the number of users registered in a particular month in a cohort analysis

To achieve these effects, you can use the exact_grains() function to turn a measure into a dimension.

In Use case 2, we have defined aov as a measure in the users model. We can also turn it into a dimension:

Model users {
label: 'Users'
...

dimension aov_dim {
label: 'Customer AOV'
type: 'number'
definition: @aql
sum(order_items, order_items.quantity * products.price) * 1.0
/ count_distinct(orders.id) | exact_grains(users.id)
;;
}
}

And then, in the dataset the max_user_aov metric can be greatly simplified:

Dataset ecommerce {
label: 'Ecommerce'
...

metric max_user_aov {
label: 'Max User AOV'
type: 'number'
definition: @aql max(users.aov_dim) ;;
}
}

This is the result:

Although the result of the dimensionalized measure and the nested measure looks similar, actually they are not the same. They will be affected by measures in different ways.

For example, if we add a "orders.created_at matches 'last month'" filter condition:

These are the different effects of the filter:

  • The measure Max of (AOV Measure) returns the maximum AOV calculated on the orders that were placed last month..
  • The measure Max of (AOV Dimension) returns the maximum AOV calculated on all orders (of all time) that belongs to users having placed an order last month.

Choosing between Nested Aggregation and exact_grains()

As we can see in the example above, on the surface Nested Aggregation and Dimensionalized Measure can produce similar result, but their actual behaviors are very different. So, here are rules of thumb when to use what feature:

  • You should use exact_grains() when you want to use the measure as a group-able dimension. This is particularly useful in use cases creating bins in histograms, doing cohort analysis...
  • Other than that, it is better and more intuitive to stick to Nested Aggregation.

Let us know what you think about this document :)