Level of Detail (LoD)
Introduction
In metric-based BI tools like Holistics, metrics are the foundational elements of your reports and dashboards. You begin by defining these metrics, and then you can create reports and dashboards by slicing and dicing them with various dimensions.
Using more dimensions increases the granularity of the metrics, providing finer details and resulting in more rows of data. Conversely, using fewer dimensions leads to lower granularity, yielding coarser details and fewer rows. This concept is referred to as the level of detail applied to the metric.
Example: Various levels of detail applied to a metric
Normally in Holistics, when you drag and drop dimensions and measures in the Data Exploration UI, the measures will be evaluated against the dimensions you included in the visualization:
However, there are many situations where you may want your metrics to be evaluated against a different set of dimensions than what you included in the visualization.
For example, to calculate and display the percentage of orders that each country contributed to the whole, we would want to divide each country’s orders count by the sum of all orders across all countries. The denominator will be the number of orders counted across all countries, in other words, it is not calculated against the Countries dimension that we have included in the visualization.
Or, you may want to calculate a metric of a certain entity (like the AOV of each customer) and analyze it as if it is a typical dimension.
For example, in this map we are showing the countries colored by the highest possible customer AOV in that country:
In this document, we will provide you with a high-level overview of how Holistics and AQL help you handle these use cases, by allowing control over the level of detail applied to a metric.
Some basic concepts
Before we dive into the details, we will need to establish some concepts and terminologies:
- Level of Detail (LoD for short): This term expresses the property of your data, as in “how much detail your data provides”. For example, in an E-commerce dataset:
- The
order_items
table has a high level of detail since it provides information down to the items level. - If you aggregate the data into a table called
monthly_sales
, the table has a low level of detail, since we will not know anything about each individual order from this table.
- The
- Visualization LoD: The level of details, or grains, that you include in your visualization/reports
- Calculation LoD: The level of details, or grains, that you use to calculate your metrics. Normally it will be the same as the Visualization LoD, but in some cases, we would want the Calculation LoD to be different.
Currently, Holistics AQL supports the calculation on different LoD via the following features:
of_all()
function: Used when your Calculation LoD is lower than Visualization LoD- Nested Aggregation: Used when your Calculation LoD is higher than Visualization LoD
dimensionalize()
function: Used when you want your metrics to always be evaluated against certain dimensions, and turn the metric into a dimension itself.is_at_level()
function: Used when you want to conditionally modify the behavior of a measure based on the active dimensions in the Level of Detail (LoD) context.
In the next part, we will go through the usage of each feature.
Real use cases
In this section, we will work on a hypothetical Ecommerce dataset with the following models:
Use case 1: Involving a lower LoD - of_all()
function
To answer the question “What is the order contribution of each country toward the total orders count of the company?”, we will need to perform the following steps:
- Count the orders of each country:
- Count the orders of the whole company across all countries
- Divide the result of (1) by (2)
In this example, you have the same calculation (count of orders) aggregated at different levels of detail - higher LoD in (1), and lower LoD in (2). To calculate this in Holistics’s AQL, you will need to use the of_all()
(alias: exclude_grains()
) function to exclude the Country dimension from the calculation.
To achieve this, in the modeling layer, your orders
model may look like this:
Model orders {
dimension id {}
...
measure total_orders {
label: 'Total Orders'
type: 'number'
definition: @aql count(orders.id) ;;
}
measure total_orders_of_all_countries {
label: 'Total Orders of all Countries'
type: 'number'
definition: @aql orders.total_orders | of_all(countries) ;;
}
measure country_orders_pct {
label: 'Country Orders Pct'
type: 'number'
definition: @aql orders.total_orders * 1.0 / orders.total_orders_of_all_countries ;;
}
}
And when visualizing via the Dataset Exploration UI, you just need to include the Country Name, Total Orders and Total Orders of all Countries fields.
If you write your exploration using AQL:
orders
| group(countries.name)
| select(
country_name: countries.name,
total_orders: count(orders, orders.id),
total_orders_of_all_countries: count(orders, orders.id) | of_all(countries),
)
| select(
country_name,
total_orders,
total_orders_of_all_countries,
country_order_pct: total_orders * 1.0 / total_orders_of_all_countries
)
For a more detailed guide of how to use of_all()
, please check out the Percent of Total guide.
Use case 2: Involving a higher LoD - Nested Aggregation
To answer the question “What is the maximum AOV of a customer in a particular country?”, we will need to perform the following calculations:
- Calculate the AOV (average order value) of each customer
- Find the max/min of that AOV value, grouped by country
Here we have to work with the metric AOV at different LoD: lower LoD in 1, and higher LoD in 2. (opposite to Use case 1). To solve this in AQL, we will use the Nested Aggregation feature:
In the users
model, you can create an aov
measure:
Model users {
label: 'Users'
...
measure aov {
label: 'Customer AOV'
type: 'number'
definition: @aql
sum(order_items, order_items.quantity * products.price) * 1.0 / count_distinct(orders.id)
;;
}
}
And in the Ecommerce dataset, you create a max_user_aov
metric that has the measure users.aov
nested inside:
Dataset ecommerce {
label: 'Ecommerce'
...
metric max_user_aov {
label: 'Max User AOV'
type: 'number'
definition: @aql users | group(users.id) | select(user_aov: users.aov) | max() ;;
}
}
In the Dataset Exploration UI, when you include the Country Name and the Max User AOV fields, the following things happen:
- The AOV value is calculated against the
users.id
dimension - The
max()
function will find the maximum AOV, and this is evaluated against the Country Name dimension in the visualization.
You can express this nested aggregation logic in AQL like below:
order_items
| group(countries.name)
| select(
country_name: countries.name,
max_customer_aov:
max(
order_items
| group(users.id)
| select(
aov: sum(order_items, order_items.quantity * products.price) * 1.0
/ count_distinct(order_items, order_items.order_id)
)
)
)
Use case 3: Involving a fixed LoD - dimensionalize()
There are cases when you have a metric that will always be calculated against some specific dimensions, no matter what other dimensions you may include in the visualization. You may want to use that metric as a dimension, or as a “property” of a certain grain. For example:
- The Average Order Value of a customer can be a dimension by itself
- The last-three-month return rate of a certain product
- The “cohort size”, or the number of users registered in a particular month in a cohort analysis
To achieve these effects, you can use the dimensionalize()
function to turn a measure into a dimension.
In Use case 2, we have defined aov
as a measure in the users
model. We can also turn it into a dimension:
Model users {
label: 'Users'
...
dimension aov_dim {
label: 'Customer AOV'
type: 'number'
definition: @aql
sum(order_items, order_items.quantity * products.price) * 1.0
/ count_distinct(orders.id) | dimensionalize(users.id)
;;
}
}
And then, in the dataset the max_user_aov
metric can be greatly simplified:
Dataset ecommerce {
label: 'Ecommerce'
...
metric max_user_aov {
label: 'Max User AOV'
type: 'number'
definition: @aql max(users.aov_dim) ;;
}
}
This is the result:
Although the result of the dimensionalized measure and the nested measure looks similar, actually they are not the same. They will be affected by measures in different ways.
For example, if we add a "orders.created_at matches 'last month'" filter condition:
These are the different effects of the filter:
- The measure Max of (AOV Measure) returns the maximum AOV calculated on the orders that were placed last month..
- The measure Max of (AOV Dimension) returns the maximum AOV calculated on all orders (of all time) that belongs to users having placed an order last month.
Use case 4: Calculating Percentage Contributions in a Pivot Table uing is_at_level()
Imagine you have a Pivot Table with three hierarchical levels: Continent
, Country
, and City
. You want to calculate the percentage of sales contribution at each level:
- City to Country
- Country to Continent
- Continent to Grand Total
To achieve this, the is_at_level
function can identify the active dimension in the LoD context, allowing you to compute the correct percentage.
Here's how you can implement this logic:
case(
when: is_at_level(cities.name)
, then: sum(sales.amount) / (sum(sales.amount) | of_all(cities.name))
, when: is_at_level(countries.name)
, then: sum(sales.amount) / (sum(sales.amount) | of_all(countries.name))
, when: is_at_level(countries.continent)
, then: sum(sales.amount) / (sum(sales.amount) | of_all(countries.continent))
, else: 1
)
Explanation:
is_at_level(cities.name)
: Checks if the current dimension level isCity
. If true, the calculation returns the percentage of sales for each city relative to the total sales of all cities. Sinceof_all
is only applied tocities.name
, this will be all cities within the same country.is_at_level(countries.name)
: Checks if the current dimension level isCountry
. If true, the calculation returns the percentage of sales for each country relative to the total sales of all countries within the same continent. Note that we don't need to includecities.name
in theof_all
function because if it's active, it would've been caught by the first condition.is_at_level(countries.continent)
: Checks if the current dimension level isContinent
. If true, the calculation returns the percentage of sales for each continent relative to the total sales of all continents.else
: If none of the conditions are met, it simply returns 1. This is useful for the Grand Total row, where we don't need to calculate the percentage.
This approach allows you to dynamically adjust the measure based on the active dimension, ensuring accurate percentage calculations across different hierarchical levels.
Choosing between Nested Aggregation and dimensionalize()
As we can see in the example above, on the surface Nested Aggregation and Dimensionalized Measure can produce similar result, but their actual behaviors are very different. So, here are rules of thumb when to use what feature:
- You should use
dimensionalize()
when you want to use the measure as a group-able dimension. This is particularly useful in use cases creating bins in histograms, doing cohort analysis... - Other than that, it is better and more intuitive to stick to Nested Aggregation.