Moving Average
A grasp of these concepts will help you understand this documentation better:
Introduction
Moving Average is a powerful analytical tool used to help identifying trends in time series data. By averaging the values of a variable over a rolling window, we can smooth out fluctuations and help the actual trend stands out better.
In this tutorial, we will create a report that visualizes both the monthly revenue and the 3-month-moving average revenue of an E-commerce business using the trailing_period()
and window_avg()
functions in Holistics.
Setup
In this guide, we will use the familiar e_commerce
dataset:
- Initial Setup
- Final Setup
// orders.model.aml
Model orders {
...
dimension id {...}
dimension user_id {...}
dimension created_at {...}
}
// order_items.model.aml
Model order_items.model.aml {
...
dimension order_id {...}
dimension product_id {...}
dimension quantity {...}
}
// products.model.aml
Model products {
...
dimension id {...}
dimension price {...}
dimension categor_id {...}
}
// e_commerce.dataset.aml
Dataset e_commerce {
...
models: [orders, order_items, products]
relationships: [
relationship(order_items.order_id > orders.id, true),
relationship(order_items.product_id > products.id, true)
]
}
// orders.model.aml
Model orders {
...
dimension id {...}
dimension user_id {...}
dimension created_at {...}
}
// order_items.model.aml
Model order_items.model.aml {
...
dimension order_id {...}
dimension product_id {...}
dimension quantity {...}
}
// products.model.aml
Model products {
...
dimension id {...}
dimension price {...}
dimension categor_id {...}
}
// e_commerce.dataset.aml
Dataset e_commerce {
...
models: [orders, order_items, products]
relationships: [
relationship(order_items.order_id > orders.id, true),
relationship(order_items.product_id > products.id, true)
]
metric revenue {
label: 'Revenue'
type: 'number'
definition: @aql coalesce(order_items | sum(order_items.quantity * ecommerce_products.price), 0);;
}
// Use trailing_period
metric moving_avg_revenue_3m_tp {
label: 'trailing_period - 3M Moving Avg. Revenue'
type: 'number'
definition: @aql trailing_period(revenue, ecommerce_orders.created_at, interval(3 months)) / 3;;
}
// Use window_avg
metric moving_avg_revenue_3m_wa {
label: 'window_avg - 3M Moving Avg. Revenue'
type: 'number'
definition: @aql window_avg(revenue, -2..0, order: ecommerce_orders.created_at | month());;
}
}
High-level Flow
- Create a normal aggregation metric: We will create a
revenue
metric in the dataset which will be used later in the definition of the moving metric. - Create a moving average metric:
Using the
trailing_period()
Function: This function turns therevenue
metric into a moving aggregation that sums revenues over intervals of 3 months. The sum is then divided by 3 to obtain the average value.Using the
window_avg()
Function: This function performs a secondary average over the calculatedrevenue
metric. We must ensures that therevenue
metric is sliced (grouped) and ordered by the Created Month dimension, with a frame of previous 2 rows and the current row.
- Visualize
The key difference between trailing_period()
and window_avg()
is that, trailing_period()
works at the time-series level, and ensures that the aggregation is done over a specified time window, regardless the table rows. You can see more details about how it works here.
On the other hand, window_avg()
works at the table rows level, and we must ensure that the table rows contain the correct data for the secondary aggregation to be meaningful. You can see more details about how it works here.
Thus,trailing_period
can handle gap in data. A trailing period measure of 2 months will use the calendar months to calculate the moving average. On the other hand, the window_avg function will use the previous row and the current row to calculate the moving average.
For example, in the following table with gaps in data, the trailing_period
function will return different results than the window_avg
function:
Month | Revenue | trailing_period | window_avg |
---|---|---|---|
Jan | 100 | 100 | 100 |
Mar | 100 | 50 | 100 |
Jun | 100 | 50 | 100 |
Another difference is that trailing_period
can handle the case where the time dimension are in different grains (e.g. month vs day), while window_avg
requires the time dimension to be in the same grain for the calculation to be meaningful.
trailing_period
are also calculated before visual filters are applied, meaning if you data was filtered to only show data from 2022, the average for the first 2 month would still be calculated over the full 3 months if available. window_avg
on the other hand will only calculate the average over the rows that are visible in the table.
Implementation
1. Create a normal aggregation metric
In the definition file of the e_commerce
dataset, we create a revenue
metric that combines fields from the two models order_items
and products
:
Dataset e_commerce {
...
metric revenue {
label: 'Revenue'
type: 'number'
definition: @aql order_items | sum(order_items.quantity * products.price) ;;
}
}
2. Create a moving average metric
2.1. Using trailing_period()
We will use the trailing_period()
function to make the revenue
metric to aggregate sales over an interval of 3 months. Here we will need to decide two things:
- The field that serves as the time dimension: Since revenue is an information about orders, we choose the field
orders.created_at
. Note that there is no need to convert the field to themonth
grain - the time grain will be decided in the interval argument. - The trailing interval: we use
interval(3 months)
to specify thatrevenue
will be aggregated for every 3-month period.
Dataset e_commerce {
...
metric revenue {
label: 'Revenue'
type: 'number'
definition: @aql order_items | sum(order_items.quantity * products.price) ;;
}
metric moving_avg_revenue_3m_tp {
label: 'trailing_period - 3M Moving Avg. Revenue'
type: 'number'
definition: @aql trailing_period(revenue, orders.created_at, interval(3 months)) / 3;;
}
}
The result:
The moving average value of the first two rows is low because we only take 1 and 2 months’ revenue and divide by 3. In visualization, we can filter them out if it makes more sense for our report.
2.1. Using window_avg()
With window_avg()
, we will need to specify the following:
- Aggregation window:
-2..0
means "from 2 previous rows to the current row" - Row order: To make sure the aggregation is calculated in the right time order, we specify
ecommerce_orders.created_at
field at theorder
argument.
Dataset e_commerce {
...
metric revenue {
label: 'Revenue'
type: 'number'
definition: @aql order_items | sum(order_items.quantity * products.price) ;;
}
metric moving_avg_revenue_3m_wa {
label: '3-month Moving Avg. Revenue (w. window_avg)'
type: 'number'
definition: @aql window_avg(revenue, -2..0, order: ecommerce_orders.created_at | month());;
}
}
The result:
The value of the first and the second row is the average of one and two rows respectively. If this does not make sense to your report, you can assign NULL to them:
Dataset e_commerce {
...
metric moving_avg_revenue_3m_wa {
label: 'window_avg - 3M Moving Avg. Revenue'
type: 'number'
definition: @aql case(
when: window_count(revenue, -2..0, order: ecommerce_orders.created_at | month()) < 3,
then: null,
else: window_avg(revenue, -2..0, order: ecommerce_orders.created_at | month())
);;
}
}
3. Visualize
As we can see from the line chart, the moving average line is much smoother than the actual revenue line:
In real-life situations where the month-by-month sales may fluctuate more wildly, moving averages will be a valuable tool for analysts to better display trends.