Skip to main content

Moving Average

Knowledge Checkpoint

A grasp of these concepts will help you understand this documentation better:

Introduction

Moving Average is a powerful analytical tool used to help identifying trends in time series data. By averaging the values of a variable over a rolling window, we can smooth out fluctuations and help the actual trend stands out better.

In this tutorial, we will create a report that visualizes both the monthly revenue and the 3-month-moving average revenue of an E-commerce business using the trailing_period() and window_avg() functions in Holistics.

Setup

In this guide, we will use the familiar e_commerce dataset:

// orders.model.aml
Model orders {
...
dimension id {...}
dimension user_id {...}
dimension created_at {...}
}

// order_items.model.aml
Model order_items.model.aml {
...
dimension order_id {...}
dimension product_id {...}
dimension quantity {...}
}

// products.model.aml
Model products {
...
dimension id {...}
dimension price {...}
dimension categor_id {...}
}

// e_commerce.dataset.aml
Dataset e_commerce {
...
models: [orders, order_items, products]
relationships: [
relationship(order_items.order_id > orders.id, true),
relationship(order_items.product_id > products.id, true)
]
}

High-level Flow

  1. Create a normal aggregation metric: We will create a revenue metric in the dataset which will be used later in the definition of the moving metric.
  2. Create a moving average metric:
  • Using the trailing_period() Function: This function turns the revenue metric into a moving aggregation that sums revenues over intervals of 3 months. The sum is then divided by 3 to obtain the average value.

  • Using the window_avg() Function: This function performs a secondary average over the calculated revenue metric. We must ensures that the revenue metric is sliced (grouped) and ordered by the Created Month dimension, with a frame of previous 2 rows and the current row.

  1. Visualize
info

The key difference between trailing_period() and window_avg() is that, trailing_period() works at the time-series level, and ensures that the aggregation is done over a specified time window, regardless the table rows. You can see more details about how it works here.

On the other hand, window_avg() works at the table rows level, and we must ensure that the table rows contain the correct data for the secondary aggregation to be meaningful. You can see more details about how it works here.

Thus,trailing_period can handle gap in data. A trailing period measure of 2 months will use the calendar months to calculate the moving average. On the other hand, the window_avg function will use the previous row and the current row to calculate the moving average.

For example, in the following table with gaps in data, the trailing_period function will return different results than the window_avg function:

MonthRevenuetrailing_periodwindow_avg
Jan100100100
Mar10050100
Jun10050100

Another difference is that trailing_period can handle the case where the time dimension are in different grains (e.g. month vs day), while window_avg requires the time dimension to be in the same grain for the calculation to be meaningful.

trailing_period are also calculated before visual filters are applied, meaning if you data was filtered to only show data from 2022, the average for the first 2 month would still be calculated over the full 3 months if available. window_avg on the other hand will only calculate the average over the rows that are visible in the table.

Implementation

1. Create a normal aggregation metric

In the definition file of the e_commerce dataset, we create a revenue metric that combines fields from the two models order_items and products :

Dataset e_commerce {
...

metric revenue {
label: 'Revenue'
type: 'number'
definition: @aql order_items | sum(order_items.quantity * products.price) ;;
}
}

2. Create a moving average metric

2.1. Using trailing_period()

We will use the trailing_period() function to make the revenue metric to aggregate sales over an interval of 3 months. Here we will need to decide two things:

  • The field that serves as the time dimension: Since revenue is an information about orders, we choose the field orders.created_at. Note that there is no need to convert the field to the month grain - the time grain will be decided in the interval argument.
  • The trailing interval: we use interval(3 months) to specify that revenue will be aggregated for every 3-month period.
Dataset e_commerce {
...

metric revenue {
label: 'Revenue'
type: 'number'
definition: @aql order_items | sum(order_items.quantity * products.price) ;;
}

metric moving_avg_revenue_3m_tp {
label: 'trailing_period - 3M Moving Avg. Revenue'
type: 'number'
definition: @aql trailing_period(revenue, orders.created_at, interval(3 months)) / 3;;
}
}

The result:

The moving average value of the first two rows is low because we only take 1 and 2 months’ revenue and divide by 3. In visualization, we can filter them out if it makes more sense for our report.

2.1. Using window_avg()

With window_avg(), we will need to specify the following:

  • Aggregation window: -2..0 means "from 2 previous rows to the current row"
  • Row order: To make sure the aggregation is calculated in the right time order, we specify ecommerce_orders.created_at field at the order argument.
Dataset e_commerce {
...

metric revenue {
label: 'Revenue'
type: 'number'
definition: @aql order_items | sum(order_items.quantity * products.price) ;;
}

metric moving_avg_revenue_3m_wa {
label: '3-month Moving Avg. Revenue (w. window_avg)'
type: 'number'
definition: @aql window_avg(revenue, -2..0, order: ecommerce_orders.created_at | month());;
}

}

The result:

The value of the first and the second row is the average of one and two rows respectively. If this does not make sense to your report, you can assign NULL to them:

Dataset e_commerce {
...

metric moving_avg_revenue_3m_wa {
label: 'window_avg - 3M Moving Avg. Revenue'
type: 'number'
definition: @aql case(
when: window_count(revenue, -2..0, order: ecommerce_orders.created_at | month()) < 3,
then: null,
else: window_avg(revenue, -2..0, order: ecommerce_orders.created_at | month())
);;
}

}

3. Visualize

As we can see from the line chart, the moving average line is much smoother than the actual revenue line:

In real-life situations where the month-by-month sales may fluctuate more wildly, moving averages will be a valuable tool for analysts to better display trends.


Let us know what you think about this document :)