Working with Non-additive Measures

This page explains how Holistics works with additive and non-additive measures.

What are additive measures and non-additive measures?

Non-additive measures are measures/calculation that cannot be aggregated multiple times because it will still produce incorrect results.

  • Additive measures usually involve COUNT, SUM calculations. These measures can be rolled up (aggregated) multiple times (e.g daily โ†’ weekly โ†’ monthly) and still produce correct results.
  • Non-additive measures involve calculations like COUNT DISTINCT, AVERAGE, MEDIAN. Due to the nature of calculation, these measures cannot be rolled up multiples times because it will produce incorrect results.

Let's take an example of an orders table where we're interested in reporting 2 key metrics (measures): a) Revenue, b) Active Users

Table orders {
id integer [PK]
user_id integer
created_at datetime
bigint order_value
}

You can aggregate Revenue in multiple aggregation rounds (daily โ†’ monthly). But you cannot aggregate Active Users the same way.

In other words, in the below 2 SQL statements, the calculation of monthly active users (MAUs) is incorrect.

-- Aggregate daily
CREATE TABLE orders_daily AS
SELECT
created_at::date as date_d
SUM(order_value) as daily_revenue -- correct
COUNT(DISTINCT user_id) as daily_unique_users --correct
FROM orders
GROUP BY 1;
-- Aggregate monthly
CREATE TABLE orders_monthly AS
SELECT
DATE_TRUNC(date_d, 'MONTH')::date as month_d
SUM(daily_revenue) as monthly_revenue -- correct
SUM(daily_unique_users) as monthly_unique_users --incorrect
FROM orders_daily
GROUP BY 1

MAUs cannot be calculated from DAUs, but have to get directly from raw:

SELECT
DATE_TRUNC(created_at, 'MONTH')::date as month_d
COUNT(DISTINCT user_id) as monthly_unique_users --correct
FROM orders
GROUP BY 1

In this case, Revenue is an additive metric, and Active Users is a non-additive metric.

How does Holistics modeling layer work with non-additive measures?

Holistics modeling can work well with non-additive measures when you have the raw data. Since Holistics can perform just-in-time SQL generation based on the list of fields the user select, instead of requiring pre-aggregation of data.

The above example can be modeled into Holistics modeling as:

Model orders {
type: 'table'
dimension id {
type: 'number'
}
dimension created_at {
type: 'datetime'
}
dimension user_id {
type: 'number'
}
measure unique_users {
type: 'number'
definition: @@sql COUNT(DISTINCT {{user_id}}) ;;
}
}

And when the end-user explores the data, a corresponding SQL will be generated based on the selection of the user.