Skip to main content

Handling Non-additive Metrics

Knowledge Checkpoint

A grasp of these concepts will help you understand this documentation better:

What are additive metrics and non-additive metrics?

Non-additive metrics are calculations that can only be aggregated once, and cannot be aggregated further (or it will produce nonsensical results). To be more detailed:

  • Additive metrics usually involve COUNT, SUM calculations. These metrics can be aggregated multiple times and still produce correct results.
  • Non-additive metrics involve calculations like COUNT DISTINCT, AVERAGE, MEDIAN. Due to the nature of calculation, these metrics cannot be rolled up multiples times because it will produce incorrect results.

Let's take an example of an orders table where we're interested in reporting 2 key metrics: Revenue and Active Users.

Table orders {
id integer [PK]
user_id integer
created_at datetime
bigint order_value
}

You can aggregate Revenue in multiple aggregation rounds (daily → monthly). But you cannot aggregate Active Users the same way.

In other words, in the below 2 SQL statements, the calculation of monthly active users (MAUs) is incorrect.

-- Aggregate daily
CREATE TABLE orders_daily AS
SELECT
created_at::date as date_d
SUM(order_value) as daily_revenue -- correct
COUNT(DISTINCT user_id) as daily_unique_users --correct
FROM orders
GROUP BY 1;

-- Aggregate monthly
CREATE TABLE orders_monthly AS
SELECT
DATE_TRUNC(date_d, 'MONTH')::date as month_d
SUM(daily_revenue) as monthly_revenue -- correct
SUM(daily_unique_users) as monthly_unique_users --incorrect
FROM orders_daily
GROUP BY 1

MAUs cannot be calculated from DAUs, but have to get directly from raw:

SELECT
DATE_TRUNC(created_at, 'MONTH')::date as month_d
COUNT(DISTINCT user_id) as monthly_unique_users --correct
FROM orders
GROUP BY 1

In this case, Revenue is an additive metric, and Active Users is a non-additive metric.

How does Holistics modeling layer work with non-additive metrics?

Holistics modeling can work well with non-additive metrics when you have the raw data. Since Holistics can perform just-in-time SQL generation based on the list of fields the user select, instead of requiring pre-aggregation of data.

The above example can be modeled into Holistics modeling as:

Model orders {
type: 'table'
dimension id {
type: 'number'
}
dimension created_at {
type: 'datetime'
}
dimension user_id {
type: 'number'
}
measure unique_users {
type: 'number'
definition: @@sql COUNT(DISTINCT {{user_id}}) ;;
}
}

And when the end-user explores the data, a corresponding SQL will be generated based on the selection of the user.


Let us know what you think about this document :)