Skip to main content

Calculate Running Total (Cumulative Total) using Holistics

info

Holistics has supported a native feature to calculate Running Total. Check out our documentation here.

Introduction

A running total is a common metric to gain insight into how an amount has accumulated over time. In this article, we'll show you how to set up cumulative number (running total) in Holistics.

Intro

Use Case

We'll use a simple table running_total_demo_data that contains 3 fields: date, sale, and product. There are 1379 rows, dates range from Jan 2016 to Jun 2020 with no sales made on certain days, prices are random positive values, and the product is assigned randomly between A/B/C.

Use case

Given the above demo data, we want to build 2 reports:

  • Running total of all sales (unsegmented).
  • Running total of sales broken down by individual product (segmented).

Running Total

High-level Solution

Currently, Holistics doesn't support native running total calculation, so we need to do some transformations using SQL queries to calculate the running totals.

We will create two query models that calculate running totals as follow:

  • We create rt_all (ordering, date, sales, running_all) where running_all is the cumulative sum of all products' sales.

    Running All

  • We create rt_products (ordering, date, product, sales, running_by_product) where running_by_product is the cumulative sum of each product's sales.

    Running By Product

To perform the cumulative calculation, in this guide we'll be using the self-join technique instead of window functions technique. Read on for more details.

Transforming Data

  1. First, we'll prep the data by adding a numbered ordering field to each row and have rt_prepped model.

  2. To calculate running total, we self-join the model back on itself, using the ordering and product fields as the join conditions to calculate the cumulative field running_total and save it to new models as below.

    1. We build rt_all model: running total by date
    2. We build rt_products model: running total by product and date

    Data model diagram

Adding ordering column

Our first step will be to create an ordering column. Besides being simpler for a lot of JOIN-related transformations, it's also easier for humans to read (especially when timestamps between transactions are too similar).

We create a Transform Model (which we'll name rt_prepped) with this simple code:

select
row_number () over () as ordering,
date,
sales,
product
from
{{ #running_total_demo_data}}

ROW_NUMBER() written in this way generates a sequential column that numbers each row. You can add an ORDER BY argument within the OVER() clause if you have another column that you would like to order it by.

Add Ordering

Note: If you are recording both positive and negative value transactions, you might want to use a DATE_PART() function to extract the dates based on the intervals of your choice (e.g. 'month' and/or 'year'), SUM() the values, and then group them by your chosen interval so that there is only one entry per interval. You will also have to pre-filter the segments you wish to exclude or segment your report by.

Building Running total by Date (Unsegmented)

Now, let's add a new cumulative column to tell people how total prices of all products do we have up to a particular day and save it to rt_all model

Running All

Query:

We are simply joining each row on its precedent and summing price totals to create our running_all field. Creating the ordering field makes my first JOIN condition easier. Without it, you might need to create a joining key.

SELECT
t1.ordering,
t1.date,
t1.sales,
sum(t2.sales) as running_all
FROM
{{ #rt_prepped t1}}
INNER JOIN {{ #rt_prepped t2}} ON t1.ordering>= t2.ordering

GROUP BY
t1.ordering,
t1.date,
t1.sales
ORDER BY
t1.id

Building Running total by Product & Date (Segmented)

Sometimes, we need to calculate the running total based on date by product.

Running By Product

In SQL, this might not be ideal as it would create separate running totals for each product type and you will have to do extra work to create an overall running total in your visualizations, but with Holistics it can actually cut down on your work!

Let's add second JOIN conditions to segment our data by the product field and then proceed.

SELECT
t1.ordering,
t1.date,
t1.product,
t1.sales,
sum(t2.sales) as running_by_product
FROM
{{ #rt_prepped t1}}
INNER JOIN {{ #rt_prepped t2}} ON t1.ordering>= t2.ordering AND t1.product = t2.product

GROUP BY
t1.ordering,
t1.product,
t1.date,
t1.sales
ORDER BY
t1.ordering

Both segmented and unsegmented models have their time and place when it comes to generating reports.

Caveat: Handle Days With No Data

There's a problem with the above models: days with no sales are not shown in the result table (e.g. no sales between Jan 07, 2016 and Jan 31, 2016). This is considered a bug in our query, and needs to be addressed. We do want that particular day to report 0 sales, instead of missing out on the value completely.

We fix this by creating a date dimensions model which generates a list of all dates between [min(date), max(date)] (using PostgreSQL's generate_series function) and then FULL JOIN it to the base model on the date/datetime field. You then further clean the data by either changing your price and other fields with CASE WHEN conditions to populate them with 0/'NIL' where appropriate.

Now, let's edit rt_prepped model with a more complicated SQL query as below:

with date_range as(
select
generate_series( min({{ #a.date}}), max({{ #a.date}}),'1d')::date as dates
from
{{ #running_total_demo_data a}}
)

select
row_number () over () as ordering,
date_range.dates as date,
coalesce(t1.product,'A') as product,
coalesce(t1.sales,0) as sales
FROM
{{ #running_total_demo_data t1}}
full join date_range on t1.date=date_range.dates

Visualizations and Reporting in Holistics

Let's walk through how to present the data using different types of charts.

Area charts

For a simple area chart, drag the Date field into the "X-Axis" area, and click on it to select your interval. We've chosen "Quarter".

Drag your segment Product into the "Legend area.

Drag your Running By Product field to your "Y-Axis" area, and click on it and select "Max".

As stated above, if you are recording both positive and negative transactions, you will have to pre-group and sum your transactions by your chosen interval, or else the visualization might be inaccurate.

Under "Styles", enable "Stack Series".

Area Chart Segment

Of course, you can always use the unsegmented version of your table.

Area Chart unSegment

Displaying using Pivot Tables

For segmented data in a pivot table, drag your Date to "Rows", your segments to "Columns" (in order of hierarchy), and Running All to values set to Max. Go to "Styles" to enable your Row Totals. Now you have all your running totals by segments and as a whole.

Pivot tables Segment

As with the Area Chart, if you want the unsegmented running total without restriction on your Date interval, either prep your data accordingly or use the unsegmented data.

Pivot tables UnSegment

Combination Chart

Just like the area chart, Date as your "X-Axis", segments in your legend, and "Max" Running Total as one of your Y-Axes with any other relevant metrics in other Y-Axes.

Combination Chart

Dashboard Filters

You can also create both selective and global filters on your Dashboard with a few clicks. This allows your business user to see only the data that is relevant to them without needing to run to an analyst to generate a whole new SQL query for each new report.

For our example, we've created a "Product" filter that affects only the segmented charts, and a global "Date" filter.

Dashboard

Summary

As you saw, in this guide we have shown you how to do a simple cumulative report using the power and flexibility of Holistics data modeling and visualization. We address a few points:

  • Calculate running total without window functions. Instead of that, INNER JOIN data model back on itself.
  • Avoid empty-row days with generated date ranges
  • The accumulating sums don't only sum for the selected date period. It also takes into account the running total from the beginning of time.

What's Ahead

Holistics is developing its own language which can be used in conjunction with SQL to simplify many business operations. For example, when our language is developed, an analyst will simply need to use this expression to create a running total column:

Model orders {
field cumu_sum = running_total(sum(revenue))
}

Until then, Holistics continues to cut down on an analysts' workload by modularization of queries and auto-generating complicated queries for the business user who only has to drag, drop, and play with filters.


Let us know what you think about this document :)