Handling nested JSON data in Holistics

Often times when you load data from NoSQL source into Holistics, the data will be stored as JSON/nested object and nested array. This document outlines different approaches for working with them effectively.

info

This guide is intended to be database-agnostic, but for simplicity's sake, we will use PostgreSQL syntax for demonstration. For other syntaxes: BigQuery and MySQL

If you use MongoDB and haven't loaded the data into a SQL database, please check out this MongoDB Reporting guide.

High-level Approach

JSON objects typically have 2 types of data structures that we need to deal with:

  • Non-array JSON
  • JSON array

If you are dealing with non-array JSON, the solution can be either:

  • Creating Custom Field that accesses the specific value inside the nested object, or
  • Flatten the nested object into a separate table using Holistics SQL Transform Model

If you are dealing with a JSON array, the solution is to unnest the array into a separate table, and join them along with the original table. We can do this using Holistics SQL Transform Model feature as well.

Handling non-array JSON objects

Creating Custom Field for a specific JSON property

Using Holistics Modeling, you can create a Custom Dimension that references inside the non-array JSON column to the property you want. This field will then be exposed to others to use.

In the example below, we extract the height field out from field size

{"h":14,"w":21,"uom":"cm"}

Formula for field height: ({{ #THIS.size }} ->> 'h')::numeric

Flattening your nested JSON into a separate table

A second approach would be writing a SQL model to hold your flattening logic, this basically creates a new, derived table from the original table with the flatten fields.

select
{{#inv.id}},
({{#inv.size}} ->> 'w')::numeric as width,
({{#inv.size}} ->> 'h')::numeric as height,
({{#inv.size}} ->> 'uom') as unit
from {{ #inventories_proper inv }}

Working with array JSON field

As mentioned, when you have an array JSON field, what we should do is to normalize the data, by bringing the JSON array into a proper relational table.

For example, you have a table model containing blog posts and the comments made to it. The tags field is a JSON array:

Let's say you want to know "How many blog posts are the for each tag?".

In this case, you will need to write a SQL model (let's call it posts_tags_unnested) to unnest the array so that each tag value is placed on a separate row. Post ID values will be repeated:

select
{{#p.id}} as post_id,
tags_unnested #>>'{}' as tags
from
{{ #posts p}},
json_array_elements({{#p.tags}}) as tags_unnested

Add relationships, and then create a dataset out of the original model and the newly created SQL model. Note that the relationship between posts.id and posts_tags_unnested.post_id is many to one:

To answer the question: "How many blog posts there are for each tag?", simply drag in the unnested Tags field and count distinct [Post ID] at the Dataset Explore screen.

Here's the full animated video of the steps: