Handling Nested JSON Data in Holistics
Often times when you load data from NoSQL source into Holistics, the data will be stored as JSON/nested object and nested array. This document outlines different approaches for working with them effectively.
If you use MongoDB and haven't loaded the data into a SQL database, please check out this MongoDB Reporting guide.
High-level Approach
JSON objects typically have 2 types of data structures that we need to deal with:
- Non-array JSON
- JSON array
If you are dealing with non-array JSON, the solution can be either:
- Create Custom Field that accesses the specific value inside the nested object, or
- Flatten the nested object into a separate model using Holistics SQL Query Model
If you are dealing with a JSON array, the solution is to unnest the array into a separate table, and join them along with the original table. We can do this using Holistics SQL Query Model feature as well.
Handling non-array JSON objects
With with non-array JSON, we can either:
- Create Custom Field that extracts specific values from the JSON object, or
- Flatten the nested object into a separate model using Holistics SQL Query Model.
Creating Custom Field for a specific JSON property
Using Holistics Modeling, you can create a Custom Dimension that references inside the non-array JSON column to the property you want. This field will then be exposed to others to use.
- 3.0
- 4.0
In the example below, we extract the height
field out from field size
.
{"h":14,"w":21,"uom":"cm"}
Formula for field height: ({{ #THIS.size }} ->> 'h')::numeric
Here is a step-by-step guide to create custom fields from JSON objects in Holistics 4.0.
Given the column info that contains JSON objects storing order information. Your data may look like this:
{ "customer": "Raju Kumar", "items": {"product": "coffee", "qty": 6}}
(Optional) Create a Table Model from your table.
In your model, extract JSON properties into Custom Fields using this syntax.
// Extract the first field, customer's name
dimension name {
label: 'Name'
type: 'text'
hidden: false
definition: @sql ({{ #SOURCE.info }} ->> 'customer') ;;
}
// This is used to access nested item
dimension order {
label: 'Order'
type: 'text'
hidden: false
definition: @sql ({{ #SOURCE.info }} -> 'items' ->> 'product') ;;
}
// This syntax is used to cast your data into numeric values
dimension quantity {
label: 'Quantity'
type: 'text'
hidden: false
definition: @sql ({{ #SOURCE.info }} -> 'items' ->> 'qty')::numeric ;;
}
- Your result should look like this.
Click to see full code example
// public_orders.model.aml
Model public_orders {
type: 'table'
label: 'Orders'
description: ''
data_source_name: 'demo'
dimension name {
label: 'Name'
type: 'text'
hidden: false
definition: @sql ({{ #SOURCE.info }} ->> 'customer') ;;
}
dimension order {
label: 'Order'
type: 'text'
hidden: false
definition: @sql ({{ #SOURCE.info }} -> 'items' ->> 'product') ;;
}
dimension quantity {
label: 'Quantity'
type: 'text'
hidden: false
definition: @sql ({{ #SOURCE.info }} -> 'items' ->> 'qty')::numeric ;;
}
table_name: '"public"."orders"'
}
Flattening your nested JSON into a separate table
A second approach would be writing a SQL model to hold your flattening logic, this basically creates a new, derived table from the original table with the flatten fields.
- 3.0
- 4.0
select
{{#inv.id}},
({{#inv.size}} ->> 'w')::numeric as width,
({{#inv.size}} ->> 'h')::numeric as height,
({{#inv.size}} ->> 'uom') as unit
from {{ #inventories_proper inv }}
Given that your orders table contains a column info that is of JSON type storing order information. Let's flatten order_info column into a separate Query Model (Query Model), order_info.
Create a Table Model from your table if you haven't already.
(Important) By default, the auto-generated JSON field will have the type composite. As Holistics 4.0 does not support this data type at the moment, change the type of your JSON field into text.
Let's build a Query Model (Query Model) from this Table Model. The syntax for this model is as below.
select
({{ #public_orders1.info }} ->> 'customer') as name,
({{ #public_orders1.info }} -> 'items' ->> 'product')as product,
({{ #public_orders1.info }} -> 'items' ->> 'qty')::numeric as quantity
from {{ #public_orders1 }};;You should be able to successfully create the
order_info
Query Model that contains the flattened JSON object.
Click to see full code example
// order_info.model.aml
// This is a Query Model
import 'path/to/table/model/in/step/one' { public_orders as model__public_orders }
Model order_info {
type: 'query'
label: 'Order Info'
description: ''
data_source_name: 'elephant'
dimension name {
label: 'Name'
type: 'text'
hidden: false
definition: @sql {{ #SOURCE.name }};;
}
dimension product {
label: 'Product'
type: 'text'
hidden: false
definition: @sql {{ #SOURCE.product }};;
}
dimension quantity {
label: 'Quantity'
type: 'number'
hidden: false
definition: @sql {{ #SOURCE.quantity }};;
}
owner: '[email protected]'
query: @sql
select
({{ #public_orders.info }} ->> 'customer') as name,
({{ #public_orders.info }} -> 'items' ->> 'product')as product,
({{ #public_orders.info }} -> 'items' ->> 'qty')::numeric as quantity
from {{ #public_orders }};;
models: [
model__public_orders
]
}
Working with array JSON field
When you have an array JSON field, what we should do is to normalize the data, by bringing the JSON array into a proper relational table.
- 3.0
- 4.0
For example, you have a table model containing blog posts and the comments made to it. The tags
field is a JSON array:
Let's say you want to know "How many blog posts are the for each tag?".
In this case, you will need to write a SQL model (let's call it posts_tags_unnested
) to unnest the array so that each tag value is placed on a separate row. Post ID values will be repeated:
select
{{#p.id}} as post_id,
tags_unnested #>>'{}' as tags
from
{{ #posts p}},
json_array_elements({{#p.tags}}) as tags_unnested
Add relationships, and then create a dataset out of the original model and the newly created SQL model. Note that the relationship between posts.id
and posts_tags_unnested.post_id
is many to one:
To answer the question: "How many blog posts there are for each tag?", simply drag in the unnested Tags field and count distinct [Post ID] at the Dataset Explore screen.
Here's the full animated video of the steps:
Given the column info that contains JSON objects storing information of people's posts. Your data may look like this:
'{ "author": "Helen", "tags": ["home-grooming","beauty"]}’
'{ "author": "Mary", "tags": ["healthy","lifestyle"]}’
'{ "author": "Bob", "tags": ["healthy","lifestyle"]}’
Note that this JSON has a tags
field that contains a JSON array. Let's transform the data so that we can answer the question: "How many blog posts there are for each tag?".
Create a Table Model from your table if you haven't already.
(Important) By default, the auto-generated JSON field will have the type composite. As Holistics 4.0 does not support this data type at the moment, change the type of your JSON field into text.
Next, extract the JSON array from the tags field into a Custom Dimension using this syntax below.
dimension tags {
label: 'Tags'
type: 'text'
hidden: false
definition: @sql
select
({{ #SOURCE.info }} -> 'tags') as tags;;
}Let's build a Query Model (Query Model) to unnested the values in the JSON array of the newly-created dimension above. The syntax for this model is as below.
select
{{ #public_posts.id }} as post_id,
tags_unnested #>>'{}' as tags
from
{{ #public_posts }},
json_array_elements({{ #public_posts.tags }} :: json) as tags_unnestedFinally, create a Dataset that contains the Query Model created in step 5
post_info_unnested.model.aml
and the original Table Model in step 1public_posts.model.aml
.Then, set-up the relationship between
posts.id
andposts_info_unnested.post_id
to be many to one by adding this RelationshipConfig into the Dataset definition.relationships: [
RelationshipConfig {
rel: Relationship {
type: 'many_to_one'
from: ref('public_posts','id')
to: ref('post_info_unnested','post_id')
}
active: true
}
]Let's see this in action. To answer the question: "How many blog posts there are for each tag?", simply drag in the unnested Tags field and count distinct [Post ID] at the Dataset Explore screen.
Click to see full code example for the Table Model used in the tutorial
// public_post.model.aml
// This is the Table Model
Model public_posts {
type: 'table'
label: 'Posts'
description: ''
data_source_name: 'elephant'
dimension id {
label: 'Id'
type: 'number'
hidden: false
definition: @sql {{ #SOURCE.id }};;
}
dimension info {
label: 'Info'
type: 'text'
hidden: false
definition: @sql {{ #SOURCE.info }};;
}
dimension tags {
label: 'Tags'
type: 'text'
hidden: false
definition: @sql
select
({{ #SOURCE.info }} -> 'tags') as tags;;
}
owner: '[email protected]'
table_name: '"public"."posts"'
}
Click to see full code example for the Query Model used in the tutorial
// post_info_unnested.model.aml
// This is the Query Model
import '/path/to/public_posts.model.aml' { public_posts as model__public_posts }
Model post_info_unnested {
type: 'query'
label: 'Post Info Unnested'
description: ''
data_source_name: 'elephant'
dimension post_id {
label: 'Post Id'
type: 'number'
hidden: false
definition: @sql {{ #SOURCE.post_id }};;
}
dimension tags {
label: 'Tags'
type: 'text'
hidden: false
definition: @sql {{ #SOURCE.tags }};;
}
owner: '[email protected]'
query: @sql
select
{{#p.id}} as post_id,
tags_unnested #>>'{}' as tags
from
{{ #public_posts p }},
json_array_elements({{#p.tags}}) as tags_unnested;;
models: [
model__public_posts
]
}
Click to see full code example for the Dataset used in the tutorial
// posts.dataset.aml
// This is the Dataset
import '/path/to/public_posts.model.aml' { public_posts }
import '/path/to/post_info_unnested.model.aml' { post_info_unnested }
Dataset posts {
label: 'Posts'
description: ''
data_source_name: 'elephant'
models: [
post_info_unnested,
public_posts
]
relationships: [
RelationshipConfig {
rel: Relationship {
type: 'many_to_one'
from: ref('public_posts','id')
to: ref('post_info_unnested','post_id')
}
active: true
}
]
owner: '[email protected]'
}