Transform Data

How to transform data

In Holistics, you can transform data by creating Transform Models. The general workflow is:

  1. Go to Data Modeling page, click Create -> Add Data Model from SQL
  2. Write a transformation SQL, validate & Save the model.
  3. Optional: Persist the model to the database to improve query performance, and set schedule to periodically run the transformation.

For example, we have a model called order_master in which we pre-join order_items, orders and products:

select
{{#o.id}} as order_id
, {{#o.user_id}}
, {{#o.status }}
, {{#o.order_created_date}} as order_created_date
, {{#o.order_created_month}} as order_created_month
, {{#o.delivery_attempts}}
, {{#o.discount}}
, {{#oi.quantity }}
, {{#oi.product_id}}
, {{#p.merchant_id}}
, {{#p.price}}
from {{ #ecommerce_order_items AS oi }}
left join {{ #ecommerce_orders AS o }} on {{#oi.order_id}} = {{#o.id}}
left join {{ #ecommerce_products AS p }} on {{#oi.product_id}} = {{#p.id}}

Query syntax depends entirely on the SQL flavor of your databases - Holistics only add a few syntax for model and field reference. For more details, please visit Data Modeling Syntax.

Complex transformation

In Holistics, we encourage users to refactor your complex transformations into components that can be used in several places. Holistics can recognize the dependency between your models and determine the sequence of transformation.

In the following transformation, we reuse model order_master:

with base as (
select
o.user_id
, {{#oi.orders_count}} as total_orders_count
, {{#oi.delivered_orders_count}} as delivered_orders_count
, {{#oi.gmv}} as gmv
, {{#oi.nmv}} as nmv
, min( {{#o.order_created_date}}) as first_order_date
, max( {{#o.order_created_date}}) as last_order_date
, current_date - max( {{#o.order_created_date}}) as days_from_last_order
from {{ #orders_master oi }}
left join {{ #ecommerce_orders o }} on o.id = oi.order_id
group by 1
)
select
u.id as user_id
, total_orders_count
, delivered_orders_count
, gmv
, nmv
, first_order_date
, last_order_date
, days_from_last_order
from {{#ecommerce_users u}}
left join base on u.id = base.user_id

When you run this final query, Holistics will know that the transform model order_master must be run first. If you save the query to a model called user_facts_aggr, you can also have a dependency graph of all models involved:

Changes you made in upstream parent models will be carried downstream. If in model orders_master we changed the definition of gmv field and got different values, this change will reflect to user_facts_aggr.