Skip to main content

Transform Model

Definition

When you have duplicate SQL logic in multiple reports, or a slow-running complex SQL query you want to speed up performance, one possible solution is to pre-transform the data and centralize the logic. Transform Model is designed to help with that use case.

In Holistics, a Transform Model (or SQL Model) is a data model created from a SQL SELECT statements that perform data transformations on other tables/models. Think of transform model as a view (or materialized view) in your database.

Benefits:

  • Query performance: Pre-aggregate large query that scan many rows into smaller table for performance improvement.
  • Reusability of logic: Building reusable models to be used in multiple places, avoid repeating the same SQL query/logic.

At the moment, you can create Transform Models using SQL. In the future, you'll be able to do this with Python or other languages.

Does the results of the query store back to my database? By default it doesn't store the query results into your database. However, you can turn on Persistence mode and it will be written into your database table.

Note: Holistics's Transform Model is best used for straight-forward, small scale transformation. For more advanced transformation capabilities, we recommend you leverage dedicated transformation tools.

Creating Transform Model

Create 3.0 Transform Model

To create a Transform Model in Holistics 3.0:

  1. Go to Data Modeling page and navigate to a folder that you want to place your model in.
  2. Click the (+) button next to the folder's name on the left panel, or click Create on the top right corner of the screen.
  3. Select Add Data Model from → Data Transform, and the SQL editor will appear. From here you can start writing your transformation SQL.
  4. After finishing your SQL, Run & Validate to preview your transformed data, then lick Save to finish the process.

For example, I'm writing a model to combine order_items, orders and products models into a master model so I can calculate metrics like GMV, NMV... later:

transform model

The syntax depends on your database's SQL flavor (in the example above, we used a PostgreSQL database). Therefore, you do not need to learn a totally new query language, aside from some additional Holistics syntax:

  • Model reference: {{ #model_name as alias }} instead of select directly from tables. This is required so that Holistics's engine can recognize the other models that your transformation depends on.

  • Field reference: {{ #alias.field_name }} or {{ #model_name.field_name }} . This is to ensure Holistics's engine to pickup only necessary fields, and to make use of any Calculated Fields/Measures you pre-created on any of the referenced models.

In your SQL query, it is recommended that you should reference directly to other data models, instead of querying the physical table. Doing this will:

  • Decouple the logical and physical aspect of the database. You woulnd't be able to worry about whether there exists an orders_master table in the underlying database.
  • Let Holistics build a proper dependency graph between the data models, helping with data flow.

Holistics will compile the query into valid SQL syntax of your database. For more details on the syntax, please visit Data Modeling Syntax.

Storage Settings (or Model Persistence)

When defining your Transform Model, you have the option to write (persist) the data back to a table in your SQL database. That means at scheduled intervals, the models' SQL query will be executed, and write the results into a database table.

Enable Storage Settings for Holistics 3.0

You can choose to persist your model by toggling the Enable Settings option.

To learn more about all available Storage Modes, please visit Model Storage Settings.

Enable Storage Settings

This is not an option that you should enable by default, but you should base the decision on the nature of your model.

Refer to the sections below for the differences if you choose to turn on/off this setting.

If you choose to turn on Storage Settings

The Transform Model will now be backed by a physical table in your data warehouse. The table will be updated with new data following a schedule of your choice. Think of this similar to "materialized view" concept in standard SQL databases.

When you refer to a persisted model in another transformation, you will be querying from the table instead of rerunning the whole transformation sequence. The final query will look like this:

select
field_1
, field_2
, calculation_1
from persisted_model_a
info
  • Pros: Reduce the amount of actual data scanned when you explore data from this model -> lighter load on your system, and shorter query time.
  • Cons: Data can be stale or persistence schedule between models can be mismatch, which produces wrong results. (Flow-based persistence will be available in the future)

For more details on how the persistence works, please check our docs on Storage Settings.

If you choose to turn off Storage Settings

The model will now resemble a "view" in your database. When you refer to a non-persistence model, the model's full SQL will be inserted in the final SQL and you will rerun the whole transformation sequence.

When querying a "non-persisted transform model", the compiled SQL will usually contain a CTE (SQL WITH statement). Something like:

with model_a as (
select
field_1
, field_2
, some_calculation as calculation_1
from source_table
)

select
field_1
, field_2
, calculation_1
from model_a
info
  • Pros: Have visibility of the whole transformation sequence
  • Cons: Possible higher load to your database, and slower query.

When should I enable Storage Settings for my Transform Models?

Here is the general rule of thumb to help you decide whether to turn on Storage Settings for your models:

✔️ You should turn it on when:

  • It is an upstream model that runs slowly (due to complex query, or large amount data is scanned)
  • It is a downstream model that your end users will explore frequently. This way they can have a faster exploration experience.

❌ You should not turn it on when:

  • The transformation makes little changes to the data (mostly renaming, concatenating...)
  • You need to ensure a series of data transformation use and produce absolutely up-to-date data.

Model Dependencies

Holistics uses the modeling syntax to know the dependencies between SQL models and determine the sequence of transformation.

For example, when you run the following query, Holistics knows that model order_master, ecommerce_orders and ecommerce_users must be executed first.

with base as (
select
{{ #o.user_id }}

, {{ #oi.orders_count }} as total_orders_count
, {{ #oi.delivered_orders_count }} as delivered_orders_count
, {{ #oi.gmv }} as gmv
, {{ #oi.nmv }} as nmv
, min( {{ #o.order_created_date }} ) as first_order_date
, max( {{ #o.order_created_date }}) as last_order_date
, current_date - max( {{ #o.order_created_date }}) as days_from_last_order
from {{ #orders_master oi }}
left join {{ #ecommerce_orders o }} on {{ #o.id }} = {{ #oi.order_id }}
group by 1
)

select
{{ #u.id }} as user_id
, total_orders_count
, delivered_orders_count
, gmv
, nmv
, first_order_date
, last_order_date
, days_from_last_order

from {{ #ecommerce_users u }}
left join base on {{ #u.id }} = base.user_id

If you save the query to a model called user_facts_aggr, Holistics can generate a dependency map for that model.

Changes you made in the parent model will be carried over subsequent models. If in model orders_master we changed the definition of gmv field and got different values, this change will reflect to user_facts_aggr.

Troubleshooting

Effects when changing model structure

When editing a Transform Model, if you change the structure of the result set (add fields, remove fields...), there will be side effects:

  • If your Transform Model has persistence, the persistence will be reset
  • Removing/renaming a field will:
    • Invalidate any custom dimensions or measures that are referring to it in their formulas
    • Invalidate any relationships that are pointing to that field
    • Break any Transform Model or report that refers to that field
  • Merely changing a field's data type will not reset the persistence setting. However, it can cause unforeseen issues in custom dimensions, measures and models using that field.

Broken downstream models

Transform Model can be broken when the structure of an upstream model is changed. For example:

  • When the model's query refers to fields/measures that are no longer available, or invalid in the upstream model.
  • When the upstream model is invalid.





Let us know what you think about this document :)