In Holistics, a Transform Model (or SQL Model) is a data model created from a SQL
SELECT statements that perform data transformations on other tables/models.
Think of transform model as a view (or materialized view) in your database.
You can create Transform Model for many different purposes:
- To do data cleaning, standardizing operations from the raw data (usually stored in based models)
- To pre-aggregate large amount of data for performance improvement purpose
- Building reusable models to be used in multiple places, avoid repeating the same SQL query/logic.
At the moment, you can create Transform Models using SQL. In the future, you'll be able to do this with Python or other languages.
Creating Transform Model
Simply go to Data Modeling page, click Create -> Add Data Model from SQL
Querying other models
In your SQL query, it is recommended that you should reference directly to other data models, instead of querying the physical table. For example:
Doing this will:
- Decouple the logical and physical aspect of the database. You woulnd't be able to worry about whether there exists an
orders_mastertable in the underlying database.
- Let Holistics build a proper dependency graph between the data models, helping with data flow.
Holistics will compile the query into valid SQL syntax of your database. For more details on the syntax, please visit Data Modeling Syntax.
Persisting model's results into SQL table
When defining your Transform Model, you have the option to write (persist) the data back to a table in your SQL database. That means at scheduled interval, the models' SQL query will be executed, and write the results into a database table.
If you select this option, the Transform Model will be backed by a physical table in your data warehouse. The table will be updated with new data following a schedule of your choice. Think of this similar to "materialized view" concept in standard SQL databases.
When you refer to a persisted model in another transformation, you will be querying from the table instead of rerunning the whole transformation sequence. The final query will look like this:
- Pros: Reduce the amount of actual data scanned when you explore data from this model -> lighter load on your system, and shorter query time.
- Cons: Data can be staled or persistence schedule between models can be mismatch, which produces wrong results. (Flow-based persistence will be available in the future)
Advise: persist your transform model when
- It is an upstream model that runs slowly (due to complex query, or large amount data is scanned)
- It is a downstream model that your end users will explore frequently. This way they can have a faster exploration experience.
If you choose to not persist your Transform Model, the model will resemble a "view" in your database. When you refer to a non-persistence model, the model's full SQL will be inserted in the final SQL and you will rerun the whole transformation sequence.
When querying a "non-persisted transform model", the compiled SQL will usually contain a CTE (SQL
WITH statement). Something like:
- Pros: Have visibility of the whole transformation sequence
- Cons: Possible higher load to your database, and slower query.
Advise: keep your model not persisted when
- The transformation makes little changes to the data (mostly renaming, concatenating...)
- You need to ensure a series of data transformation use and produce absolutely up-to-date data.
Holistics uses the modeling syntax to know the dependencies between SQL models and determine the sequence of transformation.
For example, when you run the following query, Holistics knows that model
ecommerce_users must be executed first.
If you save the query to a model called
user_facts_aggr, Holistics can generate a dependency map for that model.
Changes you made in the parent model will be carried over subsequent models. If in model
orders_master we changed the definition of
gmv field and got different values, this change will reflect to
Effects when changing model structure
When editing a Transform Model, if you change the structure of the result set (add fields, remove fields...), there will be side effects:
- If your Transform Model has persistence, the persistence will be reset
- Removing/renaming a field will:
- Invalidate any custom dimensions or measures that are referring to it in their formulas
- Invalidate any relationships that are pointing to that field
- Break any Transform Model or report that refers to that field
- Merely changing a field's data type will not reset the persistence setting. However, it can cause unforeseen issues in custom dimensions, measures and models using that field.
Broken downstream models
Transform Model can be broken when the structure of an upstream model is changed. For example:
- When the model's query refers to fields/measures that are no longer available, or invalid in the upstream model.
- When the upstream model is invalid.