Query Model

info

If you are familiar with dimension modeling tools like dbt or Dataform, Holistics' query models serve similar purpose to those tools, providing a light-weight way to do dimensional modeling.

For more details on when to model in dbt vs Holistics, please refer to this document. We also support dbt integration that allows you to use both dbt and Holistics seamlessly.

Introduction

When you have duplicate SQL logic in multiple reports, or a slow-running complex SQL query you want to speed up performance, one possible solution is to pre-transform the data and centralize the logic. Query Model is designed to help with that use case.

In Holistics, a Query Model (or SQL Model) is a data model created from a SQL select statements that perform data transformations on other tables/models. Think of query model as a view (or materialized view) in your database.

Benefits:

Query performance: Pre-aggregate large query that scan many rows into smaller table for performance improvement.
Reusability of logic: Building reusable models to be used in multiple places, avoid repeating the same SQL query/logic.
Maintainability of logic: Breaking down complex logic to multiple query models make it easier to understand and modify the query/logic.
Custom join logic: Allowing custom join logic by creating a query model that joins multiple models together.
Query parameters: Allows the model's logic to by dynamic based on user inputs.

tip

Store query result in the database: By default the query result is calculated at run time and is not stored in your database. To store query result, you will need to use Model Persistence.
Holistics's Query Model is best used for straight-forward, small scale transformation. For more advanced transformation features (for example, incremental transformation), we recommend using a dedicated transformation tool, such as dbt or Dataform.

Create a Query Model

To create a new query model, go to the Development, click on the + symbol next to a folder, and select Add Query Model. A new screen will appear, and from here you can write the query of the model.

After running the query and save the model, a new .model.aml file will be created.

Query Model Syntax

info

Please refer to AML Query Model Reference to learn more about all available parameters and example usage.

Below is a sample of how the Query Model syntax will look like. All of these codes will be generated when you create the model using the graphical UI, but you can also create the query model manually by writing from scratch:

Model model_name {
  type: 'query'
  label: "Model Label"
  description: ""
  owner: '[email protected]'
  data_source_name: 'data_source_name'

  models: [model_a, model_b]
  query: @sql 
    select 
      {{ #model_a.field_name }} 
    FROM {{ #model_a }};;

  dimension dimension_name_1 {
    // dimension 1 definition
    label: 'Dimension Label 1'
    type: 'text | number | date | datetime | truefalse'
    description: 'Field Description'
    hidden: true|false
    definition: @sql {{#SOURCE.column_name}};;
  }

  measure measure_name_1 {
    // measure 1 definition
    label: 'Measure Name'
    type: 'number | date | datetime'
    definition: @sql {{#SOURCE.column_name}};;
    aggregation_type: 'count | count distinct | sum | avg'
  }
}

Model Persistence

info

Please refer to Query Model Persistence document to learn more about all persistence modes and how they work.

With this feature, you can execute the model's query at scheduled intervals, and store the result in a database table.

To turn on Persistence for your Model, add persistence param to your query model:

  Model orders {
  type: 'query'
  label: "Orders"
  data_source_name: 'demodata'
  owner: '[email protected]'

  query: @sql select * FROM ecommerce.orders ;;

  // Specify persistence mode here
  persistence: FullPersistence {
      schema: 'persisted'
      on_cascade: 'rebuild'
    }
  }

  dimension id {
    label: 'Id'
    type: 'number'
  }
  // dimension ...
}

You will also need a schedules.aml file in the root folder of your AML project with the following content:

const schedules = [
  // Schedule orders model to run every 10 minutes
  Schedule { models: [orders], cron: '0,10,20,30,40,50 * * * *' }
]

Effects of Model Persistence

The following sections will describe the effects of Model Persistence on the queries generated from your models.

With Persistence

The Query Model will now be backed by a physical table in your data warehouse. The table will be updated with new data following a schedule of your choice. Think of this similar to "materialized view" concept in standard SQL databases.

When you refer to a persisted model in another transformation, you will be querying from the table instead of rerunning the whole transformation sequence. The final query will look like this:

select field_1, field_2, calculation_1 from persisted_model_a

Pros & Cons

Pros: Reduce the amount of actual data scanned when you explore data from this model -> lighter load on your system, and shorter query time.
Cons: Data can be stale or persistence schedule between models can be mismatch, which produces wrong results.

Without Persistence

The model will now resemble a "view" in your database. When you refer to a non-persistence model, the model's full SQL will be inserted in the final SQL and you will rerun the whole transformation sequence.

When querying a "non-persisted query model", the compiled SQL will usually contain a CTE (SQL WITH statement). Something like:

with model_a as (
  select
    field_1
    , field_2
    , some_calculation as calculation_1
  from source_table
)

select
  field_1
  , field_2
  , calculation_1
from model_a

Pros & Cons

Pros: Have visibility of the whole transformation sequence
Cons: Possible higher load to your database, and slower query.

When should I use Persistence?

Here is the general rule of thumb to help you decide whether to turn on Model Persistence for your models:

✔️ You should use Model Persistence when:

It is an upstream model that runs slowly (due to complex query, or large amount data is scanned)
It is a downstream model that your end users will explore frequently. This way they can have a faster exploration experience.

❌ You should not use Model Persistence when:

The transformation makes little changes to the data (mostly renaming, concatenating...)
You need to ensure a series of data transformation use and produce absolutely up-to-date data.

Query Parameters

Typically, when you filter on a query model, the filtering happens as follows:

Holistics executes the SQL of the model without filter, and returns a result set
Filter condition is applied on the result set

In many cases, this behavior is not desirable (for example, when the underlying table is large). With Query Parameters, users can insert the filtering condition directly into the model's query:

Default Behavior
With Query Parameter

# 'order_aggregated' model
select
  order_date,
  count(*) as order_count
from orders

group by 1

# Apply "Order Date = '2024-01-01'" filter from dashboard
with order_aggregated as (
  select
    order_date,
    count(*) as order_count
  from orders

  group by 1  
)

select
  order_date,
  order_count
from order_aggregated
where order_date = '2024-01-01'  # Filter outside of CTE

# 'order_aggregated' model
select
  order_date,
  count(*) as order_count
from orders
where {% filter(order_date_param) %} orders.order_date {% end %}
group by 1

# Apply "Order Date = '2024-01-01'" filter from dashboard
with order_aggregated as (
  select
    order_date,
    count(*) as order_count
  from orders
  where order_date = '2024-01-01'  # Filter inside of CTE
  group by 1  
)

select
  order_date,
  order_count
from order_aggregated

info

Please refer to Query Parameters document for more details about this feature.

Model Dependencies

Holistics uses the modeling syntax to know the dependencies between SQL models and determine the sequence of transformation.

For example, when you run the following query, Holistics knows that model order_master, ecommerce_orders and ecommerce_users must be executed first.

with base as (
  select
    {{ #o.user_id }}
    , {{ #oi.orders_count }} as total_orders_count
    , {{ #oi.delivered_orders_count }}  as delivered_orders_count
    , {{ #oi.gmv }} as gmv
    , {{ #oi.nmv }} as nmv
    , min( {{ #o.order_created_date }} ) as first_order_date
    , max( {{ #o.order_created_date }}) as last_order_date
    , current_date - max( {{ #o.order_created_date }}) as days_from_last_order
  from {{ #orders_master oi }}
  left join {{ #ecommerce_orders o }} on {{ #o.id }} = {{ #oi.order_id }}
  group by 1
)

select
    {{ #u.id }} as user_id
    , total_orders_count
    , delivered_orders_count
    , gmv
    , nmv
    , first_order_date
    , last_order_date
    , days_from_last_order

from {{ #ecommerce_users u }}
left join base on {{ #u.id }} = base.user_id

If you save the query to a model called user_facts_aggr, Holistics can generate a dependency map for that model.

Changes you made in the parent model will be carried over subsequent models. If in model orders_master we changed the definition of gmv field and got different values, this change will reflect to user_facts_aggr.

Troubleshooting

Effects when changing model structure

When editing a query model, if you change the structure of the result set (add fields, remove fields...), there will be side effects:

If your query model has persistence, the persistence will be reset
Removing/renaming a field will:
- Invalidate any custom dimensions or measures that are referring to it in their formulas
- Invalidate any relationships that are pointing to that field
- Break any query model or report that refers to that field
Merely changing a field's data type will not reset the persistence setting. However, it can cause unforeseen issues in custom dimensions, measures and models using that field.

Broken downstream models

A query model can be broken when the structure of its upstream model is changed. For example:

When the model's query refers to fields/measures that are no longer available, or invalid in the upstream model.
When the upstream model is invalid.

Introduction​

Create a Query Model​

Query Model Syntax​

Model Persistence​

Effects of Model Persistence​

With Persistence​

Without Persistence​

When should I use Persistence?​

Query Parameters​

Model Dependencies​

Troubleshooting​

Effects when changing model structure​

Broken downstream models​

Introduction

Create a Query Model

Query Model Syntax

Model Persistence

Effects of Model Persistence

With Persistence

Without Persistence

When should I use Persistence?

Query Parameters

Model Dependencies

Troubleshooting

Effects when changing model structure

Broken downstream models