Model Storage Settings

Introduction

In both Holistics's Import Models and Transform Models, Holistics writes data to your data warehouse to either make it available for querying, or to improve query performance. There are four writing modes supported in Holistics:

  • Full mode: The destination table will be replaced completely by a new result set.
  • Append mode: A full snapshot of the source data will be appeneded to the destination table, and old record are left intact. This mode is available for Import Models only.
  • Incremental mode: New records will be appended to the destination table.
  • Upsert mode: A combination of UPDATE and INSERT. Records with changes will be updated, and new records will be inserted into the destination table.

These modes behave a bit differently between Import Models and Transforms Models, and we will go into the details below.

Full mode

You may want to use Full mode if your data is small, the records change regularly, and you do not need to retain the history of your data.

Note: If you are importing a large table (like events or log tables), you should avoid Full mode and use Incremental mode instead.

Mechanism:

This mode works similarly in both Import and Transform Models. Basically, Holistics will drop the old table in your destination if (if it already exists), and replace it with new data.

  • In Import Models, this means the destination table will be replaced by the most recent snapshot of your source data.
  • In Transform Model, the destination table will be replaced by the new result set produced by the model's SQL.

Append mode

You may want to use Append mode if your source records are updated directly, and you want to retain a log of those changes. In this mode, a full copy of your source table will be appended to the destination table.

Note: The Append mode is only availabe for Import Models.

Mechanism:

Holistics will get all records from the source, and insert them to the existing destination table. Old records are left untouched.

  • Pros: Can retain historical changes of your data.
  • Cons: Your destination table will become bloated quickly if the run frequency is high.

Incremental mode

This mode behaves differently between Import and Transform Models: it can reduce the amount of data scanned in Import Models, but cannot do so in Transform Models.

In Import Models

This should be used if your source table is large, and past records do not change. In this mode, only new records from the model's query will be appended to the destination table.

Note: This mode is only available for SQL data sources.

Mechanism:

To use Incremental mode, you only need to specify an increment column, and Holistics will use that column to determine the data range to be queried from the source. Depending on your case, this can be a created_at or an updated_at column.

How the import will work:

  • Step 1: get the current max value of the increment column from the destination table (let's call it max_value_of_destination)

    SELECT MAX(incremental_column) FROM dest_table
  • Step 2: Only retrieve records whose value of the increment column is greater than the current max value above:

    SELECT source_data
    FROM source_table
    WHERE incremental_col > max_value_of_destination
  • Step 3: Insert all new records retrieved in Step 2 to the destination table.

This way, the import run time is reduced significantly because Holistics only needs to scan a section of your data, not the whole source table.

In Transform Models

At the moment, Holistics Transform Model cannot dynamically limit the range of data it will query. The Incremental Mode can only determine which records to be inserted into the destination table.

How it works:

  • Step 1: get the max value of the increment column from the destination table (let's call it max_value_of_destination):

    SELECT MAX(incremental_column) FROM dest_table
  • Step 2: Run the model's SQL as-is to produce a result set. For example, we have an orders_aggregation model that counts the number of orders created daily:

    SELECT
    created_date
    , count(*) as orders_count
    FROM orders
    group by 1
  • Step 3: Holistics uses max_value_of_destination to filter the result set produced in Step 2, and insert the filtered records into the destination table.

    Basically, the process will look something like:

    INSERT INTO dest_schema.orders_aggregation
    WITH base as (
    SELECT
    created_date
    , count(*) as orders_count
    FROM orders
    group by 1
    )
    SELECT * FROM base
    WHERE created_date > {{max_value_of_destination}} -- Filter is outside of the CTE

This means the whole source table is still scanned, and the process can be potentially costly.

Upsert mode

This mode also behaves differently between Import and Transform Models.

In Import Models

In case your source table is large and past records are updated, this should be used.

  • If there are new records, those will be appended to the destination table.
  • If the existing records from the source table are updated, we will update the corresponding records in the destination table.

Note: This mode is only available for SQL data sources.

Mechanism:

To use this mode, you will need to specify both the Primary Key and the increment column. Normally they are the the id and updated_at columns.

  • Step 1: Get the max value of hte increment column from the destination table (let's call it max_value_of_destination')

    SELECT MAX(incremental_col) FROM des_table
  • Step 2: Retrieve the source records with increment column's value greater than max_value_of_destination

    SELECT primary_key, source_data
    FROM source_table
    WHERE incremental_col_source > max_value_of_destination
  • Step 3: Delete records in the destination table with primary_key existing in the result set of Step 2.

  • Step 4: Insert all new records produced in Step 2 to the destination table.

Similar to Incremental mode, this can also reduce your data import run time.

In Transform Models

Upsert mode in Transform Models shares the limitation with the Incremental mode: the WHERE filter is not applied within the initial query, but applied to the result set produced by the query.

Therefore, Upsert mode in Transform Models is still potentially costly.

What if I still want to do incremental transform?

As stated in Transform Models docs, we do not intend to make this feature a full-fledged ETL solution. Normally for more advanced data transformation use cases, we recommend our customers to checkout dedicated transformation tools like dbt or Dataform.