Using Holistics Import allows you to quickly ETL/sync data from different sources into your data-warehouse automatically.
As your source data grows everyday, incremental Import allows you to load data incrementally, loading only the data that's been changed. This reduces the need to load old/untouched data.
When Your Past Records Data Don't Change
If your past records/data don't change (example: weblogs, click streams, etc), on every run, Holistics simply figure out the new data that haven't been imported over, and perform the import.
You need to specify to Holistics what is your
increment_column. It will rely on this column
to decide what data to extract from source to load into data warehouse.
Examples of increment_column:
- ID (auto-increasing primary key)
- timestamp / created_at: timestamp at which the record is created
When Your Past Records Data Change
This is when your past records data can change over time (example would be
In this case Holistics would require you to specify an additional column called
increment_pk_key (or primary key / unique column). This
is needed to perform an upsert operation (or INSERT ON DUPLICATE OVERWRITE), where existing records in your data-warehouse will
In the above case, your table would need to contain 2 columns:
updated_at: timestamp at which your record is last updated
id: table's unique or primary key
Handling Deleted Records
In either of the above case, if you have records in source that get hard-deleted, doing incremental import will not be able to capture these deleted records and apply in the destination table. In this case there are 2 ways to handle this.
Doing Regular Full Import
You can schedule a full import every week to reload the entire table data.
Moving To Soft-Delete
The recommended ways to handle this is to move your application logic to using soft-delete (i.e just mark the record as deleted instead of actually deleting it). Usually this is implemented by:
- Adding a
deleted_atcolumn to your table, default to
NULL(not deleted). A record with non-NULL value is considered deleted.
- When a record is deleted, set both
updated_atto the current time.
With this setup, the Incremental approach as described above will work as normal.
A Note On Soft-Delete
Note that aside from serving the incremental import purpose above, soft-deletion is also our recommended way:
Operations: You don't lose track of old data - a hard-deleted record is gone forever and you would never be able to restore it. A user is accidentally deleted cannot be restored. A deleted order cannot be traced back.
Analytics: Your records don't just sit alone but it has relational with other data (a booking would have user_id). If we hard-delete the user, we won't be able to make sense of the booking data any longer.