Skip to main content

Transforming data

After you import data from your sources, data is still in "raw" state and needs to be transformed to be usable in reports and explorations. For example:

  • You imported data from MongoDB, and the data you need is still in a nested field
  • You imported data from two financial platforms of your company, and need to combine them to have a full view of your customer activities.

This guide will walk you through the steps to transform your data with Holistics's Query Model.

What is a Query Model?

A Query Model is simply a SELECT statement that performs data transformation, coupled with mechanisms to add metadata and schedule your transformation jobs.

query model

At the moment, only Analysts and Admins can create Query Models.

How to transform data?

To begin, please follow these steps:

  1. Go to Data Modeling page and navigate to a folder that you want to place your model in.
  2. Click the (+) button next to the folder's name on the left panel, or click Create on the top right corner of the screen.
  3. Select Add Data Model from → Data Transform, and the SQL editor will appear. From here you can start writing your transformation SQL.
  4. After finishing your SQL, Run & Validate to preview your transformed data, then lick Save to finish the process.

For example, I'm writing a model to combine order_items, orders and products models into a master model so I can calculate metrics like GMV, NMV... later:

query model

The syntax depends on your database's SQL flavor (in the example above, we used a PostgreSQL database). Therefore, you do not need to learn a totally new query language, aside from some additional Holistics syntax:

  • Model reference:

    {{ #model_name as alias }}
    instead of select directly from tables. This is required so that Holistics's engine can recognize the other models that your transformation depends on.

  • Field reference:

    {{ #alias.field_name }
    or
    {{ #model_name.field_name }}
    . This is to ensure Holistics's engine to pickup only necessary fields, and to make use of any Calculated Fields/Measures you pre-created on any of the referenced models.

query model

Schedule your transformation

You can also set schedules to periodically write your transformation result to the database to improve later queries' performance by following these steps:

  1. Toggle on the Storage Setting at the top right corner
  1. Configure settings for the transformation, like destination, run frequency or transformation mode on the Storage Settings modal.

    For more detailed information, please refer to our docs: Storage Settings

storage settings

Reuse Calculated Fields / Measures in transformations

In Calculated Fields and Measures guide, we have already known that they are virtual fields created on a model to combine existing fields/measures. These fields and measures are normally used in the drag-and-drop exploration interface, but you can also use them to make your analytics code cleaner.

For example, in model order_item_transform and ecommerce_orders,I have created a few measures like orders_count and delivered_orders_count as follows:

//orders_count
count({`{#THIS.id}`})

// delivered_orders_count
count(case when {`{#THIS.status}`} = 'delivered' then {`{#THIS.id}`} else null end)

Now I want to write a Query Model to aggregate some statistics of each user. Normally I will need to write the whole CASE ... WHEN ... clause again, but here I only need to refer to the easure's name:

measures

Calculated Fields can also be used in the same way:

calculated fields

Holistics's SQL generation engine will parse the field and measure reference, and generate the appropriate query to be run against your database.

Notes:

  • Measures and Calculated Fields are simply SQL snippets, so when you use them in queries you have to give them a name (alias), or the resulted column will be named by your database's engine.
  • Measures are still aggregations (SUM, COUNT, MAX, MIN...) so you still need the GROUP BY clause at the end of your query.

Calculated Fields/Measures reference in your query is a very powerful tool, especially when you want to reuse your logic and improve your analytics code readability. In this article, we only briefly see what it can do. For more use cases, you can check the following articles:

  • How to model your MongoDB data
  • How to work with Pipedrive custom fields
  • Event modeling with Snowplow

See you in the next guide!


Let us know what you think about this document :)