
dbt to Holistics Validation

Coming Soon

This feature is under development and will be coming soon!

Introduction


When developing in dbt, the typical workflow is:

  1. Apply the dbt change to a local/development database
  2. Go to the BI tool (Holistics) to check whether the new database schema breaks anything, e.g. dashboards

One of the most common errors we see is the “Column not found” error, which usually happens when a column is renamed or dropped in the database.

Manually comparing the Holistics models with the dbt models is not efficient. That’s why we’re developing an automated process to help analysts quickly detect Holistics models that are broken by dbt changes.

High-level solution

Holistics provides a CLI tool so that you can trigger the validation on your CI/CD server.

Steps:

  1. Install the Holistics CLI tool

  2. Provide the dbt manifest.json and catalog.json files to the CLI

    1. dbt writes these files to its target/ directory: manifest.json is produced by commands such as dbt run, and catalog.json by dbt docs generate
  3. Trigger the CLI validation feature

    $ holistics-cli dbt validate --manifest manifest.json \
        --catalogs catalog.json \
        --branch=dev_1
  4. Analyze the result from the script

    {
      'ecommerce_model_cities.model.aml': {
        'dbt_model': 'model.ecommerce_dbt.model_cities',
        'added': [],
        'removed': [],
        'changed': ['id']
      },
      'ecommerce_model_users.model.aml': {
        'dbt_model': 'model.ecommerce_dbt.model_users',
        'added': ['sign_up_date'],
        'removed': ['deleted_at'],
        'changed': []
      }
    }

    dbt model without Holistics models: []

    Holistics models without dbt nodes: []

    (Please note that this is not the final output structure, since we’re still developing it.)
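On a CI/CD server, the result in step 4 usually has to become a pass/fail signal. The Python sketch below shows one way to do that. It assumes the output keeps the shape shown above (which, as noted, is not final) and assumes that only `removed` and `changed` columns can break dashboards, while `added` columns are treated as safe. The function names `breaking_changes` and `ci_exit_code` are hypothetical and not part of the Holistics CLI.

```python
def breaking_changes(result: dict) -> dict:
    """Keep only the models whose dbt counterpart removed or changed columns.

    Assumes `result` has the (not yet final) shape of the sample output above.
    """
    broken = {}
    for model_file, diff in result.items():
        removed = diff.get("removed", [])
        changed = diff.get("changed", [])
        # Assumption: newly added dbt columns cannot break existing
        # dashboards, so "added" is ignored here.
        if removed or changed:
            broken[model_file] = {"removed": removed, "changed": changed}
    return broken


def ci_exit_code(result: dict) -> int:
    """Return 0 when nothing looks broken and 1 otherwise.

    In a CI script, pass the value to sys.exit() to fail the job.
    """
    return 1 if breaking_changes(result) else 0


# Sample result copied from the output above.
result = {
    "ecommerce_model_cities.model.aml": {
        "dbt_model": "model.ecommerce_dbt.model_cities",
        "added": [],
        "removed": [],
        "changed": ["id"],
    },
    "ecommerce_model_users.model.aml": {
        "dbt_model": "model.ecommerce_dbt.model_users",
        "added": ["sign_up_date"],
        "removed": ["deleted_at"],
        "changed": [],
    },
}

for model_file, cols in breaking_changes(result).items():
    print(f"{model_file}: removed={cols['removed']}, changed={cols['changed']}")
print("exit code:", ci_exit_code(result))
```

Whether new columns should ever fail a build is a policy choice; this sketch only fails on removed or changed columns.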

Detailed design

Internally, the CLI compares the dbt artifacts with the Holistics data models (obtained via the Holistics API).

For example:

  • This is how the column id is described inside dbt’s catalog.json
    "metadata": {
      "type": "table",
      "schema": "ecommerce",
      "name": "model_users",
      "database": "airy-berm-145910"
    },
    "columns": {
      "id": {
        "type": "INT64",
        "index": 1,
        "name": "id"
      }
    }
  • This is how the column id is described in Holistics as a dimension
    "data_source": {
      "id": 24914,
      "dbtype": "bigquery"
    },
    "dimensions": [
      {
        "name": "id",
        "label": "Id",
        "type": "text",
        "definition": "@sql {{ #SOURCE.id }}"
      }
    ]
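To compare the two snippets above, each side can be reduced to a map from column name to type. The Python sketch below does this. It relies on the fact that dbt’s catalog.json stores entries under "nodes", keyed by each model’s unique ID, and it assumes that a dimension’s source column can be read from the {{ #SOURCE.column }} reference in its definition. The helper names are hypothetical, and this is not the CLI’s actual code.

```python
import json
import re

# Abridged catalog.json: dbt stores catalog entries under "nodes",
# keyed by the model's unique ID.
CATALOG_JSON = """
{
  "nodes": {
    "model.ecommerce_dbt.model_users": {
      "metadata": {"type": "table", "schema": "ecommerce",
                   "name": "model_users", "database": "airy-berm-145910"},
      "columns": {"id": {"type": "INT64", "index": 1, "name": "id"}}
    }
  }
}
"""

# Abridged Holistics data model, as in the example above.
HOLISTICS_MODEL_JSON = """
{
  "data_source": {"id": 24914, "dbtype": "bigquery"},
  "dimensions": [
    {"name": "id", "label": "Id", "type": "text",
     "definition": "@sql {{ #SOURCE.id }}"}
  ]
}
"""

# Assumption: source columns are referenced as {{ #SOURCE.<column> }}.
SOURCE_REF = re.compile(r"\{\{\s*#SOURCE\.(\w+)\s*\}\}")


def dbt_columns(catalog: dict, unique_id: str) -> dict:
    """Map column name -> warehouse type for one dbt node in catalog.json."""
    columns = catalog["nodes"][unique_id]["columns"]
    return {col["name"]: col["type"] for col in columns.values()}


def holistics_columns(model: dict) -> dict:
    """Map referenced source column -> Holistics dimension type.

    Dimensions whose definition has no #SOURCE reference are skipped.
    """
    result = {}
    for dim in model.get("dimensions", []):
        for column in SOURCE_REF.findall(dim.get("definition", "")):
            result[column] = dim["type"]
    return result


catalog = json.loads(CATALOG_JSON)
model = json.loads(HOLISTICS_MODEL_JSON)
print(dbt_columns(catalog, "model.ecommerce_dbt.model_users"))  # {'id': 'INT64'}
print(holistics_columns(model))  # {'id': 'text'}
```

Keying the Holistics side by the referenced source column rather than the dimension name means a dimension that was renamed in Holistics (for example a label-friendly name) still lines up with its dbt column.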

The CLI tool matches each dbt model with its Holistics model, then compares the dbt columns against the Holistics dimensions to detect any differences.
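The matching and comparison step could look roughly like the Python sketch below. Several parts are assumptions, not documented CLI behavior: models are matched on a "schema.table" relation (on the dbt side this could come from the catalog "metadata" schema and name); `added` means columns present in dbt but not referenced by the Holistics model; `removed` means columns the Holistics model references that no longer exist in dbt; and `changed` means a type mismatch under a small, hypothetical `TYPE_MAP` from BigQuery types to Holistics dimension types. The input columns are made up to reproduce the sample output shown earlier.

```python
# Hypothetical mapping from BigQuery types to Holistics dimension types.
TYPE_MAP = {
    "INT64": "number",
    "FLOAT64": "number",
    "NUMERIC": "number",
    "STRING": "text",
    "BOOL": "truefalse",
    "DATE": "date",
    "DATETIME": "datetime",
    "TIMESTAMP": "datetime",
}


def diff_columns(dbt_cols: dict, holistics_cols: dict) -> dict:
    """Compare {column: dbt type} with {column: Holistics type}."""
    added = sorted(set(dbt_cols) - set(holistics_cols))
    removed = sorted(set(holistics_cols) - set(dbt_cols))
    changed = sorted(
        col
        for col in set(dbt_cols) & set(holistics_cols)
        # Warehouse types missing from TYPE_MAP are never flagged.
        if TYPE_MAP.get(dbt_cols[col], holistics_cols[col]) != holistics_cols[col]
    )
    return {"added": added, "removed": removed, "changed": changed}


def validate(dbt_models: dict, holistics_models: dict) -> tuple:
    """Match models on their "schema.table" relation, then diff each pair.

    dbt_models:       {unique_id: (relation, {column: type})}
    holistics_models: {model_file: (relation, {column: type})}
    Returns (report, dbt models without Holistics models,
             Holistics models without dbt nodes).
    """
    by_relation = {rel: model_file for model_file, (rel, _) in holistics_models.items()}
    report, matched, unmatched_dbt = {}, set(), []
    for unique_id, (relation, dbt_cols) in dbt_models.items():
        model_file = by_relation.get(relation)
        if model_file is None:
            unmatched_dbt.append(unique_id)
            continue
        matched.add(model_file)
        report[model_file] = {
            "dbt_model": unique_id,
            **diff_columns(dbt_cols, holistics_models[model_file][1]),
        }
    unmatched_holistics = sorted(set(holistics_models) - matched)
    return report, sorted(unmatched_dbt), unmatched_holistics


# Made-up inputs chosen to reproduce the sample output.
dbt_models = {
    "model.ecommerce_dbt.model_cities": ("ecommerce.model_cities", {"id": "INT64"}),
    "model.ecommerce_dbt.model_users": (
        "ecommerce.model_users", {"id": "INT64", "sign_up_date": "DATE"}),
}
holistics_models = {
    "ecommerce_model_cities.model.aml": ("ecommerce.model_cities", {"id": "text"}),
    "ecommerce_model_users.model.aml": (
        "ecommerce.model_users", {"id": "number", "deleted_at": "datetime"}),
}

report, no_holistics, no_dbt = validate(dbt_models, holistics_models)
print(report)
print("dbt models without Holistics models:", no_holistics)
print("Holistics models without dbt nodes:", no_dbt)
```

Matching on the warehouse relation rather than on file names keeps the sketch independent of how the Holistics model files are named; a real implementation might match differently.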

FAQs

  1. Can I use the dbt validation without using the CLI?
  • Not yet. We’re shipping the CLI first, and we will bring this to our web app once the feature is stable.

Let us know what you think about this document :)