dbt to Holistics Validation
This feature is under development and will be coming soon!
Introduction
When developing in dbt, the typical workflow is:
- Apply the dbt change to a local/development database
- Go to the BI tool (Holistics) to check whether the new database schema breaks anything, e.g. dashboards
One of the most common errors that we see is the “Column not found” error. This usually happens when a column is renamed or dropped in the database.
Manually comparing the Holistics models with the dbt models is inefficient. That’s why we’re developing an automated process to help analysts quickly detect Holistics models that are broken by dbt changes.
High-level solution
Holistics provides a CLI tool so that you can trigger the validation on your CI/CD server.
Steps:
1. Install the Holistics CLI tool
2. Provide the dbt manifest.json and catalog.json files to the CLI
   - These files are generated by dbt using dbt run and dbt docs generate
3. Trigger the CLI validation feature
   $ holistics-cli dbt validate --manifest manifest.json \
       --catalogs catalog.json \
       --branch=dev_1
4. Analyze the result from the script
{
  'ecommerce_model_cities.model.aml':
  {
    'dbt_model': 'model.ecommerce_dbt.model_cities',
    'added': [],
    'removed': [],
    'changed': ['id']
  },
  'ecommerce_model_users.model.aml':
  {
    'dbt_model': 'model.ecommerce_dbt.model_users',
    'added': ['sign_up_date'],
    'removed': ['deleted_at'],
    'changed': []
  }
}
dbt model without Holistics models: []
Holistics models without dbt nodes: []
(Please note that this is not the final structure, since we’re still developing it.)
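As a rough illustration of the "analyze the result" step, the Python sketch below scans a result of the shape shown above and collects the columns that are most likely to break existing reports. It assumes the result has already been loaded into a Python dict, and it treats removed and changed columns as breaking while ignoring added ones. Both are assumptions made for this example, and the helper name find_breaking_changes is hypothetical, not part of the CLI. Since the output structure is not final, the keys may differ in the released version.

```python
def find_breaking_changes(result):
    """Map each Holistics model file to its removed or changed columns.

    Assumption: 'removed' and 'changed' columns can break reports,
    while 'added' columns are harmless.
    """
    breaking = {}
    for model_file, diff in result.items():
        columns = sorted(set(diff["removed"]) | set(diff["changed"]))
        if columns:
            breaking[model_file] = columns
    return breaking


# Sample result copied from the example output in this document.
result = {
    "ecommerce_model_cities.model.aml": {
        "dbt_model": "model.ecommerce_dbt.model_cities",
        "added": [],
        "removed": [],
        "changed": ["id"],
    },
    "ecommerce_model_users.model.aml": {
        "dbt_model": "model.ecommerce_dbt.model_users",
        "added": ["sign_up_date"],
        "removed": ["deleted_at"],
        "changed": [],
    },
}

breaking = find_breaking_changes(result)
for model_file, columns in breaking.items():
    print(f"{model_file}: check columns {', '.join(columns)}")
```

In a CI job, a script like this could exit with a non-zero status whenever the returned dict is not empty, so that the pipeline stops before the dbt change reaches production.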
Detailed design
Internally, the CLI compares the dbt artifacts with the Holistics data models (obtained via the Holistics API).
For example:
- This is how the column id is described inside the catalog.json of dbt:
  "metadata": {
    "type": "table",
    "schema": "ecommerce",
    "name": "model_users",
    "database": "airy-berm-145910"
  },
  "columns": {
    "id": {
      "type": "INT64",
      "index": 1,
      "name": "id"
    }
  }
- This is how the column id is described in Holistics as a dimension:
  "data_source": {
    "id": 24914,
    "dbtype": "bigquery"
  },
  "dimensions": [
    {
      "name": "id",
      "label": "Id",
      "type": "text",
      "definition": "@sql {{ #SOURCE.id }}"
    }
  ]
The CLI tool matches the dbt model with the Holistics model, then it compares the columns and the dimensions to detect any differences.
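The column comparison described above can be sketched in a few lines of Python. This is not the CLI's actual implementation. It assumes that "added" means a column exists in dbt but has no Holistics dimension, "removed" means a dimension no longer has a matching dbt column, and "changed" means the types disagree. The function name diff_columns and the type mapping WAREHOUSE_TO_HOLISTICS_TYPE are hypothetical, and the mapping is deliberately tiny. How a dbt model is matched to a Holistics model is left out.

```python
# Hypothetical mapping from warehouse column types to Holistics dimension
# types. The real CLI may use a different and more complete table.
WAREHOUSE_TO_HOLISTICS_TYPE = {
    "INT64": "number",
    "STRING": "text",
}


def diff_columns(dbt_node, holistics_model):
    """Compare one dbt catalog node with one Holistics model."""
    # Column name -> expected Holistics type (None when the type is unmapped).
    dbt_types = {
        name: WAREHOUSE_TO_HOLISTICS_TYPE.get(col["type"].upper())
        for name, col in dbt_node["columns"].items()
    }
    # Dimension name -> type currently declared in Holistics.
    holistics_types = {
        dim["name"]: dim["type"] for dim in holistics_model["dimensions"]
    }

    added = sorted(dbt_types.keys() - holistics_types.keys())
    removed = sorted(holistics_types.keys() - dbt_types.keys())
    # Unmapped warehouse types are skipped rather than reported as changed.
    changed = sorted(
        name
        for name in dbt_types.keys() & holistics_types.keys()
        if dbt_types[name] is not None
        and dbt_types[name] != holistics_types[name]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Inputs taken from the two snippets in this section.
dbt_node = {
    "columns": {"id": {"type": "INT64", "index": 1, "name": "id"}},
}
holistics_model = {
    "dimensions": [
        {
            "name": "id",
            "label": "Id",
            "type": "text",
            "definition": "@sql {{ #SOURCE.id }}",
        }
    ],
}

print(diff_columns(dbt_node, holistics_model))
# {'added': [], 'removed': [], 'changed': ['id']}
```

With the two snippets from this section as input, the id column is reported as changed, because the sketch maps INT64 to a number type while the Holistics dimension is declared as text.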
FAQs
- Can I use the dbt validation without using the CLI?
- Not for now. We’re shipping the CLI first, but we will definitely bring this to our web app once the feature is stable.