Skip to main content

Automated dbt Integration with Holistics via GitHub Actions

Knowledge Checkpoint

We think it would be best if you have a clear understanding of these concepts before reading this documentation:

To get dbt’s metadata into Holistics, we have to manually upload the manifest.json file using Holistics CLI. This is described in details here.

To automate this process, we can integrate it as part of a CI/CD workflow such as Github Actions.

Prerequisites

  • We will assume that you are already familiar with the tools mentioned in this guide. Refer to Knowledge Checkpoint box at the top of the guide for the list of necessary concepts/ tools.
  • You must already have a working dbt project, with valid database credentials. In this guide, we will use BigQuery, but the approach remains the same with other databases
  • You must already have a Holistics API Token. If you are unsure how, please refer to the Getting Started guide.

Outcome

Upon reading this guide, you should be able to create a Github Actions workflow that generates manifest.json file from the main/master branch of your dbt project, and uploads this file to Holistics using Holistics CLI. This workflow can be triggered:

  • manually
  • with a fixed schedule
  • or by a Pull Request’s merge event.

Step-by-step Guide

Example of a dbt profile that we use this guide:

holistics-bigquery:
target: prod

outputs:
prod:
type: bigquery
method: service-account
project: <your-bq-project-name>
dataset: dbt
threads: 32
timeout_seconds: 300
priority: interactive
keyfile: "specify_your_keyfile_path_here"

You can either use Service Account File by specifying a keyfile or use Service Account JSON on production environment.

Prepare necessary credentials with Action secrets

To use sensitive data like credentials in a workflow, the recommended way is to create and store them as Action secrets. Refer to this guide for more information.

Below is the list of credentials required for this workflow:

  • Your data-warehouse credentials.
  • Holistics API Token. This is used to authenticate your requests to Holistics. Read more about them in the Getting Started guide.

GitHub Action Secret

Create your workflow file

  1. In your repository, create the .github/workflows/ directory to store your workflow files.
  2. In the .github/workflows/ directory, create a new file called auto_dbt_integration.yml. Let’s define our workflow in this file.

GitHub Workflow File

Define the basic information for your workflow

name: Update dbt project state
on:
schedule:
- cron: '0 0 */2 * *'
workflow_dispatch:
jobs:
update-project-state:
runs-on: ubuntu-latest
env:
DBT_PROFILES_DIR: ./dbt_profiles
HOLISTICS_API_TOKEN: ${{ secrets.HOLISTICS_API_TOKEN }}
BQ_DEV_CREDENTIAL: ${{ secrets.BQ_DEV_CREDENTIAL }}

This part lets you define the display name of the workflow, and events that trigger it. In our code snippet, we have:

  • (Optional) name: Specify name of this workflow
  • on: Specify the triggers for this workflow. We have two triggers:
    • schedule: This workflow will be triggered on a schedule defined with cron syntax.
    • workflow_dispatch: This workflow will be triggered manually. For more information, check out this documentation.
  • jobs: A workflow needs at least one job.
    • The job is named updated-project-state.
    • runs-on: Its runner uses ubuntu-latest. This ensures you always use the most recent stable Ubuntu version.
    • Under the env key are the environment variables:
      • DBT_PROFILES_DIR: In your dbt repository, there should be a directory containing the profiles.yml file. Specify the directory path here.
      • HOLISTICS_API_TOKEN: If you have defined this credentials in Preparation task, simply reference it here with this syntax ${{ secrets.HOLISTICS_API_TOKEN }}
      • BQ_DEV_CREDENTIAL: If you have defined this credentials in Preparation task, simply reference it here with this syntax: ${{ secrets.BQ_DEV_CREDENTIAL }}

Now that we have the prerequisites, it’s time to write the job definition.

Write step definitions for your job

A job will consist of multiple steps. Let’s go through each step one by one.

Step 1: Check-out code from master

Action: https://github.com/actions/checkout

- name: Checkout master
uses: actions/checkout@v4
with:
ref: master

This action will clone the dbt code from the master branch of your repo into the runner's workspace.

Step 2: Read credentials from secret

- name: Read BigQuery credentials from secret
run: |
echo "$BQ_DEV_CREDENTIAL" | base64 -d -i > dbt_profiles/cred.json;

In this step, we echo the value of the env var BQ_DEV_CREDENTIAL into a JSON file, so that the dbt profile can read it.

Notes:

Step 3: Install Holistics CLI

The Holistics CLI is a self-contained binary that doesn't require any dependencies like Ruby or Python. You can install it directly using a simple curl command.

- name: Install Holistics CLI
run: |
curl -fsSL https://raw.githubusercontent.com/holistics/holistics-cli/refs/heads/master/install.sh | bash
echo "$HOME/.holistics/bin" >> $GITHUB_PATH

This installation script downloads the CLI binary and places it in $HOME/.holistics/bin. The second line adds this directory to the GitHub Actions PATH so subsequent steps can use the holistics command.

Step 4: Setting up Python & dbt

Action: https://github.com/marketplace/actions/setup-python

- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: 3.7

After that, set up dbt:

- name: Set up dbt
run: |
pip3 install dbt-bigquery;
dbt deps;

Since version 1.0.0, you will need to specify the dbt adapter when installing. pip3 install dbt-bigquery will install dbt-core, along with the dependencies for dbt to run on BigQuery.

You will also need to run dbt deps to install packages needed in the dbt project.

Step 5: Generate dbt manifest file & upload to Holistics

The next step is to create the manifest.json file and upload it to Holistics.

- name: Generate manifest file for prod target
shell: bash
run: |
dbt compile --no-version-check -t prod;

- name: Upload manifest file to Holistics
run: |
holistics auth <server_location> $HOLISTICS_API_TOKEN
holistics dbt upload --file-path ./target/manifest.json --data-source=your_dbt_database_name

Notes:

  • dbt's manifest.json file can be generated by several dbt's commands, but dbt compile is the fastest and safest one to use. The command only parses the project and compiles dbt models into normal SQL files and nothing would be written to your database.

    The manifest file will be created at path ./target/manifest.json

  • Authentication with server location: By default, holistics auth connects to the Asia-Pacific server. If your Holistics account is on a different server, specify it explicitly:

    holistics auth eu $HOLISTICS_API_TOKEN    # For EU server
    holistics auth us $HOLISTICS_API_TOKEN # For US server
    holistics auth $HOLISTICS_API_TOKEN # For Asia-Pacific server (default)
  • It is important that the --data-source matches the database of the dbt target you use to generate the manifest file.

    In this case, the prod target and the your_dbt_database_name data source both refer to the same BigQuery project. In fact, they both use the same Service Account file.

(Optional) Trigger workflow when a pull request is merged

In case you want your Holistics AML project to always be updated with the latest metadata from dbt, you can set up the workflow to be triggered when you merge a Pull Request.

name: Update dbt project state
on:
pull_request: # Trigger workflow upon closing a PR
types: [closed]
workflow_dispatch:

jobs:
update-project-state:
# Run workflow only when the PR is closed after merging
if: github.event.pull_request.merged == true
runs-on: ubuntu-latest

To do so, add pull_request as a trigger event.

The line if: github.event.pull_request.merged == true ensures the workflow is only triggered when the PR closing event comes from a merging action. Without it, if when you close a PR without merging, this workflow will be triggered.

Code Example

Example of a full workflow definition:

name: Update dbt project state
on:
schedule:
- cron: '0 0 */2 * *'
workflow_dispatch:

jobs:
update-project-state:
runs-on: ubuntu-latest

env:
DBT_PROFILES_DIR: ./dbt_profiles
HOLISTICS_API_TOKEN: ${{ secrets.HOLISTICS_API_TOKEN }}
BQ_DEV_CREDENTIAL: ${{ secrets.BQ_DEV_CREDENTIAL }}

steps:

# Preparations
- name: Checkout master
uses: actions/checkout@v4
with:
ref: master

- name: Read BigQuery credentials from secret
run: |
mkdir -p dbt_profiles/;
echo "$BQ_DEV_CREDENTIAL" | base64 -d -i > dbt_profiles/cred.json;
shell: bash

- name: Install Holistics CLI
run: |
curl -fsSL https://raw.githubusercontent.com/holistics/holistics-cli/refs/heads/master/install.sh | bash
echo "$HOME/.holistics/bin" >> $GITHUB_PATH

- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: 3.7

- name: Set up dbt
run: |
pip3 install dbt-bigquery;
dbt deps;

- name: Generate manifest file for prod target
shell: bash
run: |
dbt compile --no-version-check -t prod;

- name: Upload manifest file to Holistics
run: |
# Add server location if not using Asia-Pacific: holistics auth us $HOLISTICS_API_TOKEN
holistics auth $HOLISTICS_API_TOKEN
holistics dbt upload --file-path ./target/manifest.json --data-source=bigquerydw

Let us know what you think about this document :)