DataHub integration
This feature is currently in development and not yet available. Contact Holistics to sign up for early access.
Introduction
DataHub is an open-source metadata platform that helps organizations discover, understand, and govern their data assets. By integrating Holistics with DataHub, you can catalog your BI layer alongside your data warehouse tables, creating a unified view of your entire data stack.
This integration automatically ingests Holistics models, datasets, dashboards, and charts into DataHub, along with the lineage relationships between them. This enables data teams to answer questions like "which dashboards will be affected if I change this database table?" or "where does this metric come from?"
The connector consumes the canonical output of `holistics aml lineage`: a graph of AML-native concepts such as models, datasets, dashboards, viz blocks, and source tables, with typed edges between them. The connector then maps the subset relevant to DataHub into DataHub entities.
How it works
The integration uses a git-based approach similar to DataHub's LookML connector. Since Holistics is fully as-code with all assets defined in AML files, the connector reads your AML project directly without needing API access.
Here's how the ingestion process works:
- Get your AML project - Either clone from a git repository or use a local directory
- Run the Holistics CLI - The `holistics aml lineage` command compiles your AML files and outputs a canonical lineage graph
- Parse and transform - The connector parses the canonical `nodes` and `edges` and maps Holistics entities to DataHub entities
- Emit to DataHub - Metadata is pushed to your DataHub instance via the standard ingestion framework
This approach ensures that the metadata in DataHub always reflects what's defined in your AML code, while keeping the CLI output aligned to AML concepts instead of a DataHub-specific schema.
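To make the "parse and transform" step more concrete, here is a minimal Python sketch of reading a lineage graph with top-level `nodes` and `edges`. It assumes the CLI output is JSON, and the per-node and per-edge keys (`id`, `type`, `source`, `target`) and the sample identifiers are hypothetical. The real schema may differ.

```python
import json
from collections import defaultdict

# Hypothetical sample of the lineage output. Only the top-level "nodes" and
# "edges" names come from the docs; every other key and value is assumed.
SAMPLE_OUTPUT = """
{
  "nodes": [
    {"id": "model:orders", "type": "model"},
    {"id": "dataset:sales", "type": "dataset"},
    {"id": "dashboard:revenue", "type": "dashboard"}
  ],
  "edges": [
    {"source": "model:orders", "target": "dataset:sales", "type": "feeds"},
    {"source": "dataset:sales", "target": "dashboard:revenue", "type": "feeds"}
  ]
}
"""


def parse_lineage(raw):
    """Return (node ids grouped by type, list of (source, target, type) edges)."""
    graph = json.loads(raw)
    nodes_by_type = defaultdict(list)
    for node in graph["nodes"]:
        nodes_by_type[node["type"]].append(node["id"])
    edges = [(e["source"], e["target"], e["type"]) for e in graph["edges"]]
    return dict(nodes_by_type), edges


nodes_by_type, edges = parse_lineage(SAMPLE_OUTPUT)
print(nodes_by_type)
print(edges)
```

Grouping nodes by type first keeps the later mapping step simple, because each AML concept can be handled by its own small translation function.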
What gets synced
The connector maps Holistics concepts to DataHub entities as follows:
| Holistics Concept | AML File | DataHub Entity | Subtype |
|---|---|---|---|
| Model | .model.aml | Dataset | View |
| Dataset | .dataset.aml | Dataset | Explore |
| Dashboard | .page.aml | Dashboard | - |
| VizBlock (chart) | (within page) | Chart | - |
| Dimension | (field in model) | SchemaField | Tagged holistics:dimension |
| Measure | (field in model) | SchemaField | Tagged holistics:measure |
For each model, the connector extracts schema information including field names, types, descriptions, and whether fields are dimensions or measures. This metadata appears in DataHub's schema tab, helping users understand the semantic layer without leaving the data catalog.
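As an illustration of the mapping table and the field tagging described above, the following Python sketch turns model fields into schema rows tagged as dimensions or measures. The `holistics:dimension` and `holistics:measure` tags come from the table. The type identifiers in `ENTITY_MAP`, the `Field` class, and the row layout are assumptions made for this example, not the connector's actual data structures.

```python
from dataclasses import dataclass

# Mirrors the mapping table: Holistics concept -> (DataHub entity, subtype).
# The lowercase keys are hypothetical type identifiers.
ENTITY_MAP = {
    "model": ("Dataset", "View"),
    "dataset": ("Dataset", "Explore"),
    "dashboard": ("Dashboard", None),
    "viz_block": ("Chart", None),
}


@dataclass
class Field:
    """A model field as the connector might see it (hypothetical shape)."""
    name: str
    type: str
    description: str
    kind: str  # "dimension" or "measure"


def schema_rows(fields):
    """Build one schema row per field, tagged by field kind."""
    return [
        {
            "fieldPath": f.name,
            "type": f.type,
            "description": f.description,
            "tags": [f"holistics:{f.kind}"],
        }
        for f in fields
    ]


fields = [
    Field("country", "text", "Customer country", "dimension"),
    Field("total_revenue", "number", "Sum of order value", "measure"),
]
rows = schema_rows(fields)
for row in rows:
    print(row)
```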
Lineage
One of the most valuable aspects of this integration is automatic lineage extraction. The connector builds lineage at multiple levels:
- Dashboard to Charts - Each chart is linked to its parent dashboard
- Charts to Models - Charts reference the specific model fields they visualize
- Datasets to Models - Datasets are linked to all the models they include
- Models to Source Tables - Table models are connected to their underlying database tables
The canonical lineage graph may also contain additional AML concepts, such as non-viz dashboard blocks or filter-block lineage. The DataHub connector intentionally ignores concepts that do not map to current DataHub entities, rather than requiring the CLI to omit them.
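One way to picture this filtering is the Python sketch below: it keeps only node types that have a DataHub counterpart and drops any edge that touches a discarded node. The type names (including `filter_block` and `text_block`) and the dictionary keys are hypothetical. The connector's real rules may be more nuanced.

```python
# Node types assumed to have a DataHub counterpart (hypothetical identifiers).
MAPPED_TYPES = {"model", "dataset", "dashboard", "viz_block", "source_table"}


def keep_mapped(nodes, edges):
    """Drop unmapped nodes and every edge that references one of them."""
    kept_ids = {n["id"] for n in nodes if n["type"] in MAPPED_TYPES}
    kept_nodes = [n for n in nodes if n["id"] in kept_ids]
    kept_edges = [
        e for e in edges
        if e["source"] in kept_ids and e["target"] in kept_ids
    ]
    return kept_nodes, kept_edges


nodes = [
    {"id": "model:orders", "type": "model"},
    {"id": "viz:revenue_by_month", "type": "viz_block"},
    {"id": "filter:date_range", "type": "filter_block"},
    {"id": "text:intro", "type": "text_block"},
]
edges = [
    {"source": "model:orders", "target": "viz:revenue_by_month"},
    {"source": "model:orders", "target": "filter:date_range"},
]
kept_nodes, kept_edges = keep_mapped(nodes, edges)
print([n["id"] for n in kept_nodes])
print(kept_edges)
```

Filtering on the connector side means the CLI output can grow new AML concepts without breaking ingestion.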
This multi-level lineage enables powerful impact analysis. When someone wants to modify a database table, they can trace through DataHub to see exactly which Holistics models, datasets, and dashboards depend on it.
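Conceptually, this kind of impact analysis is a downstream traversal of the lineage graph. The Python sketch below shows the idea with a breadth-first search over `(upstream, downstream)` pairs. The identifiers are made up for illustration, and in practice DataHub performs this traversal for you in its lineage view.

```python
from collections import defaultdict, deque


def downstream_of(start, edges):
    """Return every node reachable downstream of `start`, in BFS order."""
    children = defaultdict(list)
    for upstream, downstream in edges:
        children[upstream].append(downstream)

    seen = set()
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical lineage: table -> model -> dataset / chart -> dashboard.
edges = [
    ("table:analytics.orders", "model:orders"),
    ("model:orders", "dataset:sales"),
    ("model:orders", "chart:revenue_by_month"),
    ("chart:revenue_by_month", "dashboard:revenue"),
    ("model:users", "dataset:sales"),
]
affected = downstream_of("table:analytics.orders", edges)
print(affected)
```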
To establish lineage from Holistics models to your source database tables, you need to configure connection mapping. This tells the connector how to translate Holistics data source names (like bigquery_prod) to DataHub platform identifiers. See the setup guide for details.
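As a rough sketch of what connection mapping achieves, the Python snippet below translates a Holistics data source name into a DataHub dataset URN. The URN layout follows DataHub's standard dataset URN format. The shape of `CONNECTION_MAP`, the `source_table_urn` helper, and the example table name are assumptions for illustration. The actual configuration keys are described in the setup guide.

```python
# Hypothetical mapping: Holistics data source name -> DataHub platform and env.
CONNECTION_MAP = {
    "bigquery_prod": {"platform": "bigquery", "env": "PROD"},
}


def source_table_urn(data_source, table_name):
    """Build a DataHub dataset URN, or return None if the source is unmapped."""
    target = CONNECTION_MAP.get(data_source)
    if target is None:
        # Without a mapping there is no way to name the upstream table.
        return None
    platform = target["platform"]
    env = target["env"]
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{table_name},{env})"


urn = source_table_urn("bigquery_prod", "my-project.analytics.orders")
print(urn)
print(source_table_urn("unknown_source", "public.orders"))
```

Returning `None` for unmapped sources reflects the point above: without a connection mapping, the connector cannot link a model to its source table.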
Getting started
Ready to set up the integration? Head to the setup guide for step-by-step instructions on installing the connector and configuring your first ingestion.