AML Dataset

Introduction

In Holistics, datasets are defined in .dataset.aml files. The full dataset file name has the form dataset_name.dataset.aml. The dataset definition typically contains the following information:

  • Dataset metadata: dataset labels, descriptions, owners
  • Data source reference: the data source that queries from users' explorations run against
  • Data models included
  • Relationships
  • Metrics
  • Dataset view definition

The following section lists all current dataset parameters.

Parameter definition

| Parameter name | Description |
| --- | --- |
| dataset | Specify the dataset's unique name in the workspace |
| label | Specify how the dataset appears in the Ready-to-explore Dataset |
| description | Add a description of the dataset |
| owner | Define who is in charge of managing the dataset |
| data_source_name | Specify the database that Holistics executes the dataset's generated queries against |
| relationships | Specify the relationships among the added models and their configuration |
| models | Specify which models are used in the dataset |
| view | Define how models and fields are displayed in Preview / Dataset Exploration |
| dimension | Define cross-model dimensions in the dataset |
| metric | Define metrics to be used in the dataset |
| context | Configure analysis interactions for the dataset, including breakdown dimension lists and underlying data views |
| settings | Configure dataset-level settings, such as enabling or disabling analysis interactions |
| pre_aggregates | Define pre-aggregated tables for Aggregate Awareness, with built-in or external persistence |
| permission | Define row-level permission rules to filter data based on user attributes (coming soon) |

Dataset syntax examples

Core: metadata, models, and relationships

Every dataset starts with metadata and declares which models and relationships to include:

Dataset ecommerce {
  label: '[Demo] Ecommerce'
  description: 'Demo dataset for E-commerce use cases'
  owner: '[email protected]'
  data_source_name: 'demodb'

  models: [
    ecommerce_orders,
    ecommerce_order_items,
    ecommerce_users,
    ecommerce_products,
    ecommerce_categories
  ]

  relationships: [
    relationship(ecommerce_orders.user_id > ecommerce_users.id, true),
    relationship(ecommerce_order_items.order_id > ecommerce_orders.id, true),
    relationship(ecommerce_order_items.product_id > ecommerce_products.id, true),
    relationship(ecommerce_products.category_id > ecommerce_categories.id, true)
  ]
}
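
The inline relationship(...) calls above pack two pieces of information into one line. As a rough sketch of how they can be read (an assumption based on general AML relationship conventions, not something this page states): `>` points from the "many" side to the "one" side, and the second argument marks whether the relationship is active in the dataset. The same link could also be declared as a named Relationship object and referenced from the dataset. The name orders_to_users below is hypothetical.

```aml
// Hypothetical named relationship, intended to mirror the first inline one above.
// Assumption: '>' in the inline form means many-to-one (orders -> users).
Relationship orders_to_users {
  type: 'many_to_one'
  from: r(ecommerce_orders.user_id)
  to: r(ecommerce_users.id)
}

Dataset ecommerce {
  // ... metadata and models omitted

  relationships: [
    // Assumption: the second argument toggles whether the relationship is active
    relationship(orders_to_users, true)
  ]
}
```

A named relationship may be easier to reuse when several datasets join the same pair of models, while the inline form keeps a small dataset self-contained.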

Dimensions and metrics

You can define cross-model dimensions and metrics directly in the dataset using AQL expressions. For full details, see Dataset Fields.

Dataset ecommerce {
  // ... models and relationships omitted

  // Cross-model dimension
  dimension full_name {
    model: ecommerce_users
    type: 'text'
    label: 'Full Name'
    definition: @aql concat(ecommerce_users.first_name, ' ', ecommerce_users.last_name) ;;
  }

  // Simple aggregation
  metric count_orders {
    label: 'Count Orders'
    type: 'number'
    definition: @aql count(ecommerce_orders.id) ;;
  }

  // Cross-model aggregation
  metric sum_order_value {
    label: 'Sum Order Values'
    type: 'number'
    definition: @aql sum(ecommerce_order_items, ecommerce_order_items.quantity * ecommerce_products.price) ;;
  }

  // Derived metric referencing other metrics
  metric average_order_value {
    label: 'Average Order Value'
    type: 'number'
    definition: @aql sum_order_value / count_orders ;;
  }
}
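
Metrics can also carry their own filter condition. The sketch below is illustrative only: it assumes the AQL pipe-and-where form applies here, and the metric name count_delivered_orders and the status value 'delivered' are hypothetical (the page only shows that ecommerce_orders has a status field).

```aml
Dataset ecommerce {
  // ... models, relationships, and other metrics omitted

  // Hypothetical filtered metric: counts only orders with one status.
  // Assumption: 'delivered' is a valid value of ecommerce_orders.status.
  metric count_delivered_orders {
    label: 'Count Delivered Orders'
    type: 'number'
    definition: @aql count(ecommerce_orders.id) | where(ecommerce_orders.status == 'delivered') ;;
  }
}
```

Defining the filter inside the metric keeps the condition attached to the number wherever the metric is used, instead of relying on each report to apply the same filter.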

View, context, and settings

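Use view to organize how models and fields appear in the exploration UI. Use context to configure breakdown dimension lists (used for drill-down) and underlying data views. Use settings to enable or disable analysis interactions. The example below also shows pre_aggregates, which defines pre-aggregated tables for Aggregate Awareness, and permission, the row-level permission block that is marked as coming soon.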

Dataset ecommerce {
  // ... models, relationships, dimensions, metrics omitted

  // Organize the exploration UI
  view {
    model ecommerce_orders { }
    model ecommerce_users { }

    group relevant_models {
      model ecommerce_products { }
      model ecommerce_categories { }
    }

    group business_metrics {
      metric sum_order_value
      metric average_order_value
    }
  }

  // Configure analysis interactions
  context {
    analysis {
      // Breakdown dimension groups for drill-down
      breakdown {
        group location {
          label: 'Locations'
          fields: [
            r(ecommerce_users.country),
            r(ecommerce_users.city),
          ]
        }

        group product {
          label: 'Products'
          fields: [
            r(ecommerce_products.category),
            r(ecommerce_products.name),
          ]
        }
      }

      // Underlying data views
      underlying_data {
        metric count_orders {
          view list_of_orders {
            label: 'List of Orders'
            fields: [
              r(ecommerce_orders.id),
              r(ecommerce_orders.created_date),
              r(ecommerce_orders.status),
              r(ecommerce_users.full_name),
            ]
          }
        }
      }
    }
  }

  // Dataset-level settings
  settings {
    analysis_interactions {
      breakdown {
        enabled: true
      }
      view_underlying_data {
        enabled: true
      }
    }
  }

  // Pre-aggregated table for Aggregate Awareness
  pre_aggregates: {
    agg_orders: PreAggregate {
      dimension created_at_day {
        for: r(ecommerce_orders.created_at)
        time_granularity: "day"
      }
      dimension status {
        for: r(ecommerce_orders.status)
      }

      measure count_orders {
        for: r(ecommerce_orders.id)
        aggregation_type: 'count'
      }

      persistence: FullPersistence {
        schema: 'persisted'
      }
    }
  }

  // Row-level permission (coming soon)
  permission regional_access {
    field: r(ecommerce_orders.region)
    operator: 'matches_user_attribute'
    value: 'region'
  }
}
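
Since settings is described as the place to enable or disable analysis interactions, turning one off is presumably just a matter of flipping its flag. A minimal sketch, reusing only the keys shown in the example above and assuming `enabled: false` is accepted the same way `enabled: true` is:

```aml
Dataset ecommerce {
  // ... other parameters omitted

  settings {
    analysis_interactions {
      // Keep breakdown available to explorers
      breakdown {
        enabled: true
      }
      // Assumption: false hides the underlying-data interaction for this dataset
      view_underlying_data {
        enabled: false
      }
    }
  }
}
```

This might be useful when a dataset exposes aggregated figures but row-level records should not be browsable from a chart.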
