Skip to main content

AQL Rules and Best Practices

This guide outlines essential rules and best practices for writing AQL expressions in Holistics. Following these rules ensures your semantic layer is performant, maintainable, and produces accurate results.

Model Rules

Models should have a primary key

While not strictly required, defining primary keys is strongly recommended. Primary keys enable AQL to optimize queries by eliminating unnecessary GROUP BY clauses and support complex WHERE conditions. Without a primary key, Holistics cannot determine the grain of your data, which may lead to less efficient queries and potential fan-out issues.

// ✅ Good
Model orders {
dimension id {
type: 'number'
primary_key: true
}
}

// ❌ Bad - no primary key defined
Model orders {
dimension id { type: 'number' }
}

Dimension types must match underlying data

Mismatched data types can cause runtime errors and incorrect results. Always ensure your AQL dimension types align with the actual database column types. Holistics does not perform automatic type casting because it impacts performance and consistent casting between incompatible types (like text to datetime) is not always possible.

// ✅ Good
dimension created_at {
type: 'datetime' // matches timestamp column
}

// ❌ Bad
dimension created_at {
type: 'text' // timestamp column as text
}

Relationship Rules

Relationships must match the actual data cardinality

Incorrectly defined relationships (e.g., marking a many-to-many as many-to-one) will produce wrong results. Always verify your relationships against the actual data.

// ✅ Good - verified many-to-one relationship
relationship(orders.user_id > users.id, true)

// ❌ Bad - incorrect cardinality (actually many-to-many)
relationship(users.id > products.id, true)

Keys must exist in both models

Both the foreign key and primary key referenced in a relationship must exist in their respective models. Missing keys will cause validation errors.

// ✅ Good - both keys exist
Model orders {
dimension user_id { type: 'number' }
}
Model users {
dimension id { type: 'number', primary_key: true }
}
relationship(orders.user_id > users.id, true)

// ❌ Bad - customer_id doesn't exist in orders
relationship(orders.customer_id > users.id, true)

The "one" side of a relationship must have unique values

For fan-out prevention to work correctly, the key on the "one" side of a many-to-one relationship must be unique. Non-unique keys will cause incorrect aggregations.

// ✅ Good - cities.id is primary key (guaranteed unique)
relationship(users.city_id > cities.id, true)

// ❌ Bad - cities.name is not unique (many cities share the same name)
relationship(users.city_name > cities.name, true)

Avoid ambiguous relationship paths

Multiple paths between models create ambiguity that can lead to incorrect joins. Design your relationships to have clear, unambiguous paths. See Common Relationship Problems for detailed examples.

// ❌ Bad - circular reference creates ambiguity
relationship(orders.user_id > users.id, true)
relationship(users.current_order_id > orders.id, true)

// ✅ Good - clear hierarchy
relationship(orders.user_id > users.id, true)
relationship(order_items.order_id > orders.id, true)

Cross-model dimensions require valid relationship path

You can only reference fields from models that have a valid many-to-one or one-to-one path. Attempting to reference unrelated models will fail.

// ✅ Good - valid path through relationships
Model order_items {
dimension product_name {
definition: @aql products.name ;; // order_items -> products relationship exists
}
}

// ❌ Bad - no relationship path
Model order_items {
dimension user_name {
definition: @aql users.name ;; // no direct relationship
}
}

Dimension Expression Rules

Dimensions can only reference fields from accessible models

A dimension can reference its own model's fields and fields from models connected via many-to-one relationships. This includes transitive relationships where you can access fields through a chain of many-to-one relationships.

// ✅ Good - direct relationship
dimension customer_name {
model: orders
definition: @aql users.name ;; // each order has one user
}

// ✅ Good - transitive relationship
dimension customer_city {
model: orders
definition: @aql cities.name ;; // each order has one user, each user has one city
}

// ❌ Bad - trying to access many side
dimension user_orders {
model: users
definition: @aql orders.id ;; // each user has many orders - cannot access
}

// ❌ Bad - no relationship path
dimension product_category {
model: users
definition: @aql products.category ;; // users and products are not related
}

Use dimensionalize() when accessing the many side

To access data from the "many" side of a relationship in a dimension, you must use aggregation with dimensionalize(). This converts the aggregated metric into a dimension that can be used for grouping and filtering.

// ✅ Good - using dimensionalize to access many side
dimension total_spent {
model: users
definition: @aql sum(orders.amount) | dimensionalize(users.id) ;;
}

// ✅ Good - counting related records
dimension order_count {
model: users
definition: @aql count(orders.id) | dimensionalize(users.id) ;;
}

// ❌ Bad - trying to access many side directly
dimension last_order_amount {
model: users
definition: @aql orders.amount ;; // Error: cannot access many side
}

Use dimensionalize() only once at the top level

The dimensionalize() function should only be used once per dimension and at the top level of the expression. Nesting it within other expressions or mixing different grains can lead to unexpected behavior.

// ✅ Good - single dimensionalize at top level
dimension customer_lifetime_value {
model: users
definition: @aql sum(orders.amount) | dimensionalize(users.id) ;;
}

// ❌ Bad - nested dimensionalize
dimension complex_calc {
model: users
definition: @aql
sum(orders.amount) | dimensionalize(users.id) +
count(reviews.id) | dimensionalize(users.id)
;;
}

// ❌ Bad - dimensionalize not at top level
dimension avg_order_by_category {
model: categories
definition: @aql
avg(sum(orders.amount) | dimensionalize(users.id))
;;
}

Window functions in dimensions operate on full table

Window functions in dimensions always run on the complete underlying table before any filters are applied. This is important for performance considerations and understanding your results.

// Understanding the scope
dimension revenue_rank {
model: orders
definition: @aql rank(order: orders.amount | desc()) ;;
// Note: This ranks ALL orders by amount, not just filtered ones
// If you filter to show only 2024 orders, you might see ranks like 145, 203, 567
// because the ranking was done across all orders in the table
}

dimension user_age_rank {
model: users
definition: @aql rank(order: users.age | desc()) ;;
// This ranks ALL users by age, even if you later filter to a specific country
// Filtered results might show non-consecutive ranks
}

To limit window functions to specific filters, you have two approaches:

// Approach 1: Use dimensionalize with metric and where
dimension revenue_rank_2024 {
model: orders
definition: @aql
rank(order: min(orders.amount) | where(orders.created_at >= @2024) | desc())
| dimensionalize(orders.id, orders.amount)
;;
// This moves calculation to a context with unique orders.amount values
}

// Approach 2: Use case() to control ranking
dimension revenue_rank_2024_only {
model: orders
definition: @aql
case(
when: orders.created_at >= @2024,
then: rank(order: case(
when: orders.created_at >= @2024,
then: orders.amount,
else: 999999999 // Push non-2024 orders to bottom
) | desc()),
else: null
)
;;
// Returns rank for 2024 orders only, null for others
}

Metric Expression Rules

Avoid using dimensions without aggregation

While technically possible to use dimensions without aggregation in metrics, this creates fragile metrics that require the dimension to be included in the end-user's explore to avoid errors. This can cause confusion for end users who typically expect metrics to aggregate data.

// ✅ Good
metric total_revenue {
definition: @aql sum(orders.amount) ;;
}

// ❌ Bad - dimension without aggregation
metric total_revenue {
definition: @aql orders.amount ;;
}

Only introduce grouping when absolutely necessary

Metrics should be flexible and adapt to any grouping chosen by end users in their explorations. Avoid hard-coding time periods or dimensions into metric names and definitions. Only use grouping within metrics for nested aggregations where the grouping is essential to the calculation itself.

// ❌ Bad - returns a table instead of a scalar value
metric monthly_sales {
definition: @aql
orders
| group(orders.created_at | month())
| select(sum(orders.amount))
;; // This returns a table with monthly values, not a single number!
}

// ✅ Good - let users choose the time period
metric sales {
definition: @aql sum(orders.amount) ;;
}

// ✅ Good - grouping is necessary for the calculation, returns a scalar
metric avg_monthly_sales {
definition: @aql
orders
| group(orders.created_at | month())
| select(sum(orders.amount))
| avg() // Returns single number: average of monthly totals
;;
}

Source table must be specified for cross-model calculations

When combining fields from multiple models in a calculation, AQL cannot infer which table should be the base for aggregation. Explicitly specifying the source table ensures correct results.

// ✅ Good - explicit source table
metric revenue {
definition: @aql sum(order_items, order_items.quantity * products.price) ;;
}

// ❌ Bad - ambiguous source
metric revenue {
definition: @aql sum(order_items.quantity * products.price) ;;
}

When using group(), always follow with select() or filter()

While aggregate functions can follow group() directly, this triggers a warning (WARN-300) and is less readable. Always use select() or filter() after group() to maintain clear, grouped table structures.

// ✅ Good - using select() after group()
metric avg_items_per_order {
definition: @aql
orders
| group(orders.id)
| select(item_count: count(order_items.id))
| avg(item_count)
;;
}

// ⚠️ Works but triggers WARN-300
metric avg_items_per_order {
definition: @aql
orders
| group(orders.id)
| avg(count(order_items.id))
;;
}

// ❌ Bad - trying to nest aggregations directly
metric avg_items_per_order {
definition: @aql avg(count(order_items.id)) ;;
}

Level of Detail functions must reference valid dimensions

The dimensions specified in of_all() must exist and be accessible in the current context.

// ✅ Good
metric percent_of_total {
definition: @aql
safe_divide(revenue * 100, revenue | of_all(products))
;;
}

// ❌ Bad - referencing non-existent dimension
metric percent_of_total {
definition: @aql
safe_divide(revenue * 100, revenue | of_all(product))
;;
}

Best Practices

Performance Best Practices

Use PreAggregates for frequently accessed metrics

PreAggregates significantly improve query performance by pre-computing and storing aggregated data. They're especially valuable for metrics that are accessed frequently or involve complex calculations across large datasets. For implementation details and best practices, see Aggregate Awareness and Building Multiple PreAggregates.

Maintainability Best Practices

Document complex logic

Add descriptions to explain business logic, especially for complex calculations.

// ✅ Good
metric customer_churn_rate {
label: 'Customer Churn Rate'
description: 'Percentage of customers who cancelled in the period'
definition: @aql
safe_divide(
count(users.id) | where(users.status == 'churned') * 100,
count(users.id) | where(users.created_at < first(users.period_param))
)
;;
}

Reuse metrics instead of duplicating logic

Build complex metrics from simpler ones to maintain consistency and reduce errors.

// ✅ Good - reusing base metrics
metric revenue {
definition: @aql sum(order_items.amount) ;;
}

metric revenue_last_month {
definition: @aql revenue | relative_period(orders.created_at, interval(-1 month)) ;;
}

// ✅ Good - reusing base metrics
metric revenue_vs_last_month {
definition: @aql revenue - revenue_last_month ;;
}

// ❌ Bad - duplicating logic
metric revenue_vs_last_month {
definition: @aql
sum(order_items.amount)
- (sum(order_items.amount) | relative_period(orders.created_at, interval(-1 month)))
;;
}

Data Quality Best Practices

Handle null values appropriately

Use safe functions to prevent division by zero and handle nulls gracefully. Important difference: Unlike SQL, AQL returns null (not 0) when no data matches a filter. Use coalesce to provide default values when needed.

// ✅ Good - handles division by zero
metric average_order_value {
definition: @aql safe_divide(revenue, order_count) ;;
}

// ✅ Good - handles null from filters
metric completed_orders {
definition: @aql
coalesce(
count(orders.id) | where(orders.status == 'completed'),
0
)
;;
}

// ❌ Bad - division by zero risk
metric average_order_value {
definition: @aql revenue / order_count ;;
}

// ❌ Bad - returns null when no completed orders exist
metric completed_orders {
definition: @aql count(orders.id) | where(orders.status == 'completed') ;;
}

Common Pitfalls to Avoid

Choosing the wrong source table for aggregation

In most cases, AQL can automatically infer the correct source table from your expression, so you don't need to specify it explicitly. However, when you DO specify a source table, make sure it matches what you're trying to aggregate.

// ✅ Good - let AQL infer the source table
metric order_count {
definition: @aql count(orders.id) ;;
// AQL knows to use 'orders' table
}

// ✅ Good - let AQL infer for calculations too
metric revenue {
definition: @aql sum(order_items.quantity * order_items.price) ;;
// AQL knows to use 'order_items' table
}

// ❌ Bad - explicitly choosing the wrong source table
metric order_count {
definition: @aql count(order_items, orders.id) ;;
// This counts order_items rows, not orders!
// If an order has 5 items, it gets counted 5 times
}

// ✅ Good - explicit source table when mixing models
metric revenue {
definition: @aql sum(order_items, order_items.quantity * products.price) ;;
// Explicit source needed when fields come from different models
}

The rule of thumb: Let AQL infer the source table unless you're mixing fields from different models. When you do specify it explicitly, ensure it matches the grain of what you're measuring.


Let us know what you think about this document :)