
AQL in Action: Practical E-commerce Examples

Background

This guide demonstrates AQL (Analytics Query Language) capabilities through practical queries on a simplified e-commerce data model. AQL is Holistics' semantic query language that provides a high-level abstraction for analytics queries, compiling to optimized SQL across different database platforms.

These examples highlight various semantic modeling concepts using familiar business models like customers, orders, products, and merchants.

E-commerce Example Data Model

Our examples use a comprehensive e-commerce data model with the following models:

Database Schema

Models and Relationships in AQL

Models

The foundation of AQL is built on models, which represent business entities as tables in your data warehouse.

Core Models in Our Example:

  • users - based on the users table. Each row represents a unique customer.
  • orders - based on the orders table. Each row represents a unique order.
  • products - based on the products table. Each row represents a unique product.
  • merchants - based on the merchants table. Each row represents a unique merchant.
  • cities - based on the cities table. Each row represents a unique city.
  • countries - based on the countries table. Each row represents a unique country.
  • categories - based on the categories table. Each row represents a unique product category.
  • order_items - based on the order_items table. Each row represents a unique product within an order.

Relationships

Models become powerful when connected through relationships. These tell AQL how to join models and enable cross-model analysis without writing complex SQL.

Key Relationships in Our Model:

Each order item connects to both an order and a product. The order_items → orders relationship (many:1) links line items to their parent order via order_items.order_id = orders.id. Similarly, order_items → products (many:1) connects each line item to its product through order_items.product_id = products.id.
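As a sketch of how these relationships might be declared in AML (the exact dataset layout here is an assumption; the `relationship(fk > pk, true)` form denotes a many-to-one join):

```
// In ecommerce.dataset.aml (sketch - foreign/primary key names taken
// from the join conditions above; verify against your actual schema)
relationships: [
  relationship(order_items.order_id > orders.id, true),
  relationship(order_items.product_id > products.id, true)
]
```

Once declared, these joins are applied automatically whenever a query spans the related models.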

Example Overview

These examples demonstrate different concepts of using AQL for semantic modeling:

| Query | Description | Complexity | AQL Concept Demonstrated |
|-------|-------------|------------|--------------------------|
| Q01 | "Pricing Summary Report" | Basic | Slicing metrics by dimensions, reusable metrics, filters |
| Q02 | "Product Category Analysis" | Basic | AQL dimensions, dimension-based filtering |
| Q03 | "Cross-Model Revenue Calculations" | Intermediate | Combining models, cross-model aggregations |
| Q04 | "Filtered Metrics and Ratios" | Intermediate | Filtered metrics, ratio metrics, model combination |
| Q05 | "Customer Segmentation" | Advanced | Using dimensionalize for aggregated dimensions |
| Q06 | "Customer Distribution Analysis" | Advanced | Creating distributions with dimensionalize |
| Q07 | "Level of Detail: Product Performance Analysis" | Advanced | Using of_all() for percent of total calculations |
| Q08 | "Nested Aggregation: Customer Engagement Trends" | Advanced | Multi-level aggregations with group() and select() |
| Q09 | "Product Performance Rankings" | Advanced | Window functions: rank(), ntile() |
| Q10 | "Year-over-Year Revenue Comparison" | Advanced | Period comparison with relative_period() |

In Holistics, data exploration is typically done through the UI, which generates AQL queries in the explore { } format shown below.

For learning and brevity purposes, we'll show these raw explore blocks throughout this guide. However, keep in mind that you won't write these directly - you'll only define metrics and dimensions in your AML models, and the UI will generate the appropriate explore syntax for you.

Basic Level

These examples cover essential AQL fundamentals - working with single models, creating reusable metrics and dimensions, and applying filters.

Q01: "Pricing Summary Report"

Learn here:

  • How to slice and dice metrics by dimensions of a single table
  • How to make a reusable metric object
  • How to use filters

This question computes business metrics broken down by order status - delivered, cancelled, and refunded.

An explore in AQL is a collection of dimensions and metrics. By default, every metric is grouped by all the dimensions.

So, to implement this, we need to slice the data by the status of orders - the dimension orders.status.

Counting orders is the basic explore using the count() aggregate function:

explore {
  dimensions {
    orders.status
  }

  measures {
    order_count: count(orders.id)
  }
}

You can run ad hoc counts like this on any model using the Holistics UI.

The Q01 query requires several additional metrics. Let's add "Unique Users Count" and "Average number of orders per User".

We can build the explore to include those metrics directly:

explore {
  dimensions {
    orders.status
  }

  measures {
    order_count: count(orders.id),
    unique_users: count_distinct(orders.user_id),
    orders_per_user: count(orders.id) / count_distinct(orders.user_id)
  }
}

This computes the metrics within the explore, but it does not make them part of the semantic model. To add them to the semantic model, we would create them in the dataset definition.

Metrics are standalone aggregations. They can be defined once and reused anywhere.

We could create the unique_users and orders_per_user metrics in the dataset with their AQL definition, as:

// In ecommerce.dataset.aml
metric unique_users {
  label: 'Unique Users'
  type: 'number'
  definition: @aql count_distinct(orders.user_id) ;;
}

metric orders_per_user {
  label: 'Orders Per User'
  type: 'number'
  definition: @aql count(orders.id) / count_distinct(orders.user_id) ;;
}

Now the explore can look like:

explore {
  dimensions {
    orders.status
  }

  measures {
    order_count: count(orders.id),
    unique_users,
    orders_per_user
  }
}

But what if we want to see these metrics only on a subset of the data? Let's add filters:

explore {
  dimensions {
    orders.status
  }

  measures {
    order_count: count(orders.id),
    unique_users,
    orders_per_user
  }

  filters {
    orders.created_at <= @2024,
    orders.status in ['delivered', 'cancelled', 'refunded']
  }
}

This will not count any orders after January 1, 2024. The @2024 syntax is a time expression that represents the start of year 2024.
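Time expressions can also be combined to bound a range. For example, a sketch that keeps only orders created in January 2024, using the month-level @2024-01 form that also appears later in this guide:

```
explore {
  dimensions {
    orders.status
  }

  measures {
    order_count: count(orders.id)
  }

  filters {
    // @2024-01 is the start of January 2024; @2024-02 the start of February
    orders.created_at >= @2024-01,
    orders.created_at < @2024-02
  }
}
```

The half-open range (>= start, < next period) avoids off-by-one issues at period boundaries.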

Q02: "Product Category Analysis"

Learn here:

  • How to build AQL dimensions
  • How to use AQL dimensions in filters

This question focuses on finding how many products are not in the "Electronics" category - a rough measure of the catalog's diversity.

This example demonstrates how to identify products in specific categories. Let's say we need to count non-electronics products for inventory planning or to analyze sales performance by category type.

The way to identify a non-electronics product in the data is to check if the product's category name doesn't contain "Electronics": categories.name not ilike '%Electronics%'.

It's easy to count non-electronics products in an explore:

explore {
  measures {
    product_count: count(products.id)
  }

  filters {
    categories.name not ilike '%Electronics%'
  }
}

But what if we want to use the logic of identifying non-electronics products elsewhere?

So far, all the dimensions we've used were columns in the data. An AQL dimension is like a "virtual column".

Let's codify this logic in a reusable way by creating an AQL dimension in the dataset called is_electronics:

// In ecommerce.dataset.aml
dimension is_electronics {
  model: products
  label: 'Is Electronics'
  type: 'truefalse'
  definition: @aql categories.name ilike '%Electronics%' ;;
}

Now we can filter out electronics products:

explore {
  measures {
    product_count: count(products.id)
  }

  filters {
    products.is_electronics is not true
  }
}

We can also use is_electronics as a dimension to group by and see the distribution:

explore {
  dimensions {
    products.is_electronics
  }

  measures {
    product_count: count(products.id),
    avg_price: avg(products.price)
  }
}

If a calculation is defined in an explore, it is not reusable elsewhere. Define it as an AQL dimension or metric in the dataset to reuse it and manage changes.

Intermediate Level

These examples build on the basics by combining multiple models, creating filtered metrics, and working with ratios and cross-model calculations.

Q03: "Cross-Model Revenue Calculations"

Learn here:

  • How to combine different models
  • How to create metrics across multiple models
  • How to use cross-model aggregations

This question calculates revenue for unshipped orders by customer, demonstrating cross-model calculations.

Let's start with a basic explore showing orders and customers:

explore {
  dimensions {
    customer_name: users.full_name,
    order_id: orders.id,
    order_date: orders.created_at
  }
}

Now we want to calculate revenue for each order. Since revenue comes from order_items.quantity × products.price, we need cross-model aggregation.

The challenge here is that calculating order revenue requires combining data from multiple models. In AQL, relationships between models are encoded ahead of time:

  • Each order_item has one order
  • Each order_item has one product

This means that from order_items, we can directly access columns from orders and products as if they were columns of order_items. However, from orders, we cannot directly access order_items columns without aggregation because one order can have many order_items.
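To make the direction of traversal concrete, here is a sketch of an explore at the order_items grain that reads columns from products and orders directly, with no aggregation needed (column names assumed from the examples elsewhere in this guide):

```
explore {
  dimensions {
    order_items.quantity,
    product_name: products.name,   // reachable directly: each order_item has one product
    order_status: orders.status    // reachable directly: each order_item has one order
  }
}
```

Going the other way - from orders to order_items - requires an aggregate function, as shown next.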

In AQL, when you need to aggregate across multiple models, you must specify at which model to aggregate:

explore {
  dimensions {
    customer_name: users.full_name,
    order_id: orders.id,
    order_date: orders.created_at
  }

  measures {
    order_revenue: sum(order_items, products.price * order_items.quantity)
  }
}

Notice how sum(order_items, ...) tells AQL to aggregate at the order_items level. The expression products.price * order_items.quantity works because each order_item has exactly one product.

To make this reusable, let's define revenue metrics in the dataset:

// In ecommerce.dataset.aml
metric sum_revenue {
  label: 'Sum Revenue'
  type: 'number'
  definition: @aql sum(order_items, products.price * order_items.quantity) ;;
}

metric avg_revenue {
  label: 'Average Revenue'
  type: 'number'
  definition: @aql avg(order_items, products.price * order_items.quantity) ;;
}

Now we can use these metrics anywhere.

Let's add a filter for unshipped orders and sort by revenue:

explore {
  dimensions {
    customer_name: users.full_name,
    order_id: orders.id,
    order_date: orders.created_at
  }

  measures {
    order_revenue: sum_revenue
  }

  filters {
    orders.status != 'delivered'
  }

  sorts {
    order_revenue desc
  }
}

This demonstrates how cross-model references work in AQL - the relationships are pre-defined, and AQL automatically handles the joins when you aggregate across models.

Q04: "Filtered Metrics and Ratios"

Learn here:

  • How to create filtered metrics
  • How to build ratio metrics
  • How to combine different models in a single metric

This question calculates what percentage of revenue comes from products created during a promotional period.

The challenge is creating metrics that filter data in different ways and then building ratios between them.

The where() function allows you to create filtered versions of metrics. This is useful when you want to calculate metrics for a subset of data without modifying existing metrics.

First, let's define what we mean by "promotional products" - products created in a specific time period. Notice how we reuse the sum_revenue metric from Q03 and apply a filter to it:

// We can define this as a filtered metric in the dataset
metric promo_revenue {
  label: 'Promotional Revenue'
  type: 'number'
  definition: @aql
    sum_revenue
    | where(products.created_at >= @2024-01 and products.created_at < @2024-02)
  ;;
}

Knowledge Checkpoint

The pipe operator (|) chains operations on expressions. In this example, sum_revenue | where(...) takes the result of sum_revenue and applies a filter condition to it. Learn more about pipe operators in AQL.

Now we can use these metrics in an explore:

explore {
  dimensions {
    orders.created_at | month()
  }

  measures {
    promo_revenue,
    total_revenue: sum_revenue,
    promo_percentage: safe_divide(promo_revenue, sum_revenue) * 100
  }
}

Q05: "Customer Segmentation"

Learn here:

  • How to use dimensionalize to create dimensions with aggregations
  • How to filter based on aggregated dimensions
  • How to identify customers or merchants meeting specific criteria

This question demonstrates how to create a dimension that identifies "high-priority" customers based on their order history and then use it for filtering.

We can create a boolean dimension that identifies VIP customers based on their lifetime value and order count:

dimension is_vip_customer {
  model: users
  label: 'Is VIP Customer'
  type: 'truefalse'
  definition: @aql
    (sum_revenue > 10000 and count(orders.id) >= 5)
    | dimensionalize(users.id)
  ;;
}

Now we can use this dimension in explores to analyze VIP versus non-VIP customers:

explore {
  dimensions {
    orders.status
  }

  measures {
    order_count: count(orders.id),
    revenue: sum_revenue
  }

  filters {
    users.is_vip_customer is true
  }
}

Q06: "Customer Distribution Analysis"

Learn here:

  • How to use dimensionalize to create bucketed dimensions
  • How to create distribution histograms
  • How to analyze customer behavior patterns

This question creates a distribution of customers by their order count - essentially a histogram of customer engagement.

First, let's create a dimension that groups customers by their order frequency:

dimension customer_order_frequency {
  model: users
  label: 'Customer Order Frequency'
  type: 'text'
  definition: @aql
    case(
      count(orders.id) >= 10, '10+ orders',
      count(orders.id) >= 5, '5-9 orders',
      count(orders.id) >= 2, '2-4 orders',
      count(orders.id) = 1, '1 order',
      'No orders'
    )
    | dimensionalize(users.id)
  ;;
}

Now we can analyze the distribution:

explore {
  dimensions {
    customer_order_frequency: users.customer_order_frequency
  }

  measures {
    customer_count: count(users.id),
    total_revenue: sum_revenue,
    avg_revenue_per_customer: sum_revenue / count(users.id)
  }

  sorts {
    customer_order_frequency desc
  }
}

We can also create a more granular distribution using exact order counts:

dimension exact_order_count {
  model: users
  label: 'Exact Order Count'
  type: 'number'
  definition: @aql count(orders.id) | dimensionalize(users.id) ;;
}

// Use it to create a detailed histogram
explore {
  dimensions {
    users.exact_order_count
  }

  measures {
    customer_count: count(users.id)
  }

  filters {
    users.exact_order_count <= 20 // Focus on customers with 20 or fewer orders
  }
}

This creates a frequency distribution showing the exact customer count for each order frequency, helping identify engagement patterns.

Advanced Level

These examples showcase powerful AQL capabilities including customer segmentation with dimensionalize, Level of Detail calculations with of_all(), and nested aggregations for complex analytical scenarios.

Q07: "Level of Detail: Product Performance Analysis"

Learn here:

  • How to use of_all() for Level of Detail calculations
  • How to calculate percent of total across different dimensions
  • How to create metrics that ignore certain grouping dimensions

This question demonstrates calculating each product's contribution to total sales, regardless of other dimensions in the analysis.

Level of Detail (LoD) calculations allow you to specify exactly which dimensions a metric should be grouped by, regardless of the dimensions present in your explore. The of_all() function is key to this - it tells AQL to ignore certain dimensions when calculating a metric.

Let's say we want to see each product's share of total revenue, broken down by merchant and time period. Without LoD, the "total" would be calculated within each merchant-period combination. With LoD, we can calculate the true global total.

First, let's create the base revenue metric and a global total using of_all():

// In ecommerce.dataset.aml
metric product_percent_of_total {
  label: 'Product % of Total'
  type: 'number'
  definition: @aql safe_divide(sum_revenue * 100.0, sum_revenue | of_all(products)) ;;
}

Knowledge Checkpoint

of_all(products) tells AQL to calculate the total revenue across all products, ignoring dimensions in products. This creates a Level of Detail calculation where the total remains constant regardless of other dimensions in your analysis.

Now we can use these metrics in explores that show product performance across different cuts:

explore {
  dimensions {
    products.name
  }

  measures {
    product_revenue: sum_revenue,
    product_percent_of_total
  }
}

Q08: "Nested Aggregation: Customer Engagement Trends"

Learn here:

  • How to use nested aggregations with group() and select()
  • How to create metrics that aggregate other aggregations
  • How to analyze trends in aggregated customer behavior

This question demonstrates calculating the average monthly customer acquisition rate by year - a classic nested aggregation scenario.

Nested aggregations allow you to perform calculations on already-aggregated data. For example, you might want to:

  1. First: Calculate monthly customer acquisition counts
  2. Then: Average those monthly counts by year

This requires two levels of aggregation, which AQL handles with the group() and select() pattern.

Let's build this step by step. First, we need a basic customer acquisition metric:

// In ecommerce.dataset.aml
metric new_customers {
  label: 'New Customers'
  type: 'number'
  definition: @aql count(users.id) ;;
}

Now we can create a nested aggregation that first groups by month, then averages by year:

metric avg_monthly_acquisition {
  label: 'Average Monthly Acquisition'
  type: 'number'
  definition: @aql
    users
    | group(users.created_at | month())
    | select(new_customers)
    | average()
  ;;
}

Let's break down this nested aggregation:

  1. users - Start from the users model
  2. | group(users.created_at | month()) - Group the data by month
  3. | select(new_customers) - Calculate the new_customers metric for each month
  4. | average() - Average those monthly values

Notice that the avg_monthly_acquisition metric contains no year-specific logic - it simply calculates the average of monthly values. This makes it highly reusable! When used in an explore with a year dimension, this will show the average monthly acquisition rate for each year:

explore {
  dimensions {
    users.created_at | year()
  }

  measures {
    avg_monthly_acquisition
  }
}

But we could just as easily use it with other dimensions like country or merchant:

explore {
  dimensions {
    countries.name
  }

  measures {
    avg_monthly_acquisition // Same metric, now showing avg monthly acquisition by country
  }
}

Q09: "Product Performance Rankings"

Learn here:

  • How to use rank() to identify top performers
  • How to use ntile() to create performance quartiles
  • How rankings work with dimensionalize for reusable dimensions

This question demonstrates using ranking functions to identify top-performing products and create performance tiers.

When creating ranking dimensions based on metrics, we need to use dimensionalize to ensure the calculation happens at the right level of detail:

// In ecommerce.dataset.aml
dimension product_rank_by_revenue {
  model: products
  label: 'Product Rank by Revenue'
  type: 'number'
  definition: @aql
    rank(order: sum_revenue | desc())
    | dimensionalize(products.id)
  ;;
}

dimension product_rank_in_category {
  model: products
  label: 'Product Rank in Category'
  type: 'number'
  definition: @aql
    rank(
      order: sum_revenue | desc(),
      partition: categories.name
    )
    | dimensionalize(products.id)
  ;;
}

Knowledge Checkpoint

dimensionalize() is crucial when using aggregated metrics in dimensions. It ensures the calculation is performed at the specified grain (products.id) before being used as a dimension.

Creating performance quartiles helps segment products into tiers:

dimension product_performance_tier {
  model: products
  label: 'Product Performance Tier'
  type: 'text'
  definition: @aql
    case(
      when: ntile(4, order: sum_revenue | desc()) = 1, then: 'Top 25%',
      when: ntile(4, order: sum_revenue | desc()) = 2, then: '50-75%',
      when: ntile(4, order: sum_revenue | desc()) = 3, then: '25-50%',
      else: 'Bottom 25%'
    )
    | dimensionalize(products.id)
  ;;
}

These ranking dimensions enable powerful analyses:

// Find top 5 products in each category
explore {
  dimensions {
    categories.name,
    products.name
  }

  measures {
    revenue: sum_revenue
  }

  filters {
    products.product_rank_in_category <= 5
  }

  sorts {
    categories.name asc,
    products.product_rank_in_category asc
  }
}

// Analyze performance distribution
explore {
  dimensions {
    products.product_performance_tier
  }

  measures {
    product_count: count(products.id),
    total_revenue: sum_revenue,
    avg_revenue_per_product: sum_revenue / count(products.id)
  }
}

Rankings can also be calculated ad-hoc in explores without pre-defining dimensions:

// Dynamic ranking within current context
explore {
  dimensions {
    merchants.merchant_name,
    products.name
  }

  measures {
    revenue: sum_revenue,
    rank_within_merchant: rank(order: sum_revenue | desc())
  }
}

Q10: "Year-over-Year Revenue Comparison"

Learn here:

  • How to use relative_period() for time-based comparisons
  • How to create reusable period comparison metrics
  • How to calculate percent changes over time

This question demonstrates creating flexible period comparison metrics that adapt to different time contexts.

Period comparisons are essential for understanding business trends. The key is creating metrics that are relative rather than fixed to specific time periods, making them highly reusable.

Let's create year-over-year comparison metrics:

// In ecommerce.dataset.aml
metric revenue_previous_year {
  label: 'Revenue Previous Year'
  type: 'number'
  definition: @aql
    sum_revenue
    | relative_period(orders.created_at, interval(-1 year))
  ;;
}

metric revenue_yoy_change {
  label: 'Revenue YoY Change ($)'
  type: 'number'
  definition: @aql sum_revenue - revenue_previous_year ;;
}

metric revenue_yoy_percent_change {
  label: 'Revenue YoY % Change'
  type: 'number'
  definition: @aql
    safe_divide(
      (sum_revenue - revenue_previous_year) * 100.0,
      revenue_previous_year
    )
  ;;
}

Knowledge Checkpoint

relative_period() shifts the time context relatively. If you're looking at March 2024, it will compare to March 2023. This makes the metric work at any time granularity - yearly, quarterly, monthly, or even daily.

These metrics automatically adapt to your exploration context:

// Monthly year-over-year comparison
explore {
  dimensions {
    orders.created_at | month()
  }

  measures {
    current_revenue: sum_revenue,
    previous_year_revenue: revenue_previous_year,
    yoy_percent_change: revenue_yoy_percent_change
  }

  filters {
    orders.created_at matches @2024
  }
}

// Same metrics work for quarterly analysis
explore {
  dimensions {
    orders.created_at | quarter(),
    categories.name
  }

  measures {
    current_revenue: sum_revenue,
    yoy_growth: revenue_yoy_percent_change
  }
}

You can create more sophisticated comparisons:

// Compare to different periods
metric revenue_previous_month {
  label: 'Revenue Previous Month'
  type: 'number'
  definition: @aql
    sum_revenue
    | relative_period(orders.created_at, interval(-1 month))
  ;;
}

metric revenue_2_years_ago {
  label: 'Revenue 2 Years Ago'
  type: 'number'
  definition: @aql
    sum_revenue
    | relative_period(orders.created_at, interval(-2 years))
  ;;
}

// Compound growth rate
metric revenue_2y_cagr {
  label: '2-Year CAGR'
  type: 'number'
  definition: @aql
    (pow(sum_revenue / revenue_2_years_ago, 0.5) - 1) * 100
  ;;
}
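The exponent 0.5 in the CAGR metric comes from the standard compound annual growth rate identity (plain math, not AQL):

```latex
\mathrm{CAGR} = \left(\frac{V_{\text{end}}}{V_{\text{start}}}\right)^{1/n} - 1
```

Here the comparison window is n = 2 years, so the exponent 1/n is 0.5, and the final multiplication by 100 converts the rate to a percentage.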

These comparison metrics work seamlessly with other dimensions:

// YoY comparison by merchant
explore {
  dimensions {
    merchants.merchant_name,
    orders.created_at | year()
  }

  measures {
    revenue: sum_revenue,
    yoy_growth: revenue_yoy_percent_change
  }

  filters {
    orders.created_at >= @2023
  }

  sorts {
    merchants.merchant_name asc,
    orders.created_at | year() desc
  }
}

The beauty of relative period calculations is their flexibility. The same metric definitions work whether you're analyzing daily, monthly, or yearly trends, and across any dimension combination.
