Skip to main content

Grouping and Aggregation

Knowledge Checkpoint

A grasp of these concepts will help you understand this documentation better:

As you have learned in Cross-Model Calculation, AQL allows you to access and operate on data from multiple models in a single query.

It works well when you only need to a single row from other models for your calculation. But what if your calculation requires multiple rows from other models or even the current model? This is where grouping and aggregation come in.

Grouping

Grouping in AQL is done with the group function. By itself, it doesn't do much, other than return a table with all the unique values of the dimension of your choice.

order_items | group(order_items.product_id)

Obviously, this is not very useful. But its true power comes when you combine it with other functions.

Accessing multiple rows from other models

The special thing about the group function is that when you iterate over its rows, reference to any model will return all rows related to the current row, instead of just one row.

Take for example, this query will return a table with multiple order_items in a single row. Each row will contain all order_items that belongs to the same product_id.

order_items
| group(order_items.product_id)
| select(order_items.product_id, order_items)
This is for illustration purpose only. In practice, we don't allow you to do this since we don't know how to display this in a meaningful way yet.

And since it returns multiple rows, you can use aggregator functions to aggregate them into a single scalar value.

This applies to all models, as long as they are related to the current model. For example, you can also get all order_items that belong to the current product in the same way, starting from the products model.

Count the number of orders for each product
products 
| group(products.id, products.name)
| select(products.name, count(order_items.order_id))

Aside: How is the query generated?

For the curious readers, who want to know how AQL generates the query for "accessing multiple row", here is a simplified version of the algorithm for getting all order_items that belongs to each product.id:

  1. AQL first generates a CTE to get all the unique values of the id column in the products model.

     WITH group_headers AS (
    SELECT DISTINCT products.id AS "product->id"
    FROM products
    )
  2. Then it generates a CTE to get all the order_items that belongs to each product_id in the grouping_headers CTE.

     grouped_rows AS (
    SELECT
    products.id AS "product->id"
    -- let's just assume this is how AQL merge multiple rows
    , json_agg(order_items) AS "order_items"

    -- if you use an aggregator function, it will be pre-aggregated here like this
    --, sum(order_items.quantity) AS "sum_order_items->quantity"
    FROM order_items
    LEFT JOIN products ON order_items.product_id = products.id
    GROUP BY 1
    )
  3. Finally, it uses those CTEs to get the final result.

     WITH group_headers AS ( ... ),
    grouped_rows AS ( ... )
    SELECT
    group_headers."product->id" AS "product->id"
    , grouped_rows."order_items" AS "order_items"
    FROM group_headers
    LEFT JOIN grouped_rows ON group_headers."product->id" = grouped_rows."product->id"

The final results will look something like this:

grouping query result

Note that, this is just a simplified version of the algorithm. In practice, AQL will use some heuristics to optimize the query, like eliminating unncecessary CTEs, or pre-aggreate the rows if one of the aggregator functions is used. For example, if you start from the order_items model and has aggregation the query will be as simple as this:

SELECT
products.id AS "product->id"
, sum(order_items.quantity) AS "sum_order_items->quantity"
FROM order_items
LEFT JOIN products ON order_items.product_id = products.id
GROUP BY 1

Let us know what you think about this document :)