Grouping and Aggregation
A grasp of these concepts will help you understand this documentation better:
As you have learned in Cross-Model Calculation, AQL allows you to access and operate on data from multiple models in a single query.
It works well when you only need to a single row from other models for your calculation. But what if your calculation requires multiple rows from other models or even the current model? This is where grouping and aggregation come in.
Grouping
Grouping in AQL is done with the group function. By itself, it doesn't do much, other than return a table with all the unique values of the dimension of your choice.
order_items | group(order_items.product_id)
Obviously, this is not very useful. But its true power comes when you combine it with other functions.
Accessing multiple rows from other models
The special thing about the group
function is that when you iterate over its rows, reference to any model will return all rows related to the current row, instead of just one row.
Take for example, this query will return a table with multiple order_items
in a single row. Each row will contain all order_items
that belongs to the same product_id
.
order_items
| group(order_items.product_id)
| select(order_items.product_id, order_items)
And since it returns multiple rows, you can use aggregator functions to aggregate them into a single scalar value.
This applies to all models, as long as they are related to the current model. For example, you can also get all order_items
that belong to the current product in the same way, starting from the products
model.
products
| group(products.id, products.name)
| select(products.name, count(order_items.order_id))
- Result
- SQL
WITH "aql__t1" AS (
SELECT
"products"."id" AS "products->id",
"products"."name" AS "products->name"
FROM
"demo"."products" "products"
GROUP BY
1,
2
), "aql__t2" AS (
SELECT
"products"."id" AS "products->id",
"products"."name" AS "products->name",
COUNT("order_items"."order_id") AS "count_order_items->order_id"
FROM
"demo"."order_items" "order_items"
LEFT JOIN "demo"."products" "products" ON "order_items"."product_id" = "products"."id"
GROUP BY
1,
2
)
SELECT
"aql__t1"."products->name" AS "products->name",
"aql__t2"."count_order_items->order_id" AS "count_order_items->order_id"
FROM
"aql__t1"
LEFT JOIN "aql__t2" ON ("aql__t1"."products->id" = "aql__t2"."products->id" OR ("aql__t1"."products->id" IS NULL AND "aql__t2"."products->id" IS NULL)) AND
("aql__t1"."products->name" = "aql__t2"."products->name" OR ("aql__t1"."products->name" IS NULL AND "aql__t2"."products->name" IS NULL))
Aside: How is the query generated?
For the curious readers, who want to know how AQL generates the query for "accessing multiple row", here is a simplified version of the algorithm for getting all order_items
that belongs to each product.id
:
AQL first generates a CTE to get all the unique values of the
id
column in theproducts
model.WITH group_headers AS (
SELECT DISTINCT products.id AS "product->id"
FROM products
)Then it generates a CTE to get all the
order_items
that belongs to eachproduct_id
in thegrouping_headers
CTE.grouped_rows AS (
SELECT
products.id AS "product->id"
-- let's just assume this is how AQL merge multiple rows
, json_agg(order_items) AS "order_items"
-- if you use an aggregator function, it will be pre-aggregated here like this
--, sum(order_items.quantity) AS "sum_order_items->quantity"
FROM order_items
LEFT JOIN products ON order_items.product_id = products.id
GROUP BY 1
)Finally, it uses those CTEs to get the final result.
WITH group_headers AS ( ... ),
grouped_rows AS ( ... )
SELECT
group_headers."product->id" AS "product->id"
, grouped_rows."order_items" AS "order_items"
FROM group_headers
LEFT JOIN grouped_rows ON group_headers."product->id" = grouped_rows."product->id"
The final results will look something like this:
Note that, this is just a simplified version of the algorithm. In practice, AQL will use some heuristics to optimize the query, like eliminating unncecessary CTEs, or pre-aggreate the rows if one of the aggregator functions is used. For example, if you start from the order_items
model and has aggregation the query will be as simple as this:
SELECT
products.id AS "product->id"
, sum(order_items.quantity) AS "sum_order_items->quantity"
FROM order_items
LEFT JOIN products ON order_items.product_id = products.id
GROUP BY 1