Cross-Model Calculation
A grasp of these concepts will help you understand this documentation better:
AQL supports cross-model calculation, which allows you to access and operate on data from multiple models in a single query. This feature comes in handy when you need data from multiple models to perform a calculation or to create a new metric.
Semantic Data Modeling Layer
If you are familiar with SQL, you can think of cross-model operation as a JOIN
operation. However, unlike SQL, AQL operates on a semantic Data Modeling Layer which encodes the relationship between data ahead of time. This allows AQL to perform cross-model calculation without needing you to write a JOIN
statement every time you need to access data from multiple models.
At its core, this data modeling layer is an abstraction of your business logic. Each model represents a business entity, and the relationship between them represents how these entities are related to each other. For example:
- The
order_items
model represents the items in an order, and each item belongs to exactly one order, and contains exactly one product. - The
orders
model represents the orders, and each order belongs to exactly one customer. - The
products
model represents the products, and each product belongs to exactly one category, and belongs to exactly one merchant.
Cross-Model Reference
AQL leverages the knowledge about your models to allow you to reference columns from multiple models in a single query. For example, when you're writing a query that operates on the order_items
model, you can reference data from the orders
model and the products
model directly.
Here AQL knows that the current order_items
row related to exactly one products
row, and was able to include that row in your calculation.
This is not limited to just one level of relationship. You can even reference columns from models that are 2 or more levels away from the source model, as long as the transitive relationship between your source model and the target is still many-to-one or one-to-one.
Since one order_items
includes one products
, and products
belongs to one merchants
, transitively speaking, one order_items
also belongs to exactly one merchants
. Therefore, you can reference columns from the merchants
model directly in your query if you're operating on the order_items
model.
And in this case, since order_items
is the root (facts table) of your Data Modeling Layer, you can reference columns from any model in your Data Modeling Layer.
Accessing models on the many-side of a relationship
Up until now, we have only talked about referencing columns from models on the one-side of a relationship. But what if you want to reference columns from models on the many-side of a relationship?
The short answer is: you can't. This is because the many-side of a relationship is not a single row, but a collection of rows. Therefore, it is not possible to do meaningful calculations on the many-side of a relationship. E.g. What does it mean to multiply a orders.discount
with a collection of order_items.value
?
Here since we are operating on the orders
model, we cannot reference columns from the order_items
model directly.
The long answer is: you can, but you have to aggregate the many-side of the relationship first. This is done by using the sum function (or other aggregator functions) to aggregate the many-side of the relationship into a single value. E.g. sum(order_items.value)
will return the total value of all order_items
in the current orders
row.
orders
| group(orders.id, orders.discount)
| select(value: (1 - orders.discount) * sum(order_items.value))
| sum()
- Result
- SQL
WITH "aql__t1" AS (
SELECT
"orders"."id" AS "orders->id",
"orders"."discount" AS "orders->discount"
FROM
"demo"."orders" "orders"
GROUP BY
1,
2
), "aql__t3" AS (
SELECT
("products"."price" * "order_items"."quantity") AS "value",
"order_items"."order_id" AS "order_id"
FROM
"demo"."order_items" "order_items"
LEFT JOIN "demo"."products" "products" ON "order_items"."product_id" = "products"."id"
), "aql__t2" AS (
SELECT
"orders"."id" AS "orders->id",
"orders"."discount" AS "orders->discount",
SUM("order_items"."value") AS "sum_order_items->value"
FROM
"aql__t3" "order_items"
LEFT JOIN "demo"."orders" "orders" ON "order_items"."order_id" = "orders"."id"
GROUP BY
1,
2
)
SELECT
SUM(((1 - "aql__t1"."orders->discount") * "aql__t2"."sum_order_items->value")) AS "sum_value"
FROM
"aql__t1"
LEFT JOIN "aql__t2" ON ("aql__t1"."orders->discount" = "aql__t2"."orders->discount" OR ("aql__t1"."orders->discount" IS NULL AND "aql__t2"."orders->discount" IS NULL)) AND
("aql__t1"."orders->id" = "aql__t2"."orders->id" OR ("aql__t1"."orders->id" IS NULL AND "aql__t2"."orders->id" IS NULL))
Aside: How is the JOIN
generated?
For the curious readers, who want to know how AQL generates the JOIN
statement behide the scene, here is a simplified version of the algorithm:
- For each referenced model, AQL will find the shortest path from the source model to the target model.
- If this path exists and do not cause a fan-out (going from the one-side of a relationship to the many-side of a relationship) then AQL will generate a
JOIN
statement to the one-side of each relationship along the path.many-to-one
relationship will use aLEFT JOIN
.one-to-one
relationship will use aFULL JOIN
.
- Otherwise, AQL will throw an error.