Skip to main content

Cross-Model Calculation

Knowledge Checkpoint

A grasp of these concepts will help you understand this documentation better:

AQL supports cross-model calculation, which allows you to access and operate on data from multiple models in a single query. This feature comes in handy when you need data from multiple models to perform a calculation or to create a new metric.

Semantic Data Modeling Layer

If you are familiar with SQL, you can think of cross-model operation as a JOIN operation. However, unlike SQL, AQL operates on a semantic Data Modeling Layer which encodes the relationship between data ahead of time. This allows AQL to perform cross-model calculation without needing you to write a JOIN statement every time you need to access data from multiple models.

At its core, this data modeling layer is an abstraction of your business logic. Each model represents a business entity, and the relationship between them represents how these entities are related to each other. For example:

  • The order_items model represents the items in an order, and each item belongs to exactly one order, and contains exactly one product.
  • The orders model represents the orders, and each order belongs to exactly one customer.
  • The products model represents the products, and each product belongs to exactly one category, and belongs to exactly one merchant.

Cross-Model Reference

AQL leverages the knowledge about your models to allow you to reference columns from multiple models in a single query. For example, when you're writing a query that operates on the order_items model, you can reference data from the orders model and the products model directly.

Here AQL knows that the current order_items row related to exactly one products row, and was able to include that row in your calculation.

This is not limited to just one level of relationship. You can even reference columns from models that are 2 or more levels away from the source model, as long as the transitive relationship between your source model and the target is still many-to-one or one-to-one.

Since one order_items includes one products, and products belongs to one merchants, transitively speaking, one order_items also belongs to exactly one merchants. Therefore, you can reference columns from the merchants model directly in your query if you're operating on the order_items model.

And in this case, since order_items is the root (facts table) of your Data Modeling Layer, you can reference columns from any model in your Data Modeling Layer.

Accessing models on the many-side of a relationship

Up until now, we have only talked about referencing columns from models on the one-side of a relationship. But what if you want to reference columns from models on the many-side of a relationship?

The short answer is: you can't. This is because the many-side of a relationship is not a single row, but a collection of rows. Therefore, it is not possible to do meaningful calculations on the many-side of a relationship. E.g. What does it mean to multiply a orders.discount with a collection of order_items.value?

Here since we are operating on the orders model, we cannot reference columns from the order_items model directly.

The long answer is: you can, but you have to aggregate the many-side of the relationship first. This is done by using the sum function (or other aggregator functions) to aggregate the many-side of the relationship into a single value. E.g. sum(order_items.value) will return the total value of all order_items in the current orders row.

Multiply orders discount with the sum of their items' value
orders 
| group(orders.id, orders.discount)
| select(value: (1 - orders.discount) * sum(order_items.value))
| sum()

Aside: How is the JOIN generated?

For the curious readers, who want to know how AQL generates the JOIN statement behide the scene, here is a simplified version of the algorithm:

  1. For each referenced model, AQL will find the shortest path from the source model to the target model.
  2. If this path exists and do not cause a fan-out (going from the one-side of a relationship to the many-side of a relationship) then AQL will generate a JOIN statement to the one-side of each relationship along the path.
    • many-to-one relationship will use a LEFT JOIN.
    • one-to-one relationship will use a FULL JOIN.
  3. Otherwise, AQL will throw an error.

Let us know what you think about this document :)