Skip to main content

AQL in 30 minutes

AQL is a metrics-centric query language that both allows users to query to get a data table output like SQL, and defining independent metrics (aggregation definitions) to be reused in later table queries. In this short tutorial, we will go through a few examples that give you a sense of what AQL is and how to use it.

Prerequisites

First we need to define models/datasets with AML. Given that we already defined data models in AML, we can use AQL to query those models

An AQL expression only works within the context of a dataset as it requires knowledge of the models and relationships between them defined in the dataset

// Define users model 
Model users {
...
dimension id {...}
dimension users {...}
dimension age {...}
dimension country_id {...}
}

// Define countries model
Model countries {
...
dimension id {...}
dimension name {...}
}

// Define e_commerce dataset
Dataset e_commerce {
models: [users, countries]
relationships: [relationship(users.country_id > countries.id, true)]
}

The Basic

Here are a few basic queries:

// Querying a table/model
users // equivalent to `select * from {{#users}}`

// Select only subset of columns/fields
select(users, users.id, users.gender)

// Aggregation (without grouping)
avg(users, users.age)

The pipe operator

Unlike SQL where the whole query requires to be a fixed pattern (select...from...where...group by...), AQL allows a query to be broken down into smaller components joined by the "pipe operator". We believe this syntax allows for a more natural way to map user's mental model to the resulting query. Generally pipe works like this: users | avg(users.age) is equivalent to avg(users, users.age). In general, this means the part before the pipe is fed into the function behind the pipe as the first argument.

// Same select expression but with pipe 
users | select(users.id, users.gender)

// Same average aggregation with pipe
users | avg(users.age)

// Filtering data
users
| filter(gender == 'Female')
| select(id, name)

// returns a single number representing average age of all users in the table

Cross-model calculation

The key difference between AQL and SQL is that since we already defined relationships in the dataset with AML, user doesn't need to explicit join the required table to get desired result. Examples:

// Select users with country name from joined "country" model
users
| select(users.name, country.name)

// Filtering on another joined model
users
| filter(country.name == 'Singapore')
| select(id, name)

Notice that there is no need for explicitly join between user and country tables like in SQL.

Grouping and aggregation

Without grouping, aggregation is evaluated on the whole table e.g. users | avg(users.age) will returns a single number representing the average age of all users in users table. With grouping, aggregation is evaluated in groups:

users
| group(users.gender)
| select(users.gender, count(users.id))

// Grouping works across models, too
users
| group(countries.name)
| select(countries.name, count(users.id))

The way group works is pretty similar to SQL's "group by". Your subsequent select must match the grouping previously used in group. For example the following will throw an error:

users
| group(users.gender)
| select(users.gender, countries.name, count(users.id))

However, there is a difference. The following query will, for example, returns a table with 2 rows "Male" and "Female":

// Grouping without aggregation
users
| group(users.gender)

For more details, you can read the reference page for group function.

Reusing aggregation with measure

We want to reuse count(users.id) for later use

Model users {
...
measure user_count {
definition: @aql count(users.id);;
}
}

Then we can replace previous query with the following

users
| group(users.gender)
| select(users.gender, users.user_count)

The previous change doesn't seem to gain much benefit for us given the hassle of modifying the AML code. We will see, however, the benefits in subsequent examples.

Table expression vs Metric expression

In the previous query examples, we always started with a table and ended with a table. Those are a kind of AQL expression called Table Expression. Functions like select, filter, group works on table and returns new table and so they are called Table Functions.

We also saw how an aggregation i.e. count(users.id) can be reused by putting it into the AML measure definition. This kind of aggregation code is call Metric Expression, representing a reusable business metric. In addition, the functions that work on metrics and returns new metrics are called Metrics Functions. We will learn more complex examples of metric expressions and functions in the next examples.

Let's say we want to count number of female users, grouped by country. You can use table expression like so:

// Filtering aggregation with `where`
users
| filter(users.gender == 'Female')
| group(countries.name)
| select(countries.name, count(users.id))

This works fine, but what if we want to reuse a metric similar to the example above? With metric functions, we can do it like this:

users
| group(countries.name)
| select(countries.name, count(user.id) | where(users.gender == 'Female'))

Notice the where part behind the count(user.id). This effectively creates a query that is equivalent to the previous query but attaching the filtering logic into the metric itself.

And we can make it reusable:

Model users {
...
measure female_user_count {
definition: @aql count(users.id) | where(users.gender == 'Female') ;;
}
}

// And the resulting query becomes
users
| group(countries.name)
| select(countries.name, users.female_user_count)

Can you see the benefits? By pushing the logic into the reused metric definition, you don't need to worry about where to put the filter function later on, especially when the query gets more complicated. Thus, it's generally advisable to use a metric expression with where instead of a table expression with filter inside metric definitions as it is more reusable. The exception, for example, is when you want to precisely control the filtering condition such as the case of nested aggregation.

info

In contrast to SQL, where the focus is primarily on tables, AQL treats table expressions as a means to an end: defining metrics. In particular, a table expression often serves as the source table expression within the metric expression definition. Although we started this tutorial with table expressions to cater to SQL users' familiarity, it’s important to emphasize that metric expressions are the central component of AQL.

What is a metric

A metric is basically just a piece of aggregation logic (represented by aggregation functions such as sum, count, avg, etc.) attached with some context. In the case above, the context is simply the condition of users.gender == 'Male', which is attached to the aggregation logic by the where function.

There are other functions that allow you to manipulate the context of a metric. You can change the condition, the relationship, adding window logic to a metrics with these functions. In the next part we will discuss another type of context modifier called Level of Detail.

Level of Detail

A metric by itself is basically just a piece of aggregation calculation logic and can't returns data. Only when used in the context of a query with accompanying group and select that we can get a table data result. When we allow defining an independent metric with metric expression, however, sometimes it makes sense to add a fixed "grouping" to that metrics, which we call "grains".

For example, we want to calculate "Percentage of users by country over total users". In SQL, you can solve this using a CTE, a subquery or the over window function. These solutions work, but the aggregation logic is coupled with the implementation details of each solution.

In AQL, we can write a query like this:

users
| group(countries.name)
| select(
countries.name,
count(users.id) / (count(users.id) | of_all(countries))
)

Notice 2 things from the above example:

  • We can create a new metric by creating a ratio of two metrics
  • of_all is a metric function that allow a metric to "ignore" all the grouping/filtering part of the query. In this case, it allows the users count to be calculated across all countries and makes percentage calculation possible.

The key advantage of this over the SQL solutions is the fact that with this new approach, it is crystal clear that the requirement is basically about creating a new ratio metric where the denominator is the same as the numerator except for the "grains", or "level of detail". The logic also totally decoupled from how it is calculated and can even be made reusable:

Dataset users_countries {
models: [users, countries]
relationships: [relationship(users.country_id > countries.id, true)]
// ...
metric user_count {
definition: @aql count(users.id) ;;
}
metric user_count_across_countries {
definition: @aql user_count | of_all(countries) ;;
}
metric pct_user_count_across_countries {
definition: @aql user_count / users.user_count_across_countries ;;
}
}

You can learn more about Level of Detail here.

Conclusion

Hope you have a taste of AQL and its power! There are a lot more to learn about AQL. You can read more about Table Expressions and Metric Expression, AQL Functions to build up those expressions. You can also check out the analytic use cases such as Percent of Total or Cohort Analysis to see more AQL in action.

Cheers!


Let us know what you think about this document :)