AQL in 30 minutes
AQL is a metrics-centric query language that both allows users to query to get a data table output like SQL, and defining independent metrics (aggregation definitions) to be reused in later table queries. In this short tutorial, we will go through a few examples that give you a sense of what AQL is and how to use it.
Prerequisites
First we need to define models/datasets with AML. Given that we already defined data models in AML, we can use AQL to query those models
An AQL expression only works within the context of a dataset as it requires knowledge of the models and relationships between them defined in the dataset
// Define users model
Model users {
...
dimension id {...}
dimension users {...}
dimension age {...}
dimension country_id {...}
}
// Define countries model
Model countries {
...
dimension id {...}
dimension name {...}
}
// Define e_commerce dataset
Dataset e_commerce {
models: [users, countries]
relationships: [relationship(users.country_id > countries.id, true)]
}
The Basic
Here are a few basic queries:
// Querying a table/model
users // equivalent to `select * from {{#users}}`
// Select only subset of columns/fields
select(users, users.id, users.gender)
// Aggregation (without grouping)
avg(users, users.age)
The pipe operator
Unlike SQL where the whole query requires to be a fixed pattern (select...from...where...group by...), AQL allows a query to be broken down into smaller components joined by the "pipe operator". We believe this syntax allows for a more natural way to map user's mental model to the resulting query. Generally pipe works like this: users | avg(users.age)
is equivalent to avg(users, users.age)
. In general, this means the part before the pipe is fed into the function behind the pipe as the first argument.
// Same select expression but with pipe
users | select(users.id, users.gender)
// Same average aggregation with pipe
users | avg(users.age)
// Filtering data
users
| filter(gender == 'Female')
| select(id, name)
// returns a single number representing average age of all users in the table
Cross-model calculation
The key difference between AQL and SQL is that since we already defined relationships in the dataset with AML, user doesn't need to explicit join the required table to get desired result. Examples:
// Select users with country name from joined "country" model
users
| select(users.name, country.name)
// Filtering on another joined model
users
| filter(country.name == 'Singapore')
| select(id, name)
Notice that there is no need for explicitly join between user
and country
tables like in SQL.
Grouping and aggregation
Without grouping, aggregation is evaluated on the whole table e.g. users | avg(users.age)
will returns a single number representing the average age of all users in users table. With grouping, aggregation is evaluated in groups:
users
| group(users.gender)
| select(users.gender, count(users.id))
// Grouping works across models, too
users
| group(countries.name)
| select(countries.name, count(users.id))
The way group works is pretty similar to SQL's "group by". Your subsequent select
must match the grouping previously used in group
. For example the following will throw an error:
users
| group(users.gender)
| select(users.gender, countries.name, count(users.id))
However, there is a difference. The following query will, for example, returns a table with 2 rows "Male" and "Female":
// Grouping without aggregation
users
| group(users.gender)
For more details, you can read the reference page for group
function.
Reusing aggregation with measure
We want to reuse count(users.id) for later use
Model users {
...
measure user_count {
definition: @aql count(users.id);;
}
}
Then we can replace previous query with the following
users
| group(users.gender)
| select(users.gender, users.user_count)
The previous change doesn't seem to gain much benefit for us given the hassle of modifying the AML code. We will see, however, the benefits in subsequent examples.
Table expression vs Metric expression
In the previous query examples, we always started with a table and ended with a table. Those are a kind of AQL expression called Table Expression. Functions like select
, filter
, group
works on table and returns new table and so they are called Table Functions.
We also saw how an aggregation i.e. count(users.id)
can be reused by putting it into the AML measure definition. This kind of aggregation code is call Metric Expression, representing a reusable business metric. In addition, the functions that work on metrics and returns new metrics are called Metrics Functions. We will learn more complex examples of metric expressions and functions in the next examples.
Let's say we want to count number of female users, grouped by country. You can use table expression like so:
// Filtering aggregation with `where`
users
| filter(users.gender == 'Female')
| group(countries.name)
| select(countries.name, count(users.id))
This works fine, but what if we want to reuse a metric similar to the example above? With metric functions, we can do it like this:
users
| group(countries.name)
| select(countries.name, count(user.id) | where(users.gender == 'Female'))
Notice the where
part behind the count(user.id)
. This effectively creates a query that is equivalent to the previous query but attaching the filtering logic into the metric itself.
And we can make it reusable:
Model users {
...
measure female_user_count {
definition: @aql count(users.id) | where(users.gender == 'Female') ;;
}
}
// And the resulting query becomes
users
| group(countries.name)
| select(countries.name, users.female_user_count)
Can you see the benefits? By pushing the logic into the reused metric definition, you don't need to worry about where to put the filter
function later on, especially when the query gets more complicated. Thus, it's generally advisable to use a metric expression with where
instead of a table expression with filter
inside metric definitions as it is more reusable. The exception, for example, is when you want to precisely control the filtering condition such as the case of nested aggregation.
In contrast to SQL, where the focus is primarily on tables, AQL treats table expressions as a means to an end: defining metrics. In particular, a table expression often serves as the source table expression within the metric expression definition. Although we started this tutorial with table expressions to cater to SQL users' familiarity, it’s important to emphasize that metric expressions are the central component of AQL.
What is a metric
A metric is basically just a piece of aggregation logic (represented by aggregation functions such as sum
, count
, avg
, etc.) attached with some context. In the case above, the context is simply the condition of users.gender == 'Male'
, which is attached to the aggregation logic by the where
function.
There are other functions that allow you to manipulate the context of a metric. You can change the condition, the relationship, adding window logic to a metrics with these functions. In the next part we will discuss another type of context modifier called Level of Detail.
Level of Detail
A metric by itself is basically just a piece of aggregation calculation logic and can't returns data. Only when used in the context of a query with accompanying group
and select
that we can get a table data result. When we allow defining an independent metric with metric expression, however, sometimes it makes sense to add a fixed "grouping" to that metrics, which we call "grains".
For example, we want to calculate "Percentage of users by country over total users". In SQL, you can solve this using a CTE, a subquery or the over
window function. These solutions work, but the aggregation logic is coupled with the implementation details of each solution.
In AQL, we can write a query like this:
users
| group(countries.name)
| select(
countries.name,
count(users.id) / (count(users.id) | of_all(countries))
)
Notice 2 things from the above example:
- We can create a new metric by creating a ratio of two metrics
of_all
is a metric function that allow a metric to "ignore" all the grouping/filtering part of the query. In this case, it allows the users count to be calculated across all countries and makes percentage calculation possible.
The key advantage of this over the SQL solutions is the fact that with this new approach, it is crystal clear that the requirement is basically about creating a new ratio metric where the denominator is the same as the numerator except for the "grains", or "level of detail". The logic also totally decoupled from how it is calculated and can even be made reusable:
Dataset users_countries {
models: [users, countries]
relationships: [relationship(users.country_id > countries.id, true)]
// ...
metric user_count {
definition: @aql count(users.id) ;;
}
metric user_count_across_countries {
definition: @aql user_count | of_all(countries) ;;
}
metric pct_user_count_across_countries {
definition: @aql user_count / users.user_count_across_countries ;;
}
}
You can learn more about Level of Detail here.
Conclusion
Hope you have a taste of AQL and its power! There are a lot more to learn about AQL. You can read more about Table Expressions and Metric Expression, AQL Functions to build up those expressions. You can also check out the analytic use cases such as Percent of Total or Cohort Analysis to see more AQL in action.
Cheers!