window_avg
Readings that will help you understand this documentation better:
Definition
A window aggregation function that returns the average of rows in a range relative to the current row.
Syntax
window_avg(agg_expr)
window_avg(agg_expr, order: order_expr, ...)
window_avg(agg_expr, range, order: order_expr, ...)
window_avg(agg_expr, range, order: order_expr, ..., reset: partition_expr, ...)
window_avg(agg_expr, range, order: order_expr, ..., partition: partition_expr, ...)
window_avg(count(users.id))
window_avg(count(users.id), order: count(users.id) | desc())
window_avg(count(users.id), -2..2, order: users.created_at | month())
window_avg(count(users.id), order: users.created_at | month(), reset: users.gender)
Input
agg_expr
(required): An aggregation expression to be averaged.range
(optional): A range of rows to include in the average. Negative values indicate rows before the current row, and positive values indicate rows after the current row, while 0 indicates the current row. If the beginning or end of the range is not specified, the range will include all rows from the beginning or end of the table. By default, if the range is not specified:- If
order
is specified, the range is..0
(from the first row to the current row). - If
order
is not specified, the range is..
(from the first row to the last row).
- If
order
(required, repeatable): A field that is used for ordering. The order is default to ascending. The order can be set explicitly withasc()
ordesc()
.dangerIf the specified order does not uniquely identify rows, the result of the function can be non-deterministic. For example, if you use
order: users.age
, and there are multiple users with the same age in the same partition, the result can be unexpected.partition
orreset
(repeatable, optional): A field that is used for partitioning the table. If partitions are not specified:- If
order
is specified, the table will be partitioned by all other grouping columns. - If
order
is not specified, the table will be considered as a single partition.
- If
Output
The average of the current row and the rows within the specified range.
Sample Usages
explore {
dimensions {
users.gender,
orders.status
}
measures {
count(orders.id),
_avg: window_avg(count(orders.id), ..)
}
}
- Result
- SQL
WITH "aql__t1" AS (
SELECT
"users"."gender" AS "users->gender",
"orders"."status" AS "orders->status",
COUNT("orders"."id") AS "count_orders->id"
FROM
"demo"."orders" "orders"
LEFT JOIN "demo"."users" "users" ON "orders"."user_id" = "users"."id"
GROUP BY
1,
2
)
SELECT
"aql__t1"."users->gender" AS "users->gender",
"aql__t1"."orders->status" AS "orders->status",
"aql__t1"."count_orders->id" AS "count_orders->id",
AVG("aql__t1"."count_orders->id") OVER ( ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS "_avg"
FROM
"aql__t1"
WHERE
"aql__t1"."count_orders->id" IS NOT NULL
The most common use case for window_avg
is to calculate the average of a column. In this example, we calculate the average of the count of users.
explore {
dimensions {
orders.created_at | year()
}
measures {
count(orders.id),
moving_avg: window_avg(count(orders.id), -2..2, order: orders.created_at | year())
}
}
- Result
- SQL
WITH "aql__t1" AS (
SELECT
(DATE_TRUNC ( 'year', (CAST ( "orders"."created_at" AS timestamptz )) AT TIME ZONE 'Asia/Saigon' )) AT TIME ZONE 'Asia/Saigon' AS "orders->created_at->year",
COUNT("orders"."id") AS "count_orders->id"
FROM
"demo"."orders" "orders"
GROUP BY
1
)
SELECT
TO_CHAR((CAST ( "aql__t1"."orders->created_at->year" AS timestamptz )) AT TIME ZONE 'Asia/Saigon', 'YYYY-MM-DD HH24:MI:SS.US') AS "orders->created_at->year",
"aql__t1"."count_orders->id" AS "count_orders->id",
AVG("aql__t1"."count_orders->id") OVER ( ORDER BY "aql__t1"."orders->created_at->year" ASC ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS "moving_avg"
FROM
"aql__t1"
WHERE
"aql__t1"."count_orders->id" IS NOT NULL
We can also use it to calculate the moving average of a column. In this example, we calculate the moving average of the count of users, ordered by the year they were created. Notice that we are using the range
parameter to specify the range of rows to include in the average. In this case, we are calculating the average of the current row and the two rows before and after it (i.e. a total of 5 rows) with the range -2..2
.