Build multiple PreAggregates using AML Extend
Introduction
When setting up Aggregate Awareness, it is a common need to create different PreAggregates for different time granularities so that you can configure more efficient persistence pipelines. For example:
- A PreAggregate with time granularity month only needs to be persisted once a month.
- A PreAggregate with time granularity week needs to be persisted once a week.
To conveniently generate multiple PreAggregates for different time granularities, you can leverage AML Reusability!
Without reusability
Here's how you define it without reusability.
pre_aggregates: {
  aggr_movie_ratings_monthly: PreAggregate {
    persistence: IncrementalPersistence {
      schema: 'persisted'
      incremental_column: 'timestamp'
    }
    dimension timestamp {
      for: ref('public_ratings', 'timestamp')
      time_granularity: 'month'
    }
    measure highest_rating {
      for: ref('public_ratings', 'rating')
      aggregation_type: 'max'
    }
    measure lowest_rating {
      for: ref('public_ratings', 'rating')
      aggregation_type: 'min'
    }
    measure sum_rating {
      for: ref('public_ratings', 'rating')
      aggregation_type: 'sum'
    }
  }
  aggr_movie_ratings_weekly: PreAggregate {
    persistence: IncrementalPersistence {
      schema: 'persisted'
      incremental_column: 'timestamp'
    }
    dimension timestamp {
      for: ref('public_ratings', 'timestamp')
      time_granularity: 'week'
    }
    measure highest_rating {
      for: ref('public_ratings', 'rating')
      aggregation_type: 'max'
    }
    measure lowest_rating {
      for: ref('public_ratings', 'rating')
      aggregation_type: 'min'
    }
    measure sum_rating {
      for: ref('public_ratings', 'rating')
      aggregation_type: 'sum'
    }
  }
  aggr_movie_ratings_daily: PreAggregate {
    persistence: IncrementalPersistence {
      schema: 'persisted'
      incremental_column: 'timestamp'
    }
    dimension timestamp {
      for: ref('public_ratings', 'timestamp')
      time_granularity: 'day'
    }
    measure highest_rating {
      for: ref('public_ratings', 'rating')
      aggregation_type: 'max'
    }
    measure lowest_rating {
      for: ref('public_ratings', 'rating')
      aggregation_type: 'min'
    }
    measure sum_rating {
      for: ref('public_ratings', 'rating')
      aggregation_type: 'sum'
    }
  }
}
As shown in this example, we have to repeat many things: persistence, highest_rating, lowest_rating, and sum_rating.
- If we want to add more measures in the future, we have to add each one three times.
- If we want to create more PreAggregates for, say, year, we again have to repeat almost everything.
Refactoring using AML Extend
Now let's refactor them for better reusability using AML Extend.
We can update the above code in two steps:
Step 1: Pick one PreAggregate (e.g. aggr_movie_ratings_daily) and turn it into a variable.
- To declare a variable, you need to do it outside of your Dataset declaration.
- You can also declare this variable in a separate file!
PreAggregate aggr_movie_ratings_daily {
  persistence: IncrementalPersistence {
    schema: 'persisted'
    incremental_column: 'timestamp'
  }
  dimension timestamp {
    for: ref('public_ratings', 'timestamp')
    time_granularity: 'day'
  }
  measure highest_rating {
    for: ref('public_ratings', 'rating')
    aggregation_type: 'max'
  }
  measure lowest_rating {
    for: ref('public_ratings', 'rating')
    aggregation_type: 'min'
  }
  measure sum_rating {
    for: ref('public_ratings', 'rating')
    aggregation_type: 'sum'
  }
}
Step 2: Create other PreAggregates by extending the variable we just created.
pre_aggregates: {
  aggr_movie_ratings_monthly: aggr_movie_ratings_daily.extend({
    dimension timestamp {
      for: ref('public_ratings', 'timestamp')
      time_granularity: 'month'
    }
  })
  aggr_movie_ratings_weekly: aggr_movie_ratings_daily.extend({
    dimension timestamp {
      for: ref('public_ratings', 'timestamp')
      time_granularity: 'week'
    }
  })
  aggr_movie_ratings_daily: aggr_movie_ratings_daily
}
Just like that, we reduced 66 lines of code to 35, making the definitions more maintainable and more readable at the same time.
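As a rough sketch of how this setup scales, adding the yearly PreAggregate mentioned earlier should take only one more extend call inside pre_aggregates. This reuses only the syntax from Step 2; the name aggr_movie_ratings_yearly is hypothetical, and it assumes 'year' is an accepted time_granularity value in your setup.

```aml
pre_aggregates: {
  // Hypothetical addition: a yearly PreAggregate built from the daily variable.
  // Assumes 'year' is a supported time_granularity value.
  aggr_movie_ratings_yearly: aggr_movie_ratings_daily.extend({
    dimension timestamp {
      for: ref('public_ratings', 'timestamp')
      time_granularity: 'year'
    }
  })
  // The monthly, weekly, and daily entries from Step 2 stay unchanged here.
}
```

Only the timestamp dimension is restated; the persistence settings and the three measures come from aggr_movie_ratings_daily, so a measure added to that variable later would not need to be copied into each extended PreAggregate.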
AML Extend has made this so convenient!