Build multiple PreAggregates using AML Extend

Introduction

When setting up Aggregate Awareness, it is a common need to create different PreAggregates for different time granularities so that you can configure more efficient persistence pipelines. For example:

  • A PreAggregate with time granularity month only needs to be persisted once a month.
  • A PreAggregate with time granularity week needs to be persisted once a week.

To conveniently generate multiple PreAggregates for different time granularities, you can leverage AML Reusability!

Without reusability

Here's how you would define these PreAggregates without reusability.

pre_aggregates: {
  aggr_movie_ratings_monthly: PreAggregate {
    persistence: IncrementalPersistence {
      schema: 'persisted'
      incremental_column: 'timestamp'
    }
    dimension timestamp {
      for: ref('public_ratings', 'timestamp')
      time_granularity: 'month'
    }
    measure highest_rating {
      for: ref('public_ratings', 'rating')
      aggregation_type: 'max'
    }
    measure lowest_rating {
      for: ref('public_ratings', 'rating')
      aggregation_type: 'min'
    }
    measure sum_rating {
      for: ref('public_ratings', 'rating')
      aggregation_type: 'sum'
    }
  }
  aggr_movie_ratings_weekly: PreAggregate {
    persistence: IncrementalPersistence {
      schema: 'persisted'
      incremental_column: 'timestamp'
    }
    dimension timestamp {
      for: ref('public_ratings', 'timestamp')
      time_granularity: 'week'
    }
    measure highest_rating {
      for: ref('public_ratings', 'rating')
      aggregation_type: 'max'
    }
    measure lowest_rating {
      for: ref('public_ratings', 'rating')
      aggregation_type: 'min'
    }
    measure sum_rating {
      for: ref('public_ratings', 'rating')
      aggregation_type: 'sum'
    }
  }
  aggr_movie_ratings_daily: PreAggregate {
    persistence: IncrementalPersistence {
      schema: 'persisted'
      incremental_column: 'timestamp'
    }
    dimension timestamp {
      for: ref('public_ratings', 'timestamp')
      time_granularity: 'day'
    }
    measure highest_rating {
      for: ref('public_ratings', 'rating')
      aggregation_type: 'max'
    }
    measure lowest_rating {
      for: ref('public_ratings', 'rating')
      aggregation_type: 'min'
    }
    measure sum_rating {
      for: ref('public_ratings', 'rating')
      aggregation_type: 'sum'
    }
  }
}

As shown in this example, we have to repeat many things: persistence, highest_rating, lowest_rating, and sum_rating.

  • If we want to add more measures in the future, we have to add each one three times.
  • If we want to create more PreAggregates at another granularity (for example, year), we again have to repeat almost everything.

Refactoring using AML Extend

Now let's refactor them for better reusability using AML Extend.

We can update the code above in two steps:

Step 1: Pick one PreAggregate (e.g. aggr_movie_ratings_daily) and turn it into a variable.

Notes
  • To declare a variable, you need to do it outside of your Dataset declaration.
  • You can also declare this variable in a separate file!

PreAggregate aggr_movie_ratings_daily {
  persistence: IncrementalPersistence {
    schema: 'persisted'
    incremental_column: 'timestamp'
  }
  dimension timestamp {
    for: ref('public_ratings', 'timestamp')
    time_granularity: 'day'
  }
  measure highest_rating {
    for: ref('public_ratings', 'rating')
    aggregation_type: 'max'
  }
  measure lowest_rating {
    for: ref('public_ratings', 'rating')
    aggregation_type: 'min'
  }
  measure sum_rating {
    for: ref('public_ratings', 'rating')
    aggregation_type: 'sum'
  }
}

Step 2: Create other PreAggregates by extending the variable we just created.

pre_aggregates: {
  aggr_movie_ratings_monthly: aggr_movie_ratings_daily.extend({
    dimension timestamp {
      for: ref('public_ratings', 'timestamp')
      time_granularity: 'month'
    }
  })
  aggr_movie_ratings_weekly: aggr_movie_ratings_daily.extend({
    dimension timestamp {
      for: ref('public_ratings', 'timestamp')
      time_granularity: 'week'
    }
  })
  aggr_movie_ratings_daily: aggr_movie_ratings_daily
}

Just like that, we reduced 66 lines of code to 35, making it more maintainable and more readable at the same time.
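
Adding another granularity follows the same pattern. As a sketch only, assuming 'year' is an accepted time_granularity value in your setup and using the hypothetical name aggr_movie_ratings_yearly, a yearly PreAggregate could be added as one more entry inside pre_aggregates:

```aml
// Hypothetical yearly variant (name and 'year' granularity are assumptions).
// Only the timestamp dimension is overridden, as in the monthly and weekly entries.
aggr_movie_ratings_yearly: aggr_movie_ratings_daily.extend({
  dimension timestamp {
    for: ref('public_ratings', 'timestamp')
    time_granularity: 'year'
  }
})
```

As with the monthly and weekly entries, persistence and the three measures are expected to come from aggr_movie_ratings_daily, so a measure added there later should only need to be written once.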

AML Extend has made this so convenient!

