Dataset Dimension
The following resources may help you grasp the concept of this document better:
Introduction
Typically, a dimension is defined as a field inside a data model, and it reflects a certain attribute of the model's data.
However, in complex reporting use cases, there is a need to create a dimensions that combines information across multiple models. For example in a Cohort Retention report, we need a Acquisition Cohort dimension that reflects the month that a user placed their first order. In other words, the dimension combines information from the users
and orders
model.
Holistics's approach to solve this use case is to create a new dimension at the dataset level instead of model level.
How to create dimensions in datasets
There are two ways to create dimensions in datasets:
- Create dimensions via UI in the Data tab.
- Create dimensions as-code in the Code tab.
Please note that dataset dimensions are persisted to datasets directly. They can be viewed and used by anyone who has access to these datasets. If you are just exploring different ways to answer a business question, you may want to conside creating Ad-hoc Fields instead.
Create Dataset Dimension
Unlike model fields, you can only use AQL to define dimensions and metrics in datasets, SQL definitions are not supported.
A dimension can be declared inside a dataset using this syntax:
Dataset dataset_name {
(...)
dimension dimension_name {
model: source_model_name
type: 'text' | 'number' | 'date' | 'datetime' | 'truefalse'
label: 'Dimension Label'
definition: @aql aql_expression ;;
}
}
The model
parameter defines the source model from which you will start referencing fields.
For more details relating to source model and field referencing, please refer to the Cross-Model Reference document.
In the following sections, we will go through different cases of dataset dimension.
Single-model dimension
Single-model dimensions only reference fields of one model in their definitions. For example, the following dimensions only transform dimensions of the users
model:
Dataset e_commerce {
(...)
dimension full_name {
model: users
type: 'text'
label: 'Full name'
definition: @aql concat(users.first_name, ' ', users.last_name);;
}
dimension age_by_year {
model: users
type: 'text'
label: 'Full name'
definition: @aql date_diff('day', users.birth_date, @now) / 365;;
}
}
In this case, we recommend defining these dimensions at the model level, so that information relating to a model is centralized in one place.
Single-model dimension should be defined at the model level instead of dataset level.
Cross-model dimension
Cross-model dimensions reference dimensions of multiple models that have relationships with one another.
For example, in the Cohort Retention guide, we have the following dimension definition:
Dataset e_commerce {
(...)
relationships: [relationship(orders.user_id > users.id, true)]
dimension acquisition_month_cohort {
model: users
type: 'date'
label: 'Acquisition Month Cohort'
definition: @aql min(orders.created_at | month()) | dimensionalize(users.id);;
}
}
In case the models have a 1-1 relationship, you can freely combine fields. However, in case the models have an n-1 relationship, the behavior is a bit more complicated.
For an in-depth explanation of the cross-model reference mechanism, please check our document about Cross-Model Reference.
Source model is on the one-side
If the dimension's source model is on the one-side of the relationship, and you try to do a simple combination of the fields, the calculation will be blocked by the "Cannot combine selected fields due to potential fan-out issues." error.
For example, with models orders
and users
having a n-1 relationship, the following dimension will be invalid:
Dataset e_commerce {
(...)
relationships: [relationship(orders.user_id > users.id, true)]
// Invalid dimension
dimension simple_combination {
model: users // source model on the one-side
type: 'number'
label: 'Simple Combination'
definition: @aql users.id + orders.id ;;
}
}
The dimension only works when you aggregate the many-side with an aggregator function, like using the min()
function in the acquisition_month_cohort
example above.
For more in-depth explanation of this behavior, please refer to the following documents:
Source model is on the many-side
If the dimension's source model is on the many-side of the relationship, you can freely reference fields of the one-side model. For example, this would work:
Dataset e_commerce {
(...)
relationships: [relationship(orders.user_id > users.id, true)]
// Valid dimension
dimension simple_combination {
model: orders // source model on the many-side
type: 'number'
label: 'Simple Combination'
definition: @aql users.id + orders.id ;;
}
}