Dataset Dimension

Useful resources

The following resources may help you grasp the concept of this document better:

Introduction

Typically, a dimension is defined as a field inside a data model, and it reflects a certain attribute of the model's data.

However, in complex reporting use cases, there is a need to create a dimensions that combines information across multiple models. For example in a Cohort Retention report, we need a Acquisition Cohort dimension that reflects the month that a user placed their first order. In other words, the dimension combines information from the users and orders model.

Holistics's approach to solve this use case is to create a new dimension at the dataset level instead of model level.

How to create dimensions in datasets

There are two ways to create dimensions in datasets:

Create dimensions via UI in the Data tab.
Create dimensions as-code in the Code tab.

Please note that dataset dimensions are persisted to datasets directly. They can be viewed and used by anyone who has access to these datasets. If you are just exploring different ways to answer a business question, you may want to conside creating Ad-hoc Fields instead.

Create Dataset Dimension

info

Unlike model fields, you can only use AQL to define dimensions and metrics in datasets, SQL definitions are not supported.

A dimension can be declared inside a dataset using this syntax:

Dataset dataset_name {
  (...)

  dimension dimension_name {
    model: source_model_name
    type: 'text' | 'number' | 'date' | 'datetime' | 'truefalse'
    label: 'Dimension Label'
    definition: @aql aql_expression ;;
  }
}

The model parameter defines the source model from which you will start referencing fields.

tip

For more details relating to source model and field referencing, please refer to the Cross-Model Reference document.

In the following sections, we will go through different cases of dataset dimension.

Single-model dimension

Single-model dimensions only reference fields of one model in their definitions. For example, the following dimensions only transform dimensions of the users model:

Dataset e_commerce {
  (...)

  dimension full_name {
    model: users 
    type: 'text'
    label: 'Full name'
    definition: @aql concat(users.first_name, ' ', users.last_name);;
  }

  dimension age_by_year {
    model: users 
    type: 'text'
    label: 'Full name'
    definition: @aql date_diff('day', users.birth_date, @now) / 365;;
  }
}

In this case, we recommend defining these dimensions at the model level, so that information relating to a model is centralized in one place.

Best Practices

Single-model dimension should be defined at the model level instead of dataset level.

Cross-model dimension

Cross-model dimensions reference dimensions of multiple models that have relationships with one another.

For example, in the Cohort Retention guide, we have the following dimension definition:

Dataset e_commerce {
  (...)
  relationships: [relationship(orders.user_id > users.id, true)]

  dimension acquisition_month_cohort {
    model: users 
    type: 'date'
    label: 'Acquisition Month Cohort'
    definition: @aql min(orders.created_at | month()) | dimensionalize(users.id);;
  }
}

In case the models have a 1-1 relationship, you can freely combine fields. However, in case the models have an n-1 relationship, the behavior is a bit more complicated.

Cross-Model Reference

For an in-depth explanation of the cross-model reference mechanism, please check our document about Cross-Model Reference.

Source model is on the one-side

If the dimension's source model is on the one-side of the relationship, and you try to do a simple combination of the fields, the calculation will be blocked by the "Cannot combine selected fields due to potential fan-out issues." error.

For example, with models orders and users having a n-1 relationship, the following dimension will be invalid:

Dataset e_commerce {
  (...)

  relationships: [relationship(orders.user_id > users.id, true)]
  
  // Invalid dimension
  dimension simple_combination {
    model: users  // source model on the one-side
    type: 'number'
    label: 'Simple Combination'
    definition: @aql users.id + orders.id ;;
  }
}

The dimension only works when you aggregate the many-side with an aggregator function, like using the min() function in the acquisition_month_cohort example above.

tip

For more in-depth explanation of this behavior, please refer to the following documents:

Source model is on the many-side

If the dimension's source model is on the many-side of the relationship, you can freely reference fields of the one-side model. For example, this would work:

Dataset e_commerce {
  (...)

  relationships: [relationship(orders.user_id > users.id, true)]
  
  // Valid dimension
  dimension simple_combination {
    model: orders  // source model on the many-side
    type: 'number'
    label: 'Simple Combination'
    definition: @aql users.id + orders.id ;;
  }
}

Introduction​

How to create dimensions in datasets​

Create Dataset Dimension​

Single-model dimension​

Cross-model dimension​

Source model is on the one-side​

Source model is on the many-side​