Handle Path Ambiguity in Dataset
A grasp of these concepts will help you understand this documentation better:
Introduction
In this document, we will go through two common scenarios in which ambiguous path will happen, and how to handle them.
Example 1: Role-playing Dimension
Definition
In Star Schema design, a role-playing dimensions are dimensions that are used more than once in a fact table, each time with a different meaning or role.
In Holistics, this design can be imitated by creating multiple relationships between two tables. For example, in an Ecommerce company, the date dimension model has three relationships to the Orders
facts. The same dimension table can be used to filter the facts by order date, delivery date, or cancelled date.
While this design is possible, it's important to understand that there can only be one active relationship between two Holistics models. All remaining relationships must be disabled to make this Dataset not being ambiguous.
Having a single active relationship means there is a default filter propagation from date
dim model to orders
fact model. In this instance, the active relationship is set to the most common filter that is used by reports, which is the Date → Orders Created Date
and what you can only answer is how many orders has been created (instead of how many orders that has been delivered
or cancelled
)
Solution
To solve the problem of this design, we can use the with_relationship()
function. The function literally telling Holistics that for this expression, use this relationship, even if it is inactive.
Note that the relationships used in with_relationships()
must be pre-defined in the dataset.
Let’s say that we want to understand how many Orders have been delivered, cancelled, refunded, you can create the metrics as below
Dataset role_playing_dim {
models: [
fct_orders,
dim_dates
]
relationships: [
relationship(fct_orders.created_at > dim_dates.date, true),
relationship(fct_orders.cancelled_at > dim_dates.date, false),
relationship(fct_orders.delivered_at > dim_dates.date, false),
relationship(fct_orders.refunded_at > dim_dates.date, false)
]
// AQL Metrics
metric total_created_orders {
label: 'Total Created Orders'
type: 'number'
definition: @aql fct_orders | count(fct_orders.id);;
}
metric total_delivered_orders {
label: 'Total Delivered Orders'
type: 'number'
definition: @aql
fct_orders
| count(fct_orders.id)
| with_relationships(fct_orders.delivered_at > dim_dates.date)
;;
}
metric total_cancelled_orders {
label: 'Total Cancelled Orders'
type: 'number'
definition: @aql
fct_orders
| count(fct_orders.id)
| with_relationships(fct_orders.cancelled_at > dim_dates.date)
;;
}
metric total_refunded_orders {
label: 'Total Refunded Orders'
type: 'number'
definition: @aql
fct_orders
| count(fct_orders.id)
| with_relationships(fct_orders.refunded_at > dim_dates.date)
;;
}
}
And then when you use these metrics altogether, you will get the Result like below
explore {
dimensions {
dim_dates.date
}
// Note that metrics defined in models are called measures
measures {
total_created_orders,
total_delivered_orders,
total_cancelled_orders,
total_refunded_orders
}
}
Example 2: Fact Constellation Schema Design
Definition
Fact Constellation Schema is also known as Galaxy Schema that further divides Star Schema in small Star Schema(s) where there are more than one Fact Tables and Reusable dimension to connect multiple Fact Tables.
For example, you could have a fact table for orders
and another fact table for inventory
, both linked to the same dimension tables for products
, dates
.
In Holistics, in order for this Dataset to be functional without facing the ambiguous error, you would have to disable 1 relationship from dim to fact. For example, in this case, I will disable the relationship from Products to Inventory
With this relationship setup, you’re unable to find out what are the Total Quantity Available for a specific product just by using the Drag and Drop in the Dataset Explore because the direct relationship between Products and Inventory is inactive.
This implies that, in the Dataset Explore, when you use the combination of Product Name and Sum of Quantity Available, instead of the filtering direction goes from Products → Inventory
, it will go from Products → Orders → Dates → Inventory
which is analytically incorrect.
Solution
What is you want to build a Report that tells
- Total Quantity Available by Product
- Total GMV by Product
- Filtered by Month
One of the solution can be done here is to use with_relationship()
in the context of the metric Total Quantity Available by Product
Dataset galaxy_schema {
models: [
fct_inventory,
dim_products,
fct_orders,
dim_dates
]
relationships: [
relationship(fct_orders.product_id > dim_products.id, true),
relationship(fct_inventory.product_id > dim_products.id, false),
relationship(fct_orders.created_at > dim_dates.date, true),
relationship(fct_inventory.created_at > dim_dates.date, true)
]
// AQL Metrics here
metric total_available_products {
label: 'Total Avaiable Quantity Product'
type: 'number'
definition: @aql
fct_inventory
| sum(fct_inventory.quantity_avaiable)
| with_relationships(fct_inventory.product_id > dim_products.id)
;;
}
}
And then when using it in the Dataset Explore to build the report, you will get the Result like below
explore {
dimensions {
dim_products.name
}
// Note that metrics defined in models are called measures
measures {
total_available_products,
total_gmv_by_products: sum(fct_orders.item_values)
}
filters {
dim_dates.date matches @(last 6 months)
}
}