In some cases, when exploring you encounter this error
Cannot combine selected fields due to potential fan-out issues.
Why you encounter this error
Fan-out happens when you LEFT JOIN two data models with one-to-many relationship, but the table with
one column is on the left and the table with
many column on the right.
For example, you are working on an ecommerce dataset that contains two data models
Order Items and
Let's say when exploring this set of Data, if you use any field from
order_items, for example,
order_items.order_id then delivery_attempts will be duplicated and normally sum on the number of delivery will be wrong.
Since our current mechanism will find a join path (LEFT JOIN by default) from the data model that has aggregations (measures) to the data model that does not, the fan-out issue occurs.
In the abovementioned case, you use measure field
SUM(orders."delivery_attempts") of orders data model and non-measure field
order_items.order_id of order_items data model.
Since the relationship between
order_items is one-to-many, the underlying query will be generated as below
What have we done to solve the issue
In order to solve this issue, Holistics has modified the underlying query to get the distinct value when the fan-out issue happens.
Specifically, when the measure/aggregation is on the dimension model instead of the fact model, we will calculate the measure first inside the dimension model before executing the join.
For example, let take the example above, when we sum the
delivery_attempts, this query will run first inside
order model to calculate total
delivery_attempts for each order_id. Let's call this Set A
After that, we will execute the join from
orders as normal and select Distinct
order_items model. Let's call this Set B
Finally, we will Full Join Set A and Set B, select
order_id from Set B and
total_delivery_attempts from Set A. By doing this, the order_id will be unique and fan-out issue no longer exists.
The final query will be:
Please remember that we have handled the fan-out issues from our side only if we know the aggregated function of your measure. With that being said, for now if your measure is an only basic calculation (COUNT, SUM, AVG,...), you can use our aggregation in our dataset exploration interface.
On the contrary, if you measure is created inside model, there will be high possibility that we cannot guess our aggregation function thus causing the fan-out issues.