Cannot combine fields due to fan-out issues?

In some cases, when exploring you encounter this error Cannot combine selected fields due to potential fan-out issues.

Why you encounter this error

Fan-out happens when you LEFT JOIN two data models with one-to-many relationship, but the table with one column is on the left and the table with many column on the right.

For example, you are working on an ecommerce dataset that contains two data models Order Items and Orders. Let's say when exploring this set of Data, if you use any field from order_items, for example, order_items.order_id then delivery_attempts will be duplicated and normally sum on the number of delivery will be wrong. Since our current mechanism will find a join path (LEFT JOIN by default) from the data model that has aggregations (measures) to the data model that does not, the fan-out issue occurs.

In the abovementioned case, you use measure field SUM(orders."delivery_attempts") of orders data model and non-measure field order_items.order_id of order_items data model. Since the relationship between orders and order_items is one-to-many, the underlying query will be generated as below

SELECT
T1."order_id" AS "order_id",
SUM(T0."delivery_attempts") AS "delivery_attempts"
FROM
"ecommerce"."orders" T0
LEFT JOIN "ecommerce"."order_items" T1 ON T0."id" = T1."order_id"
WHERE
T0."id" < 2200
GROUP BY 1
ORDER BY 2 DESC
LIMIT 500

What have we done to solve the issue

In order to solve this issue, Holistics has modified the underlying query to get the distinct value when the fan-out issue happens.

Specifically, when the measure/aggregation is on the dimension model instead of the fact model, we will calculate the measure first inside the dimension model before executing the join.

For example, let take the example above, when we sum the delivery_attempts, this query will run first inside order model to calculate total delivery_attempts for each order_id. Let's call this Set A

SELECT
T0."order_id" AS "order_id",
SUM(T0."delivery_attempts") AS "total_delivery_attempts"
FROM
"ecommerce"."orders" T0
WHERE
T0."id" < 2200
GROUP BY 1
ORDER BY 2 DESC

After that, we will execute the join from order_items to orders as normal and select Distinct order_id from order_items model. Let's call this Set B

SELECT
distinct T0."order_id" AS "order_id"
FROM
"ecommerce"."order_items" T0
LEFT JOIN "ecommerce"."orders" T1 ON T0."order_id" = T1."id"
WHERE
T0."id" < 2200

Finally, we will Full Join Set A and Set B, select order_id from Set B and total_delivery_attempts from Set A. By doing this, the order_id will be unique and fan-out issue no longer exists.

The final query will be:

With total_delivery_attemps as (
SELECT
T0."order_id" AS "order_id",
SUM(T0."delivery_attempts") AS "total_delivery_attempts"
FROM
"ecommerce"."orders" T0
WHERE
T0."id" < 2200
GROUP BY 1
ORDER BY 2 DESC
)
, distinct_order_id as (
SELECT
distinct T0."order_id" AS "order_id"
FROM
"ecommerce"."order_items" T0
LEFT JOIN
"ecommerce"."orders" T1 ON T0."order_id" = T1."id"
WHERE
T0."id" < 2200
)
SELECT
T1."order_id"
T0."total_delivery_attempts"
FROM
total_delivery_attemps T0
FULL JOIN
distinct_order_id T1 on T0."order_id" = T1."order_id"

Notes

Please remember that we have handled the fan-out issues from our side only if we know the aggregated function of your measure. With that being said, for now if your measure is an only basic calculation (COUNT, SUM, AVG,...), you can use our aggregation in our dataset exploration interface.

On the contrary, if you measure is created inside model, there will be high possibility that we cannot guess our aggregation function thus causing the fan-out issues.