# How Holistics handles joins > Learn how data joins work and their use cases in Holistics. This page also offers tips to avoid fan-out issues ## Introduction For business users that use drag-and-drop report interface day by day, it may not be that important to understand what is happening behind the scenes as long as "it works". However, for analysts, it is important to understand fully Holistics' underlying mechanism for joining to ensure the results are calculated correctly. ## How we generate SQL based on different join types Holistics uses `LEFT JOIN` (also known as `LEFT OUTER JOIN`) for both `many-to-one` and `one-to-one` relationships. For `many-to-one`, the model on the "many" side goes on the left of the join. Consider an example where we want to **count total revenue generated by different genders of the users**. Using Data Exploration, the result may look like this: ![](https://media.holistics.io/91c94de5-how-joins-work-02.png) The final SQL will look like this: ```sql SELECT U.gender, SUM(O.revenue) as total_orders FROM orders O LEFT JOIN users U ON O.user_id = U.id GROUP BY 1 ``` :::info Why LEFT JOIN from the "many" side to the "one" side? This preserves rows from the "many" table even when there's no matching record on the "one" side (referential integrity). In the example above, orders whose `user_id` doesn't exist in `users` are still counted. ::: ## Potential fan-out issues with one-to-many relationship Fan-out happens when you **LEFT JOIN** two data models with **one-to-many** relationship, but the table with `one column` is on the left and the table with `many column` on the right. For example, you are working on an ecommerce dataset that contains two data models `Order Items` and `Orders`. Let's say when exploring this set of Data, if you use any field from `order_items`, for example, `order_items.order_id` then **delivery_attempts** will be duplicated and normally sum on the number of delivery will be wrong. Since our current mechanism will find a join path (LEFT JOIN by default) from the data model that has aggregations (measures) to the data model that does not, the fan-out issue occurs. In the aforementioned case, you use measure field `SUM(orders."delivery_attempts")` of **orders** data model and non-measure field `order_items.order_id` of **order_items** data model. Since the relationship between `orders` and `order_items` is **one-to-many**, the underlying query will be generated as below ```sql SELECT T1."order_id" AS "order_id", SUM(T0."delivery_attempts") AS "delivery_attempts" FROM "ecommerce"."orders" T0 LEFT JOIN "ecommerce"."order_items" T1 ON T0."id" = T1."order_id" WHERE T0."id" < 2200 GROUP BY 1 ORDER BY 2 DESC LIMIT 500 ``` ### What have we done to solve the issue :::note Note: Holistics would auto-resolve this fan-out issue for the Aggregations and Business Calculation in our Dataset exploration UI or measure defined using AML (only in 4.0). However, we do not support solving the fan-out problems when using measures created in the 3.0 Data Model layer. ::: In order to solve this issue, Holistics has modified the underlying query to get the distinct value when the fan-out issue happens. Specifically, when the measure/aggregation is on the dimension model instead of the fact model, we will calculate the measure first inside the dimension model before executing the join. For example, let take the example above, when we sum the `delivery_attempts`, this query will run first inside `order` model to calculate total `delivery_attempts` for each order_id. Let's call this Set A ```sql SELECT T0."id" AS "order_id", SUM(T0."delivery_attempts") AS "total_delivery_attempts" FROM "ecommerce"."orders" T0 WHERE T0."id" < 2200 GROUP BY 1 ORDER BY 2 DESC ``` After that, we will execute the join from `order_items` to `orders` as normal and select Distinct `order_id` from `order_items` model. Let's call this Set B ```sql SELECT distinct T0."order_id" AS "order_id" FROM "ecommerce"."order_items" T0 LEFT JOIN "ecommerce"."orders" T1 ON T0."order_id" = T1."id" WHERE T0."id" < 2200 ``` Finally, we will execute Inner Join Set A and Set B, select `order_id` from Set B and `total_delivery_attempts` from Set A. By doing this, the order_id will be unique and fan-out issue no longer exists. The final query will be: ```sql With total_delivery_attemps as ( SELECT T0."order_id" AS "order_id", SUM(T0."delivery_attempts") AS "total_delivery_attempts" FROM "ecommerce"."orders" T0 WHERE T0."id" < 2200 GROUP BY 1 ORDER BY 2 DESC ) , distinct_order_id as ( SELECT distinct T0."order_id" AS "order_id" FROM "ecommerce"."order_items" T0 LEFT JOIN "ecommerce"."orders" T1 ON T0."order_id" = T1."id" WHERE T0."id" < 2200 ) SELECT T1."order_id" T0."total_delivery_attempts" FROM total_delivery_attemps T0 INNER JOIN distinct_order_id T1 on T0."order_id" = T1."order_id" ``` ### How to avoid fan-out issue in your report In Holistics, the fan-out warning should only appear when you're using a custom measure, because we only auto-handle fan-out for measures with an explicit aggregation type. If your measure is created inside modeling layer using SQL, there will be high possibility that we cannot guess our aggregation function thus causing the fan-out issues. There are some solutions to avoid the error in this situation: - Use Aggregation or Business calculation on the Dataset Exploration - Using AML syntax (only in 4.0) You could find more detail [here](/docs/joins/troubleshooting-fanout) ## Handling referential integrity violations Take the (Orders, Users) example [above](#how-we-generate-sql-based-on-different-join-types). Consider scenario where there are `orders` records of user ID = 5, but in the `users` model, no corresponding record of user ID 5 found. This is a **violation of the referential integrity rule** between orders and users (many-to-one relationship). If we use INNER JOIN for the above query, the result set would eliminate orders rows with unfounded users. This is dangerous and will underreport the sales results. Therefore, **using INNER JOIN does not resolve referential integrity violation correctly**. ```sql -- IMPORTANT: This is not what Holistics does -- Using INNER JOIN SELECT U.gender, SUM(O.revenue) as total_orders FROM orders O INNER JOIN users U ON O.user_id = U.id GROUP BY 1 ``` Generally speaking, **Holistics will opt to use OUTER JOIN** when dealing with referential integrity issues. In some situation, this will add query performance overhead, but it ensures that all records will be accounted for and not fall into referential integrity violation traps.