Combining data from different models
In an exploration, users can drag fields from different data models in a single explore. Based on the relationships metadata, Holistics figures out the correct JOINs statements to apply to the SQL query.
Example: Given a dataset created from the following models:
You want to calculate: "Total orders broken down by countries". You will need access to
Simply drag in Country Name and Total Orders field (from 2 different models):
The generated query will be:
T3."name" AS "name",
count(1) AS "total_orders"
LEFT JOIN "ecommerce"."users" T1 ON T0."user_id" = T1."id"
LEFT JOIN "ecommerce"."cities" T2 ON T1."city_id" = T2."id"
LEFT JOIN "ecommerce"."countries" T3 ON T2."country_code" = T3."code"
GROUP BY 1
ORDER BY 2 DESC
Ambiguity in join paths
Sometimes, based on the relationships metadata, there are more than one possible paths to combine fields between two models.
Example: you have some models with relationships as below
To answer the question "Total order values placed by countries", you drag in Country Name from
countries and Orders Count from
orders. As you can see, there are two ways to go from
- A: countries → cities → users → orders → order items
- B: countries → cities → merchants → products → order items
Each JOIN path will produce a different result with different meanings (total values by users' countries vs. by merchant's origin countries.)
To prevent ambiguity, when creating datasets, Holistics automatically detects and disables ambiguous relationships in the JOIN paths. However, you could also browse the list of relationships and adjust your active relationship according to your needs.
Duplicating into 2 different models: In these situations, it is also recommended that you duplicate Cities and Countries models into dedicated models for Users and Merchants. This makes your dataset more clear to the end-user:
Which JOIN types (left, right, full) does Holistics use?
The JOINs are constructed following some simple rules:
- If the relationship between A and B is n - 1, then valid JOIN path is from n to 1 (
A LEFT JOIN Bor
B RIGHT JOIN A)
- If the relationship is 1 - 1, the path is valid both ways (
A LEFT JOIN Bor
B LEFT JOIN Aare OK) so we apply FULL JOIN in this case.
Why do I sometimes see 'SELECT DISTINCT' in the SQL?
In some cases, you will see we apply SELECT DISTINCT in the underlying query for two reasons.
First, this is part of our mechanism to avoid fan-out issue. For more information, please refer to our document about fan-out issue
Second, we see no value to display all records (even duplicated records) so we have applied SELECT DISTINCT to prevent showing redundant data. Our aim is to provide dataset's explorers an overview of their data.