Skip to main content

Relationships in Dataset

Combining data from different models

In an exploration, users can drag fields from different data models in a single explore. Based on the relationships metadata, Holistics figures out the correct JOINs statements to apply to the SQL query.

Example: Given a dataset created from the following models: orders, users, merchants, cities, countries

You want to calculate: "Total orders broken down by countries". You will need access to countries.name and orders.total_orders fields.

Simply drag in Country Name and Total Orders field (from 2 different models):

The generated query will be:

SELECT
T3."name" AS "name",
count(1) AS "total_orders"
FROM
"ecommerce"."orders" T0
LEFT JOIN "ecommerce"."users" T1 ON T0."user_id" = T1."id"
LEFT JOIN "ecommerce"."cities" T2 ON T1."city_id" = T2."id"
LEFT JOIN "ecommerce"."countries" T3 ON T2."country_code" = T3."code"
GROUP BY 1
ORDER BY 2 DESC

Ambiguity in join paths

Sometimes, based on the relationships metadata, there are more than one possible paths to combine fields between two models.

Example: you have some models with relationships as below

To answer the question "Total order values placed by countries", you drag in Country Name from countries and Orders Count from orders. As you can see, there are two ways to go from countries to orders:

  • A: countries → cities → users → orders → order items
  • B: countries → cities → merchants → products → order items

Each JOIN path will produce a different result with different meanings (total values by users' countries vs. by merchant's origin countries.)

To prevent ambiguity, when creating datasets, Holistics automatically detects and disables ambiguous relationships in the JOIN paths. However, you could also browse the list of relationships and adjust your active relationship according to your needs.

Duplicating into 2 different models: In these situations, it is also recommended that you duplicate Cities and Countries models into dedicated models for Users and Merchants. This makes your dataset more clear to the end-user:

Which JOIN types (left, right, full) does Holistics use?

The JOINs are constructed following some simple rules:

  • If the relationship between A and B is n - 1, then valid JOIN path is from n to 1 (A LEFT JOIN B or B RIGHT JOIN A)
  • If the relationship is 1 - 1, the path is valid both ways (A LEFT JOIN B or B LEFT JOIN A are OK) so we apply FULL JOIN in this case.

Why do I sometimes see 'SELECT DISTINCT' in the SQL?

In some cases, you will see we apply SELECT DISTINCT in the underlying query for two reasons.

First, this is part of our mechanism to avoid fan-out issue. For more information, please refer to our document about fan-out issue

Second, we see no value to display all records (even duplicated records) so we have applied SELECT DISTINCT to prevent showing redundant data. Our aim is to provide dataset's explorers an overview of their data.


Let us know what you think about this document :)