Dataset is a collection of Data Models that share the same interest and have relationships with one another. It is similar to a set of database tables with join keys and join paths pre-defined by the analyst. Technically, a Dataset is a tool that generate SQL queries base on the dimensions, measures and conditions that you selected.
In SQL, joining order between tables determines the result you receive:
In SQL reporting, you have to define JOIN paths manually. In other words, with every query, you only access a subset of tables/models at your disposal, and for every slightly different reporting needs, you will need to rewrite the logic every time.
With Holistics, you do not have to manually predefine JOIN paths - you just need to define the relationships between models (join on which field, is it 1 - 1 or n - 1 relationship...), drag in your required fields and Holistics's engine will generate valid JOIN paths for you.
For example, we have a dataset created from the following models:
Let's say you are the country manager and want to calculate "Total orders placed by countries", so you will need access to
Simply drag in Country Name and Total Orders field, and the following query will be generated:
SELECT T3."name" AS "name", count(1) AS "total_orders" FROM "ecommerce"."orders" T0 LEFT JOIN "ecommerce"."users" T1 ON T0."user_id" = T1."id" LEFT JOIN "ecommerce"."cities" T2 ON T1."city_id" = T2."id" LEFT JOIN "ecommerce"."countries" T3 ON T2."country_code" = T3."code" GROUP BY 1 ORDER BY 2 DESC
The paths are constructed following some simple rules:
- If the relationship between A and B is n - 1, then valid JOIN path is from n to 1 (
A LEFT JOIN Bor
B RIGHT JOIN A)
- If the relationship is 1 - 1, the path is valid both ways (
A LEFT JOIN Bor
B LEFT JOIN Aare OK) so we apply FULL JOIN in this case.
During the creation of Dataset, Holistics will automatically detect and disable relationships that introduce ambiguity in the JOIN paths. However, you could also browse the Relationship list and adjust your active relationship as your needs.
In some cases, you will see we apply SELECT DISTINCT in the underlying query for 2 reasons.
First, this is part of our mechanism to avoid fan-out issue. For more information, please refer to our document about fan-out issue
Second, we see no value to display all records (even duplicated records) so we have applied SELECT DISTINCT to prevent showing redundant data. Our aim is to provide dataset's explorers an overview of their data.
Let's say for example, you have a model/table recording each and every customers' orders
If you explore
status field in this
order table, a single SELECT querry will return
We believe that there is no analytical benefit by showing all the duplicated records. And instead, we will SELECT DISTINCT in order to help you in getting the overview of your data
Ambiguity in paths means that there are many possible paths to combine fields between two models.
For example, you have some models with relationships as below:
To answer the question "Total order values placed by countries", you drag in Country Name from
countries and Orders Count from
orders. As you can see, there are two ways to go from
- A: countries → cities → users → orders → order items
- B: countries → cities → merchants → products → order items
Each JOIN path will produce a different result with different meanings (total values by users' countries vs. by merchant's origin countries.)
To prevent this, Holistics will automatically deactivate the relationship between
merchants, and path B will be disabled
It is recommended that you duplicate Cities and Countries models into dedicated models for Users and Merchants. This makes your dataset clearer to the end-user:
To create a Dataset, you can go to Datasets section under Reporting. From here you can either create datasets directly or create a folder to organize your datasets.
First, start with selecting all the Data Models that you want to add to your Dataset. Then, activate or deactivate relationships enable the paths you need.
In this example, since model Users, Orders and Order Items have relationships with each other, if you choose Order Items as root model, you will be able to select Users and Orders as peripheral models.
You also have the option to create a Dataset directly from Data Models to explore it
With Dataset, business users can ask and answer questions themselves without relying on the analyst to write SQL at every step in their exploration.
If you want to start fresh, simply go to the Dataset section on Reporting page and click on a Dataset. You will be presented with the Explore screen where you can start dragging in fields, measures, apply conditions, and tweak the visualizations.
When you are satisfied with the exploration result, you can save it as an individual report or a widget in a larger dashboard. Saved reports can also be brought into a dashboard as a widget just like any other SQL report.
To learn more about Visualizations, please refer to the Visualizations page.
You can also explore a report / dashboard widget's result by clicking on it and choose "Explore Data". For more details, please refer to Explore Data
Just like with your reports, dashboard or KPI metric sheets, you can share Datasets to specific users or groups. Sharing a folder will automatically share all items inside.
Updated about 18 hours ago
Forget what relationship is? Checkout the following page: