In Holistics, a Dataset is a "container" holding several data models together so they can be explored together, and dictating which join path to be used in a particular analytics use case.
In other words, Dataset is like a mini data mart that enables two things:
- Data Exploration: Dataset can be shared to Explorers (non-technical users) to do self-service exploration of the data.
- Creating Charts: All Charts in Holistics have to be created from a dataset. This is done either by the Analyst or the Explorer
UI of a dataset exploration
Creating a dataset
To create a Dataset, you can go to Datasets section under Reporting. From here you can either create datasets directly or create a folder to organize your datasets.
First, start with selecting all the Data Models that you want to add to your Dataset. Then, activate or deactivate relationships that enable the join paths you need.
In this example, since model Users, Orders and Order Items have relationships with each other, if you choose Order Items as root model, you will be able to select Users and Orders as peripheral models.
You also have the option to create a Dataset directly from Data Models to explore it
Explore a dataset
With Dataset, business users can ask and answer questions themselves without relying on the analyst to write SQL at every step in their exploration.
If you want to start fresh, simply go to the Dataset section on Reporting page and click on a Dataset. You will be presented with the Explore screen where you can start dragging in fields, measures, apply conditions, and tweak the visualizations.
When you are satisfied with the exploration result, you can save it as a widget in a dashboard.
You can also explore a report / dashboard widget's result by clicking on it and choose "Explore Data". For more details, please refer to Explore Data
Share a Dataset
Just like with your reports, dashboard or KPI metric sheets, you can share Datasets to specific users or groups. Sharing a folder will automatically share all items inside.
Relationships in Dataset
Combining data from different models
In an exploration, users can drag fields from different data models in a single explore. Based on the relationships metadata, Holistics figures out the correct JOINs statements to apply to the SQL query.
Example: Given a dataset created from the following models:
You want to calculate: "Total orders broken down by countries". You will need access to
Simply drag in Country Name and Total Orders field (from 2 different models):
The generated query will be:
Ambiguity in join paths
Sometimes, based on the relationships metadata, there are more-than-once possible paths to combine fields between two models.
Example: you have some models with relationships as below
To answer the question "Total order values placed by countries", you drag in Country Name from
countries and Orders Count from
orders. As you can see, there are two ways to go from
- A: countries → cities → users → orders → order items
- B: countries → cities → merchants → products → order items
Each JOIN path will produce a different result with different meanings (total values by users' countries vs. by merchant's origin countries.)
To prevent this, when creating datasets, Holistics automatically detects and disable relationships that introduce ambiguity in the JOIN paths. However, you could also browse the Relationship list and adjust your active relationship as your needs.
Duplicating into 2 different models: In these situation, it is also recommended that you duplicate Cities and Countries models into dedicated models for Users and Merchants. This makes your dataset clearer to the end-user:
Which JOIN types (left, right, full) does Holistics use?
The JOINs are constructed following some simple rules:
- If the relationship between A and B is n - 1, then valid JOIN path is from n to 1 (
A LEFT JOIN Bor
B RIGHT JOIN A)
- If the relationship is 1 - 1, the path is valid both ways (
A LEFT JOIN Bor
B LEFT JOIN Aare OK) so we apply FULL JOIN in this case.
Why do I sometimes see 'SELECT DISTINCT' in the SQL?
In some cases, you will see we apply SELECT DISTINCT in the underlying query for two reasons.
First, this is part of our mechanism to avoid fan-out issue. For more information, please refer to our document about fan-out issue
Second, we see no value to display all records (even duplicated records) so we have applied SELECT DISTINCT to prevent showing redundant data. Our aim is to provide dataset's explorers an overview of their data.
Show items with no data
By default, Holistics will not present items with no data when exploring dataset.
For example, you have 2 models users and orders, the relationship between them is 1 to many. If you want to understand how many orders each user has, you will add in 2 fields:
count(orders.id). However, we will exclude users that have not made any purchases.
In order to include items with no data, you can toggle on the option
Include empty children rows under
Others in our Visualization Settings. Please refer to the video below: