Holistics Expression

Definition​

Holistics Expression is our propriety language that allows you to specify data definition, transformation, calculations,... in a database-agnostic way. In other words, it abstracts away the little differences between SQL flavors and lets you focus on what really matters: your data logic.

Why Holistics Expression is necessary​

Firstly, in comparison to SQL, Holistics Expression serves more complex business use cases. For example, you can:

• Create a period-over-period growth percentage
• Analyze cumulative sum from the selected range
• Create custom fields combining data from multiple models instead of just the current model
• Apply Column Level Permission easily
• And so much more

Next, if you are working with multiple databases with different SQL dialects, Holistics Expression will help you to create dimension/measure flexibly with unified syntax that functions properly with all of your databases (and of course, with our system as well).

Where to use Holistics Expression​

Holistics Expression aims to support both Data Analysts and Explorers to efficiently create calculations thus fulfilling their analytics needs. However, while Analysts work primarily in the modeling layer and prepare reusable measures/calculations, Explorers, on the other hand, view prepared reports and only explore or create calculations on-demand, so that Holistics has introduced two places where Holistics Expression can be created and used.

Business Calculations can be created when exploring Dataset or Reports and Explorers are its target users. You can imagine it's like calculating 2 or multiple fields in an Excel Sheet. Its formula can be adjusted easily right in Dataset Exploration View.

Model Field Expression​

In contrast, Model Field Expression can only be created in a specific model and used when exploring the related model in Dataset. The formula of Model Field Expression cannot be modified in Dataset Exploration View.

How Holistics Expression works​

Holistics Expression provides a unified syntax against different SQL dialects for defining any measure (metric) on top of a set of models inside a dataset. For a simple measure which only involves a single model, a measure expression work just like an aggregate function in SQL. Behind the scene, when you use a measure in a dataset, it's translated to a simple SQL expression and put into the final query.

With more complicated measures that involve more models, since we know what you are computing, which models and relationships are involved, we can dynamically prepare these dependencies in different parts of the final query with respect to correctness and performance.

Supported Functions​

The functions can be divided into a few basic categories:

• Aggregator Functions: take values from multiple rows to perform a calculation
• Logical Functions: return value based on some logical conditions
• Dealing with Nulls and Zeros functions
• Time Intelligence Functions: Date- and time-related functions
• (Not Available) Filter Funtions: filter expressions that are applied to a measure calculation

Aggregator Functions​

FunctionsSyntaxPurpose
countcount(field)Counts the total number of items in a group, not including NULL values
count_distinctcount_distinct(field)Counts the total number of distinct items in a group, not including NULL values.
averageaverage(field)Averages the values of items in a group, not including NULL values.
minmin(field)Computes the item in the group with the smallest numeric value.
maxmax(field)Computes the item in the group with the largest numeric value.
sumsum(field)Sums the total number of items in a group, not including NULL values.
medianmedian(field)Computes the median of an expression, which is the value that the values in the expression are below 50% of the time.
stdevstdev(field)Returns the standard deviation (sample) of the column created by expression unless expression defines a column of lists, in which case returns the standard deviation (sample) of each list.
stdevpstdevp(field)Returns the standard deviation (population) of the column created by expression unless expression defines a column of lists, in which case returns the standard deviation (population) of each list.
varvar(field)Returns the variance (sample) of the column created by expression unless expression defines a column of lists, in which case returns the variance (sample) of each list.
varpvarp(field)Returns the variance (population) of the column created by expression unless expression defines a column of lists, in which case returns the variance (population) of each list.
running totalShows how a metric has changed over time. This function can only be used in Dataset Exploration UI.

Logical Funtions​

FuntionsSyntaxPurpose
case whencase(when: condition_expression, then: value_expression, else: value_expression)goes through conditions and returns a value when the first condition is met (like an IF-THEN-ELSE statement).
andand(condition_expression, ...)compares between two Booleans as expression and returns true when both expressions are true.
oror(condition_expression, ...)compares two Booleans as expression and returns true when one of the expressions is true.
notnot(field_expression)takes a single Boolean as an argument and invert it.
isis(field_expression)evaluates the given statement and return either True or False.
inin(field_expression, value_expression, value...)takes a field expression and a list of values. Return true if that list of values contains the value of that field expression.

Dealing with Nulls and Zeros functions​

FuntionsSyntaxPurpose
coalescecoalesce(val1, val2, ...., val_n)returns the first non-null value in a list
nullifnullif(expr1, expr2)returns NULL if two expressions are equal, otherwise it returns the first expression.
safe_dividesafe_divide(val1, val2)Equivalent to the division operator (X / Y), but returns NULL if an error occurs, such as a division by zero error.

Time Intelligence Functions​

danger

Currently, Holistics only supports epoch funtion. Other functions are not available.

FunctionsSyntaxPurpose
epochepoch(date); epoch(datetime)Returns a Unix timestamp which is the number of seconds that have elapsed since ‘1970-01-01 00:00:00’ UTC
date_truncdate_trunc(datetime, time_col: 'year', 'quarter', 'month', 'week', 'day', 'hour', 'minute')Truncate a TIMESTAMP on secific date part
nownow()Returns current timestamp
dayday()Extract the day from a given timestamp
monthmonth()Extract the month from a given timestamp
yearyear()Extract the year from a given timestamp
quarterquarter()Extract the quarter from a given timestamp
weekweek()Extract the week from a given timestamp
hourhour()Extract the hour from a given timestamp
minuteminute()Extract the minute from a given timestamp
week_dayweek_day()Return the day number of a specific date (within a week)
week_numweek_num()Return the week number of a specific date (within a year)

Please refer to Reference section on the left for more information.

FAQs​

Holistics Expression is case sensitive​

Question: Can I use CASE(WHEN:...,THEN:...,ELSE:...), AND(), OR(),...

Answer: No, Since Holistics Expression is case sensitive and we don't support capitalized letters in our Expression so the exact needs to be followed

• case(when:...,then:...,else:...)
• and()
• or()
• ...

How to create calculation with only a subset of my current data​

Question: How can I create a calculation with only a subset of my current data (using condition inside an aggregate function).
For example, from the eCommerce dataset, what if I want to calculate the total value from the delivered orders only (exclude all other cancelled and refunded orders)

Answer: Since Filter function inside an Aggregate function is currently not supported, we recommend that at this moment, you can combine measure function with case when inside to calculate the data with any specific condition being applied.

sum(  case(    when: order_derived.order_status == 'delivered'    , then: order_derived.item_value    , else: null  ))

How to handle error Divide by 0​

Question: When doing division in Business Calculation (field_a/field_b), sometimes I encounter division by zero error which is obviously because my Divisor = 0. How should I handle this case?

Answer: There are 2 ways to handle this case:

Option 1: Use safe_divide syntax.

Option 2: You can add conditional expression in your divisor to return NULL whenever it has the value of 0

sum(model.field_a)/case(  when: sum(model.field_b) == 0    , then: null    , else: sum(model.field_b))

You can add comments using // syntax:
sum(  // only take delivered orders  case(    when: order_derived.order_status == 'delivered'    , then: order_derived.item_value    , else: null  ))