As users access reports (widgets), Holistics stores the data that's returned by queries in a cache. That way, if user interaction with the report results in a query that's already been issued, Holistics can try to read the data from the cache.
Query caching can optimize the performance of your dashboard when avoiding running the query against the database to get fresh query results. This can also reduce cost by reducing the overall number of queries.
How Holistics cache data
Holistics works by connecting directly to your SQL database instead of storing your raw data on our own server. When you open or access a report with specific filter parameters, Holistics will send report SQL query to the database, wait for the results and visualize them to the end users.
Behind the scene, once the query is sent to the database, the result of that query is cached. The duration of this cache is configurable (min 10 minutes, and can be configured). The generated SQL query is the cache key.
In short, the cache is generated on-demand. It means when someone first visits the dashboard, each widget’s result will be cached separately.
When a new query is executed, the cache is checked to see if the query is similar to the previous run. All filters and row limits must be the same for the cached data to be used.
- If the query has been executed before and within cache duration, the results will be retrieved from the Holistics cache instead of sending a fresh query to your database.
- Otherwise, Holistics runs the query against the database to get fresh database results (and those results are then cached).What this means is that the data sent to the subsequent users will only be as fresh as the time the first user access the data. Even so, the subsequent user can choose to force reload or change the filter parameters and Holistics will send a new query to your database.
Note that: When a filter is applied to the dashboard, a new query will be generated (based on the WHERE clause).
You can specify Caching Duration to control how long your result sets are stored in Holistics's cache server before expiring. After the specified time period, the cache will expire and is cleaned from Holistics's cache server.
In reports/dashboards, you can configure Cache Duration in the Cache Settings UI, which can be accessed from the Refresh button or Report/Dashboard Preferences.
Adjusting this parameter means balancing your needs for performance (or cost) and data freshness:
- Short cache duration gives you fresher data. But the cache expires faster. Users may occasionally suffer a long loading time if the cache already expired. This may also increase the cost (Query-run consumption) due to more queries will be sent to the database.
- Long cache duration yields shorter report loading time, but might give stale data.
Please note that currently, the caching
- Cannot be disabled
- The minimum interval is 10 minutes and cannot be set lower.
Walkthrough an example
Say that you have dashboard A with 8 charts (widgets) and no cached data. The cache duration is set to be 24 hours.
- The first time you open the dashboard (with no cache), all widgets will be run for the first time. That means:
- 8 queries will be sent to your SQL database for processing.
- Each widget's result will also be cached (for 24 hours).
- If you refresh the dashboard right after: Since the data is already in the cache, no new queries will be sent to your database.
- Now, if John visits the dashboard and modifies with filter A (for i.e: filter "Country = Vietnam”) at 9:00 am.
- Since there are no existing caches for the new queries, the new queries will be sent to your database for processing
- These new result sets will also be cached by Holistics (for 24 hours).
- The cache would expire at 9:00 am the next day.
- During this time, anyone who visits this dashboard A and uses the same filter A will see the cached data, no new queries will be sent to your database.
- Next, Amy visits dashboard A with filter C at 10:00 am.
- This doesn't use the cached data, because she uses a different filter value.
- Now, a query is triggered, the result is cached from 10 am -> 10 am the next day
To summarize, the cached data is generated on-demand when the 1st user visits the dashboard, then the countdown starts from this moment.
Vice versa, if no one visits the dashboard, then there's no cached data for that dashboard.
Forcing reload from the database
In reports/dashboards, when clicking on the Refresh button, end-users will see the time when that query was last updated (the time when the current result was cached). Clicking on the
Refresh Now will manually rerun the query and override the cached result with the new result.
Is the cache invalid when I change filter values?
Question: I notice that data caching is not very useful when you have filters in the widget/report. As soon as you change filter value query is running over and you have to wait again. What can we do here?
Answer: When the filter value changes, the underlying query changed thus invalidated the cache. What we can suggest is that you can use our Transform Model functionality to build pre-aggregated tables, so that the queries will only query a small aggregated result set instead of scanning the entire raw tables.
Is the cache invalid when I explore/edit the widget?
Question: I view Dashboard Widget A (the query gets sent to cache). After I click on Explore/Edit on Widget A, will this first run of the initial widget load be fetched from cache?
Answer: When exploring/editing, we are applying an upper limit of 100k rows. The limit is meant to make the preview faster. But it results in different queries when in view mode (no limit) and edit/explore mode (100k).