Troubleshoot data warehouse issues
See Performance Optimizations for the central documentation on performance.
Note that we do not cover SQL Optimization in this guide. Head over to Best practices to improve Holistics reporting performance for optimization tips.
This article provides general troubleshooting methods for customers who are encountering performance issues from bottlenecks in their data warehouses while using Holistics. It should be used in conjunction with Performance Troubleshooting for the best results.
Investigation Methods
Identify long-running jobs in Job Monitoring
See Job Monitoring.
Check if there are many failed jobs with the same errors
Description
If you have multiple failed jobs within a short timeframe, it may be because there has been a systemic failure from a common cause, most likely in the data warehouse layer. Some common issues are:
- DB fails to write Persisted Models
- DB out of Memory
- DB connection refused
- DB connection timed out
- ...
Instructions
Open Job Monitoring Dashboard.
Check whether many jobs failed with the same error. For example, the jobs below failed because of issues on the data warehouse's side.
Check if CPU Utilization of your data warehouse is high
Not every data warehouse offers CPU Utilization monitoring tools. If yours does not support this feature, consider researching workarounds for investigating this metric.
Description
If your database’s CPU Utilization is high, SQL queries may take a long time to process, and commands such as cancelling a job may fail to execute because the database is unresponsive.
Common causes for high CPU Utilization in data warehouses are:
- High number of logical reads due to:
- Queries that lack indexes
- Outdated index statistics in the data warehouse
- Inefficient queries
- Increase in workload
If your data warehouse has high CPU Utilization, it is necessary to identify which queries are causing the issue.
Instructions
Depending on your data warehouse, you can select the appropriate guide to check the CPU Utilization metric:
If, from step 1, you frequently find this metric to be abnormally high, list out the long-running queries or processes, especially during peak CPU usage.
Send these queries to Holistics support.
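As a sketch, on PostgreSQL-compatible warehouses you can list the currently running queries ordered by how long they have been running. The column names below are standard pg_stat_activity fields; other warehouses expose different system views, so adapt accordingly:

```sql
-- List active queries ordered by running duration
-- (PostgreSQL-compatible syntax; adapt for your warehouse).
select
  pid,
  now() - query_start as duration,
  state,
  query
from pg_stat_activity
where state = 'active'
order by duration desc;
```

Running this during peak CPU usage helps pinpoint which queries to include when contacting Holistics support.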
Run EXPLAIN ANALYZE on slow/suspected queries
Description
EXPLAIN ANALYZE reveals the execution plan of a SQL query. It is useful for identifying the query's bottlenecks, i.e., which parts take a long time to complete.
Instructions
General Instructions
In Holistics, get the generated SQL of your slow job
Go to the Job Monitoring Dashboard and find the job logs of slow-running jobs. You should be able to see the generated SQL there. Consider copying this SQL so you can re-run it for troubleshooting.
Note that SQL generated by Holistics jobs has a
/* - Job ID: xxxxxxxxxx */
comment at the top.
Prefix the query with EXPLAIN ANALYZE and run it against your data warehouse
Copy the generated SQL from step 1 and prefix it with EXPLAIN ANALYZE. Then, run the whole query in your DB engine. Send the result to Holistics if you are not sure how to interpret it.
To visualize the query plan and see which parts take up the most time, you can paste the query plan into a visualization tool like explain.depesz.com.
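For instance, the combined query might look like the following. The SELECT body here is a made-up stand-in for your job's generated SQL (the orders table and its columns are hypothetical):

```sql
-- EXPLAIN ANALYZE goes in front of the generated query.
explain analyze
/* - Job ID: xxxxxxxxxx */
select
  o.customer_id,
  sum(o.amount) as total_amount
from orders o
group by o.customer_id;
```

Note that EXPLAIN ANALYZE actually executes the query, so for very slow or resource-heavy statements you may prefer plain EXPLAIN first, which only estimates the plan.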
Snowflake-specific Instructions
Follow the same instructions as step 1 of the General Instructions.
Then, find the full query in the Snowflake UI:
- You can search for it in the History Page using the SQL text parameters.
- Or you can re-run the SQL in the Holistics Editor, then go to Snowflake's History Page and look for the latest executed query.
Get detailed information about your SQL.
Info: If you are sending troubleshooting information to Holistics, please include the Query Detail and all Query Profile screenshots in your email.
Click on the SQL, which should take you to Query Detail view.
Then, navigate to the Query Profile tab, which shows in detail what operations are executed at each step. There can be multiple profile tabs, so be sure to check all of them.
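If you prefer SQL over the UI, Snowflake also exposes recent query history through the INFORMATION_SCHEMA.QUERY_HISTORY table function. A minimal sketch (the result limit of 100 is an arbitrary choice):

```sql
-- Recent queries visible to the current user, newest first.
select
  query_id,
  query_text,
  start_time,
  total_elapsed_time
from table(information_schema.query_history(result_limit => 100))
order by start_time desc;
```

Sorting by total_elapsed_time instead is a quick way to surface the slowest recent queries.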
Mitigation methods
Cancel long-running jobs in Holistics
Manually cancel jobs in your data warehouse
Description
For jobs that cannot be cancelled using the Job Monitoring Dashboard (e.g., field suggestion jobs), you can cancel them manually from within your data warehouse.
Instructions
- Get the pid of the long-running processes (example shown for PostgreSQL-compatible warehouses). Please make sure to check the query column of the returned rows:
select * from pg_stat_activity
where query_start < now() - interval '1 hour';
- Cancel the long-running processes:
select pg_cancel_backend(the_pid_collected_from_step_1)
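On PostgreSQL-compatible warehouses, the two steps above can also be combined into a single statement. The one-hour threshold is only an example; adjust it to your situation, and review the query column before running this so you do not cancel legitimate work:

```sql
-- Cancel every active query that has been running for over an hour.
-- pg_cancel_backend sends a cancel signal; if it has no effect,
-- pg_terminate_backend(pid) is the more forceful alternative.
select
  pid,
  query,
  pg_cancel_backend(pid)
from pg_stat_activity
where state = 'active'
  and query_start < now() - interval '1 hour';
```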
Improvement methods
Add table indexing/partitioning/clustering keys where relevant
Description
Indexes can greatly improve query speed and reduce high CPU Utilization issues in your data warehouse.
Instructions
- Add the indexes where necessary based on the Query Plan.
- Run EXPLAIN ANALYZE again to see if the indexes have been properly applied.
Please consider carefully before adding indexes. They speed up READ operations but slow down INSERT operations, and they require additional disk space. For more information, refer to this article.
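As a sketch of the two steps above, suppose the query plan shows a sequential scan over a large table filtered by a timestamp column (the orders table and created_at column here are hypothetical, standing in for whatever the plan points at):

```sql
-- Step 1: add an index on the column the slow query filters on.
create index idx_orders_created_at on orders (created_at);

-- Step 2: re-run the plan to confirm the index is used
-- (look for an Index Scan replacing the earlier Seq Scan).
explain analyze
select *
from orders
where created_at > now() - interval '7 days';
```

Whether the planner actually uses the index depends on table statistics and selectivity, so running ANALYZE on the table first can help.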
Increase Data Warehouse computing resources
Depending on the Data Warehouse (DW) that you are using, there will be a variety of methods to scale the DW up and increase its computing resources (CPU, memory, etc.).
Here are a few of the methods:
- Redshift:
Optimize your reports
Please refer to Tips to improve reporting performance.