Skip to main content

Troubleshoot data warehouse issues

Central Doc

Visit Performance Optimizations to see the central doc on Performance.

caution

Note that we do not cover SQL Optimization in this guide. Head over to Best practices to improve Holistics reporting performance for optimization tips.

This article provides general troubleshooting methods for customers who are encountering performance issues from bottlenecks in their data warehouses while using Holistics. It should be used in conjunction with Performance Troubleshooting for the best results.

Investigation Methods

Identify long-running jobs in Job Monitoring

See Job Monitoring.

Check if there are many failed jobs with the same errors

Description

If you have multiple failed jobs within a short timeframe, it may be because there has been a systemic failure from a common cause, most likely in the data warehouse layer. Some common issues are:

  • DB fails to write Persisted Models
  • DB out of Memory
  • DB connection refused
  • DB connection timed out
  • ...

Instructions

  1. Open Job Monitoring Dashboard.

  2. Check if there are many jobs failed with the same failures. For example, these jobs below failed because of issues from the Data Warehouse's side.

Check if CPU Utilization of your data warehouse is high

info

Not every data warehouse offers CPU Utilization monitoring tools. If yours does not support this feature, consider research on the Internet for workarounds on how to investigate this metric.

Description

If your database’s CPU Utilization is high, SQL queries may take a long time to process. Commands such as cancelling job are unable to be executed due to the database being unresponsive.

Common causes for high CPU Utilization in data warehouses are:

  • High number of logical reads due to:
    • Queries that lack indexes
    • Datawarehouse use outdated index statistics
    • Inefficient queries
  • Increase in workload

If your data warehouse has high CPU Utilization usage, it is necessary to identify what queries are causing this issue.

Instructions

  1. Depending on your data warehouse, you can select the appropriate guide to check the CPU Utilization metric:

  2. If from step 1, you frequently find this metric to be abnormally high, proceed to list out long-running queries or processes, especially during peak CPU usage.

  3. Send these queries to Holistics support.

Run EXPLAIN ANALYZE on slow/suspected queries

Description

EXPLAIN ANALYZE on a SQL query reveals the query plan of that query. It is useful to identify the bottlenecks of the query, i.e which parts take up a long time to complete.

Instructions

General Instructions
  1. In Holistics, get the generated SQL of your slow job

    Go to Job Monitoring Dashboard and find job logs of slow-running jobs. You should be able to see the generated SQL. Consider copy this SQL to re-run it for troubleshooting.

    Note that SQLs generated by Holistics jobs will have a /* - Job ID: xxxxxxxxxx* */ comment on the top.

  2. Append EXPLAIN ANALYZE to the query and run it against your data warehouse

    Copy the generated SQL from step 1 and append EXPLAIN ANALYZE to it. Then, run the whole query in your DB Engine. Send it to Holistics if you are not sure on how to interpret the result.

    To visualize the query plan and see which parts take up the most time, you can paste the query plan into a visualization tool like explain.depesz.com.

Snowflake-specific Instructions
  1. Follow the same instruction as step 1 of the General Instructions.

  2. Then, find the full query in Snowflake UI.

    • You can search for it in History Page using the SQL text parameters.
    • Or you can re-run the SQL in Holistics Editor.

    Then, go to Snowflake’s History Page and look for the latest query executed.

  3. Get detailed information about your SQL.

    info

    If you are sending troubleshooting information to Holistics, please include a Query Detail and all Query Profile screenshots on your email.

    Click on the SQL, which should take you to Query Detail view.

    Then, navigate to the Query Profile tab, which will show you in detail what operations get executed for each step. There can be many profile tabs, so be sure to check out all of them.

References

Mitigation methods

Cancel long-running jobs in Holistics

See Cancel long-running jobs.

Manually cancel jobs in your data warehouse

Description

For the jobs that cannot be cancelled using Job Monitoring Dashboard (i.e field suggestion jobs), you can cancel them manually from within your data warehouse.

Instructions

  1. Get the pid of the long-running processes. Please make sure to check the query column of the returned processes:
select * from pg_stat_activity
where query_start < now() - interval '1 hour'
  1. Cancel the long-running processes

select pg_cancel_backend(the_pid_collected_from_step_1)

Improvement methods

Add table indexing/partitioning/clustering keys where relevant

Description

Indices can greatly improve query speed and reduce high CPU Usage issue in your data warehouse.

Instructions

  1. Add the indexes where necessary based on the Query Plan.
  2. Run EXPLAIN ANALYZE again to see if the indexes have been properly applied.

Please consider carefully before adding indexes. It will boost READ operations speed, but slow down INSERT operations. There is also a need for additional disk space to store them. For more information, refer to this article.

Increase Data Warehouse computing resources

Depending on the Data Warehouse (DW) that you are using, there will be a variety of methods to scale the DW up and increase its computing resources (CPU, memory, etc.).

Here are a few of the methods:

Optimize your reports

Please refer to Tips to improve reporting performance.


Let us know what you think about this document :)