Job Queue System and Workers

What is a Worker/Concurrent Worker?

A worker (or concurrent worker) is an actor that processes jobs pushed into the queue. Each worker handles jobs sequentially: an available worker picks up the next job in the queue, processes it, releases it upon completion, and then proceeds to the next one.

How does it work?

In Holistics, when a user opens a report, we construct a SQL query, send it to the customer's data warehouse, wait for it to finish, and visualize the results.

Because analytical SQL queries take time (seconds to minutes), handling them in synchronous web requests is usually not a good idea. A more scalable solution is to use a background job queue system.

A typical flow would look like:

  1. When a user views a report, a job is created and pushed into a job queue.
  2. A worker picks up the job, constructs the SQL queries, and then runs them against the customer’s data warehouse.
  3. Once the queries finish, the result set is visualized and presented in the user’s browser.
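
The three steps above can be sketched with Python's standard library. This is only an illustration of the general job-queue pattern, not Holistics' actual implementation: `run_query`, `view_reports`, and the generated SQL string are hypothetical stand-ins.

```python
import queue
import threading


def run_query(sql):
    # Hypothetical stand-in for running SQL on the customer's data warehouse.
    return f"rows for: {sql}"


def worker(jobs, results):
    # A worker repeatedly picks up the next job until it sees the sentinel.
    while True:
        report_id = jobs.get()
        if report_id is None:  # sentinel: no more jobs for this worker
            break
        sql = f"SELECT * FROM report_{report_id}"  # step 2: construct the query
        results[report_id] = run_query(sql)        # step 2: run it


def view_reports(report_ids, num_workers=2):
    jobs = queue.Queue()
    results = {}
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for report_id in report_ids:
        jobs.put(report_id)  # step 1: viewing a report creates a job
    for _ in threads:
        jobs.put(None)       # one sentinel per worker
    for t in threads:
        t.join()
    return results           # step 3: results are ready to be visualized
```

Calling `view_reports([1, 2, 3])` returns one result per report, regardless of which worker happened to pick up each job.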

What kind of actions will create a job?

Usually, actions that involve running a SQL query against the customer’s data warehouse:

  • Users viewing dashboards
  • Email schedules triggered
  • Etc.

Why are Concurrent Workers important?

In an extreme scenario where 20 users each open 100 charts at the same time, an uncontrolled Holistics application would send 2,000 (20 × 100) queries to the customer's database at once. This could crash the database, which is especially risky for a production database.

Holistics workers actively manage concurrent database queries by limiting the customer to 5 workers. This ensures that no more than 5 queries run simultaneously, with others queued up.
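
To make the effect of the limit concrete, here is a minimal simulation, assuming a fixed pool of worker threads and a short sleep as a stand-in for the real query. The function name `simulate` and its parameters are illustrative only; the sketch simply counts how many simulated queries are in flight at any moment.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def simulate(num_users=20, charts_per_user=100, max_workers=5):
    total_queries = num_users * charts_per_user
    lock = threading.Lock()
    running = 0  # queries currently in flight
    peak = 0     # highest number of simultaneous queries observed

    def run_query(_):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.001)  # pretend the warehouse is executing the query
        with lock:
            running -= 1

    # The pool size plays the role of the concurrent worker limit:
    # extra jobs wait in the executor's internal queue.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(run_query, range(total_queries)))
    return total_queries, peak
```

With the default arguments, `simulate()` submits all 2,000 jobs, but the observed peak never exceeds `max_workers`, because every other job waits in the queue until a worker is free.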

Therefore, increasing the number of Concurrent Workers improves the querying process for both you and your customers. As your business scales, being charged based on Concurrent Workers is more cost-effective than being charged based on the number of visualizations processed.

Job Queues

Type of Job Queues

Each Holistics customer has its own job queue and workers. This ensures that one customer overloading its job queue will have little to no effect on other customers’ systems.

Furthermore, depending on the nature of the job, it will be classified into different queues (or pools). For example, a Report job runs in a different queue than a Data Transform job.
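
The isolation described above (one set of queues per customer, one queue per job type) can be sketched as a nested lookup. The `JobRouter` class, the job-type keys, and the fallback to the Default queue are assumptions made for illustration; they are not Holistics' internal names or routing rules.

```python
from collections import defaultdict, deque

# Hypothetical mapping from a job type to a queue name.
QUEUE_FOR_JOB_TYPE = {
    "report": "Report",
    "data_transform": "Data Transform",
}


class JobRouter:
    def __init__(self):
        # queues[customer][queue_name] holds that customer's pending jobs.
        self.queues = defaultdict(lambda: defaultdict(deque))

    def push(self, customer, job_type, payload):
        # Assumption: unknown job types fall back to the Default queue.
        queue_name = QUEUE_FOR_JOB_TYPE.get(job_type, "Default")
        self.queues[customer][queue_name].append(payload)
        return queue_name

    def pending(self, customer, queue_name):
        return len(self.queues[customer][queue_name])
```

Because every customer has its own set of queues, a backlog in one customer's Report queue does not change the pending count of any other customer's queues, nor of that customer's own Data Transform queue.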

Default slots for specific job queues

Below is the default list of job queues and their default worker count. This is a soft limit, which means that it can be increased by purchasing more workers.

| Queue | Default Slot | Action included |
|---|---|---|
| Default | 20 | 1. Create/Update Custom Field; 2. Refresh Models and Dependant Models |
| Adhoc Query | 5 | 1. Adhoc SQL executions; 2. Dataset explorations |
| Filter | 3 | 1. Filter suggestion; 2. Process filter in Dashboard |
| Report | 15 | Execute report/widget |
| Prefetch | 12 | 1. Prefetch Filter Cache; 2. Preload Dashboard |
| Preview | 3 | 1. Validate Data Import; 2. Preview Report/Query (Holistics Version < 3.0) |
| Export | 10 | 1. Export Dashboard; 2. Export Dashboard Widget/Report |
| Email Schedule | 2 | Executing schedule (Email, Slack, SFTP, Google Sheet) |
| Data Source | 15 | 1. Test Data Source connection; 2. Synchronize database schema |
| Data Import (Version 3.0 and below) | 2 | Executing Data Import |
| Data Transform | 2 | Executing Data Transform (or Storage Settings) |
| Validate | 5 | 1. Validate Table Structure in Data Transform; 2. Validate Query in Data Transform; 3. Preview Data Transform |

| Embed Analytics Queue | Default Slot | Action included |
|---|---|---|
| Embed | 0 | 1. Execute Embedded Dashboard Widget/Report; 2. Export Embedded Dashboard; 3. Export Embedded Dashboard Widget/Report |

If you want to enable our Embedded Analytics feature, please refer to our doc about Embedded Analytics for more information.

Your account’s configuration might be different from the default above. Please contact us by sending an email to [email protected] to find out your current setup.

Do note that the Embedded Analytics feature utilizes a special type of worker called Embed Worker. They are separate and can be manually adjusted from the Embed Analytics Manager.

Life Cycle of a Job

New Job Statuses

We have rolled out new Job Statuses to make them more intuitive.
Please refer to this Community post for more details.
Note that Holistics APIs still use the old Job statuses (created and queued).

| Status | Description | API value |
|---|---|---|
| Pending | This job is waiting for an available job worker in your workspace. | created |
| Starting | This job has finished waiting (queuing) and is being picked up (started) by an available job worker. It will be executed shortly. | queued |
| Running | This job is being executed by a job worker. | running |
| Success | If the job runs successfully, it will have success status. | success |
| Failure | If the job runs unsuccessfully, it will have failure status. | failure |
| Cancelling | While a job is running, if you manually cancel the job, it will have cancelling status. | canceling |
| Cancelled | If the job is cancelled successfully, it will have cancelled status. | cancelled |
| Existed | When a job has this status, it coincides with another existing Pending/Starting/Running job (see Job de-duplication). | already_existed |
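
Since the API still returns the old values, client code may want to translate them into the new status names. The lookup below is a small sketch built only from the status table above; the names `API_VALUE_BY_STATUS` and `display_status` are illustrative and not part of any Holistics SDK.

```python
# New status name -> value returned by the Holistics API (from the table above).
API_VALUE_BY_STATUS = {
    "Pending": "created",
    "Starting": "queued",
    "Running": "running",
    "Success": "success",
    "Failure": "failure",
    "Cancelling": "canceling",
    "Cancelled": "cancelled",
    "Existed": "already_existed",
}

# Reverse lookup: API value -> new status name.
STATUS_BY_API_VALUE = {value: name for name, value in API_VALUE_BY_STATUS.items()}


def display_status(api_value):
    # Fall back to the raw value if the API returns something not listed here.
    return STATUS_BY_API_VALUE.get(api_value, api_value)
```

For example, `display_status("created")` gives `"Pending"` and `display_status("queued")` gives `"Starting"`, which are the two statuses whose API values differ most from their new names.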

Monitoring

To monitor your Holistics Jobs and Job Workers in real-time, please head to Job Monitoring.

FAQs

Can we reallocate some or all of the internal workers to be embedded workers?

Our core business model revolves around internal self-service analytics, with embedded analytics serving as a complementary add-on. From a commercial perspective, we have therefore not accommodated the transfer of workers, nor focused on supporting embedded dashboards.

You can, however, configure up to 8 concurrent workers for your embedded dashboards on your current plan, which is a practical way to assess their effectiveness. Doing so ensures sufficient spare capacity, so customers using the dashboard concurrently do not have to wait for workers to be freed up.

How much time will it take to process these queries?

Apart from the concurrency limit, our workers are typically not the bottleneck for data loading time; their impact is negligible. They do not process or compute any data. Instead, they wait for your database to return the query results, which are then visualized in your browser.

The processing time for these queries depends on your database's performance and the query's complexity, cost, or runtime.

For long-running queries, Holistics enables you to set up materialized views to automatically persist the query results (Transform Model) physically in your data warehouse. This feature speeds up query time, allowing users to access results from the physical table when querying a dataset or dashboard with the persisted model, rather than running the query at the time of access.

You can get real-time data on how your jobs are performing within the Holistics app.

