Processing and monitoring critical jobs

Automatic tracking and prioritizing of critical network jobs.

Workload service assurance provides automatic tracking and prioritizing of critical network jobs, together with online functions that allow you to monitor and intervene in the processing of those jobs.

Automatic tracking and prioritizing

To ensure that critical deadlines can be met, workload service assurance provides the following automated services for critical jobs and for predecessor jobs that form their critical networks:

Promotion

When the critical start time of a job is approaching and the job has not started, the promotion mechanism is used. A promoted job is assigned additional operating system resources and its submission is prioritized.

The timing of promotions is controlled by the global option promotionOffset. Promoted jobs are selected for submission after jobs that have priorities of "high" and "go", but before all other jobs. Prioritizing of operating system resources is controlled by the local options jm promoted nice (UNIX and Linux) and jm promoted priority (Windows).
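The selection order described above can be sketched as a simple sort key. This is an illustrative sketch only, not the scheduler's implementation: the Job class, the should_promote and submission_rank helpers, and the text encoding of priorities are hypothetical, and the sketch assumes that the promotion offset is a length of time measured back from the critical start time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Job:
    name: str
    priority: str = "10"    # "high", "go", or a numeric priority as text (assumed encoding)
    promoted: bool = False
    started: bool = False


def should_promote(job, critical_start, now, promotion_offset):
    """True once an unstarted job is within promotion_offset of its critical start time.

    Assumption: the offset is counted back from the critical start time.
    """
    return not job.started and now >= critical_start - promotion_offset


def submission_rank(job):
    """Lower rank is selected first: high/go, then promoted, then all other jobs."""
    if job.priority in ("high", "go"):
        return 0
    if job.promoted:
        return 1
    return 2


jobs = [Job("a"), Job("b", promoted=True), Job("c", priority="go"), Job("d", priority="high")]
ordered = [j.name for j in sorted(jobs, key=submission_rank)]
print(ordered)
```

The sketch does not rank "high" against "go", because the text above does not state an order between them; the stable sort simply keeps their input order.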

Calculation of the critical path

The critical path is the chain of dependencies, leading to the critical job, that is most at risk of causing the deadline to be missed at any given time. The critical path is calculated using the estimated end times of the critical job predecessors. Working back from the critical job, the path is constructed by selecting the predecessor with the latest estimated end time. If the actual end time differs substantially from the estimated end time, the critical path is automatically recalculated.

Figure 1 shows the critical path through a critical network at a specific time during the processing of the plan.

Figure 1. Critical path

At this specific time, the critical path includes Job3a, Job2a, and Job1a. Job3a and Job3b are the immediate predecessors of the critical job, Job4, and Job3a has the later estimated end time. Job3a has two immediate predecessors, Job2a and Job_y. Job2a has the later estimated end time, and so on.
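The backward walk described above can be sketched in a few lines. This is a minimal illustration, not the product's algorithm: the critical_path function and both dictionaries are hypothetical, the estimated end times are invented values in minutes from plan start, and Job1a is assumed to be the only predecessor of Job2a.

```python
def critical_path(critical_job, predecessors, estimated_end):
    """Walk back from the critical job, always following the predecessor
    with the latest estimated end time."""
    path = []
    current = critical_job
    while predecessors.get(current):
        current = max(predecessors[current], key=lambda job: estimated_end[job])
        path.append(current)
    return path


# Immediate predecessors of each job (jobs with none are omitted).
predecessors = {
    "Job4": ["Job3a", "Job3b"],
    "Job3a": ["Job2a", "Job_y"],
    "Job2a": ["Job1a"],          # assumed for illustration
}

# Invented estimated end times, in minutes from the start of the plan.
estimated_end = {"Job3a": 50, "Job3b": 45, "Job2a": 30, "Job_y": 25, "Job1a": 10}

path = critical_path("Job4", predecessors, estimated_end)
print(path)
```

Rerunning the function with updated end times models the automatic recalculation: if Job3b's estimate moved past Job3a's, the path would switch to Job3b's chain.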

Addition of jobs to the hot list

Jobs that are part of the critical network are added to a hot list that is associated with the critical job itself. The hot list includes any critical network jobs that have a real or potential impact on the timely completion of the critical job. Only the jobs that begin the current critical network, for which there is no predecessor, can be included in the hot list. Jobs are added to the hot list for one or more of the following reasons:

The job has stopped with an error.
The job has been running longer than estimated by a factor defined in the longDurationThreshold global option.
The job has still not started, although all its follows dependencies have either been resolved or released, and at least one of the following conditions is true:
- The critical start time has nearly been reached. The length of time before the critical start time is determined by the approachingLateOffset global option.
- The job is scheduled to run on a workstation where the limit is set to zero.
- The job belongs to a job stream for which the limit is set to zero.
- The job or its job stream has been suppressed.
- The job or its job stream currently has a priority that is lower than the fence or is set to zero.
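The conditions above can be read as a single predicate. The following sketch is illustrative only: JobState, its fields, the status strings, and hot_list_reasons are hypothetical names, the long-duration factor is treated as a plain multiplier of the estimated duration, and the approaching-late offset is assumed to be a length of time counted back from the critical start time. The restriction to jobs with no predecessor is not modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class JobState:
    status: str                          # hypothetical values: "waiting", "running", "error"
    critical_start: datetime
    estimated_duration: timedelta
    started_at: Optional[datetime] = None
    follows_resolved_or_released: bool = False
    workstation_limit: int = 10
    job_stream_limit: int = 10
    suppressed: bool = False             # job or its job stream suppressed
    priority: int = 10
    fence: int = 0


def hot_list_reasons(job: JobState, now: datetime,
                     approaching_late_offset: timedelta,
                     long_duration_factor: float) -> List[str]:
    """Return every reason that would put the job on the hot list (empty if none)."""
    reasons = []
    if job.status == "error":
        reasons.append("stopped with an error")
    if job.status == "running" and job.started_at is not None:
        if now - job.started_at > job.estimated_duration * long_duration_factor:
            reasons.append("running longer than estimated")
    if job.status == "waiting" and job.follows_resolved_or_released:
        if now >= job.critical_start - approaching_late_offset:
            reasons.append("critical start time nearly reached")
        if job.workstation_limit == 0:
            reasons.append("workstation limit is zero")
        if job.job_stream_limit == 0:
            reasons.append("job stream limit is zero")
        if job.suppressed:
            reasons.append("job or job stream suppressed")
        if job.priority < job.fence or job.priority == 0:
            reasons.append("priority below fence or zero")
    return reasons
```

Returning a list of reasons rather than a boolean mirrors the "one or more" wording: a job with a non-empty list belongs on the hot list, and the list says why.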

Setting a high or potential risk status for the critical job

A critical job can be set to the following risk statuses:

High risk: Calculated timings show that the critical job will finish after its deadline.
Initially, estimated start and end times are used. As jobs are completed, timings are recalculated to take account of the actual start and end times of jobs.
Potential risk: Critical predecessor jobs have been added to the hot list.
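As a rough sketch, the two statuses can be derived from the projected end time and the hot list. The risk_status function is hypothetical, and checking high risk before potential risk is an assumption of this sketch rather than documented behavior; "no risk" is the label used when neither condition applies.

```python
from datetime import datetime


def risk_status(estimated_end, deadline, hot_list):
    """Classify a critical job from its projected end time and its hot list."""
    if estimated_end > deadline:
        return "high risk"        # calculated timings show a finish after the deadline
    if hot_list:
        return "potential risk"   # critical predecessors are on the hot list
    return "no risk"


deadline = datetime(2024, 1, 1, 18, 0)
print(risk_status(datetime(2024, 1, 1, 18, 30), deadline, []))
print(risk_status(datetime(2024, 1, 1, 17, 0), deadline, ["Job2a"]))
print(risk_status(datetime(2024, 1, 1, 17, 0), deadline, []))
```

Because estimated times are replaced by actual times as jobs complete, the same function would be called again whenever the projected end time changes.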

Online tracking of critical jobs

The Dynamic Workload Console provides specialized views for tracking the progress of critical jobs and their predecessors. You can access the views from the following sources:

Workload Dashboard: dedicated widgets to monitor the critical status: high risk, no risk, potential risk
Monitor Critical Jobs tasks: list all critical jobs for a selected engine and let you run actions against the results. You can view jobs with a high risk level along a horizontal time axis.
What-if analysis view: from a Gantt view, highlight the critical path and show the impact on critical jobs.

The list of results produced by the Monitor Critical Jobs query displays all critical jobs for the engine, showing their status: normal, potential risk, or high risk. From this view, you can navigate to see:

The hot list of jobs that put the critical deadline at risk.
The critical path.
Details of all critical predecessors.
Details of completed critical predecessors.
Job logs of jobs that have already run.

Using the views, you can monitor the progress of the critical network, find out about current and potential problems, release dependencies, and rerun jobs.

The Monitor Critical Jobs view provides a timeline, Expand Timeline, which displays the placement of the jobs in the list along a horizontal time axis and highlights jobs with a high risk level. Only critical jobs that have not completed are displayed in the timeline. The timeline also allows you to modify the plan deadline and quickly see how the change would affect jobs. If you see a job that is late or in a high risk state, select the job in the list of results and select What-if from the table toolbar to open the What-if Analysis and view the job in a Gantt chart for further investigation.