Using workload service assurance
Workload service assurance is an optional feature that provides the means to flag jobs as mission critical for your business and to ensure that they are processed in a timely manner. Using this function benefits your scheduling operations personnel by enhancing their ability to meet defined service levels.
When the workload service assurance feature is enabled, you can flag jobs as mission critical and ensure they have an associated completion deadline specified in their definition or at submission. Two additional threads of execution, Time Planner and Plan Monitor, which run within WebSphere Application Server, then work to ensure that the critical jobs complete on time.
Defining a critical job and its deadline triggers the calculation of the start times of all the jobs that are predecessors of the critical job. The set of predecessors of a critical job makes up its critical network, which might include jobs from other job streams. Starting from the critical job's deadline and duration, Time Planner calculates its critical start time, which is the latest time at which the job can start and still meet its deadline. Moving backwards from the critical start time, it calculates the latest time at which each predecessor within the critical network can start so that the critical job at the end of the chain can complete on time.
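The backward calculation described above can be illustrated with a minimal sketch. It is not the Time Planner implementation: the job names, durations, deadline, and the `latest_start_times` function are hypothetical, and the sketch assumes that each job's latest start is the earliest latest start among its successors minus its own estimated duration.

```python
from datetime import datetime, timedelta

def latest_start_times(critical_job, deadline, durations, predecessors):
    """Walk backwards from the critical job and return the latest start
    time of every job in its critical network (illustrative only)."""
    # Critical start time: deadline minus the critical job's duration.
    latest = {critical_job: deadline - durations[critical_job]}
    pending = [critical_job]
    while pending:
        job = pending.pop()
        for pred in predecessors.get(job, []):
            # A predecessor must finish before this job's latest start.
            candidate = latest[job] - durations[pred]
            # Keep the tightest (earliest) constraint and propagate it.
            if pred not in latest or candidate < latest[pred]:
                latest[pred] = candidate
                pending.append(pred)
    return latest

# Hypothetical critical network: REPORT is the critical job.
durations = {
    "EXTRACT": timedelta(minutes=30),
    "LOAD": timedelta(minutes=20),
    "VALIDATE": timedelta(minutes=45),
    "REPORT": timedelta(minutes=15),
}
predecessors = {
    "REPORT": ["LOAD", "VALIDATE"],
    "LOAD": ["EXTRACT"],
    "VALIDATE": ["EXTRACT"],
}
deadline = datetime(2024, 1, 1, 6, 0)

latest = latest_start_times("REPORT", deadline, durations, predecessors)
for name, start in sorted(latest.items(), key=lambda item: item[1]):
    print(f"{name}: latest start {start:%H:%M}")
```

In this example, EXTRACT feeds two branches, so its latest start is driven by the longer branch through VALIDATE rather than the shorter one through LOAD.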
While the plan runs, Plan Monitor constantly checks the critical network to ensure that the deadline of the critical job can be met. When changes that affect timings are made to the critical network, for example, the addition or removal of jobs or follows dependencies, Plan Monitor requests Time Planner to recalculate the critical start times. Also, when a critical network job completes, the timings of the jobs that follow it are recalculated to take account of the actual duration of the job.
Within a critical network, the set of predecessors that most directly risk delaying the critical start time is called the critical path. The critical path is dynamically updated as predecessors complete or as their risk of completing late changes.
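One way to picture how a critical path might be selected is sketched below. This text does not specify how the risk of delay is measured, so the sketch assumes a simple slack value per job (minutes between its estimated start and its latest allowed start) and follows the predecessor with the least slack at each step. The job names, slack values, and the `critical_path` function are hypothetical.

```python
def critical_path(critical_job, predecessors, slack_minutes):
    """Trace back from the critical job, at each step choosing the
    predecessor with the least slack (an assumed measure of risk)."""
    path = [critical_job]
    job = critical_job
    while predecessors.get(job):
        # The predecessor with the smallest slack is the most at risk.
        job = min(predecessors[job], key=lambda p: slack_minutes[p])
        path.append(job)
    # Return the path in execution order, ending with the critical job.
    return list(reversed(path))

# Hypothetical critical network and slack values, in minutes.
predecessors = {
    "REPORT": ["LOAD", "VALIDATE"],
    "LOAD": ["EXTRACT"],
    "VALIDATE": ["EXTRACT"],
}
slack_minutes = {"EXTRACT": 0, "LOAD": 25, "VALIDATE": 0}

path = critical_path("REPORT", predecessors, slack_minutes)
print(" -> ".join(path))
```

Rerunning the function after slack values change (for example, when a predecessor completes early or runs long) mirrors the dynamic updating of the critical path described above.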
The scheduler (batchman) acts automatically to remedy delays by prioritizing jobs that are actually or potentially putting the target deadline at risk, although some conditions that cause delays might require operator intervention. A series of specialized critical job views, available on the Dynamic Workload Console, allows operators to browse critical jobs, display their predecessors and the critical paths associated with them, identify jobs that are causing problems, and drill down to identify and remedy problems.
For detailed information, see:
- Enabling and configuring workload service assurance
- Planning critical jobs
- Processing and monitoring critical jobs
- Workload service assurance scenario
For information about troubleshooting and common problems with workload service assurance, see the Workload Service Assurance chapter in HCL Workload Automation: Troubleshooting.