Creating a task to Monitor Critical Jobs

About this task

Use this task to retrieve all the jobs that were marked as critical when their job stream was created. If a job must complete before a specific time, you can flag it as critical when you add it to a job stream in the Workload Designer. You can also flag a job as critical by adding the critical keyword to the job statement when you create or modify a job stream with the composer command line.

For more information about this, see the HCL Workload Automation User's Guide and Reference.

You can then use this list of critical jobs to control them, ensuring that nothing prevents them from completing on time.

For each critical job listed, you are also provided with the confidence factor and the related estimated end time of the critical job. The confidence factor is expressed as a percentage and indicates how likely the critical job is to finish running within its deadline. It is calculated as the cumulative distribution function of a normal (Gaussian) distribution whose mean is the estimated end time and whose standard deviation is given by the estimated end variance. When a job completes, the confidence factor is set to 0% if the job exceeded its deadline, and to 100% if the deadline was not exceeded.
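
As a rough illustration of the calculation described above, the following Python sketch evaluates the normal cumulative distribution function at the deadline, using the estimated end time as the mean. The function name, the use of seconds as the unit, and the handling of a zero spread are assumptions made for this example; the product's internal computation might differ.

```python
import math


def confidence_factor(deadline_s: float, estimated_end_s: float, std_dev_s: float) -> float:
    """Return the probability, as a percentage, that the job ends by its deadline.

    All arguments are in seconds from a common reference point (an assumption
    of this sketch, not a documented product interface).
    """
    if std_dev_s <= 0:
        # Assumption: with no spread, the estimate is treated as exact.
        return 100.0 if estimated_end_s <= deadline_s else 0.0
    # Standard normal CDF expressed through the error function.
    z = (deadline_s - estimated_end_s) / (std_dev_s * math.sqrt(2.0))
    return 100.0 * 0.5 * (1.0 + math.erf(z))
```

When the estimated end time coincides with the deadline, this sketch returns 50%. The value rises toward 100% as the estimated end moves earlier than the deadline and falls toward 0% as it moves later.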

Note: This task can only be run against the current plan and only on a master workstation.

Starting from the list of critical jobs, you can drill down and take actions on their predecessors (internal and external), which might be located far away in the critical job network.

You can retrieve the following lists of predecessors to act on them (for example, by releasing dependencies or answering prompts) if they compromise the critical job success:
Critical Path
Critical job predecessors with the least slack time (delay allowed to let the critical job complete on time).
Hot List
The hot list contains a subset of critical predecessors that can cause a delay of the critical job because they are in such states as error, late, fence (for distributed systems only), suppressed (for distributed systems only) or long duration. If these jobs do not complete successfully on time, they prevent the critical job from completing on time. In the hot list view, you can quickly see which jobs need you to take appropriate recovery actions. Jobs included in the Hot List are not necessarily included in the Critical Path.
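
The distinction between the two lists can be sketched in a few lines of Python. This is a simplified model, not the product's algorithm: the `Predecessor` class, its `slack_minutes` field, the lowercase state strings, and the sample job names are all hypothetical and exist only to illustrate the definitions above.

```python
from dataclasses import dataclass

# States named in the text as causing a predecessor to appear in the hot list.
HOT_STATES = {"error", "late", "fence", "suppressed", "long duration"}


@dataclass
class Predecessor:
    name: str
    state: str
    slack_minutes: int  # hypothetical: delay allowed before the critical job is late


def hot_list(predecessors):
    """Return the predecessors whose state can delay the critical job."""
    return [p for p in predecessors if p.state in HOT_STATES]


def by_least_slack(predecessors):
    """Order predecessors so that those with the least slack come first."""
    return sorted(predecessors, key=lambda p: p.slack_minutes)


# Sample data, invented for this example.
predecessors = [
    Predecessor("EXTRACT", "successful", 5),
    Predecessor("LOAD", "late", 40),
    Predecessor("REPORT", "running", 15),
]
```

In this sample, `LOAD` is the only hot list entry even though it has the most slack, which mirrors the statement that hot list jobs are not necessarily on the critical path.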

To create a Monitor Critical Jobs task, perform the following steps.

Note: For all the details about options and fields displayed in the panels, see the online help by clicking the question mark located at the top-right corner of each panel.

Procedure

  1. In the navigation bar, click System Status and Health > All Configured Tasks > New.
  2. In the Create Task panel, under Monitor Task, select Monitor Critical Jobs and click Next.
  3. In the Enter Task Information panel, define the type of scheduler engine where you want to run the task. You can select an engine at a later time, but the engine name must be specified before you run the task. Depending on the engine type you choose, the filtering criteria and the results you can display are different. You can also specify whether to share the task with others, to allow them to see and run the task, but not to modify it. Task and engine sharing can be disabled by the TWSWEBUIAdministrator in the global settings customizable file.
  4. Click Next to proceed with task creation or click Finish to complete the creation using the default values and exit without proceeding to the following steps. If you are editing an existing task, properties are organized in tabs.
  5. In the General Filter section, specify some broad filtering criteria to limit the results retrieved by your query. Here you start refining the scope of your query by also considering the amount of information you want to retrieve.

    Optionally, for some results tables, you can use the Periodic Refresh Options section to customize how often the information is refreshed. Specify the refresh interval in hh:mm:ss format, with a minimum of 30 seconds and a maximum of 7200 seconds. For example, 00:01:10 means 70 seconds. If the value specified is not valid, the last valid value is automatically used. If the periodic refresh is enabled for a task, the refresh time control options are shown in the results table when the task runs. You can also set or change the periodic refresh interval directly in the results table when the timer is stopped. In this case, the value specified at task creation time is temporarily overwritten.

    You can search for jobs based on their status, on the workstation where they run, or on the job streams they belong to. For example, you can look for all the jobs that have a specific priority level and a high risk of missing their deadlines.
    Note: The Monitor Critical Jobs task searches only for jobs that have been marked as critical.

    Depending on what you choose as Risk Level, one or more of the following alert levels is shown in the list of critical jobs:

    High risk icon
    Critical jobs at high risk. This icon means that the critical job estimated end is beyond the job deadline. If nothing changes, the critical job is going to miss its deadline. The critical job estimated end is dynamically recalculated.
    Potential risk icon
    Critical jobs at potential risk. This icon means that the critical job estimated end has not yet passed the job deadline. However, the critical job has some predecessors in late, long duration, or error state. For distributed systems, the late condition can also be due to priority, limit, or fence values that are preventing jobs from running. If nothing changes, the critical job might miss its deadline.
    No risk icon
    Critical job is on track. If nothing changes, it will meet its deadline.
  6. In the Time Data Filter panel, specify a time range to limit your search to jobs or job streams that ran within a specific time period. If no date and time are specified, the jobs and job streams are not filtered based on their processing time.
  7. In the Columns Definition section, select the information you want to display in the table containing the query results. The columns you choose here determine the information displayed in the task results table. For example, for all the objects resulting from your query, you might want to see their statuses, the workstations where they ran, when they ran, and when they were scheduled to run. You can then drill down into the information displayed in the table and navigate it. In the Columns Definition section, you can select the columns for the results of this task and also specify the columns for secondary queries on jobs, job streams, jobs in critical network, and workstations. Starting from the Monitor Critical Jobs results, you can launch further queries on secondary objects associated with the jobs in the table. The information you can retrieve with these queries is specified in this panel. One of these secondary queries retrieves the list of jobs in the critical network, which includes all the critical job predecessors. The critical path is part of the critical network. The columns set for the list of jobs in the critical network are displayed as details of all the critical job Predecessors, and in the Hot List and Critical Path views. All these views can be launched by using the corresponding buttons from the Monitor Critical Jobs table of results.
  8. In the All Configured Tasks panel, you can see the main details about the task that you have just created. You can also choose to run the task immediately. The task is now in the list of your tasks where you can open and modify it. You can find it in the task lists displayed by clicking the following options: System Status and Health > All Configured Tasks or Workload Monitoring > Monitor Critical Jobs.
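
The periodic refresh rule described in step 5 (hh:mm:ss input, a range of 30 to 7200 seconds, and fallback to the last valid value) can be sketched in Python as follows. The function and constant names are hypothetical, and rejecting minute or second fields above 59 is an assumption of this sketch; the console's own validation might differ in detail.

```python
MIN_REFRESH_S = 30    # minimum refresh interval stated in the text
MAX_REFRESH_S = 7200  # maximum refresh interval stated in the text


def parse_refresh_interval(text: str, last_valid_s: int) -> int:
    """Convert an hh:mm:ss string to seconds, or return last_valid_s if invalid."""
    parts = text.split(":")
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        return last_valid_s
    hours, minutes, seconds = (int(p) for p in parts)
    if minutes > 59 or seconds > 59:
        # Assumption: minute and second fields must be in the range 00-59.
        return last_valid_s
    total = hours * 3600 + minutes * 60 + seconds
    if not MIN_REFRESH_S <= total <= MAX_REFRESH_S:
        return last_valid_s
    return total
```

For example, `parse_refresh_interval("00:01:10", 60)` yields 70, matching the example in step 5, while an out-of-range value such as `"00:00:10"` leaves the previous value of 60 in effect.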

Results

You have created a task that, when run, produces a list of jobs that satisfy your filtering criteria and shows, for each job in the list, the information in the columns you selected.

You can find a workload service assurance scenario in the HCL Workload Automation User's Guide and Reference about using this feature to monitor critical jobs.

From the list of results, you can click Expand Timeline to display a timeline that shows the placement of critical jobs along a horizontal time axis and highlights jobs with a high risk level. Only critical jobs that have not completed are displayed in the timeline. The timeline also allows you to modify the plan deadline and quickly see how the change would affect jobs. If you see a job that is late or in high risk state, right-click the job in the list of results and select What-if from the table toolbar to open the What-if Analysis and view the job in a Gantt chart for further investigation.
Note: The timeline is not supported on Internet Explorer, version 9.
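
When reading the list of results or the timeline, it can help to keep in mind how the three risk levels described in step 5 relate to each other. The following Python sketch restates those definitions as a decision rule. The function name, the returned labels, and the lowercase state strings are hypothetical, and the sketch ignores the additional priority, limit, and fence conditions that apply to distributed systems.

```python
from datetime import datetime

# Predecessor states that the text associates with potential risk.
RISKY_PREDECESSOR_STATES = {"late", "long duration", "error"}


def risk_level(estimated_end: datetime, deadline: datetime, predecessor_states) -> str:
    """Classify a critical job as 'high', 'potential', or 'none'."""
    if estimated_end > deadline:
        # Estimated end is beyond the deadline: high risk.
        return "high"
    if any(state in RISKY_PREDECESSOR_STATES for state in predecessor_states):
        # Still on time, but a predecessor could cause a delay: potential risk.
        return "potential"
    return "none"
```

Because the estimated end is dynamically recalculated, a rule like this would need to be re-evaluated each time the results are refreshed.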