Apache Oozie jobs
An Apache Oozie job defines, schedules, monitors, and controls the execution of Oozie workflows and Hadoop jobs such as MapReduce, Pig, Hive, and Sqoop jobs.
Prerequisites
Before you can define Oozie jobs, you must create the HCL Workload Automation agent connection to the Oozie server.
Oozie job definition
The job properties and their valid values are detailed in the context-sensitive help in the Dynamic Workload Console: click the question mark (?) icon in the top-right corner of the properties pane. For more information about creating jobs using the various supported product interfaces, see Defining a job.
Attribute | Description and value | Required |
---|---|---|
Connection attributes | ||
hostname | The host name of the Oozie server. | If you do not specify the hostname attribute, then the hostname, protocol, and HostnameVerifyCheckbox attributes are read from the properties file. |
port | The port number where the Oozie server is listening. | |
protocol | The protocol for connecting to the Oozie server. Supported values are http and https. | |
userName | The user to be used for accessing the Oozie server. | |
password | The password to be used for accessing the Oozie server. | |
keyStoreFilePath | The fully qualified path of the keystore file containing the private key that is used to make the connection. | |
keyStorePassword | The password that protects the private key and is required to make the connection. | Required only if you specify a keystore file path. |
HostnameVerifyCheckbox | Supported values are true and false. | |
NumberOfRetries | The number of times the program retries in case of connection failure. Default value is 0. | |
RetryIntervalSeconds | The number of seconds the program waits before retrying in case of connection failure. Default value is 30. | |
Action attributes for all the job types | ||
nodeName | The URL of the Hadoop name-node. | ✓ |
jobTracker | The URL of the Hadoop job-tracker. | ✓ |
jobUserName | The name of the user submitting the Hadoop job. | ✓ |
libPath | The path in the Hadoop file system, where the jar files necessary to the Hadoop job reside. | ✓ |
Action attributes for Oozie workflow job type | ||
workflowPath | The path in the Hadoop file system, where the workflow application resides. | ✓ |
Action attributes for MapReduce job type | ||
Mapper-task classname | The map-task classname. | ✓ |
Reducer-task classname | The reducer-task classname. | ✓ |
Mapper-task input directory | The map-task input directory. | ✓ |
Reducer-task output directory | The reduce-task output directory. | ✓ |
Action attributes for Hive, Pig, and Sqoop job types | ||
Actual command or script | The actual command or script that you want to run with your job. | ✓ |
Parameters | The parameters, and related values, that you are passing to your job. | |
Options | The options that you are passing to your job. | |
Advanced attributes | ||
customPropertiesTableValue | Additional properties, and related values, that you might want to pass to your job. For example: examplesRoot=examples, where examplesRoot is the property and examples is its value. | |
timeout | The monitoring time. It determines how long the job is monitored. At the end of the timeout interval, the job fails. Default value is 7200 seconds. | |
pollingPeriod | The monitoring frequency. It determines how often the job is monitored. Default value is 15 seconds. | |
- If numeric values lower than the minimum allowed values are specified, they are replaced as follows:
  - timeout: 10 seconds (the minimum allowed value)
  - pollingPeriod: 5 seconds (the minimum allowed value)
- If non-numeric values are specified, they are replaced as follows:
  - timeout: 7200 seconds (the default value)
  - pollingPeriod: 15 seconds (the default value)
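The replacement rules above amount to a small normalization step, which can be sketched as follows in Python (an illustrative sketch; the function name is not part of the product):

```python
# Normalization rules for the timeout and pollingPeriod attributes,
# as described above (illustrative sketch, not product code).
DEFAULTS = {"timeout": 7200, "pollingPeriod": 15}
MINIMUMS = {"timeout": 10, "pollingPeriod": 5}

def normalize(attribute, value):
    """Return the effective value, in seconds, for timeout or pollingPeriod."""
    try:
        seconds = int(value)
    except (TypeError, ValueError):
        # Non-numeric values fall back to the default.
        return DEFAULTS[attribute]
    # Values below the minimum are raised to the minimum allowed value.
    return max(seconds, MINIMUMS[attribute])

print(normalize("timeout", "abc"))      # non-numeric -> 7200 (default)
print(normalize("pollingPeriod", "2"))  # below minimum -> 5
print(normalize("timeout", "600"))      # valid value -> 600
```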
Scheduling and stopping a job in HCL Workload Automation
You schedule HCL Workload Automation Oozie jobs by defining them in job streams. Add the job to a job stream with all the necessary scheduling arguments and submit the job stream.
You can submit jobs by using the Dynamic Workload Console, Application Lab, or the conman command line. See Scheduling and submitting jobs and job streams for information about how to schedule and submit jobs and job streams using the various interfaces.
After submission, while the job is running and reported in EXEC status in HCL Workload Automation, you can stop it, if necessary, by using the kill command. This action also stops the program execution on the Oozie server.
Monitoring a job
If the HCL Workload Automation agent stops when you submit the Oozie job, or while the job is running, the job restarts automatically as soon as the agent restarts.
For information about how to monitor jobs using the different product interfaces available, see Monitoring HCL Workload Automation jobs.
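The pollingPeriod and timeout attributes drive this monitoring: the job is checked every pollingPeriod seconds until it completes or the timeout interval elapses, at which point it fails. A minimal sketch of that loop, assuming a hypothetical check_status callable (not product code):

```python
import time

def monitor(check_status, timeout=7200, polling_period=15,
            clock=time.monotonic, sleep=time.sleep):
    """Poll the job every polling_period seconds; fail after timeout seconds.

    Illustrative sketch only: check_status is a hypothetical callable that
    returns the current Oozie job status string.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        status = check_status()
        if status in ("SUCCEEDED", "KILLED", "FAILED"):
            return status
        sleep(polling_period)
    # The monitoring window elapsed: the job is reported as failed.
    return "FAILED"

# Demonstration with a fake status source and a no-op sleep:
statuses = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
print(monitor(lambda: next(statuses), timeout=60, polling_period=1,
              sleep=lambda s: None))  # SUCCEEDED
```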
OozieJobExecutor.properties file
The properties file is automatically generated either when you perform a "Test Connection" from the Dynamic Workload Console in the job definition panels, or when you submit the job to run the first time. Once the file has been created, you can customize it. This is especially useful when you need to schedule several jobs of the same type: you can specify the values in the properties file to avoid providing information, such as credentials, for each job. You can override the values in the properties file by defining different values at job definition time.
```
#Oozie properties
hostname=
port=
protocol=http
user=
password=
keyStoreFilePath=
keyStorePassword=
HostnameVerifyCheckbox=false
NumberOfRetries=0
RetryIntervalSeconds=30
nodeName=
jobTracker=
jobUserName=
libPath=
pollingPeriod=15
timeout=7200
#add here the custom oozie job properties in the format
#CUSTOMOOZIEPROPERTY.<property_name>=<property_value>
#For example CUSTOMOOZIEPROPERTY.queueName=default
```
For a description of each property, see the corresponding job attribute description in Table 1.
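The precedence rule (values set in the job definition override values from the properties file) and the CUSTOMOOZIEPROPERTY prefix can be sketched as follows; the function names are illustrative, not part of the product:

```python
# Illustrative sketch of how values from OozieJobExecutor.properties
# combine with values set in the job definition (not product code).
CUSTOM_PREFIX = "CUSTOMOOZIEPROPERTY."

def effective_values(file_props, job_def_props):
    """Job-definition values, when set, override properties-file values."""
    merged = dict(file_props)
    merged.update({k: v for k, v in job_def_props.items() if v})
    return merged

def custom_oozie_properties(props):
    """Extract CUSTOMOOZIEPROPERTY.* entries as plain Oozie job properties."""
    return {k[len(CUSTOM_PREFIX):]: v
            for k, v in props.items() if k.startswith(CUSTOM_PREFIX)}

file_props = {"protocol": "http", "timeout": "7200",
              "CUSTOMOOZIEPROPERTY.queueName": "default"}
job_def = {"protocol": "https", "hostname": "oozie.example.com"}

merged = effective_values(file_props, job_def)
print(merged["protocol"])               # job definition wins: https
print(custom_oozie_properties(merged))  # {'queueName': 'default'}
```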
Job properties
While the job is running, you can track its status and analyze its properties. In particular, if the job contains variables, you can verify in the Extra Information section the value passed to the variable from the remote system. Some job streams use the variable passing feature: for example, the value of a variable specified in job 1 of job stream A is required by job 2 of the same job stream in order to run.
For example, you can see the job properties by running:
```
conman sj <job_name>;props
```
where <job_name> is the Oozie job name. The properties are listed in the Extra Information section of the command output.
For information about passing job properties, see Passing job properties from one job to another in the same job stream instance.
Job log content
For information about how to display the job log from the various supported interfaces, see Analyzing the job log.
For example, you can see the job log content by running conman sj <job_name>;stdlist, where <job_name> is the Oozie job name.
See also
From the Dynamic Workload Console you can perform the same task as described in
For more information about how to create and edit scheduling objects, see