Apache Oozie jobs
An Apache Oozie job defines, schedules, monitors, and controls the execution of Oozie workflows and Hadoop jobs such as MapReduce, Pig, Hive, and Sqoop jobs.
Prerequisites
Before you can define Oozie jobs, you must create the HCL Workload Automation agent connection to the Oozie server.
Oozie job definition
The job properties and their valid values are detailed in the context-sensitive help in the Dynamic Workload Console: click the question mark (?) icon in the top-right corner of the properties pane. For more information about creating jobs using the various supported product interfaces, see Defining a job.
Attribute | Description and value | Required |
---|---|---|
Connection attributes | ||
hostname | The host name of the Oozie server. | If you do not specify the hostname attribute, then the hostname, protocol, and HostnameVerifyCheckbox attributes are read from the properties file. |
port | The port number where the Oozie server is listening. | |
protocol | The protocol for connecting to the Oozie server. Supported values are http and https. | |
userName | The user to be used for accessing the Oozie server. | |
password | The password to be used for accessing the Oozie server. | |
keyStoreFilePath | The fully qualified path of the keystore file containing the private key that is used to make the connection. | |
keyStorePassword | The password that protects the private key and is required to make the connection. | Required only if you specify a keystore file path. |
HostnameVerifyCheckbox | Supported values are true and false. | |
NumberOfRetries | The number of times the program retries in case of connection failure. Default value is 0. | |
RetryIntervalSeconds | The number of seconds the program waits before retrying in case of connection failure. Default value is 30. | |
Action attributes for all the job types | ||
nodeName | The URL of the Hadoop name-node. | ✓ |
jobTracker | The URL of the Hadoop job-tracker. | ✓ |
jobUserName | The name of the user submitting the Hadoop job. | ✓ |
libPath | The path in the Hadoop file system, where the jar files necessary to the Hadoop job reside. | ✓ |
Action attributes for Oozie workflow job type | ||
workflowPath | The path in the Hadoop file system, where the workflow application resides. | ✓ |
Action attributes for MapReduce job type | ||
Mapper-task classname | The map-task classname. | ✓ |
Reducer-task classname | The reducer-task classname. | ✓ |
Mapper-task input directory | The map-task input directory. | ✓ |
Reducer-task output directory | The reduce-task output directory. | ✓ |
Action attributes for Hive, Pig, and Sqoop job types | ||
Actual command or script | The actual command or script that you want to run with your job. | ✓ |
Parameters | The parameters, and related values, that you are passing to your job. | |
Options | The options that you are passing to your job. | |
Advanced attributes | ||
customPropertiesTableValue | Additional properties, and related values, that you might want to pass to your job. For example: examplesRoot=examples, where examplesRoot is the property and examples is its value. | |
timeout | The monitoring time. It determines how long the job is monitored. At the end of the timeout interval, the job fails. Default value is 7200 seconds. | |
pollingPeriod | The monitoring frequency. It determines how often the job is monitored. Default value is 15 seconds. | |
- If numeric values lower than the minimum allowed values are specified, they are replaced as follows:
  - timeout: 10 seconds (the minimum allowed value)
  - pollingPeriod: 5 seconds (the minimum allowed value)
- If non-numeric values are specified, they are replaced as follows:
  - timeout: 7200 seconds (the default value)
  - pollingPeriod: 15 seconds (the default value)
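The replacement rules above amount to a small normalization step, which can be sketched as follows in Python (an illustrative sketch; the function name is not part of the product):

```python
# Normalization rules for the timeout and pollingPeriod attributes,
# as described above (illustrative sketch, not product code).
DEFAULTS = {"timeout": 7200, "pollingPeriod": 15}
MINIMUMS = {"timeout": 10, "pollingPeriod": 5}

def normalize(attribute, value):
    """Return the effective value, in seconds, for timeout or pollingPeriod."""
    try:
        seconds = int(value)
    except (TypeError, ValueError):
        # Non-numeric values fall back to the default.
        return DEFAULTS[attribute]
    # Values below the minimum are raised to the minimum allowed value.
    return max(seconds, MINIMUMS[attribute])

print(normalize("timeout", "abc"))      # non-numeric -> 7200 (default)
print(normalize("pollingPeriod", "2"))  # below minimum -> 5
print(normalize("timeout", "600"))      # valid value -> 600
```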
Scheduling and stopping a job in HCL Workload Automation
You schedule HCL Workload Automation Oozie jobs by defining them in job streams. Add the job to a job stream with all the necessary scheduling arguments and submit the job stream.
You can submit jobs by using the Dynamic Workload Console, Application Lab, or the conman command line. See Scheduling and submitting jobs and job streams for information about how to schedule and submit jobs and job streams using the various interfaces.
After submission, while the job is running and reported in EXEC status in HCL Workload Automation, you can stop it, if necessary, by using the kill command. This action also stops the program execution on the Oozie server.
Monitoring a job
If the HCL Workload Automation agent stops when you submit the Oozie job, or while the job is running, the job restarts automatically as soon as the agent restarts.
For information about how to monitor jobs using the different product interfaces available, see Monitoring HCL Workload Automation jobs.
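The pollingPeriod and timeout attributes drive this monitoring: the job is checked every pollingPeriod seconds until it completes or the timeout interval elapses, at which point it fails. A minimal sketch of that loop, assuming a hypothetical check_status callable (not product code):

```python
import time

def monitor(check_status, timeout=7200, polling_period=15,
            clock=time.monotonic, sleep=time.sleep):
    """Poll the job every polling_period seconds; fail after timeout seconds.

    Illustrative sketch only: check_status is a hypothetical callable that
    returns the current Oozie job status string.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        status = check_status()
        if status in ("SUCCEEDED", "KILLED", "FAILED"):
            return status
        sleep(polling_period)
    # The monitoring window elapsed: the job is reported as failed.
    return "FAILED"

# Demonstration with a fake status source and a no-op sleep:
statuses = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
print(monitor(lambda: next(statuses), timeout=60, polling_period=1,
              sleep=lambda s: None))  # SUCCEEDED
```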
OozieJobExecutor.properties file
The properties file is automatically generated either when you perform a "Test Connection" from the Dynamic Workload Console in the job definition panels, or when you submit the job to run the first time. Once the file has been created, you can customize it. This is especially useful when you need to schedule several jobs of the same type: you can specify the values in the properties file to avoid providing information, such as credentials, for each job. You can override the values in the properties file by defining different values at job definition time.
```
#Oozie properties
hostname=
port=
protocol=http
user=
password=
keyStoreFilePath=
keyStorePassword=
HostnameVerifyCheckbox=false
NumberOfRetries=0
RetryIntervalSeconds=30
nodeName=
jobTracker=
jobUserName=
libPath=
pollingPeriod=15
timeout=7200
#add here the custom oozie job properties in the format
#CUSTOMOOZIEPROPERTY.<property_name>=<property_value>
#For example CUSTOMOOZIEPROPERTY.queueName=default
```
For a description of each property, see the corresponding job attribute description in Table 1.
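The precedence rule (values set in the job definition override values from the properties file) and the CUSTOMOOZIEPROPERTY prefix can be sketched as follows; the function names are illustrative, not part of the product:

```python
# Illustrative sketch of how values from OozieJobExecutor.properties
# combine with values set in the job definition (not product code).
CUSTOM_PREFIX = "CUSTOMOOZIEPROPERTY."

def effective_values(file_props, job_def_props):
    """Job-definition values, when set, override properties-file values."""
    merged = dict(file_props)
    merged.update({k: v for k, v in job_def_props.items() if v})
    return merged

def custom_oozie_properties(props):
    """Extract CUSTOMOOZIEPROPERTY.* entries as plain Oozie job properties."""
    return {k[len(CUSTOM_PREFIX):]: v
            for k, v in props.items() if k.startswith(CUSTOM_PREFIX)}

file_props = {"protocol": "http", "timeout": "7200",
              "CUSTOMOOZIEPROPERTY.queueName": "default"}
job_def = {"protocol": "https", "hostname": "oozie.example.com"}

merged = effective_values(file_props, job_def)
print(merged["protocol"])               # job definition wins: https
print(custom_oozie_properties(merged))  # {'queueName': 'default'}
```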
Job properties
While the job is running, you can track its status and analyze its properties. In particular, if the job contains variables, you can verify in the Extra Information section the value passed to the variable from the remote system. Some job streams use the variable passing feature: for example, the value of a variable specified in job 1 of job stream A is required by job 2 of the same job stream in order to run.
For example, you can see the job properties by running:
```
conman sj <job_name>;props
```
where <job_name> is the Oozie job name. The properties are listed in the Extra Information section of the command output.
For information about passing job properties, see Passing job properties from one job to another in the same job stream instance.
Job log content
For information about how to display the job log from the various supported interfaces, see Analyzing the job log.
For example, you can see the job log content by running conman sj <job_name>;stdlist, where <job_name> is the Oozie job name.
See also
From the Dynamic Workload Console you can perform the same task as described in
For more information about how to create and edit scheduling objects, see