HCL Workload Automation, Version 9.4

Hadoop Distributed File System jobs

Use a Hadoop Distributed File System job to define, schedule, monitor, and manage file transfer operations between your workstation and the Hadoop Distributed File System server.

Prerequisites

The HCL Workload Automation plug-in for Hadoop Distributed File System enables you to access the Hadoop Distributed File System from any computer, and work on files and directories. You can download a file, upload a file or free text, append a file or free text to another file, rename or delete a file, create a directory, and wait for the creation of a file on a Hadoop Distributed File System server.

Before you can define Hadoop Distributed File System jobs, you must install an HCL Workload Automation agent with a connection to the Hadoop Distributed File System server.

Hadoop Distributed File System job definition

A description of the job properties and their valid values is available in the context-sensitive help in the Dynamic Workload Console; click the question mark (?) icon in the top-right corner of the properties pane.

For more information about creating jobs using the various supported product interfaces, see Defining a job.

The following table lists the required and optional attributes for Hadoop Distributed File System jobs:
Table 1. Required and optional attributes for the definition of a Hadoop Distributed File System job. Attributes marked (required) must be specified.

Connection properties - Hadoop Distributed File System section

Hostname (required)
    The hostname of the Hadoop Distributed File System server.
Port (required)
    The port of the Hadoop Distributed File System server.
Protocol
    The protocol for connecting to the Hadoop Distributed File System server. Supported values are http and https.
User
    The user to be used for accessing the Hadoop Distributed File System server.
Password
    The password to be used for accessing the Hadoop Distributed File System server.

Connection properties - Retry options section

Number of retries
    The number of times the program retries performing the operation.
Retry interval (seconds)
    The number of seconds the program waits before retrying the operation.

Action properties - Upload section

File on Hadoop Distributed File System (required)
    The name and path of the file on the Hadoop Distributed File System server. Use this option to upload a file from the local workstation to the Hadoop Distributed File System server.
Permissions
    The permissions to be defined for the file on the Hadoop Distributed File System server.
Overwrite
    Specifies whether the file on the Hadoop Distributed File System server is overwritten if it already exists.
Upload a file
    The name and path of the file on the local workstation.
File content
    The file content to be written into the file on the Hadoop Distributed File System server.

Action properties - Download section

File on Hadoop Distributed File System (required)
    The name and path of the file on the Hadoop Distributed File System server. Use this option to download a file from the Hadoop Distributed File System server to the local workstation.
Save file as (required)
    The name and path of the file to be saved locally.

Action properties - Append section

File on Hadoop Distributed File System (required)
    The name and path of the file on the Hadoop Distributed File System server. Use this option to append a file from the local workstation, or specific content, to a file on the Hadoop Distributed File System server.
Append a file
    The name and path of the file to be appended to the specified file on the Hadoop Distributed File System server.
Append this content
    The content to be appended to the file on the Hadoop Distributed File System server.

Action properties - Rename section

File or directory on Hadoop Distributed File System (required)
    The name and path of the file or directory on the Hadoop Distributed File System server. Use this option to modify the name of a file or directory on the Hadoop Distributed File System server.
New path on Hadoop Distributed File System (required)
    The new name of the file or directory on the Hadoop Distributed File System server.

Action properties - Delete section

File or directory on Hadoop Distributed File System (required)
    The name and path of the file or directory on the Hadoop Distributed File System server. Use this option to delete a file or directory on the Hadoop Distributed File System server.
Recursive
    Specifies whether the delete action is recursive.

Action properties - Create directory section

Directory on Hadoop Distributed File System (required)
    The name and path of the directory on the Hadoop Distributed File System server. Use this option to create a directory on the Hadoop Distributed File System server.
Permissions
    The permissions to be assigned to the directory.

Action properties - Wait for a file section

File or directory on Hadoop Distributed File System (required)
    The name and path of the file or directory on the Hadoop Distributed File System server. Use this option to make the job wait for the creation of a file or directory on the Hadoop Distributed File System server. When the file or directory is created, the job status changes to successful.
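
For reference, the following composer job definition sketches a hypothetical download job. The workstation name, job name, and attribute values are illustrative, and the plug-in namespace and element names inside the XML task are assumptions rather than the exact schema; generate the precise XML for your environment from the Dynamic Workload Console.

    $JOBS
    AGENT1#HDFS_DOWNLOAD
     TASK
      <?xml version="1.0" encoding="UTF-8"?>
      <jsdl:jobDefinition xmlns:jsdl="http://www.ibm.com/xmlns/prod/scheduling/1.0/jsdl"
          xmlns:jsdlhadoopfs="http://www.ibm.com/xmlns/prod/scheduling/1.0/jsdlhadoopfs"
          name="HADOOPFS">
        <jsdl:application name="hadoopfs">
          <jsdlhadoopfs:hadoopfs>
            <!-- Connection properties (hypothetical element and attribute names) -->
            <jsdlhadoopfs:connection hostname="hdfs.example.com" port="50070"
                                     protocol="http" user="hdfsuser"/>
            <!-- Download action: copy a file from the HDFS server to the agent workstation -->
            <jsdlhadoopfs:download fileOnHadoop="/data/input/report.csv"
                                   saveFileAs="/tmp/report.csv"/>
          </jsdlhadoopfs:hadoopfs>
        </jsdl:application>
      </jsdl:jobDefinition>
     DESCRIPTION "Downloads report.csv from the HDFS server"
     RECOVERY STOP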

Scheduling and stopping the job in HCL Workload Automation

You schedule HCL Workload Automation Hadoop Distributed File System jobs by defining them in job streams. Add the job to a job stream with all the necessary scheduling arguments and submit the job stream.
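
For example, the following composer job stream runs the hypothetical HDFS_DOWNLOAD job shown above once a day; the workstation, job stream, and run cycle names are illustrative:

    SCHEDULE AGENT1#HDFS_JS
    ON RUNCYCLE DAILY_RC "FREQ=DAILY;"
    :
    AGENT1#HDFS_DOWNLOAD
    END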

You can submit jobs by using the Dynamic Workload Console, Application Lab or the conman command line. See Scheduling and submitting jobs and job streams for information about how to schedule and submit jobs and job streams using the various interfaces.

After submission, when the job is running and is reported in EXEC status in HCL Workload Automation, you can stop it, if necessary, by using the kill command. However, this action is effective only for the Wait for a file action; if you defined a different action in your job, the kill command is ignored.
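
For example, assuming a Wait for a file job named HDFS_WAIT in the job stream HDFS_JS on workstation AGENT1 (all illustrative names), you could stop it from the conman command line with:

    conman "kill AGENT1#HDFS_JS.HDFS_WAIT"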

Monitoring the job

If the HCL Workload Automation agent stops when you submit the HCL Workload Automation Hadoop Distributed File System job, or while the job is running, when the agent becomes available again the job status changes to UNKNOWN and you must resubmit the job. If the job consists of the Wait for a file action, as soon as the agent becomes available again HCL Workload Automation resumes monitoring the job from where it stopped.

For information about how to monitor jobs using the different product interfaces available, see Monitoring HCL Workload Automation jobs.

Job properties

While the job is running, you can track its status and analyze its properties. In particular, in the Extra Information section, if the job contains variables, you can verify the value passed to each variable from the remote system. Some job streams use the variable passing feature: for example, the value of a variable specified in job 1 of job stream A is required by job 2 of the same job stream in order to run.
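
For example, a downstream job in the same job stream could reference a property exported by the Hadoop Distributed File System job by using the variable-passing notation ${job:JOB_NAME.PROPERTY_NAME}; the workstation, job, property, and script names below are illustrative:

    AGENT1#PROCESS_FILE
     DOCOMMAND "process.sh ${job:HDFS_DOWNLOAD.FileOnHadoop}"
     STREAMLOGON hdfsuser
     RECOVERY STOP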

For information about how to display the job properties from the various supported interfaces, see Analyzing the job log.

For example, from the conman command line, you can see the job properties by running:
conman sj <job_name>;props
where <job_name> is the Hadoop Distributed File System job name.

The properties are listed in the Extra Information section of the command output.

For more information about passing variables between jobs, see Passing job properties from one job to another in the same job stream instance.

Job log content

For information about how to display the job log from the various supported interfaces, see Analyzing the job log.

For example, you can see the job log content by running conman sj <job_name>;stdlist, where <job_name> is the Hadoop Distributed File System job name.

See also

From the Dynamic Workload Console you can perform the same task as described in Creating job definitions.

For more information about how to create and edit scheduling objects, see Designing your Workload.