Hadoop Distributed File System jobs
With Hadoop Distributed File System jobs, you can define, schedule, monitor, and manage file transfers between your workstation and the Hadoop Distributed File System server.
The HCL Workload Automation plug-in for Hadoop Distributed File System enables you to access the Hadoop Distributed File System from any computer, and work on files and directories. You can download a file, upload a file or free text, append a file or free text to another file, rename or delete a file, create a directory, and wait for the creation of a file on a Hadoop Distributed File System server.
Prerequisites
Before you can define Hadoop Distributed File System jobs, you must install an HCL Workload Automation agent with a connection to the Hadoop Distributed File System server.
Hadoop Distributed File System job definition
A description of the job properties and valid values is available in the context-sensitive help in the Dynamic Workload Console: click the question mark (?) icon in the top-right corner of the properties pane. For more information about creating jobs using the various supported product interfaces, see Defining a job.
Attribute | Description and value | Required |
---|---|---|
Connection properties - Hadoop Distributed File System section | ||
Hostname | The hostname of the Hadoop Distributed File System server. | ✓ |
Port | The port of the Hadoop Distributed File System server. | ✓ |
Protocol | The protocol for connecting to the Hadoop Distributed File System server. Supported values are http and https. | |
User | The user to be used for accessing the Hadoop Distributed File System server. | |
Password | The password to be used for accessing the Hadoop Distributed File System server. | |
Connection properties - Retry options section | ||
Number of retries | The number of times the program retries performing the operation. | |
Retry interval (seconds) | The number of seconds the program waits before retrying the operation. | |
Action properties - Upload section | ||
File on Hadoop Distributed File System | The name and path of the file on the Hadoop Distributed File System server. Use this option to upload a file from the local workstation to the Hadoop Distributed File System server. | ✓ |
Permissions | The permissions to be defined for the file on the Hadoop Distributed File System server. | |
Overwrite | Specifies whether the file on the Hadoop Distributed File System server is overwritten if it already exists. | |
Upload a file | The name and path of the file on the local workstation. | |
File content | Specifies the file content to be written into the file on the Hadoop Distributed File System server. | |
Action properties - Download section | ||
File on Hadoop Distributed File System | The name and path of the file on the Hadoop Distributed File System server. Use this option to download a file from the Hadoop Distributed File System server to the local workstation. | ✓ |
Save file as | The name and path of the file to be saved on the local workstation. | ✓ |
Action properties - Append section | ||
File on Hadoop Distributed File System | The name and path of the file on the Hadoop Distributed File System server. Use this option to append a file from the local workstation, or specific content, to a file on the Hadoop Distributed File System server. | ✓ |
Append a file | The name and path of the file to be appended to the specified file on the Hadoop Distributed File System server. | |
Append this content | The content to be appended to the file on the Hadoop Distributed File System server. | |
Action properties - Rename section | ||
File or directory on Hadoop Distributed File System | The name and path of the file or directory on the Hadoop Distributed File System server. Use this option to modify the name of a file or directory on the Hadoop Distributed File System server. | ✓ |
New path on Hadoop Distributed File System | The new name of the file or directory on the Hadoop Distributed File System server. | ✓ |
Action properties - Delete section | ||
File or directory on Hadoop Distributed File System | The name and path of the file or directory on the Hadoop Distributed File System server. Use this option to delete a file or directory on the Hadoop Distributed File System server. | ✓ |
Recursive | Specifies whether this action should be recursive. | |
Action properties - Create directory section | ||
Directory on Hadoop Distributed File System | The name and path of the directory on the Hadoop Distributed File System server. Use this option to create a directory on the Hadoop Distributed File System server. | ✓ |
Permissions | Specifies the permissions to be assigned to the directory. | |
Action properties - Wait for a file section | ||
File or directory on Hadoop Distributed File System | The name and path of the file or directory on the Hadoop Distributed File System server. Use this option to make the job wait for the creation of a file or directory on the Hadoop Distributed File System server. When the file or directory is created, the job status changes to successful. | ✓ |
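Because the connection uses the http or https protocol, the actions in the table map naturally onto HDFS REST operations in the style of the Apache WebHDFS API. The following sketch is illustrative only, not the plug-in's actual implementation: the host name, port, user, and paths are invented, and the code only builds the request URLs without sending any request.

```python
from urllib.parse import urlencode, quote

def webhdfs_url(host, port, path, op, user=None, scheme="http", **params):
    """Build a WebHDFS-style REST URL for an HDFS operation (illustrative)."""
    query = {"op": op}
    if user:
        query["user.name"] = user  # simple pseudo-authentication, as in plain WebHDFS
    query.update(params)
    return f"{scheme}://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(query)}"

# Hypothetical connection properties:
HOST, PORT, USER = "hdfs.example.com", 9870, "hwauser"

# One URL per action section in the table above:
upload   = webhdfs_url(HOST, PORT, "/data/in.csv", "CREATE", USER,
                       overwrite="true", permission="644")
download = webhdfs_url(HOST, PORT, "/data/in.csv", "OPEN", USER)
append   = webhdfs_url(HOST, PORT, "/data/log.txt", "APPEND", USER)
rename   = webhdfs_url(HOST, PORT, "/data/old.txt", "RENAME", USER,
                       destination="/data/new.txt")
delete   = webhdfs_url(HOST, PORT, "/data/tmp", "DELETE", USER, recursive="true")
mkdir    = webhdfs_url(HOST, PORT, "/data/out", "MKDIRS", USER, permission="755")
status   = webhdfs_url(HOST, PORT, "/data/flag.done", "GETFILESTATUS", USER)
```

The last URL corresponds to the check behind the Wait for a file action: polling the status of a path until it exists.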
Scheduling and stopping the job in HCL Workload Automation
You schedule HCL Workload Automation Hadoop Distributed File System jobs by defining them in job streams. Add the job to a job stream with all the necessary scheduling arguments and submit the job stream.
You can submit jobs by using the Dynamic Workload Console, Application Lab or the conman command line. See Scheduling and submitting jobs and job streams for information about how to schedule and submit jobs and job streams using the various interfaces.
After submission, when the job is running and is reported in EXEC status in HCL Workload Automation, you can stop it, if necessary, by using the kill command. However, this action is effective only for the Wait for a file action; if you have defined different actions in your job, the kill command is ignored.
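The Number of retries and Retry interval (seconds) connection properties described in the table amount to a bounded retry loop around each operation. A minimal sketch of that behavior, using a hypothetical callable in place of the plug-in's internal HDFS call:

```python
import time

def run_with_retries(operation, number_of_retries=3, retry_interval=5):
    """Run an operation with a first attempt plus up to number_of_retries
    retries, sleeping retry_interval seconds between attempts (illustrative)."""
    attempts = 1 + number_of_retries
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError:                 # e.g. a connection failure
            if attempt == attempts:
                raise                   # retries exhausted: the job fails
            time.sleep(retry_interval)

# Hypothetical usage: an operation that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("connection refused")
    return "ok"

result = run_with_retries(flaky, number_of_retries=3, retry_interval=0)
```

With these settings the operation succeeds on the third attempt; had all four attempts failed, the exception would propagate and the job would end in error.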
Monitoring the job
If the HCL Workload Automation agent stops when you submit the HCL Workload Automation Hadoop Distributed File System job, or while the job is running, the job status changes to UNKNOWN when the agent becomes available again, and you must resubmit the job. If the job consists of the Wait for a file action, as soon as the agent becomes available again HCL Workload Automation resumes monitoring the job from where it stopped.
For information about how to monitor jobs using the different product interfaces available, see Monitoring HCL Workload Automation jobs.
Job properties
While the job is running, you can track its status and analyze its properties. In particular, in the Extra Information section, if the job contains variables, you can verify the value passed to each variable from the remote system. Some job streams use the variable passing feature: for example, the value of a variable specified in job 1 of job stream A is required by job 2 of the same job stream in order to run.
For information about how to display the job properties from the various supported interfaces, see Analyzing the job log.
For example, you can see the job properties by running:
conman sj <job_name>;props
where <job_name> is the Hadoop Distributed File System job name. The properties are listed in the Extra Information section of the command output.
For more information about passing variables between jobs, see Passing job properties from one job to another in the same job stream instance.
Job log content
For information about how to display the job log from the various supported interfaces, see Analyzing the job log.
For example, you can see the job log content by running conman sj <job_name>;stdlist, where <job_name> is the Hadoop Distributed File System job name.
See also
From the Dynamic Workload Console you can perform the same task as described in
For more information about how to create and edit scheduling objects, see