Hadoop Job

The Hadoop job connects to the Hadoop framework, which enables distributed processing of large data sets across clusters of commodity servers. From Control-M, you can extend your enterprise business workflows to include tasks that run in your Big Data Hadoop cluster. These tasks can use the different Hadoop-supported tools, including Pig, Hive, HDFS File Watcher, MapReduce jobs, and Sqoop.

The following table describes the Hadoop job attributes:

Attribute	Description
Connection Profile	Defines the connection profile for the job. Rules: Characters: 1–30; Case Sensitive: Yes; Invalid Characters: Spaces. Variable Name: %%HDP-ACCOUNT
Execution Type	Determines the type of Hadoop job to execute, one of: DistCp Job, Distributed Shell Job, HDFS Commands Job, HDFS File Watcher Job, Impala Job, Hive Job, Java-Map-Reduce Job, Oozie Job, Oozie Extractor Job, Pig Job, Spark Job, Sqoop Job, Streaming Job, Tajo Job. Variable Name: %%HDP-EXEC_TYPE
Pre Commands	Defines the Pre commands to run before job execution, and the argument for each command. Not available for HDFS Commands and Oozie Extractor jobs.
Fail the job if the command fails	Determines whether the entire job fails if any of the Pre commands fail. Not available for HDFS Commands and Oozie Extractor jobs.
Post Commands	Defines the Post commands to run after job execution, and the argument for each command. Not available for HDFS Commands and Oozie Extractor jobs.
Fail the job if the command fails	Determines whether the entire job fails if any of the Post commands fail. Not available for HDFS Commands and Oozie Extractor jobs.
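
The effect of a Pre or Post command list combined with its "Fail the job if the command fails" flag can be modeled as follows. This is a minimal sketch under stated assumptions, not Control-M's implementation. The function name is hypothetical. The sketch also assumes that every command runs in order regardless of earlier failures, so the flag decides only whether a failure fails the job.

```python
import subprocess
import sys
from typing import List, Sequence


def run_pre_or_post_commands(
    commands: Sequence[List[str]], fail_job_if_command_fails: bool
) -> bool:
    """Hypothetical model of Pre/Post command handling.

    Runs every command in order and returns True if the job may continue,
    or False if the job should fail.
    """
    any_failed = False
    for argv in commands:
        # check=False records a non-zero exit code instead of raising.
        completed = subprocess.run(argv, check=False)
        if completed.returncode != 0:
            any_failed = True
    # A failed command fails the whole job only when the flag is selected.
    return not (any_failed and fail_job_if_command_fails)


if __name__ == "__main__":
    ok = [sys.executable, "-c", "pass"]
    bad = [sys.executable, "-c", "raise SystemExit(3)"]
    print(run_pre_or_post_commands([ok, bad], fail_job_if_command_fails=True))
    print(run_pre_or_post_commands([ok, bad], fail_job_if_command_fails=False))
```

With the flag selected, the failing second command makes the first call return False. The second call returns True, and the job continues.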

DistCp Job Attributes

The following table describes the DistCp job attributes:

Attribute	Description
Target Path	Defines the absolute destination path. Variable Name: %%HDP-DISTCP_TARGET_PATH
Source Path	Defines the source paths. Variable Name: %%HDP-DISTCP_SOURCE_PATH-Nxxx_ARG
Command Line Options	Defines the sets of attributes and values that are added to the command line. Variable Names: Name: %%HDP-DISTCP_OPTION-Nxxx-NAME Value: %%HDP-DISTCP_OPTION-Nxxx-VAL
Append Yarn aggregated logs to output	Determines whether to add Yarn aggregated logs to the job output.
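
To show how the DistCp attributes fit together, the sketch below assembles them into the standard `hadoop distcp [OPTIONS] <source>... <target>` command line. It is a hypothetical helper, not the command that Control-M actually generates. The cluster URIs in the example are placeholders. `-update` and `-m` are ordinary DistCp options, chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def build_distcp_argv(
    source_paths: Sequence[str],
    target_path: str,
    options: Sequence[Tuple[str, str]] = (),
) -> List[str]:
    """Assemble `hadoop distcp [OPTIONS] <source>... <target>` as an argv list."""
    if not source_paths:
        raise ValueError("at least one Source Path is required")
    argv = ["hadoop", "distcp"]
    # Command Line Options: each (Name, Value) pair goes before the paths.
    for name, value in options:
        argv.append(name)
        if value:  # flag-style options such as -update carry no value
            argv.append(value)
    argv.extend(source_paths)  # Source Path entries, in order
    argv.append(target_path)   # Target Path always comes last
    return argv


print(build_distcp_argv(
    ["hdfs://nn1:8020/data/a", "hdfs://nn1:8020/data/b"],
    "hdfs://nn2:8020/backup",
    [("-update", ""), ("-m", "20")],
))
```

Keeping the target path last follows DistCp's own argument order. This order is why the job defines a single Target Path but a list of Source Paths.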

Distributed Shell Job Attributes

The following table describes the Distributed Shell job attributes:

Attribute	Description
Shell Type	Determines what the Distributed Shell job runs, as follows: Command: runs the shell command defined by Command; Script File: runs the script file defined by Command, Script Full Path, and Shell Script Arguments. Variable Name: %%HDP-SHELL_TYPE
Command	Defines the shell command to run during job execution. Variable Name: %%HDP-SHELL_COMMAND
Script Full Path	Defines the full path to the script file to execute. The script file is located in HDFS. Variable Name: %%HDP-SHELL_SCRIPT_FULL_PATH
Shell Script Arguments	Defines the shell script arguments. Variable Name: %%HDP-SHELL-Nxxx-ARG
More Options	Opens more attributes.
Files/Archives	Defines the full path to each file or archive to upload as a dependency to the HDFS working directory. Variable Names: Type: %%HDP-SHELL_FILE_DEP-Nxxx-TYPE Path: %%HDP-SHELL_FILE_DEP-Nxxx-PATH
Options	Defines additional options (Name and Value) to set when executing the job. Variable Names: Name: %%HDP-SHELL_OPTION-Nxxx-NAME Value: %%HDP-SHELL_OPTION-Nxxx-VAL
Environment Variables	Defines the environment variables for the shell script/command. Variable Name: %%HDP-SHELL_ENV_VARIABLE-Nxxx-ARG
Append Yarn aggregated logs to output	Determines whether to add Yarn aggregated logs to the job output.

HDFS Commands Job Attributes

The following table describes the HDFS Commands job attributes:

Attribute	Description
Command	Defines the HDFS command to perform during job execution. Variable Name: %%HDP-HDFS_CMD_ACTION-Nxxx-CMD
Arguments	Defines the argument used by the command. Variable Name: %%HDP-HDFS_CMD_ACTION-Nxxx-ARG
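
Each HDFS Commands row is a Command/Arguments pair, indexed by the `Nxxx` part of its variable names. The sketch below expands a list of rows into those variables and into `hdfs dfs` command lines. It rests on three assumptions, none of which is stated in this document. First, `Nxxx` is a 1-based, zero-padded counter (N001, N002, ...). Second, Command holds a bare `hdfs dfs` command name such as `mkdir`. Third, the function name is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def expand_hdfs_actions(
    actions: Sequence[Tuple[str, str]],
) -> Tuple[Dict[str, str], List[List[str]]]:
    """Map (Command, Arguments) rows to Nxxx variables and hdfs dfs argv lists."""
    variables: Dict[str, str] = {}
    command_lines: List[List[str]] = []
    for index, (command, arguments) in enumerate(actions, start=1):
        n = f"N{index:03d}"  # assumption: Nxxx is a 1-based, zero-padded row counter
        variables[f"%%HDP-HDFS_CMD_ACTION-{n}-CMD"] = command
        variables[f"%%HDP-HDFS_CMD_ACTION-{n}-ARG"] = arguments
        # Naive whitespace split; paths containing spaces would need quoting.
        command_lines.append(["hdfs", "dfs", f"-{command}", *arguments.split()])
    return variables, command_lines


variables, command_lines = expand_hdfs_actions(
    [("mkdir", "-p /data/in"), ("rm", "-r -skipTrash /tmp/old")]
)
for line in command_lines:
    print(" ".join(line))
```

Because the rows keep their order, the commands run in the sequence in which they are listed in the job.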

HDFS File Watcher Job Attributes

The following table describes the HDFS File Watcher job attributes:

Attribute	Description
File name full path	Defines the full path of the file being watched. Variable Name: %%HDP-HDFS_FILE_PATH
Min detected size	Determines the minimum file size, in bytes, that the file must reach for the job to end OK. If the file arrives but has not yet reached this size, the job continues to watch the file. Variable Name: %%HDP-MIN_DETECTED_SIZE
Max time to wait	Determines the maximum number of minutes to wait for the file to meet the watching criteria. If the criteria are not met within this time (the file did not arrive or did not reach the minimum size), the job fails. Variable Name: %%HDP-MAX_WAIT_TIME
File Name Variable	Defines the name of the variable that passes the detected file name to succeeding jobs. Variable Name: %%HDP-FW_DETECTED_FILE_NAME_VAR
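
The watching criteria above amount to a polling loop. The sketch below is a minimal model under stated assumptions, not how Control-M implements the watcher. The function name, the polling interval, and the injected `get_size`, `clock`, and `sleep` callables are hypothetical. `get_size` is assumed to return `None` until the file exists.

```python
import time
from typing import Callable, Optional


def watch_file(
    get_size: Callable[[], Optional[int]],
    min_size: int,
    max_wait_minutes: float,
    poll_seconds: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True (job ends OK) once the file exists with size >= min_size,
    or False (job fails) if max_wait_minutes elapse first."""
    deadline = clock() + max_wait_minutes * 60
    while True:
        size = get_size()  # None: the file has not arrived yet
        if size is not None and size >= min_size:
            return True  # Min detected size reached
        if clock() >= deadline:
            return False  # Max time to wait exceeded
        sleep(poll_seconds)  # file missing or still too small: keep watching
```

Injecting the clock and sleep functions keeps the loop testable without real waiting. Against a live cluster, `get_size` could, for example, parse the output of `hdfs dfs -stat %b <path>`.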

Impala Job Attributes

The following table describes the Impala job attributes:

Attribute	Description
Source	Determines the source of the queries to run, as follows: Query File: runs the query file defined by Query File Full Path; Open Query: runs the open query defined by Query. Variable Name: %%HDP-IMPALA_QUERY_SOURCE
Query File Full Path	Defines the full path to the file that contains the queries to run. Variable Name: %%HDP-IMPALA_QUERY_FILE_PATH
Query	Defines the query to run. Variable Name: %%HDP-IMPALA_OPEN_QUERY
Command Line Options	Defines the sets of attributes and values that are added to the command line. Variable Name: %%HDP-IMPALA_CMD_OPTION-Nxxx-ARG
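
The two Source types correspond to the two ways `impala-shell` accepts queries: `-f` for a query file and `-q` for an inline query. The sketch below makes that mapping concrete. It assumes that each Command Line Options entry is a raw argument string placed before the query. The function name, host name, and file path are hypothetical.

```python
from typing import List, Sequence


def build_impala_argv(
    source: str,
    query_file: str = "",
    query: str = "",
    options: Sequence[str] = (),
) -> List[str]:
    """Map the Impala job attributes to an impala-shell argv list."""
    argv = ["impala-shell", *options]  # assumption: options are raw argument strings
    if source == "Query File":
        argv += ["-f", query_file]  # Query File Full Path
    elif source == "Open Query":
        argv += ["-q", query]  # Query
    else:
        raise ValueError(f"unknown Source: {source!r}")
    return argv


print(build_impala_argv("Open Query", query="SELECT 1", options=["-i", "impalad01:21000"]))
```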

Hive Job Attributes

The following table describes the Hive job attributes:

Attribute	Description
Full path to Hive script	Defines the full path to the Hive script on the Hadoop host. Variable Name: %%HDP-HIVE_SCRIPT_NAME
Script Parameters	Defines the list of parameters for the script. Variable Names: Name: %%HDP-HIVE_SCRIPT_PARAM_Nxxx-NAME Value: %%HDP-HIVE_SCRIPT_PARAM-Nxxx-VAL
Append Yarn aggregated logs to output	Determines whether to add Yarn aggregated logs to the job output.

Java-Map-Reduce Job Attributes

The following table describes the Java Map-Reduce job attributes:

Attribute	Description
Full path to Jar	Defines the full path, on the Hadoop host, to the jar that contains the MapReduce Java program. Variable Name: %%HDP-JAVA_JAR_NAME
Main Class	Defines the class in the jar that contains the main function and the MapReduce implementation. Variable Name: %%HDP-JAVA_MAIN_CLASS
Arguments	Defines the arguments passed to the program. Variable Name: %%HDP-JAVA_Nxxx_ARG
Append Yarn aggregated logs to output	Determines whether to add Yarn aggregated logs to the job output.
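
The Java-Map-Reduce attributes line up with the standard `hadoop jar <jar> <mainClass> [args...]` invocation. The hypothetical helper below shows that order. The jar path and class name in the example are placeholders, not values from this document.

```python
from typing import List, Sequence


def build_mapreduce_argv(
    jar_path: str, main_class: str, arguments: Sequence[str] = ()
) -> List[str]:
    """Assemble `hadoop jar <jar> <mainClass> [args...]` as an argv list."""
    # Full path to Jar, then Main Class, then each Arguments entry in order.
    return ["hadoop", "jar", jar_path, main_class, *arguments]


print(build_mapreduce_argv("/opt/jobs/wordcount.jar", "com.example.WordCount",
                           ["/data/in", "/data/out"]))
```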

Oozie Job Attributes

The following table describes the Oozie job attributes:

Attribute	Description
Job Properties File	Defines the job properties file path. Variable Name: %%HDP-OOZIE_JOB_PROPERTIES_FILE
Job Properties (Add/Overwrite)	Defines the Oozie job properties. Each property consists of: Key: the key name of the property (Variable Name: %%HDP-OOZIE_PROPERTY-Nxxx-KEY); Value: the value of the property (Variable Name: %%HDP-OOZIE_PROPERTY-Nxxx-VAL). You can add new properties or override property values defined in the Job Properties File.
Rerun from point of failure	Determines whether to rerun an Oozie job from the point of its failure.
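
The Add/Overwrite behavior reduces to a dictionary merge in which the job-level properties win over the properties file. The sketch below illustrates this with a deliberately minimal `.properties` reader that handles only `key=value` lines and `#` or `!` comments, not continuations, escapes, or `:` separators. The function names and sample properties are hypothetical.

```python
from typing import Dict, Mapping


def load_properties(text: str) -> Dict[str, str]:
    """Minimal .properties reader: key=value lines, '#' or '!' comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def effective_properties(file_text: str, overrides: Mapping[str, str]) -> Dict[str, str]:
    """Job Properties File values, then Job Properties (Add/Overwrite) on top."""
    merged = load_properties(file_text)
    merged.update(overrides)  # new keys are added; existing keys are overwritten
    return merged


print(effective_properties("nameNode=hdfs://nn1:8020\nqueueName=default\n",
                           {"queueName": "etl", "runDate": "20240101"}))
```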

Pig Job Attributes

The following table describes the Pig job attributes:

Attribute	Description
Full Path to Pig Program	Defines the full path to the Pig program on the Hadoop host. Variable Name: %%HDP-PIG_PROG_NAME
Pig Program Parameters	Defines the list of program parameters.
Append Yarn aggregated logs to output	Determines whether to add Yarn aggregated logs to the job output.
Properties	Defines a list of properties (Name and Value) to apply to the job. These properties override the Hadoop defaults.
Archives	Defines the location of the Hadoop archives.
Files	Defines the location of the Hadoop files.

Spark Job Attributes

The following table describes the Spark job attributes:

Attribute	Description
Program Type	Determines the Spark program type, as follows: Python Script: as defined by Full Path to Script; Java / Scala Application: as defined by Application Jar File and Main Class to Run. Variable Name: %%HDP-SPARK_PROG_TYPE
Full Path to Script	Defines the full path to the Python script to execute. Variable Name: %%HDP-SPARK_FULL_PATH_TO_PYTHON_SCRIPT
Application Jar File	Defines the path to the jar that contains your application and all its dependencies. Variable Name: %%HDP-SPARK_APP_JAR_FULL_PATH
Main Class to Run	Defines the main class of the application. Variable Name: %%HDP-SPARK_MAIN_CLASS_TO_RUN
Application Arguments	Defines the arguments that are added at the end of the Spark command line: after the main class for a Java / Scala application, or after the script for a Python script. Variable Name: %%HDP-SPARK_Nxxx_ARG
Command Line Options	Defines the sets of attributes and values that are added to the command line. Variable Names: Name: %%HDP-SPARK_OPTION-Nxxx-NAME Value: %%HDP-SPARK_OPTION-Nxxx-VAL
Append Yarn aggregated logs to output	Determines whether to add Yarn aggregated logs to the job output.
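
The Spark attributes correspond to the parts of a `spark-submit` command line: options first, then the application (a Python script, or `--class <main class>` followed by the jar), then the application arguments. The sketch below is a hypothetical illustration of that order, not the command Control-M builds. In `spark-submit` syntax the arguments follow the jar, which itself follows `--class`. The paths and class name in the example are placeholders.

```python
from typing import List, Sequence, Tuple


def build_spark_submit_argv(
    program_type: str,
    options: Sequence[Tuple[str, str]] = (),
    script_path: str = "",
    app_jar: str = "",
    main_class: str = "",
    app_args: Sequence[str] = (),
) -> List[str]:
    """Map the Spark job attributes to a spark-submit argv list."""
    argv = ["spark-submit"]
    # Command Line Options come before the application itself.
    for name, value in options:
        argv.append(name)
        if value:  # flag-style options such as --verbose carry no value
            argv.append(value)
    if program_type == "Python Script":
        argv.append(script_path)
    elif program_type == "Java / Scala Application":
        argv += ["--class", main_class, app_jar]
    else:
        raise ValueError(f"unknown Program Type: {program_type!r}")
    # Application Arguments always go at the end of the command line.
    argv.extend(app_args)
    return argv


print(build_spark_submit_argv("Java / Scala Application",
                              [("--master", "yarn"), ("--deploy-mode", "cluster")],
                              app_jar="/jobs/etl.jar", main_class="com.example.Etl",
                              app_args=["in", "out"]))
```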

Sqoop Job Attributes

The following table describes the Sqoop job attributes:

Attribute	Description
Command Editor	Defines any valid Sqoop command required for job execution. Sqoop can be used for job execution only if it is defined in the Sqoop connection attributes. Variable Name: %%HDP-SQOOP_COMMAND
Append Yarn aggregated logs to output	Determines whether to add Yarn aggregated logs to the job output.
Properties	Defines a list of properties (Name and Value) to apply to the job. These properties override the Hadoop defaults.
Archives	Defines the location of the Hadoop archives.
Files	Defines the location of the Hadoop files.

Streaming Job Attributes

The following table describes the Streaming job attributes:

Attribute	Description
Input Path	Defines the input file for the Mapper step. Variable Name: %%HDP-INPUT_PATH
Output Path	Defines the HDFS output path for the Reducer step. Variable Name: %%HDP-OUTPUT_PATH
Mapper Command	Defines the command that runs as a mapper. Variable Name: %%HDP-MAPPER_COMMAND
Reducer Command	Defines the command that runs as a reducer. Variable Name: %%HDP-REDUCER_COMMAND
Streaming Options	Defines the sets of attributes (Name and Value) that are added to the end of the Streaming command line. Variable Names: Name: %%HDP-STREAMING_PARAM-Nxxx-NAME Value: %%HDP-STREAMING_PARAM-Nxxx-VAL
Generic Options	Defines the sets of attributes (Name and Value) that are added to the Streaming command line. Variable Names: Name: %%HDP-GENERIC_PARAM-Nxxx-NAME Value: %%HDP-GENERIC_PARAM-Nxxx-VAL
Append Yarn aggregated logs to output	Determines whether to add Yarn aggregated logs to the job output.
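
The Streaming attributes map onto a Hadoop Streaming invocation. Hadoop Streaming requires generic options such as `-D` to come before the streaming-specific options, which is why the two option sets are defined separately. The sketch below shows one plausible ordering and is not necessarily the command Control-M builds. The function name and the streaming jar path are hypothetical. The `/bin/cat` mapper and `/usr/bin/wc` reducer follow the classic Hadoop Streaming example.

```python
from typing import List, Sequence, Tuple


def build_streaming_argv(
    streaming_jar: str,
    input_path: str,
    output_path: str,
    mapper: str,
    reducer: str,
    generic_options: Sequence[Tuple[str, str]] = (),
    streaming_options: Sequence[Tuple[str, str]] = (),
) -> List[str]:
    """Map the Streaming job attributes to a hadoop jar argv list."""
    argv = ["hadoop", "jar", streaming_jar]
    # Generic Options (-D, -files, ...) must precede the streaming options.
    for name, value in generic_options:
        argv += [name, value]
    argv += ["-input", input_path, "-output", output_path,
             "-mapper", mapper, "-reducer", reducer]
    # Streaming Options are appended at the end of the command line.
    for name, value in streaming_options:
        argv += [name, value]
    return argv


print(build_streaming_argv(
    "/opt/hadoop/share/hadoop/tools/lib/hadoop-streaming.jar",  # hypothetical path
    "/data/in", "/data/out", "/bin/cat", "/usr/bin/wc",
    generic_options=[("-D", "mapreduce.job.reduces=2")],
    streaming_options=[("-cmdenv", "LANG=C")],
))
```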

Tajo Job Attributes

The following table describes the Tajo job attributes:

Attribute	Description
Command Source	Determines the source of the Tajo command, as follows: Input File: runs the Tajo command from the input file defined by Full File Path (Variable Name: %%HDP-TAJO_INPUT_FILE); Open Query: runs the query defined by Open Query as the Tajo command (Variable Name: %%HDP-TAJO_OPEN_QUERY).
Full File Path	Defines the path of the input file that contains the Tajo command.
Open Query	Defines the query. Variable Name: %%HDP-TAJO_OPEN_QUERY
