Big Data Analytics Jobs

The following sections provide information about the parameters and attributes of jobs that work with Big Data Analytics platforms and services.

Azure Databricks Job

Azure Databricks is a cloud-based data analytics platform that enables you to process large data workloads.

The following table describes Azure Databricks job attributes.

Attribute

Description

Connection profile

Defines the connection profile for the job.

Rules:

  • Characters: 1−30

  • Case Sensitive: Yes

  • Invalid Characters: Spaces

  • Variable Name: %%AZURE-ACCOUNT

Databricks Job ID

Determines the ID of the job created in your Databricks workspace.

Parameters

Defines task parameters to override when the job runs, according to the Databricks convention. The list of parameters must begin with the name of the parameter type. For example:

  • "notebook_params":{"param1":"val1", "param2":"val2"}

  • "jar_params": ["param1", "param2"]

For more information about the parameter types, review the properties of RunParameters in the OpenAPI specification provided through the Azure Databricks documentation.

For no parameters, specify the following value:

  • "params": {}

Idempotency Token

(Optional) Defines a token to use to rerun job runs that timed out in Databricks.

Values:

  • Control-M-Idem_%%ORDERID — With this token, upon rerun, Control-M monitors the existing job run in Databricks. Default.

  • Any other value — Replaces the Control-M idempotency token. When you rerun a job using a different token, Databricks creates a new job run with a new unique run ID.
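
For example, replacing the default with a hypothetical token such as retry-%%ORDERID causes Databricks to treat the rerun as a new job run rather than resume monitoring of the existing one.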

Status Polling Frequency

(Optional) Determines the number of seconds to wait between status checks of the job.

Default: 30

Azure HDInsight Job

Azure HDInsight enables you to run an Apache Spark batch job for big data analytics.

The following table describes Azure HDInsight job parameters:

Attribute

Description

Connection Profile

Defines the name of a connection profile to use to connect to the Azure HDInsight workspace.

Parameters

Determines which parameters are passed to the Apache Spark Application during job execution, in JSON format (name:value pairs).

This JSON must include the file and className elements.
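
A minimal sketch of such a parameters JSON, assuming a hypothetical application JAR and entry class (both values are illustrative):

{
  "file": "wasbs://container@account.blob.core.windows.net/jars/my-spark-app.jar",
  "className": "com.example.MySparkApp"
}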

Status Polling Interval

Determines the number of seconds to wait between status checks of the Apache Spark batch job.

Default: 10 seconds

Bring Logs to Output

Determines whether logs from Apache Spark appear in the job output.

Databricks Job

The Databricks job enables you to integrate jobs created in the Databricks environment with your existing Control-M workflows. The following table describes Databricks job parameters:

Attribute

Description

Connection Profile

Determines which connection profile to use to connect to the Databricks workspace.

Databricks Job ID

Determines the ID of the job created in your Databricks workspace.

Parameters

Defines task parameters to override when the job runs, according to the Databricks convention. The list of parameters must begin with the name of the parameter type. For example:

  • "notebook_params":{"param1":"val1", "param2":"val2"}

  • "jar_params": ["param1", "param2"]

For more information about the parameter types, review the properties of RunParameters in the OpenAPI specification provided through the Databricks documentation.

For no parameters, specify the following value:

  • "params": {}

Idempotency Token

(Optional) Defines a token to use to rerun job runs that timed out in Databricks.

Values:

  • Control-M-Idem_%%ORDERID — With this token, upon rerun, Control-M monitors the existing job run in Databricks. Default.

  • Any other value — Replaces the Control-M idempotency token. When you rerun a job using a different token, Databricks creates a new job run with a new unique run ID.

Status Polling Frequency

(Optional) Determines the number of seconds to wait between status checks of the job.

Default: 30

Snowflake Job

Snowflake is a cloud computing platform that you can use for data storage, processing, and analysis.

The following table describes the Snowflake job type attributes.

Attribute

Action

Description

Connection Profile

N/A

Defines the connection profile for the job.

Rules:

  • Characters: 1−30

  • Case Sensitive: Yes

  • Invalid Characters: Spaces

Database

N/A

Determines the database that the job uses.

Schema

N/A

Determines the schema that the job uses.

A schema is an organizational model that describes the layout and definition of fields and tables, and their relationships to each other, in a database.

Action

N/A

Determines one of the following Snowflake actions to perform:

  • SQL Statement: Runs any number of Snowflake-supported SQL statements, such as queries, calling or creating procedures, database maintenance tasks, and creating and editing tables.

  • Copy from Query: Copies a queried database and schema into an existing or new file in cloud storage.

  • Copy from Table: Copies from an existing table.

  • Create Table and Query: Creates a table, populated by a query, in the specified database and schema.

  • Create Snowpipe: Creates a Snowpipe and saves it to a file in cloud storage.

  • Start or Pause Snowpipe: Starts or pauses an existing Snowpipe.

  • Stored Procedure: Calls an existing stored procedure with its arguments.

  • Snowpipe Load Status: Monitors the status of a Snowpipe for a set period of time.

Snowflake SQL Statement

SQL Statement

Determines one or more Snowflake-supported SQL commands.

Rule: Must be written in a single line, with strings separated by one space only.
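
For example, a hypothetical pair of statements written as a single line (table and column names are illustrative):

SELECT COUNT(*) FROM sales WHERE region = 'EMEA'; DELETE FROM staging_sales WHERE loaded = TRUE;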

Statement Timeout

All Actions

Determines the maximum number of seconds to run the job in Snowflake.

Show More Options

All Actions

Determines whether the following job-defining attributes are displayed:

  • Parameters

  • Role

  • Bindings

  • Warehouse

Parameters

All Actions

Defines Snowflake-provided parameters that let you control how data is presented.


  "param1":"value1",
  "param2":"value2"
}
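
For instance, Snowflake session parameters that control how values are displayed, such as DATE_OUTPUT_FORMAT and TIMEZONE, can be supplied this way (a hypothetical sketch; the values are illustrative):

{
  "DATE_OUTPUT_FORMAT": "YYYY-MM-DD",
  "TIMEZONE": "UTC"
}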

Role

All Actions

Determines the Snowflake role used for this Snowflake job.

A role is an entity that can be assigned privileges on secure objects. You can be assigned one or more roles from a limited selection.

Bindings

All Actions

Defines the values to bind to the variables used in the Snowflake job, in JSON format.

For more information on bindings, see the Snowflake documentation.

The following JSON script defines two binding variables:

"1": { 
      "type": "FIXED"
      "value": "123" 
    } 
"2": { 
      "type": "TEXT"
      "value": "String" 
    }
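
Assuming the Snowflake SQL API convention, these values bind positionally to ? placeholders in the SQL statement, for example: INSERT INTO demo_table VALUES (?, ?).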

Warehouse

All Actions

Determines the warehouse used in the Snowflake job.

A warehouse is a cluster of virtual machines that processes a Snowflake job.

Show Output

All Actions

Determines whether to show a full JSON response in the log output.

Status Polling Frequency

All Actions

Determines the number of seconds to wait before checking the status of the job.

Default: 20

Query to Location

Copy from Query

Defines the cloud storage location.
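
For example, a hypothetical destination: s3://<bucket name>/query_output/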

Query Input

Copy from Query

Defines the query used for copying the data.

Storage Integration

  • Copy from Query

  • Copy from Table

Defines the storage integration object.

Overwrite

  • Copy from Query

  • Copy from Table

Determines whether to overwrite an existing file in the cloud storage, as follows:

  • Yes

  • No

File Format

  • Copy from Query

  • Copy from Table

  • Create Snowpipe

Determines one of the following file formats for the saved file:

  • JSON

  • CSV

Copy Destination

Copy from Table

Defines where the JSON or CSV file is saved.

You can save to Amazon Web Services, Google Cloud Platform, or Microsoft Azure.

For example: s3://<bucket name>/

From Table

Copy from Table

Defines the name of the copied table.

Create Table Name

Create Table and Query

Defines the name of the new or existing table that the queried data populates.

Query

Create Table and Query

Defines the query used to populate the table.

Snowpipe Name

  • Create Snowpipe

  • Start or Pause Snowpipe

  • Snowpipe Load Status

Defines the name of the Snowpipe.

A Snowpipe loads data from files when they are ready, or staged.

Copy into Table

Create Snowpipe

Defines the table that the data is copied into.

Copy Data from Stage

Create Snowpipe

Defines the stage from where the data is copied.
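
For example, a hypothetical stage reference: @my_db.my_schema.my_s3_stage (Snowflake stages are referenced with an @ prefix).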

Start or Pause Snowpipe

Start or Pause Snowpipe

Determines whether to start or pause the Snowpipe, as follows:

  • Start Snowpipe

  • Pause Snowpipe

Stored Procedure Name

Stored Procedure

Defines the name of the stored procedure.

Procedure Argument

Stored Procedure

Defines the value of the argument in the stored procedure.

Table Name

Snowpipe Load Status

Defines the table that is monitored when loaded by the Snowpipe.

Stage Location

Snowpipe Load Status

Defines the cloud storage location.

A stage is a pointer that indicates where data is stored, or staged.

For example: s3://CloudStorageLocation/

Days Back

Snowpipe Load Status

Determines the number of days to monitor the Snowpipe load status.

Status File Cloud Location Path

Snowpipe Load Status

Defines the cloud storage location where a CSV file log is created.

The CSV file log details the load status for each Snowpipe.

Storage Integration

Snowpipe Load Status

Defines the Snowflake storage integration that is configured for the cloud storage location defined in the previous attribute, Status File Cloud Location Path.

For example: S3_INT