

# Logging for AWS Glue jobs


In AWS Glue 5.0, all jobs have real-time logging capabilities. You can also tailor the logging behavior with custom configuration options: the Amazon CloudWatch log group name, the Amazon CloudWatch log stream prefix (which precedes the AWS Glue job run ID and driver/executor ID), and the log conversion pattern for log messages. With these options, you can aggregate logs in custom Amazon CloudWatch log groups that have different expiration policies, and analyze them more effectively using custom log stream prefixes and conversion patterns.

## Logging behavior in AWS Glue 5.0


By default, system logs, Spark daemon logs, and user AWS Glue Logger logs are written to the `/aws-glue/jobs/error` log group in Amazon CloudWatch. User stdout (standard output) and stderr (standard error) logs are written to the `/aws-glue/jobs/output` log group.
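
As a quick illustration of the default routing described above, the following Python sketch maps each log source to its default log group. The `DEFAULT_LOG_GROUPS` table, its keys, and the `default_log_group` helper are hypothetical names introduced here for illustration; they are not part of any AWS Glue API.

```python
# Hypothetical lookup table that mirrors the default routing described above.
DEFAULT_LOG_GROUPS = {
    "system": "/aws-glue/jobs/error",
    "spark_daemon": "/aws-glue/jobs/error",
    "glue_logger": "/aws-glue/jobs/error",
    "stdout": "/aws-glue/jobs/output",
    "stderr": "/aws-glue/jobs/output",
}


def default_log_group(source: str) -> str:
    """Return the default CloudWatch log group for a log source."""
    return DEFAULT_LOG_GROUPS[source]


print(default_log_group("glue_logger"))  # /aws-glue/jobs/error
print(default_log_group("stdout"))       # /aws-glue/jobs/output
```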

## Custom logging


 You can customize the default log group and log stream prefixes using the following job arguments: 
+  `--custom-logGroup-prefix`: Specifies a custom prefix for the `/aws-glue/jobs/error` and `/aws-glue/jobs/output` log groups. If you provide a custom prefix, the log group names have the following format: 
  +  `/aws-glue/jobs/error` becomes `<custom prefix>/error` 
  +  `/aws-glue/jobs/output` becomes `<custom prefix>/output` 
+  `--custom-logStream-prefix`: Specifies a custom prefix for the log stream names within the log groups. If you provide a custom prefix, the log stream names have the following format: 
  +  `jobrunid-driver` becomes `<custom prefix>-driver` 
  +  `jobrunid-executorNum` becomes `<custom prefix>-executorNum` 
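
The naming scheme in the list above can be sketched in Python as follows. This is a minimal illustration, not an AWS Glue API: `custom_log_names` is a hypothetical helper, the prefix values are made-up examples, and the executor IDs `"1"` and `"2"` are an assumption about what the executor number looks like.

```python
from typing import Dict, List


def custom_log_names(
    group_prefix: str, stream_prefix: str, executor_ids: List[str]
) -> Dict[str, List[str]]:
    """Derive log group and log stream names from the two custom prefixes."""
    return {
        "log_groups": [f"{group_prefix}/error", f"{group_prefix}/output"],
        "log_streams": [f"{stream_prefix}-driver"]
        + [f"{stream_prefix}-{executor_id}" for executor_id in executor_ids],
    }


# Job arguments that would carry the prefixes (the values are made-up examples).
job_arguments = {
    "--custom-logGroup-prefix": "/team-a/etl",
    "--custom-logStream-prefix": "nightly-load",
}

names = custom_log_names(
    job_arguments["--custom-logGroup-prefix"],
    job_arguments["--custom-logStream-prefix"],
    executor_ids=["1", "2"],  # assumed executor numbering
)
print(names["log_groups"])   # ['/team-a/etl/error', '/team-a/etl/output']
print(names["log_streams"])  # ['nightly-load-driver', 'nightly-load-1', 'nightly-load-2']
```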

 Validation rules and limitations for custom prefixes: 
+  The entire log stream name must be between 1 and 512 characters long. 
+  The custom prefix itself is restricted to 400 characters. 
+  The custom prefix must match the regular expression pattern `[^:*]*` (special characters allowed are '_', '-', and '/'). 
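
To catch an invalid prefix before submitting a job, you could check it locally. The following Python sketch applies the rules above to a log stream prefix. It is an illustration only: `validate_stream_prefix` is a hypothetical helper, and the pattern `[^:*]*` (no colons or asterisks) is an assumed reading of the pattern rule. AWS Glue performs its own validation, which may differ.

```python
import re
from typing import List

MAX_PREFIX_LENGTH = 400        # limit on the custom prefix itself
MAX_STREAM_NAME_LENGTH = 512   # limit on the entire log stream name
# Assumed reading of the pattern rule: any characters except ':' and '*'.
PREFIX_PATTERN = re.compile(r"[^:*]*")


def validate_stream_prefix(prefix: str, suffix: str = "-driver") -> List[str]:
    """Return a list of rule violations; an empty list means the prefix passes."""
    problems = []
    if len(prefix) > MAX_PREFIX_LENGTH:
        problems.append("prefix is longer than 400 characters")
    if not PREFIX_PATTERN.fullmatch(prefix):
        problems.append("prefix contains ':' or '*'")
    if not 1 <= len(prefix + suffix) <= MAX_STREAM_NAME_LENGTH:
        problems.append("full log stream name is not between 1 and 512 characters")
    return problems


print(validate_stream_prefix("team-a/etl_nightly"))  # []
print(validate_stream_prefix("bad:prefix"))          # ["prefix contains ':' or '*'"]
```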

## Logging application-specific messages using the custom script logger

You can use the AWS Glue logger to log any application-specific messages in the script that are sent in real time to the driver log stream.

The following example shows a Python script.

```
from awsglue.context import GlueContext
from pyspark.context import SparkContext

# Initialize the Spark and AWS Glue contexts.
sc = SparkContext()
glueContext = GlueContext(sc)

# Get the AWS Glue logger and write messages at different levels.
logger = glueContext.get_logger()
logger.info("info message")
logger.warn("warn message")
logger.error("error message")
```

The following example shows a Scala script.

```
import com.amazonaws.services.glue.log.GlueLogger

object GlueApp {
  def main(sysArgs: Array[String]): Unit = {
    val logger = new GlueLogger
    logger.info("info message")
    logger.warn("warn message")
    logger.error("error message")
  }
}
```

## Enabling the progress bar to show job progress

AWS Glue provides a real-time progress bar under the `JOB_RUN_ID-progress-bar` log stream to check AWS Glue job run status. Currently it supports only jobs that initialize `glueContext`. If you run a pure Spark job without initializing `glueContext`, the AWS Glue progress bar does not appear.

The progress bar shows the following progress update every 5 seconds.

```
Stage Number (Stage Name): > (numCompletedTasks + numActiveTasks) / totalNumOfTasksInThisStage]
```

## Security configuration with Amazon CloudWatch logging


 When a security configuration is enabled for Amazon CloudWatch logs, AWS Glue creates log groups with specific naming patterns that incorporate the security configuration name. 

### Log group naming with security configuration


 The default and custom log groups will be as follows: 
+  **Default error log group:** `/aws-glue/jobs/Security-Configuration-Name-role/glue-job-role/error` 
+  **Default output log group:** `/aws-glue/jobs/Security-Configuration-Name-role/glue-job-role/output` 
+  **Custom error log group (AWS Glue 5.0):** `custom-log-group-prefix/Security-Configuration-Name-role/glue-job-role/error` 
+  **Custom output log group (AWS Glue 5.0):** `custom-log-group-prefix/Security-Configuration-Name-role/glue-job-role/output` 
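
The naming patterns above can be expressed as a small Python helper. This sketch assumes that `Security-Configuration-Name` and `glue-job-role` in the patterns are placeholders for the security configuration name and the job's IAM role name, and that `-role` is literal text. `secured_log_groups` and all example values are hypothetical.

```python
from typing import Dict, Optional


def secured_log_groups(
    security_configuration_name: str,
    job_role_name: str,
    custom_prefix: Optional[str] = None,
) -> Dict[str, str]:
    """Build the error and output log group names for a job with a security configuration."""
    # Without a custom prefix, the names start with the default /aws-glue/jobs.
    base = custom_prefix if custom_prefix else "/aws-glue/jobs"
    middle = f"{security_configuration_name}-role/{job_role_name}"
    return {
        "error": f"{base}/{middle}/error",
        "output": f"{base}/{middle}/output",
    }


# "my-sec-config", "my-glue-role", and "team-a/etl" are made-up example values.
print(secured_log_groups("my-sec-config", "my-glue-role")["error"])
# /aws-glue/jobs/my-sec-config-role/my-glue-role/error
print(secured_log_groups("my-sec-config", "my-glue-role", "team-a/etl")["output"])
# team-a/etl/my-sec-config-role/my-glue-role/output
```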

### Required IAM permissions


If you enable a security configuration with Amazon CloudWatch Logs, you must add the `logs:AssociateKmsKey` permission to your IAM role permissions. If that permission is not included, continuous logging will be disabled.

To configure encryption for Amazon CloudWatch Logs, follow the instructions at [Encrypt Log Data in Amazon CloudWatch Logs Using AWS Key Management Service](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/encrypt-log-data-kms.html) in the *Amazon CloudWatch Logs User Guide*.
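
As an illustration of the permission requirement, the following Python sketch builds an IAM policy document that grants `logs:AssociateKmsKey` and prints it as JSON. The `associate_kms_key_policy` helper is hypothetical, and the `"*"` resource is a placeholder assumption. In practice you would likely restrict the resource to the log groups your jobs use.

```python
import json


def associate_kms_key_policy(resource: str = "*") -> dict:
    """Build an IAM policy document that allows logs:AssociateKmsKey."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["logs:AssociateKmsKey"],
                # "*" is a placeholder; scope this to your log groups in practice.
                "Resource": resource,
            }
        ],
    }


print(json.dumps(associate_kms_key_policy(), indent=2))
```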

### Additional information


 For more information on creating security configurations, see [Managing security configurations on the AWS Glue console](https://docs.aws.amazon.com/glue/latest/dg/console-security-configurations.html). 

**Topics**
+ [Logging behavior in AWS Glue 5.0](#monitor-logging-behavior-glue-50)
+ [Custom logging](#monitor-logging-custom)
+ [Logging application-specific messages using the custom script logger](#monitor-logging-script)
+ [Enabling the progress bar to show job progress](#monitor-logging-progress)
+ [Security configuration with Amazon CloudWatch logging](#monitor-security-config-logging)
+ [Enabling continuous logging for AWS Glue 4.0 and earlier jobs](monitor-continuous-logging-enable.md)
+ [Viewing logs for AWS Glue jobs](monitor-continuous-logging-view.md)

# Enabling continuous logging for AWS Glue 4.0 and earlier jobs


**Note**  
 In AWS Glue 4.0 and earlier versions, continuous logging was an available feature. However, with the introduction of AWS Glue 5.0, all jobs have real-time logging capability. For more details on the logging capabilities and configuration options in AWS Glue 5.0, see [Logging for AWS Glue jobs](https://docs.aws.amazon.com/glue/latest/dg/monitor-continuous-logging.html). 

You can enable continuous logging when you create a new job or edit an existing job in the AWS Glue console, or through the AWS Command Line Interface (AWS CLI).

You can also specify custom configuration options such as the Amazon CloudWatch log group name, the CloudWatch log stream prefix that precedes the AWS Glue job run ID and driver/executor ID, and the log conversion pattern for log messages. These configurations help you aggregate logs in custom CloudWatch log groups with different expiration policies, and analyze them further with custom log stream prefixes and conversion patterns.

**Topics**
+ [Using the AWS Management Console](#monitor-continuous-logging-enable-console)
+ [Logging application-specific messages using the custom script logger](#monitor-continuous-logging-script)
+ [Enabling the progress bar to show job progress](#monitor-continuous-logging-progress)
+ [Security configuration with continuous logging](#monitor-logging-encrypt-log-data)

## Using the AWS Management Console


Follow these steps to use the console to enable continuous logging when creating or editing an AWS Glue job.

**To create a new AWS Glue job with continuous logging**

1. Sign in to the AWS Management Console and open the AWS Glue console at [https://console.aws.amazon.com/glue/](https://console.aws.amazon.com/glue/).

1. In the navigation pane, choose **ETL jobs**.

1. Choose **Visual ETL**.

1. In the **Job details** tab, expand the **Advanced properties** section.

1. Under **Continuous logging**, select **Enable logs in CloudWatch**.

**To enable continuous logging for an existing AWS Glue job**

1. Open the AWS Glue console at [https://console.aws.amazon.com/glue/](https://console.aws.amazon.com/glue/).

1. In the navigation pane, choose **Jobs**.

1. Choose an existing job from the **Jobs** list.

1. Choose **Action**, **Edit job**.

1. In the **Job details** tab, expand the **Advanced properties** section.

1. Under **Continuous logging**, select **Enable logs in CloudWatch**.

## Using the AWS CLI

To enable continuous logging, pass the following special job parameters to an AWS Glue job in the same way as other AWS Glue job parameters. For more information, see [Using job parameters in AWS Glue jobs](aws-glue-programming-etl-glue-arguments.md).

```
'--enable-continuous-cloudwatch-log': 'true'
```

You can specify a custom Amazon CloudWatch log group name. If not specified, the default log group name is `/aws-glue/jobs/logs-v2`.

```
'--continuous-log-logGroup': 'custom_log_group_name'
```

You can specify a custom Amazon CloudWatch log stream prefix. If not specified, the default log stream prefix is the job run ID.

```
'--continuous-log-logStreamPrefix': 'custom_log_stream_prefix'
```

You can specify a custom continuous logging conversion pattern. If not specified, the default conversion pattern is `%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n`. Note that the conversion pattern only applies to driver logs and executor logs. It does not affect the AWS Glue progress bar.

```
'--continuous-log-conversionPattern': 'custom_log_conversion_pattern'
```
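
The four parameters above can be assembled in code before they are passed to a job. The following Python sketch builds the argument dictionary and includes only the optional settings that are provided. `continuous_logging_arguments` is a hypothetical helper, and the example values are made up.

```python
from typing import Dict, Optional


def continuous_logging_arguments(
    log_group: Optional[str] = None,
    stream_prefix: Optional[str] = None,
    conversion_pattern: Optional[str] = None,
) -> Dict[str, str]:
    """Build job arguments that enable continuous logging with optional overrides."""
    arguments = {"--enable-continuous-cloudwatch-log": "true"}
    # Omitted options fall back to the defaults described above.
    if log_group:
        arguments["--continuous-log-logGroup"] = log_group
    if stream_prefix:
        arguments["--continuous-log-logStreamPrefix"] = stream_prefix
    if conversion_pattern:
        arguments["--continuous-log-conversionPattern"] = conversion_pattern
    return arguments


# "/team-a/glue-logs" and "nightly-load" are made-up example values.
print(continuous_logging_arguments("/team-a/glue-logs", "nightly-load"))
```

With boto3, a dictionary like this could be supplied as the `Arguments` parameter of the AWS Glue client's `start_job_run` call, alongside the `JobName` of your job.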

## Logging application-specific messages using the custom script logger

You can use the AWS Glue logger to log any application-specific messages in the script that are sent in real time to the driver log stream.

The following example shows a Python script.

```
from awsglue.context import GlueContext
from pyspark.context import SparkContext

# Initialize the Spark and AWS Glue contexts.
sc = SparkContext()
glueContext = GlueContext(sc)

# Get the AWS Glue logger and write messages at different levels.
logger = glueContext.get_logger()
logger.info("info message")
logger.warn("warn message")
logger.error("error message")
```

The following example shows a Scala script.

```
import com.amazonaws.services.glue.log.GlueLogger

object GlueApp {
  def main(sysArgs: Array[String]): Unit = {
    val logger = new GlueLogger
    logger.info("info message")
    logger.warn("warn message")
    logger.error("error message")
  }
}
```

## Enabling the progress bar to show job progress

AWS Glue provides a real-time progress bar under the `JOB_RUN_ID-progress-bar` log stream to check AWS Glue job run status. Currently it supports only jobs that initialize `glueContext`. If you run a pure Spark job without initializing `glueContext`, the AWS Glue progress bar does not appear.

The progress bar shows the following progress update every 5 seconds.

```
Stage Number (Stage Name): > (numCompletedTasks + numActiveTasks) / totalNumOfTasksInThisStage]
```

## Security configuration with continuous logging

If a security configuration is enabled for CloudWatch logs, AWS Glue will create a log group named as follows for continuous logs:

```
<Log-Group-Name>-<Security-Configuration-Name>
```

The default and custom log groups will be as follows:
+ The default continuous log group will be `/aws-glue/jobs/error-<Security-Configuration-Name>`
+ The custom continuous log group will be `<custom-log-group-name>-<Security-Configuration-Name>`

If you enable a security configuration with CloudWatch Logs, you must add the `logs:AssociateKmsKey` permission to your IAM role permissions. If that permission is not included, continuous logging will be disabled. Also, to configure encryption for CloudWatch Logs, follow the instructions at [Encrypt Log Data in CloudWatch Logs Using AWS Key Management Service](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/encrypt-log-data-kms.html) in the *Amazon CloudWatch Logs User Guide*.

For more information on creating security configurations, see [Managing security configurations on the AWS Glue console](console-security-configurations.md).

**Note**  
 You may incur additional charges when you enable logging and additional CloudWatch log events are created. For more information, see [Amazon CloudWatch pricing](https://aws.amazon.com/cloudwatch/pricing/). 

# Viewing logs for AWS Glue jobs


You can view real-time logs using the AWS Glue console or the Amazon CloudWatch console.

**To view real-time logs using the AWS Glue console dashboard**

1. Sign in to the AWS Management Console and open the AWS Glue console at [https://console.aws.amazon.com/glue/](https://console.aws.amazon.com/glue/).

1. In the navigation pane, choose **Jobs**.

1. Add or start an existing job. Choose **Action**, **Run job**.

   When you start running a job, you navigate to a page that contains information about the running job:
   + The **Logs** tab shows the older aggregated application logs.
   + The **Logs** tab shows a real-time progress bar when the job is running with `glueContext` initialized.
   + The **Logs** tab also contains the **Driver logs**, which capture real-time Apache Spark driver logs, and application logs from the script logged using the AWS Glue application logger when the job is running.

1. For older jobs, you can also view the real-time logs under the **Job History** view by choosing **Logs**. This action takes you to the CloudWatch console that shows all Spark driver, executor, and progress bar log streams for that job run.

**To view real-time logs using the CloudWatch console dashboard**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. In the navigation pane, choose **Log groups**.

1. Choose the **/aws-glue/jobs/error** log group.

1. In the **Filter** box, paste the job run ID.

   You can view the driver logs, executor logs, and progress bar (if using the **Standard filter**).
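
The same lookup can be done programmatically. The following Python sketch lists the log streams whose names start with a job run ID, using the CloudWatch Logs `describe_log_streams` paginator from boto3. `find_job_run_streams` is a hypothetical helper. It assumes that stream names begin with the job run ID and that the job writes to the `/aws-glue/jobs/error` log group.

```python
from typing import List


def find_job_run_streams(
    logs_client, job_run_id: str, log_group: str = "/aws-glue/jobs/error"
) -> List[str]:
    """Return the names of log streams whose names start with the job run ID."""
    paginator = logs_client.get_paginator("describe_log_streams")
    names = []
    # Each page holds a batch of matching streams; collect the names from all pages.
    for page in paginator.paginate(
        logGroupName=log_group, logStreamNamePrefix=job_run_id
    ):
        names.extend(stream["logStreamName"] for stream in page["logStreams"])
    return names
```

With boto3 installed and credentials configured, you could call it as `find_job_run_streams(boto3.client("logs"), job_run_id)`, where `job_run_id` is the ID of the run you want to inspect.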