

# Query Application Load Balancer logs
<a name="application-load-balancer-logs"></a>

An Application Load Balancer is a load balancing option for Elastic Load Balancing that enables traffic distribution in a microservices deployment using containers. Querying Application Load Balancer logs allows you to see the source of traffic, latency, and bytes transferred to and from Elastic Load Balancing instances and backend applications. For more information, see [Access logs for your Application Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html) and [Connection logs for your Application Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-connection-logs.html) in the *User Guide for Application Load Balancers*.

## Prerequisites
<a name="application-load-balancer-logs-prerequisites"></a>
+ Enable [access logging](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html) or [connection logging ](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-connection-logs.html) so that Application Load Balancer logs can be saved to your Amazon S3 bucket.
+ A database to hold the table that you will create for Athena. To create a database, you can use the Athena or AWS Glue console. For more information, see [Create databases in Athena](creating-databases.md) in this guide or [Working with databases on the AWS glue console](https://docs.aws.amazon.com/glue/latest/dg/console-databases.html) in the *AWS Glue Developer Guide*. 

**Topics**
+ [Prerequisites](#application-load-balancer-logs-prerequisites)
+ [Create the table for ALB access logs](create-alb-access-logs-table.md)
+ [Create the table for ALB access logs in Athena using partition projection](create-alb-access-logs-table-partition-projection.md)
+ [Example queries for ALB access logs](query-alb-access-logs-examples.md)
+ [Create the table for ALB connection logs](create-alb-connection-logs-table.md)
+ [Create the table for ALB connection logs in Athena using partition projection](create-alb-connection-logs-table-partition-projection.md)
+ [Example queries for ALB connection logs](query-alb-connection-logs-examples.md)
+ [Additional resources](application-load-balancer-logs-additional-resources.md)

# Create the table for ALB access logs
<a name="create-alb-access-logs-table"></a>

1. Copy and paste the following `CREATE TABLE` statement into the query editor in the Athena console, and then modify it as necessary for your own log entry requirements. For information about getting started with the Athena console, see [Get started](getting-started.md). Replace the path in the `LOCATION` clause with your Amazon S3 access log folder location. For more information about access log file location, see [Access log files](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html#access-log-file-format) in the *User Guide for Application Load Balancers*.

   For information about each log file field, see [Access log entries](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html#access-log-entry-format) in the *User Guide for Application Load Balancers*.
**Note**  
The following example `CREATE TABLE` statement includes the recently added `classification`, `classification_reason`, and `conn_trace_id` ('traceability ID', or TID) columns. To create a table for Application Load Balancer access logs that do not contain these entries, remove the corresponding columns from the `CREATE TABLE` statement and modify the regular expression accordingly. 

   ```
   CREATE EXTERNAL TABLE IF NOT EXISTS alb_access_logs (
               type string,
               time string,
               elb string,
               client_ip string,
               client_port int,
               target_ip string,
               target_port int,
               request_processing_time double,
               target_processing_time double,
               response_processing_time double,
               elb_status_code int,
               target_status_code string,
               received_bytes bigint,
               sent_bytes bigint,
               request_verb string,
               request_url string,
               request_proto string,
               user_agent string,
               ssl_cipher string,
               ssl_protocol string,
               target_group_arn string,
               trace_id string,
               domain_name string,
               chosen_cert_arn string,
               matched_rule_priority string,
               request_creation_time string,
               actions_executed string,
               redirect_url string,
               lambda_error_reason string,
               target_port_list string,
               target_status_code_list string,
               classification string,
               classification_reason string,
               conn_trace_id string
               )
               ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
               WITH SERDEPROPERTIES (
               'serialization.format' = '1',
               'input.regex' = 
           '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) (.*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-_]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\\s]+?)\" \"([^\\s]+)\" \"([^ ]*)\" \"([^ ]*)\" ?([^ ]*)? ?( .*)?'
               )
               LOCATION 's3://amzn-s3-demo-bucket/access-log-folder-path/'
   ```
**Note**  
We suggest that the pattern *`?( .*)?`* at the end of the `input.regex` parameter always remain in place to handle future log entries in case new ALB log fields are added. 

1. Run the query in the Athena console. After the query completes, Athena registers the `alb_access_logs` table, making the data in it ready for you to issue queries.

# Create the table for ALB access logs in Athena using partition projection
<a name="create-alb-access-logs-table-partition-projection"></a>

Because ALB access logs have a known structure whose partition scheme you can specify in advance, you can reduce query runtime and automate partition management by using the Athena partition projection feature. Partition projection automatically adds new partitions as new data is added. This removes the need for you to manually add partitions by using `ALTER TABLE ADD PARTITION`. 

The following example `CREATE TABLE` statement automatically uses partition projection on ALB access logs from a specified date until the present for a single AWS region. The statement is based on the example in the previous section but adds `PARTITIONED BY` and `TBLPROPERTIES` clauses to enable partition projection. In the `LOCATION` and `storage.location.template` clauses, replace the placeholders with values that identify the Amazon S3 bucket location of your ALB access logs. For more information about access log file location, see [Access log files](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html#access-log-file-format) in the *User Guide for Application Load Balancers*. For `projection.day.range`, replace *2022*/*01*/*01* with the starting date that you want to use. After you run the query successfully, you can query the table. You do not have to run `ALTER TABLE ADD PARTITION` to load the partitions. For information about each log file field, see [Access log entries](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html#access-log-entry-format). 

```
CREATE EXTERNAL TABLE IF NOT EXISTS alb_access_logs (
            type string,
            time string,
            elb string,
            client_ip string,
            client_port int,
            target_ip string,
            target_port int,
            request_processing_time double,
            target_processing_time double,
            response_processing_time double,
            elb_status_code int,
            target_status_code string,
            received_bytes bigint,
            sent_bytes bigint,
            request_verb string,
            request_url string,
            request_proto string,
            user_agent string,
            ssl_cipher string,
            ssl_protocol string,
            target_group_arn string,
            trace_id string,
            domain_name string,
            chosen_cert_arn string,
            matched_rule_priority string,
            request_creation_time string,
            actions_executed string,
            redirect_url string,
            lambda_error_reason string,
            target_port_list string,
            target_status_code_list string,
            classification string,
            classification_reason string,
            conn_trace_id string
            )
            PARTITIONED BY
            (
             day STRING
            )
            ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
            WITH SERDEPROPERTIES (
            'serialization.format' = '1',
            'input.regex' = 
        '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) (.*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-_]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\\s]+?)\" \"([^\\s]+)\" \"([^ ]*)\" \"([^ ]*)\" ?([^ ]*)? ?( .*)?'
            )
            LOCATION 's3://amzn-s3-demo-bucket/AWSLogs/<ACCOUNT-NUMBER>/elasticloadbalancing/<REGION>/'
            TBLPROPERTIES
            (
             "projection.enabled" = "true",
             "projection.day.type" = "date",
             "projection.day.range" = "2022/01/01,NOW",
             "projection.day.format" = "yyyy/MM/dd",
             "projection.day.interval" = "1",
             "projection.day.interval.unit" = "DAYS",
             "storage.location.template" = "s3://amzn-s3-demo-bucket/AWSLogs/<ACCOUNT-NUMBER>/elasticloadbalancing/<REGION>/${day}"
            )
```

For more information about partition projection, see [Use partition projection with Amazon Athena](partition-projection.md).

**Note**  
We suggest that the pattern *?( .\$1)?* at the end of the `input.regex` parameter always remain in place to handle future log entries in case new ALB log fields are added. 

# Example queries for ALB access logs
<a name="query-alb-access-logs-examples"></a>

The following query counts the number of HTTP GET requests received by the load balancer grouped by the client IP address:

```
SELECT COUNT(request_verb) AS
 count,
 request_verb,
 client_ip
FROM alb_access_logs
GROUP BY request_verb, client_ip
LIMIT 100;
```

Another query shows the URLs visited by Safari browser users:

```
SELECT request_url
FROM alb_access_logs
WHERE user_agent LIKE '%Safari%'
LIMIT 10;
```

The following query shows records that have ELB status code values greater than or equal to 500.

```
SELECT * FROM alb_access_logs
WHERE elb_status_code >= 500
```

The following example shows how to parse the logs by `datetime`:

```
SELECT client_ip, sum(received_bytes) 
FROM alb_access_logs
WHERE parse_datetime(time,'yyyy-MM-dd''T''HH:mm:ss.SSSSSS''Z') 
     BETWEEN parse_datetime('2018-05-30-12:00:00','yyyy-MM-dd-HH:mm:ss') 
     AND parse_datetime('2018-05-31-00:00:00','yyyy-MM-dd-HH:mm:ss') 
GROUP BY client_ip;
```

The following query queries the table that uses partition projection for all ALB access logs from the specified day.

```
SELECT * 
FROM alb_access_logs 
WHERE day = '2022/02/12'
```

# Create the table for ALB connection logs
<a name="create-alb-connection-logs-table"></a>

1. Copy and paste the following example `CREATE TABLE` statement into the query editor in the Athena console, and then modify it as necessary for your own log entry requirements. For information about getting started with the Athena console, see [Get started](getting-started.md). Replace the path in the `LOCATION` clause with your Amazon S3 connection log folder location. For more information about connection log file location, see [Connection log files](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-connection-logs.html#connection-log-file-format) in the *User Guide for Application Load Balancers*. For information about each log file field, see [Connection log entries](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-connection-logs.html#connection-log-entry-format). 

   ```
   CREATE EXTERNAL TABLE IF NOT EXISTS alb_connection_logs (
            time string,
            client_ip string,
            client_port int,
            listener_port int,
            tls_protocol string,
            tls_cipher string,
            tls_handshake_latency double,
            leaf_client_cert_subject string,
            leaf_client_cert_validity string,
            leaf_client_cert_serial_number string,
            tls_verify_status string,
            conn_trace_id string
            ) 
            ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
            WITH SERDEPROPERTIES (
            'serialization.format' = '1',
            'input.regex' =
             '([^ ]*) ([^ ]*) ([0-9]*) ([0-9]*) ([A-Za-z0-9.-]*) ([^ ]*) ([-.0-9]*) \"([^\"]*)\" ([^ ]*) ([^ ]*) ([^ ]*) ?([^ ]*)?( .*)?'
            )
            LOCATION 's3://amzn-s3-demo-bucket/connection-log-folder-path/'
   ```

1. Run the query in the Athena console. After the query completes, Athena registers the `alb_connection_logs` table, making the data in it ready for you to issue queries.

# Create the table for ALB connection logs in Athena using partition projection
<a name="create-alb-connection-logs-table-partition-projection"></a>

Because ALB connection logs have a known structure whose partition scheme you can specify in advance, you can reduce query runtime and automate partition management by using the Athena partition projection feature. Partition projection automatically adds new partitions as new data is added. This removes the need for you to manually add partitions by using `ALTER TABLE ADD PARTITION`. 

The following example `CREATE TABLE` statement automatically uses partition projection on ALB connection logs from a specified date until the present for a single AWS region. The statement is based on the example in the previous section but adds `PARTITIONED BY` and `TBLPROPERTIES` clauses to enable partition projection. In the `LOCATION` and `storage.location.template` clauses, replace the placeholders with values that identify the Amazon S3 bucket location of your ALB connection logs. For more information about connection log file location, see [Connection log files](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-connection-logs.html#connection-log-file-format) in the *User Guide for Application Load Balancers*. For `projection.day.range`, replace *2023*/*01*/*01* with the starting date that you want to use. After you run the query successfully, you can query the table. You do not have to run `ALTER TABLE ADD PARTITION` to load the partitions. For information about each log file field, see [Connection log entries](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-connection-logs.html#connection-log-entry-format).

```
CREATE EXTERNAL TABLE IF NOT EXISTS alb_connection_logs (
         time string,
         client_ip string,
         client_port int,
         listener_port int,
         tls_protocol string,
         tls_cipher string,
         tls_handshake_latency double,
         leaf_client_cert_subject string,
         leaf_client_cert_validity string,
         leaf_client_cert_serial_number string,
         tls_verify_status string,
         conn_trace_id string
         )
            PARTITIONED BY
            (
             day STRING
            )
            ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
            WITH SERDEPROPERTIES (
            'serialization.format' = '1',
            'input.regex' =
             '([^ ]*) ([^ ]*) ([0-9]*) ([0-9]*) ([A-Za-z0-9.-]*) ([^ ]*) ([-.0-9]*) \"([^\"]*)\" ([^ ]*) ([^ ]*) ([^ ]*) ?([^ ]*)?( .*)?'
            )
            LOCATION 's3://amzn-s3-demo-bucket/AWSLogs/<ACCOUNT-NUMBER>/elasticloadbalancing/<REGION>/'
            TBLPROPERTIES
            (
             "projection.enabled" = "true",
             "projection.day.type" = "date",
             "projection.day.range" = "2023/01/01,NOW",
             "projection.day.format" = "yyyy/MM/dd",
             "projection.day.interval" = "1",
             "projection.day.interval.unit" = "DAYS",
             "storage.location.template" = "s3://amzn-s3-demo-bucket/AWSLogs/<ACCOUNT-NUMBER>/elasticloadbalancing/<REGION>/${day}"
            )
```

For more information about partition projection, see [Use partition projection with Amazon Athena](partition-projection.md).

# Example queries for ALB connection logs
<a name="query-alb-connection-logs-examples"></a>

The following query count occurrences where the value for `tls_verify_status` was not `'Success'`, grouped by client IP address:

```
SELECT DISTINCT client_ip, count() AS count FROM alb_connection_logs
WHERE tls_verify_status != 'Success'
GROUP BY client_ip
ORDER BY count() DESC;
```

The following query searches occurrences where the value for `tls_handshake_latency` was over 2 seconds in the specified time range:

```
SELECT * FROM alb_connection_logs
WHERE 
  (
    parse_datetime(time, 'yyyy-MM-dd''T''HH:mm:ss.SSSSSS''Z') 
    BETWEEN 
    parse_datetime('2024-01-01-00:00:00', 'yyyy-MM-dd-HH:mm:ss') 
    AND 
    parse_datetime('2024-03-20-00:00:00', 'yyyy-MM-dd-HH:mm:ss') 
  ) 
  AND 
    (tls_handshake_latency >= 2.0);
```

# Additional resources
<a name="application-load-balancer-logs-additional-resources"></a>

For more information about using ALB logs, see the following resources.
+ [How do I analyze my Application Load Balancer access logs using Amazon Athena](https://repost.aws/knowledge-center/athena-analyze-access-logs) in the *AWS Knowledge Center*.
+ For information about HTTP status codes in Elastic Load Balancing, see [Troubleshoot your application load balancers](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-troubleshooting.html) in the *User Guide for Application Load Balancers*.
+ [Catalog and analyze Application Load Balancer logs more efficiently with AWS Glue custom classifiers and Amazon Athena](https://aws.amazon.com/blogs/big-data/catalog-and-analyze-application-load-balancer-logs-more-efficiently-with-aws-glue-custom-classifiers-and-amazon-athena/) in the *AWS Big Data Blog*.