

# Analyze bucket access logs with Amazon Athena in Lightsail
<a name="amazon-lightsail-using-bucket-access-logs"></a>

In this guide, we show you how to identify requests to a bucket using access logs. For more information, see [Bucket access logs](amazon-lightsail-bucket-access-logs.md).

**Contents**
+ [Query access logs for requests using Amazon Athena](#querying-access-logs-for-requests)
+ [Identify object access requests using Amazon S3 access logs](#identifying-object-access-requests)

## Query access logs for requests using Amazon Athena
<a name="querying-access-logs-for-requests"></a>

You can use Amazon Athena to query and identify requests to a bucket in access logs.

Lightsail stores access logs as objects in a Lightsail bucket. It is often easier to use a tool that can analyze the logs. Athena supports analysis of objects and can be used to query access logs.

**Example**

The following example shows how you can query bucket server access logs in Amazon Athena.

**Note**  
To specify a bucket location in an Athena query, you need to format the target bucket name and target prefix where your logs are delivered as an S3 URI, as follows: `s3://{{amzn-s3-demo-bucket1}}-logs/prefix/`

1. Open the Athena console at [https://console.aws.amazon.com/athena/](https://console.aws.amazon.com/athena/).

1. In the **Query Editor**, run a command similar to the following.

   ```
   create database bucket_access_logs_db
   ```
**Note**  
It's a best practice to create the database in the same AWS Region as your S3 bucket.

1. In the **Query Editor**, run a command similar to the following to create a table schema in the database that you created in step 2. The `STRING` and `BIGINT` data type values are the access log properties. You can query these properties in Athena. For `LOCATION`, enter the bucket and prefix path as noted earlier.

   ```
   CREATE EXTERNAL TABLE `s3_access_logs_db.amzn-s3-demo-bucket_logs`(
     `bucketowner` STRING, 
     `bucket_name` STRING, 
     `requestdatetime` STRING, 
     `remoteip` STRING, 
     `requester` STRING, 
     `requestid` STRING, 
     `operation` STRING, 
     `key` STRING, 
     `request_uri` STRING, 
     `httpstatus` STRING, 
     `errorcode` STRING, 
     `bytessent` BIGINT, 
     `objectsize` BIGINT, 
     `totaltime` STRING, 
     `turnaroundtime` STRING, 
     `referrer` STRING, 
     `useragent` STRING, 
     `versionid` STRING, 
     `hostid` STRING, 
     `sigv` STRING, 
     `ciphersuite` STRING, 
     `authtype` STRING, 
     `endpoint` STRING, 
     `tlsversion` STRING)
   ROW FORMAT SERDE 
     'org.apache.hadoop.hive.serde2.RegexSerDe' 
   WITH SERDEPROPERTIES ( 
     'input.regex'='([^ ]*) ([^ ]*) \\[(.*?)\\] ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\"|-) (-|[0-9]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\"|-) ([^ ]*)(?: ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*))?.*$') 
   STORED AS INPUTFORMAT 
     'org.apache.hadoop.mapred.TextInputFormat' 
   OUTPUTFORMAT 
     'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
   LOCATION
     '{{s3://amzn-s3-demo-bucket1-logs/prefix/}}'
   ```

1. In the navigation pane, under **Database**, choose your database.

1. Under **Tables**, choose **Preview** table next to your table name.

   In the **Results** pane, you should see data from the server access logs, such as `bucketowner`, `bucket`, `requestdatetime`, and so on. This means that you successfully created the Athena table. You can now query the bucket server access logs.

**Example — Show who deleted an object and when (timestamp, IP address, and IAM user)**

```
SELECT RequestDateTime, RemoteIP, Requester, Key 
FROM s3_access_logs_db.amzn-s3-demo-bucket_logs 
WHERE key = 'images/picture.jpg' AND operation like '%DELETE%';
```

**Example — Show all operations that were performed by an IAM user**

```
SELECT * 
FROM s3_access_logs_db.amzn-s3-demo-bucket_logs 
WHERE requester='arn:aws:iam::123456789123:user/user_name';
```

**Example — Show all operations that were performed on an object in a specific time period**

```
SELECT *
FROM s3_access_logs_db.amzn-s3-demo-bucket_logs
WHERE Key='prefix/images/picture.jpg' 
    AND parse_datetime(RequestDateTime,'dd/MMM/yyyy:HH:mm:ss Z')
    BETWEEN parse_datetime('2017-02-18:07:00:00','yyyy-MM-dd:HH:mm:ss')
    AND parse_datetime('2017-02-18:08:00:00','yyyy-MM-dd:HH:mm:ss');
```

**Example — Show how much data was transferred by a specific IP address in a specific time period**

```
SELECT SUM(bytessent) AS uploadTotal,
      SUM(objectsize) AS downloadTotal,
      SUM(bytessent + objectsize) AS Total
FROM s3_access_logs_db.amzn-s3-demo-bucket_logs
WHERE RemoteIP='1.2.3.4'
AND parse_datetime(RequestDateTime,'dd/MMM/yyyy:HH:mm:ss Z')
BETWEEN parse_datetime('2017-06-01','yyyy-MM-dd')
AND parse_datetime('2017-07-01','yyyy-MM-dd');
```

## Identify object access requests using Amazon S3 access logs
<a name="identifying-object-access-requests"></a>

You can use queries on access logs to identify object access requests, for operations such as *GET*, *PUT*, and *DELETE*, and discover further information about those requests.

The following Amazon Athena query example shows how to get all `PUT` object requests for a bucket from the server access log.

**Example — Show all requesters that are sending PUT object requests in a certain period**

```
SELECT Bucket, Requester, RemoteIP, Key, HTTPStatus, ErrorCode, RequestDateTime
FROM s3_access_logs_db
WHERE Operation='REST.PUT.OBJECT' AND
parse_datetime(RequestDateTime,'dd/MMM/yyyy:HH:mm:ss Z') 
BETWEEN parse_datetime('2019-07-01:00:42:42','yyyy-MM-dd:HH:mm:ss')
AND 
parse_datetime('2019-07-02:00:42:42','yyyy-MM-dd:HH:mm:ss')
```

The following Amazon Athena query example shows how to get all GET object requests for Amazon S3 from the server access log.

**Example — Show all requesters that are sending GET object requests in a certain period**

```
SELECT Bucket, Requester, RemoteIP, Key, HTTPStatus, ErrorCode, RequestDateTime
FROM s3_access_logs_db
WHERE Operation='REST.GET.OBJECT' AND
parse_datetime(RequestDateTime,'dd/MMM/yyyy:HH:mm:ss Z') 
BETWEEN parse_datetime('2019-07-01:00:42:42','yyyy-MM-dd:HH:mm:ss')
AND 
parse_datetime('2019-07-02:00:42:42','yyyy-MM-dd:HH:mm:ss')
```

The following Amazon Athena query example shows how to get all anonymous requests to your S3 buckets from the server access log.

**Example — Show all anonymous requesters that are making requests to a bucket in a certain period**

```
SELECT Bucket, Requester, RemoteIP, Key, HTTPStatus, ErrorCode, RequestDateTime
FROM s3_access_logs_db.amzn-s3-demo-bucket_logs
WHERE Requester IS NULL AND
parse_datetime(RequestDateTime,'dd/MMM/yyyy:HH:mm:ss Z') 
BETWEEN parse_datetime('2019-07-01:00:42:42','yyyy-MM-dd:HH:mm:ss')
AND 
parse_datetime('2019-07-02:00:42:42','yyyy-MM-dd:HH:mm:ss')
```

**Note**  
You can modify the date range to suit your needs.
These query examples might also be useful for security monitoring. You can review the results for `PutObject` or `GetObject` calls from unexpected or unauthorized IP addresses/requesters and for identifying any anonymous requests to your buckets.
This query only retrieves information from the time at which logging was enabled.