# Data repository tasks
<a name="data-repository-tasks"></a>

By using import and export data repository tasks, you can manage the transfer of data and metadata between your FSx for Lustre file system and any of its durable data repositories on Amazon S3.

*Data repository tasks* optimize data and metadata transfers between your FSx for Lustre file system and a data repository on S3. One way that they do this is by tracking changes between your Amazon FSx file system and its linked data repository. They also do this by using parallel transfer techniques to transfer data at speeds up to hundreds of GBps. You create and view data repository tasks using the Amazon FSx console, the AWS CLI, and the Amazon FSx API. 

Data repository tasks maintain the file system's Portable Operating System Interface (POSIX) metadata, including ownership, permissions, and timestamps. Because the tasks maintain this metadata, you can implement and maintain access controls between your FSx for Lustre file system and its linked data repositories.

You can use a release data repository task to free up file system space for new files by releasing files exported to Amazon S3. The released file's content is removed, but the metadata of the released file remains on the file system. Users and applications can still access a released file by reading the file again. When the user or application reads the released file, FSx for Lustre transparently retrieves the file content from Amazon S3.

## Types of data repository tasks
<a name="data-repo-task-types"></a>

There are three types of data repository tasks:
+ **Export** data repository tasks export from your Lustre file system to a linked S3 bucket.
+ **Import** data repository tasks import from a linked S3 bucket to your Lustre file system.
+ **Release** data repository tasks release files exported to a linked S3 bucket from your Lustre file system.

For more information, see [Creating a data repository task](creating-data-repo-task.md).

**Topics**
+ [Types of data repository tasks](#data-repo-task-types)
+ [Understanding a task's status and details](data-repo-task-status.md)
+ [Using data repository tasks](managing-data-repo-task.md)
+ [Working with task completion reports](task-completion-report.md)
+ [Troubleshooting data repository task failures](failed-tasks.md)

# Understanding a task's status and details
<a name="data-repo-task-status"></a>

A data repository task has descriptive information and a lifecycle status.

After a task is created, you can view the following detailed information for a data repository task using the Amazon FSx console, CLI, or API:
+ The task type: 
  + `EXPORT_TO_REPOSITORY` indicates an export task.
  + `IMPORT_METADATA_FROM_REPOSITORY` indicates an import task.
  + `RELEASE_DATA_FROM_FILESYSTEM` indicates a release task.
+ The file system that the task ran on.
+ The task creation time.
+ The task status.
+ The total number of files that the task processed.
+ The total number of files that the task successfully processed.
+ The total number of files that the task failed to process. This value is greater than zero when the task status is FAILED. Detailed information about files that failed is available in a task completion report. For more information, see [Working with task completion reports](task-completion-report.md).
+ The time that the task started.
+ The time that the task status was last updated. Task status is updated every 30 seconds.

A data repository task can have one of the following statuses:
+ **PENDING** indicates that Amazon FSx has not started the task.
+ **EXECUTING** indicates that Amazon FSx is processing the task.
+ **FAILED** indicates that Amazon FSx didn't successfully process the task. For example, there might be files that the task failed to process. The task details provide more information about the failure. For more information about failed tasks, see [Troubleshooting data repository task failures](failed-tasks.md).
+ **SUCCEEDED** indicates that Amazon FSx completed the task successfully.
+ **CANCELED** indicates that the task was canceled and not completed.
+ **CANCELING** indicates that Amazon FSx is in the process of canceling the task.
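
The statuses above fall into two groups: those that can still change and those that are final. The following Python sketch shows one way to poll a task until it reaches a final status. It assumes a `client` object that behaves like the `boto3` FSx client (`boto3.client("fsx")`). The helper names `is_terminal` and `wait_for_task`, the 30-second default interval (chosen to match the status update interval described above), and the `max_polls` limit are illustrative choices, not part of Amazon FSx.

```python
import time

# Lifecycle values taken from the status list above.
ACTIVE_STATES = {"PENDING", "EXECUTING", "CANCELING"}
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELED"}


def is_terminal(lifecycle: str) -> bool:
    """Return True when the task will make no further progress."""
    return lifecycle in TERMINAL_STATES


def wait_for_task(client, task_id: str, poll_seconds: float = 30.0,
                  max_polls: int = 120, sleep=time.sleep) -> dict:
    """Poll until the task reaches a terminal lifecycle, then return it.

    `client` is assumed to behave like boto3.client("fsx").
    """
    for _ in range(max_polls):
        response = client.describe_data_repository_tasks(TaskIds=[task_id])
        task = response["DataRepositoryTasks"][0]
        if is_terminal(task["Lifecycle"]):
            return task
        sleep(poll_seconds)
    raise TimeoutError(f"{task_id} did not finish after {max_polls} polls")
```

With `boto3` installed and credentials configured, `wait_for_task(boto3.client("fsx"), "task-0123456789abcdef0")` would return the final task description. The `sleep` parameter exists only so that the delay can be replaced in tests.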

Data repository task information is kept for 14 days after the task finishes. For more information about accessing existing data repository tasks, see [Accessing data repository tasks](view-data-repo-tasks.md).

# Using data repository tasks
<a name="managing-data-repo-task"></a>

In the following sections, you can find detailed information about managing data repository tasks. You can create, duplicate, view details, and cancel data repository tasks using the Amazon FSx console, CLI, or API.

**Topics**
+ [Creating a data repository task](creating-data-repo-task.md)
+ [Duplicating a task](recreate-task.md)
+ [Accessing data repository tasks](view-data-repo-tasks.md)
+ [Canceling a data repository task](cancel-data-repo-task.md)

# Creating a data repository task
<a name="creating-data-repo-task"></a>

You can create a data repository task by using the Amazon FSx console, CLI, or API. After you create a task, you can view the task's progress and status by using the console, CLI, or API.

You can create three types of data repository tasks:
+ The **Export** data repository task exports from your Lustre file system to a linked S3 bucket. For more information, see [Using data repository tasks to export changes](export-data-repo-task-dra.md).
+ The **Import** data repository task imports from a linked S3 bucket to your Lustre file system. For more information, see [Using data repository tasks to import changes](import-data-repo-task-dra.md).
+ The **Release** data repository task releases files from your Lustre file system that have been exported to a linked S3 bucket. For more information, see [Using data repository tasks to release files](release-files-task.md).
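
As a rough illustration of how the three task types map onto an API request, the following Python sketch builds the keyword arguments for a `CreateDataRepositoryTask` call. The function name `build_task_request` and the short `kind` labels are hypothetical. The report format and scope values are copied from the sample responses elsewhere in this guide, and the sketch omits optional request parameters.

```python
# Task type identifiers as they appear in the task details.
TASK_TYPES = {
    "export": "EXPORT_TO_REPOSITORY",
    "import": "IMPORT_METADATA_FROM_REPOSITORY",
    "release": "RELEASE_DATA_FROM_FILESYSTEM",
}


def build_task_request(file_system_id, kind, paths=(), report_path=None):
    """Build keyword arguments for a CreateDataRepositoryTask call.

    `kind` is a hypothetical shorthand: "export", "import", or "release".
    A completion report is enabled only when `report_path` is given.
    """
    if kind not in TASK_TYPES:
        raise ValueError(f"unknown task kind: {kind!r}")
    if report_path is None:
        report = {"Enabled": False}
    else:
        report = {
            "Enabled": True,
            "Path": report_path,
            "Format": "REPORT_CSV_20191124",
            "Scope": "FAILED_FILES_ONLY",
        }
    return {
        "FileSystemId": file_system_id,
        "Type": TASK_TYPES[kind],
        "Paths": list(paths),
        "Report": report,
    }
```

Assuming `boto3` is available, the result could be passed on as `boto3.client("fsx").create_data_repository_task(**request)`. Keeping the request construction separate from the call makes it easy to inspect or log the request before anything is created.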

# Duplicating a task
<a name="recreate-task"></a>

You can duplicate an existing data repository task in the Amazon FSx console. When you duplicate a task, an exact copy of the existing task is displayed in the **Create import data repository task** or **Create export data repository task** page. You can make changes to the paths to export or import, as needed, before creating and running the new task.

**Note**  
A request to run a duplicate task will fail if an exact copy of that task is already running. An exact copy of a task that is already running contains the same file system path or paths in the case of an export task or the same data repository paths in the case of an import task.

You can duplicate a task from the task details view, the **Data Repository Tasks** pane in the **Data Repository** tab for the file system, or from the **Data repository tasks** page.

**To duplicate an existing task**

1. Choose a task on the **Data Repository Tasks** pane in the **Data Repository** tab for the file system.

1. Choose **Duplicate task**. Depending on which type of task you chose, the **Create import data repository task** or **Create export data repository task** page appears. All settings for the new task are identical to those for the task that you're duplicating.

1. Change or add the paths that you want to import from or export to.

1. Choose **Create**.

# Accessing data repository tasks
<a name="view-data-repo-tasks"></a>

After you create a data repository task, you can access the task, and all existing tasks in your account, using the Amazon FSx console, CLI, and API. You can view detailed task information for the following:
+ All existing tasks.
+ All tasks for a specific file system.
+ All tasks for a specific data repository association.
+ All tasks with a specific lifecycle status. For more information about task lifecycle status values, see [Understanding a task's status and details](data-repo-task-status.md).

You can access all existing data repository tasks in your account by using the Amazon FSx console, CLI, or API, as described following.

## To view data repository tasks and task details (console)
<a name="access-all-tasks-console"></a>

1. Open the Amazon FSx console at [https://console.aws.amazon.com/fsx/](https://console.aws.amazon.com/fsx/).

1. On the navigation pane, choose the file system that you want to view data repository tasks for. The file system details page appears.

1. On the file system details page, choose the **Data repository** tab. Any tasks for this file system appear on the **Data repository tasks** panel.

1. To see a task's details, choose **Task ID** or **Task name** in the **Data repository tasks** panel. The task detail page appears.  
![\[Data repository tasks page\]](http://docs.aws.amazon.com/fsx/latest/LustreGuide/images/task-details-rprt.png)

## To retrieve data repository tasks and task details (CLI)
<a name="task-details-cli"></a>

Using the Amazon FSx [describe-data-repository-tasks](https://docs.aws.amazon.com/cli/latest/reference/fsx/describe-data-repository-tasks.html) CLI command, you can view all the data repository tasks in your account, along with their details. [DescribeDataRepositoryTasks](https://docs.aws.amazon.com/fsx/latest/APIReference/API_DescribeDataRepositoryTasks.html) is the equivalent API operation.
+ Use the following command to view all data repository task objects in your account.

  ```
  aws fsx describe-data-repository-tasks
  ```

  If the command is successful, Amazon FSx returns the response in JSON format.

  ```
  {
      "DataRepositoryTasks": [
          {
              "Lifecycle": "EXECUTING",
              "Paths": [],
              "Report": {
                  "Path":"s3://dataset-01/reports",
                  "Format":"REPORT_CSV_20191124",
                  "Enabled":true,
                  "Scope":"FAILED_FILES_ONLY"
              },
              "StartTime": 1571863862.288,
              "Type": "EXPORT_TO_REPOSITORY",
              "Tags": [],
              "TaskId": "task-0123456789abcdef3",
              "Status": {
                  "SucceededCount": 4200,
                  "TotalCount": 4255,
                  "FailedCount": 55,
                  "LastUpdatedTime": 1571863875.289
              },
              "FileSystemId": "fs-0123456789a7",
              "CreationTime": 1571863850.075,
              "ResourceARN": "arn:aws:fsx:us-east-1:1234567890:task/task-0123456789abcdef3"
          },
          {
              "Lifecycle": "FAILED",
              "Paths": [],
              "Report": {
                  "Enabled": false
              },
              "StartTime": 1571863862.288,
              "EndTime": 1571863905.292,
              "Type": "EXPORT_TO_REPOSITORY",
              "Tags": [],
              "TaskId": "task-0123456789abcdef1",
              "Status": {
                  "SucceededCount": 1153,
                  "TotalCount": 1156,
                  "FailedCount": 3,
                  "LastUpdatedTime": 1571863875.289
              },
              "FileSystemId": "fs-0123456789abcdef0",
              "CreationTime": 1571863850.075,
              "ResourceARN": "arn:aws:fsx:us-east-1:1234567890:task/task-0123456789abcdef1"
          },
          {
              "Lifecycle": "SUCCEEDED",
              "Paths": [],
              "Report": {
                  "Path":"s3://dataset-04/reports",
                  "Format":"REPORT_CSV_20191124",
                  "Enabled":true,
                  "Scope":"FAILED_FILES_ONLY"
              },
              "StartTime": 1571863862.288,
              "EndTime": 1571863905.292,
              "Type": "EXPORT_TO_REPOSITORY",
              "Tags": [],
              "TaskId": "task-0123456789abcdef0",
              "Status": {
                  "SucceededCount": 258,
                  "TotalCount": 258,
                  "FailedCount": 0,
                  "LastUpdatedTime": 1771848950.012
              },
              "FileSystemId": "fs-0123456789abcdef0",
              "CreationTime": 1771848950.012,
              "ResourceARN": "arn:aws:fsx:us-east-1:1234567890:task/task-0123456789abcdef0"
          }
      ]
  }
  ```
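
When the response contains many tasks, a short script can reduce it to the fields that usually matter first. The following Python sketch is one possible approach: it parses response text shaped like the output above and lists each task's ID, lifecycle, and file counts. The function name `summarize_tasks` and the trimmed `sample` text are illustrative only.

```python
import json


def summarize_tasks(response_text: str) -> list:
    """Return (task ID, lifecycle, failed count, total count) per task."""
    response = json.loads(response_text)
    rows = []
    for task in response["DataRepositoryTasks"]:
        status = task.get("Status", {})
        rows.append((
            task["TaskId"],
            task["Lifecycle"],
            status.get("FailedCount", 0),
            status.get("TotalCount", 0),
        ))
    return rows


# A trimmed sample in the same shape as the CLI response.
sample = """
{"DataRepositoryTasks": [
  {"TaskId": "task-0123456789abcdef1", "Lifecycle": "FAILED",
   "Status": {"SucceededCount": 1153, "TotalCount": 1156, "FailedCount": 3}},
  {"TaskId": "task-0123456789abcdef0", "Lifecycle": "SUCCEEDED",
   "Status": {"SucceededCount": 258, "TotalCount": 258, "FailedCount": 0}}
]}
"""

for task_id, lifecycle, failed, total in summarize_tasks(sample):
    print(f"{task_id}: {lifecycle}, {failed} of {total} files failed")
```

In practice, you might save the CLI output to a file and pass the file's contents to `summarize_tasks` instead of the inline sample.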

## Viewing tasks by file system
<a name="view-tasks-by-fs"></a>

You can view all tasks for a specific file system using the Amazon FSx console, CLI, or API, as described following.

### To view tasks by file system (console)
<a name="tasks-by-fs-console"></a>

1. Choose **File systems** on the navigation pane. The **File systems** page appears.

1. Choose the file system that you want to view data repository tasks for. The file system details page appears.

1. On the file system details page, choose the **Data repository** tab. Any tasks for this file system appear on the **Data repository tasks** panel.

### To retrieve tasks by file system (CLI)
<a name="task-by-fs-cli"></a>
+ Use the following command to view all data repository tasks for file system `fs-0123456789abcdef0`.

  ```
  aws fsx describe-data-repository-tasks \
      --filters Name=file-system-id,Values=fs-0123456789abcdef0
  ```

  If the command is successful, Amazon FSx returns the response in JSON format.

  ```
  {
      "DataRepositoryTasks": [
          {
              "Lifecycle": "FAILED",
              "Paths": [],
              "Report": {
                  "Path":"s3://dataset-04/reports",
                  "Format":"REPORT_CSV_20191124",
                  "Enabled":true,
                  "Scope":"FAILED_FILES_ONLY"
              },
              "StartTime": 1571863862.288,
              "EndTime": 1571863905.292,
              "Type": "EXPORT_TO_REPOSITORY",
              "Tags": [],
              "TaskId": "task-0123456789abcdef1",
              "Status": {
                  "SucceededCount": 1153,
                  "TotalCount": 1156,
                  "FailedCount": 3,
                  "LastUpdatedTime": 1571863875.289
              },
              "FileSystemId": "fs-0123456789abcdef0",
              "CreationTime": 1571863850.075,
              "ResourceARN": "arn:aws:fsx:us-east-1:1234567890:task/task-0123456789abcdef1"
          },
          {
              "Lifecycle": "SUCCEEDED",
              "Paths": [],
              "Report": {
                  "Enabled": false
              },
              "StartTime": 1571863862.288,
              "EndTime": 1571863905.292,
              "Type": "EXPORT_TO_REPOSITORY",
              "Tags": [],
              "TaskId": "task-0123456789abcdef0",
              "Status": {
                  "SucceededCount": 258,
                  "TotalCount": 258,
                  "FailedCount": 0,
                  "LastUpdatedTime": 1771848950.012
              },
              "FileSystemId": "fs-0123456789abcdef0",
              "CreationTime": 1771848950.012,
              "ResourceARN": "arn:aws:fsx:us-east-1:1234567890:task/task-0123456789abcdef0"
          }
      ]
  }
  ```
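
The same filtering can be done programmatically. The following Python sketch builds the filter list and follows `NextToken` across result pages. It assumes a `client` that behaves like `boto3.client("fsx")`. The `file-system-id` filter name comes from the CLI command above, while `task-lifecycle` is taken from the API reference and should be verified there. The helper names `build_filters` and `list_tasks` are hypothetical.

```python
def build_filters(file_system_id=None, lifecycles=()):
    """Build the Filters list for a DescribeDataRepositoryTasks request."""
    filters = []
    if file_system_id:
        filters.append({"Name": "file-system-id", "Values": [file_system_id]})
    if lifecycles:
        filters.append({"Name": "task-lifecycle", "Values": list(lifecycles)})
    return filters


def list_tasks(client, file_system_id=None, lifecycles=()):
    """Collect every matching task, following NextToken across pages.

    `client` is assumed to behave like boto3.client("fsx").
    """
    kwargs = {}
    filters = build_filters(file_system_id, lifecycles)
    if filters:
        kwargs["Filters"] = filters
    tasks = []
    while True:
        page = client.describe_data_repository_tasks(**kwargs)
        tasks.extend(page.get("DataRepositoryTasks", []))
        token = page.get("NextToken")
        if not token:
            return tasks
        kwargs["NextToken"] = token
```

For example, `list_tasks(client, "fs-0123456789abcdef0", ["FAILED"])` would return only the failed tasks for that file system.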

# Canceling a data repository task
<a name="cancel-data-repo-task"></a>

You can cancel a data repository task while it's in either the PENDING or EXECUTING state. When you cancel a task, the following occurs:
+ Amazon FSx doesn't process any files that are in the queue to be processed.
+ Amazon FSx continues processing any files that are currently in process.
+ Amazon FSx doesn't revert any files that the task already processed.

## To cancel a data repository task (console)
<a name="w2aac13c33c17c13b7b1"></a>

1. Open the Amazon FSx console at [https://console.aws.amazon.com/fsx/](https://console.aws.amazon.com/fsx/).

1. Choose the file system for which you want to cancel a data repository task.

1. Open the **Data Repository** tab and scroll down to view the **Data Repository Tasks** panel.

1. Choose **Task ID** or **Task name** for the task that you want to cancel.

1. Choose **Cancel task** to cancel the task.

1. Enter the task ID to confirm the cancellation request.

## To cancel a data repository task (CLI)
<a name="w2aac13c33c17c13b7b3"></a>

Use the Amazon FSx [cancel-data-repository-task](https://docs.aws.amazon.com/cli/latest/reference/fsx/cancel-data-repository-task.html) CLI command to cancel a task. [CancelDataRepositoryTask](https://docs.aws.amazon.com/fsx/latest/APIReference/API_CancelDataRepositoryTask.html) is the equivalent API operation.
+ Use the following command to cancel a data repository task.

  ```
  aws fsx cancel-data-repository-task \
      --task-id task-0123456789abcdef0
  ```

  If the command is successful, Amazon FSx returns the response in JSON format.

  ```
  {
      "Lifecycle": "CANCELING",
      "TaskId": "task-0123456789abcdef0"
  }
  ```
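
Because only tasks in the PENDING or EXECUTING state can be canceled, a script may want to check the lifecycle before it sends the request. The following Python sketch shows one way to do that, assuming a `client` that behaves like `boto3.client("fsx")`. The helper name `cancel_if_active` is hypothetical, and the check is not atomic: the task can still finish between the two calls.

```python
# States in which a task can be canceled, as described in this topic.
CANCELABLE_STATES = {"PENDING", "EXECUTING"}


def cancel_if_active(client, task_id: str):
    """Cancel the task only if it is still PENDING or EXECUTING.

    Returns the cancel response, or None when the task has already
    finished or is already being canceled.
    """
    response = client.describe_data_repository_tasks(TaskIds=[task_id])
    task = response["DataRepositoryTasks"][0]
    if task["Lifecycle"] not in CANCELABLE_STATES:
        return None
    return client.cancel_data_repository_task(TaskId=task_id)
```

Returning `None` rather than raising an error keeps the helper safe to call repeatedly, for example from a cleanup script.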

# Working with task completion reports
<a name="task-completion-report"></a>

A *task completion report* provides details about the results of an export, import, or release data repository task. The report includes results for the files processed by the task that match the scope of the report. You can specify whether to generate a report for a task by using the `Enabled` parameter. 

Amazon FSx delivers the report to the file system's linked data repository in Amazon S3, using the path that you specify when you enable the report for a task. The report's file name is `report.csv` for import tasks and `failures.csv` for export or release tasks.

The report format is a comma-separated value (CSV) file that has three fields: `FilePath`, `FileStatus`, and `ErrorCode`.

Reports are encoded using RFC-4180-format encoding as follows:
+ Paths starting with any of the following characters are contained in single quotation marks: `@ + - =` 
+ Strings that contain at least one of the following characters are contained in double quotation marks: `" ,`
+ All double quotation marks are escaped with an additional double quotation mark.

Following are a few examples of the report encoding:
+ `@filename.txt` becomes `"""@filename.txt"""`
+ `+filename.txt` becomes `"""+filename.txt"""`
+ `file,name.txt` becomes `"file,name.txt"`
+ `file"name.txt` becomes `"file""name.txt"`

For more information about RFC-4180 encoding, see [RFC-4180 - Common Format and MIME Type for Comma-Separated Values (CSV) Files](https://tools.ietf.org/html/rfc4180) on the IETF website.
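
The four examples above are consistent with a two-step encoding, sketched here in Python. This is an interpretation inferred from those examples, not a confirmed description of how Amazon FSx encodes reports. In particular, the first step wraps the path in double quotation marks because that is what reproduces the examples. The function name `encode_report_path` is hypothetical.

```python
def encode_report_path(path: str) -> str:
    """Encode a file path the way the report examples suggest."""
    # Step 1 (assumption inferred from the examples): wrap paths that
    # start with @, +, -, or = in double quotation marks.
    if path[:1] in ("@", "+", "-", "="):
        path = f'"{path}"'
    # Step 2: RFC 4180 quoting. A field that contains a double quotation
    # mark or a comma is enclosed in double quotation marks, and each
    # embedded double quotation mark is doubled.
    if '"' in path or "," in path:
        path = '"' + path.replace('"', '""') + '"'
    return path


for name in ("@filename.txt", "+filename.txt", "file,name.txt", 'file"name.txt'):
    print(name, "becomes", encode_report_path(name))
```

When reading a report, a standard CSV parser reverses the second step. Any quotation marks added by the first step remain in the parsed value and would need to be stripped separately.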

The following is an example of the information provided in a task completion report that includes only failed files.

```
myRestrictedFile,failed,S3AccessDenied
dir1/myLargeFile,failed,FileSizeTooLarge
dir2/anotherLargeFile,failed,FileSizeTooLarge
```
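
A report in this shape can be read with any RFC 4180 parser. The following Python sketch uses the standard `csv` module to load the rows and count failures by error code. It assumes that the report has no header row, as in the example above, and that the three fields appear in the order `FilePath`, `FileStatus`, `ErrorCode`. The function names are illustrative.

```python
import csv
import io
from collections import Counter


def parse_report(text: str) -> list:
    """Parse report text into dictionaries keyed by the three field names."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if len(record) != 3:
            continue  # skip blank or malformed lines
        file_path, file_status, error_code = record
        rows.append({
            "FilePath": file_path,
            "FileStatus": file_status,
            "ErrorCode": error_code,
        })
    return rows


def count_by_error(rows: list) -> Counter:
    """Count report rows per error code."""
    return Counter(row["ErrorCode"] for row in rows)


report_text = """myRestrictedFile,failed,S3AccessDenied
dir1/myLargeFile,failed,FileSizeTooLarge
dir2/anotherLargeFile,failed,FileSizeTooLarge
"""

for code, count in count_by_error(parse_report(report_text)).most_common():
    print(code, count)
```

Grouping by error code first shows at a glance whether the failures share one cause, such as a permissions problem, or are spread across several.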

For more information about task failures and how to resolve them, see [Troubleshooting data repository task failures](failed-tasks.md).

# Troubleshooting data repository task failures
<a name="failed-tasks"></a>

You can [turn on logging](cw-event-logging.md) to CloudWatch Logs to log information about any failures experienced while importing or exporting files using data repository tasks. For information about CloudWatch Logs event logs, see [Data repository event logs](data-repo-event-logs.md).

When a data repository task fails, you can find the number of files that Amazon FSx failed to process in **Files failed to export** on the console's **Task status** page. Or you can use the CLI or API and view the task's `Status: FailedCount` property. For information about accessing this information, see [Accessing data repository tasks](view-data-repo-tasks.md). 

For data repository tasks, Amazon FSx also optionally provides information about the specific files and directories that failed in a completion report. The task completion report contains the file or directory path on the Lustre file system that failed, its status, and the failure reason. For more information, see [Working with task completion reports](task-completion-report.md).

A data repository task can fail for several reasons, including those listed following.


| Error Code | Explanation | 
| --- | --- | 
|  `FileSizeTooLarge`  |  The maximum object size supported by Amazon S3 is 5 TiB.  | 
|  `InternalError`  |  An error occurred within the Amazon FSx file system for an import, export, or release task. Generally, this error code means that the Amazon FSx file system that the failed task ran on is in a FAILED lifecycle state. When this occurs, the affected files might not be recoverable due to data loss. Otherwise, you can use hierarchical storage management (HSM) commands to export the files and directories to the data repository on S3. For more information, see [Exporting files using HSM commands](exporting-files-hsm.md).  | 
|  `OperationNotPermitted`  | Amazon FSx was unable to release the file because it has not been exported to a linked S3 bucket. You must use automatic export or export data repository tasks to ensure that your files are first exported to your linked Amazon S3 bucket.  | 
|  `PathSizeTooLong`  |  The export path is too long. The maximum object key length supported by S3 is 1,024 characters.  | 
|  `ResourceBusy`  | Amazon FSx was unable to export or release the file because it was being accessed by another client on the file system. You can retry the DataRepositoryTask after your workflow has finished writing to the file.  | 
|  `S3AccessDenied`  |  Access was denied to Amazon S3 for a data repository export or import task. For export tasks, the Amazon FSx file system must have permission to perform the `S3:PutObject` operation to export to a linked data repository on S3. This permission is granted in the `AWSServiceRoleForFSxS3Access_fs-0123456789abcdef0` service-linked role. For more information, see [Using service-linked roles for Amazon FSx](using-service-linked-roles.md). For export tasks, because the export task requires data to flow outside a file system's VPC, this error can occur if the target repository has a bucket policy that contains one of the `aws:SourceVpc` or `aws:SourceVpce` IAM global condition keys. For import tasks, the Amazon FSx file system must have permission to perform the `S3:HeadObject` and `S3:GetObject` operations to import from a linked data repository on S3. For import tasks, if your S3 bucket uses server-side encryption with customer managed keys stored in AWS Key Management Service (SSE-KMS), you must follow the policy configurations in [Working with server-side encrypted Amazon S3 buckets](s3-server-side-encryption-support.md). If your S3 bucket contains objects uploaded by an AWS account other than the one that owns the bucket linked to your file system, you can ensure that your data repository tasks can modify S3 metadata or overwrite S3 objects regardless of which account uploaded them. We recommend that you enable the S3 Object Ownership feature for your S3 bucket. This feature enables you to take ownership of new objects that other AWS accounts upload to your bucket, by forcing uploads to provide the `--acl bucket-owner-full-control` canned ACL. You enable S3 Object Ownership by choosing the **Bucket owner preferred** option in your S3 bucket. For more information, see [Controlling ownership of uploaded objects using S3 Object Ownership](https://docs.aws.amazon.com/AmazonS3/latest/userguide/about-object-ownership.html) in the *Amazon S3 User Guide*.  | 
|  `S3Error`  |  Amazon FSx encountered an S3-related error that wasn't `S3AccessDenied`.  | 
|  `S3FileDeleted`  | Amazon FSx was unable to export a hard link file because the source file doesn't exist in the data repository. | 
|  `S3ObjectInUnsupportedTier`  | Amazon FSx successfully imported a non-symlink object from an S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage class. The `FileStatus` will be `succeeded with warning` in the task completion report. The warning indicates that to retrieve the data, you must restore the S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive object first and then use an `hsm_restore` command to import the object.  | 
|  `S3ObjectNotFound`  | Amazon FSx was unable to import or export the file because it doesn't exist in the data repository. | 
|  `S3ObjectPathNotPosixCompliant`  |  The Amazon S3 object exists but can't be imported because it isn't a POSIX-compliant object. For information about supported POSIX metadata, see [POSIX metadata support for data repositories](posix-metadata-support.md).  | 
|  `S3ObjectUpdateInProgressFromFileRename`  | Amazon FSx was unable to release the file because automatic export is processing a rename of the file. The automatic export rename process must finish before the file can be released.  | 
|  `S3SymlinkInUnsupportedTier`  | Amazon FSx was unable to import a symlink object because it's in an Amazon S3 storage class that is not supported, such as an S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage class. The `FileStatus` will be `failed` in the task completion report. | 
|  `SourceObjectDeletedBeforeReleasing`  | Amazon FSx was unable to release the file from the file system because the file was deleted from the data repository before it could be released. |
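
Two of the errors in the table, `FileSizeTooLarge` and `PathSizeTooLong`, depend only on limits stated there, so a script can screen for them before an export task runs. The following Python sketch is a rough pre-check under stated assumptions: it measures the key length in characters, as the table does, and it assumes that the S3 object key is the export prefix followed by the file's relative path. The function name `predict_export_errors` and the `export_prefix` parameter are hypothetical.

```python
MAX_OBJECT_BYTES = 5 * 1024**4  # 5 TiB, per the FileSizeTooLarge row
MAX_KEY_CHARS = 1024            # per the PathSizeTooLong row


def predict_export_errors(relative_path: str, size_bytes: int,
                          export_prefix: str = "") -> list:
    """Return the error codes from the table that this file would likely hit.

    Assumption: the object key is `export_prefix` followed by `relative_path`.
    """
    errors = []
    if size_bytes > MAX_OBJECT_BYTES:
        errors.append("FileSizeTooLarge")
    if len(export_prefix + relative_path) > MAX_KEY_CHARS:
        errors.append("PathSizeTooLong")
    return errors
```

An empty list means only that these two limits are satisfied. The other errors in the table depend on the state of the file system or the S3 bucket and can't be predicted this way.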