

# Transferring to or from on-premises storage with AWS DataSync
<a name="transferring-on-premises-storage"></a>

With AWS DataSync, you can transfer files and objects between a number of on-premises or self-managed storage systems and the following AWS storage services:
+ [Amazon S3](create-s3-location.md)
+ [Amazon EFS](create-efs-location.md)
+ [Amazon FSx for Windows File Server](create-fsx-location.md)
+ [Amazon FSx for Lustre](create-lustre-location.md)
+ [Amazon FSx for OpenZFS](create-openzfs-location.md)
+ [Amazon FSx for NetApp ONTAP](create-ontap-location.md)

**Topics**
+ [Configuring AWS DataSync transfers with an NFS file server](create-nfs-location.md)
+ [Configuring AWS DataSync transfers with an SMB file server](create-smb-location.md)
+ [Configuring AWS DataSync transfers with an HDFS cluster](create-hdfs-location.md)
+ [Configuring DataSync transfers with an object storage system](create-object-location.md)

# Configuring AWS DataSync transfers with an NFS file server
<a name="create-nfs-location"></a>

With AWS DataSync, you can transfer data between your Network File System (NFS) file server and the following AWS storage services. Supported storage services depend on your task mode, as shown below:


| Basic mode | Enhanced mode | 
| --- | --- | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/datasync/latest/userguide/create-nfs-location.html)  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/datasync/latest/userguide/create-nfs-location.html)  | 

To set up this kind of transfer, you create a [location](how-datasync-transfer-works.md#sync-locations) for your NFS file server. You can use this location as a transfer source or destination.

## Providing DataSync access to NFS file servers
<a name="accessing-nfs"></a>

For DataSync to access your NFS file server, you need a DataSync [agent](how-datasync-transfer-works.md#sync-agents). The agent mounts an export on your file server by using the NFS protocol. Be sure to use the agent that corresponds to your desired task mode.

**Topics**
+ [Configuring your NFS export](#accessing-nfs-configuring-export)
+ [Supported NFS versions](#supported-nfs-versions)

### Configuring your NFS export
<a name="accessing-nfs-configuring-export"></a>

The export that DataSync needs for your transfer depends on whether your NFS file server is a source or destination location and on how your file server's permissions are configured.

If your file server is a source location, DataSync only needs to read and traverse your files and folders. If it's a destination location, DataSync needs root access to write to the location and set ownership, permissions, and other metadata on the files and folders that you're copying. You can use the `no_root_squash` option to allow root access for your export.

The following examples describe how to configure an NFS export that provides access to DataSync.

**When your NFS file server is a source location (root access)**  
Configure your export by using the following command, which provides DataSync read-only permissions (`ro`) and root access (`no_root_squash`):

```
export-path datasync-agent-ip-address(ro,no_root_squash)
```

**When your NFS file server is a destination location**  
Configure your export by using the following command, which provides DataSync write permissions (`rw`) and root access (`no_root_squash`):

```
export-path datasync-agent-ip-address(rw,no_root_squash)
```

**When your NFS file server is a source location (no root access)**  
Configure your export by using the following command, which squashes all requests to a POSIX user ID (UID) and group ID (GID) that provide DataSync read-only permissions on the export:

```
export-path datasync-agent-ip-address(ro,all_squash,anonuid=uid,anongid=gid)
```
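On a Linux NFS server, export entries like these typically live in `/etc/exports`. The following sketch combines the patterns above; the export paths and agent IP address (`10.0.1.25`) are hypothetical, so replace them with your own:

```
# /etc/exports -- hypothetical paths and DataSync agent IP address
/exports/source-data   10.0.1.25(ro,no_root_squash)
/exports/dest-data     10.0.1.25(rw,no_root_squash)
```

After editing the file, apply the changes with `sudo exportfs -ra` and confirm the active exports with `exportfs -v`.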

### Supported NFS versions
<a name="supported-nfs-versions"></a>

By default, DataSync uses NFS version 4.1. DataSync also supports NFS 4.0 and 3.x.

## Configuring your network for NFS transfers
<a name="configure-network-nfs-location"></a>

For your DataSync transfer, you must configure traffic for a few network connections: 

1. Allow traffic on the following ports from your DataSync agent to your NFS file server:
   + **For NFS version 4.1 and 4.0** – TCP port 2049
   + **For NFS version 3.x** – TCP ports 111 and 2049

   Other NFS clients in your network should be able to mount the NFS export that you're using to transfer data. The export must also be accessible without Kerberos authentication.

1. Configure traffic for your [service endpoint connection](datasync-network.md) (such as a VPC, public, or FIPS endpoint).

1. Allow traffic from the DataSync service to the [AWS storage service](datasync-network.md#storage-service-network-requirements) you're transferring to or from.
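One way to spot-check the first of these connections is to test the NFS ports from a machine with the same network configuration as your agent, assuming the `nc` (netcat) utility is available. The server address below is a placeholder:

```
# Replace nfs-server-address with your NFS file server's DNS name or IP address
nc -zv nfs-server-address 2049   # required for all NFS versions
nc -zv nfs-server-address 111    # additionally required for NFS 3.x
```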

## Creating your NFS transfer location
<a name="create-nfs-location-how-to"></a>

Before you begin, note the following:
+ You need an NFS file server that you want to transfer data to or from.
+ You need a DataSync agent that can [access your file server](#accessing-nfs).
+ DataSync doesn't support copying NFS version 4 access control lists (ACLs).

### Using the DataSync console
<a name="create-nfs-location-console"></a>

1. Open the AWS DataSync console at [https://console.aws.amazon.com/datasync/](https://console.aws.amazon.com/datasync/).

1. In the left navigation pane, expand **Data transfer**, then choose **Locations** and **Create location**.

1. For **Location type**, choose **Network File System (NFS)**.

1. For **Agents**, choose the DataSync agent that can connect to your NFS file server.

   You can choose more than one agent. For more information, see [Using multiple DataSync agents](do-i-need-datasync-agent.md#multiple-agents).

1. For **NFS server**, enter the Domain Name System (DNS) name or IP address of the NFS file server that your DataSync agent connects to.

1. For **Mount path**, enter the NFS export path that you want DataSync to mount.

   This path (or a subdirectory of the path) is where DataSync transfers data to or from. For more information, see [Configuring your NFS export](#accessing-nfs-configuring-export).

1. (Optional) Expand **Additional settings** and choose a specific **NFS version** for DataSync to use when accessing your file server.

   For more information, see [Supported NFS versions](#supported-nfs-versions).

1. (Optional) Choose **Add tag** to tag your NFS location.

   *Tags* are key-value pairs that help you manage, filter, and search for your locations. We recommend creating at least a name tag for your location. 

1. Choose **Create location**.

### Using the AWS CLI
<a name="create-location-nfs-cli"></a>
+ Use the following command to create an NFS location.

  ```
  aws datasync create-location-nfs \
      --server-hostname nfs-server-address \
      --on-prem-config AgentArns=datasync-agent-arns \
      --subdirectory nfs-export-path
  ```

  For more information on creating the location, see [Providing DataSync access to NFS file servers](#accessing-nfs).

  DataSync automatically chooses the NFS version that it uses to read from an NFS location. To specify an NFS version, use the optional `Version` parameter in the [NfsMountOptions](API_NfsMountOptions.md) API operation.

This command returns the Amazon Resource Name (ARN) of the NFS location, similar to the following example.

```
{
    "LocationArn": "arn:aws:datasync:us-east-1:111222333444:location/loc-0f01451b140b2af49"
}
```
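For example, to pin the transfer to NFS version 4.1, you might add the `--mount-options` option to the same command (placeholder values carried over from above):

```
aws datasync create-location-nfs \
    --server-hostname nfs-server-address \
    --on-prem-config AgentArns=datasync-agent-arns \
    --subdirectory nfs-export-path \
    --mount-options Version=NFS4_1
```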

To make sure that the directory can be mounted, you can connect to any computer that has the same network configuration as your agent and run the following command. 

```
mount -t nfs -o nfsvers=<nfs-server-version> <nfs-server-address>:<nfs-export-path> <test-folder>
```

The following is an example of the command.

```
mount -t nfs -o nfsvers=3 198.51.100.123:/path_for_sync_to_read_from /temp_folder_to_test_mount_on_local_machine
```

# Configuring AWS DataSync transfers with an SMB file server
<a name="create-smb-location"></a>

With AWS DataSync, you can transfer data between your Server Message Block (SMB) file server and the following AWS storage services. Supported storage services depend on your task mode, as shown below:


| Basic mode | Enhanced mode | 
| --- | --- | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/datasync/latest/userguide/create-smb-location.html)  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/datasync/latest/userguide/create-smb-location.html)  | 

To set up this kind of transfer, you create a [location](how-datasync-transfer-works.md#sync-locations) for your SMB file server. You can use this as a transfer source or destination. Be sure to use the agent that corresponds to your desired task mode.

## Providing DataSync access to SMB file servers
<a name="configuring-smb"></a>

DataSync connects to your file server using the SMB protocol and can authenticate with NTLM or Kerberos.

**Topics**
+ [Supported SMB versions](#configuring-smb-version)
+ [Using NTLM authentication](#configuring-smb-ntlm-authentication)
+ [Using Kerberos authentication](#configuring-smb-kerberos-authentication)
+ [Required permissions](#configuring-smb-permissions)
+ [DFS Namespaces](#configuring-smb-location-dfs)

### Supported SMB versions
<a name="configuring-smb-version"></a>

By default, DataSync automatically chooses a version of the SMB protocol based on negotiation with your SMB file server.

You also can configure DataSync to use a specific SMB version, but we recommend doing this only if DataSync has trouble negotiating with the SMB file server automatically. DataSync supports SMB versions 1.0 and later. For security reasons, we recommend using SMB version 3.0.2 or later. Earlier versions, such as SMB 1.0, contain known security vulnerabilities that attackers can exploit to compromise your data.

See the following table for a list of options in the DataSync console and API:


| Console option | API option | Description | 
| --- | --- | --- | 
| Automatic |  `AUTOMATIC`  |  DataSync and the SMB file server negotiate the highest version of SMB that they mutually support between 2.1 and 3.1.1. This is the default and recommended option. If you instead choose a specific version that your file server doesn't support, you may get an `Operation Not Supported` error.  | 
|  SMB 3.0.2  |  `SMB3`  |  Restricts the protocol negotiation to only SMB version 3.0.2.  | 
| SMB 2.1 |  `SMB2`  | Restricts the protocol negotiation to only SMB version 2.1. | 
| SMB 2.0 | `SMB2_0` | Restricts the protocol negotiation to only SMB version 2.0. | 
| SMB 1.0 | `SMB1` | Restricts the protocol negotiation to only SMB version 1.0. | 

### Using NTLM authentication
<a name="configuring-smb-ntlm-authentication"></a>

To use NTLM authentication, you provide a user name and password that allows DataSync to access the SMB file server that you're transferring to or from. The user can be a local user on your file server or a domain user in your Microsoft Active Directory.

### Using Kerberos authentication
<a name="configuring-smb-kerberos-authentication"></a>

To use Kerberos authentication, you provide a Kerberos principal, Kerberos key table (keytab) file, and Kerberos configuration file that allows DataSync to access the SMB file server that you're transferring to or from.

**Topics**
+ [Prerequisites](#configuring-smb-kerberos-prerequisites)
+ [DataSync configuration options for Kerberos](#configuring-smb-kerberos-options)

#### Prerequisites
<a name="configuring-smb-kerberos-prerequisites"></a>

You need to create a few Kerberos artifacts and configure your network so that DataSync can access your SMB file server.
+ Create a Kerberos keytab file by using the [ktpass](https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/ktpass) or [ktutil](https://web.mit.edu/kerberos/krb5-1.12/doc/admin/admin_commands/ktutil.html) utility.

  The following example creates a keytab file by using `ktpass`. The Kerberos realm that you specify (`MYDOMAIN.ORG`) must be upper case.

  ```
  ktpass /out C:\YOUR_KEYTAB.keytab /princ HOST/kerberosuser@MYDOMAIN.ORG /mapuser kerberosuser /pass * /crypto AES256-SHA1 /ptype KRB5_NT_PRINCIPAL
  ```
+ Prepare a simplified version of the Kerberos configuration file (`krb5.conf`). Include information about the realm, the location of the domain admin servers, and mappings of hostnames onto a Kerberos realm.

  Verify that the `krb5.conf` content is formatted with the correct mixed casing for the realms and domain realm names. For example:

  ```
  [libdefaults] 
    dns_lookup_realm = true 
    dns_lookup_kdc = true 
    forwardable = true 
    default_realm = MYDOMAIN.ORG
  
  [realms] 
    MYDOMAIN.ORG = { 
      kdc = mydomain.org 
      admin_server = mydomain.org 
    }
  
  [domain_realm] 
    .mydomain.org = MYDOMAIN.ORG 
    mydomain.org = MYDOMAIN.ORG
  ```
+ In your network configuration, make sure that your Kerberos Key Distribution Center (KDC) server port is open. The KDC port is typically TCP port 88.
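Before creating the location, you can sanity-check these artifacts from a machine that has the Kerberos client tools installed. The keytab path and KDC address below are placeholders:

```
# List the principals stored in the keytab; the principal name and
# realm casing must match what you plan to give DataSync
klist -k -t /path/to/YOUR_KEYTAB.keytab

# Confirm that the KDC port is reachable (requires netcat)
nc -zv kdc-server-address 88
```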

#### DataSync configuration options for Kerberos
<a name="configuring-smb-kerberos-options"></a>

When creating an SMB location that uses Kerberos, you configure the following options.


| Console option | API option | Description | 
| --- | --- | --- | 
|  **SMB server**  |  `ServerHostName`  |  The domain name of the SMB file server that your DataSync agent will mount. For Kerberos, you can't specify the file server's IP address.  | 
|  **Kerberos principal**  |  `KerberosPrincipal`  |  An identity in your Kerberos realm that has permission to access the files, folders, and file metadata in your SMB file server. A Kerberos principal might look like `HOST/kerberosuser@MYDOMAIN.ORG`. Principal names are case sensitive.  | 
|  **Keytab file**  |  `KerberosKeytab`   |  A Kerberos key table (keytab) file, which includes mappings between your Kerberos principal and encryption keys.  | 
|  **Kerberos configuration file**  |  `KerberosKrbConf`  |  A `krb5.conf` file that defines your Kerberos realm configuration.  | 
|  **DNS IP addresses** (optional)  |  `DnsIpAddresses`  |  The IPv4 addresses for the DNS servers that your SMB file server belongs to. If you have multiple domains in your environment, configuring this makes sure that DataSync connects to the right SMB file server.  | 

### Required permissions
<a name="configuring-smb-permissions"></a>

The identity that you provide DataSync must have permission to mount and access your SMB file server's files, folders, and file metadata.

If you provide an identity in your Active Directory, it must be a member of an Active Directory group with one or both of the following user rights (depending on the [metadata that you want DataSync to copy](configure-metadata.md)):


| User right | Description | 
| --- | --- | 
|  **Restore files and directories** (`SE_RESTORE_NAME`)  |  Allows DataSync to copy object ownership, permissions, file metadata, and NTFS discretionary access control lists (DACLs). This user right is usually granted to members of the **Domain Admins** and **Backup Operators** groups (both of which are default Active Directory groups).  | 
|  **Manage auditing and security log** (`SE_SECURITY_NAME`)  |  Allows DataSync to copy NTFS system access control lists (SACLs). This user right is usually granted to members of the **Domain Admins** group.   | 

If you want to copy Windows ACLs and are transferring between an SMB file server and another storage system that uses SMB (such as Amazon FSx for Windows File Server or FSx for ONTAP), the two storage systems must belong to the same Active Directory domain, or their domains must have an Active Directory trust relationship.

### DFS Namespaces
<a name="configuring-smb-location-dfs"></a>

DataSync doesn't support Microsoft Distributed File System (DFS) Namespaces. We recommend specifying an underlying file server or share instead when creating your DataSync location.

## Creating your SMB transfer location
<a name="create-smb-location-how-to"></a>

Before you begin, you need an SMB file server that you want to transfer data to or from.

### Using the DataSync console
<a name="create-smb-location-how-to-console"></a>

1. Open the AWS DataSync console at [https://console.aws.amazon.com/datasync/](https://console.aws.amazon.com/datasync/).

1. In the left navigation pane, expand **Data transfer**, then choose **Locations** and **Create location**.

1. For **Location type**, choose **Server Message Block (SMB)**.

   You configure this location as a source or destination later.

1. For **Agents**, choose the DataSync agent that can connect to your SMB file server.

   You can choose more than one agent. For more information, see [Using multiple DataSync agents](do-i-need-datasync-agent.md#multiple-agents).

1. For **SMB server**, enter the domain name or IP address of the SMB file server that your DataSync agent will mount.

   Remember the following with this setting:
   + You can't specify an IP version 6 (IPv6) address.
   + If you're using Kerberos authentication, you must specify a domain name.

1. For **Share name**, enter the name of the share exported by your SMB file server where DataSync will read or write data.

   You can include a subdirectory in the share path (for example, `/path/to/subdirectory`). Make sure that other SMB clients in your network can also mount this path. 

   To copy all the data in the subdirectory, DataSync must be able to mount the SMB share and access all of its data. For more information, see [Required permissions](#configuring-smb-permissions).

1. (Optional) Expand **Additional settings** and choose an **SMB Version** for DataSync to use when accessing your file server.

   By default, DataSync automatically chooses a version based on negotiation with the SMB file server. For information, see [Supported SMB versions](#configuring-smb-version).

1. For **Authentication type**, choose **NTLM** or **Kerberos**.

1. Do one of the following depending on your authentication type:

------
#### [ NTLM ]
   + For **User**, enter a user name that can mount your SMB file server and has permission to access the files and folders involved in your transfer.

     For more information, see [Required permissions](#configuring-smb-permissions).
   + For **Password**, enter the password of the user who can mount your SMB file server and has permission to access the files and folders involved in your transfer.
   + (Optional) For **Domain**, enter the Windows domain name that your SMB file server belongs to.

     If you have multiple domains in your environment, configuring this setting makes sure that DataSync connects to the right SMB file server.

------
#### [ Kerberos ]
   + For **Kerberos principal**, specify a principal in your Kerberos realm that has permission to access the files, folders, and file metadata in your SMB file server.

     A Kerberos principal might look like `HOST/kerberosuser@MYDOMAIN.ORG`.

     Principal names are case sensitive. Your DataSync task execution will fail if the principal that you specify for this setting doesn’t exactly match the principal that you use to create the keytab file.
   + For **Keytab file**, upload a keytab file that includes mappings between your Kerberos principal and encryption keys.
   + For **Kerberos configuration file**, upload a `krb5.conf` file that defines your Kerberos realm configuration.
   + (Optional) For **DNS IP addresses**, specify up to two IPv4 addresses for the DNS servers that your SMB file server belongs to. 

     If you have multiple domains in your environment, configuring this parameter makes sure that DataSync connects to the right SMB file server.

------

1. (Optional) Choose **Add tag** to tag your SMB location.

   *Tags* are key-value pairs that help you manage, filter, and search for your locations. We recommend creating at least a name tag for your location. 

1. Choose **Create location**.

### Using the AWS CLI
<a name="create-location-smb-cli"></a>

The following instructions describe how to create SMB locations with NTLM or Kerberos authentication.

------
#### [ NTLM ]

1. Copy the following `create-location-smb` command.

   ```
   aws datasync create-location-smb \
       --agent-arns datasync-agent-arns \
       --server-hostname smb-server-address \
       --subdirectory smb-export-path \
       --authentication-type "NTLM" \
       --user user-who-can-mount-share \
       --password user-password \
       --domain windows-domain-of-smb-server
   ```

1. For `--agent-arns`, specify the DataSync agent that can connect to your SMB file server.

   You can choose more than one agent. For more information, see [Using multiple DataSync agents](do-i-need-datasync-agent.md#multiple-agents).

1. For `--server-hostname`, specify the domain name or IPv4 address of the SMB file server that your DataSync agent will mount. 

1. For `--subdirectory`, specify the name of the share exported by your SMB file server where DataSync will read or write data.

   You can include a subdirectory in the share path (for example, `/path/to/subdirectory`). Make sure that other SMB clients in your network can also mount this path. 

   To copy all the data in the subdirectory, DataSync must be able to mount the SMB share and access all of its data. For more information, see [Required permissions](#configuring-smb-permissions).

1. For `--user`, specify a user name that can mount your SMB file server and has permission to access the files and folders involved in your transfer.

   For more information, see [Required permissions](#configuring-smb-permissions).

1. For `--password`, specify the password of the user who can mount your SMB file server and has permission to access the files and folders involved in your transfer.

1. (Optional) For `--domain`, specify the Windows domain name that your SMB file server belongs to.

   If you have multiple domains in your environment, configuring this setting makes sure that DataSync connects to the right SMB file server.

1. (Optional) Add the `--version` option if you want DataSync to use a specific SMB version. For more information, see [Supported SMB versions](#configuring-smb-version).

1. Run the `create-location-smb` command.

   If the command is successful, you get a response that shows you the ARN of the location that you created. For example:

   ```
   {
       "LocationArn": "arn:aws:datasync:us-east-1:123456789012:location/loc-01234567890example"
   }
   ```

------
#### [ Kerberos ]

1. Copy the following `create-location-smb` command.

   ```
   aws datasync create-location-smb \
       --agent-arns datasync-agent-arns \
       --server-hostname smb-server-address \
       --subdirectory smb-export-path \
       --authentication-type "KERBEROS" \
       --kerberos-principal "HOST/kerberosuser@EXAMPLE.COM" \
       --kerberos-keytab "fileb://path/to/file.keytab" \
       --kerberos-krb5-conf "file://path/to/krb5.conf" \
       --dns-ip-addresses array-of-ipv4-addresses
   ```

1. For `--agent-arns`, specify the DataSync agent that can connect to your SMB file server.

   You can choose more than one agent. For more information, see [Using multiple DataSync agents](do-i-need-datasync-agent.md#multiple-agents).

1. For `--server-hostname`, specify the domain name of the SMB file server that your DataSync agent will mount. 

1. For `--subdirectory`, specify the name of the share exported by your SMB file server where DataSync will read or write data.

   You can include a subdirectory in the share path (for example, `/path/to/subdirectory`). Make sure that other SMB clients in your network can also mount this path. 

   To copy all the data in the subdirectory, DataSync must be able to mount the SMB share and access all of its data. For more information, see [Required permissions](#configuring-smb-permissions).

1. For the Kerberos options, do the following:
   + `--kerberos-principal`: Specify a principal in your Kerberos realm that has permission to access the files, folders, and file metadata in your SMB file server.

     A Kerberos principal might look like `HOST/kerberosuser@MYDOMAIN.ORG`.

     Principal names are case sensitive. Your DataSync task execution will fail if the principal that you specify for this option doesn’t exactly match the principal that you use to create the keytab file.
   + `--kerberos-keytab`: Specify a keytab file that includes mappings between your Kerberos principal and encryption keys.
   + `--kerberos-krb5-conf`: Specify a `krb5.conf` file that defines your Kerberos realm configuration.
   + (Optional) `--dns-ip-addresses`: Specify up to two IPv4 addresses for the DNS servers that your SMB file server belongs to. 

     If you have multiple domains in your environment, configuring this parameter makes sure that DataSync connects to the right SMB file server.

1. (Optional) Add the `--version` option if you want DataSync to use a specific SMB version. For more information, see [Supported SMB versions](#configuring-smb-version).

1. Run the `create-location-smb` command.

   If the command is successful, you get a response that shows you the ARN of the location that you created. For example:

   ```
   {
       "LocationArn": "arn:aws:datasync:us-east-1:123456789012:location/loc-01234567890example"
   }
   ```

------

# Configuring AWS DataSync transfers with an HDFS cluster
<a name="create-hdfs-location"></a>

With AWS DataSync, you can transfer data between your Hadoop Distributed File System (HDFS) cluster and one of the following AWS storage services using Basic mode tasks:
+ [Amazon S3](create-s3-location.md)
+ [Amazon EFS](create-efs-location.md)
+ [Amazon FSx for Windows File Server](create-fsx-location.md)
+ [Amazon FSx for Lustre](create-lustre-location.md)
+ [Amazon FSx for OpenZFS](create-openzfs-location.md)
+ [Amazon FSx for NetApp ONTAP](create-ontap-location.md)

To set up this kind of transfer, you create a [location](how-datasync-transfer-works.md#sync-locations) for your HDFS cluster. You can use this location as a transfer source or destination.

## Providing DataSync access to HDFS clusters
<a name="accessing-hdfs"></a>

To connect to your HDFS cluster, DataSync uses a Basic mode [agent that you deploy](deploy-agents.md) as close as possible to your HDFS cluster. The DataSync agent acts as an HDFS client and communicates with the NameNodes and DataNodes in your cluster.

When you start a transfer task, DataSync queries the NameNode for locations of files and folders on the cluster. If you configure your HDFS location as a source location, DataSync reads files and folder data from the DataNodes in your cluster and copies that data to the destination. If you configure your HDFS location as a destination location, then DataSync writes files and folders from the source to the DataNodes in your cluster.

### Authentication
<a name="accessing-hdfs-authentication"></a>

When connecting to an HDFS cluster, DataSync supports simple authentication or Kerberos authentication. To use simple authentication, provide the user name of a user with rights to read and write to the HDFS cluster. To use Kerberos authentication, provide a Kerberos configuration file, a Kerberos key table (keytab) file, and a Kerberos principal name. The credentials of the Kerberos principal must be in the provided keytab file.

### Encryption
<a name="accessing-hdfs-encryption"></a>

When using Kerberos authentication, DataSync supports encryption of data as it's transmitted between the DataSync agent and your HDFS cluster. Encrypt your data by using the Quality of Protection (QOP) configuration settings on your HDFS cluster and by specifying the QOP settings when creating your HDFS location. The QOP configuration includes settings for data transfer protection and Remote Procedure Call (RPC) protection. 
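In the AWS CLI, these QOP settings correspond to the `--qop-configuration` option of `create-location-hdfs`. The following is a sketch with placeholder values for everything else:

```
aws datasync create-location-hdfs \
    --name-nodes '[{"Hostname": "namenode-address", "Port": 8020}]' \
    --authentication-type "KERBEROS" \
    --kerberos-principal "HOST/kerberosuser@MYDOMAIN.ORG" \
    --kerberos-keytab "fileb://path/to/file.keytab" \
    --kerberos-krb5-conf "file://path/to/krb5.conf" \
    --qop-configuration RpcProtection=PRIVACY,DataTransferProtection=PRIVACY \
    --agent-arns datasync-agent-arns \
    --subdirectory "/path/to/my/data"
```

Valid values for both QOP settings include `DISABLED`, `AUTHENTICATION`, `INTEGRITY`, and `PRIVACY`; they must match your cluster's QOP configuration.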

**DataSync supports the following Kerberos encryption types:**
+ `des-cbc-crc`
+ `des-cbc-md4`
+ `des-cbc-md5`
+ `des3-cbc-sha1`
+ `arcfour-hmac`
+ `arcfour-hmac-exp`
+ `aes128-cts-hmac-sha1-96`
+ `aes256-cts-hmac-sha1-96`
+ `aes128-cts-hmac-sha256-128`
+ `aes256-cts-hmac-sha384-192`
+ `camellia128-cts-cmac`
+ `camellia256-cts-cmac`

You can also configure HDFS clusters for encryption at rest using Transparent Data Encryption (TDE). When using simple authentication, DataSync reads and writes to TDE-enabled clusters. If you're using DataSync to copy data to a TDE-enabled cluster, first configure the encryption zones on the HDFS cluster. DataSync doesn't create encryption zones. 
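The encryption zone setup happens on the Hadoop side with the standard `hadoop key` and `hdfs crypto` commands. In this sketch, the key name and zone path are hypothetical:

```
# Run as an HDFS administrator before starting the DataSync transfer
hadoop key create my-tde-key
hdfs dfs -mkdir /data/encrypted-zone
hdfs crypto -createZone -keyName my-tde-key -path /data/encrypted-zone
hdfs crypto -listZones
```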

## Unsupported HDFS features
<a name="hdfs-unsupported-features"></a>

The following HDFS capabilities aren't currently supported by DataSync:
+ Transparent Data Encryption (TDE) when using Kerberos authentication
+ Configuring multiple NameNodes
+ Hadoop HDFS over HTTP (HttpFS)
+ POSIX access control lists (ACLs)
+ HDFS extended attributes (xattrs)
+ HDFS clusters using Apache HBase

## Creating your HDFS transfer location
<a name="create-hdfs-location-how-to"></a>

You can use your location as a source or destination for your DataSync transfer.

**Before you begin**: Verify network connectivity between your agent and Hadoop cluster by doing the following:
+ Test access to the TCP ports listed in [Network requirements for on-premises, self-managed, and other cloud storage](datasync-network.md#on-premises-network-requirements).
+ Test access between your local agent and your Hadoop cluster. For instructions, see [Verifying your agent's connection to your storage system](test-agent-connections.md#self-managed-storage-connectivity).

### Using the DataSync console
<a name="create-hdfs-location-how-to-console"></a>

1. Open the AWS DataSync console at [https://console.aws.amazon.com/datasync/](https://console.aws.amazon.com/datasync/).

1. In the left navigation pane, expand **Data transfer**, then choose **Locations** and **Create location**.

1. For **Location type**, choose **Hadoop Distributed File System (HDFS)**.

   You can configure this location as a source or destination later. 

1. For **Agents**, choose the agent that can connect to your HDFS cluster.

   You can choose more than one agent. For more information, see [Using multiple DataSync agents](do-i-need-datasync-agent.md#multiple-agents).

1. For **NameNode**, provide the domain name or IP address of your HDFS cluster's primary NameNode.

1. For **Folder**, enter a folder on your HDFS cluster that you want DataSync to use for the data transfer.

   If your HDFS location is a source, DataSync copies the files in this folder to the destination. If your location is a destination, DataSync writes files to this folder.

1. To set the **Block size** or **Replication factor**, choose **Additional settings**.

   The default block size is 128 MiB. The block size that you provide must be a multiple of 512 bytes.

   The default replication factor is three DataNodes when transferring to the HDFS cluster. 

1. In the **Security** section, choose the **Authentication type** used on your HDFS cluster. 
   + **Simple** – For **User**, specify the user name with the following permissions on the HDFS cluster (depending on your use case):
     + If you plan to use this location as a source location, specify a user that only has read permissions.
     + If you plan to use this location as a destination location, specify a user that has read and write permissions.

     Optionally, specify the URI of the Key Management Server (KMS) of your HDFS cluster. 
   + **Kerberos** – Specify the Kerberos **Principal** with access to your HDFS cluster. Next, provide the **KeyTab file** that contains the credentials for that Kerberos principal. Then, provide the **Kerberos configuration file**. Finally, specify the encryption-in-transit protection in the **RPC protection** and **Data transfer protection** dropdown lists.

1. (Optional) Choose **Add tag** to tag your HDFS location.

   *Tags* are key-value pairs that help you manage, filter, and search for your locations. We recommend creating at least a name tag for your location. 

1. Choose **Create location**.

### Using the AWS CLI
<a name="create-location-hdfs-cli"></a>

1. Copy the following `create-location-hdfs` command.

   ```
   aws datasync create-location-hdfs \
       --name-nodes '[{"Hostname":"host1","Port":8020}]' \
       --authentication-type "SIMPLE|KERBEROS" \
       --agent-arns "arn:aws:datasync:us-east-1:123456789012:agent/agent-01234567890example" \
       --subdirectory "/path/to/my/data"
   ```

1. For the `--name-nodes` parameter, specify the hostname or IP address of your HDFS cluster's primary NameNode and the TCP port that the NameNode is listening on.

1. For the `--authentication-type` parameter, specify the type of authentication to use when connecting to the Hadoop cluster. You can specify `SIMPLE` or `KERBEROS`.

   If you use `SIMPLE` authentication, use the `--simple-user` parameter to specify the name of the user. If you use `KERBEROS` authentication, use the `--kerberos-principal`, `--kerberos-keytab`, and `--kerberos-krb5-conf` parameters. For more information, see [create-location-hdfs](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/datasync/create-location-hdfs.html).
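   For example, a Kerberos-authenticated location might be created with a command like the following sketch. The NameNode hostname, principal, and file paths are placeholders for your own values:

   ```shell
   aws datasync create-location-hdfs \
       --name-nodes '[{"Hostname":"namenode.example.com","Port":8020}]' \
       --authentication-type "KERBEROS" \
       --kerberos-principal "user@EXAMPLE.COM" \
       --kerberos-keytab file:///path/to/hdfs-user.keytab \
       --kerberos-krb5-conf file:///etc/krb5.conf \
       --agent-arns "arn:aws:datasync:us-east-1:123456789012:agent/agent-01234567890example" \
       --subdirectory "/path/to/my/data"
   ```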

1. For the `--agent-arns` parameter, specify the ARN of the DataSync agent that can connect to your HDFS cluster.

   You can choose more than one agent. For more information, see [Using multiple DataSync agents](do-i-need-datasync-agent.md#multiple-agents).

1. (Optional) For the `--subdirectory` parameter, specify a folder on your HDFS cluster that you want DataSync to use for the data transfer.

   If your HDFS location is a source, DataSync copies the files in this folder to the destination. If your location is a destination, DataSync writes files to this folder.

1. Run the `create-location-hdfs` command.

   If the command is successful, you get a response that shows you the ARN of the location that you created. For example:

   ```
   {
        "LocationArn": "arn:aws:datasync:us-east-1:123456789012:location/loc-01234567890example"
   }
   ```
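   To confirm the location's settings afterward, you can pass that ARN to `describe-location-hdfs`:

   ```shell
   aws datasync describe-location-hdfs \
       --location-arn "arn:aws:datasync:us-east-1:123456789012:location/loc-01234567890example"
   ```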

# Configuring DataSync transfers with an object storage system
<a name="create-object-location"></a>

With AWS DataSync, you can transfer data between your object storage system and one of the following AWS storage services using Basic mode tasks:
+ [Amazon S3](create-s3-location.md)
+ [Amazon EFS](create-efs-location.md)
+ [Amazon FSx for Windows File Server](create-fsx-location.md)
+ [Amazon FSx for Lustre](create-lustre-location.md)
+ [Amazon FSx for OpenZFS](create-openzfs-location.md)
+ [Amazon FSx for NetApp ONTAP](create-ontap-location.md)

To set up this kind of transfer, you create a [location](how-datasync-transfer-works.md#sync-locations) for your object storage system. You can use this location as a transfer source or destination. Transferring data to or from your on-premises object storage requires a Basic mode DataSync agent.

## Prerequisites
<a name="create-object-location-prerequisites"></a>

Your object storage system must be compatible with the following [Amazon S3 API operations](https://docs.aws.amazon.com/AmazonS3/latest/API/API_Operations.html) for DataSync to connect to it:
+ `AbortMultipartUpload`
+ `CompleteMultipartUpload`
+ `CopyObject`
+ `CreateMultipartUpload`
+ `DeleteObject`
+ `DeleteObjects`
+ `DeleteObjectTagging`
+ `GetBucketLocation`
+ `GetObject`
+ `GetObjectTagging`
+ `HeadBucket`
+ `HeadObject`
+ `ListObjectsV2`
+ `PutObject`
+ `PutObjectTagging`
+ `UploadPart`
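One quick way to check basic compatibility is to point the AWS CLI's `s3api` commands at your system's endpoint and exercise a few of these operations. The following sketch assumes a hypothetical endpoint and bucket, with the system's credentials configured under an AWS CLI profile named `onprem`:

```shell
# Can we reach the bucket? (HeadBucket)
aws s3api head-bucket --bucket your-bucket \
    --endpoint-url https://object-storage-server.example.com --profile onprem

# Can we list objects? (ListObjectsV2)
aws s3api list-objects-v2 --bucket your-bucket --max-keys 5 \
    --endpoint-url https://object-storage-server.example.com --profile onprem
```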

## Creating your object storage transfer location
<a name="create-object-location-how-to"></a>

Before you begin, you need an object storage system that you plan to transfer data to or from.

### Using the DataSync console
<a name="create-object-location-how-to-console"></a>

1. Open the AWS DataSync console at [https://console.aws.amazon.com/datasync/](https://console.aws.amazon.com/datasync/).

1. In the left navigation pane, expand **Data transfer**, then choose **Locations** and **Create location**.

1. For **Location type**, choose **Object storage**.

   You configure this location as a source or destination later.

1. For **Server**, provide the domain name or IP address of the object storage server. 

1. For **Bucket name**, enter the name of the object storage bucket involved in the transfer.

1. For **Folder**, enter an object prefix.

   DataSync only copies objects with this prefix. 

1. If your transfer requires an agent, choose **Use agents**, then choose the DataSync agent that connects to your object storage system.

   Some transfers don't require agents. In other scenarios, you might want to use more than one agent. For more information, see [Situations when you don't need a DataSync agent](do-i-need-datasync-agent.md#when-agent-not-required) and [Using multiple DataSync agents](do-i-need-datasync-agent.md#multiple-agents).

1. To configure the connection to the object storage server, expand **Additional settings** and do the following:

   1. For **Server protocol**, choose **HTTP** or **HTTPS**.

   1. For **Server port**, use a default port (**80** for HTTP or **443** for HTTPS) or specify a custom port if needed.

   1. For **Certificate**, if your object storage system uses a private or self-signed certificate authority (CA), select **Choose file** and specify a single `.pem` file with a full certificate chain.

      The certificate chain might include:
      + The object storage system's certificate
      + All intermediate certificates (if there are any)
      + The root certificate of the signing CA

      You can concatenate your certificates into a `.pem` file (which can be up to 32768 bytes before base64 encoding). The following example `cat` command creates an `object_storage_certificates.pem` file that includes three certificates:

      ```
      cat object_server_certificate.pem intermediate_certificate.pem ca_root_certificate.pem > object_storage_certificates.pem
      ```
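      If you want to sanity-check a chain file before uploading it, `openssl` can both build and verify one. The following self-contained sketch creates a throwaway root CA and server certificate (all file and host names here are hypothetical), concatenates them in server-first order, and verifies the result:

      ```shell
      # Create a throwaway root CA
      openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
          -subj "/CN=Example Root CA" \
          -keyout ca_root_key.pem -out ca_root_certificate.pem

      # Create a server key and certificate signing request
      openssl req -newkey rsa:2048 -nodes \
          -subj "/CN=object-storage-server.example.com" \
          -keyout object_server_key.pem -out object_server.csr

      # Sign the server certificate with the throwaway CA
      openssl x509 -req -in object_server.csr -days 1 \
          -CA ca_root_certificate.pem -CAkey ca_root_key.pem \
          -CAcreateserial -out object_server_certificate.pem

      # Concatenate in server-first, root-last order
      cat object_server_certificate.pem ca_root_certificate.pem > object_storage_certificates.pem

      # Verify that the server certificate validates against the CA
      openssl verify -CAfile ca_root_certificate.pem object_server_certificate.pem
      ```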

1. If the object storage server requires credentials for access, select **Requires credentials** and enter the **Access key** you use to access the bucket. Then either enter the **Secret key** directly, or specify an AWS Secrets Manager secret that contains the key. For more information, see [Providing credentials for storage locations](https://docs.aws.amazon.com/datasync/latest/userguide/location-credentials.html).

   The access key and secret key can be a user name and password, respectively.
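   If you'd rather not enter the secret key directly, you can store it in Secrets Manager first and reference the resulting secret when you create the location. A sketch, where the secret name and value are placeholders:

   ```shell
   # The command's response includes the secret's ARN
   aws secretsmanager create-secret \
       --name datasync-object-storage-secret-key \
       --secret-string "your-secret-key"
   ```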

1. (Optional) Choose **Add tag** to tag your object storage location.

   *Tags* are key-value pairs that help you manage, filter, and search for your locations. We recommend creating at least a name tag for your location. 

1. Choose **Create location**.

### Using the AWS CLI
<a name="create-location-object-cli"></a>

1. Copy the following `create-location-object-storage` command:

   ```
   aws datasync create-location-object-storage \
       --server-hostname object-storage-server.example.com \
       --bucket-name your-bucket \
    --agent-arns "arn:aws:datasync:us-east-1:123456789012:agent/agent-01234567890example"
   ```

1. Specify the following required parameters in the command:
   + `--server-hostname` – Specify the domain name or IP address of your object storage server.
   + `--bucket-name` – Specify the name of the bucket on your object storage server that you're transferring to or from.

1. (Optional) Add any of the following parameters to the command:
   + `--agent-arns` – Specifies the DataSync agents that you want to connect to your object storage server.
   + `--server-port` – Specifies the port that your object storage server accepts inbound network traffic on (for example, port `443`).
   + `--server-protocol` – Specifies the protocol (`HTTP` or `HTTPS`) that your object storage server uses to communicate.
   + `--access-key` – Specifies the access key (for example, a user name) if credentials are required to authenticate with the object storage server.
   + `--secret-key` – Specifies the secret key (for example, a password) if credentials are required to authenticate with the object storage server.

     You can also provide additional parameters for securing your keys using AWS Secrets Manager. For more information, see [Providing credentials for storage locations](https://docs.aws.amazon.com/datasync/latest/userguide/location-credentials.html).
   + `--server-certificate` – Specifies a certificate chain for DataSync to authenticate with your object storage system if the system uses a private or self-signed certificate authority (CA). You must specify a single `.pem` file with a full certificate chain (for example, `file:///home/user/.ssh/object_storage_certificates.pem`).

     The certificate chain might include:
     + The object storage system's certificate
     + All intermediate certificates (if there are any)
     + The root certificate of the signing CA

     You can concatenate your certificates into a `.pem` file (which can be up to 32768 bytes before base64 encoding). The following example `cat` command creates an `object_storage_certificates.pem` file that includes three certificates:

     ```
     cat object_server_certificate.pem intermediate_certificate.pem ca_root_certificate.pem > object_storage_certificates.pem
     ```
   + `--subdirectory` – Specifies the object prefix for your object storage server.

     DataSync only copies objects with this prefix. 
   + `--tags` – Specifies the key-value pair that represents a tag that you want to add to the location resource.

     Tags can help you manage, filter, and search for your resources. We recommend creating a name tag for your location.

1. Run the `create-location-object-storage` command.

   If the command is successful, you get a response that shows you the ARN of the location that you created. For example:

   ```
   {
       "LocationArn": "arn:aws:datasync:us-east-1:123456789012:location/loc-01234567890abcdef"
   }
   ```
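   You can confirm the location's configuration by passing that ARN to `describe-location-object-storage`:

   ```shell
   aws datasync describe-location-object-storage \
       --location-arn "arn:aws:datasync:us-east-1:123456789012:location/loc-01234567890abcdef"
   ```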