You are viewing documentation for version 2 of the AWS SDK for Ruby. Version 3 documentation can be found here.
Class: Aws::DatabaseMigrationService::Types::S3Settings
- Inherits:
- Struct
  - Object
  - Struct
  - Aws::DatabaseMigrationService::Types::S3Settings
- Defined in:
- (unknown)
Overview
When passing S3Settings as input to an Aws::Client method, you can use a vanilla Hash:
{
  service_access_role_arn: "String",
  external_table_definition: "String",
  csv_row_delimiter: "String",
  csv_delimiter: "String",
  bucket_folder: "String",
  bucket_name: "String",
  compression_type: "none", # accepts none, gzip
  encryption_mode: "sse-s3", # accepts sse-s3, sse-kms
  server_side_encryption_kms_key_id: "String",
  data_format: "csv", # accepts csv, parquet
  encoding_type: "plain", # accepts plain, plain-dictionary, rle-dictionary
  dict_page_size_limit: 1,
  row_group_length: 1,
  data_page_size: 1,
  parquet_version: "parquet-1-0", # accepts parquet-1-0, parquet-2-0
  enable_statistics: false,
  include_op_for_full_load: false,
  cdc_inserts_only: false,
  timestamp_column_name: "String",
  parquet_timestamp_in_millisecond: false,
  cdc_inserts_and_updates: false,
  date_partition_enabled: false,
  date_partition_sequence: "YYYYMMDD", # accepts YYYYMMDD, YYYYMMDDHH, YYYYMM, MMYYYYDD, DDMMYYYY
  date_partition_delimiter: "SLASH", # accepts SLASH, UNDERSCORE, DASH, NONE
}
Settings for exporting data to Amazon S3.
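The comments in the hash template list the accepted values for each enumerated setting. As a minimal sketch, a settings hash could be checked against those lists before it is passed to a client method. The helper name invalid_s3_settings and the constant ACCEPTED_VALUES are hypothetical and are not part of the SDK; the bucket name is a placeholder.

```ruby
# Accepted values copied from the comments in the hash template.
ACCEPTED_VALUES = {
  compression_type: %w[none gzip],
  encryption_mode: %w[sse-s3 sse-kms],
  data_format: %w[csv parquet],
  encoding_type: %w[plain plain-dictionary rle-dictionary],
  parquet_version: %w[parquet-1-0 parquet-2-0],
  date_partition_sequence: %w[YYYYMMDD YYYYMMDDHH YYYYMM MMYYYYDD DDMMYYYY],
  date_partition_delimiter: %w[SLASH UNDERSCORE DASH NONE]
}.freeze

# Hypothetical helper (not part of the SDK): returns the keys whose values
# are not in the accepted list. Keys without an enumerated list are ignored.
def invalid_s3_settings(settings)
  rejected = settings.select do |key, value|
    ACCEPTED_VALUES.key?(key) && !ACCEPTED_VALUES[key].include?(value)
  end
  rejected.keys
end

settings = {
  bucket_name: "example-bucket", # placeholder
  data_format: "parquet",
  compression_type: "zip"        # not an accepted value
}

puts invalid_s3_settings(settings).inspect
```

The SDK performs its own parameter validation; a check like this only surfaces a bad value earlier, with a message you control.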
Instance Attribute Summary
- #bucket_folder ⇒ String
    An optional parameter to set a folder name in the S3 bucket.
- #bucket_name ⇒ String
    The name of the S3 bucket.
- #cdc_inserts_and_updates ⇒ Boolean
    A value that enables a change data capture (CDC) load to write INSERT and UPDATE operations to .csv or .parquet (columnar storage) output files.
- #cdc_inserts_only ⇒ Boolean
    A value that enables a change data capture (CDC) load to write only INSERT operations to .csv or columnar storage (.parquet) output files.
- #compression_type ⇒ String
    An optional parameter to use GZIP to compress the target files.
- #csv_delimiter ⇒ String
    The delimiter used to separate columns in the .csv file for both source and target.
- #csv_row_delimiter ⇒ String
    The delimiter used to separate rows in the .csv file for both source and target.
- #data_format ⇒ String
    The format of the data that you want to use for output.
- #data_page_size ⇒ Integer
    The size of one data page in bytes.
- #date_partition_delimiter ⇒ String
    Specifies a date separating delimiter to use during folder partitioning.
- #date_partition_enabled ⇒ Boolean
    When set to true, this parameter partitions S3 bucket folders based on transaction commit dates.
- #date_partition_sequence ⇒ String
    Identifies the sequence of the date format to use during folder partitioning.
- #dict_page_size_limit ⇒ Integer
    The maximum size of an encoded dictionary page of a column.
- #enable_statistics ⇒ Boolean
    A value that enables statistics for Parquet pages and row groups.
- #encoding_type ⇒ String
    The type of encoding you are using.
- #encryption_mode ⇒ String
    The type of server-side encryption that you want to use for your data.
- #external_table_definition ⇒ String
    Specifies how tables are defined in the S3 source files only.
- #include_op_for_full_load ⇒ Boolean
    A value that enables a full load to write INSERT operations to the comma-separated value (.csv) output files only to indicate how the rows were added to the source database.
- #parquet_timestamp_in_millisecond ⇒ Boolean
    A value that specifies the precision of any TIMESTAMP column values that are written to an Amazon S3 object file in .parquet format.
- #parquet_version ⇒ String
    The version of the Apache Parquet format that you want to use: parquet_1_0 (the default) or parquet_2_0.
- #row_group_length ⇒ Integer
    The number of rows in a row group.
- #server_side_encryption_kms_key_id ⇒ String
    If you are using SSE_KMS for the EncryptionMode, provide the AWS KMS key ID.
- #service_access_role_arn ⇒ String
    The Amazon Resource Name (ARN) used by the service access IAM role.
- #timestamp_column_name ⇒ String
    A value that when nonblank causes AWS DMS to add a column with timestamp information to the endpoint data for an Amazon S3 target.
Instance Attribute Details
#bucket_folder ⇒ String
An optional parameter to set a folder name in the S3 bucket. If
provided, tables are created in the path 
bucketFolder/schema_name/table_name/. If this parameter isn't
specified, then the path used is schema_name/table_name/.
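The path rule for bucket_folder can be sketched as a small helper. The method name table_path and the example schema, table, and folder names are hypothetical; the sketch only reproduces the two path shapes described here.

```ruby
# Hypothetical helper: builds the path under which a table's files are created.
# A nil or empty bucket_folder is left out of the path.
def table_path(schema_name, table_name, bucket_folder = nil)
  parts = [bucket_folder, schema_name, table_name].compact.reject(&:empty?)
  parts.join("/") + "/"
end

puts table_path("hr", "employees", "exports") # exports/hr/employees/
puts table_path("hr", "employees")            # hr/employees/
```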
#bucket_name ⇒ String
The name of the S3 bucket.
#cdc_inserts_and_updates ⇒ Boolean
A value that enables a change data capture (CDC) load to write INSERT
and UPDATE operations to .csv or .parquet (columnar storage) output
files. The default setting is false, but when CdcInsertsAndUpdates
is set to true or y, only INSERTs and UPDATEs from the source
database are migrated to the .csv or .parquet file.
For .csv file format only, how these INSERTs and UPDATEs are recorded
depends on the value of the IncludeOpForFullLoad parameter. If
IncludeOpForFullLoad is set to true, the first field of every CDC
record is set to either I or U to indicate INSERT and UPDATE
operations at the source. But if IncludeOpForFullLoad is set to
false, CDC records are written without an indication of INSERT or
UPDATE operations at the source. For more information about how these
settings work together, see Indicating Source DB Operations in Migrated
S3 Data in the AWS Database Migration Service User Guide.
AWS DMS supports the CdcInsertsAndUpdates parameter in
versions 3.3.1 and later.
CdcInsertsOnly and CdcInsertsAndUpdates can't both be set to true
for the same endpoint. Set either CdcInsertsOnly or
CdcInsertsAndUpdates to true, but not both.
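Because the two CDC flags are mutually exclusive, it can help to check a settings hash before sending it. The following is a minimal sketch; check_cdc_flags! is a hypothetical helper and is not part of the SDK.

```ruby
# Hypothetical pre-flight check: raises if both CDC flags are true,
# otherwise returns the settings hash unchanged.
def check_cdc_flags!(settings)
  if settings[:cdc_inserts_only] && settings[:cdc_inserts_and_updates]
    raise ArgumentError,
          "cdc_inserts_only and cdc_inserts_and_updates can't both be true"
  end
  settings
end

check_cdc_flags!({ cdc_inserts_and_updates: true }) # passes
```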
#cdc_inserts_only ⇒ Boolean
A value that enables a change data capture (CDC) load to write only
INSERT operations to .csv or columnar storage (.parquet) output files.
By default (the false setting), the first field in a .csv or .parquet
record contains the letter I (INSERT), U (UPDATE), or D (DELETE). These
values indicate whether the row was inserted, updated, or deleted at the
source database for a CDC load to the target.
If CdcInsertsOnly is set to true or y, only INSERTs from the
source database are migrated to the .csv or .parquet file. For .csv
format only, how these INSERTs are recorded depends on the value of
IncludeOpForFullLoad. If IncludeOpForFullLoad is set to true, the
first field of every CDC record is set to I to indicate the INSERT
operation at the source. If IncludeOpForFullLoad is set to false,
every CDC record is written without a first field to indicate the INSERT
operation at the source. For more information about how these settings
work together, see Indicating Source DB Operations in Migrated S3
Data in the AWS Database Migration Service User Guide.
AWS DMS supports the interaction described above between the
CdcInsertsOnly and IncludeOpForFullLoad parameters in versions 3.1.4
and later.
CdcInsertsOnly and CdcInsertsAndUpdates can't both be set to true
for the same endpoint. Set either CdcInsertsOnly or
CdcInsertsAndUpdates to true, but not both.
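The way CdcInsertsOnly, CdcInsertsAndUpdates, and IncludeOpForFullLoad combine for .csv output can be modeled in a few lines. This is only a sketch of the rules as described in this page, not DMS code; cdc_csv_fields is a hypothetical name, and the operation is passed as the letter I, U, or D.

```ruby
# Hypothetical model of the .csv rules. Returns nil when the change is not
# migrated, otherwise the record's fields, with or without a leading
# operation letter.
def cdc_csv_fields(op, row, cdc_inserts_only: false,
                   cdc_inserts_and_updates: false,
                   include_op_for_full_load: false)
  if cdc_inserts_only
    return nil unless op == "I"
    include_op_for_full_load ? [op] + row : row
  elsif cdc_inserts_and_updates
    return nil unless %w[I U].include?(op)
    include_op_for_full_load ? [op] + row : row
  else
    # Default: the first field is I, U, or D.
    [op] + row
  end
end

p cdc_csv_fields("D", [7, "Ann"])
p cdc_csv_fields("U", [7, "Ann"], cdc_inserts_only: true)
p cdc_csv_fields("U", [7, "Ann"], cdc_inserts_and_updates: true,
                                  include_op_for_full_load: true)
```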
#compression_type ⇒ String
An optional parameter that uses GZIP to compress the target files. Set it to GZIP to compress the target files. To leave the files uncompressed, either set it to NONE (the default) or omit it. This parameter applies to both .csv and .parquet file formats.
Possible values:
- none
- gzip
#csv_delimiter ⇒ String
The delimiter used to separate columns in the .csv file for both source and target. The default is a comma.
#csv_row_delimiter ⇒ String
The delimiter used to separate rows in the .csv file for both source and
target. The default is a newline (\n).
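To illustrate how the column and row delimiters shape a .csv file, the sketch below joins fields and rows with the two settings. render_csv is a hypothetical helper; real DMS output also deals with quoting and escaping, which this sketch ignores.

```ruby
# Hypothetical formatter: fields are separated by csv_delimiter and
# rows by csv_row_delimiter. Defaults follow the documented defaults.
def render_csv(rows, csv_delimiter: ",", csv_row_delimiter: "\n")
  rows.map { |fields| fields.join(csv_delimiter) }.join(csv_row_delimiter)
end

puts render_csv([[1, "Ann"], [2, "Bob"]], csv_delimiter: "|")
```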
#data_format ⇒ String
The format of the data that you want to use for output. You can choose one of the following:
- csv: This is a row-based file format with comma-separated values (.csv).
- parquet: Apache Parquet (.parquet) is a columnar storage file format that features efficient compression and provides faster query response.
Possible values:
- csv
- parquet
 
#data_page_size ⇒ Integer
The size of one data page in bytes. This parameter defaults to 1024 * 1024 bytes (1 MiB). This number is used for .parquet file format only.
#date_partition_delimiter ⇒ String
Specifies a date separating delimiter to use during folder partitioning.
The default value is SLASH. Use this parameter when
DatePartitionEnabled is set to true.
Possible values:
- SLASH
- UNDERSCORE
- DASH
- NONE
#date_partition_enabled ⇒ Boolean
When set to true, this parameter partitions S3 bucket folders based on
transaction commit dates. The default value is false. For more
information about date-based folder partitioning, see Using date-based
folder partitioning.
#date_partition_sequence ⇒ String
Identifies the sequence of the date format to use during folder
partitioning. The default value is YYYYMMDD. Use this parameter when
DatePartitionEnabled is set to true.
Possible values:
- YYYYMMDD
- YYYYMMDDHH
- YYYYMM
- MMYYYYDD
- DDMMYYYY
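As a rough illustration of how the date partition sequence and delimiter settings might combine, the sketch below formats a commit time into a folder fragment. The exact folder layout that DMS produces is an assumption here, and date_partition_folder and both constants are hypothetical names.

```ruby
# Delimiter names mapped to the characters they presumably stand for.
DATE_PARTITION_DELIMITERS = {
  "SLASH" => "/", "UNDERSCORE" => "_", "DASH" => "-", "NONE" => ""
}.freeze

# Order of the date components for each accepted sequence value.
DATE_PARTITION_SEQUENCES = {
  "YYYYMMDD"   => %w[%Y %m %d],
  "YYYYMMDDHH" => %w[%Y %m %d %H],
  "YYYYMM"     => %w[%Y %m],
  "MMYYYYDD"   => %w[%m %Y %d],
  "DDMMYYYY"   => %w[%d %m %Y]
}.freeze

# Hypothetical helper: joins the date components with the chosen delimiter.
def date_partition_folder(commit_time, sequence: "YYYYMMDD", delimiter: "SLASH")
  parts = DATE_PARTITION_SEQUENCES.fetch(sequence)
  commit_time.strftime(parts.join(DATE_PARTITION_DELIMITERS.fetch(delimiter)))
end

t = Time.utc(2021, 3, 9, 14)
puts date_partition_folder(t)                                          # 2021/03/09
puts date_partition_folder(t, sequence: "DDMMYYYY", delimiter: "DASH") # 09-03-2021
```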
#dict_page_size_limit ⇒ Integer
The maximum size of an encoded dictionary page of a column. If the
dictionary page exceeds this, this column is stored using an encoding
type of PLAIN. This parameter defaults to 1024 * 1024 bytes (1 MiB),
the maximum size of a dictionary page before it reverts to PLAIN
encoding. This size is used for .parquet file format only.
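The fallback to PLAIN can be pictured as a simple threshold check. This is a sketch of the rule as described, not of the Parquet writer itself; effective_encoding and the constant name are hypothetical.

```ruby
DEFAULT_DICT_PAGE_SIZE_LIMIT = 1024 * 1024 # 1 MiB, the documented default

# Hypothetical illustration: the encoding a column ends up with, given the
# size in bytes of its encoded dictionary page.
def effective_encoding(dictionary_page_bytes, requested: "rle-dictionary",
                       limit: DEFAULT_DICT_PAGE_SIZE_LIMIT)
  return "plain" if dictionary_page_bytes > limit

  requested
end

puts effective_encoding(512 * 1024)      # rle-dictionary
puts effective_encoding(2 * 1024 * 1024) # plain
```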
#enable_statistics ⇒ Boolean
A value that enables statistics for Parquet pages and row groups. Choose
true to enable statistics, false to disable. Statistics include
NULL, DISTINCT, MAX, and MIN values. This parameter defaults to
true. This value is used for .parquet file format only.
#encoding_type ⇒ String
The type of encoding you are using:
- RLE_DICTIONARY uses a combination of bit-packing and run-length encoding to store repeated values more efficiently. This is the default.
- PLAIN doesn't use encoding at all. Values are stored as they are.
- PLAIN_DICTIONARY builds a dictionary of the values encountered in a given column. The dictionary is stored in a dictionary page for each column chunk.
Possible values:
- plain
- plain-dictionary
- rle-dictionary
 
#encryption_mode ⇒ String
The type of server-side encryption that you want to use for your data.
This encryption type is part of the endpoint settings or the extra
connection attributes for Amazon S3. You can choose either SSE_S3
(the default) or SSE_KMS.
For the ModifyEndpoint operation, you can change the existing value of
the EncryptionMode parameter from SSE_KMS to SSE_S3. But you can't
change the existing value from SSE_S3 to SSE_KMS.
To use SSE_S3, you need an AWS Identity and Access Management (IAM)
role with permission to allow "arn:aws:s3:::dms-*" to use the
following actions:
- s3:CreateBucket
- s3:ListBucket
- s3:DeleteBucket
- s3:GetBucketLocation
- s3:GetObject
- s3:PutObject
- s3:DeleteObject
- s3:GetObjectVersion
- s3:GetBucketPolicy
- s3:PutBucketPolicy
- s3:DeleteBucketPolicy
Possible values:
- sse-s3
- sse-kms
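The SSE_S3 permissions listed for this attribute could be written as an IAM policy document along these lines. This is a sketch built from the listed actions and the "arn:aws:s3:::dms-*" resource only; the overall statement layout is an assumption, so review it against your own security requirements before use.

```ruby
require "json"

# Actions copied from the list for SSE_S3.
S3_ACTIONS = %w[
  s3:CreateBucket s3:ListBucket s3:DeleteBucket s3:GetBucketLocation
  s3:GetObject s3:PutObject s3:DeleteObject s3:GetObjectVersion
  s3:GetBucketPolicy s3:PutBucketPolicy s3:DeleteBucketPolicy
].freeze

# Assumed policy shape: one Allow statement covering all listed actions.
policy = {
  "Version" => "2012-10-17",
  "Statement" => [
    {
      "Effect" => "Allow",
      "Action" => S3_ACTIONS,
      "Resource" => "arn:aws:s3:::dms-*"
    }
  ]
}

puts JSON.pretty_generate(policy)
```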
 
#external_table_definition ⇒ String
Specifies how tables are defined in the S3 source files only.
#include_op_for_full_load ⇒ Boolean
A value that enables a full load to write INSERT operations to the comma-separated value (.csv) output files only to indicate how the rows were added to the source database.
AWS DMS supports the IncludeOpForFullLoad parameter in versions 3.1.4
and later.
For full load, records can only be inserted. By default (the false
setting), no information is recorded in these output files for a full
load to indicate that the rows were inserted at the source database. If
IncludeOpForFullLoad is set to true or y, the INSERT is recorded
as an I annotation in the first field of the .csv file. This allows the
format of your target records from a full load to be consistent with the
target records from a CDC load.
This setting works together with the CdcInsertsOnly and the
CdcInsertsAndUpdates parameters for output to .csv files only. For
more information about how these settings work together, see Indicating
Source DB Operations in Migrated S3 Data in the AWS Database
Migration Service User Guide.
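The effect of IncludeOpForFullLoad on a full-load .csv record can be sketched as follows. full_load_csv_line is a hypothetical helper that only mirrors the I annotation described for this attribute; the row values are made up.

```ruby
# Hypothetical formatter for one full-load row in .csv output.
# When the flag is true, an "I" annotation becomes the first field.
def full_load_csv_line(values, include_op_for_full_load: false, csv_delimiter: ",")
  fields = include_op_for_full_load ? ["I"] + values : values
  fields.join(csv_delimiter)
end

puts full_load_csv_line([101, "Ann"])                                 # 101,Ann
puts full_load_csv_line([101, "Ann"], include_op_for_full_load: true) # I,101,Ann
```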
#parquet_timestamp_in_millisecond ⇒ Boolean
A value that specifies the precision of any TIMESTAMP column values
that are written to an Amazon S3 object file in .parquet format.
AWS DMS supports the ParquetTimestampInMillisecond parameter in
versions 3.1.4 and later.
When ParquetTimestampInMillisecond is set to true or y, AWS DMS
writes all TIMESTAMP columns in a .parquet formatted file with
millisecond precision. Otherwise, DMS writes them with microsecond
precision.
Currently, Amazon Athena and AWS Glue can handle only millisecond
precision for TIMESTAMP values. Set this parameter to true for S3
endpoint object files that are .parquet formatted only if you plan to
query or process the data with Athena or AWS Glue.
AWS DMS writes any TIMESTAMP column values written to an S3 file in
.csv format with microsecond precision.
Setting ParquetTimestampInMillisecond has no effect on the string
format of the timestamp column value that is inserted by setting the
TimestampColumnName parameter.
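The difference between the two precisions can be seen by expressing one instant as epoch microseconds and epoch milliseconds. This sketch truncates when reducing precision; whether DMS truncates or rounds is not stated here, and the example time is arbitrary.

```ruby
# An arbitrary commit time with microsecond resolution.
t = Time.utc(2021, 3, 9, 14, 5, 30, 123_456)

micros = t.to_i * 1_000_000 + t.usec # microsecond precision
millis = micros / 1000               # millisecond precision (last 3 digits dropped)

puts micros
puts millis
```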
#parquet_version ⇒ String
The version of the Apache Parquet format that you want to use:
parquet_1_0 (the default) or parquet_2_0. 
Possible values:
- parquet-1-0
- parquet-2-0
#row_group_length ⇒ Integer
The number of rows in a row group. A smaller row group size provides faster reads, but writes become slower as the number of row groups grows. This parameter defaults to 10,000 rows. This number is used for .parquet file format only.
If you choose a value larger than the maximum, RowGroupLength is set
to the max row group length in bytes (64 * 1024 * 1024).
#server_side_encryption_kms_key_id ⇒ String
If you are using SSE_KMS for the EncryptionMode, provide the AWS KMS
key ID. The key that you use needs an attached policy that enables AWS
Identity and Access Management (IAM) user permissions and allows use of
the key.
Here is a CLI example:
aws dms create-endpoint --endpoint-identifier value --endpoint-type target --engine-name s3 --s3-settings ServiceAccessRoleArn=value,BucketFolder=value,BucketName=value,EncryptionMode=SSE_KMS,ServerSideEncryptionKmsKeyId=value
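A Ruby counterpart to the CLI example might look like the parameter hash below, using the snake_case keys and the sse-kms value from the hash template in the Overview. The client call is left in comments because it needs the aws-sdk gem and credentials; the region shown there is a placeholder, and every "value" string is a placeholder carried over from the CLI example.

```ruby
# Placeholder values mirror the "value" placeholders in the CLI example.
params = {
  endpoint_identifier: "value",
  endpoint_type: "target",
  engine_name: "s3",
  s3_settings: {
    service_access_role_arn: "value",
    bucket_folder: "value",
    bucket_name: "value",
    encryption_mode: "sse-kms",
    server_side_encryption_kms_key_id: "value"
  }
}

# With the aws-sdk gem (version 2) installed and credentials configured,
# the request would be sent roughly like this:
#
#   require "aws-sdk"
#   client = Aws::DatabaseMigrationService::Client.new(region: "us-east-1")
#   client.create_endpoint(params)

puts params[:s3_settings].keys.inspect
```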
#service_access_role_arn ⇒ String
The Amazon Resource Name (ARN) used by the service access IAM role. It is a required parameter that enables DMS to write and read objects from an S3 bucket.
#timestamp_column_name ⇒ String
A value that when nonblank causes AWS DMS to add a column with timestamp information to the endpoint data for an Amazon S3 target.
AWS DMS supports the TimestampColumnName parameter in versions 3.1.4
and later.
DMS includes an additional STRING column in the .csv or .parquet
object files of your migrated data when you set TimestampColumnName to
a nonblank value.
For a full load, each row of this timestamp column contains a timestamp for when the data was transferred from the source to the target by DMS.
For a change data capture (CDC) load, each row of the timestamp column contains the timestamp for the commit of that row in the source database.
The string format for this timestamp column value is yyyy-MM-dd
HH:mm:ss.SSSSSS. By default, the precision of this value is in
microseconds. For a CDC load, the rounding of the precision depends on
the commit timestamp supported by DMS for the source database.
When the AddColumnName parameter is set to true, DMS also includes a
name for the timestamp column that you set with TimestampColumnName.
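When reading the migrated data back in Ruby, the yyyy-MM-dd HH:mm:ss.SSSSSS layout corresponds to the strftime pattern in this sketch. The constant name and the sample time are hypothetical.

```ruby
# Ruby strftime pattern matching yyyy-MM-dd HH:mm:ss.SSSSSS
# (%6N is the fractional second, six digits).
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%6N"

t = Time.utc(2021, 3, 9, 14, 5, 30, 123_456)
puts t.strftime(TIMESTAMP_FORMAT) # 2021-03-09 14:05:30.123456
```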