ParquetOutputFormatProps
- class aws_cdk.aws_kinesisfirehose.ParquetOutputFormatProps(*, block_size=None, compression=None, enable_dictionary_compression=None, max_padding=None, page_size=None, writer_version=None)
Bases: object

Props for Parquet output format for data record format conversion.
- Parameters:
  - block_size (Optional[Size]) – The Hadoop Distributed File System (HDFS) block size. This is useful if you intend to copy the data from Amazon S3 to HDFS before querying. Firehose uses this value for padding calculations. Default: Size.mebibytes(256)
  - compression (Optional[ParquetCompression]) – The compression code to use over data blocks. The possible values are UNCOMPRESSED, SNAPPY, and GZIP. Use SNAPPY for higher decompression speed. Use GZIP if the compression ratio is more important than speed. Default: SNAPPY
  - enable_dictionary_compression (Optional[bool]) – Indicates whether to enable dictionary compression. Default: false
  - max_padding (Optional[Size]) – The maximum amount of padding to apply. This is useful if you intend to copy the data from Amazon S3 to HDFS before querying. Default: no padding is applied
  - page_size (Optional[Size]) – The Parquet page size. Column chunks are divided into pages. A page is conceptually an indivisible unit (in terms of compression and encoding). The minimum value is 64 KiB and the default is 1 MiB. Default: Size.mebibytes(1)
  - writer_version (Optional[ParquetWriterVersion]) – Indicates the version of Parquet to output. The possible values are V1 and V2. Default: V1
- ExampleMetadata:
infused
Example:
output_format = firehose.ParquetOutputFormat(
    block_size=Size.mebibytes(512),
    compression=firehose.ParquetCompression.UNCOMPRESSED,
    enable_dictionary_compression=True,
    max_padding=Size.bytes(10),
    page_size=Size.mebibytes(2),
    writer_version=firehose.ParquetWriterVersion.V2
)
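As a complement to the example above, the props class itself can be instantiated with the same keyword arguments. The following is a minimal sketch that overrides only compression and leaves every other field at its documented default:

from aws_cdk import aws_kinesisfirehose as firehose

# Minimal props: only the codec is set; block_size, page_size, and the
# other fields fall back to the defaults documented above.
parquet_props = firehose.ParquetOutputFormatProps(
    compression=firehose.ParquetCompression.GZIP,
)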
Attributes
- block_size
The Hadoop Distributed File System (HDFS) block size.
This is useful if you intend to copy the data from Amazon S3 to HDFS before querying. Firehose uses this value for padding calculations.
- Default: Size.mebibytes(256)
- Minimum: Size.mebibytes(64)
- compression
The compression code to use over data blocks.
The possible values are UNCOMPRESSED, SNAPPY, and GZIP. Use SNAPPY for higher decompression speed. Use GZIP if the compression ratio is more important than speed.
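To make the trade-off concrete, here is a sketch of the two non-default choices side by side; the enum members are assumed to match the values listed above:

from aws_cdk import aws_kinesisfirehose as firehose

# Tuned for decompression speed (SNAPPY is also the default codec).
fast_reads = firehose.ParquetOutputFormat(
    compression=firehose.ParquetCompression.SNAPPY,
)

# Tuned for compression ratio at the cost of speed.
small_files = firehose.ParquetOutputFormat(
    compression=firehose.ParquetCompression.GZIP,
)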
- enable_dictionary_compression
Indicates whether to enable dictionary compression.
- max_padding
The maximum amount of padding to apply.
This is useful if you intend to copy the data from Amazon S3 to HDFS before querying.
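When data will be copied from Amazon S3 to HDFS, max_padding is typically tuned together with block_size so row groups can be padded out to block boundaries. The sketch below is illustrative only; the 8 MiB padding budget is an assumed value, not a recommendation:

from aws_cdk import Size
from aws_cdk import aws_kinesisfirehose as firehose

# Illustrative HDFS-oriented tuning: the default 256 MiB block size,
# with up to 8 MiB of padding (an assumed budget) so row groups can
# align to HDFS block boundaries.
hdfs_friendly = firehose.ParquetOutputFormat(
    block_size=Size.mebibytes(256),
    max_padding=Size.mebibytes(8),
)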
- page_size
The Parquet page size.
Column chunks are divided into pages. A page is conceptually an indivisible unit (in terms of compression and encoding). The minimum value is 64 KiB and the default is 1 MiB.
- Default: Size.mebibytes(1)
- Minimum: Size.kibibytes(64)
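For reference, this sketch sets the documented minimum page size; whether smaller pages actually help depends on the engine that queries the output:

from aws_cdk import Size
from aws_cdk import aws_kinesisfirehose as firehose

# The documented minimum page size (64 KiB); the default is 1 MiB.
small_pages = firehose.ParquetOutputFormat(
    page_size=Size.kibibytes(64),
)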
- writer_version
Indicates the version of Parquet to output.
The possible values are V1 and V2.