ParquetOutputFormatProps
- class aws_cdk.aws_kinesisfirehose.ParquetOutputFormatProps(*, block_size=None, compression=None, enable_dictionary_compression=None, max_padding=None, page_size=None, writer_version=None)
Bases: object

Props for Parquet output format for data record format conversion.
- Parameters:
  - block_size (Optional[Size]) – The Hadoop Distributed File System (HDFS) block size. This is useful if you intend to copy the data from Amazon S3 to HDFS before querying. Firehose uses this value for padding calculations. Default: Size.mebibytes(256)
  - compression (Optional[ParquetCompression]) – The compression code to use over data blocks. The possible values are UNCOMPRESSED, SNAPPY, and GZIP. Use SNAPPY for higher decompression speed. Use GZIP if the compression ratio is more important than speed. Default: SNAPPY
  - enable_dictionary_compression (Optional[bool]) – Indicates whether to enable dictionary compression. Default: false
  - max_padding (Optional[Size]) – The maximum amount of padding to apply. This is useful if you intend to copy the data from Amazon S3 to HDFS before querying. Default: no padding is applied
  - page_size (Optional[Size]) – The Parquet page size. Column chunks are divided into pages. A page is conceptually an indivisible unit (in terms of compression and encoding). The minimum value is 64 KiB and the default is 1 MiB. Default: Size.mebibytes(1)
  - writer_version (Optional[ParquetWriterVersion]) – Indicates the version of Parquet to output. The possible values are V1 and V2. Default: V1
- ExampleMetadata:
infused
Example:
output_format = firehose.ParquetOutputFormat(
    block_size=Size.mebibytes(512),
    compression=firehose.ParquetCompression.UNCOMPRESSED,
    enable_dictionary_compression=True,
    max_padding=Size.bytes(10),
    page_size=Size.mebibytes(2),
    writer_version=firehose.ParquetWriterVersion.V2
)
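As a complement to the example above, the props class itself can be instantiated with the same keyword arguments. The following is a minimal sketch that overrides only compression and leaves every other field at its documented default:

from aws_cdk import aws_kinesisfirehose as firehose

# Minimal props: only the codec is set; block_size, page_size, and the
# other fields fall back to the defaults documented above.
parquet_props = firehose.ParquetOutputFormatProps(
    compression=firehose.ParquetCompression.GZIP,
)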
Attributes
- block_size
The Hadoop Distributed File System (HDFS) block size.
This is useful if you intend to copy the data from Amazon S3 to HDFS before querying. Firehose uses this value for padding calculations.
- Default: Size.mebibytes(256)
- Minimum: Size.mebibytes(64)
- compression
The compression code to use over data blocks.
The possible values are UNCOMPRESSED, SNAPPY, and GZIP. Use SNAPPY for higher decompression speed. Use GZIP if the compression ratio is more important than speed.
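To make the trade-off concrete, here is a sketch of the two non-default choices side by side; the enum members are assumed to match the values listed above:

from aws_cdk import aws_kinesisfirehose as firehose

# Tuned for decompression speed (SNAPPY is also the default codec).
fast_reads = firehose.ParquetOutputFormat(
    compression=firehose.ParquetCompression.SNAPPY,
)

# Tuned for compression ratio at the cost of speed.
small_files = firehose.ParquetOutputFormat(
    compression=firehose.ParquetCompression.GZIP,
)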
- enable_dictionary_compression
Indicates whether to enable dictionary compression.
- max_padding
The maximum amount of padding to apply.
This is useful if you intend to copy the data from Amazon S3 to HDFS before querying.
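When data will be copied from Amazon S3 to HDFS, max_padding is typically tuned together with block_size so row groups can be padded out to block boundaries. The sketch below is illustrative only; the 8 MiB padding budget is an assumed value, not a recommendation:

from aws_cdk import Size
from aws_cdk import aws_kinesisfirehose as firehose

# Illustrative HDFS-oriented tuning: the default 256 MiB block size,
# with up to 8 MiB of padding (an assumed budget) so row groups can
# align to HDFS block boundaries.
hdfs_friendly = firehose.ParquetOutputFormat(
    block_size=Size.mebibytes(256),
    max_padding=Size.mebibytes(8),
)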
- page_size
The Parquet page size.
Column chunks are divided into pages. A page is conceptually an indivisible unit (in terms of compression and encoding). The minimum value is 64 KiB and the default is 1 MiB.
- Default: Size.mebibytes(1)
- Minimum: Size.kibibytes(64)
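For reference, this sketch sets the documented minimum page size; whether smaller pages actually help depends on the engine that queries the output:

from aws_cdk import Size
from aws_cdk import aws_kinesisfirehose as firehose

# The documented minimum page size (64 KiB); the default is 1 MiB.
small_pages = firehose.ParquetOutputFormat(
    page_size=Size.kibibytes(64),
)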
- writer_version
Indicates the version of Parquet to output.
The possible values are V1 and V2.