Class OrcOutputFormatProps.Builder
- All Implemented Interfaces:
software.amazon.jsii.Builder<OrcOutputFormatProps>
- Enclosing interface:
OrcOutputFormatProps
Constructor Summary
Builder()
Method Summary
Method | Description
blockSize | Sets the value of OrcOutputFormatProps.getBlockSize()
bloomFilterColumns(List<String> bloomFilterColumns) | Sets the value of OrcOutputFormatProps.getBloomFilterColumns()
bloomFilterFalsePositiveProbability(Number bloomFilterFalsePositiveProbability) | Sets the value of OrcOutputFormatProps.getBloomFilterFalsePositiveProbability()
build() | Builds the configured instance.
compression(OrcCompression compression) | Sets the value of OrcOutputFormatProps.getCompression()
dictionaryKeyThreshold(Number dictionaryKeyThreshold) | Sets the value of OrcOutputFormatProps.getDictionaryKeyThreshold()
enablePadding(Boolean enablePadding) | Sets the value of OrcOutputFormatProps.getEnablePadding()
formatVersion(OrcFormatVersion formatVersion) | Sets the value of OrcOutputFormatProps.getFormatVersion()
paddingTolerance(Number paddingTolerance) | Sets the value of OrcOutputFormatProps.getPaddingTolerance()
rowIndexStride(Number rowIndexStride) | Sets the value of OrcOutputFormatProps.getRowIndexStride()
stripeSize(Size stripeSize) | Sets the value of OrcOutputFormatProps.getStripeSize()
-
Constructor Details
-
Builder
public Builder()
-
-
Method Details
-
blockSize
Sets the value of OrcOutputFormatProps.getBlockSize()
- Parameters:
blockSize - The Hadoop Distributed File System (HDFS) block size. This is useful if you intend to copy the data from Amazon S3 to HDFS before querying. Firehose uses this value for padding calculations.
- Returns:
this
-
bloomFilterColumns
@Stability(Stable) public OrcOutputFormatProps.Builder bloomFilterColumns(List<String> bloomFilterColumns)
Sets the value of OrcOutputFormatProps.getBloomFilterColumns()
- Parameters:
bloomFilterColumns - The names of the columns for which you want Firehose to create bloom filters.
- Returns:
this
-
bloomFilterFalsePositiveProbability
@Stability(Stable) public OrcOutputFormatProps.Builder bloomFilterFalsePositiveProbability(Number bloomFilterFalsePositiveProbability)
Sets the value of OrcOutputFormatProps.getBloomFilterFalsePositiveProbability()
- Parameters:
bloomFilterFalsePositiveProbability - The bloom filter false positive probability (FPP). The lower the FPP, the bigger the bloom filter.
- Returns:
this
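The trade-off between FPP and filter size can be illustrated with the textbook sizing formula for a bloom filter, m = -n ln(p) / (ln 2)^2 bits for n entries at false positive probability p. The sketch below is only an illustration: the class and method names are hypothetical, the entry count is made up, and the formula is the standard one rather than a statement of how Firehose or the ORC writer actually sizes its filters.

```java
// Hypothetical helper, not part of the CDK or ORC APIs.
final class BloomFilterSizeSketch {

    /**
     * Textbook optimal bit count for a bloom filter holding
     * expectedEntries items at false positive probability fpp:
     * m = -n * ln(p) / (ln 2)^2.
     */
    static long optimalBits(long expectedEntries, double fpp) {
        if (expectedEntries <= 0 || fpp <= 0.0 || fpp >= 1.0) {
            throw new IllegalArgumentException("need entries > 0 and 0 < fpp < 1");
        }
        double ln2 = Math.log(2);
        return (long) Math.ceil(-expectedEntries * Math.log(fpp) / (ln2 * ln2));
    }

    public static void main(String[] args) {
        long entries = 10_000; // illustrative entry count, an assumption
        // A lower FPP yields a larger filter for the same number of entries.
        System.out.println("fpp=0.05 -> " + optimalBits(entries, 0.05) + " bits");
        System.out.println("fpp=0.01 -> " + optimalBits(entries, 0.01) + " bits");
    }
}
```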
-
compression
Sets the value of OrcOutputFormatProps.getCompression()
- Parameters:
compression - The compression code to use over data blocks. The possible values are NONE, SNAPPY, and ZLIB. Use SNAPPY for higher decompression speed. Use ZLIB if the compression ratio is more important than speed.
- Returns:
this
-
dictionaryKeyThreshold
@Stability(Stable) public OrcOutputFormatProps.Builder dictionaryKeyThreshold(Number dictionaryKeyThreshold)
Sets the value of OrcOutputFormatProps.getDictionaryKeyThreshold()
- Parameters:
dictionaryKeyThreshold - Determines whether dictionary encoding should be applied to a column. If the number of distinct keys (unique values) in a column exceeds this fraction of the total non-null rows in that column, dictionary encoding will be turned off for that specific column.
To turn off dictionary encoding, set this threshold to 0. To always use dictionary encoding, set this threshold to 1.
- Returns:
this
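The threshold rule described for dictionaryKeyThreshold can be sketched as a simple comparison. This is a reading of the description above, not the writer's actual implementation; the class and method names are hypothetical and the row counts are invented for illustration.

```java
// Hypothetical helper, not part of the CDK or ORC APIs.
final class DictionaryThresholdSketch {

    /**
     * Returns true when dictionary encoding stays on for a column, i.e. when
     * the number of distinct keys does NOT exceed threshold * nonNullRows.
     */
    static boolean keepsDictionaryEncoding(long distinctKeys, long nonNullRows, double threshold) {
        if (nonNullRows <= 0 || distinctKeys < 0 || distinctKeys > nonNullRows) {
            throw new IllegalArgumentException("need 0 <= distinctKeys <= nonNullRows and nonNullRows > 0");
        }
        if (threshold < 0.0 || threshold > 1.0) {
            throw new IllegalArgumentException("threshold must be between 0 and 1");
        }
        // Multiply instead of dividing so the comparison stays exact at the boundaries.
        return !(distinctKeys > threshold * nonNullRows);
    }

    public static void main(String[] args) {
        // 100 distinct keys in 1000 rows is below a 0.8 threshold: encoding stays on.
        System.out.println(keepsDictionaryEncoding(100, 1_000, 0.8));
        // 900 distinct keys in 1000 rows exceeds 0.8: encoding is turned off.
        System.out.println(keepsDictionaryEncoding(900, 1_000, 0.8));
    }
}
```

With a threshold of 0, any column that has at least one value exceeds the limit, so encoding is always off; with a threshold of 1, the distinct count can never exceed the row count, so encoding is always on.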
-
enablePadding
Sets the value of OrcOutputFormatProps.getEnablePadding()
- Parameters:
enablePadding - Set this to true to indicate that you want stripes to be padded to the HDFS block boundaries. This is useful if you intend to copy the data from Amazon S3 to HDFS before querying.
- Returns:
this
-
formatVersion
@Stability(Stable) public OrcOutputFormatProps.Builder formatVersion(OrcFormatVersion formatVersion)
Sets the value of OrcOutputFormatProps.getFormatVersion()
- Parameters:
formatVersion - The version of the ORC format to write. The possible values are V0_11 and V0_12.
- Returns:
this
-
paddingTolerance
Sets the value of OrcOutputFormatProps.getPaddingTolerance()
- Parameters:
paddingTolerance - A number between 0 and 1 that defines the tolerance for block padding as a decimal fraction of stripe size. The default value is 0.05, which means 5 percent of stripe size.
For the default values of 64 MiB ORC stripes and 256 MiB HDFS blocks, the default block padding tolerance of 5 percent reserves a maximum of 3.2 MiB for padding within the 256 MiB block. In such a case, if the available size within the block is more than 3.2 MiB, a new, smaller stripe is inserted to fit within that space. This ensures that no stripe crosses block boundaries and causes remote reads within a node-local task.
Kinesis Data Firehose ignores this parameter when EnablePadding is false.
- Returns:
this
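The 3.2 MiB figure in the paddingTolerance description follows from multiplying the stripe size by the tolerance. A minimal sketch of that arithmetic, using a hypothetical helper class that is not part of the CDK:

```java
// Hypothetical helper, not part of the CDK API.
final class PaddingToleranceSketch {

    static final long MIB = 1024L * 1024L;

    /** Maximum number of bytes reserved for padding: tolerance times stripe size. */
    static double maxPaddingBytes(long stripeSizeBytes, double paddingTolerance) {
        if (stripeSizeBytes <= 0) {
            throw new IllegalArgumentException("stripe size must be positive");
        }
        if (paddingTolerance < 0.0 || paddingTolerance > 1.0) {
            throw new IllegalArgumentException("padding tolerance must be between 0 and 1");
        }
        return stripeSizeBytes * paddingTolerance;
    }

    public static void main(String[] args) {
        // Defaults from the description: 64 MiB stripes, tolerance 0.05.
        double padding = maxPaddingBytes(64 * MIB, 0.05);
        System.out.println("max padding in MiB: " + padding / MIB);
    }
}
```

The HDFS block size does not enter this calculation; per the description above, the tolerance is a fraction of the stripe size only.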
-
rowIndexStride
Sets the value of OrcOutputFormatProps.getRowIndexStride()
- Parameters:
rowIndexStride - The number of rows between index entries.
- Returns:
this
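To get a feel for what a given stride means, the number of index entries for a run of rows can be estimated as the row count divided by the stride, rounded up. The sketch below assumes one index entry per started group of rowIndexStride rows; the class name, method name, and sample numbers are hypothetical.

```java
// Hypothetical helper, not part of the CDK or ORC APIs.
final class RowIndexStrideSketch {

    /** Number of index entries needed to cover rowCount rows, one per started stride. */
    static long indexEntries(long rowCount, long rowIndexStride) {
        if (rowCount < 0 || rowIndexStride <= 0) {
            throw new IllegalArgumentException("need rowCount >= 0 and rowIndexStride > 0");
        }
        // Integer ceiling division without floating point.
        return (rowCount + rowIndexStride - 1) / rowIndexStride;
    }

    public static void main(String[] args) {
        // Illustrative numbers: 25,000 rows with a stride of 10,000 rows.
        System.out.println(indexEntries(25_000, 10_000));
    }
}
```

A smaller stride gives finer-grained indexes at the cost of more index entries.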
-
stripeSize
Sets the value of OrcOutputFormatProps.getStripeSize()
- Parameters:
stripeSize - The number of bytes in each stripe. The default is 64 MiB and the minimum is 8 MiB.
- Returns:
this
-
build
Builds the configured instance.
- Specified by:
build in interface software.amazon.jsii.Builder<OrcOutputFormatProps>
- Returns:
a new instance of OrcOutputFormatProps
- Throws:
NullPointerException - if any required attribute was not provided
-