Package software.amazon.awscdk.services.kinesisfirehose
Amazon Data Firehose Construct Library
Amazon Data Firehose, formerly known as Amazon Kinesis Data Firehose, is a service for fully-managed delivery of real-time streaming data to storage services such as Amazon S3, Amazon Redshift, Amazon Elasticsearch, Splunk, or any custom HTTP endpoint or third-party services such as Datadog, Dynatrace, LogicMonitor, MongoDB, New Relic, and Sumo Logic.
Amazon Data Firehose delivery streams are distinguished from Kinesis data streams in their models of consumption. Whereas consumers read from a data stream by actively pulling data from the stream, a delivery stream pushes data to its destination on a regular cadence. This means that data streams are intended to have consumers that do on-demand processing, like AWS Lambda or Amazon EC2. On the other hand, delivery streams are intended to have destinations that are sources for offline processing and analytics, such as Amazon S3 and Amazon Redshift.
This module is part of the AWS Cloud Development Kit project. It allows you to define Amazon Data Firehose delivery streams.
Defining a Delivery Stream
In order to define a Delivery Stream, you must specify a destination. An S3 bucket can be used as a destination. Currently the CDK supports only S3 as a destination which is covered below.
Bucket bucket = new Bucket(this, "Bucket");
DeliveryStream.Builder.create(this, "Delivery Stream")
.destination(new S3Bucket(bucket))
.build();
The above example defines the following resources:
- An S3 bucket
- An Amazon Data Firehose delivery stream with Direct PUT as the source and CloudWatch error logging turned on.
- An IAM role which gives the delivery stream permission to write to the S3 bucket.
Sources
An Amazon Data Firehose delivery stream can accept data from three main sources: Kinesis Data Streams, Managed Streaming for Apache Kafka (MSK), or via a "direct put" (API calls). Currently only Kinesis Data Streams and direct put are supported in the CDK.
See: Sending Data to a Delivery Stream in the Amazon Data Firehose Developer Guide.
Kinesis Data Stream
A delivery stream can read directly from a Kinesis data stream as a consumer of the data
stream. Configure this behaviour by passing in a data stream in the source
property via the KinesisStreamSource class when constructing a delivery stream:
IDestination destination;
Stream sourceStream = new Stream(this, "Source Stream");
DeliveryStream.Builder.create(this, "Delivery Stream")
.source(new KinesisStreamSource(sourceStream))
.destination(destination)
.build();
Direct Put
Data must be provided via "direct put", ie., by using a PutRecord or
PutRecordBatch API call. There are a number of ways of doing so, such as:
- Kinesis Agent: a standalone Java application that monitors and delivers files while handling file rotation, checkpointing, and retries. See: Writing to Amazon Data Firehose Using Kinesis Agent in the Amazon Data Firehose Developer Guide.
- AWS SDK: a general purpose solution that allows you to deliver data to a delivery stream from anywhere using Java, .NET, Node.js, Python, or Ruby. See: Writing to Amazon Data Firehose Using the AWS SDK in the Amazon Data Firehose Developer Guide.
- CloudWatch Logs: subscribe to a log group and receive filtered log events directly into a delivery stream. See: logs-destinations.
- Eventbridge: add an event rule target to send events to a delivery stream based on the rule filtering. See: events-targets.
- SNS: add a subscription to send all notifications from the topic to a delivery stream. See: sns-subscriptions.
- IoT: add an action to an IoT rule to send various IoT information to a delivery stream
Destinations
Amazon Data Firehose supports multiple AWS and third-party services as destinations, including Amazon S3, Amazon Redshift, and more. You can find the full list of supported destination here.
Currently in the AWS CDK, only S3 is implemented as an L2 construct destination. Other destinations can still be configured using L1 constructs.
S3
Defining a delivery stream with an S3 bucket destination:
Bucket bucket;
S3Bucket s3Destination = new S3Bucket(bucket);
DeliveryStream.Builder.create(this, "Delivery Stream")
.destination(s3Destination)
.build();
The S3 destination also supports custom dynamic prefixes. dataOutputPrefix
will be used for files successfully delivered to S3. errorOutputPrefix will be added to
failed records before writing them to S3.
import software.amazon.awscdk.TimeZone;
Bucket bucket;
S3Bucket s3Destination = S3Bucket.Builder.create(bucket)
.dataOutputPrefix("myFirehose/DeliveredYear=!{timestamp:yyyy}/anyMonth/rand=!{firehose:random-string}")
.errorOutputPrefix("myFirehoseFailures/!{firehose:error-output-type}/!{timestamp:yyyy}/anyMonth/!{timestamp:dd}")
// The time zone of timestamps (default UTC)
.timeZone(TimeZone.ASIA_TOKYO)
.build();
See: Custom S3 Prefixes in the Amazon Data Firehose Developer Guide.
To override default file extension appended by Data Format Conversion or S3 compression features, specify fileExtension.
Bucket bucket;
S3Bucket s3Destination = S3Bucket.Builder.create(bucket)
.compression(Compression.GZIP)
.fileExtension(".json.gz")
.build();
Data Format Conversion
Data format conversion allows automatic conversion of inputs from JSON to either Parquet or ORC. Converting JSON records to columnar formats like Parquet or ORC can help speed up analytical querying while also increasing compression efficiency. When data format conversion is specified, it automatically enables Snappy compression on the output.
Only S3 Destinations support data format conversion.
An example of defining an S3 destination configured with data format conversion:
Bucket bucket;
CfnTable schemaGlueTable;
S3Bucket s3Destination = S3Bucket.Builder.create(bucket)
.dataFormatConversion(DataFormatConversionProps.builder()
.schemaConfiguration(SchemaConfiguration.fromCfnTable(schemaGlueTable))
.inputFormat(InputFormat.OPENX_JSON)
.outputFormat(OutputFormat.PARQUET)
.build())
.build();
When data format conversion is enabled, the Delivery Stream's buffering size must be at least 64 MiB. Additionally, the default buffering size is changed from 5 MiB to 128 MiB. This mirrors the Cloudformation behavior.
You can only parse JSON and transform it into either Parquet or ORC:
- to read JSON using OpenX parser, choose
InputFormat.OPENX_JSON. - to read JSON using Hive parser, choose
InputFormat.HIVE_JSON. - to transform into Parquet, choose
OutputFormat.PARQUET. - to transform into ORC, choose
OutputFormat.ORC.
The following subsections explain how to specify advanced configuration options for each input and output format if the defaults are not desirable
Input Format: OpenX JSON
Example creation of custom OpenX JSON InputFormat:
OpenXJsonInputFormat inputFormat = OpenXJsonInputFormat.Builder.create()
.lowercaseColumnNames(false)
.columnToJsonKeyMappings(Map.of("ts", "timestamp"))
.convertDotsInJsonKeysToUnderscores(true)
.build();
Input Format: Hive JSON
Example creation of custom Hive JSON InputFormat:
HiveJsonInputFormat inputFormat = HiveJsonInputFormat.Builder.create()
.timestampParsers(List.of(TimestampParser.fromFormatString("yyyy-MM-dd"), TimestampParser.EPOCH_MILLIS))
.build();
Hive JSON allows you to specify custom timestamp formats to parse. The syntax of the format string is Joda Time.
To parse timestamps formatted as milliseconds since epoch, use the convenience constant TimestampParser.EPOCH_MILLIS.
Output Format: Parquet
Example of a custom Parquet OutputFormat, with all values changed from the defaults.
ParquetOutputFormat outputFormat = ParquetOutputFormat.Builder.create()
.blockSize(Size.mebibytes(512))
.compression(ParquetCompression.UNCOMPRESSED)
.enableDictionaryCompression(true)
.maxPadding(Size.bytes(10))
.pageSize(Size.mebibytes(2))
.writerVersion(ParquetWriterVersion.V2)
.build();
Output Format: ORC
Example creation of custom ORC OutputFormat, with all values changed from the defaults.
OrcOutputFormat outputFormat = OrcOutputFormat.Builder.create()
.formatVersion(OrcFormatVersion.V0_11)
.blockSize(Size.mebibytes(256))
.compression(OrcCompression.NONE)
.bloomFilterColumns(List.of("columnA"))
.bloomFilterFalsePositiveProbability(0.1)
.dictionaryKeyThreshold(0.7)
.enablePadding(true)
.paddingTolerance(0.2)
.rowIndexStride(9000)
.stripeSize(Size.mebibytes(32))
.build();
Server-side Encryption
Enabling server-side encryption (SSE) requires Amazon Data Firehose to encrypt all data sent to delivery stream when it is stored at rest. This means that data is encrypted before being written to the service's internal storage layer and decrypted after it is received from the internal storage layer. The service manages keys and cryptographic operations so that sources and destinations do not need to, as the data is encrypted and decrypted at the boundaries of the service (i.e., before the data is delivered to a destination). By default, delivery streams do not have SSE enabled.
The Key Management Service keys (KMS keys) used for SSE can either be AWS-owned or customer-managed. AWS-owned KMS keys are created, owned and managed by AWS for use in multiple AWS accounts. As a customer, you cannot view, use, track, or manage these keys, and you are not charged for their use. On the other hand, customer-managed KMS keys are created and owned within your account and managed entirely by you. As a customer, you are responsible for managing access, rotation, aliases, and deletion for these keys, and you are changed for their use.
See: AWS KMS keys in the KMS Developer Guide.
IDestination destination;
// SSE with an customer-managed key that is explicitly specified
Key key;
// SSE with an AWS-owned key
// SSE with an AWS-owned key
DeliveryStream.Builder.create(this, "Delivery Stream with AWS Owned Key")
.encryption(StreamEncryption.awsOwnedKey())
.destination(destination)
.build();
// SSE with an customer-managed key that is created automatically by the CDK
// SSE with an customer-managed key that is created automatically by the CDK
DeliveryStream.Builder.create(this, "Delivery Stream with Customer Managed Key")
.encryption(StreamEncryption.customerManagedKey())
.destination(destination)
.build();
DeliveryStream.Builder.create(this, "Delivery Stream with Customer Managed and Provided Key")
.encryption(StreamEncryption.customerManagedKey(key))
.destination(destination)
.build();
See: Data Protection in the Amazon Data Firehose Developer Guide.
Monitoring
Amazon Data Firehose is integrated with CloudWatch, so you can monitor the performance of your delivery streams via logs and metrics.
Logs
Amazon Data Firehose will send logs to CloudWatch when data transformation or data delivery fails. The CDK will enable logging by default and create a CloudWatch LogGroup and LogStream with default settings for your Delivery Stream.
When creating a destination, you can provide an ILoggingConfig, which can either be an EnableLogging or DisableLogging instance.
If you use EnableLogging, the CDK will create a CloudWatch LogGroup and LogStream with all CloudFormation default settings for you, or you can optionally
specify your own log group to be used for capturing and storing log events. For example:
import software.amazon.awscdk.services.logs.*;
Bucket bucket;
LogGroup logGroup = new LogGroup(this, "Log Group");
S3Bucket destination = S3Bucket.Builder.create(bucket)
.loggingConfig(new EnableLogging(logGroup))
.build();
DeliveryStream.Builder.create(this, "Delivery Stream")
.destination(destination)
.build();
Logging can also be disabled:
Bucket bucket;
S3Bucket destination = S3Bucket.Builder.create(bucket)
.loggingConfig(new DisableLogging())
.build();
DeliveryStream.Builder.create(this, "Delivery Stream")
.destination(destination)
.build();
See: Monitoring using CloudWatch Logs in the Amazon Data Firehose Developer Guide.
Metrics
Amazon Data Firehose sends metrics to CloudWatch so that you can collect and analyze the performance of the delivery stream, including data delivery, data ingestion, data transformation, format conversion, API usage, encryption, and resource usage. You can then use CloudWatch alarms to alert you, for example, when data freshness (the age of the oldest record in the delivery stream) exceeds the buffering limit (indicating that data is not being delivered to your destination), or when the rate of incoming records exceeds the limit of records per second (indicating data is flowing into your delivery stream faster than it is configured to process).
CDK provides methods for accessing delivery stream metrics with default configuration,
such as metricIncomingBytes, and metricIncomingRecords (see IDeliveryStream
for a full list). CDK also provides a generic metric method that can be used to produce
metric configurations for any metric provided by Amazon Data Firehose; the configurations
are pre-populated with the correct dimensions for the delivery stream.
import software.amazon.awscdk.services.cloudwatch.*;
DeliveryStream deliveryStream;
// Alarm that triggers when the per-second average of incoming bytes exceeds 90% of the current service limit
MathExpression incomingBytesPercentOfLimit = MathExpression.Builder.create()
.expression("incomingBytes / 300 / bytePerSecLimit")
.usingMetrics(Map.of(
"incomingBytes", deliveryStream.metricIncomingBytes(MetricOptions.builder().statistic(Statistic.SUM).build()),
"bytePerSecLimit", deliveryStream.metric("BytesPerSecondLimit")))
.build();
Alarm.Builder.create(this, "Alarm")
.metric(incomingBytesPercentOfLimit)
.threshold(0.9)
.evaluationPeriods(3)
.build();
See: Monitoring Using CloudWatch Metrics in the Amazon Data Firehose Developer Guide.
Compression
Your data can automatically be compressed when it is delivered to S3 as either a final or an intermediary/backup destination. Supported compression formats are: gzip, Snappy, Hadoop-compatible Snappy, and ZIP, except for Redshift destinations, where Snappy (regardless of Hadoop-compatibility) and ZIP are not supported. By default, data is delivered to S3 without compression.
// Compress data delivered to S3 using Snappy
Bucket bucket;
S3Bucket s3Destination = S3Bucket.Builder.create(bucket)
.compression(Compression.SNAPPY)
.build();
DeliveryStream.Builder.create(this, "Delivery Stream")
.destination(s3Destination)
.build();
Buffering
Incoming data is buffered before it is delivered to the specified destination. The delivery stream will wait until the amount of incoming data has exceeded some threshold (the "buffer size") or until the time since the last data delivery occurred exceeds some threshold (the "buffer interval"), whichever happens first. You can configure these thresholds based on the capabilities of the destination and your use-case. By default, the buffer size is 5 MiB and the buffer interval is 5 minutes.
// Increase the buffer interval and size to 10 minutes and 8 MiB, respectively
Bucket bucket;
S3Bucket destination = S3Bucket.Builder.create(bucket)
.bufferingInterval(Duration.minutes(10))
.bufferingSize(Size.mebibytes(8))
.build();
DeliveryStream.Builder.create(this, "Delivery Stream")
.destination(destination)
.build();
See: Data Delivery Frequency in the Amazon Data Firehose Developer Guide.
Zero buffering, where Amazon Data Firehose stream can be configured to not buffer data before delivery, is supported by setting the "buffer interval" to 0.
// Setup zero buffering
Bucket bucket;
S3Bucket destination = S3Bucket.Builder.create(bucket)
.bufferingInterval(Duration.seconds(0))
.build();
DeliveryStream.Builder.create(this, "ZeroBufferDeliveryStream")
.destination(destination)
.build();
See: Buffering Hints.
Destination Encryption
Your data can be automatically encrypted when it is delivered to S3 as a final or an intermediary/backup destination. Amazon Data Firehose supports Amazon S3 server-side encryption with AWS Key Management Service (AWS KMS) for encrypting delivered data in Amazon S3. You can choose to not encrypt the data or to encrypt with a key from the list of AWS KMS keys that you own. For more information, see Protecting Data Using Server-Side Encryption with AWS KMS–Managed Keys (SSE-KMS). By default, encryption isn’t directly enabled on the delivery stream; instead, it uses the default encryption settings of the destination S3 bucket.
Bucket bucket;
Key key;
S3Bucket destination = S3Bucket.Builder.create(bucket)
.encryptionKey(key)
.build();
DeliveryStream.Builder.create(this, "Delivery Stream")
.destination(destination)
.build();
Backup
A delivery stream can be configured to back up data to S3 that it attempted to deliver to the configured destination. Backed up data can be all the data that the delivery stream attempted to deliver or just data that it failed to deliver (Redshift and S3 destinations can only back up all data). CDK can create a new S3 bucket where it will back up data, or you can provide a bucket where data will be backed up. You can also provide a prefix under which your backed-up data will be placed within the bucket. By default, source data is not backed up to S3.
// Enable backup of all source records (to an S3 bucket created by CDK).
Bucket bucket;
// Explicitly provide an S3 bucket to which all source records will be backed up.
Bucket backupBucket;
DeliveryStream.Builder.create(this, "Delivery Stream Backup All")
.destination(
S3Bucket.Builder.create(bucket)
.s3Backup(DestinationS3BackupProps.builder()
.mode(BackupMode.ALL)
.build())
.build())
.build();
DeliveryStream.Builder.create(this, "Delivery Stream Backup All Explicit Bucket")
.destination(
S3Bucket.Builder.create(bucket)
.s3Backup(DestinationS3BackupProps.builder()
.bucket(backupBucket)
.build())
.build())
.build();
// Explicitly provide an S3 prefix under which all source records will be backed up.
// Explicitly provide an S3 prefix under which all source records will be backed up.
DeliveryStream.Builder.create(this, "Delivery Stream Backup All Explicit Prefix")
.destination(
S3Bucket.Builder.create(bucket)
.s3Backup(DestinationS3BackupProps.builder()
.mode(BackupMode.ALL)
.dataOutputPrefix("mybackup")
.build())
.build())
.build();
If any Data Processing or Transformation is configured on your Delivery Stream, the source records will be backed up in their original format.
Data Processing/Transformation
Data can be transformed before being delivered to destinations. There are two types of data processing for delivery streams: record transformation with AWS Lambda, and record format conversion using a schema stored in an AWS Glue table. If both types of data processing are configured, then the Lambda transformation is performed first. By default, no data processing occurs.
This construct library currently only supports data transformation with AWS Lambda and some built-in data processors. See #15501 to track the status of adding support for record format conversion.
Data transformation with AWS Lambda
To transform the data, Amazon Data Firehose will call a Lambda function that you provide and deliver the data returned in place of the source record. The function must return a result that contains records in a specific format, including the following fields:
recordId-- the ID of the input record that corresponds the results.result-- the status of the transformation of the record: "Ok" (success), "Dropped" (not processed intentionally), or "ProcessingFailed" (not processed due to an error).data-- the transformed data, Base64-encoded.
The data is buffered up to 1 minute and up to 3 MiB by default before being sent to the
function, but can be configured using bufferInterval and bufferSize
in the processor configuration (see: Buffering). If the function invocation
fails due to a network timeout or because of hitting an invocation limit, the invocation
is retried 3 times by default, but can be configured using retries in the processor
configuration.
Bucket bucket;
// Provide a Lambda function that will transform records before delivery, with custom
// buffering and retry configuration
Function lambdaFunction = Function.Builder.create(this, "Processor")
.runtime(Runtime.NODEJS_LATEST)
.handler("index.handler")
.code(Code.fromAsset(join(__dirname, "process-records")))
.build();
LambdaFunctionProcessor lambdaProcessor = LambdaFunctionProcessor.Builder.create(lambdaFunction)
.bufferInterval(Duration.minutes(5))
.bufferSize(Size.mebibytes(5))
.retries(5)
.build();
S3Bucket s3Destination = S3Bucket.Builder.create(bucket)
.processors(List.of(lambdaProcessor))
.build();
DeliveryStream.Builder.create(this, "Delivery Stream")
.destination(s3Destination)
.build();
import path.*;
import software.amazon.awscdk.services.kinesisfirehose.*;
import software.amazon.awscdk.services.kms.*;
import software.amazon.awscdk.services.lambda.nodejs.*;
import software.amazon.awscdk.services.logs.*;
import software.amazon.awscdk.services.s3.*;
import software.amazon.awscdk.*;
import software.amazon.awscdk.integtests.alpha.AwsApiCall;
import software.amazon.awscdk.integtests.alpha.ExpectedResult;
import software.amazon.awscdk.integtests.alpha.IntegTest;
App app = App.Builder.create()
.postCliContext(Map.of(
"@aws-cdk/aws-lambda:useCdkManagedLogGroup", false))
.build();
Stack stack = new Stack(app, "aws-cdk-firehose-delivery-stream-s3-all-properties");
Bucket bucket = Bucket.Builder.create(stack, "FirehoseDeliveryStreamS3AllPropertiesBucket")
.removalPolicy(RemovalPolicy.DESTROY)
.autoDeleteObjects(true)
.build();
Bucket backupBucket = Bucket.Builder.create(stack, "FirehoseDeliveryStreamS3AllPropertiesBackupBucket")
.removalPolicy(RemovalPolicy.DESTROY)
.autoDeleteObjects(true)
.build();
LogGroup logGroup = LogGroup.Builder.create(stack, "LogGroup")
.removalPolicy(RemovalPolicy.DESTROY)
.build();
NodejsFunction dataProcessorFunction = NodejsFunction.Builder.create(stack, "DataProcessorFunction")
.entry(join(__dirname, "lambda-data-processor.js"))
.timeout(Duration.minutes(1))
.build();
LambdaFunctionProcessor processor = LambdaFunctionProcessor.Builder.create(dataProcessorFunction)
.bufferInterval(Duration.seconds(60))
.bufferSize(Size.mebibytes(1))
.retries(1)
.build();
Key key = Key.Builder.create(stack, "Key")
.removalPolicy(RemovalPolicy.DESTROY)
.build();
Key backupKey = Key.Builder.create(stack, "BackupKey")
.removalPolicy(RemovalPolicy.DESTROY)
.build();
DeliveryStream deliveryStream = DeliveryStream.Builder.create(stack, "DeliveryStream")
.destination(S3Bucket.Builder.create(bucket)
.loggingConfig(new EnableLogging(logGroup))
.processor(processor)
.compression(Compression.GZIP)
.dataOutputPrefix("regularPrefix")
.errorOutputPrefix("errorPrefix")
.fileExtension(".log.gz")
.timeZone(TimeZone.ASIA_TOKYO)
.bufferingInterval(Duration.seconds(60))
.bufferingSize(Size.mebibytes(1))
.encryptionKey(key)
.s3Backup(DestinationS3BackupProps.builder()
.mode(BackupMode.ALL)
.bucket(backupBucket)
.compression(Compression.ZIP)
.dataOutputPrefix("backupPrefix")
.errorOutputPrefix("backupErrorPrefix")
.bufferingInterval(Duration.seconds(60))
.bufferingSize(Size.mebibytes(1))
.encryptionKey(backupKey)
.build())
.build())
.build();
DeliveryStream.Builder.create(stack, "ZeroBufferingDeliveryStream")
.destination(S3Bucket.Builder.create(bucket)
.compression(Compression.GZIP)
.dataOutputPrefix("regularPrefix")
.errorOutputPrefix("errorPrefix")
.bufferingInterval(Duration.seconds(0))
.build())
.build();
IntegTest testCase = IntegTest.Builder.create(app, "integ-tests")
.testCases(List.of(stack))
.regions(List.of("us-east-1"))
.build();
testCase.assertions.awsApiCall("Firehose", "putRecord", Map.of(
"DeliveryStreamName", deliveryStream.getDeliveryStreamName(),
"Record", Map.of(
"Data", "testData123")));
IApiCall s3ApiCall = testCase.assertions.awsApiCall("S3", "listObjectsV2", Map.of(
"Bucket", bucket.bucketName,
"MaxKeys", 1)).expect(ExpectedResult.objectLike(Map.of(
"KeyCount", 1))).waitForAssertions(WaiterStateMachineOptions.builder()
.interval(Duration.seconds(30))
.totalTimeout(Duration.minutes(10))
.build());
if (s3ApiCall instanceof AwsApiCall && s3ApiCall.getWaiterProvider()) {
s3ApiCall.waiterProvider.addToRolePolicy(Map.of(
"Effect", "Allow",
"Action", List.of("s3:GetObject", "s3:ListBucket"),
"Resource", List.of("*")));
}
See: Data Transformation in the Amazon Data Firehose Developer Guide.
Add a new line delimiter when delivering data to Amazon S3
You can specify the AppendDelimiterToRecordProcessor built-in processor to add a new line delimiter between records in objects that are delivered to Amazon S3. This can be helpful for parsing objects in Amazon S3.
For details, see Use Amazon S3 bucket prefix to deliver data.
Bucket bucket;
S3Bucket s3Destination = S3Bucket.Builder.create(bucket)
.processors(List.of(
new AppendDelimiterToRecordProcessor()))
.build();
DeliveryStream.Builder.create(this, "Delivery Stream")
.destination(s3Destination)
.build();
Decompress and extract message of CloudWatch Logs
CloudWatch Logs events are sent to Firehose in compressed gzip format. If you want to deliver decompressed log events to Firehose destinations, you can use the DecompressionProcessor to automatically decompress CloudWatch Logs.
For details, see Send CloudWatch Logs to Firehose.
You may also needed to specify AppendDelimiterToRecordProcessor
because decompressed log events record has no trailing newline.
Bucket bucket;
S3Bucket s3Destination = S3Bucket.Builder.create(bucket)
.processors(List.of(
new DecompressionProcessor(),
new AppendDelimiterToRecordProcessor()))
.build();
DeliveryStream.Builder.create(this, "Delivery Stream")
.destination(s3Destination)
.build();
When you enable decompression, you have the option to also enable message extraction. When using message extraction, Firehose filters out all metadata, such as owner, loggroup, logstream, and others from the decompressed CloudWatch Logs records and delivers only the content inside the message fields.
Bucket bucket;
S3Bucket s3Destination = S3Bucket.Builder.create(bucket)
.processors(List.of(
new DecompressionProcessor(),
CloudWatchLogProcessor.Builder.create().dataMessageExtraction(true).build()))
.build();
DeliveryStream.Builder.create(this, "Delivery Stream")
.destination(s3Destination)
.build();
Specifying an IAM role
The DeliveryStream class automatically creates IAM service roles with all the minimum
necessary permissions for Amazon Data Firehose to access the resources referenced by your
delivery stream. One service role is created for the delivery stream that allows Amazon
Data Firehose to read from a Kinesis data stream (if one is configured as the delivery
stream source) and for server-side encryption. Note that if the DeliveryStream is created
without specifying a source or encryptionKey, this role is not created as it is not needed.
Another service role is created for each destination, which gives Amazon Data Firehose write access to the destination resource, as well as the ability to invoke data transformers and read schemas for record format conversion. If you wish, you may specify your own IAM role for either the delivery stream or the destination service role, or both. It must have the correct trust policy (it must allow Amazon Data Firehose to assume it) or delivery stream creation or data delivery will fail. Other required permissions to destination resources, encryption keys, etc., will be provided automatically.
// Specify the roles created above when defining the destination and delivery stream.
Bucket bucket;
// Create service roles for the delivery stream and destination.
// These can be used for other purposes and granted access to different resources.
// They must include the Amazon Data Firehose service principal in their trust policies.
// Two separate roles are shown below, but the same role can be used for both purposes.
Role deliveryStreamRole = Role.Builder.create(this, "Delivery Stream Role")
.assumedBy(new ServicePrincipal("firehose.amazonaws.com"))
.build();
Role destinationRole = Role.Builder.create(this, "Destination Role")
.assumedBy(new ServicePrincipal("firehose.amazonaws.com"))
.build();
S3Bucket destination = S3Bucket.Builder.create(bucket).role(destinationRole).build();
DeliveryStream.Builder.create(this, "Delivery Stream")
.destination(destination)
.role(deliveryStreamRole)
.build();
See Controlling Access in the Amazon Data Firehose Developer Guide.
Granting application access to a delivery stream
IAM roles, users or groups which need to be able to work with delivery streams should be granted IAM permissions.
Any object that implements the IGrantable interface (i.e., has an associated principal)
can be granted permissions to a delivery stream by calling:
grantPutRecords(principal)- grants the principal the ability to put records onto the delivery streamgrant(principal, ...actions)- grants the principal permission to a custom set of actions
// Give the role permissions to write data to the delivery stream
DeliveryStream deliveryStream;
Role lambdaRole = Role.Builder.create(this, "Role")
.assumedBy(new ServicePrincipal("lambda.amazonaws.com"))
.build();
deliveryStream.grantPutRecords(lambdaRole);
The following write permissions are provided to a service principal by the
grantPutRecords() method:
firehose:PutRecordfirehose:PutRecordBatch
Granting a delivery stream access to a resource
Conversely to the above, Amazon Data Firehose requires permissions in order for delivery
streams to interact with resources that you own. For example, if an S3 bucket is specified
as a destination of a delivery stream, the delivery stream must be granted permissions to
put and get objects from the bucket. When using the built-in AWS service destinations, the CDK grants the
permissions automatically. However, custom or third-party destinations may require custom
permissions. In this case, use the delivery stream as an IGrantable, as follows:
DeliveryStream deliveryStream;
Function fn = Function.Builder.create(this, "Function")
.code(Code.fromInline("exports.handler = (event) => {}"))
.runtime(Runtime.NODEJS_LATEST)
.handler("index.handler")
.build();
fn.grantInvoke(deliveryStream);
-
ClassDescriptionThe data processor to append new line delimiter to each record.Options for S3 record backup of a delivery stream.The
AWS::KinesisFirehose::DeliveryStreamresource specifies an Amazon Kinesis Data Firehose (Kinesis Data Firehose) delivery stream that delivers real-time streaming data to an Amazon Simple Storage Service (Amazon S3), Amazon Redshift, or Amazon Elasticsearch Service (Amazon ES) destination.Describes the buffering to perform before delivering data to the Serverless offering for Amazon OpenSearch Service destination.An implementation forCfnDeliveryStream.AmazonOpenSearchServerlessBufferingHintsPropertyDescribes the configuration of a destination in the Serverless offering for Amazon OpenSearch Service.An implementation forCfnDeliveryStream.AmazonOpenSearchServerlessDestinationConfigurationPropertyConfigures retry behavior in case Firehose is unable to deliver documents to the Serverless offering for Amazon OpenSearch Service.An implementation forCfnDeliveryStream.AmazonOpenSearchServerlessRetryOptionsPropertyDescribes the buffering to perform before delivering data to the Amazon OpenSearch Service destination.An implementation forCfnDeliveryStream.AmazonopensearchserviceBufferingHintsPropertyDescribes the configuration of a destination in Amazon OpenSearch Service.An implementation forCfnDeliveryStream.AmazonopensearchserviceDestinationConfigurationPropertyConfigures retry behavior in case Kinesis Data Firehose is unable to deliver documents to Amazon OpenSearch Service.An implementation forCfnDeliveryStream.AmazonopensearchserviceRetryOptionsPropertyThe authentication configuration of the Amazon MSK cluster.A builder forCfnDeliveryStream.AuthenticationConfigurationPropertyAn implementation forCfnDeliveryStream.AuthenticationConfigurationPropertyTheBufferingHintsproperty type specifies how Amazon Kinesis Data Firehose (Kinesis Data Firehose) buffers incoming data before delivering it to the destination.A builder forCfnDeliveryStream.BufferingHintsPropertyAn implementation forCfnDeliveryStream.BufferingHintsPropertyA fluent builder forCfnDeliveryStream.Describes the containers where the destination Apache Iceberg Tables are persisted.A builder forCfnDeliveryStream.CatalogConfigurationPropertyAn implementation forCfnDeliveryStream.CatalogConfigurationPropertyTheCloudWatchLoggingOptionsproperty type specifies Amazon CloudWatch Logs (CloudWatch Logs) logging options that Amazon Kinesis Data Firehose (Kinesis Data Firehose) uses for the delivery stream.A builder forCfnDeliveryStream.CloudWatchLoggingOptionsPropertyAn implementation forCfnDeliveryStream.CloudWatchLoggingOptionsPropertyTheCopyCommandproperty type configures the Amazon RedshiftCOPYcommand that Amazon Kinesis Data Firehose (Kinesis Data Firehose) uses to load data into an Amazon Redshift cluster from an Amazon S3 bucket.A builder forCfnDeliveryStream.CopyCommandPropertyAn implementation forCfnDeliveryStream.CopyCommandPropertyExample:A builder forCfnDeliveryStream.DatabaseColumnsPropertyAn implementation forCfnDeliveryStream.DatabaseColumnsPropertyThe structure to configure the authentication methods for Firehose to connect to source database endpoint.An implementation forCfnDeliveryStream.DatabaseSourceAuthenticationConfigurationPropertyThe top level object for configuring streams with database as a source.A builder forCfnDeliveryStream.DatabaseSourceConfigurationPropertyAn implementation forCfnDeliveryStream.DatabaseSourceConfigurationPropertyThe structure for details of the VPC Endpoint Service which Firehose uses to create a PrivateLink to the database.A builder forCfnDeliveryStream.DatabaseSourceVPCConfigurationPropertyAn implementation forCfnDeliveryStream.DatabaseSourceVPCConfigurationPropertyExample:A builder forCfnDeliveryStream.DatabasesPropertyAn implementation forCfnDeliveryStream.DatabasesPropertyExample:A builder forCfnDeliveryStream.DatabaseTablesPropertyAn implementation forCfnDeliveryStream.DatabaseTablesPropertySpecifies that you want Kinesis Data Firehose to convert data from the JSON format to the Parquet or ORC format before writing it to Amazon S3.An implementation forCfnDeliveryStream.DataFormatConversionConfigurationPropertySpecifies the type and Amazon Resource Name (ARN) of the CMK to use for Server-Side Encryption (SSE).An implementation forCfnDeliveryStream.DeliveryStreamEncryptionConfigurationInputPropertyThe deserializer you want Kinesis Data Firehose to use for converting the input data from JSON.A builder forCfnDeliveryStream.DeserializerPropertyAn implementation forCfnDeliveryStream.DeserializerPropertyDescribes the configuration of a destination in Apache Iceberg Tables.A builder forCfnDeliveryStream.DestinationTableConfigurationPropertyAn implementation forCfnDeliveryStream.DestinationTableConfigurationPropertyThe structure that configures parameters such asThroughputHintInMBsfor a stream configured with Direct PUT as a source.A builder forCfnDeliveryStream.DirectPutSourceConfigurationPropertyAn implementation forCfnDeliveryStream.DirectPutSourceConfigurationPropertyIndicates the method for setting up document ID.A builder forCfnDeliveryStream.DocumentIdOptionsPropertyAn implementation forCfnDeliveryStream.DocumentIdOptionsPropertyTheDynamicPartitioningConfigurationproperty type specifies the configuration of the dynamic partitioning mechanism that creates targeted data sets from the streaming data by partitioning it based on partition keys.An implementation forCfnDeliveryStream.DynamicPartitioningConfigurationPropertyTheElasticsearchBufferingHintsproperty type specifies how Amazon Kinesis Data Firehose (Kinesis Data Firehose) buffers incoming data while delivering it to the destination.A builder forCfnDeliveryStream.ElasticsearchBufferingHintsPropertyAn implementation forCfnDeliveryStream.ElasticsearchBufferingHintsPropertyTheElasticsearchDestinationConfigurationproperty type specifies an Amazon Elasticsearch Service (Amazon ES) domain that Amazon Kinesis Data Firehose (Kinesis Data Firehose) delivers data to.An implementation forCfnDeliveryStream.ElasticsearchDestinationConfigurationPropertyTheElasticsearchRetryOptionsproperty type configures the retry behavior for when Amazon Kinesis Data Firehose (Kinesis Data Firehose) can't deliver data to Amazon Elasticsearch Service (Amazon ES).A builder forCfnDeliveryStream.ElasticsearchRetryOptionsPropertyAn implementation forCfnDeliveryStream.ElasticsearchRetryOptionsPropertyTheEncryptionConfigurationproperty type specifies the encryption settings that Amazon Kinesis Data Firehose (Kinesis Data Firehose) uses when delivering data to Amazon Simple Storage Service (Amazon S3).A builder forCfnDeliveryStream.EncryptionConfigurationPropertyAn implementation forCfnDeliveryStream.EncryptionConfigurationPropertyTheExtendedS3DestinationConfigurationproperty type configures an Amazon S3 destination for an Amazon Kinesis Data Firehose delivery stream.An implementation forCfnDeliveryStream.ExtendedS3DestinationConfigurationPropertyThe native Hive / HCatalog JsonSerDe.A builder forCfnDeliveryStream.HiveJsonSerDePropertyAn implementation forCfnDeliveryStream.HiveJsonSerDePropertyDescribes the metadata that's delivered to the specified HTTP endpoint destination.A builder forCfnDeliveryStream.HttpEndpointCommonAttributePropertyAn implementation forCfnDeliveryStream.HttpEndpointCommonAttributePropertyDescribes the configuration of the HTTP endpoint to which Kinesis Firehose delivers data.A builder forCfnDeliveryStream.HttpEndpointConfigurationPropertyAn implementation forCfnDeliveryStream.HttpEndpointConfigurationPropertyDescribes the configuration of the HTTP endpoint destination.An implementation forCfnDeliveryStream.HttpEndpointDestinationConfigurationPropertyThe configuration of the HTTP endpoint request.An implementation forCfnDeliveryStream.HttpEndpointRequestConfigurationPropertySpecifies the destination configure settings for Apache Iceberg Table.An implementation forCfnDeliveryStream.IcebergDestinationConfigurationPropertySpecifies the deserializer you want to use to convert the format of the input data.A builder forCfnDeliveryStream.InputFormatConfigurationPropertyAn implementation forCfnDeliveryStream.InputFormatConfigurationPropertyTheKinesisStreamSourceConfigurationproperty type specifies the stream and role Amazon Resource Names (ARNs) for a Kinesis stream used as the source for a delivery stream.An implementation forCfnDeliveryStream.KinesisStreamSourceConfigurationPropertyTheKMSEncryptionConfigproperty type specifies the AWS Key Management Service ( AWS KMS) encryption key that Amazon Simple Storage Service (Amazon S3) uses to encrypt data delivered by the Amazon Kinesis Data Firehose (Kinesis Data Firehose) stream.A builder forCfnDeliveryStream.KMSEncryptionConfigPropertyAn implementation forCfnDeliveryStream.KMSEncryptionConfigPropertyThe configuration for the Amazon MSK cluster to be used as the source for a delivery stream.A builder forCfnDeliveryStream.MSKSourceConfigurationPropertyAn implementation forCfnDeliveryStream.MSKSourceConfigurationPropertyThe OpenX SerDe.A builder forCfnDeliveryStream.OpenXJsonSerDePropertyAn implementation forCfnDeliveryStream.OpenXJsonSerDePropertyA serializer to use for converting data to the ORC format before storing it in Amazon S3.A builder forCfnDeliveryStream.OrcSerDePropertyAn implementation forCfnDeliveryStream.OrcSerDePropertySpecifies the serializer that you want Firehose to use to convert the format of your data before it writes it to Amazon S3.A builder forCfnDeliveryStream.OutputFormatConfigurationPropertyAn implementation forCfnDeliveryStream.OutputFormatConfigurationPropertyA serializer to use for converting data to the Parquet format before storing it in Amazon S3.A builder forCfnDeliveryStream.ParquetSerDePropertyAn implementation forCfnDeliveryStream.ParquetSerDePropertyRepresents a single field in aPartitionSpec.A builder forCfnDeliveryStream.PartitionFieldPropertyAn implementation forCfnDeliveryStream.PartitionFieldPropertyRepresents how to produce partition data for a table.A builder forCfnDeliveryStream.PartitionSpecPropertyAn implementation forCfnDeliveryStream.PartitionSpecPropertyTheProcessingConfigurationproperty configures data processing for an Amazon Kinesis Data Firehose delivery stream.A builder forCfnDeliveryStream.ProcessingConfigurationPropertyAn implementation forCfnDeliveryStream.ProcessingConfigurationPropertyTheProcessorParameterproperty specifies a processor parameter in a data processor for an Amazon Kinesis Data Firehose delivery stream.A builder forCfnDeliveryStream.ProcessorParameterPropertyAn implementation forCfnDeliveryStream.ProcessorParameterPropertyTheProcessorproperty specifies a data processor for an Amazon Kinesis Data Firehose delivery stream.A builder forCfnDeliveryStream.ProcessorPropertyAn implementation forCfnDeliveryStream.ProcessorPropertyTheRedshiftDestinationConfigurationproperty type specifies an Amazon Redshift cluster to which Amazon Kinesis Data Firehose (Kinesis Data Firehose) delivers data.An implementation forCfnDeliveryStream.RedshiftDestinationConfigurationPropertyConfigures retry behavior in case Firehose is unable to deliver documents to Amazon Redshift.A builder forCfnDeliveryStream.RedshiftRetryOptionsPropertyAn implementation forCfnDeliveryStream.RedshiftRetryOptionsPropertyDescribes the retry behavior in case Kinesis Data Firehose is unable to deliver data to the specified HTTP endpoint destination, or if it doesn't receive a valid acknowledgment of receipt from the specified HTTP endpoint destination.A builder forCfnDeliveryStream.RetryOptionsPropertyAn implementation forCfnDeliveryStream.RetryOptionsPropertyTheS3DestinationConfigurationproperty type specifies an Amazon Simple Storage Service (Amazon S3) destination to which Amazon Kinesis Data Firehose (Kinesis Data Firehose) delivers data.A builder forCfnDeliveryStream.S3DestinationConfigurationPropertyAn implementation forCfnDeliveryStream.S3DestinationConfigurationPropertySpecifies the schema to which you want Firehose to configure your data before it writes it to Amazon S3.A builder forCfnDeliveryStream.SchemaConfigurationPropertyAn implementation forCfnDeliveryStream.SchemaConfigurationPropertyThe configuration to enable schema evolution.A builder forCfnDeliveryStream.SchemaEvolutionConfigurationPropertyAn implementation forCfnDeliveryStream.SchemaEvolutionConfigurationPropertyThe structure that defines how Firehose accesses the secret.A builder forCfnDeliveryStream.SecretsManagerConfigurationPropertyAn implementation forCfnDeliveryStream.SecretsManagerConfigurationPropertyThe serializer that you want Firehose to use to convert data to the target format before writing it to Amazon S3.A builder forCfnDeliveryStream.SerializerPropertyAn implementation forCfnDeliveryStream.SerializerPropertyDescribes the buffering to perform before delivering data to the Snowflake destination.A builder forCfnDeliveryStream.SnowflakeBufferingHintsPropertyAn implementation forCfnDeliveryStream.SnowflakeBufferingHintsPropertyConfigure Snowflake destination.An implementation forCfnDeliveryStream.SnowflakeDestinationConfigurationPropertySpecify how long Firehose retries sending data to the New Relic HTTP endpoint.A builder forCfnDeliveryStream.SnowflakeRetryOptionsPropertyAn implementation forCfnDeliveryStream.SnowflakeRetryOptionsPropertyOptionally configure a Snowflake role.A builder forCfnDeliveryStream.SnowflakeRoleConfigurationPropertyAn implementation forCfnDeliveryStream.SnowflakeRoleConfigurationPropertyConfigure a Snowflake VPC.A builder forCfnDeliveryStream.SnowflakeVpcConfigurationPropertyAn implementation forCfnDeliveryStream.SnowflakeVpcConfigurationPropertyThe buffering options.A builder forCfnDeliveryStream.SplunkBufferingHintsPropertyAn implementation forCfnDeliveryStream.SplunkBufferingHintsPropertyTheSplunkDestinationConfigurationproperty type specifies the configuration of a destination in Splunk for a Kinesis Data Firehose delivery stream.A builder forCfnDeliveryStream.SplunkDestinationConfigurationPropertyAn implementation forCfnDeliveryStream.SplunkDestinationConfigurationPropertyTheSplunkRetryOptionsproperty type specifies retry behavior in case Kinesis Data Firehose is unable to deliver documents to Splunk or if it doesn't receive an acknowledgment from Splunk.A builder forCfnDeliveryStream.SplunkRetryOptionsPropertyAn implementation forCfnDeliveryStream.SplunkRetryOptionsPropertyThe configuration to enable automatic table creation.A builder forCfnDeliveryStream.TableCreationConfigurationPropertyAn implementation forCfnDeliveryStream.TableCreationConfigurationPropertyThe details of the VPC of the Amazon ES destination.A builder forCfnDeliveryStream.VpcConfigurationPropertyAn implementation forCfnDeliveryStream.VpcConfigurationPropertyProperties for defining aCfnDeliveryStream.A builder forCfnDeliveryStreamPropsAn implementation forCfnDeliveryStreamPropsThe data processor to extract message after decompression of CloudWatch Logs.A fluent builder forCloudWatchLogProcessor.Options for CloudWatchLogProcessor.A builder forCloudWatchLogProcessorOptionsAn implementation forCloudWatchLogProcessorOptionsGeneric properties for defining a delivery stream destination.A builder forCommonDestinationPropsAn implementation forCommonDestinationPropsCommon properties for defining a backup, intermediary, or final S3 destination for a Amazon Data Firehose delivery stream.A builder forCommonDestinationS3PropsAn implementation forCommonDestinationS3PropsPossible compression options Amazon Data Firehose can use to compress data on delivery.Props for specifying data format conversion for Firehose.A builder forDataFormatConversionPropsAn implementation forDataFormatConversionPropsOptions when binding a DataProcessor to a delivery stream destination.A builder forDataProcessorBindOptionsAn implementation forDataProcessorBindOptionsThe full configuration of a data processor.A builder forDataProcessorConfigAn implementation forDataProcessorConfigThe key-value pair that identifies the underlying processor resource.A builder forDataProcessorIdentifierAn implementation forDataProcessorIdentifierConfigure the LambdaFunctionProcessor.A builder forDataProcessorPropsAn implementation forDataProcessorPropsThe data processor to decompress CloudWatch Logs.A fluent builder forDecompressionProcessor.Compression format for DecompressionProcessor.Options for DecompressionProcessor.A builder forDecompressionProcessorOptionsAn implementation forDecompressionProcessorOptionsCreate a Amazon Data Firehose delivery stream.A fluent builder forDeliveryStream.A full specification of a delivery stream that can be used to import it fluently into the CDK application.A builder forDeliveryStreamAttributesAn implementation forDeliveryStreamAttributesCollection of grant methods for a IDeliveryStreamRef.Properties for a new delivery stream.A builder forDeliveryStreamPropsAn implementation forDeliveryStreamPropsOptions when binding a destination to a delivery stream.A builder forDestinationBindOptionsAn implementation forDestinationBindOptionsAn Amazon Data Firehose delivery stream destination configuration.A builder forDestinationConfigAn implementation forDestinationConfigProperties for defining an S3 backup destination.A builder forDestinationS3BackupPropsAn implementation forDestinationS3BackupPropsDisables logging for error logs.Enables logging for error logs with an optional custom CloudWatch log group.This class specifies properties for Hive JSON input format for record format conversion.A fluent builder forHiveJsonInputFormat.Props for Hive JSON input format for data record format conversion.A builder forHiveJsonInputFormatPropsAn implementation forHiveJsonInputFormatPropsA data processor that Amazon Data Firehose will call to transform records before delivering data.Internal default implementation forIDataProcessor.A proxy class which represents a concrete javascript instance of this type.Represents an Amazon Data Firehose delivery stream.Internal default implementation forIDeliveryStream.A proxy class which represents a concrete javascript instance of this type.An Amazon Data Firehose delivery stream destination.Internal default implementation forIDestination.A proxy class which represents a concrete javascript instance of this type.An input format to be used in Firehose record format conversion.Internal default implementation forIInputFormat.A proxy class which represents a concrete javascript instance of this type.Configuration interface for logging errors when data transformation or delivery fails.Internal default implementation forILoggingConfig.A proxy class which represents a concrete javascript instance of this type.Represents possible input formats when performing record data conversion.An output format to be used in Firehose record format conversion.Internal default implementation forIOutputFormat.A proxy class which represents a concrete javascript instance of this type.An interface for defining a source that can be used in an Amazon Data Firehose delivery stream.Internal default implementation forISource.A proxy class which represents a concrete javascript instance of this type.An Amazon Data Firehose delivery stream source.Use an AWS Lambda function to transform records.A fluent builder forLambdaFunctionProcessor.This class specifies properties for OpenX JSON input format for record format conversion.A fluent builder forOpenXJsonInputFormat.Props for OpenX JSON input format for data record format conversion.A builder forOpenXJsonInputFormatPropsAn implementation forOpenXJsonInputFormatPropsPossible compression options available for ORC OutputFormat.The available WriterVersions for ORC output format.This class specifies properties for ORC output format for record format conversion.A fluent builder forOrcOutputFormat.Props for ORC output format for data record format conversion.A builder forOrcOutputFormatPropsAn implementation forOrcOutputFormatPropsRepresents possible output formats when performing record data conversion.Possible compression options available for Parquet OutputFormat.This class specifies properties for Parquet output format for record format conversion.A fluent builder forParquetOutputFormat.Props for Parquet output format for data record format conversion.A builder forParquetOutputFormatPropsAn implementation forParquetOutputFormatPropsThe available WriterVersions for Parquet output format.An S3 bucket destination for data from an Amazon Data Firehose delivery stream.A fluent builder forS3Bucket.Props for defining an S3 destination of an Amazon Data Firehose delivery stream.A builder forS3BucketPropsAn implementation forS3BucketPropsRepresents a schema configuration for Firehose S3 data record format conversion.Options when binding a SchemaConfig to a Destination.A builder forSchemaConfigurationBindOptionsAn implementation forSchemaConfigurationBindOptionsOptions for creating a Schema for record format conversion from aglue.CfnTable.A builder forSchemaConfigurationFromCfnTablePropsAn implementation forSchemaConfigurationFromCfnTablePropsRepresents server-side encryption for an Amazon Firehose Delivery Stream.Options for server-side encryption of a delivery stream.Value class that wraps a Joda Time format string.