Understand custom prefixes for Amazon S3 objects
Objects delivered to Amazon S3 follow the name format of <evaluated prefix><suffix>.
You can specify a custom prefix that includes expressions that are evaluated at runtime. A custom prefix that you specify overrides the default prefix of yyyy/MM/dd/HH.
You can use expressions of the following forms in your custom prefix:
!{namespace:value}, where namespace can be one of the following, as explained in the following sections.
- firehose
- timestamp
- partitionKeyFromQuery
- partitionKeyFromLambda
If a prefix ends with a slash, it appears as a folder in the Amazon S3 bucket. For more information, see Amazon S3 Object Name Format in the Amazon Data Firehose Developer Guide.
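To make the expression syntax concrete, the following Python sketch extracts the !{namespace:value} expressions from a prefix string. It is only an illustration, not how Firehose parses prefixes: the regular expression and the helper names find_expressions and looks_like_folder are assumptions introduced here.

```python
import re

# Assumption: an expression is "!{", a namespace, ":", a value, then "}".
EXPRESSION = re.compile(r"!\{(?P<namespace>[^:{}]+):(?P<value>[^{}]*)\}")

# The four namespaces listed in this section.
NAMESPACES = {
    "firehose",
    "timestamp",
    "partitionKeyFromQuery",
    "partitionKeyFromLambda",
}


def find_expressions(prefix):
    """Return (namespace, value) pairs in the order they appear in prefix."""
    found = []
    for match in EXPRESSION.finditer(prefix):
        namespace = match.group("namespace")
        if namespace not in NAMESPACES:
            raise ValueError(f"unknown namespace: {namespace}")
        found.append((namespace, match.group("value")))
    return found


def looks_like_folder(prefix):
    """A prefix that ends with a slash appears as a folder in the bucket."""
    return prefix.endswith("/")


example = "myPrefix/result=!{firehose:error-output-type}/!{timestamp:yyyy/MM/dd}"
print(find_expressions(example))
print(looks_like_folder("myPrefix/"))
```

For the example prefix, the sketch reports one firehose expression followed by one timestamp expression.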
timestamp namespace
Valid values for this namespace are strings that are valid Java DateTimeFormatter patterns. For example, in the year 2018, the expression !{timestamp:yyyy} evaluates to 2018.
When evaluating timestamps, Firehose uses the approximate arrival timestamp of the oldest record that's contained in the Amazon S3 object being written.
By default, the timestamp is in UTC, but you can specify a different time zone. For example, to use Japan Standard Time instead of UTC, set the time zone to Asia/Tokyo in the AWS Management Console or in the API parameter setting (CustomTimeZone). For the list of supported time zones, see Amazon S3 Object Name Format.
If you use the timestamp namespace more than once in the same prefix
expression, every instance evaluates to the same instant in time.
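The following Python sketch illustrates these timestamp rules under simplifying assumptions. It handles only the pattern letters yyyy, MM, dd, and HH (Java's DateTimeFormatter supports many more, and quoted literals are not handled), and it uses a fixed UTC+9 offset as a stand-in for Asia/Tokyo. The function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: only these four Java pattern tokens are supported in this sketch.
JAVA_TO_STRFTIME = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H"}
TOKEN = re.compile("|".join(JAVA_TO_STRFTIME))
TIMESTAMP_EXPR = re.compile(r"!\{timestamp:([^{}]*)\}")


def format_java_subset(pattern, instant):
    """Format instant using the small subset of pattern tokens above."""
    return TOKEN.sub(
        lambda m: instant.strftime(JAVA_TO_STRFTIME[m.group()]), pattern
    )


def evaluate_timestamps(prefix, oldest_arrival, tz=timezone.utc):
    """Replace every timestamp expression using one and the same instant."""
    local = oldest_arrival.astimezone(tz)
    return TIMESTAMP_EXPR.sub(
        lambda m: format_java_subset(m.group(1), local), prefix
    )


# Approximate arrival time of the oldest record in the object (made-up value).
arrival = datetime(2018, 8, 27, 22, 30, tzinfo=timezone.utc)
prefix = "logs/!{timestamp:yyyy/MM/dd}/hour=!{timestamp:HH}/"

print(evaluate_timestamps(prefix, arrival))
# Fixed offset used as a stand-in for the Asia/Tokyo time zone.
print(evaluate_timestamps(prefix, arrival, timezone(timedelta(hours=9))))
```

With the made-up arrival time above, the UTC result is logs/2018/08/27/hour=22/ and the UTC+9 result is logs/2018/08/28/hour=07/. Both timestamp expressions in the prefix are derived from the same instant.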
firehose namespace
There are two values that you can use with this namespace:
error-output-type and random-string. The following table
explains how to use them.
| Conversion | Description | Example input | Example output | Notes |
|---|---|---|---|---|
| error-output-type | Evaluates to one of the following strings, depending on the configuration of your Firehose stream and the reason for the failure: {processing-failed, AmazonOpenSearchService-failed, splunk-failed, format-conversion-failed, http-endpoint-failed}. If you use it more than once in the same expression, every instance evaluates to the same error string. | myPrefix/result=!{firehose:error-output-type}/!{timestamp:yyyy/MM/dd} | myPrefix/result=processing-failed/2018/08/03 | The error-output-type value can only be used in the ErrorOutputPrefix field. |
| random-string | Evaluates to a random string of 11 characters. If you use it more than once in the same expression, every instance evaluates to a new random string. | myPrefix/!{firehose:random-string}/ | myPrefix/046b6c7f-0b/ | You can use it with both prefix types. You can place it at the beginning of the format string to get a randomized prefix, which is sometimes necessary for attaining extremely high throughput with Amazon S3. |
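The difference between the two values (one error string shared by every instance, a fresh random string per instance) can be sketched in Python as follows. How Firehose actually generates the random string is not documented here; taking the first 11 characters of a UUID is an assumption chosen only because it matches the shape of the example output. The function names are hypothetical.

```python
import re
import uuid

FIREHOSE_EXPR = re.compile(r"!\{firehose:(error-output-type|random-string)\}")


def random_string():
    # Assumption: the first 11 characters of a UUID4, which has the same
    # shape as the documented example "046b6c7f-0b".
    return str(uuid.uuid4())[:11]


def evaluate_firehose(prefix, error_type=None):
    """Replace firehose expressions in prefix.

    error_type is the failure string (for example "processing-failed");
    leave it as None when evaluating a Prefix rather than an ErrorOutputPrefix.
    """
    def replace(match):
        if match.group(1) == "random-string":
            return random_string()  # a new string for every instance
        if error_type is None:
            raise ValueError(
                "error-output-type can only be used in ErrorOutputPrefix"
            )
        return error_type  # the same string for every instance

    return FIREHOSE_EXPR.sub(replace, prefix)


print(evaluate_firehose("!{firehose:random-string}/myPrefix/"))
print(evaluate_firehose(
    "myPrefix/result=!{firehose:error-output-type}/", "processing-failed"
))
```

The first call prints a different random prefix on every run. The second prints myPrefix/result=processing-failed/.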
partitionKeyFromLambda and partitionKeyFromQuery namespaces
For dynamic partitioning, you must use the following expression format in your
S3 bucket prefix: !{namespace:value}, where namespace can be either
partitionKeyFromQuery or partitionKeyFromLambda, or both.
If you are using inline parsing to create the partitioning keys for your source data,
you must specify an S3 bucket prefix value that consists of expressions specified in the
following format: "partitionKeyFromQuery:keyID". If you are using an AWS
Lambda function to create partitioning keys for your source data, you must specify an S3
bucket prefix value that consists of expressions specified in the following format:
"partitionKeyFromLambda:keyID". For more information, see "Choose
Amazon S3 for Your Destination" in Creating an Amazon Firehose stream.
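As an illustration of how partitioning keys might be substituted into a prefix, the Python sketch below replaces each !{partitionKeyFromQuery:keyID} or !{partitionKeyFromLambda:keyID} expression with a value looked up by keyID. The dictionaries, key names, and the function evaluate_partition_keys are hypothetical; the sketch does not perform inline parsing or invoke a Lambda function.

```python
import re

PARTITION_EXPR = re.compile(
    r"!\{(partitionKeyFromQuery|partitionKeyFromLambda):([^{}]+)\}"
)


def evaluate_partition_keys(prefix, query_keys=None, lambda_keys=None):
    """Substitute partitioning key values into prefix.

    query_keys and lambda_keys map keyID to the value extracted for one
    record by inline parsing or by a Lambda function, respectively.
    """
    sources = {
        "partitionKeyFromQuery": query_keys or {},
        "partitionKeyFromLambda": lambda_keys or {},
    }

    def replace(match):
        namespace, key_id = match.group(1), match.group(2)
        if key_id not in sources[namespace]:
            raise KeyError(f"no value for {namespace}:{key_id}")
        return str(sources[namespace][key_id])

    return PARTITION_EXPR.sub(replace, prefix)


# Hypothetical key names and values, for illustration only.
prefix = (
    "customer_id=!{partitionKeyFromQuery:customer_id}/"
    "region=!{partitionKeyFromLambda:region}/"
)
print(evaluate_partition_keys(
    prefix,
    query_keys={"customer_id": "1234"},
    lambda_keys={"region": "eu"},
))
```

For the hypothetical values shown, the result is customer_id=1234/region=eu/.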
Semantic rules
The following rules apply to Prefix and ErrorOutputPrefix
expressions.
- For the timestamp namespace, any character that isn't in single quotes is evaluated. In other words, any string escaped with single quotes in the value field is taken literally.
- If you specify a prefix that doesn't contain a timestamp namespace expression, Firehose appends the expression !{timestamp:yyyy/MM/dd/HH/} to the value in the Prefix field.
- The sequence !{ can only appear in !{namespace:value} expressions.
- ErrorOutputPrefix can be null only if Prefix contains no expressions. In this case, Prefix evaluates to <specified-prefix>yyyy/MM/DDD/HH/ and ErrorOutputPrefix evaluates to <specified-prefix><error-output-type>yyyy/MM/DDD/HH/. DDD represents the day of the year.
- If you specify an expression for ErrorOutputPrefix, you must include at least one instance of !{firehose:error-output-type}.
- Prefix can't contain !{firehose:error-output-type}.
- Neither Prefix nor ErrorOutputPrefix can be greater than 512 characters after they're evaluated.
- If the destination is Amazon Redshift, Prefix must not contain expressions and ErrorOutputPrefix must be null.
- When the destination is Amazon OpenSearch Service or Splunk, and no ErrorOutputPrefix is specified, Firehose uses the Prefix field for failed records.
- When the destination is Amazon S3, the Prefix and ErrorOutputPrefix in the Amazon S3 destination configuration are used for successful records and failed records, respectively. If you use the AWS CLI or the API, you can use ExtendedS3DestinationConfiguration to specify an Amazon S3 backup configuration with its own Prefix and ErrorOutputPrefix.
- When you use the AWS Management Console and set the destination to Amazon S3, Firehose uses the Prefix and ErrorOutputPrefix in the destination configuration for successful records and failed records, respectively. If you specify a prefix using expressions, you must specify the error prefix including !{firehose:error-output-type}.
- When you use ExtendedS3DestinationConfiguration with the AWS CLI, the API, or AWS CloudFormation, if you specify an S3BackupConfiguration, Firehose doesn't provide a default ErrorOutputPrefix.
- You cannot use partitionKeyFromLambda and partitionKeyFromQuery namespaces when creating ErrorOutputPrefix expressions.
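Several of these rules can be checked before a configuration is submitted. The Python sketch below covers a subset of them (it ignores the destination-specific rules and quoted literals) and is not the validation that Firehose itself performs. The function names and message texts are assumptions introduced for illustration.

```python
import re

EXPRESSION = re.compile(r"!\{([^:{}]+):([^{}]*)\}")
ERROR_TYPE = "!{firehose:error-output-type}"
DEFAULT_TIMESTAMP = "!{timestamp:yyyy/MM/dd/HH/}"
MAX_EVALUATED_LENGTH = 512


def check_prefixes(prefix, error_output_prefix=None):
    """Return a list of rule violations (an empty list means none found)."""
    problems = []
    if ERROR_TYPE in prefix:
        problems.append("Prefix can't contain !{firehose:error-output-type}")
    if error_output_prefix is None:
        if EXPRESSION.search(prefix):
            problems.append(
                "ErrorOutputPrefix can't be null when Prefix contains expressions"
            )
    else:
        namespaces = [ns for ns, _ in EXPRESSION.findall(error_output_prefix)]
        if namespaces and ERROR_TYPE not in error_output_prefix:
            problems.append(
                "ErrorOutputPrefix must include !{firehose:error-output-type}"
            )
        if any(ns.startswith("partitionKeyFrom") for ns in namespaces):
            problems.append(
                "ErrorOutputPrefix can't use partitionKeyFrom* namespaces"
            )
    # The sequence "!{" may only appear as the start of a full expression.
    for name, text in (("Prefix", prefix),
                       ("ErrorOutputPrefix", error_output_prefix)):
        if text and text.count("!{") != len(EXPRESSION.findall(text)):
            problems.append(f"{name} contains a stray '!{{' sequence")
    return problems


def with_default_timestamp(prefix):
    """Append the default timestamp expression if prefix has none."""
    has_timestamp = any(ns == "timestamp" for ns, _ in EXPRESSION.findall(prefix))
    return prefix if has_timestamp else prefix + DEFAULT_TIMESTAMP


def within_length_limit(evaluated):
    """The 512-character limit applies to the evaluated string."""
    return len(evaluated) <= MAX_EVALUATED_LENGTH


print(check_prefixes("myPrefix/!{timestamp:yyyy}/"))
print(check_prefixes("myPrefix/!{timestamp:yyyy}/",
                     "myErrors/!{firehose:error-output-type}/"))
print(with_default_timestamp("myPrefix/"))
```

The first call reports that ErrorOutputPrefix can't be null, the second reports no problems, and the third returns myPrefix/!{timestamp:yyyy/MM/dd/HH/}.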
Example prefixes
| Input | Evaluated prefix (at 10:30 AM UTC on Aug 27, 2018) |
|---|---|
| | Invalid input: ErrorOutputPrefix can't be null when Prefix contains expressions |