Configurable parser-type processors
This section contains information about the configurable data parser processors that you can use in a log event transformer.
parseJSON
The parseJSON processor parses JSON log events and inserts the extracted key-value pairs under the destination. If you don't specify a destination, the processor places the key-value pairs under the root node. When you use parseJSON as the first processor, you must parse the entire log event by using @message as the source field. After the initial JSON parsing, you can manipulate specific fields in subsequent processors.
The original @message content is not changed; the new keys are added to the message.
| Field | Description | Required? | Default | Limits |
|---|---|---|---|---|
| source | Path to the field in the log event that will be parsed. Use dot notation to access child fields. For example, store.book | No | | Maximum length: 128; Maximum nested key depth: 3 |
| destination | The destination field of the parsed JSON | No | | Maximum length: 128; Maximum nested key depth: 3 |
Example
Suppose an ingested log event looks like this:
{ "outer_key": { "inner_key": "inner_value" } }
Then if we have this parseJSON processor:
[ { "parseJSON": { "destination": "new_key" } } ]
The transformed log event would be the following.
{ "new_key": { "outer_key": { "inner_key": "inner_value" } } }
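The source option can also target a nested field that holds a JSON string. The following is an illustrative sketch (the field names are hypothetical, and it assumes the parsed keys are added alongside the existing fields, consistent with the additive behavior described above). Suppose the ingested log event is:

```json
{ "metadata": { "payload": "{\"level\": \"error\", \"code\": 42}" } }
```

and we use this transformer, which first parses the whole event and then parses the nested JSON string:

```json
[
  { "parseJSON": {} },
  { "parseJSON": { "source": "metadata.payload", "destination": "parsed_payload" } }
]
```

The transformed log event would then look something like the following.

```json
{
  "metadata": { "payload": "{\"level\": \"error\", \"code\": 42}" },
  "parsed_payload": { "level": "error", "code": 42 }
}
```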
grok
Use the grok processor to parse and structure unstructured data using pattern matching. This processor can also extract fields from log messages.
| Field | Description | Required? | Default | Limits | Notes |
|---|---|---|---|---|---|
| source | Path of the field to apply grok matching on | No | | Maximum length: 128; Maximum nested key depth: 3 | |
| match | The grok pattern to match against the log event | Yes | | Maximum length: 512; Maximum grok patterns: 20 | Some grok pattern types have individual usage limits. Any combination of the following patterns can be used as many as five times: {URI, URIPARAM, URIPATHPARAM, SPACE, DATA, GREEDYDATA, GREEDYDATA_MULTILINE}. Grok patterns don't support type conversions. For common log format patterns (APACHE_ACCESS_LOG, NGINX_ACCESS_LOG, SYSLOG5424), only DATA, GREEDYDATA, or GREEDYDATA_MULTILINE patterns can follow the common log pattern. |
Structure of a Grok Pattern
This is the supported grok pattern structure:
%{PATTERN_NAME:FIELD_NAME}
- PATTERN_NAME: Refers to a pre-defined regular expression for matching a specific type of data. Only predefined grok patterns are supported; creating custom patterns is not allowed.
- FIELD_NAME: Assigns a name to the extracted value. FIELD_NAME is optional, but if you don't specify it, the extracted data is dropped from the transformed log event. If FIELD_NAME uses dotted notation (for example, "parent.child"), it is treated as a JSON path.
- Type conversion: Explicit type conversions are not supported. Use the typeConverter processor to convert the data type of any value extracted by grok.
To create more complex matching expressions, you can combine several grok
patterns. As many as 20 grok patterns can be combined to match a log event. For
example, this combination of patterns %{NUMBER:timestamp} [%{NUMBER:db}
%{IP:client_ip}:%{NUMBER:client_port}] %{GREEDYDATA:data} can be used
to extract fields from a Redis slow log entry like this:
1629860738.123456 [0 127.0.0.1:6379] "SET" "key1" "value1"
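Applying that combined pattern to the slow log entry above would extract fields along these lines (a sketch based on the field names in the pattern; grok captures values as strings):

```json
{
  "timestamp": "1629860738.123456",
  "db": "0",
  "client_ip": "127.0.0.1",
  "client_port": "6379",
  "data": "\"SET\" \"key1\" \"value1\""
}
```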
Grok examples
Example 1: Use grok to extract a field from unstructured logs
Sample log:
293750 server-01.internal-network.local OK "[Thread-000] token generated"
Transformer used:
[ { "grok": { "match": "%{NUMBER:version} %{HOSTNAME:hostname} %{NOTSPACE:status} %{QUOTEDSTRING:logMsg}" } } ]
Output:
{ "version": "293750", "hostname": "server-01.internal-network.local", "status": "OK", "logMsg": "[Thread-000] token generated" }
Sample log:
23/Nov/2024:10:25:15 -0900 172.16.0.1 200
Transformer used:
[ { "grok": { "match": "%{HTTPDATE:timestamp} %{IPORHOST:clientip} %{NUMBER:response_status}" } } ]
Output:
{ "timestamp": "23/Nov/2024:10:25:15 -0900", "clientip": "172.16.0.1", "response_status": "200" }
Example 2: Use grok in combination with parseJSON to extract fields from a JSON log event
Sample log:
{ "timestamp": "2024-11-23T16:03:12Z", "level": "ERROR", "logMsg": "GET /page.html HTTP/1.1" }
Transformer used:
[ { "parseJSON": {} }, { "grok": { "source": "logMsg", "match": "%{WORD:http_method} %{NOTSPACE:request} HTTP/%{NUMBER:http_version}" } } ]
Output:
{ "timestamp": "2024-11-23T16:03:12Z", "level": "ERROR", "logMsg": "GET /page.html HTTP/1.1", "http_method": "GET", "request": "/page.html", "http_version": "1.1" }
Example 3: Grok pattern with dotted annotation in FIELD_NAME
Sample log:
192.168.1.1 GET /index.html?param=value 200 1234
Transformer used:
[ { "grok": { "match": "%{IP:client.ip} %{WORD:method} %{URIPATHPARAM:request.uri} %{NUMBER:response.status} %{NUMBER:response.bytes}" } } ]
Output:
{ "client": { "ip": "192.168.1.1" }, "method": "GET", "request": { "uri": "/index.html?param=value" }, "response": { "status": "200", "bytes": "1234" } }
Supported grok patterns
The following tables list the patterns that are supported by the
grok processor.
General grok patterns
| Grok Pattern | Description | Maximum pattern limit | Example |
|---|---|---|---|
| USERNAME or USER | Matches one or more characters that can include lowercase letters (a-z), uppercase letters (A-Z), digits (0-9), dots (.), underscores (_), or hyphens (-) | 20 | |
| INT | Matches an optional plus or minus sign followed by one or more digits | 20 | |
| BASE10NUM | Matches an integer or a floating-point number with optional sign and decimal point | 20 | |
| BASE16NUM | Matches decimal and hexadecimal numbers with an optional sign (+ or -) and an optional 0x prefix | 20 | |
| POSINT | Matches whole positive integers without leading zeros, consisting of one or more digits (1-9 followed by 0-9) | 20 | |
| NONNEGINT | Matches any whole number (one or more digits 0-9), including zero and numbers with leading zeros | 20 | |
| WORD | Matches whole words composed of one or more word characters (\w), including letters, digits, and underscores | 20 | |
| NOTSPACE | Matches one or more non-whitespace characters | 5 | |
| SPACE | Matches zero or more whitespace characters | 5 | |
| DATA | Matches any character (except newline) zero or more times, non-greedy | 5 | |
| GREEDYDATA | Matches any character (except newline) zero or more times, greedy | 5 | |
| GREEDYDATA_MULTILINE | Matches any character (including newline) zero or more times, greedy | 1 | |
| QUOTEDSTRING | Matches quoted strings (single or double quotes) with escaped characters | 20 | |
| UUID | Matches a standard UUID format: 8 hexadecimal characters, followed by three groups of 4 hexadecimal characters, and ending with 12 hexadecimal characters, all separated by hyphens | 20 | |
| URN | Matches URN (Uniform Resource Name) syntax | 20 | |
AWS grok patterns
| Pattern | Description | Maximum pattern limit | Example |
|---|---|---|---|
| ARN | Matches AWS Amazon Resource Names (ARNs), capturing the partition (…) | 5 | |
Networking grok patterns
| Grok Pattern | Description | Maximum pattern limit | Example |
|---|---|---|---|
| CISCOMAC | Matches a MAC address in 4-4-4 hexadecimal format | 20 | |
| WINDOWSMAC | Matches a MAC address in hexadecimal format with hyphens | 20 | |
| COMMONMAC | Matches a MAC address in hexadecimal format with colons | 20 | |
| MAC | Matches one of the CISCOMAC, WINDOWSMAC, or COMMONMAC grok patterns | 20 | |
| IPV6 | Matches IPv6 addresses, including compressed forms and IPv4-mapped IPv6 addresses | 5 | |
| IPV4 | Matches an IPv4 address | 20 | |
| IP | Matches either IPv6 addresses as supported by %{IPV6} or IPv4 addresses as supported by %{IPV4} | 5 | |
| HOSTNAME or HOST | Matches domain names, including subdomains | 5 | |
| IPORHOST | Matches either a hostname or an IP address | 5 | |
| HOSTPORT | Matches an IP address or hostname as supported by the %{IPORHOST} pattern, followed by a colon and a port number, capturing the port as "PORT" in the output | 5 | |
| URIHOST | Matches an IP address or hostname as supported by the %{IPORHOST} pattern, optionally followed by a colon and a port number, capturing the port as "port" if present | 5 | |
Path grok patterns
| Grok Pattern | Description | Maximum pattern limit | Example |
|---|---|---|---|
| UNIXPATH | Matches URL paths, potentially including query parameters | 20 | |
| WINPATH | Matches Windows file paths | 5 | |
| PATH | Matches either URL or Windows file paths | 5 | |
| TTY | Matches Unix device paths for terminals and pseudo-terminals | 20 | |
| URIPROTO | Matches letters, optionally followed by a plus (+) character and additional letters or plus (+) characters | 20 | |
| URIPATH | Matches the path component of a URI | 20 | |
| URIPARAM | Matches URL query parameters | 5 | |
| URIPATHPARAM | Matches a URI path optionally followed by query parameters | 5 | |
| URI | Matches a complete URI | 5 | |
Date and time grok patterns
| Grok Pattern | Description | Maximum pattern limit | Example |
|---|---|---|---|
| MONTH | Matches full or abbreviated English month names as whole words | 20 | |
| MONTHNUM | Matches month numbers from 1 to 12, with optional leading zero for single-digit months | 20 | |
| MONTHNUM2 | Matches two-digit month numbers from 01 to 12 | 20 | |
| MONTHDAY | Matches day of the month from 1 to 31, with optional leading zero | 20 | |
| YEAR | Matches a year in two or four digits | 20 | |
| DAY | Matches full or abbreviated day names | 20 | |
| HOUR | Matches hour in 24-hour format with an optional leading zero, (0)0-23 | 20 | |
| MINUTE | Matches minutes (00-59) | 20 | |
| SECOND | Matches a number representing seconds (0)0-60, optionally followed by a decimal point or colon and one or more digits for fractional seconds | 20 | |
| TIME | Matches a time format with hours, minutes, and seconds in the format (H)H:mm:(s)s. Seconds include leap second (0)0-60 | 20 | |
| DATE_US | Matches a date in the format of (M)M/(d)d/(yy)yy or (M)M-(d)d-(yy)yy | 20 | |
| DATE_EU | Matches a date in the format of (d)d/(M)M/(yy)yy, (d)d-(M)M-(yy)yy, or (d)d.(M)M.(yy)yy | 20 | |
| ISO8601_TIMEZONE | Matches UTC offset 'Z' or a time zone offset with optional colon in the format [+-](H)H(:)mm | 20 | |
| ISO8601_SECOND | Matches a number representing seconds (0)0-60, optionally followed by a decimal point or colon and one or more digits for fractional seconds | 20 | |
| TIMESTAMP_ISO8601 | Matches ISO 8601 datetime format (yy)yy-(M)M-(d)dT(H)H:mm:((s)s)(Z|[+-](H)H:mm) with optional seconds and time zone | 20 | |
| DATE | Matches either a date in the US format using %{DATE_US} or in the EU format using %{DATE_EU} | 20 | |
| DATESTAMP | Matches the %{DATE} pattern followed by the %{TIME} pattern, separated by a space or hyphen | 20 | |
| TZ | Matches common time zone abbreviations (PST, PDT, MST, MDT, CST, CDT, EST, EDT, UTC) | 20 | |
| DATESTAMP_RFC822 | Matches date and time in the format: Day MonthName (D)D (YY)YY (H)H:mm:(s)s Timezone | 20 | |
| DATESTAMP_RFC2822 | Matches RFC 2822 date-time format: Day, (d)d MonthName (yy)yy (H)H:mm:(s)s Z|[+-](H)H:mm | 20 | |
| DATESTAMP_OTHER | Matches date and time in the format: Day MonthName (d)d (H)H:mm:(s)s Timezone (yy)yy | 20 | |
| DATESTAMP_EVENTLOG | Matches compact datetime format without separators: (yy)yyMM(d)d(H)Hmm(s)s | 20 | |
Log grok patterns
| Grok Pattern | Description | Maximum pattern limit | Example |
|---|---|---|---|
| LOGLEVEL | Matches standard log levels in different capitalizations and abbreviations, including the following: Alert/ALERT, Trace/TRACE, Debug/DEBUG, Notice/NOTICE, Info/INFO, Warn/Warning/WARN/WARNING, Err/Error/ERR/ERROR, Crit/Critical/CRIT/CRITICAL, Fatal/FATAL, Severe/SEVERE, Emerg/Emergency/EMERG/EMERGENCY | 20 | |
| HTTPDATE | Matches a date and time format often used in log files. Format: (d)d/MonthName/(yy)yy:(H)H:mm:(s)s Timezone. MonthName matches full or abbreviated English month names (for example, "Jan" or "January"); Timezone matches the %{INT} grok pattern | 20 | |
| SYSLOGTIMESTAMP | Matches a date format with MonthName (d)d (H)H:mm:(s)s. MonthName matches full or abbreviated English month names (for example, "Jan" or "January") | 20 | |
| PROG | Matches a program name consisting of a string of letters, digits, dots, underscores, forward slashes, percent signs, and hyphens | 20 | |
| SYSLOGPROG | Matches the PROG grok pattern, optionally followed by a process ID in square brackets | 20 | |
| SYSLOGHOST | Matches either a %{HOST} or %{IP} pattern | 5 | |
| SYSLOGFACILITY | Matches syslog priority in decimal format. The value should be enclosed in angle brackets (<>) | 20 | |
Common log grok patterns
You can use pre-defined custom grok patterns to match Apache, NGINX, and Syslog Protocol (RFC 5424) log formats. When you use one of these patterns, it must be the first pattern in your matching configuration, and no other patterns can precede it. You can follow it only with exactly one DATA, GREEDYDATA, or GREEDYDATA_MULTILINE pattern.
| Grok pattern | Description | Maximum pattern limit |
|---|---|---|
| APACHE_ACCESS_LOG | Matches Apache access logs | 1 |
| NGINX_ACCESS_LOG | Matches NGINX access logs | 1 |
| SYSLOG5424 | Matches Syslog Protocol (RFC 5424) logs | 1 |
The following shows valid and invalid examples for using these common log format patterns.
"%{NGINX_ACCESS_LOG} %{DATA}"                 // Valid
"%{SYSLOG5424}%{DATA:logMsg}"                 // Valid
"%{APACHE_ACCESS_LOG} %{GREEDYDATA:logMsg}"   // Valid
"%{APACHE_ACCESS_LOG} %{SYSLOG5424}"          // Invalid (multiple common log patterns used)
"%{NGINX_ACCESS_LOG} %{NUMBER:num}"           // Invalid (only GREEDYDATA and DATA patterns are supported with common log patterns)
"%{GREEDYDATA:logMsg} %{SYSLOG5424}"          // Invalid (GREEDYDATA and DATA patterns are supported only after common log patterns)
Common log format examples
Apache log example
Sample log:
127.0.0.1 - - [03/Aug/2023:12:34:56 +0000] "GET /page.html HTTP/1.1" 200 1234
Transformer:
[ { "grok": { "match": "%{APACHE_ACCESS_LOG}" } } ]
Output:
{ "request": "/page.html", "http_method": "GET", "status_code": 200, "http_version": "1.1", "response_size": 1234, "remote_host": "127.0.0.1", "timestamp": "2023-08-03T12:34:56Z" }
NGINX log example
Sample log:
192.168.1.100 - Foo [03/Aug/2023:12:34:56 +0000] "GET /account/login.html HTTP/1.1" 200 42 "https://www.amazon.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36"
Transformer:
[ { "grok": { "match": "%{NGINX_ACCESS_LOG}" } } ]
Output:
{ "request": "/account/login.html", "referrer": "https://www.amazon.com/", "agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36", "http_method": "GET", "status_code": 200, "auth_user": "Foo", "http_version": "1.1", "response_size": 42, "remote_host": "192.168.1.100", "timestamp": "2023-08-03T12:34:56Z" }
Syslog Protocol (RFC 5424) log example
Sample log:
<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource= "Application" eventID="1011"][examplePriority@32473 class="high"]
Transformer:
[ { "grok": { "match": "%{SYSLOG5424}" } } ]
Output:
{ "pri": 165, "version": 1, "timestamp": "2003-10-11T22:14:15.003Z", "hostname": "mymachine.example.com", "app": "evntslog", "msg_id": "ID47", "structured_data": "exampleSDID@32473 iut=\"3\" eventSource= \"Application\" eventID=\"1011\"", "message": "[examplePriority@32473 class=\"high\"]" }
csv
The csv processor parses comma-separated values (CSV) from the log events into columns.
| Field | Description | Required? | Default | Limits |
|---|---|---|---|---|
| source | Path to the field in the log event that will be parsed | No | | Maximum length: 128; Maximum nested key depth: 3 |
| delimiter | The character used to separate each column in the original comma-separated value log event | No | | Maximum length: 1 unless the value is … |
| quoteCharacter | Character used as a text qualifier for a single column of data | No | | Maximum length: 1 |
| columns | List of names to use for the columns in the transformed log event | No | | Maximum CSV columns: 100; Maximum length: 128; Maximum nested key depth: 3 |
Setting delimiter to \t separates each column on a tab character, and setting it to \s separates each column on a single space character.
Example
Suppose part of an ingested log event looks like this:
'Akua Mansa':28:'New York: USA'
Suppose we use only the csv processor:
[ { "csv": { "delimiter": ":", "quoteCharacter": "'" } } ]
The transformed log event would be the following.
{ "column_1": "Akua Mansa", "column_2": "28", "column_3": "New York: USA" }
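To give the columns meaningful names instead of the generated column_1, column_2, and so on, you can supply the columns option. The following sketch uses the same sample log (the column names here are illustrative):

```json
[ { "csv": { "delimiter": ":", "quoteCharacter": "'", "columns": ["name", "age", "location"] } } ]
```

The transformed log event would then be the following.

```json
{ "name": "Akua Mansa", "age": "28", "location": "New York: USA" }
```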
parseKeyValue
Use the parseKeyValue processor to parse a specified field into key-value pairs. You can customize the processor to parse field information with the following options.
| Field | Description | Required? | Default | Limits |
|---|---|---|---|---|
| source | Path to the field in the log event that will be parsed | No | | Maximum length: 128; Maximum nested key depth: 3 |
| destination | The destination field to put the extracted key-value pairs into | No | | Maximum length: 128 |
| fieldDelimiter | The field delimiter string that is used between key-value pairs in the original log events | No | | Maximum length: 128 |
| keyValueDelimiter | The delimiter string to use between the key and value in each pair in the transformed log event | No | | Maximum length: 128 |
| nonMatchValue | A value to insert into the value field in the result when a key-value pair is not successfully split | No | | Maximum length: 128 |
| keyPrefix | If you want to add a prefix to all transformed keys, specify it here | No | | Maximum length: 128 |
| overwriteIfExists | Whether to overwrite the value if the destination key already exists | No | | |
Example
Take the following example log event:
key1:value1!key2:value2!key3:value3!key4
Suppose we use the following processor configuration:
[ { "parseKeyValue": { "destination": "new_key", "fieldDelimiter": "!", "keyValueDelimiter": ":", "nonMatchValue": "defaultValue", "keyPrefix": "parsed_" } } ]
The transformed log event would be the following.
{ "new_key": { "parsed_key1": "value1", "parsed_key2": "value2", "parsed_key3": "value3", "parsed_key4": "defaultValue" } }