Getting started with Bloom filters
ElastiCache supports the Bloom filter data structure, which provides a space efficient probabilistic data structure to check if an element is a member of a set. When using Bloom filters, false positives are possible—a filter can incorrectly indicate that an element exists, even though that element was not added to the set. However, using Bloom filters will prevent false negatives—incorrect indications that an element does not exist, even though that element was added to the set.
You can set the percentage of potential false positives to a preferred rate for your workload, by adjusting the fp rate. You can also configure capacity (the number of items a Bloom filter can hold), scaling and non-scaling properties, and more.
After you create a cluster with a supported engine version, the Bloom data type and associated commands are automatically available. The bloom
data type is API compatible with the Bloom filter command syntax of the official Valkey client libraries including valkey-py
, valkey-java
, and valkey-go
. You can easily migrate existing Bloom-based Valkey and Redis OSS applications into ElastiCache. For a complete list of commands see Bloom filter commands.
The Bloom-related metrics BloomFilterBasedCmds
, BloomFilterBasedCmdsLatency
, and BloomFilterBasedCmdsECPUs
are incorporated into CloudWatch to monitor the usage of this data type. For more information, see Metrics for Valkey and Redis OSS.
Note
To use Bloom filters, you must be running on ElastiCache Valkey 8.1 and later.
The bloom data type is not RDB compatible with other non-Valkey based bloom offerings.
Bloom filters data type overview
Bloom filters are a space efficient probabilistic data structure that allows adding elements and checking whether elements exist. False positives are possible where a filter incorrectly indicates that an element exists, even though it was not added. However, Bloom Filters guarantee that false negatives (incorrectly indicating that an element does not exist, even though it was added) do not occur.
The main source of documentation for bloom filters can be found on the valkey.io documentation page. This contains the following information:
Common use cases for bloom filters
Advertisement / Event deduplication
Fraud detection
Filtering harmful content / spam
Unique user detection
Differences between scaling and non scaling bloom filters
How to decide between scaling and non scaling bloom filters
-
Learn about the tunable properties of Bloom filters. This includes the false positive rate, capacity, scaling and non scaling properties, and more.
-
-
Recommendations and details on how to check if a bloom filter is reaching its memory usage limit, and if it can scale to reach the desired capacity.
-
You can specifically check the amount of memory consumed by a bloom filter document through using the BF.INFO
command.
-
Bloom size limit
The consumption of memory by a single Bloom filter object is limited to 128 MB. You can check the amount of memory consumed by a Bloom filter by using the BF.INFO <key> SIZE
command.
Bloom ACLs
Similar to the existing per-datatype categories (@string, @hash, etc.) a new category @bloom is added to simplify managing access to Bloom commands and data. No other existing Valkey or Redis OSS commands are members of the @bloom category.
There are 3 existing ACL categories that are updated to include the new Bloom commands: @read, @write and @fast. The following table indicates the mapping of Bloom commands to the appropriate categories.
Bloom command | @read | @write | @fast | @bloom |
---|---|---|---|---|
BF.ADD |
y |
y |
y |
|
BF.CARD |
y |
y |
y |
|
BF.EXISTS |
y |
y |
y |
|
BF.INFO |
y |
y |
y |
|
BF.INSERT |
y |
y |
y |
|
BF.MADD |
y |
y |
y |
|
BF.MEXISTS |
y |
y |
y |
|
BF.RESERVE |
y |
y |
y |
Bloom filter related metrics
The following CloudWatch metrics related to bloom data structures are provided:
CW Metrics | Unit | Serverless/Self-designed | Description |
---|---|---|---|
BloomFilterBasedCmds |
Count |
Both |
The total number of Bloom filter commands, including both read and write commands. |
BloomFilterBasedCmdsLatency |
Microseconds |
Self-managed |
Latency of all Bloom filter commands, including both read and write commands. |
BloomFilterBasedCmdsECPUs |
Count |
Serverless |
ECPUs consumed by all Bloom filter commands, including both read and write commands. |
Bloom filter commands
Bloom Filter commands
Name | Description |
---|---|
BF.ADD |
Adds a single item to a bloom filter.If the filter doesn't already exist, it is created. |
BF.CARD |
Returns the cardinality of a bloom filter. |
BF.EXISTS |
Determines if the bloom filter contains the specified item. |
BF.INFO |
Returns usage information and properties of a specific bloom filter. |
BF.INSERT |
Creates a bloom filter with 0 or more itemes, or adds items to an existing bloom filter. |
BF.MADD |
Adds one or more items to a bloom filter. |
BF.MEXISTS |
Determines if the bloom filter contains 1 or more items. |
BF.RESERVE |
Creates an empty bloom filter with the specified properties. |
Note
BF.LOAD is not supported by ElastiCache. It is only relevant for AOF usage, which ElastiCache does not support.