

# Techniques for bot control
<a name="techniques"></a>

The main goal of bot mitigation is to limit the negative impact of automated bot activity on an organization's websites, services, and applications. The technology and techniques used depend on the type of traffic or activity you want to defend against, so understanding the application and its traffic is key. For more information on where to start, see the [Guidelines for monitoring your bot control strategy](monitoring.md) section in this guide.

In general, the controls that bot mitigation solutions provide can be grouped into the following high-level categories: static, client identification, and advanced analysis. The following figure shows the available techniques and how they can be used depending on the complexity of the bot activity. It highlights how the base, or broadest, mitigation can be obtained through static controls, such as allow listing and intrinsic checks. The smallest portion of bots is always the most advanced, and mitigating these bots requires more advanced technology and a combination of controls.



![As bot complexity increases, so must the complexity and sophistication of the mitigation techniques.](http://docs.aws.amazon.com/prescriptive-guidance/latest/bot-control/images/bot-mitigation-techniques.png)


Next, this guide explores each category and its techniques. It also describes the options that are available in [AWS WAF](https://docs.aws.amazon.com/waf/latest/developerguide/waf-chapter.html) to implement these controls:
+ [Static controls for managing bots](static-controls.md)
+ [Client identification controls for managing bots](client-identification-controls.md)
+ [Advanced analysis controls for managing bots](advanced-analysis-controls.md)

# Static controls for managing bots
<a name="static-controls"></a>

To take an action, *static controls* evaluate static information from the HTTP(S) request, such as its IP address or headers. These controls can be useful for mitigating low-sophistication bad bot activity or for verifying and managing expected beneficial bot traffic. Static control techniques include allow listing, IP-based controls, and intrinsic checks.

## Allow listing
<a name="allow-listing"></a>

Allow listing is a control that allows identified friendly traffic through existing bot mitigation controls. There are a variety of ways of accomplishing this. The simplest is to use a rule that [matches a set of IP addresses](https://docs.aws.amazon.com/waf/latest/developerguide/waf-rule-statement-forwarded-ip-address.html) or a similar match condition. When a request matches a rule that is set to an `Allow` action, it is not evaluated by subsequent rules. In some cases, you need to prevent only certain rules from being acted on; in other words, you need to allow list for one rule but not all rules. This is a common scenario for handling false positives for rules. Allow listing is considered a broad-scope rule. To reduce the potential for false negatives, we recommend that you pair it with another option that is more granular, such as a path or header match.
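As a sketch, an allow-list rule paired with a more granular path match can be expressed in the JSON structure that AWS WAF rules use. The rule name, IP set ARN, and URI path below are hypothetical placeholders, not values from this guide:

```python
# Sketch of an AWS WAF allow-list rule that pairs an IP set match with a
# path match, so only the named IPs are allowed on that specific path.
# The rule name, IP set ARN, and URI path are illustrative placeholders.
def build_allow_rule(ip_set_arn: str, path: str) -> dict:
    return {
        "Name": "allow-partner-crawler",
        "Priority": 0,
        "Action": {"Allow": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": "AllowPartnerCrawler",
        },
        "Statement": {
            "AndStatement": {
                "Statements": [
                    {"IPSetReferenceStatement": {"ARN": ip_set_arn}},
                    {
                        "ByteMatchStatement": {
                            "SearchString": path,
                            "FieldToMatch": {"UriPath": {}},
                            "PositionalConstraint": "STARTS_WITH",
                            "TextTransformations": [
                                {"Priority": 0, "Type": "NONE"}
                            ],
                        }
                    },
                ]
            }
        },
    }

rule = build_allow_rule(
    "arn:aws:wafv2:us-east-1:123456789012:global/ipset/partner-ips/abc123",
    "/feeds/",
)
```

Because both statements sit inside an `AndStatement`, the allow applies only when the source IP and the path match together, which keeps the allow list narrow.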

## IP-based controls
<a name="ip-based-controls"></a>

### Single IP address blocks
<a name="ip-address-blocks"></a>

A common way to mitigate the impact of bots is to limit requests from a single requestor. The simplest example is to block the source IP address of the traffic if its requests are malicious or high in volume. You can use AWS WAF [IP set match rules](https://docs.aws.amazon.com/waf/latest/developerguide/waf-rule-statement-type-ipset-match.html) to implement IP-based blocks. These rules match on IP addresses and apply an action of `Block`, `Challenge`, or `CAPTCHA`. You can determine when too many requests are coming from an IP address by reviewing content delivery network (CDN), web application firewall, or application and service logs. However, in most cases, this control is impractical without automation.

Automating IP address block lists in AWS WAF is commonly done with rate-based rules. For more information, see [Rate-based rules](#rate-based-rules) in this guide. You can also implement the [Security Automations for AWS WAF](https://docs.aws.amazon.com/solutions/latest/security-automations-for-aws-waf/welcome.html) solution. This solution automatically updates a list of IP addresses to block, and an AWS WAF rule denies requests that match those IP addresses.

One way to recognize a bot attack is when a multitude of requests from the same IP address focuses on a small number of web pages. This pattern indicates that the bot is price scraping or repeatedly attempting logins that fail at a high percentage. You can create automations that immediately recognize this pattern and block the IP address, which reduces the efficacy of the attack by quickly identifying and mitigating it. Blocking specific IP addresses is less effective when an attacker has a large collection of IP addresses to launch attacks from or when the attacking behavior is difficult to separate from normal traffic.
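The detection described above can be sketched as a small log-analysis routine that flags source IPs sending many requests concentrated on only a few URIs. The log record shape and thresholds are illustrative assumptions:

```python
from collections import defaultdict

# Sketch: flag source IPs that send many requests to only a few URIs --
# a pattern consistent with scraping or credential stuffing. The record
# format (ip, uri) and the thresholds are illustrative assumptions.
def suspicious_ips(log_records, min_requests=100, max_distinct_uris=3):
    per_ip = defaultdict(lambda: {"count": 0, "uris": set()})
    for ip, uri in log_records:
        per_ip[ip]["count"] += 1
        per_ip[ip]["uris"].add(uri)
    return {
        ip
        for ip, stats in per_ip.items()
        if stats["count"] >= min_requests
        and len(stats["uris"]) <= max_distinct_uris
    }

# 150 hits on one login page vs. 40 hits spread over 40 product pages.
records = [("203.0.113.9", "/login")] * 150 + [
    ("198.51.100.7", f"/p/{i}") for i in range(40)
]
print(suspicious_ips(records))  # {'203.0.113.9'}
```

An automation could feed the resulting set into an AWS WAF IP set so that matching requests are blocked or challenged.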

### IP address reputation
<a name="ip-address-reputation"></a>

An *IP reputation service* provides intelligence that helps you evaluate the trustworthiness of an IP address. This intelligence is commonly derived by aggregating information about past activity from that IP address. Prior activity helps indicate how likely an IP address is to generate malicious requests. The data is added to managed lists that track the IP address's behavior.

Anonymous IP addresses are a specialized case of IP address reputation. The source IP address originates from known sources of easily acquired IP addresses, such as cloud-based virtual machines, or from proxies, such as known VPN providers or Tor nodes. The AWS WAF [Amazon IP reputation list](https://docs.aws.amazon.com/waf/latest/developerguide/aws-managed-rule-groups-ip-rep.html#aws-managed-rule-groups-ip-rep-amazon) and [Anonymous IP list](https://docs.aws.amazon.com/waf/latest/developerguide/aws-managed-rule-groups-ip-rep.html#aws-managed-rule-groups-ip-rep-anonymous) managed rule groups use Amazon internal threat intelligence to help identify these IP addresses.

The intelligence provided by these managed lists can help you act on activity identified from these sources. Based on this intelligence, you can create rules that directly block traffic or rules that limit the number of requests (such as rate-based rules). You can also use this intelligence to evaluate the source of the traffic by using the rules in `COUNT` mode. This examines the match criteria and applies labels that you can use to create custom rules.

### Rate-based rules
<a name="rate-based-rules"></a>

Rate-based rules can be a valuable tool for certain scenarios. For example, rate-based rules are effective when bot traffic reaches high volumes compared to users on sensitive uniform resource identifiers (URIs) or when the traffic volume begins to affect normal operations. Rate limiting can keep requests at manageable levels and limit and control access. AWS WAF can implement rate-limiting rules in a [web access control list (web ACL)](https://docs.aws.amazon.com/waf/latest/developerguide/web-acl.html) by using a [rate-based rule statement](https://docs.aws.amazon.com/waf/latest/developerguide/waf-rule-statement-type-rate-based.html). A recommended approach when using rate-based rules is to include a blanket rule that covers the whole site, URI-specific rules, and IP reputation rate-based rules. IP reputation rate-based rules combine the intelligence of IP address reputation with rate-limiting functionality.

For the whole site, a blanket IP reputation rate-based rule creates a ceiling that prevents unsophisticated bots from flooding a site from a small number of IPs. Rate limiting is especially recommended for protecting URIs that have high cost or impact, such as login or account-creation pages.

Rate-limiting rules can provide a cost-efficient first layer of defense. You can use more advanced rules to protect sensitive URIs. URI-specific rate-based rules can limit the impact on critical pages or on APIs that affect the backend, such as database access. Advanced mitigations to protect certain URIs, which are discussed later in this guide, often incur additional costs, and these URI-specific rate-based rules can help you control costs. For more information about commonly recommended rate-based rules, see [The three most important AWS WAF rate-based rules](https://aws.amazon.com/blogs/security/three-most-important-aws-waf-rate-based-rules/) in the AWS Security Blog. In some situations, it is useful to limit what type of request is evaluated by a rate-based rule. You can use [scope-down statements](https://docs.aws.amazon.com/waf/latest/developerguide/waf-rule-scope-down-statements.html) to, for example, limit rate-based rules by the geographic area of the source IP address.
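A geographically scoped rate-based rule of the kind just described can be sketched as the following rule statement structure. The limit and country codes are illustrative, not recommendations:

```python
# Sketch of a rate-based rule statement that counts requests per source IP
# and applies only to traffic from the listed countries, via a scope-down
# statement. Field names follow the AWS WAF rule JSON; the limit and
# country codes are illustrative.
def rate_rule_with_geo_scope(limit: int, countries: list) -> dict:
    return {
        "RateBasedStatement": {
            "Limit": limit,
            "AggregateKeyType": "IP",
            "ScopeDownStatement": {
                "GeoMatchStatement": {"CountryCodes": countries}
            },
        }
    }

stmt = rate_rule_with_geo_scope(1000, ["US", "DE"])
```

The scope-down statement means requests from other countries are not counted toward the limit at all, so the rule's cost and blast radius stay contained.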

AWS WAF offers an advanced capability for rate-based rules through the use of [aggregation keys](https://docs.aws.amazon.com/waf/latest/developerguide/waf-rule-statement-type-rate-based-aggregation-instances.html). With this functionality, you can configure a rate-based rule to use various other aggregation keys and key combinations, aside from the source IP address. For example, as a single combination, you can aggregate requests based on a forwarded IP address, the HTTP method, and a query argument. This helps you configure more fine-grained rules for sophisticated volumetric traffic mitigation.
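The example in this paragraph can be sketched as a rate-based statement that aggregates on custom keys. The header name and query-argument name are illustrative assumptions:

```python
# Sketch of a rate-based rule using custom aggregation keys: requests are
# counted per combination of forwarded IP, HTTP method, and a query
# argument, as in the example above. The forwarded-IP header name and the
# query-argument name are illustrative.
def rate_rule_custom_keys(limit: int) -> dict:
    return {
        "RateBasedStatement": {
            "Limit": limit,
            "AggregateKeyType": "CUSTOM_KEYS",
            "ForwardedIPConfig": {
                "HeaderName": "X-Forwarded-For",
                "FallbackBehavior": "MATCH",
            },
            "CustomKeys": [
                {"ForwardedIP": {}},
                {"HTTPMethod": {}},
                {
                    "QueryArgument": {
                        "Name": "action",
                        "TextTransformations": [
                            {"Priority": 0, "Type": "NONE"}
                        ],
                    }
                },
            ],
        }
    }

stmt = rate_rule_custom_keys(500)
```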

## Intrinsic checks
<a name="intrinsic-checks"></a>

*Intrinsic checks* are various types of internal or inherent validations or verifications within a system or process. For bot control, AWS WAF performs an intrinsic check by validating that the information sent in the request matches the system signals. For example, it performs reverse DNS lookups and other system verifications. Some automated requests are necessary, such as SEO-related requests. Allow listing is a way to permit good, expected bots through. But sometimes, malicious bots emulate good bots, and it can be challenging to separate them. AWS WAF provides methods to accomplish this through the managed [AWS WAF Bot Control rule group](https://docs.aws.amazon.com/waf/latest/developerguide/aws-managed-rule-groups-bot.html). The rules in this group provide verification that self-identified bots are who they say they are. AWS WAF checks the details of the request against the known pattern of that bot, and it also performs reverse DNS lookups and other objective verifications.
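The verification flow for self-identified bots can be sketched as a reverse DNS lookup followed by a forward confirmation. This is an illustrative approximation of the technique, not the service's actual implementation; the resolver functions and bot domain below are hypothetical stubs standing in for real DNS calls:

```python
# Sketch of the reverse-DNS intrinsic check: resolve the source IP to a
# hostname, confirm the hostname belongs to the bot's expected domain,
# then resolve that hostname forward and confirm it maps back to the same
# IP. Resolvers are injected so the check runs without network access;
# in practice they would wrap socket.gethostbyaddr and socket.getaddrinfo.
def verify_self_identified_bot(ip, expected_suffix, reverse_lookup, forward_lookup):
    hostname = reverse_lookup(ip)
    if not hostname or not hostname.endswith(expected_suffix):
        return False
    return ip in forward_lookup(hostname)

# Illustrative stub resolvers for a hypothetical crawler domain.
reverse = {"192.0.2.10": "crawl-192-0-2-10.example-bot.com"}.get
forward = lambda h: {"crawl-192-0-2-10.example-bot.com": ["192.0.2.10"]}.get(h, [])

print(verify_self_identified_bot("192.0.2.10", ".example-bot.com", reverse, forward))  # True
print(verify_self_identified_bot("203.0.113.5", ".example-bot.com", reverse, forward))  # False
```

The forward confirmation matters: a malicious bot can spoof a User-Agent string, but it cannot make the legitimate bot operator's DNS point back at its own IP address.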

# Client identification controls for managing bots
<a name="client-identification-controls"></a>

If attack-related traffic cannot be easily recognized through static attributes, then detection needs to be able to accurately identify the client making the request. For example, rate-based rules are often more effective and harder to evade when the attribute being rate-limited is application-specific, such as a cookie or token. Using a cookie tied to a session prevents botnet operators from being able to duplicate similar request flows across many bots.

Token acquisition is commonly used for client identification. For token acquisition, JavaScript code collects information to generate a token that is evaluated on the server side. The evaluation can range from verifying that JavaScript is running on the client to collecting device information for fingerprinting. Token acquisition requires integrating a JavaScript SDK into the site or application, or it requires that a service provider inject the code dynamically.

Requiring JavaScript support adds an additional hurdle for bots attempting to emulate browsers. When an SDK is involved, such as in a mobile application, token acquisition verifies the SDK implementation and prevents bots from mimicking the application's requests.

Token acquisition requires the use of SDKs implemented on the client side of the connection. The following AWS WAF features provide a JavaScript-based SDK for browsers and an application-based SDK for mobile devices: [Bot Control](https://docs.aws.amazon.com/waf/latest/developerguide/waf-bot-control.html), [Fraud Control account takeover prevention (ATP)](https://docs.aws.amazon.com/waf/latest/developerguide/waf-atp.html), and [Fraud Control account creation fraud prevention (ACFP)](https://docs.aws.amazon.com/waf/latest/developerguide/waf-acfp.html).

The techniques for client identification include CAPTCHA, browser profiling, device fingerprinting, and TLS fingerprinting.

## CAPTCHA
<a name="captcha"></a>

Completely automated public Turing test to tell computers and humans apart ([CAPTCHA](https://docs.aws.amazon.com/waf/latest/developerguide/waf-captcha.html)) is used to distinguish between robotic and human visitors and to prevent web scraping, credential stuffing, and spam. There are a variety of implementations, but they often involve a puzzle that a human can solve. CAPTCHAs offer an additional layer of defense against common bots and can reduce the false positives in bot detection.

AWS WAF allows rules to run a CAPTCHA action against web requests that match a rule's inspection criteria. This action is the result of the evaluation of client identification information collected by the service. AWS WAF rules can require CAPTCHA challenges to be solved for specific resources that are frequently targeted by bots, such as login, search, and form submissions. AWS WAF can serve the CAPTCHA directly through an interstitial page or by using an SDK to handle it on the client side. For more information, see [CAPTCHA and Challenge in AWS WAF](https://docs.aws.amazon.com/waf/latest/developerguide/waf-captcha-and-challenge.html).

## Browser profiling
<a name="browser-profiling"></a>

*Browser profiling* is a method of collecting and evaluating browser characteristics, as part of token acquisition, to distinguish real humans using an interactive browser from distributed bot activity. You can perform browser profiling passively through headers, header order, and other characteristics of requests that are inherent to how browsers work.

You can also perform browser profiling in code by using token acquisition. By using JavaScript for browser profiling, you can quickly determine if a client supports JavaScript. This helps you detect simple bots that do not support it. Browser profiling checks more than just HTTP headers and JavaScript support; browser profiling makes it difficult for bots to fully emulate a web browser. Both browser profiling options have the same goal: to find patterns in a browser profile that indicate inconsistency with how a real browser behaves.
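As an illustrative sketch only (not an actual browser signature), a passive profiling check might compare the headers a client sends, and the order they arrive in, against an expected pattern for the browser the client claims to be:

```python
# Sketch of a passive browser-profiling check: a client that claims to be
# a mainstream browser but omits common headers, or sends them in an
# unusual order, is inconsistent with how real browsers behave. The
# expected header set and ordering here are illustrative assumptions,
# not a real browser's signature.
EXPECTED_PREFIX = ["host", "user-agent", "accept", "accept-language", "accept-encoding"]

def looks_like_claimed_browser(header_names):
    names = [h.lower() for h in header_names]
    present = [h for h in EXPECTED_PREFIX if h in names]
    in_order = [h for h in names if h in EXPECTED_PREFIX]
    # All expected headers must be present, and in the expected order.
    return len(present) == len(EXPECTED_PREFIX) and in_order == present

print(looks_like_claimed_browser(
    ["Host", "User-Agent", "Accept", "Accept-Language", "Accept-Encoding"]))  # True
print(looks_like_claimed_browser(["Host", "Accept", "User-Agent"]))  # False
```

Real profiles combine many more signals than header order, but the principle is the same: look for any inconsistency between what the client claims to be and how it actually behaves.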

AWS WAF bot control for targeted bots provides an indication, as part of token evaluation, of whether a browser shows evidence of automation or inconsistent signals. AWS WAF flags the request in order to take the action specified in the rule. For more information, see [Detect and block advanced bot traffic](https://aws.amazon.com/blogs/security/detect-and-block-advanced-bot-traffic/) in the AWS Security Blog.

## Device fingerprinting
<a name="device-fingerprinting"></a>

Device fingerprinting is similar to browser profiling, but it is not limited to browsers. Code running on a device (which can be a mobile device or a web browser) collects and reports details of the device to a backend server. The details can include system attributes, such as memory, CPU type, operating system (OS) kernel type, OS version, and virtualization.

You can use device fingerprinting to recognize if a bot is emulating an environment or if there are direct signs that automation is in use. Beyond this, device fingerprinting can also be used to recognize repeated requests from the same device.

Recognizing repeated requests from the same device, even if the device tries to change some characteristics of the request, allows a backend system to impose rate-limiting rules. Rate-limiting rules that are based on device fingerprinting are typically more effective than rate-limiting rules based on IP addresses. This helps you mitigate against bot traffic that is rotating between VPNs or proxies but is sourced from a small number of devices.
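The idea can be sketched as a sliding-window rate limiter keyed by a device fingerprint rather than by an IP address. The limit, window, and fingerprint values are illustrative:

```python
import time
from collections import defaultdict, deque

# Sketch of a sliding-window rate limiter keyed by device fingerprint
# instead of source IP, so a device rotating through VPNs or proxies is
# still throttled. Limit and window values are illustrative.
class FingerprintRateLimiter:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.hits = defaultdict(deque)

    def allow(self, fingerprint, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[fingerprint]
        # Drop request timestamps that have aged out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True

limiter = FingerprintRateLimiter(limit=3, window_seconds=60)
results = [limiter.allow("device-abc", now=t) for t in (0, 1, 2, 3)]
print(results)  # [True, True, True, False]
```

Because the key is the fingerprint, the fourth request is rejected even if each of the four requests arrived from a different IP address.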

When used with application integration SDKs, AWS WAF bot control for targeted bots can aggregate client session request behavior. This helps you detect and separate legitimate client sessions from malicious client sessions, even when both originate from the same IP address. For more information about AWS WAF bot control for targeted bots, see [Detect and block advanced bot traffic](https://aws.amazon.com/blogs/security/detect-and-block-advanced-bot-traffic/) in the AWS Security Blog.

## TLS fingerprinting
<a name="tls-fingerprinting"></a>

TLS fingerprinting, also known as *signature-based rules*, is commonly used when bots originate from many IP addresses but exhibit similar characteristics. When using HTTPS, the client and server exchange messages to acknowledge and verify one another and to establish cryptographic algorithms and session keys. This exchange is called a *TLS handshake*. How a TLS handshake is implemented forms a signature that is often valuable for recognizing large attacks spread across many IP addresses.

TLS fingerprinting enables web servers to determine a web client's identity with a high degree of accuracy. It requires only the parameters in the first packet of the connection, before any application data is exchanged. In this case, *web client* refers to the application initiating a request, which might be a browser, CLI tool, script (bot), native application, or other client.

One SSL and TLS fingerprinting approach is the [JA3 fingerprint](https://github.com/salesforce/ja3). JA3 fingerprints a client connection based on fields in the Client Hello message of the SSL or TLS handshake. It helps you profile specific SSL and TLS clients across different source IP addresses, ports, and X.509 certificates.
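The JA3 derivation can be sketched as follows: the decimal values of five Client Hello fields are joined (dashes within each field, commas between fields) and hashed with MD5 to produce the 32-character fingerprint. The field values below are illustrative, not a real client's handshake:

```python
import hashlib

# Sketch of JA3 fingerprint derivation: the five Client Hello fields
# (TLS version, cipher suites, extensions, elliptic curves, and elliptic
# curve point formats) are joined into a string and hashed with MD5.
# The numeric field values below are illustrative.
def ja3_fingerprint(version, ciphers, extensions, curves, point_formats):
    parts = [
        str(version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    return hashlib.md5(",".join(parts).encode()).hexdigest()

fp = ja3_fingerprint(771, [4865, 4866], [0, 23, 65281], [29, 23], [0])
print(len(fp))  # 32
```

Because the hash is computed over how the client negotiates TLS rather than over the source address, the same client produces the same fingerprint no matter which IP address or port it connects from.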

Amazon CloudFront supports [adding JA3 headers](https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/adding-cloudfront-headers.html) to requests. A `CloudFront-Viewer-JA3-Fingerprint` header contains a 32-character hash fingerprint of the TLS Client Hello packet of an incoming viewer request. The fingerprint encapsulates information about how the client communicates. This information can be used to profile clients that share the same pattern. You can add the `CloudFront-Viewer-JA3-Fingerprint` header to an origin request policy and attach the policy to a CloudFront distribution. You can then inspect the header value in origin applications or in Lambda@Edge and CloudFront Functions. You can compare the header value against a list of known malware fingerprints to block the malicious clients. You can also compare the header value against a list of expected fingerprints to allow requests only from known clients.
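As a hedged sketch, a Lambda@Edge origin-request handler could inspect this header and reject known-bad fingerprints. The event shape follows the Lambda@Edge request event format; the deny-listed fingerprint value is illustrative:

```python
# Sketch of a Lambda@Edge origin-request handler that inspects the
# CloudFront-Viewer-JA3-Fingerprint header and blocks requests whose
# fingerprint appears on a deny list. The fingerprint value in the deny
# list is illustrative.
BLOCKED_JA3 = {"e7d705a3286e19ea42f587b344ee6865"}

def handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    headers = request.get("headers", {})
    ja3 = headers.get("cloudfront-viewer-ja3-fingerprint", [{}])[0].get("value")
    if ja3 in BLOCKED_JA3:
        # Returning a response object short-circuits the origin request.
        return {"status": "403", "statusDescription": "Forbidden"}
    return request

event = {"Records": [{"cf": {"request": {
    "uri": "/",
    "headers": {"cloudfront-viewer-ja3-fingerprint": [
        {"key": "CloudFront-Viewer-JA3-Fingerprint",
         "value": "e7d705a3286e19ea42f587b344ee6865"}]},
}}}]}
print(handler(event, None)["status"])  # 403
```

Inverting the check (allowing only fingerprints on an expected list) implements the allow-list variant described above.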

# Advanced analysis controls for managing bots
<a name="advanced-analysis-controls"></a>

Some bots employ advanced deception tools to actively evade detection. These bots mimic human behavior in order to perform a specific activity, such as scalping. These bots have a purpose, and it is usually linked to a significant monetary reward.

These advanced, persistent bots use a mix of technologies to evade detection or blend in with regular traffic. In turn, this also requires a mix of different detection technologies to accurately identify and mitigate the malicious traffic.

## Targeted use cases
<a name="targeted-use-cases"></a>

Use-case data can provide bot-detection opportunities. *Fraud detections* are special use cases where special mitigation is warranted. For example, to help prevent account takeovers, you can compare a list of compromised account usernames and passwords against login or account creation requests. This helps website owners detect login attempts that use compromised credentials. Use of compromised credentials can indicate bots trying to take over an account, or it could indicate users who are unaware that their credentials are compromised. In this use case, website owners can take additional steps to verify the user and then help them change their password. AWS WAF provides the [Fraud Control account takeover prevention (ATP)](https://docs.aws.amazon.com/waf/latest/developerguide/waf-atp.html) managed rule for this use case.
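One way to sketch such a check is to store only hashes of known-breached credential pairs and compare login attempts against them, so plaintext breach data never sits alongside the application. This is an illustrative approximation of the technique, not how the ATP managed rule is implemented; the breach data below is hypothetical:

```python
import hashlib

# Sketch of a compromised-credential check: login attempts are hashed and
# compared against a set of SHA-256 digests of known-breached
# username/password pairs. The breached pair below is illustrative.
def _digest(username, password):
    return hashlib.sha256(f"{username}:{password}".encode()).hexdigest()

BREACHED = {_digest("alice@example.com", "hunter2")}

def is_compromised(username, password):
    return _digest(username, password) in BREACHED

print(is_compromised("alice@example.com", "hunter2"))   # True
print(is_compromised("alice@example.com", "Xk9!rTq2"))  # False
```

On a match, the application would trigger additional verification and prompt a password change rather than silently blocking, since the user may be legitimate.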

## Application-level or aggregated bot detection
<a name="aggregated-bot-detection"></a>

Some use cases require combining data about requests from the content delivery network (CDN), AWS WAF, and the backend of the application or service. Sometimes, you even need to integrate third-party intelligence to be able to make high-confidence decisions about bots.

Features in [Amazon CloudFront](https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Introduction.html) and AWS WAF can send signals to the backend infrastructure or to subsequent rules through headers and [labels](https://docs.aws.amazon.com/waf/latest/developerguide/waf-labels.html). The JA3 fingerprint headers described previously are an example of CloudFront providing such data through a header. AWS WAF can apply labels when a rule matches, and subsequent rules can use these labels to make better decisions about bots. When multiple rules are combined, you can implement highly granular controls. A common use case is to match on part of a managed rule through a label and then combine it with other request data. For more information, see [Label match examples](https://docs.aws.amazon.com/waf/latest/developerguide/waf-rule-label-match-examples.html) in the AWS WAF documentation.
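A custom rule that combines a managed-rule label with other request data can be sketched as the following rule structure. The label key follows the Bot Control label namespace; the rule name and path are hypothetical:

```python
# Sketch of a custom rule that blocks only when a Bot Control label is
# present AND the request targets a sensitive path. The label key follows
# the awswaf:managed:aws:bot-control namespace; the rule name and URI
# path are illustrative.
def label_plus_path_rule(label: str, path: str) -> dict:
    return {
        "Name": "block-http-libs-on-login",
        "Priority": 10,
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": "BlockHttpLibsOnLogin",
        },
        "Statement": {
            "AndStatement": {
                "Statements": [
                    {"LabelMatchStatement": {"Scope": "LABEL", "Key": label}},
                    {
                        "ByteMatchStatement": {
                            "SearchString": path,
                            "FieldToMatch": {"UriPath": {}},
                            "PositionalConstraint": "STARTS_WITH",
                            "TextTransformations": [
                                {"Priority": 0, "Type": "NONE"}
                            ],
                        }
                    },
                ]
            }
        },
    }

rule = label_plus_path_rule(
    "awswaf:managed:aws:bot-control:bot:category:http_library", "/login"
)
```

Running the managed rule group in `COUNT` mode first lets it apply the label without taking action, leaving the blocking decision to this narrower custom rule.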

## Machine learning analysis
<a name="machine-learning-analysis"></a>

Machine learning (ML) is a powerful technique for dealing with bots. ML can adapt to changing details and, when combined with other tools, provides the most robust and complete way to mitigate bots with minimal false positives. The two most common ML techniques are *behavioral analysis* and *anomaly detection*. With behavioral analysis, a system (in the client, the server, or both) monitors how a user interacts with the application or website. For example, it monitors mouse movement patterns or the frequency of click and touch interactions. The behavior is then analyzed with an ML model to recognize bots. Anomaly detection is similar, but it focuses on detecting behavior or patterns that differ significantly from a baseline that is defined for the application or website.
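Anomaly detection can be sketched as a simple baseline comparison: flag observations that deviate from historical values by more than a few standard deviations. Production systems use far richer features and models; the data and threshold here are illustrative:

```python
import statistics

# Sketch of baseline anomaly detection: flag a new per-minute request
# count if it sits more than z_threshold standard deviations above the
# historical mean. History values and the threshold are illustrative.
def is_anomalous(history, observation, z_threshold=3.0):
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return observation != mean
    return (observation - mean) / stdev > z_threshold

baseline = [100, 110, 95, 105, 98, 102, 99, 107]
print(is_anomalous(baseline, 104))  # False
print(is_anomalous(baseline, 400))  # True
```

A real behavioral-analysis pipeline would replace the request count with features such as interaction timing or mouse-movement statistics, but the shape of the decision (compare against a learned baseline) is the same.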

AWS WAF targeted controls for bots provide predictive ML technology. This technology helps defend against distributed, proxy-based attacks that are made by bots designed to evade detection. The managed [AWS WAF Bot Control rule group](https://docs.aws.amazon.com/waf/latest/developerguide/aws-managed-rule-groups-bot.html) uses automated ML analysis of website traffic statistics to detect anomalous behavior that is indicative of distributed, coordinated bot activity.