Web Crawler connector overview - Amazon Q Business

Amazon Q Business will no longer be open to new customers starting on July 31, 2026. If you would like to use the service, please sign up prior to July 30. For capabilities similar to Q Business, explore Amazon Quick.

The following table gives an overview of the Amazon Q Business Web Crawler connector and its supported features.

Category Feature Support
Security Authentication type
  • Basic

  • NTLM/Kerberos

  • Form

  • SAML

Note

Authentication isn't required to crawl public websites that you have permission to crawl.

Authentication credentials

Basic authentication

  • Website username

  • Website password

NTLM/Kerberos authentication

  • NTLM/Kerberos username

  • NTLM/Kerberos password

Form authentication

  • Login page URL

  • Website username

  • Website password

  • Username field XPath

  • Password field XPath

  • Password button XPath

  • (Optional) Username button XPath

SAML authentication

  • Login page URL

  • Website username

  • Website password

  • Username field XPath

  • Password field XPath

  • Password button XPath

  • (Optional) Username button XPath

Access Control List (ACL) crawling No
Identity crawling No
Crawl features Custom metadata Yes
Visual content processing Yes. Amazon Q Business can extract and index content from images embedded in web pages and in the following supported document types: PDF, PowerPoint, Microsoft Word (DOCX), Google Slides, and Google Docs.
Entities Yes. The following entities are supported:
  • Web page

  • Attachment

See What is a document? for more details on what each connector crawls as a document.

Field mappings Yes. For more information, see Field mappings.
Filters Yes. The following filters are supported:
  • Sync specific domains and subdomains

  • Include files linked on web pages

  • Regex patterns to crawl and index specific URLs

  • Regex patterns to crawl and index specific files

  • Include web pages by crawl depth

  • Specify maximum file size and links per page for Amazon Q to crawl

Sync mode Supports full sync and sync of new, modified, or deleted content.
File types Supports all file types supported by Amazon Q Business.
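
The filters listed in the table (domains and subdomains, regex patterns, crawl depth, and links per page) can be pictured as a series of checks applied to each discovered URL. The following Python sketch is only an illustration of that idea, not how Amazon Q Business implements filtering. The class `CrawlFilter` and all of its field names are hypothetical, the split into inclusion and exclusion patterns is an assumption, and the file-size and linked-file filters are left out for brevity.

```python
import re
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class CrawlFilter:
    """Hypothetical filter settings; names are illustrative, not service parameters."""
    allowed_domains: set[str]
    include_patterns: list[str] = field(default_factory=list)  # assumed inclusion regexes
    exclude_patterns: list[str] = field(default_factory=list)  # assumed exclusion regexes
    max_depth: int = 2
    max_links_per_page: int = 100

    def allows(self, url: str, depth: int) -> bool:
        """Return True if the URL passes the domain, depth, and regex checks."""
        host = urlparse(url).hostname or ""
        # Domain filter: accept an allowed domain or any of its subdomains.
        in_domain = any(host == d or host.endswith("." + d) for d in self.allowed_domains)
        if not in_domain or depth > self.max_depth:
            return False
        if any(re.search(p, url) for p in self.exclude_patterns):
            return False
        # With no inclusion patterns configured, every remaining URL passes.
        return not self.include_patterns or any(
            re.search(p, url) for p in self.include_patterns
        )

    def select_links(self, links: list[str], depth: int) -> list[str]:
        """Consider at most max_links_per_page links, then drop those that fail the checks."""
        return [u for u in links[: self.max_links_per_page] if self.allows(u, depth)]


crawl_filter = CrawlFilter(
    allowed_domains={"example.com"},
    include_patterns=[r"/docs/"],
    exclude_patterns=[r"\.zip$"],
    max_depth=1,
    max_links_per_page=3,
)
links = [
    "https://example.com/docs/intro",
    "https://blog.example.com/docs/post",
    "https://other.org/docs/page",
    "https://example.com/docs/archive.zip",
]
print(crawl_filter.select_links(links, depth=1))
# ['https://example.com/docs/intro', 'https://blog.example.com/docs/post']
```

In this sketch the links-per-page limit is applied before the other checks, so the fourth link is never examined. That ordering is a design choice made for the example and may differ from the service's behavior.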