Configuring the `robots.txt` file for the Amazon Kendra Web Crawler

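Amazon Kendra is an intelligent search service that AWS customers use to index and search documents of their choice. To index documents on the web, customers can use the Amazon Kendra Web Crawler, specifying which URLs to index along with other operational parameters. Amazon Kendra customers must obtain authorization before indexing any particular website.

The Amazon Kendra Web Crawler respects standard robots.txt directives such as Allow and Disallow. You can modify your website's robots.txt file to control how the Amazon Kendra Web Crawler crawls your website.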

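Configuring how the Amazon Kendra Web Crawler accesses your website

You can use the Allow and Disallow directives to control how the Amazon Kendra Web Crawler indexes your website, including which web pages are indexed and which are not crawled.

To allow the Amazon Kendra Web Crawler to crawl all web pages except the disallowed ones, use the following directives: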


User-agent: amazon-kendra    # Amazon Kendra Web Crawler
Disallow: /credential-pages/ # disallow access to specific pages
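A rule like this can be sanity-checked locally before it is deployed. The following is a minimal sketch using Python's standard-library `urllib.robotparser`; the host `example.com`, the sample URLs, and the helper name `is_allowed` are illustrative assumptions, and the standard-library parser may not match the Amazon Kendra Web Crawler's matching behavior in every detail.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the example above; example.com is a placeholder host.
ROBOTS_TXT = """\
User-agent: amazon-kendra    # Amazon Kendra Web Crawler
Disallow: /credential-pages/ # disallow access to specific pages
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in memory, no network access
    return parser.can_fetch(user_agent, url)


for url in (
    "https://example.com/docs/index.html",
    "https://example.com/credential-pages/login",
):
    verdict = "allowed" if is_allowed(ROBOTS_TXT, "amazon-kendra", url) else "blocked"
    print(f"{url}: {verdict}")
```

With these rules, the first URL is reported as allowed and the second as blocked.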

To allow the Amazon Kendra Web Crawler to crawl only specific web pages, use the following directives:


User-agent: amazon-kendra    # Amazon Kendra Web Crawler
Allow: /pages/ # allow access to specific pages
Disallow: / # disallow access to all other pages
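Under the robots exclusion standard, a path that no rule matches is allowed by default, so restricting a crawler to one directory generally calls for a catch-all `Disallow: /` next to the `Allow` rule. The sketch below compares the two variants using Python's standard-library `urllib.robotparser`. The host `example.com`, the paths, and the helper `can_crawl` are hypothetical, and this parser applies the first matching rule in file order, which may differ from how the Amazon Kendra Web Crawler resolves overlapping rules.

```python
from urllib.robotparser import RobotFileParser

# Variant 1: an Allow rule on its own (nothing is explicitly disallowed).
ALLOW_ONLY = """\
User-agent: amazon-kendra
Allow: /pages/
"""

# Variant 2: the same Allow rule followed by a catch-all Disallow.
ALLOW_THEN_BLOCK_REST = """\
User-agent: amazon-kendra
Allow: /pages/
Disallow: /
"""


def can_crawl(robots_txt: str, path: str) -> bool:
    """Check a path on the placeholder host for the amazon-kendra user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("amazon-kendra", "https://example.com" + path)


for name, rules in (("allow only", ALLOW_ONLY),
                    ("allow + disallow rest", ALLOW_THEN_BLOCK_REST)):
    for path in ("/pages/about.html", "/private/report.html"):
        print(f"{name:>22} {path}: {can_crawl(rules, path)}")
```

In this sketch only the second variant blocks `/private/report.html`. Listing `Allow` before `Disallow: /` gives the same result under both first-match and longest-match evaluation.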

To allow the Amazon Kendra Web Crawler to crawl all website content while blocking every other robot, use the following directives:


User-agent: amazon-kendra # Amazon Kendra Web Crawler
Allow: / # allow access to all pages
User-agent: * # any (other) robot
Disallow: / # disallow access to any pages
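To see how the two groups interact, the rules can be evaluated for more than one user agent. This is a minimal sketch with Python's standard-library `urllib.robotparser`; the agent name `some-other-bot` and the URL are made up for illustration, and the result reflects this parser's interpretation rather than a guarantee about any particular crawler.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the example above.
ROBOTS_TXT = """\
User-agent: amazon-kendra # Amazon Kendra Web Crawler
Allow: / # allow access to all pages
User-agent: * # any (other) robot
Disallow: / # disallow access to any pages
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "some-other-bot" is a hypothetical crawler that falls under the "*" group.
URL = "https://example.com/any/page.html"
access = {agent: parser.can_fetch(agent, URL)
          for agent in ("amazon-kendra", "some-other-bot")}
print(access)  # {'amazon-kendra': True, 'some-other-bot': False}
```

The group that names `amazon-kendra` takes precedence for that agent, and every other agent falls back to the `*` group.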

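Stopping the Amazon Kendra Web Crawler from crawling your website

You can use the Disallow directive to stop the Amazon Kendra Web Crawler from indexing your website, either entirely or for selected web pages.

To stop the Amazon Kendra Web Crawler from crawling the website, use the following directives: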


User-agent: amazon-kendra # Amazon Kendra Web Crawler
Disallow: / # disallow access to any pages

If you have any questions or concerns about the Amazon Kendra Web Crawler, contact the AWS support team.
