# Data source connectors
<a name="data-sources"></a>

This section shows you how to connect Amazon Kendra to supported databases and data source repositories using the AWS Management Console and the Amazon Kendra APIs.

**Topics**
+ [Data source template schemas](ds-schemas.md)
+ [Adobe Experience Manager](data-source-aem.md)
+ [Alfresco](data-source-alfresco.md)
+ [Aurora (MySQL)](data-source-aurora-mysql.md)
+ [Aurora (PostgreSQL)](data-source-aurora-postgresql.md)
+ [Amazon FSx (Windows)](data-source-fsx.md)
+ [Amazon FSx (NetApp ONTAP)](data-source-fsx-ontap.md)
+ [Amazon RDS/Aurora](data-source-database.md)
+ [Amazon RDS (Microsoft SQL Server)](data-source-rds-ms-sql-server.md)
+ [Amazon RDS (MySQL)](data-source-rds-mysql.md)
+ [Amazon RDS (Oracle)](data-source-rds-oracle.md)
+ [Amazon RDS (PostgreSQL)](data-source-rds-postgresql.md)
+ [Amazon S3](data-source-s3.md)
+ [Amazon Kendra Web Crawler](data-source-web-crawler.md)
+ [Box](data-source-box.md)
+ [Confluence](data-source-confluence.md)
+ [Custom data source connector](data-source-custom.md)
+ [Dropbox](data-source-dropbox.md)
+ [Drupal](data-source-drupal.md)
+ [GitHub](data-source-github.md)
+ [Gmail](data-source-gmail.md)
+ [Google Drive](data-source-google-drive.md)
+ [IBM DB2](data-source-ibm-db2.md)
+ [Jira](data-source-jira.md)
+ [Microsoft Exchange](data-source-exchange.md)
+ [Microsoft OneDrive](data-source-onedrive.md)
+ [Microsoft SharePoint](data-source-sharepoint.md)
+ [Microsoft SQL Server](data-source-ms-sql-server.md)
+ [Microsoft Teams](data-source-teams.md)
+ [Microsoft Yammer](data-source-yammer.md)
+ [MySQL](data-source-mysql.md)
+ [Oracle Database](data-source-oracle-database.md)
+ [PostgreSQL](data-source-postgresql.md)
+ [Quip](data-source-quip.md)
+ [Salesforce](data-source-salesforce.md)
+ [ServiceNow](data-source-servicenow.md)
+ [Slack](data-source-slack.md)
+ [Zendesk](data-source-zendesk.md)

# Data source template schemas
<a name="ds-schemas"></a>

The following are template schemas for data sources where templates are supported.

**Topics**
+ [Adobe Experience Manager template schema](#ds-aem-schema)
+ [Amazon FSx (Windows) template schema](#ds-fsx-windows-schema)
+ [Amazon FSx (NetApp ONTAP) template schema](#ds-fsx-ontap-schema)
+ [Alfresco template schema](#ds-alfresco-schema)
+ [Aurora (MySQL) template schema](#ds-aurora-mysql-schema)
+ [Aurora (PostgreSQL) template schema](#ds-aurora-postgresql-schema)
+ [Amazon RDS (Microsoft SQL Server) template schema](#ds-rds-ms-sql-server-schema)
+ [Amazon RDS (MySQL) template schema](#ds-rds-mysql-schema)
+ [Amazon RDS (Oracle) template schema](#ds-rds-oracle-schema)
+ [Amazon RDS (PostgreSQL) template schema](#ds-rds-postgresql-schema)
+ [Amazon S3 template schema](#ds-s3-schema)
+ [Amazon Kendra Web Crawler template schema](#ds-schema-web-crawler)
+ [Confluence template schema](#ds-confluence-schema)
+ [Dropbox template schema](#ds-dropbox-schema)
+ [Drupal template schema](#ds-drupal-schema)
+ [GitHub template schema](#ds-github-schema)
+ [Gmail template schema](#ds-gmail-schema)
+ [Google Drive template schema](#ds-googledrive-schema)
+ [IBM DB2 template schema](#ds-ibm-db2-schema)
+ [Microsoft Exchange template schema](#ds-msexchange-schema)
+ [Microsoft OneDrive template schema](#ds-onedrive-schema)
+ [Microsoft SharePoint template schema](#ds-schema-sharepoint)
+ [Microsoft SQL Server template schema](#ds-ms-sql-server-schema)
+ [Microsoft Teams template schema](#ds-msteams-schema)
+ [Microsoft Yammer template schema](#ds-schema-yammer)
+ [MySQL template schema](#ds-mysql-schema)
+ [Oracle Database template schema](#ds-oracle-database-schema)
+ [PostgreSQL template schema](#ds-postgresql-schema)
+ [Salesforce template schema](#ds-salesforce-schema)
+ [ServiceNow template schema](#ds-servicenow-schema)
+ [Slack template schema](#ds-schema-slack)
+ [Zendesk template schema](#ds-schema-zendesk)

## Adobe Experience Manager template schema
<a name="ds-aem-schema"></a>

You include a JSON document that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) object. As part of the connection configuration, or repository endpoint details, you provide the Adobe Experience Manager host URL, the authentication type, and whether you use Adobe Experience Manager (AEM) as a Cloud Service or AEM On-Premise. You also specify the type of data source as `AEM`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html).

You can use the template provided in this developer guide. For more information, see [Adobe Experience Manager JSON schema](#aem-json).
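
As a rough illustration of how the pieces described above fit together, the following Python sketch assembles a template document for AEM and checks the endpoint fields against the values listed in the JSON schema. The function name `build_aem_template`, the host URL, the secret ARN, the field names in the mappings, and the `syncMode` and `version` values are all hypothetical placeholders chosen for this example; they are assumptions, not values taken from this guide.

```python
import json

# Allowed values copied from the AEM JSON schema in this guide.
AUTH_TYPES = {"Basic", "OAuth2"}
DEPLOYMENT_TYPES = {"CLOUD", "ON_PREMISE"}


def build_aem_template(aem_url, auth_type, deployment_type, secret_arn,
                       sync_mode, version, page_field_mappings,
                       asset_field_mappings, additional_properties=None):
    """Assemble an AEM template document and check the endpoint fields."""
    # The schema pattern for aemUrl is "https:.*"; this prefix check is a
    # slightly stricter stand-in for that pattern.
    if not aem_url.startswith("https:"):
        raise ValueError("aemUrl must be an https URL")
    if auth_type not in AUTH_TYPES:
        raise ValueError(f"authType must be one of {sorted(AUTH_TYPES)}")
    if deployment_type not in DEPLOYMENT_TYPES:
        raise ValueError(
            f"deploymentType must be one of {sorted(DEPLOYMENT_TYPES)}")

    template = {
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {
                "aemUrl": aem_url,
                "authType": auth_type,
                "deploymentType": deployment_type,
            }
        },
        "repositoryConfigurations": {
            "page": {"fieldMappings": page_field_mappings},
            "asset": {"fieldMappings": asset_field_mappings},
        },
        "type": "AEM",
        "syncMode": sync_mode,
        "secretArn": secret_arn,
        "version": version,
    }
    # Only include additionalProperties when the caller supplies some.
    if additional_properties:
        template["additionalProperties"] = additional_properties
    return template


# Hypothetical values for illustration only.
template = build_aem_template(
    aem_url="https://hostname:4502",
    auth_type="Basic",
    deployment_type="ON_PREMISE",
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    sync_mode="FULL_CRAWL",   # assumed value; allowed values are not listed here
    version="1.0.0",          # assumed value
    page_field_mappings=[{
        "indexFieldName": "aem_page_title",   # hypothetical index field
        "indexFieldType": "STRING",
        "dataSourceFieldName": "title",       # hypothetical AEM field
    }],
    asset_field_mappings=[],
)
print(json.dumps(template, indent=2))
```

With the AWS SDK for Python (Boto3), a dictionary like this would typically be passed as `Configuration={"TemplateConfiguration": {"Template": template}}` together with `Type="TEMPLATE"` in a `create_data_source` call. That call is left out of the sketch because it needs an index ID, an IAM role, and AWS credentials.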

The following table describes the parameters of the AEM JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | The endpoint information for the data source. | 
| aemUrl | The Adobe Experience Manager host URL. For example, if you use AEM On-Premise, you include the hostname and port: https://hostname:port. Or, if you use AEM as a Cloud Service, you can use the author URL: https://author-xxxxxx-xxxxxxx.adobeaemcloud.com. | 
| authType | The type of authentication you use, whether Basic or OAuth2. | 
| deploymentType | The type of Adobe Experience Manager that you use, either CLOUD or ON_PREMISE. | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. | 
| page, asset | For each content type, a fieldMappings list of objects that map the attributes or field names of your Adobe Experience Manager pages and assets to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). | 
| additionalProperties | Additional configuration options for your content in your data source. | 
| timeZoneId |  If you use AEM On-Premise and the time zone of your server is different than the time zone of the Amazon Kendra AEM connector or index, you can specify the server time zone to align with the AEM connector or index. The default time zone for AEM On-Premise is the time zone of the Amazon Kendra AEM connector or index. The default time zone for AEM as a Cloud Service is Greenwich Mean Time.  | 
| pageRootPaths, assetRootPaths | A list of root paths for pages and assets. For example, the root path for a page could be /content/sub and the root path for an asset could be /content/sub/asset1. | 
| crawlAssets | Set to true to crawl assets. | 
| crawlPages | Set to true to crawl pages. | 
| pagePathInclusionPatterns, pageNameInclusionPatterns, assetPathInclusionPatterns, assetTypeInclusionPatterns, assetNameInclusionPatterns | A list of regular expression patterns to include certain pages and assets in your Adobe Experience Manager data source. Pages and assets that match the patterns are included in the index. Pages and assets that don't match the patterns are excluded from the index. If a page or asset matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the content isn't included in the index. | 
| pagePathExclusionPatterns, pageNameExclusionPatterns, assetPathExclusionPatterns, assetTypeExclusionPatterns, assetNameExclusionPatterns | A list of regular expression patterns to exclude certain pages and assets in your Adobe Experience Manager data source. Pages and assets that match the patterns are excluded from the index. Pages and assets that don't match the patterns are included in the index. If a page or asset matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the content isn't included in the index. | 
| pageComponents | A list of names for the specific page components that you want to index. | 
| contentFragmentVariations | A list of names for the specific saved variations of Adobe Experience Manager Content Fragments that you want to index. | 
| type | The type of data source. Specify AEM as your data source type. | 
| syncMode |  Specify how Amazon Kendra should update your index when your data source content changes. You can choose between: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | 
| secretArn | The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Adobe Experience Manager. For information on these key-value pairs, see [Connection instructions for Adobe Experience Manager](https://docs.aws.amazon.com/kendra/latest/dg/data-source-aem.html#data-source-procedure-aem). | 
| version | The version of this template that is currently supported. | 
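
The precedence rule for inclusion and exclusion patterns described in the table can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not the connector's implementation. The helper name `is_indexed` is hypothetical, and two behaviors are assumptions: patterns are matched anywhere in the value with `re.search`, and an empty inclusion list is treated as "no inclusion filter".

```python
import re


def is_indexed(value, inclusion_patterns, exclusion_patterns):
    """Return True if a page or asset value would be kept under the stated rule."""
    # Exclusion takes precedence: any matching exclusion pattern drops the item,
    # even if an inclusion pattern also matches.
    if any(re.search(pattern, value) for pattern in exclusion_patterns):
        return False
    # Assumption: with no inclusion patterns, nothing is filtered out here.
    if not inclusion_patterns:
        return True
    # Otherwise the item must match at least one inclusion pattern.
    return any(re.search(pattern, value) for pattern in inclusion_patterns)


# Hypothetical page paths and patterns for illustration.
include = [r"^/content/sub/"]
exclude = [r"draft"]
for path in ["/content/sub/page1", "/content/sub/draft-page", "/content/other"]:
    print(path, is_indexed(path, include, exclude))
```

Checking exclusions first keeps the logic short and mirrors the statement that an item matching both kinds of pattern is not included in the index.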

### Adobe Experience Manager JSON schema
<a name="aem-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties":
  {
    "connectionConfiguration": {
      "type": "object",
      "properties":
      {
        "repositoryEndpointMetadata":
        {
          "type": "object",
          "properties":
          {
            "aemUrl":
            {
              "type": "string",
              "pattern": "https:.*"
            },
            "authType": {
              "type": "string",
              "enum": ["Basic", "OAuth2"]
            },
            "deploymentType": {
              "type": "string",
              "enum": ["CLOUD","ON_PREMISE"]
            }
          },
          "required":
          [
            "aemUrl",
            "authType",
            "deploymentType"
          ]
        }
      },
      "required":
      [
        "repositoryEndpointMetadata"
      ]
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties":
      {
        "page":
        {
          "type": "object",
          "properties":
          {
            "fieldMappings":
            {
              "type": "array",
              "items":
              [
                {
                  "type": "object",
                  "properties":
                  {
                    "indexFieldName":
                    {
                      "type": "string"
                    },
                    "indexFieldType":
                    {
                      "type": "string",
                      "enum":
                      [
                        "STRING",
                        "STRING_LIST",
                        "DATE",
                        "LONG"
                      ]
                    },
                    "dataSourceFieldName":
                    {
                      "type": "string"
                    },
                    "dateFieldFormat":
                    {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required":
                  [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required":
          [
            "fieldMappings"
          ]
        },
        "asset":
        {
          "type": "object",
          "properties":
          {
            "fieldMappings":
            {
              "type": "array",
              "items":
              [
                {
                  "type": "object",
                  "properties":
                  {
                    "indexFieldName":
                    {
                      "type": "string"
                    },
                    "indexFieldType":
                    {
                      "type": "string",
                      "enum":
                      [
                        "STRING",
                        "STRING_LIST",
                        "DATE",
                        "LONG"
                      ]
                    },
                    "dataSourceFieldName":
                    {
                      "type": "string"
                    },
                    "dateFieldFormat":
                    {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required":
                  [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required":
          [
            "fieldMappings"
          ]
        }
      }
    },
    "additionalProperties": {
      "type": "object",
      "properties":
      {
        "timeZoneId": {
          "type": "string",
          "enum": [
            "Africa/Abidjan",
            "Africa/Accra",
            "Africa/Addis_Ababa",
            "Africa/Algiers",
            "Africa/Asmara",
            "Africa/Asmera",
            "Africa/Bamako",
            "Africa/Bangui",
            "Africa/Banjul",
            "Africa/Bissau",
            "Africa/Blantyre",
            "Africa/Brazzaville",
            "Africa/Bujumbura",
            "Africa/Cairo",
            "Africa/Casablanca",
            "Africa/Ceuta",
            "Africa/Conakry",
            "Africa/Dakar",
            "Africa/Dar_es_Salaam",
            "Africa/Djibouti",
            "Africa/Douala",
            "Africa/El_Aaiun",
            "Africa/Freetown",
            "Africa/Gaborone",
            "Africa/Harare",
            "Africa/Johannesburg",
            "Africa/Juba",
            "Africa/Kampala",
            "Africa/Khartoum",
            "Africa/Kigali",
            "Africa/Kinshasa",
            "Africa/Lagos",
            "Africa/Libreville",
            "Africa/Lome",
            "Africa/Luanda",
            "Africa/Lubumbashi",
            "Africa/Lusaka",
            "Africa/Malabo",
            "Africa/Maputo",
            "Africa/Maseru",
            "Africa/Mbabane",
            "Africa/Mogadishu",
            "Africa/Monrovia",
            "Africa/Nairobi",
            "Africa/Ndjamena",
            "Africa/Niamey",
            "Africa/Nouakchott",
            "Africa/Ouagadougou",
            "Africa/Porto-Novo",
            "Africa/Sao_Tome",
            "Africa/Timbuktu",
            "Africa/Tripoli",
            "Africa/Tunis",
            "Africa/Windhoek",
            "America/Adak",
            "America/Anchorage",
            "America/Anguilla",
            "America/Antigua",
            "America/Araguaina",
            "America/Argentina/Buenos_Aires",
            "America/Argentina/Catamarca",
            "America/Argentina/ComodRivadavia",
            "America/Argentina/Cordoba",
            "America/Argentina/Jujuy",
            "America/Argentina/La_Rioja",
            "America/Argentina/Mendoza",
            "America/Argentina/Rio_Gallegos",
            "America/Argentina/Salta",
            "America/Argentina/San_Juan",
            "America/Argentina/San_Luis",
            "America/Argentina/Tucuman",
            "America/Argentina/Ushuaia",
            "America/Aruba",
            "America/Asuncion",
            "America/Atikokan",
            "America/Atka",
            "America/Bahia",
            "America/Bahia_Banderas",
            "America/Barbados",
            "America/Belem",
            "America/Belize",
            "America/Blanc-Sablon",
            "America/Boa_Vista",
            "America/Bogota",
            "America/Boise",
            "America/Buenos_Aires",
            "America/Cambridge_Bay",
            "America/Campo_Grande",
            "America/Cancun",
            "America/Caracas",
            "America/Catamarca",
            "America/Cayenne",
            "America/Cayman",
            "America/Chicago",
            "America/Chihuahua",
            "America/Ciudad_Juarez",
            "America/Coral_Harbour",
            "America/Cordoba",
            "America/Costa_Rica",
            "America/Creston",
            "America/Cuiaba",
            "America/Curacao",
            "America/Danmarkshavn",
            "America/Dawson",
            "America/Dawson_Creek",
            "America/Denver",
            "America/Detroit",
            "America/Dominica",
            "America/Edmonton",
            "America/Eirunepe",
            "America/El_Salvador",
            "America/Ensenada",
            "America/Fort_Nelson",
            "America/Fort_Wayne",
            "America/Fortaleza",
            "America/Glace_Bay",
            "America/Godthab",
            "America/Goose_Bay",
            "America/Grand_Turk",
            "America/Grenada",
            "America/Guadeloupe",
            "America/Guatemala",
            "America/Guayaquil",
            "America/Guyana",
            "America/Halifax",
            "America/Havana",
            "America/Hermosillo",
            "America/Indiana/Indianapolis",
            "America/Indiana/Knox",
            "America/Indiana/Marengo",
            "America/Indiana/Petersburg",
            "America/Indiana/Tell_City",
            "America/Indiana/Vevay",
            "America/Indiana/Vincennes",
            "America/Indiana/Winamac",
            "America/Indianapolis",
            "America/Inuvik",
            "America/Iqaluit",
            "America/Jamaica",
            "America/Jujuy",
            "America/Juneau",
            "America/Kentucky/Louisville",
            "America/Kentucky/Monticello",
            "America/Knox_IN",
            "America/Kralendijk",
            "America/La_Paz",
            "America/Lima",
            "America/Los_Angeles",
            "America/Louisville",
            "America/Lower_Princes",
            "America/Maceio",
            "America/Managua",
            "America/Manaus",
            "America/Marigot",
            "America/Martinique",
            "America/Matamoros",
            "America/Mazatlan",
            "America/Mendoza",
            "America/Menominee",
            "America/Merida",
            "America/Metlakatla",
            "America/Mexico_City",
            "America/Miquelon",
            "America/Moncton",
            "America/Monterrey",
            "America/Montevideo",
            "America/Montreal",
            "America/Montserrat",
            "America/Nassau",
            "America/New_York",
            "America/Nipigon",
            "America/Nome",
            "America/Noronha",
            "America/North_Dakota/Beulah",
            "America/North_Dakota/Center",
            "America/North_Dakota/New_Salem",
            "America/Nuuk",
            "America/Ojinaga",
            "America/Panama",
            "America/Pangnirtung",
            "America/Paramaribo",
            "America/Phoenix",
            "America/Port-au-Prince",
            "America/Port_of_Spain",
            "America/Porto_Acre",
            "America/Porto_Velho",
            "America/Puerto_Rico",
            "America/Punta_Arenas",
            "America/Rainy_River",
            "America/Rankin_Inlet",
            "America/Recife",
            "America/Regina",
            "America/Resolute",
            "America/Rio_Branco",
            "America/Rosario",
            "America/Santa_Isabel",
            "America/Santarem",
            "America/Santiago",
            "America/Santo_Domingo",
            "America/Sao_Paulo",
            "America/Scoresbysund",
            "America/Shiprock",
            "America/Sitka",
            "America/St_Barthelemy",
            "America/St_Johns",
            "America/St_Kitts",
            "America/St_Lucia",
            "America/St_Thomas",
            "America/St_Vincent",
            "America/Swift_Current",
            "America/Tegucigalpa",
            "America/Thule",
            "America/Thunder_Bay",
            "America/Tijuana",
            "America/Toronto",
            "America/Tortola",
            "America/Vancouver",
            "America/Virgin",
            "America/Whitehorse",
            "America/Winnipeg",
            "America/Yakutat",
            "America/Yellowknife",
            "Antarctica/Casey",
            "Antarctica/Davis",
            "Antarctica/DumontDUrville",
            "Antarctica/Macquarie",
            "Antarctica/Mawson",
            "Antarctica/McMurdo",
            "Antarctica/Palmer",
            "Antarctica/Rothera",
            "Antarctica/South_Pole",
            "Antarctica/Syowa",
            "Antarctica/Troll",
            "Antarctica/Vostok",
            "Arctic/Longyearbyen",
            "Asia/Aden",
            "Asia/Almaty",
            "Asia/Amman",
            "Asia/Anadyr",
            "Asia/Aqtau",
            "Asia/Aqtobe",
            "Asia/Ashgabat",
            "Asia/Ashkhabad",
            "Asia/Atyrau",
            "Asia/Baghdad",
            "Asia/Bahrain",
            "Asia/Baku",
            "Asia/Bangkok",
            "Asia/Barnaul",
            "Asia/Beirut",
            "Asia/Bishkek",
            "Asia/Brunei",
            "Asia/Calcutta",
            "Asia/Chita",
            "Asia/Choibalsan",
            "Asia/Chongqing",
            "Asia/Chungking",
            "Asia/Colombo",
            "Asia/Dacca",
            "Asia/Damascus",
            "Asia/Dhaka",
            "Asia/Dili",
            "Asia/Dubai",
            "Asia/Dushanbe",
            "Asia/Famagusta",
            "Asia/Gaza",
            "Asia/Harbin",
            "Asia/Hebron",
            "Asia/Ho_Chi_Minh",
            "Asia/Hong_Kong",
            "Asia/Hovd",
            "Asia/Irkutsk",
            "Asia/Istanbul",
            "Asia/Jakarta",
            "Asia/Jayapura",
            "Asia/Jerusalem",
            "Asia/Kabul",
            "Asia/Kamchatka",
            "Asia/Karachi",
            "Asia/Kashgar",
            "Asia/Kathmandu",
            "Asia/Katmandu",
            "Asia/Khandyga",
            "Asia/Kolkata",
            "Asia/Krasnoyarsk",
            "Asia/Kuala_Lumpur",
            "Asia/Kuching",
            "Asia/Kuwait",
            "Asia/Macao",
            "Asia/Macau",
            "Asia/Magadan",
            "Asia/Makassar",
            "Asia/Manila",
            "Asia/Muscat",
            "Asia/Nicosia",
            "Asia/Novokuznetsk",
            "Asia/Novosibirsk",
            "Asia/Omsk",
            "Asia/Oral",
            "Asia/Phnom_Penh",
            "Asia/Pontianak",
            "Asia/Pyongyang",
            "Asia/Qatar",
            "Asia/Qostanay",
            "Asia/Qyzylorda",
            "Asia/Rangoon",
            "Asia/Riyadh",
            "Asia/Saigon",
            "Asia/Sakhalin",
            "Asia/Samarkand",
            "Asia/Seoul",
            "Asia/Shanghai",
            "Asia/Singapore",
            "Asia/Srednekolymsk",
            "Asia/Taipei",
            "Asia/Tashkent",
            "Asia/Tbilisi",
            "Asia/Tehran",
            "Asia/Tel_Aviv",
            "Asia/Thimbu",
            "Asia/Thimphu",
            "Asia/Tokyo",
            "Asia/Tomsk",
            "Asia/Ujung_Pandang",
            "Asia/Ulaanbaatar",
            "Asia/Ulan_Bator",
            "Asia/Urumqi",
            "Asia/Ust-Nera",
            "Asia/Vientiane",
            "Asia/Vladivostok",
            "Asia/Yakutsk",
            "Asia/Yangon",
            "Asia/Yekaterinburg",
            "Asia/Yerevan",
            "Atlantic/Azores",
            "Atlantic/Bermuda",
            "Atlantic/Canary",
            "Atlantic/Cape_Verde",
            "Atlantic/Faeroe",
            "Atlantic/Faroe",
            "Atlantic/Jan_Mayen",
            "Atlantic/Madeira",
            "Atlantic/Reykjavik",
            "Atlantic/South_Georgia",
            "Atlantic/St_Helena",
            "Atlantic/Stanley",
            "Australia/ACT",
            "Australia/Adelaide",
            "Australia/Brisbane",
            "Australia/Broken_Hill",
            "Australia/Canberra",
            "Australia/Currie",
            "Australia/Darwin",
            "Australia/Eucla",
            "Australia/Hobart",
            "Australia/LHI",
            "Australia/Lindeman",
            "Australia/Lord_Howe",
            "Australia/Melbourne",
            "Australia/NSW",
            "Australia/North",
            "Australia/Perth",
            "Australia/Queensland",
            "Australia/South",
            "Australia/Sydney",
            "Australia/Tasmania",
            "Australia/Victoria",
            "Australia/West",
            "Australia/Yancowinna",
            "Brazil/Acre",
            "Brazil/DeNoronha",
            "Brazil/East",
            "Brazil/West",
            "CET",
            "CST6CDT",
            "Canada/Atlantic",
            "Canada/Central",
            "Canada/Eastern",
            "Canada/Mountain",
            "Canada/Newfoundland",
            "Canada/Pacific",
            "Canada/Saskatchewan",
            "Canada/Yukon",
            "Chile/Continental",
            "Chile/EasterIsland",
            "Cuba",
            "EET",
            "EST5EDT",
            "Egypt",
            "Eire",
            "Etc/GMT",
            "Etc/GMT+0",
            "Etc/GMT+1",
            "Etc/GMT+10",
            "Etc/GMT+11",
            "Etc/GMT+12",
            "Etc/GMT+2",
            "Etc/GMT+3",
            "Etc/GMT+4",
            "Etc/GMT+5",
            "Etc/GMT+6",
            "Etc/GMT+7",
            "Etc/GMT+8",
            "Etc/GMT+9",
            "Etc/GMT-0",
            "Etc/GMT-1",
            "Etc/GMT-10",
            "Etc/GMT-11",
            "Etc/GMT-12",
            "Etc/GMT-13",
            "Etc/GMT-14",
            "Etc/GMT-2",
            "Etc/GMT-3",
            "Etc/GMT-4",
            "Etc/GMT-5",
            "Etc/GMT-6",
            "Etc/GMT-7",
            "Etc/GMT-8",
            "Etc/GMT-9",
            "Etc/GMT0",
            "Etc/Greenwich",
            "Etc/UCT",
            "Etc/UTC",
            "Etc/Universal",
            "Etc/Zulu",
            "Europe/Amsterdam",
            "Europe/Andorra",
            "Europe/Astrakhan",
            "Europe/Athens",
            "Europe/Belfast",
            "Europe/Belgrade",
            "Europe/Berlin",
            "Europe/Bratislava",
            "Europe/Brussels",
            "Europe/Bucharest",
            "Europe/Budapest",
            "Europe/Busingen",
            "Europe/Chisinau",
            "Europe/Copenhagen",
            "Europe/Dublin",
            "Europe/Gibraltar",
            "Europe/Guernsey",
            "Europe/Helsinki",
            "Europe/Isle_of_Man",
            "Europe/Istanbul",
            "Europe/Jersey",
            "Europe/Kaliningrad",
            "Europe/Kiev",
            "Europe/Kirov",
            "Europe/Kyiv",
            "Europe/Lisbon",
            "Europe/Ljubljana",
            "Europe/London",
            "Europe/Luxembourg",
            "Europe/Madrid",
            "Europe/Malta",
            "Europe/Mariehamn",
            "Europe/Minsk",
            "Europe/Monaco",
            "Europe/Moscow",
            "Europe/Nicosia",
            "Europe/Oslo",
            "Europe/Paris",
            "Europe/Podgorica",
            "Europe/Prague",
            "Europe/Riga",
            "Europe/Rome",
            "Europe/Samara",
            "Europe/San_Marino",
            "Europe/Sarajevo",
            "Europe/Saratov",
            "Europe/Simferopol",
            "Europe/Skopje",
            "Europe/Sofia",
            "Europe/Stockholm",
            "Europe/Tallinn",
            "Europe/Tirane",
            "Europe/Tiraspol",
            "Europe/Ulyanovsk",
            "Europe/Uzhgorod",
            "Europe/Vaduz",
            "Europe/Vatican",
            "Europe/Vienna",
            "Europe/Vilnius",
            "Europe/Volgograd",
            "Europe/Warsaw",
            "Europe/Zagreb",
            "Europe/Zaporozhye",
            "Europe/Zurich",
            "GB",
            "GB-Eire",
            "GMT",
            "GMT0",
            "Greenwich",
            "Hongkong",
            "Iceland",
            "Indian/Antananarivo",
            "Indian/Chagos",
            "Indian/Christmas",
            "Indian/Cocos",
            "Indian/Comoro",
            "Indian/Kerguelen",
            "Indian/Mahe",
            "Indian/Maldives",
            "Indian/Mauritius",
            "Indian/Mayotte",
            "Indian/Reunion",
            "Iran",
            "Israel",
            "Jamaica",
            "Japan",
            "Kwajalein",
            "Libya",
            "MET",
            "MST7MDT",
            "Mexico/BajaNorte",
            "Mexico/BajaSur",
            "Mexico/General",
            "NZ",
            "NZ-CHAT",
            "Navajo",
            "PRC",
            "PST8PDT",
            "Pacific/Apia",
            "Pacific/Auckland",
            "Pacific/Bougainville",
            "Pacific/Chatham",
            "Pacific/Chuuk",
            "Pacific/Easter",
            "Pacific/Efate",
            "Pacific/Enderbury",
            "Pacific/Fakaofo",
            "Pacific/Fiji",
            "Pacific/Funafuti",
            "Pacific/Galapagos",
            "Pacific/Gambier",
            "Pacific/Guadalcanal",
            "Pacific/Guam",
            "Pacific/Honolulu",
            "Pacific/Johnston",
            "Pacific/Kanton",
            "Pacific/Kiritimati",
            "Pacific/Kosrae",
            "Pacific/Kwajalein",
            "Pacific/Majuro",
            "Pacific/Marquesas",
            "Pacific/Midway",
            "Pacific/Nauru",
            "Pacific/Niue",
            "Pacific/Norfolk",
            "Pacific/Noumea",
            "Pacific/Pago_Pago",
            "Pacific/Palau",
            "Pacific/Pitcairn",
            "Pacific/Pohnpei",
            "Pacific/Ponape",
            "Pacific/Port_Moresby",
            "Pacific/Rarotonga",
            "Pacific/Saipan",
            "Pacific/Samoa",
            "Pacific/Tahiti",
            "Pacific/Tarawa",
            "Pacific/Tongatapu",
            "Pacific/Truk",
            "Pacific/Wake",
            "Pacific/Wallis",
            "Pacific/Yap",
            "Poland",
            "Portugal",
            "ROK",
            "Singapore",
            "SystemV/AST4",
            "SystemV/AST4ADT",
            "SystemV/CST6",
            "SystemV/CST6CDT",
            "SystemV/EST5",
            "SystemV/EST5EDT",
            "SystemV/HST10",
            "SystemV/MST7",
            "SystemV/MST7MDT",
            "SystemV/PST8",
            "SystemV/PST8PDT",
            "SystemV/YST9",
            "SystemV/YST9YDT",
            "Turkey",
            "UCT",
            "US/Alaska",
            "US/Aleutian",
            "US/Arizona",
            "US/Central",
            "US/East-Indiana",
            "US/Eastern",
            "US/Hawaii",
            "US/Indiana-Starke",
            "US/Michigan",
            "US/Mountain",
            "US/Pacific",
            "US/Samoa",
            "UTC",
            "Universal",
            "W-SU",
            "WET",
            "Zulu",
            "EST",
            "HST",
            "MST",
            "ACT",
            "AET",
            "AGT",
            "ART",
            "AST",
            "BET",
            "BST",
            "CAT",
            "CNT",
            "CST",
            "CTT",
            "EAT",
            "ECT",
            "IET",
            "IST",
            "JST",
            "MIT",
            "NET",
            "NST",
            "PLT",
            "PNT",
            "PRT",
            "PST",
            "SST",
            "VST"
          ]
        },
        "pageRootPaths":
        {
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "assetRootPaths":
        {
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "crawlAssets":
        {
          "type": "boolean"
        },
        "crawlPages":
        {
          "type": "boolean"
        },
        "pagePathInclusionPatterns":
        {
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "pagePathExclusionPatterns":
        {
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "pageNameInclusionPatterns":
        {
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "pageNameExclusionPatterns":
        {
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "assetPathInclusionPatterns":
        {
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "assetPathExclusionPatterns":
        {
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "assetTypeInclusionPatterns":
        {
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "assetTypeExclusionPatterns":
        {
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "assetNameInclusionPatterns":
        {
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "assetNameExclusionPatterns":
        {
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "pageComponents": {
          "type": "array",
          "items": {
            "type": "object"
          }
        },
        "contentFragmentVariations": {
          "type": "array",
          "items": {
            "type": "object"
          }
        },
        "cugExemptedPrincipals": {
          "type": "array",
          "items": {
            "type": "string"
          }
        }
      },
      "required":
      []
    },
    "type": {
      "type": "string",
      "pattern": "AEM"
    },
    "enableIdentityCrawler": {
      "type": "boolean"
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FORCED_FULL_CRAWL",
        "FULL_CRAWL",
        "CHANGE_LOG"
      ]
    },
    "secretArn": {
      "type": "string",
      "minLength": 20,
      "maxLength": 2048
    }
  },
  "version": {
    "type": "string",
    "anyOf": [
      {
        "pattern": "1.0.0"
      }
    ]
  },
  "required": [
    "connectionConfiguration",
    "repositoryConfigurations",
    "syncMode",
    "additionalProperties",
    "secretArn",
    "type"
  ]
}
```

## Amazon FSx (Windows) template schema
<a name="ds-fsx-windows-schema"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object. You provide the file system ID as part of the connection configuration or repository endpoint details. You must also specify the type of data source as `FSX`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [Amazon FSx (Windows) JSON schema](#fsx-windows-json).

The following table describes the parameters of the Amazon FSx (Windows) JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | The endpoint information for the data source. | 
| fileSystemId | The identifier of the Amazon FSx file system. You can find your file system ID on the File Systems dashboard in the Amazon FSx console. | 
| fileSystemType | The Amazon FSx file system type. To use Windows File Server as your type of file system, specify WINDOWS. | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. | 
| All | A list of objects that map attributes or field names of your files in your Amazon FSx data source to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). | 
| additionalProperties | Additional configuration options for your content in your data source. | 
| isCrawlAcl | `true` to crawl the access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources). | 
| inclusionPatterns | A list of regular expression patterns to include certain files in your Amazon FSx data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index. | 
| exclusionPatterns | A list of regular expression patterns to exclude certain files in your Amazon FSx data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index. | 
| enableIdentityCrawler | `true` to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/dg/API_PutPrincipalMapping.html) API to upload user and group access information. | 
| syncMode |  Specify how Amazon Kendra should update your index when your data source content changes. You can choose between: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | 
| type | The type of data source. For Windows file system data sources, specify FSX. | 

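The precedence rule for `inclusionPatterns` and `exclusionPatterns` described in the table can be sketched in Python. This is an illustration only: the `is_indexed` helper and the sample paths are hypothetical. It assumes that patterns are matched anywhere in the file path with `re.search` and that an empty inclusion list filters nothing. This guide confirms neither assumption.

```python
import re

def is_indexed(path, inclusion_patterns, exclusion_patterns):
    """Return True if a file path would be indexed under the table's rules."""
    # Exclusion takes precedence: any exclusion match removes the file.
    if any(re.search(p, path) for p in exclusion_patterns):
        return False
    # Assumption: with no inclusion patterns, nothing is filtered out.
    if not inclusion_patterns:
        return True
    # Otherwise the file must match at least one inclusion pattern.
    return any(re.search(p, path) for p in inclusion_patterns)

# Hypothetical patterns: index PDF and DOCX files, but skip any drafts folder.
inclusion = [r"\.pdf$", r"\.docx$"]
exclusion = [r"/drafts/"]

print(is_indexed("/share/reports/q1.pdf", inclusion, exclusion))  # True
print(is_indexed("/share/drafts/q1.pdf", inclusion, exclusion))   # False
print(is_indexed("/share/reports/q1.txt", inclusion, exclusion))  # False
```

The second path matches both an inclusion and an exclusion pattern, so it is left out of the index.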
### Amazon FSx (Windows) JSON schema
<a name="fsx-windows-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
            "fileSystemId": {
              "type": "string",
              "pattern": "fs-.*"
            },
            "fileSystemType": {
              "type": "string",
              "pattern": "WINDOWS"
            }
          },
          "required": ["fileSystemId", "fileSystemType"]
        }
      }
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "All": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": ["STRING", "STRING_LIST", "DATE"]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": ["fieldMappings"]
        }
      },
      "required": ["All"]
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "isCrawlAcl": {
          "type": "boolean"
        },
        "exclusionPatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "inclusionPatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        }
      },
      "required": []
    },
    "enableIdentityCrawler": {
      "type": "boolean"
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FORCED_FULL_CRAWL",
        "FULL_CRAWL"
      ]
    },
    "type" : {
      "type" : "string",
      "pattern": "FSX"
    }
  },
  "version": {
    "type": "string",
    "anyOf": [
      {
        "pattern": "1.0.0"
      }
    ]
  },
  "required": [
    "connectionConfiguration",
    "repositoryConfigurations",
    "syncMode",
    "enableIdentityCrawler",
    "additionalProperties",
    "type"
  ]
}
```
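As a rough illustration of a document that follows this schema, the Python sketch below builds a template and checks that the top-level required keys are present. The file system ID, the field mapping names, and the pattern are placeholders, not values from a real deployment. The sketch uses only keys that appear in the schema above.

```python
import json

# Top-level keys listed under "required" in the schema above.
REQUIRED_KEYS = [
    "connectionConfiguration",
    "repositoryConfigurations",
    "syncMode",
    "enableIdentityCrawler",
    "additionalProperties",
    "type",
]

template = {
    "connectionConfiguration": {
        "repositoryEndpointMetadata": {
            "fileSystemId": "fs-0123456789abcdef0",  # placeholder ID
            "fileSystemType": "WINDOWS",
        }
    },
    "repositoryConfigurations": {
        "All": {
            "fieldMappings": [
                {
                    "indexFieldName": "department",         # hypothetical index field
                    "indexFieldType": "STRING",
                    "dataSourceFieldName": "Department",     # hypothetical source field
                }
            ]
        }
    },
    "additionalProperties": {
        "isCrawlAcl": True,
        "inclusionPatterns": [r".*\.pdf"],  # hypothetical pattern
        "exclusionPatterns": [],
    },
    "enableIdentityCrawler": True,
    "syncMode": "FULL_CRAWL",
    "type": "FSX",
}

# Report any required top-level key that is absent.
missing = [key for key in REQUIRED_KEYS if key not in template]
print("Missing required keys:", missing)
print(json.dumps(template, indent=2))
```

The resulting object is what you would supply as the template inside the `TemplateConfiguration` object, with `Type` set to `TEMPLATE` in the `CreateDataSource` call.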

## Amazon FSx (NetApp ONTAP) template schema
<a name="ds-fsx-ontap-schema"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object. You provide the file system ID and the storage virtual machine (SVM) as part of the connection configuration or repository endpoint details. You must also specify the type of data source as `FSXONTAP`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [Amazon FSx (NetApp ONTAP) JSON schema](#fsx-ontap-json).

The following table describes the parameters of the Amazon FSx (NetApp ONTAP) JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | The endpoint information for the data source. | 
| fileSystemId | The identifier of the Amazon FSx file system. You can find your file system ID on the File Systems dashboard in the Amazon FSx console. For information about how to create a file system in the Amazon FSx console for NetApp ONTAP, see [Getting Started Guide for NetApp ONTAP](https://docs.aws.amazon.com/fsx/latest/ONTAPGuide/getting-started.html) in the FSx for ONTAP User Guide. | 
| fileSystemType | The Amazon FSx file system type. To use NetApp ONTAP as your type of file system, specify ONTAP. | 
| svmId | The identifier of the storage virtual machine (SVM) used with your Amazon FSx file system for NetApp ONTAP. You can find your SVM ID by going to the File Systems dashboard in the Amazon FSx console, selecting your file system ID, and then selecting Storage virtual machines. For information about how to create a file system in the Amazon FSx console for NetApp ONTAP, see [Getting Started Guide for NetApp ONTAP](https://docs.aws.amazon.com/fsx/latest/ONTAPGuide/getting-started.html) in the FSx for ONTAP User Guide. | 
| protocolType | Whether you use the Common Internet File System (CIFS) protocol for Windows, or the Network File System (NFS) protocol for Linux. | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. | 
| file | A list of objects that map attributes or field names of your files in your Amazon FSx data source to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). The data source field names must exist in your files' custom metadata. | 
| additionalProperties | Additional configuration options for your content in your data source. | 
| crawlAcl | `true` to crawl the access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources). | 
| inclusionPatterns | A list of regular expression patterns to include certain files in your Amazon FSx data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index. | 
| exclusionPatterns | A list of regular expression patterns to exclude certain files in your Amazon FSx data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index. | 
| type | The type of data source. For NetApp ONTAP file system data sources, specify FSXONTAP. | 
| syncMode |  Specify how Amazon Kendra should update your index when your data source content changes. You can choose between: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | 
| secretArn |  The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Amazon FSx file system. The secret must contain a JSON structure with the following keys: <pre>{<br />    "username": "user@corp.example.com",<br />    "password": "password"<br />}</pre> If you use the NFS protocol for your Amazon FSx file system, the secret is stored in a JSON structure with the following keys: <pre>{<br />    "leftId": "left ID",<br />    "rightId": "right ID",<br />    "preSharedKey": "pre-shared key"<br />}</pre>  | 

### Amazon FSx (NetApp ONTAP) JSON schema
<a name="fsx-ontap-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
            "fileSystemId": {
              "type": "string",
              "pattern": "^(fs-[0-9a-f]{8,21})$"
            },
            "fileSystemType": {
              "type": "string",
              "enum": ["ONTAP"]
            },
            "svmId": {
              "type": "string",
              "pattern": "^(svm-[0-9a-f]{17,21})$"
            },
            "protocolType": {
              "type": "string",
              "enum": [
                "CIFS",
                "NFS"
              ]
            }
          },
          "required": [
            "fileSystemId",
            "fileSystemType"
          ]
        }
      },
      "required": [
        "repositoryEndpointMetadata"
      ]
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "file": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string",
                      "pattern": "^([a-zA-Z_]{1,20})$"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": [
                        "STRING",
                        "STRING_LIST",
                        "DATE",
                        "LONG"
                      ]
                    },
                    "dataSourceFieldName": {
                      "type": "string",
                      "pattern": "^([a-zA-Z_]{1,20})$"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ],
              "maxItems": 50
            }
          },
          "required": [
            "fieldMappings"
          ]
        }
      },
      "required": [
        "file"
      ]
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "crawlAcl": {
          "type": "boolean"
        },
        "inclusionPatterns": {
          "type": "array",
          "items": {
            "type": "string",
            "maxLength": 30
          },
          "maxItems": 100
        },
        "exclusionPatterns": {
          "type": "array",
          "items": {
            "type": "string",
            "maxLength": 30
          },
          "maxItems": 100
        }
      }
    },
    "type": {
      "type": "string",
      "pattern": "FSXONTAP"
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FORCED_FULL_CRAWL",
        "FULL_CRAWL"
      ]
    },
    "secretArn": {
      "type": "string",
      "pattern": "arn:aws:secretsmanager:.*"
    }
  },
  "required": [
    "connectionConfiguration",
    "repositoryConfigurations",
    "additionalProperties",
    "secretArn",
    "type"
  ]
}
```
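The identifier patterns and the protocol-specific secret keys can be checked before you assemble a template. The sketch below is a minimal illustration, not part of the connector: the `build_endpoint_metadata` and `build_secret` helpers are hypothetical, the IDs are placeholders, and pairing the `username`/`password` secret with `CIFS` is an inference from the table, which names only `NFS` for the other key set.

```python
import re

# Patterns copied from the schema above.
FILE_SYSTEM_ID = re.compile(r"^(fs-[0-9a-f]{8,21})$")
SVM_ID = re.compile(r"^(svm-[0-9a-f]{17,21})$")

# Secret keys per protocol, as listed for secretArn in the table.
# Assumption: the username/password form applies to CIFS.
SECRET_KEYS = {
    "CIFS": ("username", "password"),
    "NFS": ("leftId", "rightId", "preSharedKey"),
}

def build_endpoint_metadata(file_system_id, svm_id, protocol_type):
    """Validate the identifiers and return the repositoryEndpointMetadata object."""
    if not FILE_SYSTEM_ID.match(file_system_id):
        raise ValueError(f"Invalid file system ID: {file_system_id}")
    if not SVM_ID.match(svm_id):
        raise ValueError(f"Invalid SVM ID: {svm_id}")
    if protocol_type not in SECRET_KEYS:
        raise ValueError(f"Unsupported protocol type: {protocol_type}")
    return {
        "fileSystemId": file_system_id,
        "fileSystemType": "ONTAP",
        "svmId": svm_id,
        "protocolType": protocol_type,
    }

def build_secret(protocol_type, **values):
    """Return the secret's JSON structure, failing if a key is missing."""
    keys = SECRET_KEYS[protocol_type]
    missing = [k for k in keys if k not in values]
    if missing:
        raise ValueError(f"Secret is missing keys: {missing}")
    return {k: values[k] for k in keys}

# Placeholder identifiers and credentials.
metadata = build_endpoint_metadata("fs-0123456789abcdef0", "svm-0123456789abcdef0", "NFS")
secret = build_secret("NFS", leftId="left ID", rightId="right ID", preSharedKey="pre-shared key")
print(metadata)
print(sorted(secret))
```

Validating locally in this way surfaces a malformed ID or an incomplete secret before the data source is created.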

## Alfresco template schema
<a name="ds-alfresco-schema"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) object. You provide the Alfresco site ID, repository URL, user interface URL, authentication type, whether you use cloud or on-premises, and the type of content you want to crawl. You provide this as a part of the connection configuration or repository endpoint details. Also specify the type of data source as `ALFRESCO`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [Alfresco JSON schema](#alfresco-json).

The following table describes the parameters of the Alfresco JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | The endpoint information for the data source. | 
| siteId | The identifier of the Alfresco site. | 
| repoUrl | The URL of your Alfresco repository. You can get the repository URL from your Alfresco administrator. For example, if you use Alfresco Cloud (PaaS), the repository URL could be https://company.alfrescocloud.com. Or, if you use Alfresco On-Premises, the repository URL could be https://company-alfresco-instance.company-domain.suffix:port. | 
| webAppUrl | The URL of your Alfresco user interface. You can get the Alfresco user interface URL from your Alfresco administrator. For example, the user interface URL could be https://example.com. | 
| repositoryAdditionalProperties | Additional properties to connect with the repository/data source endpoint. | 
| authType | The type of authentication that you use, whether OAuth2 or Basic. | 
| type (deployment) | The type of Alfresco that you use, whether PAAS or ON\_PREM. | 
| crawlType | The type of content that you want to crawl, whether ASPECT (content marked with 'Aspects' in Alfresco), SITE\_ID (content within a specific Alfresco site), or ALL\_SITES (content across all your Alfresco sites). | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. | 
| document, comment | A list of objects that map the attributes or field names of your Alfresco documents and comments to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). | 
| additionalProperties | Additional configuration options for your content in your data source. | 
| aspectName |  The name of a specific 'Aspect' that you want to index.  | 
| aspectProperties |  A list of specific 'Aspect' content properties that you want to index.  | 
| enableFineGrainedControl |  `true` to crawl 'Aspects'.  | 
| isCrawlComment |  `true` to crawl comments.  | 
| inclusionFileNamePatterns, inclusionFileTypePatterns, inclusionFilePathPatterns | A list of regular expression patterns to include certain files in your Alfresco data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index. | 
| exclusionFileNamePatterns, exclusionFileTypePatterns, exclusionFilePathPatterns | A list of regular expression patterns to exclude certain files in your Alfresco data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index. | 
| type | The type of data source. Specify ALFRESCO as your data source type. | 
| secretArn |  The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs that are required to connect to your Alfresco. The secret must contain a JSON structure with the following keys: If using basic authentication: <pre>{<br />    "username": "user name",<br />    "password": "password"<br />}</pre> If using OAuth 2.0 authentication: <pre>{<br />    "clientId": "client ID",<br />    "clientSecret": "client secret",<br />    "tokenUrl": "token URL"<br />}</pre>  | 
| syncMode |  Specify how Amazon Kendra should update your index when your data source content changes. You can choose between: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | 
| enableIdentityCrawler | `true` to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/dg/API_PutPrincipalMapping.html) API to upload user and group access information. | 
| version | The version of this template that is currently supported. | 

### Alfresco JSON schema
<a name="alfresco-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
            "siteId": {
              "type": "string"
            },
            "repoUrl": {
              "type": "string"
            },
            "webAppUrl": {
              "type": "string"
            },
            "repositoryAdditionalProperties": {
              "type": "object",
              "properties": {
                "authType": {
                  "type": "string",
                  "enum": [
                    "OAuth2",
                    "Basic"
                  ]
                },
                "type": {
                  "type": "string",
                  "enum": [
                    "PAAS",
                    "ON_PREM"
                  ]
                },
                "crawlType": {
                  "type": "string",
                  "enum": [
                    "ASPECT",
                    "SITE_ID",
                    "ALL_SITES"
                  ]
                }
              }
            }
          }
        }
      },
      "required": [
        "repositoryEndpointMetadata"
      ]
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "document": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": {
                "anyOf": [
                  {
                    "type": "object",
                    "properties": {
                      "indexFieldName": {
                        "type": "string"
                      },
                      "indexFieldType": {
                        "type": "string",
                        "enum": [
                          "STRING",
                          "DATE",
                          "STRING_LIST",
                          "LONG"
                        ]
                      },
                      "dataSourceFieldName": {
                        "type": "string"
                      },
                      "dateFieldFormat": {
                        "type": "string",
                        "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                      }
                    },
                    "required": [
                      "indexFieldName",
                      "indexFieldType",
                      "dataSourceFieldName"
                    ]
                  }
                ]
              }
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "comment": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": {
                "anyOf": [
                  {
                    "type": "object",
                    "properties": {
                      "indexFieldName": {
                        "type": "string"
                      },
                      "indexFieldType": {
                        "type": "string",
                        "enum": [
                          "STRING",
                          "DATE",
                          "STRING_LIST",
                          "LONG"
                        ]
                      },
                      "dataSourceFieldName": {
                        "type": "string"
                      },
                      "dateFieldFormat": {
                        "type": "string",
                        "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                      }
                    },
                    "required": [
                      "indexFieldName",
                      "indexFieldType",
                      "dataSourceFieldName"
                    ]
                  }
                ]
              }
            }
          },
          "required": [
            "fieldMappings"
          ]
        }
      }
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "aspectName": {
          "type": "string"
        },
        "aspectProperties": {
          "type": "array"
        },
        "enableFineGrainedControl": {
          "type": "boolean"
        },
        "isCrawlComment": {
          "type": "boolean"
        },
        "inclusionFileNamePatterns": {
          "type": "array"
        },
        "exclusionFileNamePatterns": {
          "type": "array"
        },
        "inclusionFileTypePatterns": {
          "type": "array"
        },
        "exclusionFileTypePatterns": {
          "type": "array"
        },
        "inclusionFilePathPatterns": {
          "type": "array"
        },
        "exclusionFilePathPatterns": {
          "type": "array"
        }
      }
    },
    "type": {
      "type": "string",
      "pattern": "ALFRESCO"
    },
    "secretArn": {
      "type": "string",
      "minLength": 20,
      "maxLength": 2048
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FORCED_FULL_CRAWL",
        "FULL_CRAWL"
      ]
    },
    "enableIdentityCrawler": {
      "type": "boolean"
    },
    "version": {
      "type": "string",
      "anyOf": [
        {
          "pattern": "1.0.0"
        }
      ]
    }
  },
  "required": [
    "connectionConfiguration",
    "repositoryConfigurations",
    "additionalProperties",
    "type",
    "secretArn"
  ]
}
```
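A template that follows this schema might be assembled as in the Python sketch below. It is illustrative only: the `build_alfresco_template` helper is hypothetical, and the site ID, URLs, field mapping names, file type pattern, and secret ARN are placeholders. It assumes a cloud (`PAAS`) deployment that crawls a single site with basic authentication.

```python
import json

# Top-level keys listed under "required" in the Alfresco JSON schema.
REQUIRED_KEYS = [
    "connectionConfiguration",
    "repositoryConfigurations",
    "additionalProperties",
    "type",
    "secretArn",
]

def build_alfresco_template(site_id, repo_url, web_app_url, secret_arn):
    """Assemble an Alfresco template that crawls one site with basic auth."""
    # The schema limits secretArn to between 20 and 2048 characters.
    if not 20 <= len(secret_arn) <= 2048:
        raise ValueError("secretArn must be 20 to 2048 characters long")
    return {
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {
                "siteId": site_id,
                "repoUrl": repo_url,
                "webAppUrl": web_app_url,
                "repositoryAdditionalProperties": {
                    "authType": "Basic",
                    "type": "PAAS",
                    "crawlType": "SITE_ID",
                },
            }
        },
        "repositoryConfigurations": {
            "document": {
                "fieldMappings": [
                    {
                        "indexFieldName": "alf_title",    # hypothetical index field
                        "indexFieldType": "STRING",
                        "dataSourceFieldName": "title",   # hypothetical source field
                    }
                ]
            }
        },
        "additionalProperties": {
            "isCrawlComment": False,
            "inclusionFileTypePatterns": ["pdf"],  # hypothetical pattern
        },
        "type": "ALFRESCO",
        "secretArn": secret_arn,
        "syncMode": "FULL_CRAWL",
        "enableIdentityCrawler": True,
        "version": "1.0.0",
    }

template = build_alfresco_template(
    site_id="example-site",  # placeholder
    repo_url="https://company.alfrescocloud.com",
    web_app_url="https://example.com",
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:alfresco-example",
)

# Report any required top-level key that is absent.
missing = [key for key in REQUIRED_KEYS if key not in template]
print("Missing required keys:", missing)
print(json.dumps(template, indent=2))
```

For OAuth 2.0, `authType` would be `OAuth2`, and the secret would hold the client ID, client secret, and token URL described in the table.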

## Aurora (MySQL) template schema
<a name="ds-aurora-mysql-schema"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object. Specify the type of data source as `JDBC`, the database type as `mysql`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [Aurora (MySQL) JSON schema](#aurora-mysql-json).

The following table describes the parameters of the Aurora (MySQL) JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | Required configuration information for connecting your data source. [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html) | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN. | 
|  document  |  A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). | 
| additionalProperties | Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source. | 
| primaryKey  | Provide the primary key for the database table. This identifies a table within your database. | 
| titleColumn | Provide the name of the document title column within your database table. | 
| bodyColumn | Provide the name of the document body column within your database table. | 
| sqlQuery | Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query. | 
| timestampColumn | Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content. | 
| timestampFormat | Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content. | 
| timezone | Enter the name of the column which contains time zones for the content to be crawled. | 
| changeDetectingColumns | Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns. | 
| allowedUsersColumn | Enter the name of the column which contains User IDs to be allowed access to content. | 
| allowedGroupsColumn | Enter the name of the column which contains groups to be allowed access to content. | 
| sourceURIColumn | Enter the name of the column which contains Source URLs to be indexed. | 
| isSslEnabled | Specify `true` to use SSL when connecting to your database, or `false` otherwise. | 
| type | The type of data source. Specify JDBC as your data source type. | 
| syncMode | Specify how Amazon Kendra should update your index when your data source content changes. You can choose between `FORCED_FULL_CRAWL`, `FULL_CRAWL`, and `CHANGE_LOG`. | 
| secretArn | The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the user name and password required to connect to your database. The secret must contain a JSON structure with the following keys: <pre>{<br />    "username": "database user name",<br />    "password": "password"<br />}</pre> | 
| version | The version of the template that is currently supported. | 
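
The `sqlQuery` constraints in the table can be checked before you submit a template. The following Python sketch is illustrative only: the function name `check_sql_query` is hypothetical, the 32KB limit is interpreted here as 32 × 1024 bytes of UTF-8 text (an assumption), and the semicolon check mirrors the `not`/`pattern` rule that the JSON schema applies to `sqlQuery`.

```python
MAX_SQL_QUERY_BYTES = 32 * 1024  # assumption: "32KB" means 32 * 1024 bytes


def check_sql_query(sql_query: str) -> list[str]:
    """Return a list of problems found in a candidate sqlQuery value."""
    problems = []
    # The table states that SQL queries must be less than 32KB.
    if len(sql_query.encode("utf-8")) >= MAX_SQL_QUERY_BYTES:
        problems.append("query must be less than 32KB")
    # The schema rejects any sqlQuery value that contains a semicolon.
    if ";" in sql_query:
        problems.append("query must not contain a semicolon")
    return problems


print(check_sql_query("SELECT id, title, body FROM articles"))
print(check_sql_query("SELECT id FROM articles; DROP TABLE articles"))
```

A local check like this only catches the two rules stated on this page. Amazon Kendra may apply further validation when you create the data source.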

### Aurora (MySQL) JSON schema
<a name="aurora-mysql-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
            "dbType": {
              "type": "string",
              "enum": [
                "mysql",
                "db2",
                "postgresql",
                "oracle",
                "sqlserver"
              ]
            },
            "dbHost": {
              "type": "string"
            },
            "dbPort": {
              "type": "string"
            },
            "dbInstance": {
              "type": "string"
            }
          },
          "required": [
            "dbType",
            "dbHost",
            "dbPort",
            "dbInstance"
          ]
        }
      },
      "required": [
        "repositoryEndpointMetadata"
      ]
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "document": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string"
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        }
      },
      "required": [
      ]
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "primaryKey": {
          "type": "string"
        },
        "titleColumn": {
          "type": "string"
        },
        "bodyColumn": {
          "type": "string"
        },
        "sqlQuery": {
          "type": "string",
          "not": {
            "pattern": ";+"
          }
        },
        "timestampColumn": {
          "type": "string"
        },
        "timestampFormat": {
          "type": "string"
        },
        "timezone": {
          "type": "string"
        },
        "changeDetectingColumns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "allowedUsersColumn": {
          "type": "string"
        },
        "allowedGroupsColumn": {
          "type": "string"
        },
        "sourceURIColumn": {
          "type": "string"
        },
        "isSslEnabled": {
          "type": "boolean"
        }
      },
      "required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"]
    },
    "type" : {
      "type" : "string",
      "pattern": "JDBC"
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FORCED_FULL_CRAWL",
        "FULL_CRAWL",
        "CHANGE_LOG"
      ]
    },
    "secretArn": {
      "type": "string"
    },
    "version": {
      "type": "string",
      "anyOf": [
        {
          "pattern": "1.0.0"
        }
      ]
    }
  },
  "required": [
      "connectionConfiguration",
      "repositoryConfigurations",
      "syncMode",
      "additionalProperties",
      "secretArn",
      "type"
  ]
}
```
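
For orientation, here is a minimal sketch of a template document that follows the shape of the schema above, written as a Python dictionary and serialized to JSON. Every value is a placeholder chosen for illustration: the host name, database name, table columns, index field name, and secret ARN are hypothetical, and `"STRING"` as the `indexFieldType` is an assumption because the schema only requires a string.

```python
import json

# Placeholder values only; replace them with the details of your own database.
aurora_mysql_template = {
    "connectionConfiguration": {
        "repositoryEndpointMetadata": {
            "dbType": "mysql",
            "dbHost": "example-cluster.cluster-abc123.us-east-1.rds.amazonaws.com",
            "dbPort": "3306",  # the schema types dbPort as a string
            "dbInstance": "exampledb",
        }
    },
    "repositoryConfigurations": {
        "document": {
            "fieldMappings": [
                {
                    "indexFieldName": "article_category",  # hypothetical index field
                    "indexFieldType": "STRING",  # assumed value
                    "dataSourceFieldName": "category",
                }
            ]
        }
    },
    "additionalProperties": {
        "primaryKey": "id",
        "titleColumn": "title",
        "bodyColumn": "body",
        "sqlQuery": "SELECT id, title, body, category FROM articles",
    },
    "type": "JDBC",
    "syncMode": "FULL_CRAWL",
    "secretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-secret",
    "version": "1.0.0",
}

# The serialized JSON is what you would supply as the template.
print(json.dumps(aurora_mysql_template, indent=2))
```

Only the keys listed under `required` in the schema are mandatory. The optional `additionalProperties` keys, such as `timestampColumn` or `allowedUsersColumn`, are left out here to keep the sketch short.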

## Aurora (PostgreSQL) template schema
<a name="ds-aurora-postgresql-schema"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object. Specify the type of data source as `JDBC`, the database type as `postgresql`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [Aurora (PostgreSQL) JSON schema](#aurora-postgresql-json).

The following table describes the parameters of the Aurora (PostgreSQL) JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | Required configuration information for connecting your data source: `dbType` (the database type; specify `postgresql`), `dbHost` (the database host name), `dbPort` (the database port), and `dbInstance` (the database instance). | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN. | 
|  document  |  A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). | 
| additionalProperties | Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source. | 
| primaryKey  | Provide the primary key for the database table. This identifies a table within your database. | 
| titleColumn | Provide the name of the document title column within your database table. | 
| bodyColumn | Provide the name of the document body column within your database table. | 
| sqlQuery | Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query. | 
| timestampColumn | Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content. | 
| timestampFormat | Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content. | 
| timezone | Enter the name of the column which contains time zones for the content to be crawled. | 
| changeDetectingColumns | Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns. | 
| allowedUsersColumn | Enter the name of the column which contains User IDs to be allowed access to content. | 
| allowedGroupsColumn | Enter the name of the column which contains groups to be allowed access to content. | 
| sourceURIColumn | Enter the name of the column which contains Source URLs to be indexed. | 
| isSslEnabled | Specify `true` to use SSL when connecting to your database, or `false` otherwise. | 
| type | The type of data source. Specify JDBC as your data source type. | 
| syncMode | Specify how Amazon Kendra should update your index when your data source content changes. You can choose between `FORCED_FULL_CRAWL`, `FULL_CRAWL`, and `CHANGE_LOG`. | 
| secretArn | The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the user name and password required to connect to your database. The secret must contain a JSON structure with the following keys: <pre>{<br />    "username": "database user name",<br />    "password": "password"<br />}</pre> | 
| version | The version of the template that is currently supported. | 
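
The `document` entry holds a `fieldMappings` list in which each item names an index field, its type, and the database column it comes from. As a rough sketch, the list can be generated from a simple column-to-field mapping. The helper name `field_mapping`, the column names, the index field names, and the default type `"STRING"` are all assumptions made for illustration.

```python
def field_mapping(column: str, index_field: str, index_type: str = "STRING") -> dict:
    """Build one fieldMappings entry with the three keys the schema requires."""
    return {
        "indexFieldName": index_field,
        "indexFieldType": index_type,  # assumed default; the schema only requires a string
        "dataSourceFieldName": column,
    }


# Hypothetical database columns mapped to hypothetical index field names.
columns_to_index_fields = {
    "category": "article_category",
    "author": "article_author",
}

document = {
    "fieldMappings": [
        field_mapping(column, index_field)
        for column, index_field in columns_to_index_fields.items()
    ]
}

print(document)
```

The resulting `document` dictionary would be placed under `repositoryConfigurations` in the template.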

### Aurora (PostgreSQL) JSON schema
<a name="aurora-postgresql-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
            "dbType": {
              "type": "string",
              "enum": [
                "mysql",
                "db2",
                "postgresql",
                "oracle",
                "sqlserver"
              ]
            },
            "dbHost": {
              "type": "string"
            },
            "dbPort": {
              "type": "string"
            },
            "dbInstance": {
              "type": "string"
            }
          },
          "required": [
            "dbType",
            "dbHost",
            "dbPort",
            "dbInstance"
          ]
        }
      },
      "required": [
        "repositoryEndpointMetadata"
      ]
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "document": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string"
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        }
      },
      "required": [
      ]
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "primaryKey": {
          "type": "string"
        },
        "titleColumn": {
          "type": "string"
        },
        "bodyColumn": {
          "type": "string"
        },
        "sqlQuery": {
          "type": "string",
          "not": {
            "pattern": ";+"
          }
        },
        "timestampColumn": {
          "type": "string"
        },
        "timestampFormat": {
          "type": "string"
        },
        "timezone": {
          "type": "string"
        },
        "changeDetectingColumns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "allowedUsersColumn": {
          "type": "string"
        },
        "allowedGroupsColumn": {
          "type": "string"
        },
        "sourceURIColumn": {
          "type": "string"
        },
        "isSslEnabled": {
          "type": "boolean"
        }
      },
      "required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"]
    },
    "type" : {
      "type" : "string",
      "pattern": "JDBC"
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FORCED_FULL_CRAWL",
        "FULL_CRAWL",
        "CHANGE_LOG"
      ]
    },
    "secretArn": {
      "type": "string"
    },
    "version": {
      "type": "string",
      "anyOf": [
        {
          "pattern": "1.0.0"
        }
      ]
    }
  },
  "required": [
      "connectionConfiguration",
      "repositoryConfigurations",
      "syncMode",
      "additionalProperties",
      "secretArn",
      "type"
  ]
}
```
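
The following sketch shows one way a template might be passed to `CreateDataSource` with the AWS SDK for Python (Boto3), using `TEMPLATE` as the `Type` and placing the template under `TemplateConfiguration`. The helper names, index ID, role ARN, data source name, and all template values are placeholders, not values from this guide. The request is built separately from the API call so that it can be inspected without AWS credentials.

```python
import json


def build_create_data_source_request(index_id, role_arn, name, template):
    """Assemble keyword arguments for the Amazon Kendra CreateDataSource API."""
    return {
        "IndexId": index_id,
        "Name": name,
        "Type": "TEMPLATE",
        "RoleArn": role_arn,
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }


def create_data_source(request):
    """Send the request; requires boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the request can be built without the SDK

    return boto3.client("kendra").create_data_source(**request)


# Placeholder template for an Aurora (PostgreSQL) database.
template = {
    "connectionConfiguration": {
        "repositoryEndpointMetadata": {
            "dbType": "postgresql",
            "dbHost": "example-cluster.cluster-abc123.us-east-1.rds.amazonaws.com",
            "dbPort": "5432",
            "dbInstance": "exampledb",
        }
    },
    "repositoryConfigurations": {},
    "additionalProperties": {
        "primaryKey": "id",
        "titleColumn": "title",
        "bodyColumn": "body",
        "sqlQuery": "SELECT id, title, body FROM articles",
    },
    "type": "JDBC",
    "syncMode": "FORCED_FULL_CRAWL",
    "secretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-secret",
    "version": "1.0.0",
}

request = build_create_data_source_request(
    index_id="example-index-id",
    role_arn="arn:aws:iam::111122223333:role/example-kendra-role",
    name="example-aurora-postgresql",
    template=template,
)
print(json.dumps(request, indent=2))
```

Calling `create_data_source(request)` would submit the request. It is not called here because it needs a real index, IAM role, and secret.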

## Amazon RDS (Microsoft SQL Server) template schema
<a name="ds-rds-ms-sql-server-schema"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object. Specify the type of data source as `JDBC`, the database type as `sqlserver`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [Amazon RDS (Microsoft SQL Server) JSON schema](#rds-ms-sql-server-json).

The following table describes the parameters of the Amazon RDS (Microsoft SQL Server) JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | Required configuration information for connecting your data source: `dbType` (the database type; specify `sqlserver`), `dbHost` (the database host name), `dbPort` (the database port), and `dbInstance` (the database instance). | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN. | 
|  document  |  A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). | 
| additionalProperties | Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source. | 
| primaryKey  | Provide the primary key for the database table. This identifies a table within your database. | 
| titleColumn | Provide the name of the document title column within your database table. | 
| bodyColumn | Provide the name of the document body column within your database table. | 
| sqlQuery | Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query. | 
| timestampColumn | Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content. | 
| timestampFormat | Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content. | 
| timezone | Enter the name of the column which contains time zones for the content to be crawled. | 
| changeDetectingColumns | Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns. | 
| allowedUsersColumn | Enter the name of the column which contains User IDs to be allowed access to content. | 
| allowedGroupsColumn | Enter the name of the column which contains groups to be allowed access to content. | 
| sourceURIColumn | Enter the name of the column which contains Source URLs to be indexed. | 
| isSslEnabled | Specify `true` to use SSL when connecting to your database, or `false` otherwise. | 
| type | The type of data source. Specify JDBC as your data source type. | 
| syncMode | Specify how Amazon Kendra should update your index when your data source content changes. You can choose between `FORCED_FULL_CRAWL`, `FULL_CRAWL`, and `CHANGE_LOG`. | 
| secretArn | The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the user name and password required to connect to your database. The secret must contain a JSON structure with the following keys: <pre>{<br />    "username": "database user name",<br />    "password": "password"<br />}</pre> | 
| version | The version of the template that is currently supported. | 

### Amazon RDS (Microsoft SQL Server) JSON schema
<a name="rds-ms-sql-server-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
            "dbType": {
              "type": "string",
              "enum": [
                "mysql",
                "db2",
                "postgresql",
                "oracle",
                "sqlserver"
              ]
            },
            "dbHost": {
              "type": "string"
            },
            "dbPort": {
              "type": "string"
            },
            "dbInstance": {
              "type": "string"
            }
          },
          "required": [
            "dbType",
            "dbHost",
            "dbPort",
            "dbInstance"
          ]
        }
      },
      "required": [
        "repositoryEndpointMetadata"
      ]
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "document": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string"
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        }
      },
      "required": [
      ]
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "primaryKey": {
          "type": "string"
        },
        "titleColumn": {
          "type": "string"
        },
        "bodyColumn": {
          "type": "string"
        },
        "sqlQuery": {
          "type": "string",
          "not": {
            "pattern": ";+"
          }
        },
        "timestampColumn": {
          "type": "string"
        },
        "timestampFormat": {
          "type": "string"
        },
        "timezone": {
          "type": "string"
        },
        "changeDetectingColumns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "allowedUsersColumn": {
          "type": "string"
        },
        "allowedGroupsColumn": {
          "type": "string"
        },
        "sourceURIColumn": {
          "type": "string"
        },
        "isSslEnabled": {
          "type": "boolean"
        }
      },
      "required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"]
    },
    "type" : {
      "type" : "string",
      "pattern": "JDBC"
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FORCED_FULL_CRAWL",
        "FULL_CRAWL",
        "CHANGE_LOG"
      ]
    },
    "secretArn": {
      "type": "string"
    },
    "version": {
      "type": "string",
      "anyOf": [
        {
          "pattern": "1.0.0"
        }
      ]
    }
  },
  "required": [
      "connectionConfiguration",
      "repositoryConfigurations",
      "syncMode",
      "additionalProperties",
      "secretArn",
      "type"
  ]
}
```
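
Before submitting a template, it can help to confirm that the keys the schema lists under `required` are present. The sketch below uses only the Python standard library and checks three of the `required` lists from the schema above. It is not a full JSON Schema validator, and the function name `missing_required_keys` and the draft template values are hypothetical.

```python
# Key lists copied from the "required" arrays in the schema above.
TOP_LEVEL_REQUIRED = [
    "connectionConfiguration",
    "repositoryConfigurations",
    "syncMode",
    "additionalProperties",
    "secretArn",
    "type",
]
ADDITIONAL_PROPERTIES_REQUIRED = ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"]
ENDPOINT_REQUIRED = ["dbType", "dbHost", "dbPort", "dbInstance"]


def missing_required_keys(template: dict) -> list[str]:
    """Return dotted paths of required keys that are absent from the template."""
    missing = [key for key in TOP_LEVEL_REQUIRED if key not in template]

    additional = template.get("additionalProperties", {})
    missing += [
        f"additionalProperties.{key}"
        for key in ADDITIONAL_PROPERTIES_REQUIRED
        if key not in additional
    ]

    endpoint = template.get("connectionConfiguration", {}).get(
        "repositoryEndpointMetadata", {}
    )
    missing += [
        f"connectionConfiguration.repositoryEndpointMetadata.{key}"
        for key in ENDPOINT_REQUIRED
        if key not in endpoint
    ]
    return missing


# A deliberately incomplete draft: syncMode and bodyColumn are left out.
draft = {
    "connectionConfiguration": {
        "repositoryEndpointMetadata": {
            "dbType": "sqlserver",
            "dbHost": "example-db.abc123.us-east-1.rds.amazonaws.com",
            "dbPort": "1433",
            "dbInstance": "exampledb",
        }
    },
    "repositoryConfigurations": {},
    "additionalProperties": {
        "primaryKey": "id",
        "titleColumn": "title",
        "sqlQuery": "SELECT id, title, body FROM articles",
    },
    "secretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-secret",
    "type": "JDBC",
}

print(missing_required_keys(draft))
```

For complete validation, including types, enumerations, and patterns, a JSON Schema validation library would be a better fit than this hand-written check.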

## Amazon RDS (MySQL) template schema
<a name="ds-rds-mysql-schema"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object. Specify the type of data source as `JDBC`, the database type as `mysql`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [Amazon RDS (MySQL) JSON schema](#rds-mysql-json).

The following table describes the parameters of the Amazon RDS (MySQL) JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | Required configuration information for connecting your data source: `dbType` (the database type; specify `mysql`), `dbHost` (the database host name), `dbPort` (the database port), and `dbInstance` (the database instance). | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN. | 
|  document  |  A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). | 
| additionalProperties | Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source. | 
| primaryKey  | Provide the primary key for the database table. This identifies a table within your database. | 
| titleColumn | Provide the name of the document title column within your database table. | 
| bodyColumn | Provide the name of the document body column within your database table. | 
| sqlQuery | Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query. | 
| timestampColumn | Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content. | 
| timestampFormat | Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content. | 
| timezone | Enter the name of the column which contains time zones for the content to be crawled. | 
| changeDetectingColumns | Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns. | 
| allowedUsersColumn | Enter the name of the column which contains User IDs to be allowed access to content. | 
| allowedGroupsColumn | Enter the name of the column which contains groups to be allowed access to content. | 
| sourceURIColumn | Enter the name of the column which contains Source URLs to be indexed. | 
| isSslEnabled | Specify `true` to use SSL when connecting to your database, or `false` otherwise. | 
| type | The type of data source. Specify JDBC as your data source type. | 
| syncMode | Specify how Amazon Kendra should update your index when your data source content changes. You can choose between `FORCED_FULL_CRAWL`, `FULL_CRAWL`, and `CHANGE_LOG`. | 
| secretArn | The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the user name and password required to connect to your database. The secret must contain a JSON structure with the following keys: <pre>{<br />    "username": "database user name",<br />    "password": "password"<br />}</pre> | 
| version | The version of the template that is currently supported. | 

### Amazon RDS (MySQL) JSON schema
<a name="rds-mysql-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
            "dbType": {
              "type": "string",
              "enum": [
                "mysql",
                "db2",
                "postgresql",
                "oracle",
                "sqlserver"
              ]
            },
            "dbHost": {
              "type": "string"
            },
            "dbPort": {
              "type": "string"
            },
            "dbInstance": {
              "type": "string"
            }
          },
          "required": [
            "dbType",
            "dbHost",
            "dbPort",
            "dbInstance"
          ]
        }
      },
      "required": [
        "repositoryEndpointMetadata"
      ]
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "document": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string"
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        }
      },
      "required": [
      ]
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "primaryKey": {
          "type": "string"
        },
        "titleColumn": {
          "type": "string"
        },
        "bodyColumn": {
          "type": "string"
        },
        "sqlQuery": {
          "type": "string",
          "not": {
            "pattern": ";+"
          }
        },
        "timestampColumn": {
          "type": "string"
        },
        "timestampFormat": {
          "type": "string"
        },
        "timezone": {
          "type": "string"
        },
        "changeDetectingColumns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "allowedUsersColumn": {
          "type": "string"
        },
        "allowedGroupsColumn": {
          "type": "string"
        },
        "sourceURIColumn": {
          "type": "string"
        },
        "isSslEnabled": {
          "type": "boolean"
        }
      },
      "required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"]
    },
    "type" : {
      "type" : "string",
      "pattern": "JDBC"
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FORCED_FULL_CRAWL",
        "FULL_CRAWL",
        "CHANGE_LOG"
      ]
    },
    "secretArn": {
      "type": "string"
    },
    "version": {
      "type": "string",
      "anyOf": [
        {
          "pattern": "1.0.0"
        }
      ]
    }
  },
  "required": [
      "connectionConfiguration",
      "repositoryConfigurations",
      "syncMode",
      "additionalProperties",
      "secretArn",
      "type"
  ]
}
```

## Amazon RDS (Oracle) template schema
<a name="ds-rds-oracle-schema"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object. Specify the type of data source as `JDBC`, the database type as `oracle`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [Amazon RDS (Oracle) JSON schema](#rds-oracle-json).

The following table describes the parameters of the Amazon RDS (Oracle) JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | Required configuration information for connecting your data source: `dbType` (the database type; specify `oracle`), `dbHost` (the database host name), `dbPort` (the database port), and `dbInstance` (the database instance). | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN. | 
|  document  |  A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). | 
| additionalProperties | Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source. | 
| primaryKey  | Provide the primary key for the database table. This identifies a table within your database. | 
| titleColumn | Provide the name of the document title column within your database table. | 
| bodyColumn | Provide the name of the document body column within your database table. | 
| sqlQuery | Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query. | 
| timestampColumn | Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content. | 
| timestampFormat | Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content. | 
| timezone | Enter the name of the column which contains time zones for the content to be crawled. | 
| changeDetectingColumns | Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns. | 
| allowedUsersColumn | Enter the name of the column which contains User IDs to be allowed access to content. | 
| allowedGroupsColumn | Enter the name of the column which contains groups to be allowed access to content. | 
| sourceURIColumn | Enter the name of the column which contains Source URLs to be indexed. | 
| isSslEnabled | `true` to use SSL for the connection to your database. The schema defines this value as a Boolean. | 
| type | The type of data source. Specify JDBC as your data source type. | 
| syncMode | Specify how Amazon Kendra should update your index when your data source content changes. You can choose between `FORCED_FULL_CRAWL`, `FULL_CRAWL`, and `CHANGE_LOG`. | 
| secretArn | The Amazon Resource Name (ARN) of a Secrets Manager secret that contains user name and password required to connect to your database. The secret must contain a JSON structure with the following keys: <pre>{<br />    "user name": "database user name",<br />    "password": "password"<br />}</pre> | 
| version | The version of the template that is currently supported. | 

### Amazon RDS (Oracle) JSON schema
<a name="rds-oracle-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
            "dbType": {
              "type": "string",
              "enum": [
                "mysql",
                "db2",
                "postgresql",
                "oracle",
                "sqlserver"
              ]
            },
            "dbHost": {
              "type": "string"
            },
            "dbPort": {
              "type": "string"
            },
            "dbInstance": {
              "type": "string"
            }
          },
          "required": [
            "dbType",
            "dbHost",
            "dbPort",
            "dbInstance"
          ]
        }
      },
      "required": [
        "repositoryEndpointMetadata"
      ]
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "document": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string"
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        }
      },
      "required": [
      ]
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "primaryKey": {
          "type": "string"
        },
        "titleColumn": {
          "type": "string"
        },
        "bodyColumn": {
          "type": "string"
        },
        "sqlQuery": {
          "type": "string",
          "not": {
            "pattern": ";+"
          }
        },
        "timestampColumn": {
          "type": "string"
        },
        "timestampFormat": {
          "type": "string"
        },
        "timezone": {
          "type": "string"
        },
        "changeDetectingColumns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "allowedUsersColumn": {
          "type": "string"
        },
        "allowedGroupsColumn": {
          "type": "string"
        },
        "sourceURIColumn": {
          "type": "string"
        },
        "isSslEnabled": {
          "type": "boolean"
        }
      },
      "required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"]
    },
    "type" : {
      "type" : "string",
      "pattern": "JDBC"
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FORCED_FULL_CRAWL",
        "FULL_CRAWL",
        "CHANGE_LOG"
      ]
    },
    "secretArn": {
      "type": "string"
    }
  },
  "version": {
    "type": "string",
    "anyOf": [
      {
        "pattern": "1.0.0"
      }
    ]
  },
  "required": [
      "connectionConfiguration",
      "repositoryConfigurations",
      "syncMode",
      "additionalProperties",
      "secretArn",
      "type"
  ]
}
```
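
As a rough illustration, the following Python sketch builds a template that follows the schema above and checks it against the schema's `required` lists. The host name, database instance, column names, index field name, and secret ARN are hypothetical placeholders, and the helpers `missing_required` and `query_is_allowed` are written for this example only; they are not part of any AWS SDK.

```python
# Hypothetical placeholder values throughout: host, instance, columns, ARN.
oracle_template = {
    "connectionConfiguration": {
        "repositoryEndpointMetadata": {
            "dbType": "oracle",
            "dbHost": "example-db.abc123.us-west-2.rds.amazonaws.com",
            "dbPort": "1521",  # the schema types dbPort as a string
            "dbInstance": "ORCL",
        }
    },
    "repositoryConfigurations": {
        "document": {
            "fieldMappings": [
                {
                    "indexFieldName": "department",  # hypothetical index field
                    "indexFieldType": "STRING",
                    "dataSourceFieldName": "department",
                }
            ]
        }
    },
    "additionalProperties": {
        "primaryKey": "doc_id",
        "titleColumn": "title",
        "bodyColumn": "body",
        "sqlQuery": "SELECT doc_id, title, body, department FROM documents",
    },
    "type": "JDBC",
    "syncMode": "FULL_CRAWL",
    "secretArn": "arn:aws:secretsmanager:us-west-2:111122223333:secret:example",
    "version": "1.0.0",
}

# Copied from the "required" lists in the JSON schema.
REQUIRED_TOP_LEVEL = [
    "connectionConfiguration",
    "repositoryConfigurations",
    "syncMode",
    "additionalProperties",
    "secretArn",
    "type",
]
REQUIRED_ADDITIONAL = ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"]


def missing_required(template):
    """Return the names of required keys that the template does not define."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in template]
    extra = template.get("additionalProperties", {})
    missing += [
        "additionalProperties." + key
        for key in REQUIRED_ADDITIONAL
        if key not in extra
    ]
    return missing


def query_is_allowed(sql):
    """The schema rejects any sqlQuery that matches the pattern ';+'."""
    return ";" not in sql


print(missing_required(oracle_template))
print(query_is_allowed(oracle_template["additionalProperties"]["sqlQuery"]))
```

This check covers only the `required` lists and the semicolon rule. A full JSON Schema validator would also enforce the types and enumerations.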

## Amazon RDS (PostgreSQL) template schema
<a name="ds-rds-postgresql-schema"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object. Specify the type of data source as `JDBC`, the database type as `postgresql`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [Amazon RDS (PostgreSQL) JSON schema](#rds-postgresql-json).

The following table describes the parameters of the Amazon RDS (PostgreSQL) JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | Required configuration information for connecting your data source. The schema requires `dbType` (specify `postgresql`), `dbHost`, `dbPort`, and `dbInstance`. | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN. | 
|  document  |  A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). | 
| additionalProperties | Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source. | 
| primaryKey  | Provide the primary key for the database table. This identifies a table within your database. | 
| titleColumn | Provide the name of the document title column within your database table. | 
| bodyColumn | Provide the name of the document body column within your database table. | 
| sqlQuery | Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query. | 
| timestampColumn | Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content. | 
| timestampFormat | Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content. | 
| timezone | Enter the name of the column which contains time zones for the content to be crawled. | 
| changeDetectingColumns | Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns. | 
| allowedUsersColumn | Enter the name of the column which contains User IDs to be allowed access to content. | 
| allowedGroupsColumn | Enter the name of the column which contains groups to be allowed access to content. | 
| sourceURIColumn | Enter the name of the column which contains Source URLs to be indexed. | 
| isSslEnabled | `true` to use SSL for the connection to your database. The schema defines this value as a Boolean. | 
| type | The type of data source. Specify JDBC as your data source type. | 
| syncMode | Specify how Amazon Kendra should update your index when your data source content changes. You can choose between `FORCED_FULL_CRAWL`, `FULL_CRAWL`, and `CHANGE_LOG`. | 
| secretArn | The Amazon Resource Name (ARN) of a Secrets Manager secret that contains user name and password required to connect to your database. The secret must contain a JSON structure with the following keys: <pre>{<br />    "user name": "database user name",<br />    "password": "password"<br />}</pre> | 
| version | The version of the template that is currently supported. | 

### Amazon RDS (PostgreSQL) JSON schema
<a name="rds-postgresql-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
            "dbType": {
              "type": "string",
              "enum": [
                "mysql",
                "db2",
                "postgresql",
                "oracle",
                "sqlserver"
              ]
            },
            "dbHost": {
              "type": "string"
            },
            "dbPort": {
              "type": "string"
            },
            "dbInstance": {
              "type": "string"
            }
          },
          "required": [
            "dbType",
            "dbHost",
            "dbPort",
            "dbInstance"
          ]
        }
      },
      "required": [
        "repositoryEndpointMetadata"
      ]
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "document": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string"
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        }
      },
      "required": [
      ]
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "primaryKey": {
          "type": "string"
        },
        "titleColumn": {
          "type": "string"
        },
        "bodyColumn": {
          "type": "string"
        },
        "sqlQuery": {
          "type": "string",
          "not": {
            "pattern": ";+"
          }
        },
        "timestampColumn": {
          "type": "string"
        },
        "timestampFormat": {
          "type": "string"
        },
        "timezone": {
          "type": "string"
        },
        "changeDetectingColumns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "allowedUsersColumn": {
          "type": "string"
        },
        "allowedGroupsColumn": {
          "type": "string"
        },
        "sourceURIColumn": {
          "type": "string"
        },
        "isSslEnabled": {
          "type": "boolean"
        }
      },
      "required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"]
    },
    "type" : {
      "type" : "string",
      "pattern": "JDBC"
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FORCED_FULL_CRAWL",
        "FULL_CRAWL",
        "CHANGE_LOG"
      ]
    },
    "secretArn": {
      "type": "string"
    }
  },
  "version": {
    "type": "string",
    "anyOf": [
      {
        "pattern": "1.0.0"
      }
    ]
  },
  "required": [
      "connectionConfiguration",
      "repositoryConfigurations",
      "syncMode",
      "additionalProperties",
      "secretArn",
      "type"
  ]
}
```
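
One way to pass a template like this to `CreateDataSource` is sketched below in Python. The sketch only assembles the request, with `Type` set to `TEMPLATE` and the template nested under `TemplateConfiguration`. The helper `build_create_data_source_request` is hypothetical, as are the host, column names, index ID, and ARNs. Sending the request assumes an SDK such as boto3 and valid credentials, neither of which is exercised here.

```python
import json

# Hypothetical placeholder values throughout: host, columns, IDs, ARNs.
postgresql_template = {
    "connectionConfiguration": {
        "repositoryEndpointMetadata": {
            "dbType": "postgresql",
            "dbHost": "example-db.abc123.us-west-2.rds.amazonaws.com",
            "dbPort": "5432",  # the schema types dbPort as a string
            "dbInstance": "postgres",
        }
    },
    "repositoryConfigurations": {
        "document": {
            "fieldMappings": [
                {
                    "indexFieldName": "department",  # hypothetical index field
                    "indexFieldType": "STRING",
                    "dataSourceFieldName": "department",
                }
            ]
        }
    },
    "additionalProperties": {
        "primaryKey": "doc_id",
        "titleColumn": "title",
        "bodyColumn": "body",
        "sqlQuery": "SELECT doc_id, title, body, updated_at FROM documents",
        "timestampColumn": "updated_at",
        "changeDetectingColumns": ["title", "body"],
        "isSslEnabled": True,
    },
    "type": "JDBC",
    "syncMode": "FULL_CRAWL",
    "secretArn": "arn:aws:secretsmanager:us-west-2:111122223333:secret:example",
    "version": "1.0.0",
}


def build_create_data_source_request(name, index_id, role_arn, template):
    """Return keyword arguments shaped for the CreateDataSource API."""
    return {
        "Name": name,
        "IndexId": index_id,
        "RoleArn": role_arn,
        "Type": "TEMPLATE",
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }


request = build_create_data_source_request(
    name="example-postgresql-source",
    index_id="11111111-2222-3333-4444-555555555555",
    role_arn="arn:aws:iam::111122223333:role/ExampleKendraRole",
    template=postgresql_template,
)
print(json.dumps(request, indent=2))

# With boto3 installed and credentials configured (an assumption), the
# request could then be sent with:
#   boto3.client("kendra").create_data_source(**request)
```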

## Amazon S3 template schema
<a name="ds-s3-schema"></a>

You include a JSON that contains the data source schema as part of the template configuration. You provide the name of the S3 bucket as a part of the connection configuration or repository endpoint details. Also specify the type of data source as `S3`, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [S3 JSON schema](#s3-json).

The following table describes the parameters of the Amazon S3 JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | The endpoint information for the data source. | 
| BucketName | The name of your Amazon S3 bucket. | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. | 
| additionalProperties | Additional configuration options for your content in your data source. | 
| inclusionPatterns, exclusionPatterns | A list of regular expression patterns to include or exclude specific files in your Amazon S3 data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index. | 
| aclConfigurationFilePath | The file path that controls access to documents in an Amazon Kendra index. | 
| metadataFilesPrefix | The location within your bucket for metadata files. | 
| syncMode | Specify how Amazon Kendra should update your index when your data source content changes. You can choose between `FULL_CRAWL` and `FORCED_FULL_CRAWL`. | 
| type | The type of data source. Specify S3 as your data source type. | 
| version | The version of the template that is supported. | 

### S3 JSON schema
<a name="s3-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
            "BucketName": {
              "type": "string"
            }
          },
          "required": [
            "BucketName"
          ]
        }
      },
      "required": [
        "repositoryEndpointMetadata"
      ]
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "document": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": [
                        "STRING"
                      ]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        }
      },
      "required": [
        "document"
      ]
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "inclusionPatterns": {
          "type": "array"
        },
        "exclusionPatterns": {
          "type": "array"
        },
        "inclusionPrefixes": {
          "type": "array"
        },
        "exclusionPrefixes": {
          "type": "array"
        },
        "aclConfigurationFilePath": {
          "type": "string"
        },
        "metadataFilesPrefix": {
          "type": "string"
        }
      }
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FULL_CRAWL",
        "FORCED_FULL_CRAWL"
      ]
    },
    "type": {
      "type": "string",
      "pattern": "S3"
    },
    "version": {
      "type": "string",
      "anyOf": [
        {
          "pattern": "1.0.0"
        }
      ]
    }
  },
  "required": [
    "connectionConfiguration",
    "type",
    "syncMode",
    "repositoryConfigurations"
  ]
}
```
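
The precedence rule for inclusion and exclusion patterns can be illustrated with a short Python sketch. It is not how Amazon Kendra implements filtering: the helper `is_indexed` is hypothetical, the use of `re.search` (unanchored matching) is an assumption, and so is treating an empty inclusion list as "include everything". The bucket name, patterns, and field names are placeholders.

```python
import re

# Hypothetical placeholder values: bucket, patterns, prefix, field names.
s3_template = {
    "connectionConfiguration": {
        "repositoryEndpointMetadata": {"BucketName": "amzn-s3-demo-bucket"}
    },
    "repositoryConfigurations": {
        "document": {
            "fieldMappings": [
                {
                    "indexFieldName": "department",
                    "indexFieldType": "STRING",  # the only type in the schema enum
                    "dataSourceFieldName": "department",
                }
            ]
        }
    },
    "additionalProperties": {
        "inclusionPatterns": [r".*\.pdf$"],
        "exclusionPatterns": [r".*/drafts/.*"],
        "metadataFilesPrefix": "metadata/",
    },
    "syncMode": "FULL_CRAWL",
    "type": "S3",
    "version": "1.0.0",
}


def is_indexed(key, inclusion_patterns, exclusion_patterns):
    """Apply the rule from the table: an exclusion match always wins."""
    if any(re.search(pattern, key) for pattern in exclusion_patterns):
        return False
    if not inclusion_patterns:
        # Assumption: with no inclusion patterns, every file is eligible.
        return True
    return any(re.search(pattern, key) for pattern in inclusion_patterns)


props = s3_template["additionalProperties"]
for key in ["reports/2023/summary.pdf", "reports/drafts/summary.pdf", "reports/notes.txt"]:
    print(key, is_indexed(key, props["inclusionPatterns"], props["exclusionPatterns"]))
```

The second key matches both lists, so the exclusion pattern decides the outcome and the file is skipped.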

## Amazon Kendra Web Crawler template schema
<a name="ds-schema-web-crawler"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object.

You provide the seed or starting point URLs, or the sitemap URLs, as part of the connection configuration or repository endpoint details. Instead of manually listing all your URLs, you can provide the path to the Amazon S3 bucket that stores a text file containing your list of seed URLs, or that stores your sitemap XML files, which you can combine into a ZIP file in S3.

You also specify the type of data source as `WEBCRAWLERV2`, the website authentication credentials and authentication type if your websites require authentication, and other necessary configurations.

You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

**Important**  
Web Crawler v2.0 connector creation is not supported by CloudFormation. Use the Web Crawler v1.0 connector if you need CloudFormation support.

*When selecting websites to index, you must adhere to the [Amazon Acceptable Use Policy](https://aws.amazon.com/aup/) and all other Amazon terms. Remember that you must only use Amazon Kendra Web Crawler to index your own web pages, or web pages that you have authorization to index. To learn how to stop Amazon Kendra Web Crawler from indexing your websites, see [Configuring the `robots.txt` file for Amazon Kendra Web Crawler](stop-web-crawler.md).*

You can use the template provided in this developer guide. See [Amazon Kendra Web Crawler JSON schema](#web-crawler-json).
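
As a partial sketch of such a template, the following Python builds the connection details and a few crawl settings, then checks the seed and sitemap URLs against the limits and URL pattern given in this section. Field mappings are omitted for brevity, so this is not a complete template. The URLs are placeholders, and `check_endpoint_metadata` is a hypothetical helper written for this example.

```python
import re

MAX_SEED_URLS = 100    # limit stated in the parameter table
MAX_SITEMAP_URLS = 3   # limit stated in the parameter table

# Hypothetical placeholder URLs; field mappings are left out of this sketch.
web_crawler_template = {
    "connectionConfiguration": {
        "repositoryEndpointMetadata": {
            "seedUrlConnections": [
                {"seedUrl": "https://www.example.com/docs"},
                {"seedUrl": "https://www.example.com/blog"},
            ],
            "siteMapUrls": ["https://www.example.com/sitemap.xml"],
            "authentication": "NoAuthentication",
        }
    },
    "additionalProperties": {
        # The schema types these numeric settings as strings.
        "rateLimit": "300",
        "maxFileSize": "50",
        "crawlDepth": "2",
        "maxLinksPerUrl": "100",
        "crawlSubDomain": False,
        "honorRobots": True,
        "exclusionURLCrawlPatterns": [r".*/private/.*"],
    },
    "syncMode": "FULL_CRAWL",
    "type": "WEBCRAWLERV2",
    "version": "1.0.0",
}


def check_endpoint_metadata(metadata):
    """Return a list of problems found in the seed and sitemap URLs."""
    problems = []
    seeds = [conn["seedUrl"] for conn in metadata.get("seedUrlConnections", [])]
    sitemaps = metadata.get("siteMapUrls", [])
    if len(seeds) > MAX_SEED_URLS:
        problems.append("more than 100 seed URLs")
    if len(sitemaps) > MAX_SITEMAP_URLS:
        problems.append("more than 3 sitemap URLs")
    for url in seeds + sitemaps:
        # Mirrors the schema pattern "https://.*", which is unanchored.
        if not re.search(r"https://.*", url):
            problems.append("does not match https://.*: " + url)
    return problems


metadata = web_crawler_template["connectionConfiguration"]["repositoryEndpointMetadata"]
print(check_endpoint_metadata(metadata))
```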

The following table describes the parameters of the Amazon Kendra Web Crawler JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | The endpoint information for the data source. | 
| siteMapUrls | The list of sitemap URLs for the websites that you want to crawl. You can list up to three sitemap URLs. | 
| s3SeedUrl | The S3 path to the text file that stores the list of seed or starting point URLs. For example, s3://bucket-name/directory/. Each URL in the text file must be formatted on a separate line. You can list up to 100 seed URLs in a file. | 
| s3SiteMapUrl | The S3 path to the sitemap XML files. For example, s3://bucket-name/directory/. You can list up to three sitemap XML files. You can combine multiple sitemap files into a ZIP file and store the ZIP file in your Amazon S3 bucket. | 
| seedUrlConnections | The list of seed or starting point URLs for the websites that you want to crawl. You can list up to 100 seed URLs. | 
| seedUrl | The seed or starting point URL. | 
| authentication | The authentication type if your websites require the same authentication, otherwise specify NoAuthentication. | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. | 
| webPage, attachment | A list of objects that map the attributes or field names of your web pages and web page files to Amazon Kendra index field names. For example, the HTML web page title tag can be mapped to the `_document_title` index field. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). | 
| syncMode | Specify how Amazon Kendra should update your index when your data source content changes. You can choose between `FORCED_FULL_CRAWL` and `FULL_CRAWL`. | 
| additionalProperties | Additional configuration options for your content in your data source. | 
| rateLimit | The maximum number of URLs crawled per website host per minute. | 
| maxFileSize | The maximum size (in MB) of a web page or attachment to crawl. | 
| crawlDepth | The number of levels from the seed URL to crawl. For example, the seed URL page is depth 1 and any hyperlinks on this page that are also crawled are depth 2. | 
| maxLinksPerUrl | The maximum number of URLs on a web page to include when crawling a website. This number is per web page. As a website's web pages are crawled, any URLs that the webpages link to also are crawled. URLs on a web page are crawled in order of appearance. | 
| crawlSubDomain | true to crawl the website domains with subdomains. For example, if the seed URL is "abc.example.com", then "a.abc.example.com" and "b.abc.example.com" are also crawled. If you don't set crawlSubDomain or crawlAllDomain to true, then Amazon Kendra only crawls the domains of the websites that you want to crawl. | 
| crawlAllDomain | true to crawl the website domains with subdomains and other domains the web pages link to. If you don't set crawlSubDomain or crawlAllDomain to true, then Amazon Kendra only crawls the domains of the websites that you want to crawl. | 
| honorRobots | true to respect the robots.txt directives of the websites that you want to crawl. These directives control how Amazon Kendra Web Crawler crawls the websites, whether Amazon Kendra can crawl only specific content or not crawl any content. | 
| crawlAttachments | true to crawl files that the web pages link to. | 
| inclusionURLCrawlPatterns, inclusionURLIndexPatterns | A list of regular expression patterns to include certain URLs for crawling and to index any hyperlinks on these URL web pages. URLs that match the patterns are included in the index. URLs that don't match the patterns are excluded from the index. If a URL matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the URL/website's web pages aren't included in the index. | 
| exclusionURLCrawlPatterns, exclusionURLIndexPatterns | A list of regular expression patterns to exclude certain URLs from crawling and to skip indexing any hyperlinks on these URL web pages. URLs that match the patterns are excluded from the index. URLs that don't match the patterns are included in the index. If a URL matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the URL/website's web pages aren't included in the index. | 
| inclusionFileIndexPatterns | A list of regular expression patterns to include certain web page files. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index. | 
| exclusionFileIndexPatterns | A list of regular expression patterns to exclude certain web page files. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index. | 
| implicitWaitDuration | How long the connector waits, in seconds, before crawling a web page. Valid range: 0 to 10. For example, `"implicitWaitDuration": "5"`. | 
| proxy | Configuration information required to connect to your internal websites via a web proxy. | 
| host | The host name of the proxy server you want to use to connect to internal websites. For example, the host name of https://a.example.com/page1.html is "a.example.com". | 
| port | The port number of the proxy server you want to use to connect to internal websites. For example, 443 is the standard port for HTTPS. | 
| secretArn (proxy) | If web proxy credentials are required to connect to a website host, you can create an AWS Secrets Manager secret that stores the credentials. Provide the Amazon Resource Name (ARN) of the secret. | 
| type | The type of data source. Specify WEBCRAWLERV2 as your data source type. | 
| secretArn | The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that's used if your websites require authentication. You store the authentication credentials for the website in the secret, which contains JSON key-value pairs. If you use basic or NTLM/Kerberos authentication, enter the user name and password. The JSON keys in the secret must be `userName` and `password`. NTLM authentication protocol includes password hashing, and Kerberos authentication protocol includes password encryption. If you use SAML or form authentication, enter the user name and password, XPath for the user name field (and user name button if using SAML), XPaths for the password field and button, and the login page URL. The JSON keys in the secret must be `userName`, `password`, `userNameFieldXpath`, `userNameButtonXpath`, `passwordFieldXpath`, `passwordButtonXpath`, and `loginPageUrl`. You can find the XPaths (XML Path Language) of elements using your web browser's developer tools. XPaths usually follow this format: `//tagname[@Attribute='Value']`. Amazon Kendra also checks if the endpoint information (seed URLs) included in the secret is the same as the endpoint information specified in your data source endpoint configuration details. | 
| version | The version of this template that is currently supported. | 

### Amazon Kendra Web Crawler JSON schema
<a name="web-crawler-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
            "siteMapUrls": {
              "type": "array",
              "items":{
                "type": "string",
                "pattern": "https://.*"
              }
            },
            "s3SeedUrl": {
              "type": "string",
              "pattern": "s3:.*"
            },
            "s3SiteMapUrl": {
              "type": "string",
              "pattern": "s3:.*"
            },
            "seedUrlConnections": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "seedUrl":{
                      "type": "string",
                      "pattern": "https://.*"
                    }
                  },
                  "required": [
                    "seedUrl"
                  ]
                }
              ]
            },
            "authentication": {
              "type": "string",
              "enum": [
                "NoAuthentication",
                "BasicAuth",
                "NTLM_Kerberos",
                "Form",
                "SAML"
              ]
            }
          }
        }
      },
      "required": [
        "repositoryEndpointMetadata"
      ]
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "webPage": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": [
                        "STRING",
                        "DATE",
                        "LONG"
                      ]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "attachment": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": [
                        "STRING",
                        "DATE",
                        "LONG"
                      ]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        }
      }
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FORCED_FULL_CRAWL",
        "FULL_CRAWL"
      ]
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "rateLimit": {
          "type": "string",
          "default": "300"
        },
        "maxFileSize": {
          "type": "string",
          "default": "50"
        },
        "crawlDepth": {
          "type": "string",
          "default": "2"
        },
        "maxLinksPerUrl": {
          "type": "string",
          "default": "100"
        },
        "crawlSubDomain": {
          "type": "boolean",
          "default": false
        },
        "crawlAllDomain": {
          "type": "boolean",
          "default": false
        },
        "honorRobots": {
          "type": "boolean",
          "default": false
        },
        "crawlAttachments": {
          "type": "boolean",
          "default": false
        },
        "inclusionURLCrawlPatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionURLCrawlPatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "inclusionURLIndexPatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionURLIndexPatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "inclusionFileIndexPatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionFileIndexPatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "proxy": {
          "type": "object",
          "properties": {
            "host": {
              "type": "string"
            },
            "port": {
              "type": "string"
            },
            "secretArn": {
              "type": "string",
              "minLength": 20,
              "maxLength": 2048
            }
          }
        }
      },
      "implicitWaitDuration": {
        "type": "object",
        "properties": {
          "innerNumber": {
            "type": "number",
            "minimum": 0,
            "maximum": 10
          }
        }
      },
      "required": [
        "rateLimit",
        "maxFileSize",
        "crawlDepth",
        "crawlSubDomain",
        "crawlAllDomain",
        "maxLinksPerUrl",
        "honorRobots"
      ]
    },
    "type": {
      "type": "string",
      "pattern": "WEBCRAWLERV2"
    },
    "secretArn": {
      "type": "string",
      "minLength": 20,
      "maxLength": 2048
    }
  },
  "version": {
    "type": "string",
    "anyOf": [
      {
        "pattern": "1.0.0"
      }
    ]
  },
  "required": [
    "connectionConfiguration",
    "repositoryConfigurations",
    "syncMode",
    "type",
    "additionalProperties"
  ]
}
```
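
As a rough illustration of the `additionalProperties` block in the schema above, the following Python sketch merges the schema defaults with a few overrides. It covers only `additionalProperties`, not the rest of the template. The helper name `build_additional_properties`, the digits-only check, and the example exclusion pattern are assumptions made for this sketch; the schema itself only requires that the numeric settings are strings.

```python
import json

# Defaults copied from the "additionalProperties" block of the schema above.
# Numeric settings are typed as strings in the schema, so they stay quoted.
WEB_CRAWLER_DEFAULTS = {
    "rateLimit": "300",
    "maxFileSize": "50",
    "crawlDepth": "2",
    "maxLinksPerUrl": "100",
    "crawlSubDomain": False,
    "crawlAllDomain": False,
    "honorRobots": False,
    "crawlAttachments": False,
}

# Keys listed under "required" for additionalProperties in the schema.
REQUIRED_KEYS = {
    "rateLimit", "maxFileSize", "crawlDepth", "crawlSubDomain",
    "crawlAllDomain", "maxLinksPerUrl", "honorRobots",
}

NUMERIC_STRING_KEYS = ("rateLimit", "maxFileSize", "crawlDepth", "maxLinksPerUrl")


def build_additional_properties(**overrides):
    """Merge schema defaults with overrides (hypothetical helper)."""
    props = {**WEB_CRAWLER_DEFAULTS, **overrides}
    for key in NUMERIC_STRING_KEYS:
        value = props[key]
        # Assumption: a numeric setting should be a string of digits.
        if not (isinstance(value, str) and value.isdigit()):
            raise ValueError(f"{key} must be a string of digits, got {value!r}")
    return props


additional_properties = build_additional_properties(
    crawlDepth="3",
    exclusionURLCrawlPatterns=[r".*/private/.*"],  # illustrative pattern only
)
print(json.dumps(additional_properties, indent=2))
```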

## Confluence template schema
<a name="ds-confluence-schema"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object. You provide the Confluence host URL, the hosting method, and the authentication type as a part of the connection configuration or repository endpoint details. Also specify the type of data source as `CONFLUENCEV2`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [Confluence JSON schema](#confluence-json).

The following table describes the parameters of the Confluence JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | The endpoint information for the data source. | 
| hostUrl | The URL for your Confluence instance. For example, https://example.confluence.com. | 
| type | The hosting method for your Confluence instance, either SAAS or ON_PREM. | 
| authType | The authentication method for your Confluence instance, either Basic, OAuth2, or Personal-token. | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. | 
| space, page, blog, comment, attachment (each containing fieldMappings) | A list of objects that map the attributes or field names of your Confluence spaces, pages, blogs, comments, and attachments to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). The Confluence data source field names must exist in your Confluence custom metadata. | 
| additionalProperties | Additional configuration options for your content in your data source. | 
| isCrawlAcl | Specify true to crawl the access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. This means that if isCrawlAcl is turned off, documents can be publicly searched. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources). | 
| fieldForUserId | Specify email if you want to use the user email for the user ID. email is used by default and is currently the only supported user ID type. | 
| [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html) | A list of regular expression patterns to include and/or exclude certain files in your Confluence data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index. | 
| proxyHost | The host name of the web proxy that you use, without the http:// or https:// protocol. | 
|  proxyPort  | The port number used by the host URL transport protocol. Must be a numeric value between 0 and 65535. | 
| isCrawlPersonalSpace, isCrawlPage, isCrawlBlog, isCrawlPageComment, isCrawlPageAttachment, isCrawlBlogComment, isCrawlBlogAttachment | true to crawl files in your Confluence personal spaces, pages, blogs, page comments, page attachments, blog comments, and blog attachments. | 
| maxFileSizeInMegaBytes | The file size limit, in MB, for files that Amazon Kendra crawls. Amazon Kendra crawls only the files within the size limit you define. The default limit is 50 MB. The limit must be greater than 0 MB and less than or equal to 50 MB. | 
| type | The type of data source. Specify CONFLUENCEV2 as your data source type. | 
| enableIdentityCrawler | true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If the identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and the identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/dg/API_PutPrincipalMapping.html) API to upload user and group access information. | 
| syncMode | Specify how Amazon Kendra should update your index when your data source content changes. You can choose between FULL_CRAWL and FORCED_FULL_CRAWL. | 
| secretArn | The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Confluence instance. For information on these key-value pairs, see [Connection instructions for Confluence](https://docs.aws.amazon.com/kendra/latest/dg/data-source-v2-confluence.html#data-source-procedure-v2-confluence). | 
| version | The version of this template that is currently supported. | 
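
To make the parameters above more concrete, the following Python sketch builds a minimal Confluence template and wraps it in request parameters shaped for `CreateDataSource` with `Type` set to `TEMPLATE`. Every concrete value is a placeholder chosen for illustration, including the host URL, the ARNs, the index ID, the data source name, and the field names in the mapping. Treat it as a starting point under those assumptions rather than a verified configuration.

```python
import json

# All concrete values below (host URL, ARNs, IDs, field names) are placeholders.
confluence_template = {
    "connectionConfiguration": {
        "repositoryEndpointMetadata": {
            "hostUrl": "https://example.confluence.com",
            "type": "SAAS",       # or "ON_PREM"
            "authType": "Basic",  # or "OAuth2", "Personal-token"
        }
    },
    "repositoryConfigurations": {
        "page": {
            "fieldMappings": [
                {
                    # Hypothetical mapping; the data source field must exist
                    # in your Confluence custom metadata.
                    "indexFieldName": "page_created_at",
                    "indexFieldType": "DATE",
                    "dataSourceFieldName": "createdDate",
                    "dateFieldFormat": "yyyy-MM-dd'T'HH:mm:ss'Z'",
                }
            ]
        }
    },
    "additionalProperties": {
        "isCrawlAcl": True,
        "fieldForUserId": "email",
        "isCrawlPage": True,
        "maxFileSizeInMegaBytes": "50",  # typed as a string in the schema
    },
    "type": "CONFLUENCEV2",
    "enableIdentityCrawler": True,
    "syncMode": "FULL_CRAWL",
    "secretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    "version": "1.0.0",
}

# Request parameters shaped for CreateDataSource. With boto3 installed and
# credentials configured (an assumption), they could be passed as
# boto3.client("kendra").create_data_source(**create_data_source_request).
create_data_source_request = {
    "Name": "example-confluence",
    "IndexId": "example-index-id",
    "RoleArn": "arn:aws:iam::111122223333:role/example-kendra-role",
    "Type": "TEMPLATE",
    "Configuration": {
        "TemplateConfiguration": {"Template": confluence_template}
    },
}

print(json.dumps(create_data_source_request, indent=2))
```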

### Confluence JSON schema
<a name="confluence-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
            "hostUrl": {
              "type": "string",
              "pattern": "https:.*"
            },
            "type": {
              "type": "string",
              "enum": [
                "SAAS",
                "ON_PREM"
              ]
            },
            "authType": {
              "type": "string",
              "enum": [
                "Basic",
                "OAuth2",
                "Personal-token"
              ]
            }
          },
          "required": [
            "hostUrl",
            "type",
            "authType"
          ]
        }
      },
      "required": [
        "repositoryEndpointMetadata"
      ]
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "space": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": [
                        "STRING",
                        "STRING_LIST",
                        "DATE"
                      ]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "page": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": [
                        "STRING",
                        "STRING_LIST",
                        "DATE",
                        "LONG"
                      ]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "blog": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": [
                        "STRING",
                        "STRING_LIST",
                        "DATE",
                        "LONG"
                      ]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "comment": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": [
                        "STRING",
                        "STRING_LIST",
                        "DATE",
                        "LONG"
                      ]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "attachment": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": [
                        "STRING",
                        "STRING_LIST",
                        "DATE",
                        "LONG"
                      ]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        }
      }
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "usersAclS3FilePath": {
          "type": "string"
        },
        "isCrawlAcl": {
          "type": "boolean"
        },
        "fieldForUserId": {
          "type": "string"
        },
        "inclusionSpaceKeyFilter": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionSpaceKeyFilter": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "pageTitleRegEX": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "blogTitleRegEX": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "commentTitleRegEX": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "attachmentTitleRegEX": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "isCrawlPersonalSpace": {
          "type": "boolean"
        },
        "isCrawlArchivedSpace": {
          "type": "boolean"
        },
        "isCrawlArchivedPage": {
          "type": "boolean"
        },
        "isCrawlPage": {
          "type": "boolean"
        },
        "isCrawlBlog": {
          "type": "boolean"
        },
        "isCrawlPageComment": {
          "type": "boolean"
        },
        "isCrawlPageAttachment": {
          "type": "boolean"
        },
        "isCrawlBlogComment": {
          "type": "boolean"
        },
        "isCrawlBlogAttachment": {
          "type": "boolean"
        },
        "maxFileSizeInMegaBytes": {
          "type": "string"
        },
        "inclusionFileTypePatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionFileTypePatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "inclusionUrlPatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionUrlPatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "proxyHost": {
          "type": "string"
        },
        "proxyPort": {
          "type": "string"
        }
      },
      "required": []
    },
    "type": {
      "type": "string",
      "pattern": "CONFLUENCEV2"
    },
    "enableIdentityCrawler": {
      "type": "boolean"
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FULL_CRAWL",
        "FORCED_FULL_CRAWL"
      ]
    },
    "secretArn": {
      "type": "string",
      "minLength": 20,
      "maxLength": 2048
    }
  },
  "version": {
    "type": "string",
    "anyOf": [
      {
        "pattern": "1.0.0"
      }
    ]
  },
  "required": [
    "connectionConfiguration",
    "repositoryConfigurations",
    "syncMode",
    "additionalProperties",
    "secretArn",
    "type"
  ]
}
```
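
Before calling the API, it can help to check a draft template against a few of the constraints in the schema above. The following Python sketch hand-codes a small subset of them: required top-level keys, the enumerated values, the `hostUrl` pattern, and the `secretArn` length. The function name `check_confluence_template` and the sample draft are assumptions for this sketch. It is not a full JSON Schema validator and does not reflect how Amazon Kendra validates templates internally.

```python
import re

# Constraints copied by hand from the Confluence JSON schema above.
REQUIRED_TOP_LEVEL = [
    "connectionConfiguration", "repositoryConfigurations", "syncMode",
    "additionalProperties", "secretArn", "type",
]
HOSTING_TYPES = {"SAAS", "ON_PREM"}
AUTH_TYPES = {"Basic", "OAuth2", "Personal-token"}
SYNC_MODES = {"FULL_CRAWL", "FORCED_FULL_CRAWL"}


def check_confluence_template(template):
    """Return a list of problems found; an empty list means none were found."""
    problems = [
        f"missing required key: {key}"
        for key in REQUIRED_TOP_LEVEL
        if key not in template
    ]
    endpoint = template.get("connectionConfiguration", {}).get(
        "repositoryEndpointMetadata", {}
    )
    # JSON Schema patterns are unanchored, so re.search mirrors "https:.*".
    if not re.search(r"https:.*", endpoint.get("hostUrl", "")):
        problems.append("hostUrl does not match the pattern https:.*")
    if endpoint.get("type") not in HOSTING_TYPES:
        problems.append(f"hosting type must be one of {sorted(HOSTING_TYPES)}")
    if endpoint.get("authType") not in AUTH_TYPES:
        problems.append(f"authType must be one of {sorted(AUTH_TYPES)}")
    if "syncMode" in template and template["syncMode"] not in SYNC_MODES:
        problems.append(f"syncMode must be one of {sorted(SYNC_MODES)}")
    if "type" in template and not re.search("CONFLUENCEV2", template["type"]):
        problems.append("type must be CONFLUENCEV2")
    secret_arn = template.get("secretArn")
    if secret_arn is not None and not 20 <= len(secret_arn) <= 2048:
        problems.append("secretArn must be 20 to 2048 characters long")
    return problems


# Sample draft with three deliberate mistakes: an http URL, an unsupported
# authType, and no secretArn.
draft = {
    "connectionConfiguration": {
        "repositoryEndpointMetadata": {
            "hostUrl": "http://example.confluence.com",
            "type": "SAAS",
            "authType": "Token",
        }
    },
    "repositoryConfigurations": {},
    "additionalProperties": {},
    "syncMode": "FULL_CRAWL",
    "type": "CONFLUENCEV2",
}

problems = check_confluence_template(draft)
for problem in problems:
    print(problem)
```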

## Dropbox template schema
<a name="ds-dropbox-schema"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object. You provide the Dropbox app key, app secret, and access token as part of your secret that stores your authentication credentials. Also specify the type of data source as `DROPBOX`, the type of access token you want to use (temporary or permanent), and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [Dropbox JSON schema](#dropbox-json).

The following table describes the parameters of the Dropbox JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | The endpoint information for the data source. This data source does not specify an endpoint in repositoryEndpointMetadata. Rather, the connection information is included in an AWS Secrets Manager secret whose ARN you provide in secretArn. | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. | 
| file, paper, papert, shortcut (each containing fieldMappings) | A list of objects that map the attributes or field names of your Dropbox files, Dropbox Paper, and shortcuts to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). | 
| syncMode | Specify how Amazon Kendra should update your index when your data source content changes. You can choose between FULL_CRAWL, FORCED_FULL_CRAWL, and CHANGE_LOG. | 
| enableIdentityCrawler | true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If the identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and the identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/dg/API_PutPrincipalMapping.html) API to upload user and group access information. | 
| secretArn | The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Dropbox. The secret must contain a JSON structure with the following keys: <pre>{<br />    "appKey": "Dropbox app key",<br />    "appSecret": "Dropbox app secret",<br />    "accesstoken": "temporary access token or refresh access token"<br />}</pre> | 
| additionalProperties | Additional configuration options for your content in your data source. | 
| isCrawlAcl | true to crawl the access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources). | 
| inclusionFileNamePatterns, inclusionFileTypePatterns | A list of regular expression patterns to include certain file names and types in your Dropbox data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index. | 
| exclusionFileNamePatterns, exclusionFileTypePatterns | A list of regular expression patterns to exclude certain file names and types in your Dropbox data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index. | 
| crawlFile, crawlPaper, crawlPapert, crawlShortcut | true to crawl files in your Dropbox, Dropbox Paper documents, Dropbox Paper templates, and web page shortcuts stored in your Dropbox. | 
| type | The type of data source. Specify DROPBOX as your data source type. | 
| tokenType | Specify your access token type: permanent or temporary access token. We recommend that you create a refresh access token in Dropbox that never expires, rather than relying on a one-time access token that expires after 4 hours. You create an app and a refresh access token in the Dropbox developer console and provide the access token in your secret. | 
| version | The version of this template that is currently supported. | 
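
The precedence rule for inclusion and exclusion patterns described in the table can be sketched in Python as follows. This is only an illustration of the rule as stated. The function name `is_indexed`, the use of `re.search` (a match anywhere in the file name), and the treatment of an empty inclusion list as "include everything" are assumptions of this sketch, not documented Amazon Kendra behavior.

```python
import re


def is_indexed(file_name, inclusion_patterns, exclusion_patterns):
    """Decide whether a file would be indexed under the stated precedence rule."""
    # An exclusion match always wins, even if an inclusion pattern also matches.
    if any(re.search(pattern, file_name) for pattern in exclusion_patterns):
        return False
    # With inclusion patterns present, only matching files are kept.
    if inclusion_patterns:
        return any(re.search(pattern, file_name) for pattern in inclusion_patterns)
    # Assumption: with no inclusion patterns, nothing is filtered out here.
    return True


# Illustrative patterns only.
inclusion = [r"\.pdf$", r"\.docx$"]
exclusion = [r"^draft-"]

for name in ["report.pdf", "draft-report.pdf", "notes.txt"]:
    print(name, is_indexed(name, inclusion, exclusion))
```

In this sketch, `draft-report.pdf` matches both an inclusion and an exclusion pattern, so the exclusion pattern takes precedence and the file is not indexed.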

### Dropbox JSON schema
<a name="dropbox-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
          }
        }
      },
      "required": [
        "repositoryEndpointMetadata"
      ]
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "file": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": {
                "anyOf": [
                  {
                    "type": "object",
                    "properties": {
                      "indexFieldName": {
                        "type": "string"
                      },
                      "indexFieldType": {
                        "type": "string",
                        "enum": [
                          "STRING",
                          "STRING_LIST",
                          "LONG",
                          "DATE"
                        ]
                      },
                      "dataSourceFieldName": {
                        "type": "string"
                      },
                      "dateFieldFormat": {
                        "type": "string",
                        "pattern": "dd-MM-yyyy HH:mm:ss"
                      }
                    },
                    "required": [
                      "indexFieldName",
                      "indexFieldType",
                      "dataSourceFieldName"
                    ]
                  }
                ]
              }
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "paper": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": {
                "anyOf": [
                  {
                    "type": "object",
                    "properties": {
                      "indexFieldName": {
                        "type": "string"
                      },
                      "indexFieldType": {
                        "type": "string",
                        "enum": [
                          "STRING",
                          "STRING_LIST",
                          "LONG",
                          "DATE"
                        ]
                      },
                      "dataSourceFieldName": {
                        "type": "string"
                      },
                      "dateFieldFormat": {
                        "type": "string",
                        "pattern": "dd-MM-yyyy HH:mm:ss"
                      }
                    },
                    "required": [
                      "indexFieldName",
                      "indexFieldType",
                      "dataSourceFieldName"
                    ]
                  }
                ]
              }
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "papert": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": {
                "anyOf": [
                  {
                    "type": "object",
                    "properties": {
                      "indexFieldName": {
                        "type": "string"
                      },
                      "indexFieldType": {
                        "type": "string",
                        "enum": [
                          "STRING",
                          "STRING_LIST",
                          "LONG",
                          "DATE"
                        ]
                      },
                      "dataSourceFieldName": {
                        "type": "string"
                      },
                      "dateFieldFormat": {
                        "type": "string",
                        "pattern": "dd-MM-yyyy HH:mm:ss"
                      }
                    },
                    "required": [
                      "indexFieldName",
                      "indexFieldType",
                      "dataSourceFieldName"
                    ]
                  }
                ]
              }
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "shortcut": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": {
                "anyOf": [
                  {
                    "type": "object",
                    "properties": {
                      "indexFieldName": {
                        "type": "string"
                      },
                      "indexFieldType": {
                        "type": "string",
                        "enum": [
                          "STRING",
                          "STRING_LIST",
                          "LONG",
                          "DATE"
                        ]
                      },
                      "dataSourceFieldName": {
                        "type": "string"
                      },
                      "dateFieldFormat": {
                        "type": "string",
                        "pattern": "dd-MM-yyyy HH:mm:ss"
                      }
                    },
                    "required": [
                      "indexFieldName",
                      "indexFieldType",
                      "dataSourceFieldName"
                    ]
                  }
                ]
              }
            }
          },
          "required": [
            "fieldMappings"
          ]
        }
      }
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FULL_CRAWL",
        "FORCED_FULL_CRAWL",
        "CHANGE_LOG"
      ]
    },
    "enableIdentityCrawler": {
      "type": "boolean"
    },
    "secretArn": {
      "type": "string"
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "isCrawlAcl": {
          "type": "boolean"
        },
        "inclusionFileNamePatterns": {
          "type": "array"
        },
        "exclusionFileNamePatterns": {
          "type": "array"
        },
        "inclusionFileTypePatterns": {
          "type": "array"
        },
        "exclusionFileTypePatterns": {
          "type": "array"
        },
        "crawlFile": {
          "type": "boolean"
        },
        "crawlPaper": {
          "type": "boolean"
        },
        "crawlPapert": {
          "type": "boolean"
        },
        "crawlShortcut": {
          "type": "boolean"
        }
      }
    },
    "type": {
      "type": "string",
      "pattern": "DROPBOX"
    },
    "tokenType": {
      "type": "string",
      "enum": [
        "PERMANENT",
        "TEMPORARY"
      ]
    },
    "version": {
      "type": "string",
      "anyOf": [
        {
          "pattern": "1.0.0"
        }
      ]
    }
  },
  "additionalProperties": false,
  "required": [
    "connectionConfiguration",
    "repositoryConfigurations",
    "additionalProperties",
    "syncMode",
    "enableIdentityCrawler",
    "secretArn",
    "type",
    "tokenType"
  ]
}
```
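
The schema above sets `additionalProperties` to `false` at the top level, so a template may contain only the listed top-level keys, and eight of them are required. The following Python sketch builds a minimal Dropbox template and checks both conditions. The secret ARN and the field names in the mapping are placeholders, and the key sets are copied by hand from the schema, so treat the sketch as illustrative rather than authoritative.

```python
import json

# Top-level keys declared under "properties" in the Dropbox schema above.
ALLOWED_KEYS = {
    "connectionConfiguration", "repositoryConfigurations", "syncMode",
    "enableIdentityCrawler", "secretArn", "additionalProperties",
    "type", "tokenType", "version",
}
# Keys listed under the top-level "required" array.
REQUIRED_KEYS = ALLOWED_KEYS - {"version"}

# Placeholder values throughout; replace them with your own.
dropbox_template = {
    "connectionConfiguration": {"repositoryEndpointMetadata": {}},
    "repositoryConfigurations": {
        "file": {
            "fieldMappings": [
                {
                    # Hypothetical field names for illustration.
                    "indexFieldName": "dropbox_modified_at",
                    "indexFieldType": "DATE",
                    "dataSourceFieldName": "modifiedDate",
                    "dateFieldFormat": "dd-MM-yyyy HH:mm:ss",
                }
            ]
        }
    },
    "additionalProperties": {
        "isCrawlAcl": True,
        "crawlFile": True,
        "inclusionFileTypePatterns": [r"\.pdf$"],  # illustrative pattern only
    },
    "syncMode": "CHANGE_LOG",
    "enableIdentityCrawler": True,
    "secretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    "type": "DROPBOX",
    "tokenType": "PERMANENT",
    "version": "1.0.0",
}

# Keys that the schema would reject, and required keys that are absent.
unexpected = set(dropbox_template) - ALLOWED_KEYS
missing = REQUIRED_KEYS - set(dropbox_template)

print("unexpected keys:", sorted(unexpected))
print("missing keys:", sorted(missing))
print(json.dumps(dropbox_template, indent=2))
```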

## Drupal template schema
<a name="ds-drupal-schema"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) object. You provide the Drupal host URL and the authentication type as part of the connection configuration or repository endpoint details. Also specify the type of data source as `DRUPAL`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [Drupal JSON schema](#drupal-json).
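
As a rough illustration of how these pieces fit together, the following Python sketch assembles the arguments for a `CreateDataSource` call that uses a Drupal template. The helper name `build_drupal_request` is hypothetical, the host URL, ARNs, index ID, and data source name are placeholders, and the empty `fieldMappings` lists are an assumption made to keep the example short. The sketch only builds the request. You would pass the result as keyword arguments to the `create_data_source` method of a boto3 Kendra client.

```python
import json


def build_drupal_request(index_id, role_arn, host_url, secret_arn):
    """Assemble CreateDataSource arguments for a Drupal template (illustrative)."""
    template = {
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {"hostUrl": host_url}
        },
        "repositoryConfigurations": {
            # Assumption: empty mappings are used here only to keep the sketch short.
            "content": {"fieldMappings": []},
            "comment": {"fieldMappings": []},
            "attachment": {"fieldMappings": []},
        },
        "additionalProperties": {"isCrawlArticle": True},
        "type": "DRUPAL",
        "authType": "BASIC-AUTH",
        "syncMode": "FULL_CRAWL",
        "enableIdentityCrawler": False,
        "secretArn": secret_arn,
    }
    return {
        "Name": "my-drupal-data-source",  # placeholder name
        "IndexId": index_id,
        "RoleArn": role_arn,
        "Type": "TEMPLATE",  # required when the configuration is a template
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }


request = build_drupal_request(
    index_id="index-id-placeholder",
    role_arn="arn:aws:iam::111122223333:role/placeholder-role",
    host_url="https://example.com/mysite",
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:placeholder",
)
print(json.dumps(request, indent=2))
```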

The following table describes the parameters of the Drupal JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | The endpoint information for the data source. | 
| hostUrl | The host URL of your Drupal website. For example, `https://<hostname>/<drupalsitename>`. | 
| repositoryConfigurations | Configuration information for the content of the data source. | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | A list of objects that map the attributes or field names of your Drupal files to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). The Drupal data source field names must exist in your Drupal custom metadata. | 
| additionalProperties | Additional configuration options for your content in your data source. | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html) | A list of regular expression patterns to include certain files in your Drupal data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index. | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html) | A list of regular expression patterns to exclude certain files in your Drupal data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index. | 
| contentDefinitions[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | Specify the content types to crawl and whether to crawl comments and attachments for your selected content types. | 
| type | The type of data source. Specify DRUPAL as your data source type. | 
| authType | The type of authentication that you use, whether BASIC-AUTH or OAUTH2. | 
| syncMode |  Specify how Amazon Kendra should update your index when your data source content changes. You can choose between: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | 
| enableIdentityCrawler | true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/dg/API_PutPrincipalMapping.html) API to upload user and group access information. | 
| secretArn | The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Drupal. The secret must contain a JSON structure with the following keys: **If using basic authentication:**<pre>{<br />    "username": "user name",<br />    "password": "password"<br />}</pre> **If using OAuth 2.0 authentication:**<pre>{<br />    "username": "user name",<br />    "password": "password",<br />    "clientId": "client id",<br />    "clientSecret": "client secret"<br />}</pre>  | 
| version | The version of this template that is currently supported. | 
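
The secret layouts described for `secretArn` can also be generated programmatically. The following is a minimal sketch: the function name `build_drupal_secret` and all credential values are hypothetical, and the resulting string is what you would store as the secret value in AWS Secrets Manager.

```python
import json


def build_drupal_secret(auth_type, username, password,
                        client_id=None, client_secret=None):
    """Return the secret JSON string for the given Drupal authType."""
    secret = {"username": username, "password": password}
    if auth_type == "OAUTH2":
        # OAuth 2.0 needs the client credentials in addition to the user login.
        if not client_id or not client_secret:
            raise ValueError("OAUTH2 requires clientId and clientSecret")
        secret["clientId"] = client_id
        secret["clientSecret"] = client_secret
    elif auth_type != "BASIC-AUTH":
        raise ValueError(f"Unsupported authType: {auth_type}")
    return json.dumps(secret)


basic_secret = build_drupal_secret("BASIC-AUTH", "user name", "password")
oauth_secret = build_drupal_secret(
    "OAUTH2", "user name", "password",
    client_id="client id", client_secret="client secret",
)
print(basic_secret)
print(oauth_secret)
```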

### Drupal JSON schema
<a name="drupal-json"></a>

```
{
	"$schema": "http://json-schema.org/draft-04/schema#",
	"type": "object",
	"properties": {
		"connectionConfiguration": {
			"type": "object",
			"properties": {
				"repositoryEndpointMetadata": {
					"type": "object",
					"properties": {
						"hostUrl": {
							"type": "string",
							"pattern": "https:.*"
						}
					},
					"required": [
						"hostUrl"
					]
				}
			},
			"required": [
				"repositoryEndpointMetadata"
			]
		},
		"repositoryConfigurations": {
			"type": "object",
			"properties": {
				"content": {
					"type": "object",
					"properties": {
						"fieldMappings": {
							"type": "array",
							"items": [
								{
									"type": "object",
									"properties": {
										"indexFieldName": {
											"type": "string"
										},
										"indexFieldType": {
											"type": "string",
											"enum": [
												"STRING",
												"DATE"
											]
										},
										"dataSourceFieldName": {
											"type": "string"
										},
										"dateFieldFormat": {
											"type": "string",
											"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
										}
									},
									"required": [
										"indexFieldName",
										"indexFieldType",
										"dataSourceFieldName"
									]
								}
							]
						}
					},
					"required": [
						"fieldMappings"
					]
				},
				"comment": {
					"type": "object",
					"properties": {
						"fieldMappings": {
							"type": "array",
							"items": [
								{
									"type": "object",
									"properties": {
										"indexFieldName": {
											"type": "string"
										},
										"indexFieldType": {
											"type": "string",
											"enum": [
												"STRING",
												"DATE"
											]
										},
										"dataSourceFieldName": {
											"type": "string"
										},
										"dateFieldFormat": {
											"type": "string",
											"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
										}
									},
									"required": [
										"indexFieldName",
										"indexFieldType",
										"dataSourceFieldName"
									]
								}
							]
						}
					},
					"required": [
						"fieldMappings"
					]
				},
				"attachment": {
					"type": "object",
					"properties": {
						"fieldMappings": {
							"type": "array",
							"items": [
								{
									"type": "object",
									"properties": {
										"indexFieldName": {
											"type": "string"
										},
										"indexFieldType": {
											"type": "string",
											"enum": [
												"STRING",
												"DATE"
											]
										},
										"dataSourceFieldName": {
											"type": "string"
										},
										"dateFieldFormat": {
											"type": "string",
											"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
										}
									},
									"required": [
										"indexFieldName",
										"indexFieldType",
										"dataSourceFieldName"
									]
								}
							]
						}
					},
					"required": [
						"fieldMappings"
					]
				}
			}
		},
		"additionalProperties": {
			"type": "object",
			"properties": {
				"isCrawlArticle": {
					"type": "boolean"
				},
				"isCrawlBasicPage": {
					"type": "boolean"
				},
				"isCrawlBasicBlock": {
					"type": "boolean"
				},
				"crawlCustomContentTypesList": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"crawlCustomBlockTypesList": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"filePath": {
					"anyOf": [
						{
							"type": "string",
							"pattern": "s3:.*"
						},
						{
							"type": "string",
							"pattern": ""
						}
					]
				},
				"inclusionFileNamePatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"exclusionFileNamePatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"articleTitleInclusionPatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"articleTitleExclusionPatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"pageTitleInclusionPatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"pageTitleExclusionPatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"customContentTitleInclusionPatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"customContentTitleExclusionPatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"basicBlockTitleInclusionPatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"basicBlockTitleExclusionPatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"customBlockTitleInclusionPatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"customBlockTitleExclusionPatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"contentDefinitions": {
					"type": "array",
					"items": {
						"properties": {
							"contentType": {
								"type": "string"
							},
							"fieldDefinition": {
								"type": "array",
								"items": [
									{
										"type": "object",
										"properties": {
											"machineName": {
												"type": "string"
											},
											"type": {
												"type": "string"
											}
										},
										"required": [
											"machineName",
											"type"
										]
									}
								]
							},
							"isCrawlComments": {
								"type": "boolean"
							},
							"isCrawlFiles": {
								"type": "boolean"
							}
						}
					},
					"required": [
						"contentType",
						"fieldDefinition",
						"isCrawlComments",
						"isCrawlFiles"
					]
				}
			},
			"required": []
		},
		"type": {
			"type": "string",
			"pattern": "DRUPAL"
		},
		"authType": {
			"type": "string",
			"enum": [
				"BASIC-AUTH",
				"OAUTH2"
			]
		},
		"syncMode": {
			"type": "string",
			"enum": [
				"FORCED_FULL_CRAWL",
				"FULL_CRAWL",
				"CHANGE_LOG"
			]
		},
		"enableIdentityCrawler": {
			"type": "boolean"
		},
		"secretArn": {
			"type": "string",
			"minLength": 20,
			"maxLength": 2048
		},
		"version": {
			"type": "string",
			"anyOf": [
				{
					"pattern": "1.0.0"
				}
			]
		}
	},
	"required": [
		"connectionConfiguration",
		"repositoryConfigurations",
		"syncMode",
		"additionalProperties",
		"secretArn",
		"type"
	]
}
```
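
Before you call the API, it can help to check a template locally. The following sketch checks only a handful of the constraints from the schema above (the top-level `required` list, the `hostUrl` pattern, the `authType` and `syncMode` enumerations, and the `secretArn` length limits) using the Python standard library. It is not a full JSON Schema validator, and the function name `check_drupal_template` and the sample values are hypothetical.

```python
import re

REQUIRED_KEYS = [
    "connectionConfiguration",
    "repositoryConfigurations",
    "syncMode",
    "additionalProperties",
    "secretArn",
    "type",
]
AUTH_TYPES = {"BASIC-AUTH", "OAUTH2"}
SYNC_MODES = {"FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG"}


def check_drupal_template(template):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = [f"missing required key: {key}"
                for key in REQUIRED_KEYS if key not in template]

    endpoint = (template.get("connectionConfiguration", {})
                .get("repositoryEndpointMetadata", {}))
    host_url = endpoint.get("hostUrl", "")
    # JSON Schema patterns are unanchored, so re.search mirrors their behavior.
    if not re.search(r"https:.*", host_url):
        problems.append("hostUrl must match the pattern https:.*")

    if "authType" in template and template["authType"] not in AUTH_TYPES:
        problems.append("authType must be BASIC-AUTH or OAUTH2")
    if "syncMode" in template and template["syncMode"] not in SYNC_MODES:
        problems.append("syncMode is not one of the allowed values")

    secret_arn = template.get("secretArn", "")
    if "secretArn" in template and not 20 <= len(secret_arn) <= 2048:
        problems.append("secretArn length must be between 20 and 2048")
    return problems


# Sample template with two deliberate mistakes: an http URL and a bad syncMode.
candidate = {
    "connectionConfiguration": {
        "repositoryEndpointMetadata": {"hostUrl": "http://example.com/site"}
    },
    "repositoryConfigurations": {},
    "additionalProperties": {},
    "type": "DRUPAL",
    "authType": "BASIC-AUTH",
    "syncMode": "INCREMENTAL",
    "secretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:placeholder",
}
for problem in check_drupal_template(candidate):
    print(problem)
```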

## GitHub template schema
<a name="ds-github-schema"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) object. You provide the GitHub host URL, the organization name, and whether you use GitHub cloud or GitHub on-premises as part of the connection configuration or repository endpoint details. Also specify the type of data source as `GITHUB`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [GitHub JSON schema](#github-json).
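
The connection details described above can be sketched in Python as follows. This builds only the connection-related part of a template. Other settings, such as field mappings and crawl options, are omitted. The function name `build_github_connection`, the organization name, and the secret ARN are hypothetical placeholders.

```python
import json

# Host URL for GitHub SaaS/Enterprise Cloud, as given in this guide.
GITHUB_CLOUD_HOST_URL = "https://api.github.com"


def build_github_connection(organization, secret_arn, on_prem_host_url=None):
    """Return the connection-related part of a GitHub template (illustrative)."""
    endpoint = {
        # ON_PREMISE when a GitHub Enterprise Server URL is supplied, else SAAS.
        "type": "ON_PREMISE" if on_prem_host_url else "SAAS",
        "hostUrl": on_prem_host_url or GITHUB_CLOUD_HOST_URL,
        "organizationName": organization,
    }
    return {
        "connectionConfiguration": {"repositoryEndpointMetadata": endpoint},
        "type": "GITHUB",
        "secretArn": secret_arn,
    }


cloud = build_github_connection(
    "example-org",
    "arn:aws:secretsmanager:us-east-1:111122223333:secret:placeholder",
)
on_prem = build_github_connection(
    "example-org",
    "arn:aws:secretsmanager:us-east-1:111122223333:secret:placeholder",
    on_prem_host_url="https://on-prem-host-url/api/v3/",
)
print(json.dumps(cloud, indent=2))
print(json.dumps(on_prem, indent=2))
```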

The following table describes the parameters of the GitHub JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | The endpoint information for the data source. | 
| type | Specify the type as either SAAS or ON_PREMISE. | 
| hostUrl | The GitHub host URL. For example, if you use GitHub SaaS/Enterprise Cloud: https://api.github.com. Or, if you use GitHub on-premises/Enterprise Server: https://on-prem-host-url/api/v3/. | 
| organizationName | The name of your GitHub organization. You can find your organization name when you log in to GitHub desktop and go to **Your organizations** in the dropdown under your profile picture. | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | A list of objects that map the attributes or field names of your GitHub content to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). | 
| additionalProperties | Additional configuration options for your content in your data source. | 
| isCrawlAcl | true to crawl the access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access and search. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources). | 
| fieldForUserId | Specify the type of user ID that you want to use for ACL crawling. Specify either email if you want to use the user email for the user ID, or username if you want to use the user name for the user ID. If you don't specify an option then email is used by default. | 
| repositoryFilter | A list of names of the specific repositories and branch names you want to index. | 
| crawlRepository | true to crawl repositories. | 
| crawlRepositoryDocuments | true to crawl repository documents. | 
| crawlIssue | true to crawl issues. | 
| crawlIssueComment | true to crawl issue comments. | 
| crawlIssueCommentAttachment | true to crawl issue comment attachments. | 
| crawlPullRequest | true to crawl pull requests. | 
| crawlPullRequestComment | true to crawl pull request comments. | 
| crawlPullRequestCommentAttachment | true to crawl pull request comment attachments. | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | A list of regular expression patterns to include certain content in your GitHub data source. Content that matches the patterns is included in the index. Content that doesn't match the patterns is excluded from the index. If any content matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the content isn't included in the index. | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | A list of regular expression patterns to exclude certain content in your GitHub data source. Content that matches the patterns is excluded from the index. Content that doesn't match the patterns is included in the index. If any content matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the content isn't included in the index. | 
| type | The type of data source. Specify GITHUB as your data source type. | 
| enableIdentityCrawler | true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/dg/API_PutPrincipalMapping.html) API to upload user and group access information. | 
| syncMode |  Specify how Amazon Kendra should update your index when your data source content changes. You can choose between: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | 
| secretArn |  The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your GitHub. The secret must contain a JSON structure with the following keys: <pre>{<br />    "personalToken": "token"<br />}</pre>  | 
| version | The version of this template that's currently supported. | 
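
The inclusion and exclusion rules described in the table can be summarized in a few lines of Python. This is a sketch of the documented precedence only, not Amazon Kendra's implementation. The function name `is_indexed`, the sample patterns and paths, the use of `re.search` for matching, and the behavior when no inclusion patterns are configured are all assumptions.

```python
import re


def is_indexed(name, inclusion_patterns, exclusion_patterns):
    """Apply inclusion and exclusion patterns, with exclusion taking precedence."""
    # An exclusion match always wins, even if an inclusion pattern also matches.
    if any(re.search(p, name) for p in exclusion_patterns):
        return False
    # Assumption: with no inclusion patterns configured, nothing is filtered out.
    if not inclusion_patterns:
        return True
    return any(re.search(p, name) for p in inclusion_patterns)


include = [r"\.md$", r"^docs/"]
exclude = [r"^docs/drafts/"]

for path in ["README.md", "docs/drafts/plan.md", "src/main.py"]:
    print(path, is_indexed(path, include, exclude))
```

In this sketch, `docs/drafts/plan.md` matches both an inclusion and an exclusion pattern, so the exclusion pattern decides the outcome.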

### GitHub JSON schema
<a name="github-json"></a>

The following is the GitHub JSON schema:

```
{
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "properties": {
        "connectionConfiguration": {
            "type": "object",
            "properties": {
                "repositoryEndpointMetadata": {
                    "type": "object",
                    "properties": {
                        "type": {
                            "type": "string"
                        },
                        "hostUrl": {
                            "type": "string",
                            "pattern": "https://.*"
                        },
                        "organizationName": {
                            "type": "string"
                        }
                    },
                    "required": [
                        "type",
                        "hostUrl",
                        "organizationName"
                    ]
                }
            },
            "required": [
                "repositoryEndpointMetadata"
            ]
        },
        "repositoryConfigurations": {
            "type": "object",
            "properties": {
                "ghRepository": {
                    "type": "object",
                    "properties": {
                        "fieldMappings": {
                            "type": "array",
                            "items": [
                                {
                                    "type": "object",
                                    "properties": {
                                        "indexFieldName": {
                                            "type": "string"
                                        },
                                        "indexFieldType": {
                                            "type": "string",
                                            "enum": [
                                                "STRING",
                                                "STRING_LIST",
                                                "DATE"
                                            ]
                                        },
                                        "dataSourceFieldName": {
                                            "type": "string"
                                        },
                                        "dateFieldFormat": {
                                            "type": "string",
                                            "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                                        }
                                    },
                                    "required": [
                                        "indexFieldName",
                                        "indexFieldType",
                                        "dataSourceFieldName"
                                    ]
                                }
                            ]
                        }
                    },
                    "required": [
                        "fieldMappings"
                    ]
                },
                "ghCommit": {
                    "type": "object",
                    "properties": {
                        "fieldMappings": {
                            "type": "array",
                            "items": [
                                {
                                    "type": "object",
                                    "properties": {
                                        "indexFieldName": {
                                            "type": "string"
                                        },
                                        "indexFieldType": {
                                            "type": "string",
                                            "enum": [
                                                "STRING",
                                                "STRING_LIST",
                                                "DATE"
                                            ]
                                        },
                                        "dataSourceFieldName": {
                                            "type": "string"
                                        },
                                        "dateFieldFormat": {
                                            "type": "string",
                                            "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                                        }
                                    },
                                    "required": [
                                        "indexFieldName",
                                        "indexFieldType",
                                        "dataSourceFieldName"
                                    ]
                                }
                            ]
                        }
                    },
                    "required": [
                        "fieldMappings"
                    ]
                },
                "ghIssueDocument": {
                    "type": "object",
                    "properties": {
                        "fieldMappings": {
                            "type": "array",
                            "items": [
                                {
                                    "type": "object",
                                    "properties": {
                                        "indexFieldName": {
                                            "type": "string"
                                        },
                                        "indexFieldType": {
                                            "type": "string",
                                            "enum": [
                                                "STRING",
                                                "STRING_LIST",
                                                "DATE"
                                            ]
                                        },
                                        "dataSourceFieldName": {
                                            "type": "string"
                                        },
                                        "dateFieldFormat": {
                                            "type": "string",
                                            "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                                        }
                                    },
                                    "required": [
                                        "indexFieldName",
                                        "indexFieldType",
                                        "dataSourceFieldName"
                                    ]
                                }
                            ]
                        }
                    },
                    "required": [
                        "fieldMappings"
                    ]
                },
                "ghIssueComment": {
                    "type": "object",
                    "properties": {
                        "fieldMappings": {
                            "type": "array",
                            "items": [
                                {
                                    "type": "object",
                                    "properties": {
                                        "indexFieldName": {
                                            "type": "string"
                                        },
                                        "indexFieldType": {
                                            "type": "string",
                                            "enum": [
                                                "STRING",
                                                "STRING_LIST",
                                                "DATE"
                                            ]
                                        },
                                        "dataSourceFieldName": {
                                            "type": "string"
                                        },
                                        "dateFieldFormat": {
                                            "type": "string",
                                            "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                                        }
                                    },
                                    "required": [
                                        "indexFieldName",
                                        "indexFieldType",
                                        "dataSourceFieldName"
                                    ]
                                }
                            ]
                        }
                    },
                    "required": [
                        "fieldMappings"
                    ]
                },
                "ghIssueAttachment": {
                    "type": "object",
                    "properties": {
                        "fieldMappings": {
                            "type": "array",
                            "items": [
                                {
                                    "type": "object",
                                    "properties": {
                                        "indexFieldName": {
                                            "type": "string"
                                        },
                                        "indexFieldType": {
                                            "type": "string",
                                            "enum": [
                                                "STRING",
                                                "STRING_LIST",
                                                "DATE"
                                            ]
                                        },
                                        "dataSourceFieldName": {
                                            "type": "string"
                                        },
                                        "dateFieldFormat": {
                                            "type": "string",
                                            "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                                        }
                                    },
                                    "required": [
                                        "indexFieldName",
                                        "indexFieldType",
                                        "dataSourceFieldName"
                                    ]
                                }
                            ]
                        }
                    },
                    "required": [
                        "fieldMappings"
                    ]
                },
                "ghPRDocument": {
                    "type": "object",
                    "properties": {
                        "fieldMappings": {
                            "type": "array",
                            "items": [
                                {
                                    "type": "object",
                                    "properties": {
                                        "indexFieldName": {
                                            "type": "string"
                                        },
                                        "indexFieldType": {
                                            "type": "string",
                                            "enum": [
                                                "STRING",
                                                "STRING_LIST",
                                                "DATE"
                                            ]
                                        },
                                        "dataSourceFieldName": {
                                            "type": "string"
                                        },
                                        "dateFieldFormat": {
                                            "type": "string",
                                            "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                                        }
                                    },
                                    "required": [
                                        "indexFieldName",
                                        "indexFieldType",
                                        "dataSourceFieldName"
                                    ]
                                }
                            ]
                        }
                    },
                    "required": [
                        "fieldMappings"
                    ]
                },
                "ghPRComment": {
                    "type": "object",
                    "properties": {
                        "fieldMappings": {
                            "type": "array",
                            "items": [
                                {
                                    "type": "object",
                                    "properties": {
                                        "indexFieldName": {
                                            "type": "string"
                                        },
                                        "indexFieldType": {
                                            "type": "string",
                                            "enum": [
                                                "STRING",
                                                "STRING_LIST",
                                                "DATE"
                                            ]
                                        },
                                        "dataSourceFieldName": {
                                            "type": "string"
                                        },
                                        "dateFieldFormat": {
                                            "type": "string",
                                            "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                                        }
                                    },
                                    "required": [
                                        "indexFieldName",
                                        "indexFieldType",
                                        "dataSourceFieldName"
                                    ]
                                }
                            ]
                        }
                    },
                    "required": [
                        "fieldMappings"
                    ]
                },
                "ghPRAttachment": {
                    "type": "object",
                    "properties": {
                        "fieldMappings": {
                            "type": "array",
                            "items": [
                                {
                                    "type": "object",
                                    "properties": {
                                        "indexFieldName": {
                                            "type": "string"
                                        },
                                        "indexFieldType": {
                                            "type": "string",
                                            "enum": [
                                                "STRING",
                                                "STRING_LIST",
                                                "DATE"
                                            ]
                                        },
                                        "dataSourceFieldName": {
                                            "type": "string"
                                        },
                                        "dateFieldFormat": {
                                            "type": "string",
                                            "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                                        }
                                    },
                                    "required": [
                                        "indexFieldName",
                                        "indexFieldType",
                                        "dataSourceFieldName"
                                    ]
                                }
                            ]
                        }
                    },
                    "required": [
                        "fieldMappings"
                    ]
                }
            }
        },
        "additionalProperties": {
            "type": "object",
            "properties": {
                "isCrawlAcl": {
                    "type": "boolean"
                },
                "fieldForUserId": {
                    "type": "string"
                },
                "crawlRepository": {
                    "type": "boolean"
                },
                "crawlRepositoryDocuments": {
                    "type": "boolean"
                },
                "crawlIssue": {
                    "type": "boolean"
                },
                "crawlIssueComment": {
                    "type": "boolean"
                },
                "crawlIssueCommentAttachment": {
                    "type": "boolean"
                },
                "crawlPullRequest": {
                    "type": "boolean"
                },
                "crawlPullRequestComment": {
                    "type": "boolean"
                },
                "crawlPullRequestCommentAttachment": {
                    "type": "boolean"
                },
                "repositoryFilter": {
                    "type": "array",
                    "items": [
                        {
                            "type": "object",
                            "properties": {
                                "repositoryName": {
                                    "type": "string"
                                },
                                "branchNameList": {
                                    "type": "array",
                                    "items": {
                                        "type": "string"
                                    }
                                }
                            }
                        }
                    ]
                },
                "inclusionFolderNamePatterns": {
                    "type": "array",
                    "items": {
                        "type": "string"
                    }
                },
                "inclusionFileTypePatterns": {
                    "type": "array",
                    "items": {
                        "type": "string"
                    }
                },
                "inclusionFileNamePatterns": {
                    "type": "array",
                    "items": {
                        "type": "string"
                    }
                },
                "exclusionFolderNamePatterns": {
                    "type": "array",
                    "items": {
                        "type": "string"
                    }
                },
                "exclusionFileTypePatterns": {
                    "type": "array",
                    "items": {
                        "type": "string"
                    }
                },
                "exclusionFileNamePatterns": {
                    "type": "array",
                    "items": {
                        "type": "string"
                    }
                }
            },
            "required": []
        },
        "type": {
            "type": "string",
            "pattern": "GITHUB"
        },
        "syncMode": {
            "type": "string",
            "enum": [
                "FULL_CRAWL",
                "FORCED_FULL_CRAWL",
                "CHANGE_LOG"
            ]
        },
        "enableIdentityCrawler": {
            "type": "boolean"
        },
        "secretArn": {
            "type": "string",
            "minLength": 20,
            "maxLength": 2048
        }
    },
    "version": {
        "type": "string",
        "anyOf": [
            {
                "pattern": "1.0.0"
            }
        ]
    },
    "required": [
        "connectionConfiguration",
        "repositoryConfigurations",
        "syncMode",
        "additionalProperties",
        "enableIdentityCrawler"
    ]
}
```

## Gmail template schema
<a name="ds-gmail-schema"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object. Specify the type of data source as `GMAIL`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [Gmail JSON schema](#gmail-json).
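
As a rough illustration of the call described above, the following Python sketch assembles the arguments for `CreateDataSource`. It assumes the AWS SDK for Python (boto3) request shape, in which the template is nested under `Configuration`, `TemplateConfiguration`, and `Template`. The helper name `build_create_data_source_request`, the index ID, the role ARN, the data source name, and the secret ARN are all placeholders introduced for this example. The actual service call is left as a comment so that the snippet runs without AWS credentials.

```python
import json


def build_create_data_source_request(index_id, role_arn, name, template):
    """Return keyword arguments for a Kendra CreateDataSource call.

    `template` is the JSON document (as a dict) that follows the
    data source template schema.
    """
    return {
        "IndexId": index_id,
        "Name": name,
        "Type": "TEMPLATE",  # required when you supply a template schema
        "RoleArn": role_arn,
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }


# Placeholder values, assumed for illustration only.
gmail_template = {
    "connectionConfiguration": {},
    "repositoryConfigurations": {},
    "additionalProperties": {
        "isCrawlAttachment": True,
        "shouldCrawlDraftMessages": False,
    },
    "syncMode": "FULL_CRAWL",
    "secretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-gmail",
    "type": "GMAIL",
}

request = build_create_data_source_request(
    index_id="example-index-id",
    role_arn="arn:aws:iam::111122223333:role/example-kendra-role",
    name="example-gmail-data-source",
    template=gmail_template,
)
print(json.dumps(request, indent=2))

# With boto3 installed and credentials configured, the call would look like:
#   import boto3
#   boto3.client("kendra").create_data_source(**request)
```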

The following table describes the parameters of the Gmail JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | The endpoint information for the data source. This data source does not specify an endpoint in repositoryEndpointMetadata. Instead, the connection information is included in an AWS Secrets Manager secret whose ARN you provide in secretArn. | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN. | 
| message, attachments |  A list of objects that map the attributes or field names of your Gmail messages and attachments to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). | 
| additionalProperties | Additional configuration options for your content in your data source. | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  |  A list of regular expression patterns to include or exclude messages with specific subject names in your Gmail data source. Files that match the patterns are included in the index. If a file matches both an inclusion and an exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index. | 
| beforeDateFilter | Specify messages and attachments to be included before a certain date.  | 
| afterDateFilter | Specify messages and attachments to be included after a certain date. | 
| isCrawlAttachment | A Boolean value to choose whether you want to crawl attachments. Messages are automatically crawled. | 
| type | The type of data source. Specify GMAIL as your data source type. | 
| shouldCrawlDraftMessages | A Boolean value to choose whether you want to crawl draft messages. | 
| syncMode | Specify how Amazon Kendra should update your index when your data source content changes. You can choose between: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  Because there is no API to update permanently deleted Gmail messages, a sync of new, modified, or deleted content won't remove messages that were permanently deleted from Gmail from your Amazon Kendra index, and won't sync changes in Gmail email labels. To sync Gmail label changes and permanently deleted email messages to your Amazon Kendra index, you must run full crawls periodically. | 
| secretArn | The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the key-value pairs required to connect to Gmail. The secret must contain a JSON structure with the following keys: <pre>{<br />    "adminAccountEmailId": "service account email",<br />    "clientEmailId": "user account email",<br />    "privateKey": "private key"<br />}</pre> | 
| version | The version of the template that is currently supported. | 

### Gmail JSON schema
<a name="gmail-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
      }
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "message": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": ["STRING", "STRING_LIST", "DATE"]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          }
        },
        "attachments": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": ["STRING"]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          }
        }
      },
      "required": []
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "inclusionLabelNamePatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionLabelNamePatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "inclusionAttachmentTypePatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionAttachmentTypePatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "inclusionAttachmentNamePatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionAttachmentNamePatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "inclusionSubjectFilter": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionSubjectFilter": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "isSubjectAnd": {
          "type": "boolean"
        },
        "inclusionFromFilter": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionFromFilter": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "inclusionToFilter": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionToFilter": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "inclusionCcFilter": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionCcFilter": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "inclusionBccFilter": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionBccFilter": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "beforeDateFilter": {
          "anyOf": [
            {
              "type": "string",
              "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$"
            },
            {
              "type": "string",
              "pattern": ""
            }
          ]
        },
        "afterDateFilter": {
          "anyOf": [
            {
              "type": "string",
              "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$"
            },
            {
              "type": "string",
              "pattern": ""
            }
          ]
        },
        "isCrawlAttachment": {
          "type": "boolean"
        },
        "shouldCrawlDraftMessages": {
          "type": "boolean"
        }
      },
      "required": [
        "isCrawlAttachment",
        "shouldCrawlDraftMessages"
      ]
    },
    "type" : {
      "type" : "string",
      "pattern": "GMAIL"
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FORCED_FULL_CRAWL",
        "FULL_CRAWL"
      ]
    },
    "secretArn": {
      "type": "string"
    },
    "version": {
      "type": "string",
      "anyOf": [
        {
          "pattern": "1.0.0"
        }
      ]
    }
  },
  "required": [
    "connectionConfiguration",
    "repositoryConfigurations",
    "additionalProperties",
    "syncMode",
    "secretArn",
    "type"
  ]
}
```
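
To make the schema above more concrete, here is a minimal Python sketch of a Gmail template together with a few basic checks taken from the schema: the required top-level keys, the `syncMode` values, the two required Boolean flags, and the timestamp pattern for the date filters. The function name `check_gmail_template`, the field mapping names, the subject pattern, and the ARN are hypothetical. The sketch is also slightly stricter than the schema, whose second `anyOf` branch uses an empty pattern; here an empty string is interpreted as "no date filter".

```python
import re

# Pattern copied from the beforeDateFilter/afterDateFilter entries in the schema.
DATE_FILTER_PATTERN = re.compile(
    r"^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$"
)
REQUIRED_TOP_LEVEL = [
    "connectionConfiguration", "repositoryConfigurations",
    "additionalProperties", "syncMode", "secretArn", "type",
]
ALLOWED_SYNC_MODES = {"FORCED_FULL_CRAWL", "FULL_CRAWL"}


def check_gmail_template(template):
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = [f"missing key: {k}" for k in REQUIRED_TOP_LEVEL if k not in template]
    if template.get("type") != "GMAIL":
        problems.append("type must be GMAIL")
    if template.get("syncMode") not in ALLOWED_SYNC_MODES:
        problems.append("unsupported syncMode")
    extra = template.get("additionalProperties", {})
    for flag in ("isCrawlAttachment", "shouldCrawlDraftMessages"):
        if not isinstance(extra.get(flag), bool):
            problems.append(f"{flag} must be a Boolean")
    for key in ("beforeDateFilter", "afterDateFilter"):
        value = extra.get(key, "")
        # Assumption: an empty string means that no date filter is applied.
        if value and not DATE_FILTER_PATTERN.match(value):
            problems.append(f"{key} must look like 2023-01-01T00:00:00Z")
    return problems


gmail_template = {
    "connectionConfiguration": {},
    "repositoryConfigurations": {
        "message": {
            "fieldMappings": [
                {
                    "indexFieldName": "gmail_subject",  # hypothetical index field
                    "indexFieldType": "STRING",
                    "dataSourceFieldName": "subject",  # hypothetical source field
                }
            ]
        }
    },
    "additionalProperties": {
        "inclusionSubjectFilter": ["^Invoice.*"],  # hypothetical pattern
        "afterDateFilter": "2023-01-01T00:00:00Z",
        "isCrawlAttachment": True,
        "shouldCrawlDraftMessages": False,
    },
    "syncMode": "FULL_CRAWL",
    "secretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-gmail",
    "type": "GMAIL",
    "version": "1.0.0",
}

print(check_gmail_template(gmail_template))
```

These checks are only a convenience before you call the API. A JSON Schema validator run against the full schema would be the more thorough option.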

## Google Drive template schema
<a name="ds-googledrive-schema"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object. Specify the type of data source as `GOOGLEDRIVEV2`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [Google Drive JSON schema](#googledrive-json).

The following table describes the parameters of the Google Drive JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the data source. | 
| repositoryEndpointMetadata | The endpoint information for the data source. This data source does not specify an endpoint. Instead, you choose an authentication type, serviceAccount or OAuth2, and the connection information is included in an AWS Secrets Manager secret whose ARN you provide in secretArn. | 
| authType | Choose between serviceAccount and OAuth2 based on your use case. | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. | 
| file, comment |  A list of objects that map the attributes or field names of your Google Drive files and comments to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). | 
| additionalProperties | Additional configuration options for your content in your data source. | 
| maxFileSizeInMegaBytes | Specify the maximum size, in MB, of files that Amazon Kendra should crawl. | 
| isCrawlComment | true to crawl comments in your Google Drive data source. | 
| isCrawlMyDriveAndSharedWithMe | true to crawl MyDrive and Shared With Me Drives in your Google Drive data source. | 
| isCrawlSharedDrives | true to crawl Shared Drives in your Google Drive data source. | 
| isCrawlAcl | true to crawl the access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents that users and groups can access and search. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources). | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | A list of regular expression patterns to exclude certain files in your Google Drive data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index. | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | A list of regular expression patterns to include certain files in your Google Drive data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index. | 
| type | The type of data source. Specify GOOGLEDRIVEV2 as your data source type. | 
| enableIdentityCrawler | true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/dg/API_PutPrincipalMapping.html) API to upload user and group access information. | 
| syncMode |  Specify how Amazon Kendra should update your index when your data source content changes. You can choose between: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | 
| secretArn | The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Google Drive. The secret must contain a JSON structure with the following keys: If using Google Service Account authentication: <pre>{<br />    "clientEmail": "user account email",<br />    "adminAccountEmail": "service account email",<br />    "privateKey": "private key"<br />}</pre> If using OAuth 2.0 authentication: <pre>{<br />    "clientID": "OAuth client ID",<br />    "clientSecret": "client secret",<br />    "refreshToken": "refresh token"<br />}</pre> | 
| version | The version of this template that is currently supported. | 

### Google Drive JSON schema
<a name="googledrive-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
            "authType": {
              "type": "string",
              "enum": [
                "serviceAccount",
                "OAuth2"
              ]
            }
          },
          "required": [
            "authType"
          ]
        }
      },
      "required": [
        "repositoryEndpointMetadata"
      ]
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "file": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": [
                        "STRING",
                        "DATE",
                        "STRING_LIST",
                        "LONG"
                      ]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "comment": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": [
                        "STRING",
                        "DATE",
                        "STRING_LIST"
                      ]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        }
      }
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "maxFileSizeInMegaBytes": {
          "type": "string"
        },
        "isCrawlComment": {
          "type": "boolean"
        },
        "isCrawlMyDriveAndSharedWithMe": {
          "type": "boolean"
        },
        "isCrawlSharedDrives": {
          "type": "boolean"
        },
        "isCrawlAcl": {
          "type": "boolean"
        },
        "excludeUserAccounts": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "excludeSharedDrives": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "excludeMimeTypes": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "includeUserAccounts": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "includeSharedDrives": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "includeMimeTypes": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "includeTargetAudienceGroup": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "inclusionFileTypePatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "inclusionFileNamePatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionFileTypePatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionFileNamePatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "inclusionFilePathFilter": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionFilePathFilter": {
          "type": "array",
          "items": {
            "type": "string"
          }
        }
      }
    },
    "type": {
      "type": "string",
      "pattern": "GOOGLEDRIVEV2"
    },
    "enableIdentityCrawler": {
      "type": "boolean"
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FORCED_FULL_CRAWL",
        "FULL_CRAWL",
        "CHANGE_LOG"
      ]
    },
    "secretArn": {
      "type": "string",
      "minLength": 20,
      "maxLength": 2048
    }
  },
  "version": {
    "type": "string",
    "anyOf": [
      {
        "pattern": "1.0.0"
      }
    ]
  },
  "required": [
    "connectionConfiguration",
    "repositoryConfigurations",
    "syncMode",
    "additionalProperties",
    "secretArn",
    "type"
  ]
}
```
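
The following Python sketch shows one way to assemble a Google Drive template that matches the schema above and to check that a secret carries the keys listed for the chosen `authType`. The helper names `build_google_drive_template` and `missing_secret_keys`, the field mapping names, the file name pattern, and the ARN are assumptions made for illustration. The length check on `secretArn` and the string-typed `maxFileSizeInMegaBytes` follow the schema.

```python
# Secret keys listed in the parameter table for each authentication type.
SECRET_KEYS_BY_AUTH_TYPE = {
    "serviceAccount": {"clientEmail", "adminAccountEmail", "privateKey"},
    "OAuth2": {"clientID", "clientSecret", "refreshToken"},
}
ALLOWED_SYNC_MODES = {"FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG"}


def missing_secret_keys(auth_type, secret):
    """Return the sorted secret keys that are absent for the given authType."""
    return sorted(SECRET_KEYS_BY_AUTH_TYPE[auth_type] - set(secret))


def build_google_drive_template(auth_type, secret_arn, sync_mode="FULL_CRAWL"):
    """Build a template dict; raises ValueError on values the schema rejects."""
    if auth_type not in SECRET_KEYS_BY_AUTH_TYPE:
        raise ValueError("authType must be serviceAccount or OAuth2")
    if sync_mode not in ALLOWED_SYNC_MODES:
        raise ValueError("unsupported syncMode")
    if not 20 <= len(secret_arn) <= 2048:
        raise ValueError("secretArn must be 20 to 2048 characters long")
    return {
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {"authType": auth_type}
        },
        "repositoryConfigurations": {
            "file": {
                "fieldMappings": [
                    {
                        "indexFieldName": "gd_created_at",  # hypothetical
                        "indexFieldType": "DATE",
                        "dataSourceFieldName": "createdTime",  # hypothetical
                        "dateFieldFormat": "yyyy-MM-dd'T'HH:mm:ss'Z'",
                    }
                ]
            }
        },
        "additionalProperties": {
            "maxFileSizeInMegaBytes": "50",  # typed as a string in the schema
            "isCrawlComment": False,
            "isCrawlMyDriveAndSharedWithMe": True,
            "isCrawlSharedDrives": True,
            "isCrawlAcl": True,
            "exclusionFileNamePatterns": [".*draft.*"],  # hypothetical pattern
        },
        "type": "GOOGLEDRIVEV2",
        "enableIdentityCrawler": True,
        "syncMode": sync_mode,
        "secretArn": secret_arn,
        "version": "1.0.0",
    }


template = build_google_drive_template(
    "OAuth2",
    "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-gdrive",
)
print(template["connectionConfiguration"])
print(missing_secret_keys("OAuth2", {"clientID": "example-client-id"}))
```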

## IBM DB2 template schema
<a name="ds-ibm-db2-schema"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object. Specify the type of data source as `JDBC`, the database type as `db2`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [IBM DB2 JSON schema](#ibm-db2-json).

The following table describes the parameters of the IBM DB2 JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | Required configuration information for connecting your data source. [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html) | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN. | 
|  document  |  A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). | 
| additionalProperties | Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source. | 
| primaryKey  | Provide the primary key for the database table. This identifies a table within your database. | 
| titleColumn | Provide the name of the document title column within your database table. | 
| bodyColumn | Provide the name of the document body column within your database table. | 
| sqlQuery | Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query. | 
| timestampColumn | Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content. | 
| timestampFormat | Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content. | 
| timezone | Enter the name of the column which contains time zones for the content to be crawled. | 
| changeDetectingColumns | Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns. | 
| allowedUsersColumn | Enter the name of the column which contains User IDs to be allowed access to content. | 
| allowedGroupsColumn | Enter the name of the column which contains groups to be allowed access to content. | 
| sourceURIColumn | Enter the name of the column which contains Source URLs to be indexed. | 
| isSslEnabled | A Boolean value to choose whether SSL is enabled for the connection to your database. | 
| type | The type of data source. Specify JDBC as your data source type. | 
| syncMode |  Specify how Amazon Kendra should update your index when your data source content changes. You can choose between: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | 
| secretArn | The Amazon Resource Name (ARN) of a Secrets Manager secret that contains user name and password required to connect to your database. The secret must contain a JSON structure with the following keys: <pre>{<br />    "user name": "database user name",<br />    "password": "password"<br />}</pre> | 
| version | The version of the template that is currently supported. | 
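
As an illustration of the parameters in the table, the following Python sketch builds an IBM DB2 template and checks the SQL query before it is used. The host name, database instance, table, column names, index field name, and ARN are hypothetical, and `check_sql_query` is a helper introduced for this example. The semicolon check mirrors the `"not": {"pattern": ";+"}` rule on `sqlQuery` in the IBM DB2 JSON schema, and the size check follows the 32KB limit stated in the table.

```python
import re


def check_sql_query(sql_query):
    """Return the query unchanged, or raise ValueError if it breaks a stated rule."""
    # The schema rejects any query that contains one or more semicolons.
    if re.search(r";+", sql_query):
        raise ValueError("sqlQuery must not contain a semicolon")
    # The parameter table states that queries must be less than 32KB.
    if len(sql_query.encode("utf-8")) >= 32 * 1024:
        raise ValueError("sqlQuery must be smaller than 32KB")
    return sql_query


db2_template = {
    "connectionConfiguration": {
        "repositoryEndpointMetadata": {
            "dbType": "db2",
            "dbHost": "db2.example.com",  # hypothetical host
            "dbPort": "50000",  # typed as a string in the schema
            "dbInstance": "SAMPLEDB",  # hypothetical instance
        }
    },
    "repositoryConfigurations": {
        "document": {
            "fieldMappings": [
                {
                    "indexFieldName": "article_category",  # hypothetical
                    "indexFieldType": "STRING",
                    "dataSourceFieldName": "CATEGORY",  # hypothetical column
                }
            ]
        }
    },
    "additionalProperties": {
        "primaryKey": "ID",
        "titleColumn": "TITLE",
        "bodyColumn": "BODY",
        "sqlQuery": check_sql_query(
            "SELECT ID, TITLE, BODY, CATEGORY, UPDATED_AT FROM ARTICLES"
        ),
        "timestampColumn": "UPDATED_AT",
        "changeDetectingColumns": ["TITLE", "BODY"],
        "isSslEnabled": True,
    },
    "type": "JDBC",
    "syncMode": "FULL_CRAWL",
    "secretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-db2",
    "version": "1.0.0",
}

print(db2_template["additionalProperties"]["sqlQuery"])
```

Because the schema pattern is not anchored, a semicolon anywhere in the query fails validation, including a single trailing one, so statements are written without a terminator.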

### IBM DB2 JSON schema
<a name="ibm-db2-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
            "dbType": {
              "type": "string",
              "enum": [
                "mysql",
                "db2",
                "postgresql",
                "oracle",
                "sqlserver"
              ]
            },
            "dbHost": {
              "type": "string"
            },
            "dbPort": {
              "type": "string"
            },
            "dbInstance": {
              "type": "string"
            }
          },
          "required": [
            "dbType",
            "dbHost",
            "dbPort",
            "dbInstance"
          ]
        }
      },
      "required": [
        "repositoryEndpointMetadata"
      ]
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "document": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string"
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        }
      },
      "required": [
      ]
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "primaryKey": {
          "type": "string"
        },
        "titleColumn": {
          "type": "string"
        },
        "bodyColumn": {
          "type": "string"
        },
        "sqlQuery": {
          "type": "string",
          "not": {
            "pattern": ";+"
          }
        },
        "timestampColumn": {
          "type": "string"
        },
        "timestampFormat": {
          "type": "string"
        },
        "timezone": {
          "type": "string"
        },
        "changeDetectingColumns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "allowedUsersColumn": {
          "type": "string"
        },
        "allowedGroupsColumn": {
          "type": "string"
        },
        "sourceURIColumn": {
          "type": "string"
        },
        "isSslEnabled": {
          "type": "boolean"
        }
      },
      "required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"]
    },
    "type" : {
      "type" : "string",
      "pattern": "JDBC"
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FORCED_FULL_CRAWL",
        "FULL_CRAWL",
        "CHANGE_LOG"
      ]
    },
    "secretArn": {
      "type": "string"
    }
  },
  "version": {
    "type": "string",
    "anyOf": [
      {
        "pattern": "1.0.0"
      }
    ]
  },
  "required": [
      "connectionConfiguration",
      "repositoryConfigurations",
      "syncMode",
      "additionalProperties",
      "secretArn",
      "type"
  ]
}
```
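
As a rough illustration of the `additionalProperties` rules in the schema above, the following Python sketch checks that the four required keys are present and that `sqlQuery` contains no semicolon (the schema rejects any string that matches `;+`). The helper name `check_db2_additional_properties` and the sample table and column names are hypothetical, and the check is not a substitute for the validation that Amazon Kendra performs.

```python
import re

# Keys listed under "required" for additionalProperties in the schema above.
REQUIRED_KEYS = ("primaryKey", "titleColumn", "bodyColumn", "sqlQuery")


def check_db2_additional_properties(props):
    """Return a list of problems found; an empty list means no problems."""
    problems = [f"missing required key: {key}" for key in REQUIRED_KEYS if key not in props]
    query = props.get("sqlQuery", "")
    # The schema uses "not": {"pattern": ";+"}, so any semicolon is rejected.
    if re.search(r";+", query):
        problems.append("sqlQuery must not contain a semicolon")
    return problems


# Hypothetical table and column names, for illustration only.
sample = {
    "primaryKey": "doc_id",
    "titleColumn": "title",
    "bodyColumn": "body",
    "sqlQuery": "SELECT doc_id, title, body FROM documents",
}

print(check_db2_additional_properties(sample))
print(check_db2_additional_properties({**sample, "sqlQuery": "SELECT 1;"}))
```

The first call reports no problems, and the second reports the trailing semicolon.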

## Microsoft Exchange template schema
<a name="ds-msexchange-schema"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object. You provide the tenant ID as part of the connection configuration or repository endpoint details. Also specify the type of data source as `MSEXCHANGE`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [Microsoft Exchange JSON schema](#msexchange-json).
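
The following Python sketch shows one way the pieces described above might fit together: a minimal template that supplies every top-level key the Microsoft Exchange JSON schema lists as required, wrapped in the arguments for `CreateDataSource`. The tenant ID, secret ARN, index ID, role ARN, data source name, and field mapping names are all placeholders, and the final API call appears only as a comment because it needs the AWS SDK for Python (Boto3) and valid credentials.

```python
import json
import re

# Pattern copied from the tenantId property of the Microsoft Exchange JSON schema.
TENANT_ID_PATTERN = r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"

# Placeholder values: replace them with your own tenant ID and secret ARN.
tenant_id = "11111111-2222-3333-4444-555555555555"
secret_arn = "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-exchange"

template = {
    "connectionConfiguration": {
        "repositoryEndpointMetadata": {"tenantId": tenant_id}
    },
    "repositoryConfigurations": {
        "email": {
            "fieldMappings": [
                {
                    "indexFieldName": "msexchange_subject",  # hypothetical index field
                    "indexFieldType": "STRING",
                    "dataSourceFieldName": "subject",  # hypothetical source field
                }
            ]
        }
    },
    "additionalProperties": {"crawlCalendar": False},
    "syncMode": "FULL_CRAWL",
    "type": "MSEXCHANGE",
    "secretArn": secret_arn,
    "version": "1.0.0",
}

assert re.match(TENANT_ID_PATTERN, tenant_id), "tenantId does not match the schema pattern"

# Arguments for CreateDataSource; the name, index ID, and role ARN are placeholders.
create_args = {
    "Name": "example-exchange-data-source",
    "IndexId": "your-index-id",
    "RoleArn": "arn:aws:iam::111122223333:role/example-kendra-role",
    "Type": "TEMPLATE",
    "Configuration": {"TemplateConfiguration": {"Template": template}},
}

# With boto3 installed and credentials configured, the call would look like:
#   boto3.client("kendra").create_data_source(**create_args)
print(json.dumps(create_args, indent=2))
```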

The following table describes the parameters of the Microsoft Exchange JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | The endpoint information for the data source. | 
| tenantId | The Microsoft 365 tenant ID. You can find your tenant ID in the Properties of your Azure Active Directory Portal or in your OAuth application. | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | A list of objects that map the attributes or field names of your Microsoft Exchange data source to Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). | 
| additionalProperties | Additional configuration options for content in your data source. | 
| inclusionPatterns | A list of regular expression patterns to include certain files in your Microsoft Exchange data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index. | 
| exclusionPatterns | A list of regular expression patterns to exclude certain files in your Microsoft Exchange data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index. | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | A list of regular expression patterns to include certain users and user files in your Microsoft Exchange data source. Users that match the patterns are included in the index. Users that don't match the patterns are excluded from the index. If a user matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the user isn't included in the index. | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | A list of regular expression patterns to exclude certain users and user files in your Microsoft Exchange data source. Users that match the patterns are excluded from the index. Users that don't match the patterns are included in the index. If a user matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the user isn't included in the index. | 
| s3bucketName | The name of the S3 bucket that you want to use, if any. | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | true to crawl these types of content and access control information in your Microsoft Exchange data source. | 
| startCalendarDateTime | You can configure a specific start date-time for your calendar content. | 
| endCalendarDateTime | You can configure a specific end date-time for calendar content. | 
| subject | You can configure a specific subject line for your mail content. | 
| emailFrom | You can configure a specific email for your 'From' or sender mail content. | 
| emailTo | You can configure a specific email for your 'To' or recipient mail content. | 
| syncMode |  Specify how Amazon Kendra should update your index when your data source content changes. You can choose between: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | 
| type | The type of data source. Specify MSEXCHANGE as your data source type. | 
| secretArn | The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Microsoft Exchange. This includes your client ID and your client secret that is generated when you create an OAuth application in the Azure portal. | 
| version | The version of this template that is currently supported. | 
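
The precedence rule described for `inclusionPatterns` and `exclusionPatterns` can be sketched in Python as follows. The function name `is_indexed`, the sample patterns, and the file names are hypothetical. The sketch also makes two assumptions that the table doesn't state: patterns are applied as a search anywhere in the name, and an empty inclusion list places no restriction on the content.

```python
import re


def is_indexed(name, inclusion_patterns, exclusion_patterns):
    """Apply the documented rule: an exclusion match always wins."""
    if any(re.search(p, name) for p in exclusion_patterns):
        return False
    # Assumption: an empty inclusion list places no restriction on the content.
    if not inclusion_patterns:
        return True
    return any(re.search(p, name) for p in inclusion_patterns)


# Hypothetical patterns and file names.
inclusion = [r"\.pdf$", r"\.docx$"]
exclusion = [r"^draft"]

print(is_indexed("report.pdf", inclusion, exclusion))        # True
print(is_indexed("draft-report.pdf", inclusion, exclusion))  # False
print(is_indexed("notes.txt", inclusion, exclusion))         # False
```

The second call shows the precedence rule: `draft-report.pdf` matches an inclusion pattern, but the exclusion match keeps it out of the index.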

### Microsoft Exchange JSON schema
<a name="msexchange-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
            "tenantId": {
              "type": "string",
              "pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
              "minLength": 36,
              "maxLength": 36
            }
          },
          "required": ["tenantId"]
        }
      }
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "email": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": ["STRING", "STRING_LIST", "DATE"]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "attachment": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": ["STRING", "DATE","LONG"]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "calendar": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": ["STRING", "STRING_LIST", "DATE"]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "contacts": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": ["STRING", "STRING_LIST", "DATE"]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "notes": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": ["STRING", "DATE"]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        }
      },
      "required": [
        "email"
      ]
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "inclusionPatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionPatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "inclusionUsersList": {
          "type": "array",
          "items": {
            "type": "string",
            "format": "email"
          }
        },
        "exclusionUsersList": {
          "type": "array",
          "items": {
            "type": "string",
            "format": "email"
          }
        },
        "s3bucketName": {
          "type": "string"
        },
        "inclusionUsersFileName": {
          "type": "string"
        },
        "exclusionUsersFileName": {
          "type": "string"
        },
        "inclusionDomainUsers": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionDomainUsers": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "crawlCalendar": {
          "type": "boolean"
        },
        "crawlNotes": {
          "type": "boolean"
        },
        "crawlContacts": {
          "type": "boolean"
        },
        "crawlFolderAcl": {
          "type": "boolean"
        },
        "startCalendarDateTime": {
          "anyOf": [
            {
              "type": "string",
              "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$"
            },
            {
              "type": "string",
              "pattern": ""
            }
          ]
        },
        "endCalendarDateTime": {
          "anyOf": [
            {
              "type": "string",
              "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$"
            },
            {
              "type": "string",
              "pattern": ""
            }
          ]
        },
        "subject": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "emailFrom": {
          "type": "array",
          "items": {
            "type": "string",
            "format": "email"
          }
        },
        "emailTo": {
          "type": "array",
          "items": {
            "type": "string",
            "format": "email"
          }
        }
      },
      "required": [
      ]
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FORCED_FULL_CRAWL",
        "FULL_CRAWL",
        "CHANGE_LOG"
      ]
    },
    "type" : {
      "type" : "string",
      "pattern": "MSEXCHANGE"
    },
    "secretArn": {
      "type": "string"
    }
  },
  "version": {
    "type": "string",
    "anyOf": [
      {
        "pattern": "1.0.0"
      }
    ]
  },
  "required": [
    "connectionConfiguration",
    "repositoryConfigurations",
    "syncMode",
    "additionalProperties",
    "secretArn",
    "type"
  ]
}
```
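
In the schema above, the first `anyOf` branch for `startCalendarDateTime` and `endCalendarDateTime` expects a UTC timestamp such as `2023-01-31T08:30:00Z`, and the second branch has an empty pattern, so an empty string is also accepted. The following Python sketch shows one way to produce a value in the timestamp form. The helper name `to_calendar_timestamp` and the sample dates are hypothetical.

```python
import re
from datetime import datetime, timezone

# Pattern copied from startCalendarDateTime and endCalendarDateTime in the schema above.
CALENDAR_PATTERN = r"^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$"


def to_calendar_timestamp(moment):
    """Format an aware datetime as a UTC timestamp such as 2023-01-31T08:30:00Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


start = to_calendar_timestamp(datetime(2023, 1, 1, 0, 0, 0, tzinfo=timezone.utc))
end = to_calendar_timestamp(datetime(2023, 12, 31, 23, 59, 59, tzinfo=timezone.utc))

assert re.match(CALENDAR_PATTERN, start) and re.match(CALENDAR_PATTERN, end)
print({"startCalendarDateTime": start, "endCalendarDateTime": end})
```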

## Microsoft OneDrive template schema
<a name="ds-onedrive-schema"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object. You provide the tenant ID as part of the connection configuration or repository endpoint details. Also specify the type of data source as `ONEDRIVEV2`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [Microsoft OneDrive JSON schema](#onedrive-json).

The following table describes the parameters of the Microsoft OneDrive JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | The endpoint information for the data source. | 
| tenantId | The Microsoft 365 tenant ID. You can find your tenant ID in the Properties of your Azure Active Directory Portal or in your OAuth application. | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. | 
| file | A list of objects that map the attributes or field names of your Microsoft OneDrive files to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). | 
| additionalProperties | Additional configuration options for your content in your data source. | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | You can choose to index specific files, OneNote sections, OneNote pages, and filter by user name. | 
| isUserNameOnS3 | true to provide a list of user names in a file stored in an Amazon S3 bucket. | 
| type | The type of data source. Specify ONEDRIVEV2 as your data source type. | 
| enableIdentityCrawler | true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/dg/API_PutPrincipalMapping.html) API to upload user and group access information. | 
| syncMode |  Specify how Amazon Kendra should update your index when your data source content changes. You can choose between: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | 
| secretArn | The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Microsoft OneDrive. The secret must contain a JSON structure with the following keys: <pre>{<br />    "clientId": "client ID",<br />    "clientSecret": "client secret"<br />}</pre> | 
| version | The version of this template that is currently supported. | 
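
As a minimal sketch of the secret structure described in the table, the following Python snippet builds the JSON string with the `clientId` and `clientSecret` keys. The credential values and the secret name are placeholders, and the Secrets Manager call appears only as a comment because it needs the AWS SDK for Python (Boto3) and valid credentials.

```python
import json

# Placeholder credentials: use the values from your own OAuth application.
secret_value = {
    "clientId": "example-client-id",
    "clientSecret": "example-client-secret",
}

secret_string = json.dumps(secret_value)

# With boto3 installed and credentials configured, the secret could be stored with:
#   boto3.client("secretsmanager").create_secret(
#       Name="example-onedrive-secret", SecretString=secret_string)
# The ARN in the response is the value to use for the secret ARN in the template.
print(secret_string)
```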

### Microsoft OneDrive JSON schema
<a name="onedrive-json"></a>

```
{
	"$schema": "http://json-schema.org/draft-04/schema#",
	"type": "object",
	"properties": {
		"connectionConfiguration": {
			"type": "object",
			"properties": {
				"repositoryEndpointMetadata": {
					"type": "object",
					"properties": {
						"tenantId": {
							"type": "string",
							"pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
							"minLength": 36,
							"maxLength": 36
						}
					},
					"required": [
						"tenantId"
					]
				}
			},
			"required": [
				"repositoryEndpointMetadata"
			]
		},
		"repositoryConfigurations": {
			"type": "object",
			"properties": {
				"file": {
					"type": "object",
					"properties": {
						"fieldMappings": {
							"type": "array",
							"items": [
								{
									"type": "object",
									"properties": {
										"indexFieldName": {
											"type": "string"
										},
										"indexFieldType": {
											"type": "string",
											"enum": [
												"STRING",
												"STRING_LIST",
												"DATE",
												"LONG"
											]
										},
										"dataSourceFieldName": {
											"type": "string"
										},
										"dateFieldFormat": {
											"type": "string",
											"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
										}
									},
									"required": [
										"indexFieldName",
										"indexFieldType",
										"dataSourceFieldName"
									]
								}
							]
						}
					},
					"required": [
						"fieldMappings"
					]
				}
			}
		},
		"additionalProperties": {
			"type": "object",
			"properties": {
				"userNameFilter": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"userFilterPath": {
					"type": "string"
				},
				"isUserNameOnS3": {
					"type": "boolean"
				},
				"inclusionFileTypePatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"exclusionFileTypePatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"inclusionFileNamePatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"exclusionFileNamePatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"inclusionFilePathPatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"exclusionFilePathPatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"inclusionOneNoteSectionNamePatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"exclusionOneNoteSectionNamePatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"inclusionOneNotePageNamePatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"exclusionOneNotePageNamePatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				}
			},
			"required": []
		},

		"enableIdentityCrawler": {
			"type": "boolean"
		},
		"type": {
			"type": "string",
			"pattern": "ONEDRIVEV2"
		},
		"syncMode": {
			"type": "string",
			"enum": [
				"FULL_CRAWL",
				"FORCED_FULL_CRAWL",
				"CHANGE_LOG"
			]
		},
		"secretArn": {
			"type": "string",
			"minLength": 20,
			"maxLength": 2048
		}
	},
	"version": {
		"type": "string",
		"anyOf": [
			{
				"pattern": "1.0.0"
			}
		]
	},
	"required": [
		"connectionConfiguration",
		"repositoryConfigurations",
		"syncMode",
		"additionalProperties",
		"secretArn",
		"type"
	]
}
```

## Microsoft SharePoint template schema
<a name="ds-schema-sharepoint"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object. You provide the SharePoint site URLs, the domain, and, if required, a tenant ID as part of the connection configuration or repository endpoint details. Also specify the type of data source as `SHAREPOINTV2`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [SharePoint JSON schema](#sharepoint-json).
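
The connection details described above might be assembled as in the following Python sketch. It builds the `connectionConfiguration` object and rejects values that the SharePoint JSON schema would not accept for `authType`, `version`, and `siteUrls`. The helper name `build_connection_configuration` and the sample site URL, domain, and tenant ID are hypothetical.

```python
import re

# Values copied from the SharePoint JSON schema.
AUTH_TYPES = {"OAuth2", "OAuth2Certificate", "OAuth2App", "Basic",
              "OAuth2_RefreshToken", "NTLM", "Kerberos"}
VERSIONS = {"Server", "Online"}


def build_connection_configuration(site_urls, domain, auth_type, version, tenant_id=None):
    """Build connectionConfiguration, raising ValueError on values the schema rejects."""
    if auth_type not in AUTH_TYPES:
        raise ValueError(f"unsupported authType: {auth_type}")
    if version not in VERSIONS:
        raise ValueError(f"unsupported version: {version}")
    for url in site_urls:
        # The schema pattern for each site URL is "https://.*".
        if not re.search(r"https://.*", url):
            raise ValueError(f"site URL must use https: {url}")
    metadata = {
        "siteUrls": list(site_urls),
        "domain": domain,
        "repositoryAdditionalProperties": {"authType": auth_type, "version": version},
    }
    if tenant_id is not None:
        metadata["tenantId"] = tenant_id
    return {"repositoryEndpointMetadata": metadata}


# Hypothetical site URL, domain, and tenant ID.
config = build_connection_configuration(
    ["https://example.sharepoint.com/sites/team"],
    "example",
    "OAuth2",
    "Online",
    tenant_id="11111111-2222-3333-4444-555555555555",
)
print(config)
```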

The following table describes the parameters of the Microsoft SharePoint JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | The endpoint information for the data source. | 
| tenantId | The tenant ID of your SharePoint account. | 
| domain | The domain of your SharePoint account. | 
| siteUrls | The host URLs of your SharePoint account. | 
| repositoryAdditionalProperties | Additional properties to connect with the repository/data source endpoint. | 
| s3bucketName | The name of the Amazon S3 bucket that stores your Azure AD self-signed X.509 certificate. | 
| s3certificateName | The name of the Azure AD self-signed X.509 certificate stored in your Amazon S3 bucket. | 
| authType | The type of authentication that you use, whether OAuth2, OAuth2Certificate, OAuth2App, Basic, OAuth2_RefreshToken, NTLM, or Kerberos. | 
| version | The SharePoint version that you use, whether Server or Online. | 
| onPremVersion | The SharePoint Server version that you use, whether 2013, 2016, 2019, or SubscriptionEdition. | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | A list of objects that map the attributes or field names of your SharePoint content to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). | 
| additionalProperties | Additional configuration options for your content in your data source. | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html) | A list of regular expression patterns to include/exclude certain content in your SharePoint data source. Content items that match the inclusion patterns are included in the index. Content items that don't match the inclusion patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index. | 
| [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html) | true to crawl these types of content. | 
| crawlAcl | true to crawl the access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access and search. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources). | 
| fieldForUserId | Specify either email if you want to use the user email for the user ID, or userPrincipalName if you want to use a user name for the user ID. If you don't specify an option then email is used by default. | 
| aclConfiguration | Specify either ACLWithLDAPEmailFmt, ACLWithManualEmailFmt, or ACLWithUsernameFmtM. | 
| emailDomain | The domain of the email. For example, "amazon.com". | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | true to crawl group mapping information. | 
| proxyHost | The host name of the web proxy that you use, without the http:// or https:// protocol. | 
| proxyPort | The port number used by the host URL transport protocol. Must be a numeric value between 0 and 65535. | 
| type | Specify SHAREPOINTV2 as your data source type. | 
| enableIdentityCrawler | true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/dg/API_PutPrincipalMapping.html) API to upload user and group access information. | 
| syncMode |  Specify how Amazon Kendra should update your index when your data source content changes. You can choose between: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | 
| secretArn | The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your SharePoint. For information on these key-value pairs, see [Connection instructions for SharePoint Online and SharePoint Server](https://docs.aws.amazon.com/kendra/latest/dg/data-source-v2-sharepoint.html#data-source-procedure-v2-sharepoint). | 
| version | The version of this template that is currently supported. | 

### SharePoint JSON schema
<a name="sharepoint-json"></a>

```
{
	"$schema": "http://json-schema.org/draft-04/schema#",
	"type": "object",
	"properties": {
		"connectionConfiguration": {
			"type": "object",
			"properties": {
				"repositoryEndpointMetadata": {
					"type": "object",
					"properties": {
						"tenantId": {
							"type": "string",
							"pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
							"minLength": 36,
							"maxLength": 36
						},
						"domain": {
							"type": "string"
						},
						"siteUrls": {
							"type": "array",
							"items": {
								"type": "string",
								"pattern": "https://.*"
							}
						},
						"repositoryAdditionalProperties": {
							"type": "object",
							"properties": {
								"s3bucketName": {
									"type": "string"
								},
								"s3certificateName": {
									"type": "string"
								},
								"authType": {
									"type": "string",
									"enum": [
										"OAuth2",
										"OAuth2Certificate",
										"OAuth2App",
										"Basic",
										"OAuth2_RefreshToken",
										"NTLM",
										"Kerberos"
									]
								},
								"version": {
									"type": "string",
									"enum": [
										"Server",
										"Online"
									]
								},
								"onPremVersion": {
									"type": "string",
									"enum": [
										"",
										"2013",
										"2016",
										"2019",
										"SubscriptionEdition"
									]
								}
							},
							"required": [
								"authType",
								"version"
							]
						}
					},
					"required": [
						"siteUrls",
						"domain",
						"repositoryAdditionalProperties"
					]
				}
			},
			"required": [
				"repositoryEndpointMetadata"
			]
		},
		"repositoryConfigurations": {
			"type": "object",
			"properties": {
				"event": {
					"type": "object",
					"properties": {
						"fieldMappings": {
							"type": "array",
							"items": [
								{
									"type": "object",
									"properties": {
										"indexFieldName": {
											"type": "string"
										},
										"indexFieldType": {
											"type": "string",
											"enum": [
												"STRING",
												"STRING_LIST",
												"DATE"
											]
										},
										"dataSourceFieldName": {
											"type": "string"
										},
										"dateFieldFormat": {
											"type": "string",
											"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
										}
									},
									"required": [
										"indexFieldName",
										"indexFieldType",
										"dataSourceFieldName"
									]
								}
							]
						}
					},
					"required": [
						"fieldMappings"
					]
				},
				"page": {
					"type": "object",
					"properties": {
						"fieldMappings": {
							"type": "array",
							"items": [
								{
									"type": "object",
									"properties": {
										"indexFieldName": {
											"type": "string"
										},
										"indexFieldType": {
											"type": "string",
											"enum": [
												"STRING",
												"DATE",
												"LONG"
											]
										},
										"dataSourceFieldName": {
											"type": "string"
										},
										"dateFieldFormat": {
											"type": "string",
											"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
										}
									},
									"required": [
										"indexFieldName",
										"indexFieldType",
										"dataSourceFieldName"
									]
								}
							]
						}
					},
					"required": [
						"fieldMappings"
					]
				},
				"file": {
					"type": "object",
					"properties": {
						"fieldMappings": {
							"type": "array",
							"items": [
								{
									"type": "object",
									"properties": {
										"indexFieldName": {
											"type": "string"
										},
										"indexFieldType": {
											"type": "string",
											"enum": [
												"STRING",
												"DATE",
												"LONG"
											]
										},
										"dataSourceFieldName": {
											"type": "string"
										},
										"dateFieldFormat": {
											"type": "string",
											"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
										}
									},
									"required": [
										"indexFieldName",
										"indexFieldType",
										"dataSourceFieldName"
									]
								}
							]
						}
					},
					"required": [
						"fieldMappings"
					]
				},
				"link": {
					"type": "object",
					"properties": {
						"fieldMappings": {
							"type": "array",
							"items": [
								{
									"type": "object",
									"properties": {
										"indexFieldName": {
											"type": "string"
										},
										"indexFieldType": {
											"type": "string",
											"enum": [
												"STRING",
												"STRING_LIST",
												"DATE"
											]
										},
										"dataSourceFieldName": {
											"type": "string"
										},
										"dateFieldFormat": {
											"type": "string",
											"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
										}
									},
									"required": [
										"indexFieldName",
										"indexFieldType",
										"dataSourceFieldName"
									]
								}
							]
						}
					},
					"required": [
						"fieldMappings"
					]
				},
				"attachment": {
					"type": "object",
					"properties": {
						"fieldMappings": {
							"type": "array",
							"items": [
								{
									"type": "object",
									"properties": {
										"indexFieldName": {
											"type": "string"
										},
										"indexFieldType": {
											"type": "string",
											"enum": [
												"STRING",
												"STRING_LIST",
												"DATE"
											]
										},
										"dataSourceFieldName": {
											"type": "string"
										},
										"dateFieldFormat": {
											"type": "string",
											"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
										}
									},
									"required": [
										"indexFieldName",
										"indexFieldType",
										"dataSourceFieldName"
									]
								}
							]
						}
					},
					"required": [
						"fieldMappings"
					]
				},
				"comment": {
					"type": "object",
					"properties": {
						"fieldMappings": {
							"type": "array",
							"items": [
								{
									"type": "object",
									"properties": {
										"indexFieldName": {
											"type": "string"
										},
										"indexFieldType": {
											"type": "string",
											"enum": [
												"STRING",
												"STRING_LIST",
												"DATE"
											]
										},
										"dataSourceFieldName": {
											"type": "string"
										},
										"dateFieldFormat": {
											"type": "string",
											"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
										}
									},
									"required": [
										"indexFieldName",
										"indexFieldType",
										"dataSourceFieldName"
									]
								}
							]
						}
					},
					"required": [
						"fieldMappings"
					]
				}
			}
		},
		"additionalProperties": {
			"type": "object",
			"properties": {
				"eventTitleFilterRegEx": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"pageTitleFilterRegEx": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"linkTitleFilterRegEx": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"inclusionFilePath": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"exclusionFilePath": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"inclusionFileTypePatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"exclusionFileTypePatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"inclusionFileNamePatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"exclusionFileNamePatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"inclusionOneNoteSectionNamePatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"exclusionOneNoteSectionNamePatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"inclusionOneNotePageNamePatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"exclusionOneNotePageNamePatterns": {
					"type": "array",
					"items": {
						"type": "string"
					}
				},
				"crawlFiles": {
					"type": "boolean"
				},
				"crawlPages": {
					"type": "boolean"
				},
				"crawlEvents": {
					"type": "boolean"
				},
				"crawlComments": {
					"type": "boolean"
				},
				"crawlLinks": {
					"type": "boolean"
				},
				"crawlAttachments": {
					"type": "boolean"
				},
				"crawlListData": {
					"type": "boolean"
				},
				"crawlAcl": {
					"type": "boolean"
				},
				"fieldForUserId": {
					"type": "string"
				},
				"aclConfiguration": {
					"type": "string",
					"enum": [
						"ACLWithLDAPEmailFmt",
						"ACLWithManualEmailFmt",
						"ACLWithUsernameFmt"
					]
				},
				"emailDomain": {
					"type": "string"
				},
				"isCrawlLocalGroupMapping": {
					"type": "boolean"
				},
				"isCrawlAdGroupMapping": {
					"type": "boolean"
				},
				"proxyHost": {
					"type": "string"
				},
				"proxyPort": {
					"type": "string"
				}
			},
			"required": [
			]
		},
		"type": {
			"type": "string",
			"pattern": "SHAREPOINTV2"
		},
		"enableIdentityCrawler": {
			"type": "boolean"
		},
		"syncMode": {
			"type": "string",
			"enum": [
				"FULL_CRAWL",
				"FORCED_FULL_CRAWL",
				"CHANGE_LOG"
			]
		},
		"secretArn": {
			"type": "string",
			"minLength": 20,
			"maxLength": 2048
		}
	},
	"version": {
		"type": "string",
		"anyOf": [
			{
				"pattern": "1.0.0"
			}
		]
	},
	"required": [
		"connectionConfiguration",
		"repositoryConfigurations",
		"enableIdentityCrawler",
		"syncMode",
		"additionalProperties",
		"secretArn",
		"type"
	]
}
```
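
The `dateFieldFormat` property in the field mappings above takes the Java-style date pattern `yyyy-MM-dd'T'HH:mm:ss'Z'`, which describes UTC timestamps such as `2023-04-01T12:30:00Z`. As a minimal sketch, the following Python builds one `fieldMappings` entry of type `DATE` and renders a timestamp in that shape. The data source field name `createdDate` and the helper function names are assumptions made for illustration; they are not part of the schema.

```python
from datetime import datetime, timezone

# Literal value the schema expects for dateFieldFormat (a Java-style date pattern).
DATE_FIELD_FORMAT = "yyyy-MM-dd'T'HH:mm:ss'Z'"


def make_date_mapping(index_field: str, source_field: str) -> dict:
    """Build one fieldMappings entry with the three required keys plus the date format."""
    return {
        "indexFieldName": index_field,
        "indexFieldType": "DATE",
        "dataSourceFieldName": source_field,
        "dateFieldFormat": DATE_FIELD_FORMAT,
    }


def to_schema_timestamp(moment: datetime) -> str:
    """Render a timezone-aware datetime in UTC in the shape DATE_FIELD_FORMAT describes."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# "createdDate" is a hypothetical data source field name.
mapping = make_date_mapping("_created_at", "createdDate")
stamp = to_schema_timestamp(datetime(2023, 4, 1, 12, 30, 0, tzinfo=timezone.utc))
print(mapping)
print(stamp)
```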

## Microsoft SQL Server template schema
<a name="ds-ms-sql-server-schema"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object. Specify the type of data source as `JDBC`, the database type as `sqlserver`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [Microsoft SQL Server JSON schema](#ms-sql-server-json).

The following table describes the parameters of the Microsoft SQL Server JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | Required configuration information for connecting your data source. The schema requires dbType (specify sqlserver), dbHost, dbPort, and dbInstance. | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN. | 
|  document  |  A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). | 
| additionalProperties | Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source. | 
| primaryKey  | Provide the primary key for the database table. This identifies a table within your database. | 
| titleColumn | Provide the name of the document title column within your database table. | 
| bodyColumn | Provide the name of the document body column within your database table. | 
| sqlQuery | Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query. | 
| timestampColumn | Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content. | 
| timestampFormat | Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content. | 
| timezone | Enter the name of the column which contains time zones for the content to be crawled. | 
| changeDetectingColumns | Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns. | 
| allowedUsersColumn | Enter the name of the column which contains User IDs to be allowed access to content. | 
| allowedGroupsColumn | Enter the name of the column which contains the groups to be allowed access to content. | 
| sourceURIColumn | Enter the name of the column which contains Source URLs to be indexed. | 
| isSslEnabled | true to turn on SSL for the connection to your database. The schema defines this as a Boolean value. | 
| type | The type of data source. Specify JDBC as your data source type. | 
| syncMode | Specify how Amazon Kendra should update your index when your data source content changes. You can choose between FORCED_FULL_CRAWL, FULL_CRAWL, and CHANGE_LOG. | 
| secretArn | The Amazon Resource Name (ARN) of a Secrets Manager secret that contains user name and password required to connect to your database. The secret must contain a JSON structure with the following keys: <pre>{<br />    "user name": "database user name",<br />    "password": "password"<br />}</pre> | 
| version | The version of the template that is currently supported. | 
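
The table above limits `sqlQuery` to less than 32KB, and the schema below additionally rejects any query that contains a semicolon (`"not": {"pattern": ";+"}`). The following is a minimal sketch of a local pre-check for both rules before you submit a template. It assumes that 32KB means 32 × 1024 bytes of UTF-8 text, which the source does not state, and the function name and the `articles` table are hypothetical.

```python
import re

# Assumption: "32KB" is interpreted as 32 * 1024 bytes of UTF-8 encoded text.
MAX_QUERY_BYTES = 32 * 1024


def check_sql_query(query: str) -> list:
    """Return a list of problems found; an empty list means the query passes both checks."""
    problems = []
    # JSON Schema patterns are unanchored, so a semicolon anywhere in the query is rejected.
    if re.search(r";+", query):
        problems.append("query contains a semicolon")
    if len(query.encode("utf-8")) >= MAX_QUERY_BYTES:
        problems.append("query is 32 KB or larger")
    return problems


ok = check_sql_query("SELECT id, title, body FROM articles")
bad = check_sql_query("SELECT id FROM articles; DROP TABLE articles")
print(ok)
print(bad)
```

Because the semicolon rule applies to the whole string, a trailing `;` at the end of an otherwise valid statement also fails the check.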

### Microsoft SQL Server JSON schema
<a name="ms-sql-server-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
            "dbType": {
              "type": "string",
              "enum": [
                "mysql",
                "db2",
                "postgresql",
                "oracle",
                "sqlserver"
              ]
            },
            "dbHost": {
              "type": "string"
            },
            "dbPort": {
              "type": "string"
            },
            "dbInstance": {
              "type": "string"
            }
          },
          "required": [
            "dbType",
            "dbHost",
            "dbPort",
            "dbInstance"
          ]
        }
      },
      "required": [
        "repositoryEndpointMetadata"
      ]
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "document": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string"
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        }
      },
      "required": [
      ]
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "primaryKey": {
          "type": "string"
        },
        "titleColumn": {
          "type": "string"
        },
        "bodyColumn": {
          "type": "string"
        },
        "sqlQuery": {
          "type": "string",
          "not": {
            "pattern": ";+"
          }
        },
        "timestampColumn": {
          "type": "string"
        },
        "timestampFormat": {
          "type": "string"
        },
        "timezone": {
          "type": "string"
        },
        "changeDetectingColumns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "allowedUsersColumn": {
          "type": "string"
        },
        "allowedGroupsColumn": {
          "type": "string"
        },
        "sourceURIColumn": {
          "type": "string"
        },
        "isSslEnabled": {
          "type": "boolean"
        }
      },
      "required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"]
    },
    "type": {
      "type": "string",
      "pattern": "JDBC"
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FORCED_FULL_CRAWL",
        "FULL_CRAWL",
        "CHANGE_LOG"
      ]
    },
    "secretArn": {
      "type": "string"
    }
  },
  "version": {
    "type": "string",
    "anyOf": [
      {
        "pattern": "1.0.0"
      }
    ]
  },
  "required": [
    "connectionConfiguration",
    "repositoryConfigurations",
    "syncMode",
    "additionalProperties",
    "secretArn",
    "type"
  ]
}
```
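
To make the schema above concrete, the following Python sketch assembles a template that supplies every required key and wraps it in the arguments for a `CreateDataSource` call with `Type` set to `TEMPLATE`. Every value in it is a placeholder assumption: the host name, database name, column names, SQL query, ARNs, and index ID are invented for illustration, and the `build_request` helper is hypothetical.

```python
import json

# All values below are placeholders; replace them with your own.
template = {
    "connectionConfiguration": {
        "repositoryEndpointMetadata": {
            "dbType": "sqlserver",
            "dbHost": "db.example.com",
            "dbPort": "1433",  # the schema types dbPort as a string
            "dbInstance": "mydatabase",
        }
    },
    "repositoryConfigurations": {
        "document": {
            "fieldMappings": [
                {
                    "indexFieldName": "_category",
                    "indexFieldType": "STRING",
                    "dataSourceFieldName": "category",  # hypothetical column
                }
            ]
        }
    },
    "additionalProperties": {
        "primaryKey": "id",
        "titleColumn": "title",
        "bodyColumn": "body",
        "sqlQuery": "SELECT id, title, body, category FROM articles",
    },
    "type": "JDBC",
    "syncMode": "FULL_CRAWL",
    "secretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    "version": "1.0.0",
}


def build_request(name: str, index_id: str, role_arn: str) -> dict:
    """Assemble keyword arguments for a CreateDataSource call that uses a template."""
    return {
        "Name": name,
        "IndexId": index_id,
        "RoleArn": role_arn,
        "Type": "TEMPLATE",
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }


request = build_request(
    "sql-server-docs",
    "index-id-placeholder",
    "arn:aws:iam::111122223333:role/example",
)
print(json.dumps(request, indent=2))
```

If you use the AWS SDK for Python (Boto3), one way to send this request is `boto3.client("kendra").create_data_source(**request)`. The sketch only builds and prints the request, so it runs without AWS credentials.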

## Microsoft Teams template schema
<a name="ds-msteams-schema"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object. You provide the tenant ID as a part of the connection configuration or repository endpoint details. Also specify the type of data source as `MSTEAMS`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [Microsoft Teams JSON schema](#msteams-json).

The following table describes the parameters of the Microsoft Teams JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | The endpoint information for the data source. | 
| tenantId | The Microsoft 365 tenant ID. You can find your tenant ID in the Properties of your Azure Active Directory Portal or in your OAuth application. | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. | 
| chatMessage, chatAttachment, channelPost, channelWiki, channelAttachment, meetingChat, meetingFile, meetingNote, calendarMeeting | A list of objects that map the attributes or field names of your Microsoft Teams content to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). | 
| additionalProperties | Additional configuration options for your content in your data source. | 
| paymentModel | Specifies what type of payment model to use with your Microsoft Teams data source. Model A payment models are restricted to licensing and payment models that require security compliance. Model B payment models are suitable for licensing and payment models that do not require security compliance. | 
| inclusionTeamNameFilter, inclusionChannelNameFilter, inclusionFileNamePatterns, inclusionFileTypePatterns, inclusionUserEmailFilter, inclusionOneNoteSectionNamePatterns, inclusionOneNotePageNamePatterns | A list of regular expression patterns to include certain content in your Microsoft Teams data source. Content that matches the patterns is included in the index. Content that doesn't match the patterns is excluded from the index. If content matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the content isn't included in the index. | 
| exclusionTeamNameFilter, exclusionChannelNameFilter, exclusionFileNamePatterns, exclusionFileTypePatterns, exclusionOneNoteSectionNamePatterns, exclusionOneNotePageNamePatterns | A list of regular expression patterns to exclude certain content in your Microsoft Teams data source. Content that matches the patterns is excluded from the index. Content that doesn't match the patterns is included in the index. If content matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the content isn't included in the index. | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | true to crawl these types of content in your Microsoft Teams data source. | 
| startCalendarDateTime | You can configure a specific start date-time for your calendar content. | 
| endCalendarDateTime | You can configure a specific end date-time for calendar content. | 
| type | The type of data source. Specify MSTEAMS as your data source type. | 
| enableIdentityCrawler | true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/dg/API_PutPrincipalMapping.html) API to upload user and group access information. | 
| syncMode |  Specify how Amazon Kendra should update your index when your data source content changes. You can choose between: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | 
| secretArn | The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to Microsoft Teams. This includes the client ID and client secret that are generated when you create an OAuth application in the Azure portal. | 
| version | The version of this template that is currently supported. | 
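
The schema below constrains `tenantId` to a 36-character GUID written with lowercase hexadecimal digits. As a minimal sketch, the following Python applies the same pattern locally before you build the template. The function name and the sample IDs are made up for illustration.

```python
import re

# Pattern copied from the tenantId property of the schema below.
TENANT_ID_PATTERN = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)


def is_valid_tenant_id(tenant_id: str) -> bool:
    """True when tenant_id is a 36-character GUID made of lowercase hexadecimal digits."""
    # The explicit length check mirrors minLength/maxLength in the schema.
    return len(tenant_id) == 36 and TENANT_ID_PATTERN.search(tenant_id) is not None


lower = is_valid_tenant_id("0a1b2c3d-4e5f-6a7b-8c9d-0e1f2a3b4c5d")
upper = is_valid_tenant_id("0A1B2C3D-4E5F-6A7B-8C9D-0E1F2A3B4C5D")
print(lower, upper)
```

The pattern as written does not match uppercase hexadecimal digits, so a tenant ID copied in uppercase fails this check unless you convert it to lowercase first.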

### Microsoft Teams JSON schema
<a name="msteams-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
            "tenantId": {
              "type": "string",
              "pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
              "minLength": 36,
              "maxLength": 36
            }
          },
          "required": [
            "tenantId"
          ]
        }
      },
      "required": [
        "repositoryEndpointMetadata"
      ]
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "chatMessage": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": [
                        "STRING",
                        "STRING_LIST",
                        "DATE"
                      ]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "chatAttachment": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": [
                        "STRING",
                        "DATE",
                        "LONG"
                      ]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "channelPost": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": [
                        "STRING",
                        "STRING_LIST",
                        "DATE"
                      ]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "channelWiki": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": [
                        "STRING",
                        "DATE",
                        "LONG"
                      ]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "channelAttachment": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": [
                        "STRING",
                        "DATE",
                        "LONG"
                      ]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "meetingChat": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": [
                        "STRING",
                        "STRING_LIST",
                        "DATE"
                      ]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "meetingFile": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": [
                        "STRING",
                        "DATE",
                        "LONG"
                      ]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "meetingNote": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": [
                        "STRING",
                        "DATE"
                      ]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "calendarMeeting": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": [
                        "STRING",
                        "DATE"
                      ]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        }
      }
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "paymentModel": {
          "type": "string",
          "enum": [
            "A",
            "B",
            "Evaluation Mode"
          ]
        },
        "inclusionTeamNameFilter": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionTeamNameFilter": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "inclusionChannelNameFilter": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionChannelNameFilter": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "inclusionFileNamePatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionFileNamePatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "inclusionFileTypePatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionFileTypePatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "inclusionUserEmailFilter": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "inclusionOneNoteSectionNamePatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionOneNoteSectionNamePatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "inclusionOneNotePageNamePatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionOneNotePageNamePatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "isCrawlChatMessage": {
          "type": "boolean"
        },
        "isCrawlChatAttachment": {
          "type": "boolean"
        },
        "isCrawlChannelPost": {
          "type": "boolean"
        },
        "isCrawlChannelAttachment": {
          "type": "boolean"
        },
        "isCrawlChannelWiki": {
          "type": "boolean"
        },
        "isCrawlCalendarMeeting": {
          "type": "boolean"
        },
        "isCrawlMeetingChat": {
          "type": "boolean"
        },
        "isCrawlMeetingFile": {
          "type": "boolean"
        },
        "isCrawlMeetingNote": {
          "type": "boolean"
        },
        "startCalendarDateTime": {
          "anyOf": [
            {
              "type": "string",
              "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$"
            },
            {
              "type": "string",
              "pattern": ""
            }
          ]
        },
        "endCalendarDateTime": {
          "anyOf": [
            {
              "type": "string",
              "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$"
            },
            {
              "type": "string",
              "pattern": ""
            }
          ]
        }
      },
      "required": []
    },
    "type": {
      "type": "string",
      "pattern": "MSTEAMS"
    },
    "enableIdentityCrawler": {
      "type": "boolean"
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FORCED_FULL_CRAWL",
        "FULL_CRAWL",
        "CHANGE_LOG"
      ]
    },
    "secretArn": {
      "type": "string",
      "minLength": 20,
      "maxLength": 2048
    }
  },
  "version": {
    "type": "string",
    "anyOf": [
      {
        "pattern": "1.0.0"
      }
    ]
  },
  "required": [
    "connectionConfiguration",
    "repositoryConfigurations",
    "syncMode",
    "additionalProperties",
    "secretArn",
    "type"
  ]
}
```
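
The `startCalendarDateTime` and `endCalendarDateTime` properties in the schema above list a UTC timestamp pattern as their first alternative. The following Python sketch shows one way to produce a value that matches that pattern. The helper name `to_calendar_datetime` is hypothetical and not part of any Amazon Kendra API, and the sample dates are placeholders.

```python
import re
from datetime import datetime, timedelta, timezone

# Pattern copied from the startCalendarDateTime/endCalendarDateTime schema entries.
CALENDAR_PATTERN = re.compile(
    r"^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$"
)


def to_calendar_datetime(moment: datetime) -> str:
    """Convert a timezone-aware datetime to UTC and format it with a trailing Z."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# A placeholder start time given in UTC+02:00; it is converted to UTC first.
local_start = datetime(2023, 1, 31, 10, 0, 0, tzinfo=timezone(timedelta(hours=2)))
start = to_calendar_datetime(local_start)
assert CALENDAR_PATTERN.match(start)
```

Converting to UTC before formatting matters because the pattern only allows the literal `Z` suffix, not a numeric offset.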

## Microsoft Yammer template schema
<a name="ds-schema-yammer"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object. Specify the type of data source as `YAMMER`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [Microsoft Yammer JSON schema](#yammer-json).

The following table describes the parameters of the Microsoft Yammer JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the data source. | 
| repositoryEndpointMetadata | The endpoint information for the data source. This data source does not specify an endpoint in repositoryEndpointMetadata. Instead, the connection information is included in the AWS Secrets Manager secret that you specify with secretArn. | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. | 
| community, user, message, attachment | A list of objects that map attributes or field names of Microsoft Yammer content to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). | 
| additionalProperties | Additional configuration options for your content in your data source. | 
| inclusionPatterns | A list of regular expression patterns to include certain files in your Microsoft Yammer data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index. | 
| exclusionPatterns | A list of regular expression patterns to exclude certain files in your Microsoft Yammer data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index. | 
| sinceDate | You can choose to configure a sinceDate parameter so that the Microsoft Yammer connector crawls content based on a specific sinceDate. | 
| communityNameFilter | You can choose to index specific community content. | 
| isCrawlMessage, isCrawlAttachment, isCrawlPrivateMessage | true to crawl messages, message attachments, and private messages. | 
| type | Specify YAMMER as your data source type. | 
| secretArn | The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Microsoft Yammer. This includes your Microsoft Yammer user name and password, and the client ID and client secret that are generated when you create an OAuth application in the Azure portal. | 
| useChangeLog | true to use the Microsoft Yammer change log to determine which documents require updating in the index. | 
| syncMode | Specify how Amazon Kendra should update your index when your data source content changes. You can choose between FORCED_FULL_CRAWL, FULL_CRAWL, and CHANGE_LOG. | 
| enableIdentityCrawler | true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/dg/API_PutPrincipalMapping.html) API to upload user and group access information. | 

### Microsoft Yammer JSON schema
<a name="yammer-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
          }
        }
      },
      "required": [
        "repositoryEndpointMetadata"
      ]
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "community": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": {
                "anyOf": [
                  {
                    "type": "object",
                    "properties": {
                      "indexFieldName": {
                        "type": "string"
                      },
                      "indexFieldType": {
                        "type": "string",
                        "enum": [
                          "STRING",
                          "DATE"
                        ]
                      },
                      "dataSourceFieldName": {
                        "type": "string"
                      },
                      "dateFieldFormat": {
                        "type": "string",
                        "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                      }
                    },
                    "required": [
                      "indexFieldName",
                      "indexFieldType",
                      "dataSourceFieldName"
                    ]
                  }
                ]
              }
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "user": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": {
                "anyOf": [
                  {
                    "type": "object",
                    "properties": {
                      "indexFieldName": {
                        "type": "string"
                      },
                      "indexFieldType": {
                        "type": "string",
                        "enum": [
                          "STRING",
                          "DATE"
                        ]
                      },
                      "dataSourceFieldName": {
                        "type": "string"
                      },
                      "dateFieldFormat": {
                        "type": "string",
                        "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                      }
                    },
                    "required": [
                      "indexFieldName",
                      "indexFieldType",
                      "dataSourceFieldName"
                    ]
                  }
                ]
              }
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "message": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": {
                "anyOf": [
                  {
                    "type": "object",
                    "properties": {
                      "indexFieldName": {
                        "type": "string"
                      },
                      "indexFieldType": {
                        "type": "string",
                        "enum": [
                          "STRING",
                          "DATE"
                        ]
                      },
                      "dataSourceFieldName": {
                        "type": "string"
                      },
                      "dateFieldFormat": {
                        "type": "string",
                        "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                      }
                    },
                    "required": [
                      "indexFieldName",
                      "indexFieldType",
                      "dataSourceFieldName"
                    ]
                  }
                ]
              }
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "attachment": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": {
                "anyOf": [
                  {
                    "type": "object",
                    "properties": {
                      "indexFieldName": {
                        "type": "string"
                      },
                      "indexFieldType": {
                        "type": "string",
                        "enum": [
                          "STRING",
                          "DATE"
                        ]
                      },
                      "dataSourceFieldName": {
                        "type": "string"
                      },
                      "dateFieldFormat": {
                        "type": "string",
                        "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                      }
                    },
                    "required": [
                      "indexFieldName",
                      "indexFieldType",
                      "dataSourceFieldName"
                    ]
                  }
                ]
              }
            }
          },
          "required": [
            "fieldMappings"
          ]
        }
      }
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "inclusionPatterns": {
          "type": "array"
        },
        "exclusionPatterns": {
          "type": "array"
        },
        "sinceDate": {
          "type": "string",
          "pattern": "^(19|2[0-9])[0-9]{2}-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01])T(0[0-9]|1[0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9])((\\+|-)(0[0-9]|1[0-9]|2[0-3]):([0-5][0-9]))?$"
        },
        "communityNameFilter": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "isCrawlMessage": {
          "type": "boolean"
        },
        "isCrawlAttachment": {
          "type": "boolean"
        },
        "isCrawlPrivateMessage": {
          "type": "boolean"
        }
      },
      "required": [
        "sinceDate"
      ]
    },
    "type": {
      "type": "string",
      "pattern": "YAMMER"
    },
    "secretArn": {
      "type": "string",
      "minLength": 20,
      "maxLength": 2048
    },
    "useChangeLog": {
      "type": "string",
      "enum": [
        "true",
        "false"
      ]
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FORCED_FULL_CRAWL",
        "FULL_CRAWL",
        "CHANGE_LOG"
      ]
    },
    "enableIdentityCrawler": {
      "type": "boolean"
    },
    "version": {
      "type": "string",
      "anyOf": [
        {
          "pattern": "1.0.0"
        }
      ]
    }
  },
  "required": [
    "connectionConfiguration",
    "repositoryConfigurations",
    "additionalProperties",
    "type",
    "secretArn",
    "syncMode"
  ]
}
```
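
As an illustration of how a template might be assembled to fit the schema above, the following Python sketch builds a dictionary with the required top-level keys and checks `sinceDate` and `secretArn` against the schema's constraints. This is a sketch under stated assumptions, not a definitive implementation: the helper name `build_yammer_template`, the field mapping names, the community name, and the ARN are all hypothetical placeholders.

```python
import re

# sinceDate pattern copied from the schema above (JSON "\\+" becomes "\+" in a raw string).
SINCE_DATE_PATTERN = re.compile(
    r"^(19|2[0-9])[0-9]{2}-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01])"
    r"T(0[0-9]|1[0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9])"
    r"((\+|-)(0[0-9]|1[0-9]|2[0-3]):([0-5][0-9]))?$"
)

# Top-level "required" list copied from the schema above.
REQUIRED_KEYS = [
    "connectionConfiguration", "repositoryConfigurations", "additionalProperties",
    "type", "secretArn", "syncMode",
]


def build_yammer_template(secret_arn, since_date, communities=()):
    """Return a dict shaped like the Microsoft Yammer template schema."""
    if not SINCE_DATE_PATTERN.match(since_date):
        raise ValueError(f"sinceDate does not match the schema pattern: {since_date!r}")
    if not 20 <= len(secret_arn) <= 2048:
        raise ValueError("secretArn must be 20 to 2048 characters long")
    return {
        "connectionConfiguration": {"repositoryEndpointMetadata": {}},
        "repositoryConfigurations": {
            "message": {
                "fieldMappings": [
                    {
                        # Hypothetical mapping; replace both names with your own.
                        "indexFieldName": "yammer_author",
                        "indexFieldType": "STRING",
                        "dataSourceFieldName": "author",
                    }
                ]
            }
        },
        "additionalProperties": {
            "sinceDate": since_date,
            "communityNameFilter": list(communities),
            "isCrawlMessage": True,
            "isCrawlAttachment": True,
            "isCrawlPrivateMessage": False,
        },
        "type": "YAMMER",
        "secretArn": secret_arn,
        "syncMode": "FULL_CRAWL",
        "enableIdentityCrawler": True,
        "version": "1.0.0",
    }


template = build_yammer_template(
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:example",  # placeholder
    since_date="2023-01-01T00:00:00",
    communities=["Engineering"],  # placeholder community name
)
missing = [key for key in REQUIRED_KEYS if key not in template]
```

Note that the `sinceDate` pattern accepts an optional numeric offset such as `+05:30` but not a trailing `Z`.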

## MySQL template schema
<a name="ds-mysql-schema"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object. Specify the type of data source as `JDBC`, the database type as `mysql`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [MySQL JSON schema](#mysql-json).

The following table describes the parameters of the MySQL JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | Required configuration information for connecting your data source: dbType, dbHost, dbPort, and dbInstance. | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN. | 
|  document  |  A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). | 
| additionalProperties | Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source. | 
| primaryKey  | Provide the primary key for the database table. This identifies a table within your database. | 
| titleColumn | Provide the name of the document title column within your database table. | 
| bodyColumn | Provide the name of the document body column within your database table. | 
| sqlQuery | Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query. | 
| timestampColumn | Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content. | 
| timestampFormat | Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content. | 
| timezone | Enter the name of the column which contains time zones for the content to be crawled. | 
| changeDetectingColumns | Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns. | 
| allowedUsersColumn | Enter the name of the column which contains User IDs to be allowed access to content. | 
| allowedGroupsColumn | Enter the name of the column which contains groups to be allowed access to content. | 
| sourceURIColumn | Enter the name of the column which contains Source URLs to be indexed. | 
| isSslEnabled | true to use SSL for the connection to your database. | 
| type | The type of data source. Specify JDBC as your data source type. | 
| syncMode | Specify how Amazon Kendra should update your index when your data source content changes. You can choose between FORCED_FULL_CRAWL, FULL_CRAWL, and CHANGE_LOG. | 
| secretArn | The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the user name and password required to connect to your database. The secret must contain a JSON structure with the following keys: <pre>{<br />    "user name": "database user name",<br />    "password": "password"<br />}</pre> | 
| version | The version of the template that is currently supported. | 
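
The secret structure described in the `secretArn` row can be produced programmatically. The following Python sketch serializes the two keys shown in the table. The helper name `build_database_secret` and the sample credentials are hypothetical placeholders.

```python
import json


def build_database_secret(user_name: str, password: str) -> str:
    """Serialize database credentials using the key names from the secretArn row."""
    # The key "user name" contains a space, exactly as shown in the table above.
    return json.dumps({"user name": user_name, "password": password})


# Placeholder credentials for illustration only.
secret_string = build_database_secret("admin", "example-password")
```

The resulting string could then be stored as the secret value in AWS Secrets Manager, and the ARN of that secret supplied as `secretArn` in the template.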

### MySQL JSON schema
<a name="mysql-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
            "dbType": {
              "type": "string",
              "enum": [
                "mysql",
                "db2",
                "postgresql",
                "oracle",
                "sqlserver"
              ]
            },
            "dbHost": {
              "type": "string"
            },
            "dbPort": {
              "type": "string"
            },
            "dbInstance": {
              "type": "string"
            }
          },
          "required": [
            "dbType",
            "dbHost",
            "dbPort",
            "dbInstance"
          ]
        }
      },
      "required": [
        "repositoryEndpointMetadata"
      ]
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "document": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string"
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        }
      },
      "required": [
      ]
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "primaryKey": {
          "type": "string"
        },
        "titleColumn": {
          "type": "string"
        },
        "bodyColumn": {
          "type": "string"
        },
        "sqlQuery": {
          "type": "string",
          "not": {
            "pattern": ";+"
          }
        },
        "timestampColumn": {
          "type": "string"
        },
        "timestampFormat": {
          "type": "string"
        },
        "timezone": {
          "type": "string"
        },
        "changeDetectingColumns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "allowedUsersColumn": {
          "type": "string"
        },
        "allowedGroupsColumn": {
          "type": "string"
        },
        "sourceURIColumn": {
          "type": "string"
        },
        "isSslEnabled": {
          "type": "boolean"
        }
      },
      "required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"]
    },
    "type" : {
      "type" : "string",
      "pattern": "JDBC"
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FORCED_FULL_CRAWL",
        "FULL_CRAWL",
        "CHANGE_LOG"
      ]
    },
    "secretArn": {
      "type": "string"
    }
  },
  "version": {
    "type": "string",
    "anyOf": [
      {
        "pattern": "1.0.0"
      }
    ]
  },
  "required": [
      "connectionConfiguration",
      "repositoryConfigurations",
      "syncMode",
      "additionalProperties",
      "secretArn",
      "type"
  ]
}
```
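
To connect the schema above to the `CreateDataSource` call described at the start of this section, the following Python sketch builds a MySQL template and wraps it in request parameters with `Type` set to `TEMPLATE`. It is a minimal sketch under stated assumptions: the helper names, host, database, column names, field mapping, and ARNs are hypothetical placeholders, and no request is actually sent.

```python
def build_mysql_template(host, port, database, secret_arn, sql_query):
    """Return a dict shaped like the MySQL template schema."""
    # The schema rejects any sqlQuery that contains a semicolon ("not": {"pattern": ";+"}).
    if ";" in sql_query:
        raise ValueError("sqlQuery must not contain ';'")
    return {
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {
                "dbType": "mysql",
                "dbHost": host,
                "dbPort": str(port),  # the schema types dbPort as a string
                "dbInstance": database,
            }
        },
        "repositoryConfigurations": {
            "document": {
                "fieldMappings": [
                    {
                        # Hypothetical mapping; replace both names with your own.
                        "indexFieldName": "doc_category",
                        "indexFieldType": "STRING",
                        "dataSourceFieldName": "category",
                    }
                ]
            }
        },
        "additionalProperties": {
            # Hypothetical column names for an assumed "articles" table.
            "primaryKey": "id",
            "titleColumn": "title",
            "bodyColumn": "body",
            "sqlQuery": sql_query,
        },
        "type": "JDBC",
        "syncMode": "FULL_CRAWL",
        "secretArn": secret_arn,
        "version": "1.0.0",
    }


def build_create_request(index_id, role_arn, name, template):
    """Wrap a template in CreateDataSource parameters with Type set to TEMPLATE."""
    return {
        "IndexId": index_id,
        "Name": name,
        "Type": "TEMPLATE",
        "RoleArn": role_arn,
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }


template = build_mysql_template(
    host="mysql.example.com",  # placeholder host
    port=3306,
    database="docs",  # placeholder database name
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    sql_query="SELECT id, title, body, category FROM articles",
)
request = build_create_request(
    index_id="example-index-id",  # placeholder
    role_arn="arn:aws:iam::111122223333:role/example-kendra-role",  # placeholder
    name="mysql-docs",
    template=template,
)
```

With the AWS SDK for Python (Boto3), the request could then be sent as `boto3.client("kendra").create_data_source(**request)`.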

## Oracle Database template schema
<a name="ds-oracle-database-schema"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object. Specify the type of data source as `JDBC`, the database type as `oracle`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [Oracle Database JSON schema](#oracle-database-json).

The following table describes the parameters of the Oracle Database JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | Required configuration information for connecting your data source: dbType, dbHost, dbPort, and dbInstance. | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN. | 
|  document  |  A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). | 
| additionalProperties | Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source. | 
| primaryKey  | Provide the primary key for the database table. This identifies a table within your database. | 
| titleColumn | Provide the name of the document title column within your database table. | 
| bodyColumn | Provide the name of the document body column within your database table. | 
| sqlQuery | Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query. | 
| timestampColumn | Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content. | 
| timestampFormat | Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content. | 
| timezone | Enter the name of the column which contains time zones for the content to be crawled. | 
| changeDetectingColumns | Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns. | 
| allowedUsersColumn | Enter the name of the column which contains User IDs to be allowed access to content. | 
| allowedGroupsColumn | Enter the name of the column which contains groups to be allowed access to content. | 
| sourceURIColumn | Enter the name of the column which contains Source URLs to be indexed. | 
| isSslEnabled | true to use SSL for the connection to your database. | 
| type | The type of data source. Specify JDBC as your data source type. | 
| syncMode | Specify how Amazon Kendra should update your index when your data source content changes. You can choose between FORCED_FULL_CRAWL, FULL_CRAWL, and CHANGE_LOG. | 
| secretArn | The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the user name and password required to connect to your database. The secret must contain a JSON structure with the following keys: <pre>{<br />    "user name": "database user name",<br />    "password": "password"<br />}</pre> | 
| version | The version of the template that is currently supported. | 

### Oracle Database JSON schema
<a name="oracle-database-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
            "dbType": {
              "type": "string",
              "enum": [
                "mysql",
                "db2",
                "postgresql",
                "oracle",
                "sqlserver"
              ]
            },
            "dbHost": {
              "type": "string"
            },
            "dbPort": {
              "type": "string"
            },
            "dbInstance": {
              "type": "string"
            }
          },
          "required": [
            "dbType",
            "dbHost",
            "dbPort",
            "dbInstance"
          ]
        }
      },
      "required": [
        "repositoryEndpointMetadata"
      ]
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "document": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string"
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        }
      },
      "required": [
      ]
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "primaryKey": {
          "type": "string"
        },
        "titleColumn": {
          "type": "string"
        },
        "bodyColumn": {
          "type": "string"
        },
        "sqlQuery": {
          "type": "string",
          "not": {
            "pattern": ";+"
          }
        },
        "timestampColumn": {
          "type": "string"
        },
        "timestampFormat": {
          "type": "string"
        },
        "timezone": {
          "type": "string"
        },
        "changeDetectingColumns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "allowedUsersColumn": {
          "type": "string"
        },
        "allowedGroupsColumn": {
          "type": "string"
        },
        "sourceURIColumn": {
          "type": "string"
        },
        "isSslEnabled": {
          "type": "boolean"
        }
      },
      "required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"]
    },
    "type" : {
      "type" : "string",
      "pattern": "JDBC"
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FORCED_FULL_CRAWL",
        "FULL_CRAWL",
        "CHANGE_LOG"
      ]
    },
    "secretArn": {
      "type": "string"
    }
  },
  "version": {
    "type": "string",
    "anyOf": [
      {
        "pattern": "1.0.0"
      }
    ]
  },
  "required": [
      "connectionConfiguration",
      "repositoryConfigurations",
      "syncMode",
      "additionalProperties",
      "secretArn",
      "type"
  ]
}
```
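
Before submitting a template, it can help to check it against the `required` lists in the schema above. The following Python sketch walks a trimmed excerpt of the schema and reports missing keys. It uses only the standard library and checks nothing but `required`, so it is not a substitute for a full JSON Schema validator. The function name `missing_required` and the draft template values are hypothetical.

```python
def missing_required(schema, instance, path=""):
    """Return dotted paths of keys listed under "required" but absent from instance."""
    missing = [f"{path}{key}" for key in schema.get("required", []) if key not in instance]
    for key, subschema in schema.get("properties", {}).items():
        # Descend only into object-typed properties that the instance actually provides.
        if subschema.get("type") == "object" and isinstance(instance.get(key), dict):
            missing.extend(missing_required(subschema, instance[key], f"{path}{key}."))
    return missing


# Excerpt of the schema above, trimmed to the connection settings.
SCHEMA_EXCERPT = {
    "type": "object",
    "properties": {
        "connectionConfiguration": {
            "type": "object",
            "properties": {
                "repositoryEndpointMetadata": {
                    "type": "object",
                    "properties": {
                        "dbType": {"type": "string"},
                        "dbHost": {"type": "string"},
                        "dbPort": {"type": "string"},
                        "dbInstance": {"type": "string"},
                    },
                    "required": ["dbType", "dbHost", "dbPort", "dbInstance"],
                }
            },
            "required": ["repositoryEndpointMetadata"],
        }
    },
    "required": [
        "connectionConfiguration", "repositoryConfigurations", "syncMode",
        "additionalProperties", "secretArn", "type",
    ],
}

# An incomplete draft template with placeholder values.
draft = {
    "connectionConfiguration": {
        "repositoryEndpointMetadata": {"dbType": "oracle", "dbHost": "db.example.com"}
    },
    "type": "JDBC",
}

problems = missing_required(SCHEMA_EXCERPT, draft)
```

For the draft above, `problems` lists the four absent top-level keys followed by the nested `dbPort` and `dbInstance` paths.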

## PostgreSQL template schema
<a name="ds-postgresql-schema"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object. Specify the type of data source as `JDBC`, the database type as `postgresql`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [PostgreSQL JSON schema](#postgresql-json).

The following table describes the parameters of the PostgreSQL JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | Required configuration information for connecting your data source.[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html) | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN. | 
|  document  |  A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). | 
| additionalProperties | Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source. | 
| primaryKey  | Provide the primary key for the database table. This identifies a table within your database. | 
| titleColumn | Provide the name of the document title column within your database table. | 
| bodyColumn | Provide the name of the document body column within your database table. | 
| sqlQuery | Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query. | 
| timestampColumn | Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content. | 
| timestampFormat | Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content. | 
| timezone | Enter the name of the column which contains time zones for the content to be crawled. | 
| changeDetectingColumns | Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns. | 
| allowedUsersColumn | Enter the name of the column which contains User IDs to be allowed access to content. | 
| allowedGroupsColumn | Enter the name of the column which contains groups to be allowed access to content. | 
| sourceURIColumn | Enter the name of the column which contains Source URLs to be indexed. | 
| isSslEnabled | true to use SSL to connect to your database. | 
| type | The type of data source. Specify JDBC as your data source type. | 
| syncMode |  Specify how Amazon Kendra should update your index when your data source content changes. You can choose between: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | 
| secretArn | The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the user name and password required to connect to your database. The secret must contain a JSON structure with the following keys: <pre>{<br />    "user name": "database user name",<br />    "password": "password"<br />}</pre> | 
| version | The version of the template that is currently supported. | 
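
As a minimal sketch of how the parameters in this table fit together, the following Python builds a template that sets only the fields the PostgreSQL JSON schema marks as required and prints it as JSON. The host, database, column names, query, index field name, and secret ARN are hypothetical placeholders, not values from this guide.

```python
import json

# Every value below is a hypothetical placeholder for illustration only.
postgresql_template = {
    "connectionConfiguration": {
        "repositoryEndpointMetadata": {
            "dbType": "postgresql",
            "dbHost": "example-db.internal",
            "dbPort": "5432",  # the schema types dbPort as a string
            "dbInstance": "exampledb",
        }
    },
    "repositoryConfigurations": {
        "document": {
            "fieldMappings": [
                {
                    # Hypothetical index field and source column.
                    "indexFieldName": "article_category",
                    "indexFieldType": "STRING",
                    "dataSourceFieldName": "category",
                }
            ]
        }
    },
    "additionalProperties": {
        "primaryKey": "id",
        "titleColumn": "title",
        "bodyColumn": "body",
        # The schema rejects queries that contain a semicolon.
        "sqlQuery": "SELECT id, title, body, category FROM articles",
    },
    "type": "JDBC",
    "syncMode": "FULL_CRAWL",
    "secretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
}

print(json.dumps(postgresql_template, indent=2))
```

Optional fields such as `timestampColumn` or `isSslEnabled` would go inside `additionalProperties` alongside the required ones.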

### PostgreSQL JSON schema
<a name="postgresql-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
            "dbType": {
              "type": "string",
              "enum": [
                "mysql",
                "db2",
                "postgresql",
                "oracle",
                "sqlserver"
              ]
            },
            "dbHost": {
              "type": "string"
            },
            "dbPort": {
              "type": "string"
            },
            "dbInstance": {
              "type": "string"
            }
          },
          "required": [
            "dbType",
            "dbHost",
            "dbPort",
            "dbInstance"
          ]
        }
      },
      "required": [
        "repositoryEndpointMetadata"
      ]
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "document": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string"
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        }
      },
      "required": [
      ]
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "primaryKey": {
          "type": "string"
        },
        "titleColumn": {
          "type": "string"
        },
        "bodyColumn": {
          "type": "string"
        },
        "sqlQuery": {
          "type": "string",
          "not": {
            "pattern": ";+"
          }
        },
        "timestampColumn": {
          "type": "string"
        },
        "timestampFormat": {
          "type": "string"
        },
        "timezone": {
          "type": "string"
        },
        "changeDetectingColumns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "allowedUsersColumn": {
          "type": "string"
        },
        "allowedGroupsColumn": {
          "type": "string"
        },
        "sourceURIColumn": {
          "type": "string"
        },
        "isSslEnabled": {
          "type": "boolean"
        }
      },
      "required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"]
    },
    "type" : {
      "type" : "string",
      "pattern": "JDBC"
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FORCED_FULL_CRAWL",
        "FULL_CRAWL",
        "CHANGE_LOG"
      ]
    },
    "secretArn": {
      "type": "string"
    }
  },
  "version": {
    "type": "string",
    "anyOf": [
      {
        "pattern": "1.0.0"
      }
    ]
  },
  "required": [
      "connectionConfiguration",
      "repositoryConfigurations",
      "syncMode",
      "additionalProperties",
      "secretArn",
      "type"
  ]
}
```
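
Before you send a template, it can help to check it locally against the main constraints in the schema above. The following Python is a rough sketch of such a check, not a full JSON Schema validator. The function name `check_template` and the sample values are hypothetical, and reading "32KB" as 32 * 1024 bytes is an assumption.

```python
import re

REQUIRED_TOP_LEVEL = [
    "connectionConfiguration",
    "repositoryConfigurations",
    "syncMode",
    "additionalProperties",
    "secretArn",
    "type",
]
REQUIRED_ADDITIONAL = ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"]
SYNC_MODES = {"FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG"}
MAX_SQL_BYTES = 32 * 1024  # assumption: "32KB" read as 32 * 1024 bytes


def check_template(template):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in template:
            problems.append("missing required field: " + key)

    extra = template.get("additionalProperties", {})
    for key in REQUIRED_ADDITIONAL:
        if key not in extra:
            problems.append("missing additionalProperties field: " + key)

    query = extra.get("sqlQuery", "")
    # The schema rejects any query that matches the pattern ";+".
    if re.search(";+", query):
        problems.append("sqlQuery must not contain a semicolon")
    if len(query.encode("utf-8")) >= MAX_SQL_BYTES:
        problems.append("sqlQuery must be less than 32KB")

    if "syncMode" in template and template["syncMode"] not in SYNC_MODES:
        problems.append("unsupported syncMode: " + str(template["syncMode"]))
    return problems


# Hypothetical sample values for illustration only.
sample = {
    "connectionConfiguration": {},
    "repositoryConfigurations": {},
    "syncMode": "FULL_CRAWL",
    "additionalProperties": {
        "primaryKey": "id",
        "titleColumn": "title",
        "bodyColumn": "body",
        "sqlQuery": "SELECT id, title, body FROM articles",
    },
    "secretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    "type": "JDBC",
}
bad = dict(sample, additionalProperties=dict(sample["additionalProperties"], sqlQuery="SELECT 1;"))

print(check_template(sample))
print(check_template(bad))
```

The check covers only required fields, the `syncMode` values, and the `sqlQuery` restrictions. Nested objects such as `repositoryEndpointMetadata` are not inspected in this sketch.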

## Salesforce template schema
<a name="ds-salesforce-schema"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object. You provide the Salesforce host URL as a part of the connection configuration or repository endpoint details. Also specify the type of data source as `SALESFORCEV2`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [Salesforce JSON schema](#salesforce-json).
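
As a rough sketch of the call described above, the following Python builds the request parameters for CreateDataSource with `Type` set to `TEMPLATE`. The helper name, host URL, index ID, and role ARN are hypothetical placeholders. The template shown is only a fragment; a real template must satisfy the full Salesforce JSON schema.

```python
import json


def build_create_data_source_request(name, index_id, role_arn, template):
    """Return keyword arguments for a CreateDataSource call (hypothetical helper)."""
    return {
        "Name": name,
        "IndexId": index_id,
        "RoleArn": role_arn,
        # TEMPLATE tells Amazon Kendra to read the connector settings
        # from the TemplateConfiguration object.
        "Type": "TEMPLATE",
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }


# Partial template fragment with placeholder values (not a complete template).
salesforce_template = {
    "connectionConfiguration": {
        "repositoryEndpointMetadata": {
            "hostUrl": "https://example.my.salesforce.com"
        }
    },
    "type": "SALESFORCEV2",
}

request = build_create_data_source_request(
    name="example-salesforce",
    index_id="example-index-id",
    role_arn="arn:aws:iam::111122223333:role/example-kendra-role",
    template=salesforce_template,
)
print(json.dumps(request, indent=2))
```

With the AWS SDK for Python (Boto3), a request built this way could be passed as `boto3.client("kendra").create_data_source(**request)`, assuming your credentials and Region are configured.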

The following table describes the parameters of the Salesforce JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | The endpoint information for the data source. | 
| hostUrl | The URL of the Salesforce instance to be indexed. | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  |  A list of objects that map the attributes or field names of your Salesforce entities to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). | 
| secretARN | The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Salesforce instance. The secret must contain a JSON structure with the following keys: <pre>{<br />    "authenticationUrl": "OAUTH endpoint that Amazon Kendra connects to get an OAUTH token",<br />    "consumerKey": "Application public key generated when you created your Salesforce application",<br />    "consumerSecret": "Application private key generated when you created your Salesforce application",<br />    "password": "Password associated with the user logging in to the Salesforce instance",<br />    "securityToken": "Token associated with the user account logging in to the Salesforce instance",<br />    "username": "User name of the user logging in to the Salesforce instance"<br />}</pre> | 
| additionalProperties | Additional configuration options for your content in your data source. | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | A collection of strings that specifies which entities to filter. | 
| inclusionPatterns [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | A list of regular expression patterns to include certain files in your Salesforce data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index. | 
| exclusionPatterns [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | A list of regular expression patterns to exclude certain files in your Salesforce data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index. | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | true to crawl these types of files in your Salesforce account. | 
| type | The type of data source. Specify SALESFORCEV2 as your data source type. | 
| enableIdentityCrawler | true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If the identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and the identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/dg/API_PutPrincipalMapping.html) API to upload user and group access information. | 
| syncMode |  Specify how Amazon Kendra should update your index when your data source content changes. You can choose between: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | 
| version | The version of this template that is currently supported. | 
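
The precedence rule for inclusion and exclusion patterns in the table above can be sketched in Python as follows. This illustrates the rule only and is not Amazon Kendra's implementation. The function name `is_indexed` and the sample patterns are hypothetical, and two behaviors are assumptions: a pattern may match anywhere in the file name, and an empty inclusion list includes everything.

```python
import re


def is_indexed(file_name, inclusion_patterns, exclusion_patterns):
    """Return True if the file would be indexed under the rule described above."""
    # An exclusion match always wins, even if an inclusion pattern also matches.
    if any(re.search(pattern, file_name) for pattern in exclusion_patterns):
        return False
    # With inclusion patterns present, only matching files are indexed.
    if inclusion_patterns:
        return any(re.search(pattern, file_name) for pattern in inclusion_patterns)
    # Assumption: with no inclusion patterns, every non-excluded file is indexed.
    return True


include = [r"\.pdf$"]
exclude = [r"draft"]

for name in ["report.pdf", "draft-report.pdf", "notes.txt"]:
    print(name, is_indexed(name, include, exclude))
```

Checking the exclusion list first keeps the precedence explicit: a file never reaches the inclusion test once an exclusion pattern has matched.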

### Salesforce JSON schema
<a name="salesforce-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties":
  {
    "connectionConfiguration": {
      "type": "object",
      "properties":
      {
        "repositoryEndpointMetadata":
        {
          "type": "object",
          "properties":
          {
            "hostUrl":
            {
              "type": "string",
              "pattern": "https:.*"
            }
          },
          "required":
          [
            "hostUrl"
          ]
        }
      },
      "required":
      [
        "repositoryEndpointMetadata"
      ]
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties":
      {
        "account":
        {
          "type": "object",
          "properties":
          {
            "fieldMappings":
            {
              "type": "array",
              "items":
              [
                {
                  "type": "object",
                  "properties":
                  {
                    "indexFieldName":
                    {
                      "type": "string"
                    },
                    "indexFieldType":
                    {
                      "type": "string",
                      "enum":
                      [
                        "STRING",
                        "STRING_LIST",
                        "DATE",
                        "LONG"
                      ]
                    },
                    "dataSourceFieldName":
                    {
                      "type": "string"
                    },
                    "dateFieldFormat":
                    {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required":
                  [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required":
          [
            "fieldMappings"
          ]
        },
        "contact":
        {
          "type": "object",
          "properties":
          {
            "fieldMappings":
            {
              "type": "array",
              "items":
              [
                {
                  "type": "object",
                  "properties":
                  {
                    "indexFieldName":
                    {
                      "type": "string"
                    },
                    "indexFieldType":
                    {
                      "type": "string",
                      "enum":
                      [
                        "STRING",
                        "STRING_LIST",
                        "DATE"
                      ]
                    },
                    "dataSourceFieldName":
                    {
                      "type": "string"
                    },
                    "dateFieldFormat":
                    {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required":
                  [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required":
          [
            "fieldMappings"
          ]
        },
        "campaign":
        {
          "type": "object",
          "properties":
          {
            "fieldMappings":
            {
              "type": "array",
              "items":
              [
                {
                  "type": "object",
                  "properties":
                  {
                    "indexFieldName":
                    {
                      "type": "string"
                    },
                    "indexFieldType":
                    {
                      "type": "string",
                      "enum":
                      [
                        "STRING",
                        "STRING_LIST",
                        "DATE",
                        "LONG"
                      ]
                    },
                    "dataSourceFieldName":
                    {
                      "type": "string"
                    },
                    "dateFieldFormat":
                    {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required":
                  [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required":
          [
            "fieldMappings"
          ]
        },
        "case":
        {
          "type": "object",
          "properties":
          {
            "fieldMappings":
            {
              "type": "array",
              "items":
              [
                {
                  "type": "object",
                  "properties":
                  {
                    "indexFieldName":
                    {
                      "type": "string"
                    },
                    "indexFieldType":
                    {
                      "type": "string",
                      "enum":
                      [
                        "STRING",
                        "STRING_LIST",
                        "DATE"
                      ]
                    },
                    "dataSourceFieldName":
                    {
                      "type": "string"
                    },
                    "dateFieldFormat":
                    {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required":
                  [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required":
          [
            "fieldMappings"
          ]
        },
        "product":
        {
          "type": "object",
          "properties":
          {
            "fieldMappings":
            {
              "type": "array",
              "items":
              [
                {
                  "type": "object",
                  "properties":
                  {
                    "indexFieldName":
                    {
                      "type": "string"
                    },
                    "indexFieldType":
                    {
                      "type": "string",
                      "enum":
                      [
                        "STRING",
                        "STRING_LIST",
                        "DATE"
                      ]
                    },
                    "dataSourceFieldName":
                    {
                      "type": "string"
                    },
                    "dateFieldFormat":
                    {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required":
                  [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required":
          [
            "fieldMappings"
          ]
        },
        "lead":
        {
          "type": "object",
          "properties":
          {
            "fieldMappings":
            {
              "type": "array",
              "items":
              [
                {
                  "type": "object",
                  "properties":
                  {
                    "indexFieldName":
                    {
                      "type": "string"
                    },
                    "indexFieldType":
                    {
                      "type": "string",
                      "enum":
                      [
                        "STRING",
                        "STRING_LIST",
                        "DATE",
                        "LONG"
                      ]
                    },
                    "dataSourceFieldName":
                    {
                      "type": "string"
                    },
                    "dateFieldFormat":
                    {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required":
                  [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required":
          [
            "fieldMappings"
          ]
        },
        "contract":
        {
          "type": "object",
          "properties":
          {
            "fieldMappings":
            {
              "type": "array",
              "items":
              [
                {
                  "type": "object",
                  "properties":
                  {
                    "indexFieldName":
                    {
                      "type": "string"
                    },
                    "indexFieldType":
                    {
                      "type": "string",
                      "enum":
                      [
                        "STRING",
                        "STRING_LIST",
                        "DATE"
                      ]
                    },
                    "dataSourceFieldName":
                    {
                      "type": "string"
                    },
                    "dateFieldFormat":
                    {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required":
                  [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required":
          [
            "fieldMappings"
          ]
        },
        "partner":
        {
          "type": "object",
          "properties":
          {
            "fieldMappings":
            {
              "type": "array",
              "items":
              [
                {
                  "type": "object",
                  "properties":
                  {
                    "indexFieldName":
                    {
                      "type": "string"
                    },
                    "indexFieldType":
                    {
                      "type": "string",
                      "enum":
                      [
                        "STRING",
                        "STRING_LIST",
                        "DATE"
                      ]
                    },
                    "dataSourceFieldName":
                    {
                      "type": "string"
                    },
                    "dateFieldFormat":
                    {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required":
                  [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required":
          [
            "fieldMappings"
          ]
        },
        "profile":
        {
          "type": "object",
          "properties":
          {
            "fieldMappings":
            {
              "type": "array",
              "items":
              [
                {
                  "type": "object",
                  "properties":
                  {
                    "indexFieldName":
                    {
                      "type": "string"
                    },
                    "indexFieldType":
                    {
                      "type": "string",
                      "enum":
                      [
                        "STRING",
                        "STRING_LIST",
                        "DATE"
                      ]
                    },
                    "dataSourceFieldName":
                    {
                      "type": "string"
                    },
                    "dateFieldFormat":
                    {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required":
                  [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required":
          [
            "fieldMappings"
          ]
        },
        "idea":
        {
          "type": "object",
          "properties":
          {
            "fieldMappings":
            {
              "type": "array",
              "items":
              [
                {
                  "type": "object",
                  "properties":
                  {
                    "indexFieldName":
                    {
                      "type": "string"
                    },
                    "indexFieldType":
                    {
                      "type": "string",
                      "enum":
                      [
                        "STRING",
                        "STRING_LIST",
                        "DATE",
                        "LONG"
                      ]
                    },
                    "dataSourceFieldName":
                    {
                      "type": "string"
                    },
                    "dateFieldFormat":
                    {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required":
                  [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required":
          [
            "fieldMappings"
          ]
        },
        "pricebook":
        {
          "type": "object",
          "properties":
          {
            "fieldMappings":
            {
              "type": "array",
              "items":
              [
                {
                  "type": "object",
                  "properties":
                  {
                    "indexFieldName":
                    {
                      "type": "string"
                    },
                    "indexFieldType":
                    {
                      "type": "string",
                      "enum":
                      [
                        "STRING",
                        "STRING_LIST",
                        "DATE"
                      ]
                    },
                    "dataSourceFieldName":
                    {
                      "type": "string"
                    },
                    "dateFieldFormat":
                    {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required":
                  [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required":
          [
            "fieldMappings"
          ]
        },
        "task":
        {
          "type": "object",
          "properties":
          {
            "fieldMappings":
            {
              "type": "array",
              "items":
              [
                {
                  "type": "object",
                  "properties":
                  {
                    "indexFieldName":
                    {
                      "type": "string"
                    },
                    "indexFieldType":
                    {
                      "type": "string",
                      "enum":
                      [
                        "STRING",
                        "STRING_LIST",
                        "DATE"
                      ]
                    },
                    "dataSourceFieldName":
                    {
                      "type": "string"
                    },
                    "dateFieldFormat":
                    {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required":
                  [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required":
          [
            "fieldMappings"
          ]
        },
        "solution":
        {
          "type": "object",
          "properties":
          {
            "fieldMappings":
            {
              "type": "array",
              "items":
              [
                {
                  "type": "object",
                  "properties":
                  {
                    "indexFieldName":
                    {
                      "type": "string"
                    },
                    "indexFieldType":
                    {
                      "type": "string",
                      "enum":
                      [
                        "STRING",
                        "STRING_LIST",
                        "DATE"
                      ]
                    },
                    "dataSourceFieldName":
                    {
                      "type": "string"
                    },
                    "dateFieldFormat":
                    {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required":
                  [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required":
          [
            "fieldMappings"
          ]
        },
        "attachment":
        {
          "type": "object",
          "properties":
          {
            "fieldMappings":
            {
              "type": "array",
              "items":
              [
                {
                  "type": "object",
                  "properties":
                  {
                    "indexFieldName":
                    {
                      "type": "string"
                    },
                    "indexFieldType":
                    {
                      "type": "string",
                      "enum":
                      [
                        "STRING",
                        "STRING_LIST",
                        "DATE",
                        "LONG"
                      ]
                    },
                    "dataSourceFieldName":
                    {
                      "type": "string"
                    },
                    "dateFieldFormat":
                    {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required":
                  [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required":
          [
            "fieldMappings"
          ]
        },
        "user":
        {
          "type": "object",
          "properties":
          {
            "fieldMappings":
            {
              "type": "array",
              "items":
              [
                {
                  "type": "object",
                  "properties":
                  {
                    "indexFieldName":
                    {
                      "type": "string"
                    },
                    "indexFieldType":
                    {
                      "type": "string",
                      "enum":
                      [
                        "STRING",
                        "STRING_LIST",
                        "DATE"
                      ]
                    },
                    "dataSourceFieldName":
                    {
                      "type": "string"
                    },
                    "dateFieldFormat":
                    {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required":
                  [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required":
          [
            "fieldMappings"
          ]
        },
        "document":
        {
          "type": "object",
          "properties":
          {
            "fieldMappings":
            {
              "type": "array",
              "items":
              [
                {
                  "type": "object",
                  "properties":
                  {
                    "indexFieldName":
                    {
                      "type": "string"
                    },
                    "indexFieldType":
                    {
                      "type": "string",
                      "enum":
                      [
                        "STRING",
                        "STRING_LIST",
                        "DATE",
                        "LONG"
                      ]
                    },
                    "dataSourceFieldName":
                    {
                      "type": "string"
                    },
                    "dateFieldFormat":
                    {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required":
                  [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required":
          [
            "fieldMappings"
          ]
        },
        "knowledgeArticles":
        {
          "type": "object",
          "properties":
          {
            "fieldMappings":
            {
              "type": "array",
              "items":
              [
                {
                  "type": "object",
                  "properties":
                  {
                    "indexFieldName":
                    {
                      "type": "string"
                    },
                    "indexFieldType":
                    {
                      "type": "string",
                      "enum":
                      [
                        "STRING",
                        "STRING_LIST",
                        "DATE"
                      ]
                    },
                    "dataSourceFieldName":
                    {
                      "type": "string"
                    },
                    "dateFieldFormat":
                    {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required":
                  [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required":
          [
            "fieldMappings"
          ]
        },
        "group":
        {
          "type": "object",
          "properties":
          {
            "fieldMappings":
            {
              "type": "array",
              "items":
              [
                {
                  "type": "object",
                  "properties":
                  {
                    "indexFieldName":
                    {
                      "type": "string"
                    },
                    "indexFieldType":
                    {
                      "type": "string",
                      "enum":
                      [
                        "STRING",
                        "STRING_LIST",
                        "DATE"
                      ]
                    },
                    "dataSourceFieldName":
                    {
                      "type": "string"
                    },
                    "dateFieldFormat":
                    {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required":
                  [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required":
          [
            "fieldMappings"
          ]
        },
        "opportunity":
        {
          "type": "object",
          "properties":
          {
            "fieldMappings":
            {
              "type": "array",
              "items":
              [
                {
                  "type": "object",
                  "properties":
                  {
                    "indexFieldName":
                    {
                      "type": "string"
                    },
                    "indexFieldType":
                    {
                      "type": "string",
                      "enum":
                      [
                        "STRING",
                        "STRING_LIST",
                        "DATE",
                        "LONG"
                      ]
                    },
                    "dataSourceFieldName":
                    {
                      "type": "string"
                    },
                    "dateFieldFormat":
                    {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required":
                  [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required":
          [
            "fieldMappings"
          ]
        },
        "chatter":
        {
          "type": "object",
          "properties":
          {
            "fieldMappings":
            {
              "type": "array",
              "items":
              [
                {
                  "type": "object",
                  "properties":
                  {
                    "indexFieldName":
                    {
                      "type": "string"
                    },
                    "indexFieldType":
                    {
                      "type": "string",
                      "enum":
                      [
                        "STRING",
                        "STRING_LIST",
                        "DATE"
                      ]
                    },
                    "dataSourceFieldName":
                    {
                      "type": "string"
                    },
                    "dateFieldFormat":
                    {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required":
                  [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required":
          [
            "fieldMappings"
          ]
        },
        "customEntity":
        {
          "type": "object",
          "properties":
          {
            "fieldMappings":
            {
              "type": "array",
              "items":
              [
                {
                  "type": "object",
                  "properties":
                  {
                    "indexFieldName":
                    {
                      "type": "string"
                    },
                    "indexFieldType":
                    {
                      "type": "string",
                      "enum":
                      [
                        "STRING",
                        "STRING_LIST",
                        "DATE"
                      ]
                    },
                    "dataSourceFieldName":
                    {
                      "type": "string"
                    },
                    "dateFieldFormat":
                    {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required":
                  [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required":
          [
            "fieldMappings"
          ]
        }
      }
    },
    "additionalProperties": {
      "type": "object",
      "properties":
      {
        "accountFilter":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "contactFilter":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "caseFilter":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "campaignFilter":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "contractFilter":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "groupFilter":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "leadFilter":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "productFilter":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "opportunityFilter":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "partnerFilter":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "pricebookFilter":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "ideaFilter":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "profileFilter":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "taskFilter":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "solutionFilter":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "userFilter":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "chatterFilter":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "documentFilter":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "knowledgeArticleFilter":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "customEntities":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "isCrawlAccount": {
          "type": "boolean"
        },
        "isCrawlContact": {
          "type": "boolean"
        },
        "isCrawlCase": {
          "type": "boolean"
        },
        "isCrawlCampaign": {
          "type": "boolean"
        },
        "isCrawlProduct": {
          "type": "boolean"
        },
        "isCrawlLead": {
          "type": "boolean"
        },
        "isCrawlContract": {
          "type": "boolean"
        },
        "isCrawlPartner": {
          "type": "boolean"
        },
        "isCrawlProfile": {
          "type": "boolean"
        },
        "isCrawlIdea": {
          "type": "boolean"
        },
        "isCrawlPricebook": {
          "type": "boolean"
        },
        "isCrawlDocument": {
          "type": "boolean"
        },
        "crawlSharedDocument": {
          "type": "boolean"
        },
        "isCrawlGroup": {
          "type": "boolean"
        },
        "isCrawlOpportunity": {
          "type": "boolean"
        },
        "isCrawlChatter": {
          "type": "boolean"
        },
        "isCrawlUser": {
          "type": "boolean"
        },
        "isCrawlSolution":{
          "type": "boolean"
        },
        "isCrawlTask":{
          "type": "boolean"
        },

        "isCrawlAccountAttachments": {
          "type": "boolean"
        },
        "isCrawlContactAttachments": {
          "type": "boolean"
        },
        "isCrawlCaseAttachments": {
          "type": "boolean"
        },
        "isCrawlCampaignAttachments": {
          "type": "boolean"
        },
        "isCrawlLeadAttachments": {
          "type": "boolean"
        },
        "isCrawlContractAttachments": {
          "type": "boolean"
        },
        "isCrawlGroupAttachments": {
          "type": "boolean"
        },
        "isCrawlOpportunityAttachments": {
          "type": "boolean"
        },
        "isCrawlChatterAttachments": {
          "type": "boolean"
        },
        "isCrawlSolutionAttachments":{
          "type": "boolean"
        },
        "isCrawlTaskAttachments":{
          "type": "boolean"
        },
        "isCrawlCustomEntityAttachments":{
          "type": "boolean"
        },
        "isCrawlKnowledgeArticles": {
          "type": "object",
          "properties":
          {
            "isCrawlDraft": {
              "type": "boolean"
            },
            "isCrawlPublish": {
              "type": "boolean"
            },
            "isCrawlArchived": {
              "type": "boolean"
            }
          }
        },
        "inclusionDocumentFileTypePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "exclusionDocumentFileTypePatterns": {
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "inclusionDocumentFileNamePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "exclusionDocumentFileNamePatterns": {
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "inclusionAccountFileTypePatterns": {
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "exclusionAccountFileTypePatterns": {
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "inclusionAccountFileNamePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "exclusionAccountFileNamePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "inclusionCampaignFileTypePatterns": {
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "exclusionCampaignFileTypePatterns": {
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "inclusionCampaignFileNamePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "exclusionCampaignFileNamePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "inclusionCaseFileTypePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "exclusionCaseFileTypePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "inclusionCaseFileNamePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "exclusionCaseFileNamePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "inclusionContactFileTypePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "exclusionContactFileTypePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "inclusionContactFileNamePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "exclusionContactFileNamePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "inclusionContractFileTypePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "exclusionContractFileTypePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "inclusionContractFileNamePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "exclusionContractFileNamePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "inclusionLeadFileTypePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "exclusionLeadFileTypePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "inclusionLeadFileNamePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "exclusionLeadFileNamePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "inclusionOpportunityFileTypePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "exclusionOpportunityFileTypePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "inclusionOpportunityFileNamePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "exclusionOpportunityFileNamePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "inclusionSolutionFileTypePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "exclusionSolutionFileTypePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "inclusionSolutionFileNamePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "exclusionSolutionFileNamePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "inclusionTaskFileTypePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "exclusionTaskFileTypePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "inclusionTaskFileNamePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "exclusionTaskFileNamePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "inclusionGroupFileTypePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "exclusionGroupFileTypePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "inclusionGroupFileNamePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "exclusionGroupFileNamePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "inclusionChatterFileTypePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "exclusionChatterFileTypePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "inclusionChatterFileNamePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "exclusionChatterFileNamePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "inclusionCustomEntityFileTypePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "exclusionCustomEntityFileTypePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "inclusionCustomEntityFileNamePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        },
        "exclusionCustomEntityFileNamePatterns":{
          "type": "array",
          "items":
          {
            "type": "string"
          }
        }
      },
      "required":
      []
    },
    "enableIdentityCrawler": {
      "type": "boolean"
    },
    "type": {
      "type": "string",
      "pattern": "SALESFORCEV2"
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FULL_CRAWL",
        "FORCED_FULL_CRAWL",
        "CHANGE_LOG"
      ]
    },
    "secretArn": {
      "type": "string",
      "minLength": 20,
      "maxLength": 2048
    }
  },
  "version": {
    "type": "string",
    "anyOf": [
      {
        "pattern": "1.0.0"
      }
    ]
  },
  "required": [
    "connectionConfiguration",
    "repositoryConfigurations",
    "syncMode",
    "additionalProperties",
    "secretArn",
    "type"
  ]
}
```
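
The top-level rules at the end of the Salesforce schema (the `required` keys, the `type` pattern, the `syncMode` enum, and the `secretArn` length bounds) can be checked locally before you send a template to Amazon Kendra. The following is a minimal sketch that uses only the Python standard library. The function name `check_salesforce_template` and the sample values are hypothetical, and the check is not a substitute for full JSON Schema validation.

```python
# Hypothetical pre-flight check (illustration only). It mirrors just the
# top-level rules visible at the end of the Salesforce schema above.

REQUIRED_KEYS = [
    "connectionConfiguration",
    "repositoryConfigurations",
    "syncMode",
    "additionalProperties",
    "secretArn",
    "type",
]
SYNC_MODES = {"FULL_CRAWL", "FORCED_FULL_CRAWL", "CHANGE_LOG"}


def check_salesforce_template(template):
    """Return a list of problems found; an empty list means none were found."""
    problems = ["missing required key: " + key
                for key in REQUIRED_KEYS if key not in template]

    # The schema uses an unanchored pattern for "type"; this sketch is
    # stricter and expects an exact match.
    if "type" in template and template["type"] != "SALESFORCEV2":
        problems.append("type must be SALESFORCEV2")

    if "syncMode" in template and template["syncMode"] not in SYNC_MODES:
        problems.append("syncMode must be one of " + ", ".join(sorted(SYNC_MODES)))

    # secretArn has minLength 20 and maxLength 2048 in the schema.
    if "secretArn" in template and not 20 <= len(template["secretArn"]) <= 2048:
        problems.append("secretArn length must be between 20 and 2048")

    return problems


# Sample values are placeholders; nested configuration is omitted here.
sample = {
    "connectionConfiguration": {},
    "repositoryConfigurations": {},
    "additionalProperties": {},
    "syncMode": "FULL_CRAWL",
    "secretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    "type": "SALESFORCEV2",
}
print(check_salesforce_template(sample))  # prints []
```

The sketch returns a list of messages rather than raising an exception so that all problems can be reported in one pass.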

## ServiceNow template schema
<a name="ds-servicenow-schema"></a>

You include a JSON object that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object. You provide the ServiceNow host URL, authentication type, and instance version as part of the connection configuration or repository endpoint details. You also specify the type of data source as `SERVICENOWV2`, a secret for your authentication credentials, and any other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [ServiceNow JSON schema](#servicenow-json).
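
As a rough illustration of how these pieces fit together, the following Python sketch assembles a ServiceNow template and wraps it in the arguments for `CreateDataSource`. The helper names, the field-mapping names, and the sample identifiers are hypothetical. The `syncMode` value is an assumption borrowed from the `syncMode` enum in the Salesforce schema earlier on this page, and `additionalProperties` is left empty. The sketch only builds the request; the actual call through an SDK is shown as a comment.

```python
import json


def build_servicenow_template(host_url, auth_type, instance_version, secret_arn):
    """Assemble a template dict from the keys described in this section."""
    return {
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {
                "hostUrl": host_url,
                "authType": auth_type,  # "basicAuth" or "OAuth2"
                "servicenowInstanceVersion": instance_version,
            }
        },
        "repositoryConfigurations": {
            "knowledgeArticle": {
                "fieldMappings": [
                    {
                        # Hypothetical field names, for illustration only.
                        "indexFieldName": "sn_ka_category",
                        "indexFieldType": "STRING",
                        "dataSourceFieldName": "kb_category",
                    }
                ]
            }
        },
        "additionalProperties": {},  # optional settings omitted in this sketch
        "type": "SERVICENOWV2",
        "syncMode": "FULL_CRAWL",  # assumed value, see the note above
        "secretArn": secret_arn,
    }


def build_create_data_source_request(index_id, role_arn, name, template):
    """Return keyword arguments for a CreateDataSource call."""
    return {
        "IndexId": index_id,
        "Name": name,
        "Type": "TEMPLATE",
        "RoleArn": role_arn,
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }


# All identifiers below are placeholders.
template = build_servicenow_template(
    host_url="your-domain.service-now.com",
    auth_type="basicAuth",
    instance_version="Tokyo",
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
)
request = build_create_data_source_request(
    index_id="example-index-id",
    role_arn="arn:aws:iam::111122223333:role/example-kendra-role",
    name="example-servicenow-source",
    template=template,
)
print(json.dumps(request, indent=2))

# With the AWS SDK for Python installed and credentials configured:
# import boto3
# boto3.client("kendra").create_data_source(**request)
```

Keeping the request as a plain dictionary makes it possible to inspect or log the template before anything is sent to the service.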

The following table describes the parameters of the ServiceNow JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | The endpoint information for the data source. | 
| hostUrl | The ServiceNow host URL. For example, your-domain.service-now.com. | 
| authType | The type of authentication that you use, whether basicAuth or OAuth2. | 
| servicenowInstanceVersion | The ServiceNow version that you use. You can choose between Tokyo, Sandiego, Rome, and Others. | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. | 
| knowledgeArticle, attachment, serviceCatalog, incident | A list of objects that map the attributes or field names of your ServiceNow knowledge articles, attachments, service catalog, and incidents to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). The ServiceNow data source field names must exist in your ServiceNow custom metadata. | 
| additionalProperties | Additional configuration options for the content in your data source. | 
| maxFileSizeInMegaBytes | The maximum file size, in MB, that Amazon Kendra crawls. Amazon Kendra crawls only files within the size limit that you define. The default limit is 50 MB. The limit must be greater than 0 MB and less than or equal to 50 MB. | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | A list of regular expression patterns to include or exclude certain files in your ServiceNow data source. Files that match the inclusion patterns are included in the index. Files that don't match the inclusion patterns are excluded from the index. If a file matches both an inclusion pattern and an exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index. | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | true to crawl ServiceNow knowledge articles, service catalogs, incidents, and attachments. | 
| type | The type of data source. Specify SERVICENOWV2 as your data source type. | 
| enableIdentityCrawler | true to use the Amazon Kendra identity crawler to sync identity and principal information for users and groups with access to certain documents. If the identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and the identity crawler is turned off, you can instead use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/dg/API_PutPrincipalMapping.html) API to upload user and group access information. | 
| syncMode | Specify how Amazon Kendra should update your index when your data source content changes. You can choose between: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html)  | 
| secretArn | The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your ServiceNow instance. If you use basic authentication, the secret must contain a JSON structure with the following keys: <pre>{<br />    "username": "user name",<br />    "password": "password"<br />}</pre> If you use OAuth2 authentication, the secret must contain a JSON structure with the following keys: <pre>{<br />    "username": "user name",<br />    "password": "password",<br />    "clientId": "client id",<br />    "clientSecret": "client secret"<br />}</pre>  | 
| version | The version of the template that is currently supported. | 
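
Two of the rules in the table can be expressed as short checks: the secret keys expected for each `authType`, and the precedence of exclusion patterns over inclusion patterns. The following Python sketch is illustrative only. The function names are hypothetical, the use of `re.search` is an assumption about how patterns are matched, and treating an empty inclusion list as "no inclusion filter" is also an assumption.

```python
import re

BASIC_AUTH_KEYS = {"username", "password"}
OAUTH2_KEYS = BASIC_AUTH_KEYS | {"clientId", "clientSecret"}


def missing_secret_keys(auth_type, secret):
    """Return, sorted, the keys the table lists for auth_type that secret lacks."""
    if auth_type == "basicAuth":
        expected = BASIC_AUTH_KEYS
    elif auth_type == "OAuth2":
        expected = OAUTH2_KEYS
    else:
        raise ValueError("authType must be basicAuth or OAuth2")
    return sorted(expected - set(secret))


def is_indexed(file_name, inclusion_patterns, exclusion_patterns):
    """Apply the inclusion and exclusion rules described in the table."""
    # An exclusion match takes precedence over any inclusion match.
    if any(re.search(p, file_name) for p in exclusion_patterns):
        return False
    # Assumption: with no inclusion patterns, nothing is filtered out.
    if not inclusion_patterns:
        return True
    # Otherwise the file must match at least one inclusion pattern.
    return any(re.search(p, file_name) for p in inclusion_patterns)


# Placeholder values for illustration.
print(missing_secret_keys("OAuth2", {"username": "u", "password": "p"}))
# prints ['clientId', 'clientSecret']
print(is_indexed("draft-guide.pdf", [r"\.pdf$"], [r"^draft-"]))
# prints False
```

Checking for exclusion matches first keeps the precedence rule explicit: a file that matches both lists is never indexed.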

### ServiceNow JSON schema
<a name="servicenow-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
            "hostUrl": {
              "type": "string",
              "pattern": "^(?!(^(https?|ftp|file):\/\/))[a-z0-9-]+(.service-now.com|.servicenowservices.com)$",
              "minLength": 1,
              "maxLength": 2048
            },
            "authType": {
              "type": "string",
              "enum": [
                "basicAuth",
                "OAuth2"
              ]
            },
            "servicenowInstanceVersion": {
              "type": "string",
              "enum": [
                "Tokyo",
                "Sandiego",
                "Rome",
                "Others"
              ]
            }
          },
          "required": [
            "hostUrl",
            "authType",
            "servicenowInstanceVersion"
          ]
        }
      },
      "required": [
        "repositoryEndpointMetadata"
      ]
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "knowledgeArticle": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": [
                        "STRING",
                        "DATE",
                        "STRING_LIST"
                      ]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "attachment": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": [
                        "STRING",
                        "LONG",
                        "DATE",
                        "STRING_LIST"
                      ]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "serviceCatalog": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": [
                        "STRING",
                        "DATE",
                        "STRING_LIST"
                      ]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "incident": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": [
                        "STRING",
                        "DATE",
                        "STRING_LIST"
                      ]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        }
      }
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "maxFileSizeInMegaBytes": {
          "type": "string"
        },
        "isCrawlKnowledgeArticle": {
          "type": "boolean"
        },
        "isCrawlKnowledgeArticleAttachment": {
          "type": "boolean"
        },
        "includePublicArticlesOnly": {
          "type": "boolean"
        },
        "knowledgeArticleFilter": {
          "type": "string"
        },
        "incidentQueryFilter": {
          "type": "string"
        },
        "serviceCatalogQueryFilter": {
          "type": "string"
        },
        "isCrawlServiceCatalog": {
          "type": "boolean"
        },
        "isCrawlServiceCatalogAttachment": {
          "type": "boolean"
        },
        "isCrawlActiveServiceCatalog": {
          "type": "boolean"
        },
        "isCrawlInactiveServiceCatalog": {
          "type": "boolean"
        },
        "isCrawlIncident": {
          "type": "boolean"
        },
        "isCrawlIncidentAttachment": {
          "type": "boolean"
        },
        "isCrawlActiveIncident": {
          "type": "boolean"
        },
        "isCrawlInactiveIncident": {
          "type": "boolean"
        },
        "applyACLForKnowledgeArticle": {
          "type": "boolean"
        },
        "applyACLForServiceCatalog": {
          "type": "boolean"
        },
        "applyACLForIncident": {
          "type": "boolean"
        },
        "incidentStateType": {
          "type": "array",
          "items": {
            "type": "string",
            "enum": [
              "Open",
              "Open - Unassigned",
              "Resolved",
              "All"
            ]
          }
        },
        "knowledgeArticleTitleRegExp": {
          "type": "string"
        },
        "serviceCatalogTitleRegExp": {
          "type": "string"
        },
        "incidentTitleRegExp": {
          "type": "string"
        },
        "inclusionFileTypePatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionFileTypePatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "inclusionFileNamePatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionFileNamePatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        }
      },
      "required": []
    },
    "type": {
      "type": "string",
      "pattern": "SERVICENOWV2"
    },
    "enableIdentityCrawler": {
      "type": "boolean"
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FORCED_FULL_CRAWL",
        "FULL_CRAWL"
      ]
    },
    "secretArn": {
      "type": "string",
      "minLength": 20,
      "maxLength": 2048
    }
  },
  "version": {
    "type": "string",
    "anyOf": [
      {
        "pattern": "1.0.0"
      }
    ]
  },
  "required": [
    "connectionConfiguration",
    "repositoryConfigurations",
    "syncMode",
    "additionalProperties",
    "secretArn",
    "type"
  ]
}
```
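
Each `fieldMappings` entry in the schema above requires `indexFieldName`, `indexFieldType`, and `dataSourceFieldName`, and restricts `indexFieldType` to an enumerated set. As a rough illustration only, the following Python sketch checks a single entry against those two rules before you submit a template. The helper name `check_field_mapping` and the sample field names (`sn_kb_number`, `number`, `sn_kb_size`) are hypothetical and are not part of the Amazon Kendra API.

```python
REQUIRED_KEYS = ("indexFieldName", "indexFieldType", "dataSourceFieldName")


def check_field_mapping(mapping, allowed_types):
    """Return a list of problems found in one fieldMappings entry (empty if none)."""
    problems = [
        f"missing required key: {key}" for key in REQUIRED_KEYS if key not in mapping
    ]
    field_type = mapping.get("indexFieldType")
    if field_type is not None and field_type not in allowed_types:
        problems.append(f"unsupported indexFieldType: {field_type}")
    return problems


# The knowledgeArticle enum in the schema above allows these three types.
KNOWLEDGE_ARTICLE_TYPES = {"STRING", "DATE", "STRING_LIST"}

# Hypothetical entries, used only to exercise the check.
good = {
    "indexFieldName": "sn_kb_number",
    "indexFieldType": "STRING",
    "dataSourceFieldName": "number",
}
bad = {"indexFieldName": "sn_kb_size", "indexFieldType": "LONG"}

print(check_field_mapping(good, KNOWLEDGE_ARTICLE_TYPES))
print(check_field_mapping(bad, KNOWLEDGE_ARTICLE_TYPES))
```

The allowed types are passed in as a parameter because the enum differs between content types in the schema (for example, `attachment` also allows `LONG`).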

## Slack template schema
<a name="ds-schema-slack"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object. You provide the Slack team ID as part of the connection configuration or repository endpoint details. Also specify the type of data source as `SLACK`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [Slack JSON schema](#slack-json).

The following table describes the parameters of the Slack JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | The endpoint information for the data source. | 
| teamId | The Slack team ID you copied from your Slack main page URL. | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. | 
| All | A list of objects that map the attributes or field names of your Slack content to Amazon Kendra index field names.  | 
| additionalProperties | Additional configuration options for your content in your data source. | 
| inclusionPatterns | A list of regular expression patterns to include specific content in your Slack data source. Content that matches the patterns is included in the index. Content that doesn't match the patterns is excluded from the index. If any content matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the content isn't included in the index. | 
| exclusionPatterns | A list of regular expression patterns to exclude specific content in your Slack data source. Content that matches the patterns is excluded from the index. Content that doesn't match the patterns is included in the index. If any content matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the content isn't included in the index. | 
| crawlBotMessages | true to crawl bot messages. | 
| excludeArchived | true to exclude crawling of archived messages. | 
| conversationType | The types of conversation that you want to index: PUBLIC\_CHANNEL, PRIVATE\_CHANNEL, GROUP\_MESSAGE, and DIRECT\_MESSAGE. | 
| channelFilter | The type of channel that you want to index, either private\_channel or public\_channel. | 
| sinceDate | You can choose to configure a sinceDate parameter so that the Slack connector crawls content based on a specific sinceDate. | 
| lookBack | You can choose to configure a lookBack parameter so that the Slack connector crawls updated or deleted content up to a specified number of hours before your last connector sync. | 
| syncMode | Specify how Amazon Kendra should update your index when your data source content changes. You can choose between FORCED\_FULL\_CRAWL, FULL\_CRAWL, and CHANGE\_LOG. For details, see [the AWS documentation website](http://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html). | 
| type | The type of data source. Specify SLACK as your data source type. | 
| enableIdentityCrawler | true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/dg/API_PutPrincipalMapping.html) API to upload user and group access information. | 
| secretArn |  The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Slack. The secret must contain a JSON structure with the following keys: <pre>{<br />    "slackToken": "token"<br />}</pre>  | 
| version | The version of this template that's currently supported. | 
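
The precedence rule described for `inclusionPatterns` and `exclusionPatterns` can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the table, not the connector's implementation. It assumes that patterns are applied with `re.search` to a content name and that an empty inclusion list filters nothing out. Both are assumptions. The function name `is_indexed` and the sample names are hypothetical.

```python
import re


def is_indexed(name, inclusion_patterns, exclusion_patterns):
    """Decide whether content is indexed, with exclusion taking precedence."""
    # An exclusion match always wins, even if an inclusion pattern also matches.
    if any(re.search(pattern, name) for pattern in exclusion_patterns):
        return False
    # Assumption: with no inclusion patterns configured, nothing is filtered out.
    if not inclusion_patterns:
        return True
    # Otherwise the content must match at least one inclusion pattern.
    return any(re.search(pattern, name) for pattern in inclusion_patterns)


print(is_indexed("release-notes", ["release"], ["notes"]))  # matches both lists
print(is_indexed("release-plan", ["release"], ["notes"]))   # matches inclusion only
print(is_indexed("random", ["release"], []))                # matches neither list
```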

### Slack JSON schema
<a name="slack-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
            "teamId": {
              "type": "string"
            }
          },
          "required": ["teamId"]
        }
      }
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "All": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": ["STRING", "STRING_LIST", "DATE","LONG"]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": [
            "fieldMappings"
          ]
        }
      },
      "required": [
      ]
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "exclusionPatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "inclusionPatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "crawlBotMessages": {
          "type": "boolean"
        },
        "excludeArchived": {
          "type": "boolean"
        },
        "conversationType": {
          "type": "array",
          "items": {
            "type": "string",
            "enum": [
              "PUBLIC_CHANNEL",
              "PRIVATE_CHANNEL",
              "GROUP_MESSAGE",
              "DIRECT_MESSAGE"
            ]
          }
        },
        "channelFilter": {
          "type": "object",
          "properties": {
            "private_channel": {
              "type": "array",
              "items": {
                "type": "string"
              }
            },
            "public_channel": {
              "type": "array",
              "items": {
                "type": "string"
              }
            }
          }
        },
        "channelIdFilter": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "sinceDate": {
          "anyOf": [
            {
              "type": "string",
              "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$"
            },
            {
              "type": "string",
              "pattern": ""
            }
          ]
        },
        "lookBack": {
          "type": "string",
          "pattern": "^[0-9]*$"
        }
      },
      "required": [
      ]
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FORCED_FULL_CRAWL",
        "FULL_CRAWL",
        "CHANGE_LOG"
      ]
    },
    "type": {
      "type": "string",
      "pattern": "SLACK"
    },
    "enableIdentityCrawler": {
      "type": "boolean"
    },
    "secretArn": {
      "type": "string"
    }
  },
  "version": {
    "type": "string",
    "anyOf": [
      {
        "pattern": "1.0.0"
      }
    ]
  },
  "required": [
    "connectionConfiguration",
    "repositoryConfigurations",
    "syncMode",
    "additionalProperties",
    "secretArn",
    "type",
    "enableIdentityCrawler"
  ]
}
```
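
To make the schema above more concrete, the following Python sketch builds one possible Slack template as a dictionary and checks it against the top-level `required` list and the `sinceDate` pattern. Every value is a placeholder: the team ID, the secret ARN, and the field names `example_index_field` and `example_slack_field` are hypothetical. Placing `version` at the top level of the template is also an assumption.

```python
import json
import re

slack_template = {
    "connectionConfiguration": {
        "repositoryEndpointMetadata": {"teamId": "T0123456789"}  # placeholder team ID
    },
    "repositoryConfigurations": {
        "All": {
            "fieldMappings": [
                {
                    "indexFieldName": "example_index_field",       # hypothetical
                    "indexFieldType": "STRING",
                    "dataSourceFieldName": "example_slack_field",  # hypothetical
                }
            ]
        }
    },
    "additionalProperties": {
        "conversationType": ["PUBLIC_CHANNEL"],
        "crawlBotMessages": False,
        "excludeArchived": True,
        "sinceDate": "2023-01-01T00:00:00Z",
        "lookBack": "24",  # the schema types lookBack as a string of digits
    },
    "syncMode": "FULL_CRAWL",
    "type": "SLACK",
    "enableIdentityCrawler": True,
    # Placeholder ARN; use the ARN of your own Secrets Manager secret.
    "secretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-slack-secret",
    "version": "1.0.0",  # assumption: version sits at the top level
}

# Top-level keys listed under "required" in the schema above.
REQUIRED = [
    "connectionConfiguration", "repositoryConfigurations", "syncMode",
    "additionalProperties", "secretArn", "type", "enableIdentityCrawler",
]
missing = [key for key in REQUIRED if key not in slack_template]

# The first sinceDate alternative in the schema.
SINCE_DATE_PATTERN = r"^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$"
since_ok = re.match(
    SINCE_DATE_PATTERN, slack_template["additionalProperties"]["sinceDate"]
) is not None

print("missing required keys:", missing)
print("sinceDate matches pattern:", since_ok)
print(json.dumps(slack_template, indent=2))

# With boto3, the dictionary would then be passed to CreateDataSource, for example:
#   boto3.client("kendra").create_data_source(
#       Name=..., IndexId=..., RoleArn=..., Type="TEMPLATE",
#       Configuration={"TemplateConfiguration": {"Template": slack_template}},
#   )
```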

## Zendesk template schema
<a name="ds-schema-zendesk"></a>

You include a JSON that contains the data source schema as part of the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object. You provide the host URL as a part of the connection configuration or repository endpoint details. Also specify the type of data source as `ZENDESK`, a secret for your authentication credentials, and other necessary configurations. You then specify `TEMPLATE` as the `Type` when you call [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html).

You can use the template provided in this developer guide. See [Zendesk JSON schema](#zendesk-json).

The following table describes the parameters of the Zendesk JSON schema.


| Configuration | Description | 
| --- | --- | 
| connectionConfiguration | Configuration information for the endpoint for the data source. | 
| repositoryEndpointMetadata | The endpoint information for the data source. | 
| hostUrl | The Zendesk host URL. For example, https://yoursubdomain.zendesk.com. | 
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. | 
| ticket, ticketComment, ticketCommentAttachment, article, articleComment, articleAttachment, communityTopic, communityPostComment | A list of objects that map attributes or field names of your Zendesk content to Amazon Kendra index field names. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html). | 
| secretArn | The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Zendesk. The secret must contain a JSON structure with the following keys: host URL, client ID, client secret, user name, and password. | 
| additionalProperties | Additional configuration options for your content in your data source. | 
| organizationNameFilter | You can choose to index tickets that exist within a specific Organization. | 
| sinceDate | You can choose to configure a sinceDate parameter so that the Zendesk connector crawls content based on a specific sinceDate. | 
| inclusionPatterns | A list of regular expression patterns to include certain files in your Zendesk data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index. | 
| exclusionPatterns | A list of regular expression patterns to exclude certain files in your Zendesk data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index. | 
| isCrawTicket, isCrawTicketComment, isCrawTicketCommentAttachment, isCrawlArticle, isCrawlArticleAttachment, isCrawlArticleComment, isCrawlCommunityTopic, isCrawlCommunityPost, isCrawlCommunityPostComment | Input "true" to crawl these types of content. | 
| type | Specify ZENDESK as your data source type. | 
| useChangeLog | Input "true" to use the Zendesk change log to determine which documents require updating in the index. Depending on the change log's size, it might be faster to scan the documents in Zendesk. If you are syncing your Zendesk data source with your index for the first time, all documents are scanned. | 
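
The Zendesk JSON schema in this section constrains `sinceDate` to the form `yyyy-MM-dd HH:mm:ss`. As a small sketch, the following Python code formats a `datetime` that way and checks the result against the pattern copied from the schema. The helper name `format_since_date` is hypothetical, and the schema doesn't state which time zone the value is interpreted in, so this example doesn't assume one.

```python
import re
from datetime import datetime

# Pattern copied from the sinceDate property of the Zendesk JSON schema.
ZENDESK_SINCE_DATE_PATTERN = r"^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}$"


def format_since_date(moment):
    """Format a datetime as 'yyyy-MM-dd HH:mm:ss' for the Zendesk sinceDate field."""
    return moment.strftime("%Y-%m-%d %H:%M:%S")


since_date = format_since_date(datetime(2023, 1, 15, 8, 30, 0))
print(since_date)  # 2023-01-15 08:30:00
print(re.match(ZENDESK_SINCE_DATE_PATTERN, since_date) is not None)  # True

# An ISO 8601 value with 'T' and 'Z' does not match the Zendesk pattern.
print(re.match(ZENDESK_SINCE_DATE_PATTERN, "2023-01-15T08:30:00Z") is not None)  # False
```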

### Zendesk JSON schema
<a name="zendesk-json"></a>

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
            "hostUrl": {
              "type": "string",
              "pattern": "https:.*"
            }
          },
          "required": [
            "hostUrl"
          ]
        }
      },
      "required": [
        "repositoryEndpointMetadata"
      ]
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "ticket": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": {
                "anyOf": [
                  {
                    "type": "object",
                    "properties": {
                      "indexFieldName": {
                        "type": "string"
                      },
                      "indexFieldType": {
                        "type": "string",
                        "enum": ["STRING", "STRING_LIST", "LONG", "DATE"]
                      },
                      "dataSourceFieldName": {
                        "type": "string"
                      },
                      "dateFieldFormat": {
                        "type": "string",
                        "pattern": "dd-MM-yyyy HH:mm:ss"
                      }
                    },
                    "required": [
                      "indexFieldName",
                      "indexFieldType",
                      "dataSourceFieldName"
                    ]
                  }
                ]
              }
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "ticketComment": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": {
                "anyOf": [
                  {
                    "type": "object",
                    "properties": {
                      "indexFieldName": {
                        "type": "string"
                      },
                      "indexFieldType": {
                        "type": "string",
                        "enum": ["STRING", "STRING_LIST", "LONG", "DATE"]
                      },
                      "dataSourceFieldName": {
                        "type": "string"
                      },
                      "dateFieldFormat": {
                        "type": "string",
                        "pattern": "dd-MM-yyyy HH:mm:ss"
                      }
                    },
                    "required": [
                      "indexFieldName",
                      "indexFieldType",
                      "dataSourceFieldName"
                    ]
                  }
                ]
              }
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "ticketCommentAttachment": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": {
                "anyOf": [
                  {
                    "type": "object",
                    "properties": {
                      "indexFieldName": {
                        "type": "string"
                      },
                      "indexFieldType": {
                        "type": "string",
                        "enum": ["STRING", "STRING_LIST", "LONG", "DATE"]
                      },
                      "dataSourceFieldName": {
                        "type": "string"
                      },
                      "dateFieldFormat": {
                        "type": "string",
                        "pattern": "dd-MM-yyyy HH:mm:ss"
                      }
                    },
                    "required": [
                      "indexFieldName",
                      "indexFieldType",
                      "dataSourceFieldName"
                    ]
                  }
                ]
              }
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "article": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": {
                "anyOf": [
                  {
                    "type": "object",
                    "properties": {
                      "indexFieldName": {
                        "type": "string"
                      },
                      "indexFieldType": {
                        "type": "string",
                        "enum": ["STRING", "STRING_LIST", "LONG", "DATE"]
                      },
                      "dataSourceFieldName": {
                        "type": "string"
                      },
                      "dateFieldFormat": {
                        "type": "string",
                        "pattern": "dd-MM-yyyy HH:mm:ss"
                      }
                    },
                    "required": [
                      "indexFieldName",
                      "indexFieldType",
                      "dataSourceFieldName"
                    ]
                  }
                ]
              }
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "communityPostComment": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": {
                "anyOf": [
                  {
                    "type": "object",
                    "properties": {
                      "indexFieldName": {
                        "type": "string"
                      },
                      "indexFieldType": {
                        "type": "string",
                        "enum": ["STRING", "STRING_LIST", "LONG", "DATE"]
                      },
                      "dataSourceFieldName": {
                        "type": "string"
                      },
                      "dateFieldFormat": {
                        "type": "string",
                        "pattern": "dd-MM-yyyy HH:mm:ss"
                      }
                    },
                    "required": [
                      "indexFieldName",
                      "indexFieldType",
                      "dataSourceFieldName"
                    ]
                  }
                ]
              }
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "articleComment": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": {
                "anyOf": [
                  {
                    "type": "object",
                    "properties": {
                      "indexFieldName": {
                        "type": "string"
                      },
                      "indexFieldType": {
                        "type": "string",
                        "enum": ["STRING", "STRING_LIST", "LONG", "DATE"]
                      },
                      "dataSourceFieldName": {
                        "type": "string"
                      },
                      "dateFieldFormat": {
                        "type": "string",
                        "pattern": "dd-MM-yyyy HH:mm:ss"
                      }
                    },
                    "required": [
                      "indexFieldName",
                      "indexFieldType",
                      "dataSourceFieldName"
                    ]
                  }
                ]
              }
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "articleAttachment": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": {
                "anyOf": [
                  {
                    "type": "object",
                    "properties": {
                      "indexFieldName": {
                        "type": "string"
                      },
                      "indexFieldType": {
                        "type": "string",
                        "enum": ["STRING", "STRING_LIST", "LONG", "DATE"]
                      },
                      "dataSourceFieldName": {
                        "type": "string"
                      },
                      "dateFieldFormat": {
                        "type": "string",
                        "pattern": "dd-MM-yyyy HH:mm:ss"
                      }
                    },
                    "required": [
                      "indexFieldName",
                      "indexFieldType",
                      "dataSourceFieldName"
                    ]
                  }
                ]
              }
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "communityTopic": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": {
                "anyOf": [
                  {
                    "type": "object",
                    "properties": {
                      "indexFieldName": {
                        "type": "string"
                      },
                      "indexFieldType": {
                        "type": "string",
                        "enum": ["STRING", "STRING_LIST", "LONG", "DATE"]
                      },
                      "dataSourceFieldName": {
                        "type": "string"
                      },
                      "dateFieldFormat": {
                        "type": "string",
                        "pattern": "dd-MM-yyyy HH:mm:ss"
                      }
                    },
                    "required": [
                      "indexFieldName",
                      "indexFieldType",
                      "dataSourceFieldName"
                    ]
                  }
                ]
              }
            }
          },
          "required": [
            "fieldMappings"
          ]
        }
      }
    },
    "secretArn": {
      "type": "string",
      "minLength": 20,
      "maxLength": 2048
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "organizationNameFilter": {
          "type": "array"
        },
        "sinceDate": {
          "type": "string",
          "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}$"
        },
        "inclusionPatterns": {
          "type": "array"
        },
        "exclusionPatterns": {
          "type": "array"
        },
        "isCrawTicket": {
          "type": "string"
        },
        "isCrawTicketComment": {
          "type": "string"
        },
        "isCrawTicketCommentAttachment": {
          "type": "string"
        },
        "isCrawlArticle": {
          "type": "string"
        },
        "isCrawlArticleAttachment": {
          "type": "string"
        },
        "isCrawlArticleComment": {
          "type": "string"
        },
        "isCrawlCommunityTopic": {
          "type": "string"
        },
        "isCrawlCommunityPost": {
          "type": "string"
        },
        "isCrawlCommunityPostComment": {
          "type": "string"
        }
      }
    },
    "type": {
      "type": "string",
      "pattern": "ZENDESK"
    },
    "useChangeLog": {
      "type": "string",
      "enum": ["true", "false"]
    }
  },
  "version": {
    "type": "string",
    "anyOf": [
      {
        "pattern": "1.0.0"
      }
    ]
  },
  "additionalProperties": false,
  "required": [
    "connectionConfiguration",
    "repositoryConfigurations",
    "additionalProperties",
    "useChangeLog",
    "secretArn",
    "type"
  ]
}
```
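
As an illustration of the schema above, the following Python sketch assembles a minimal Zendesk template and checks a few of the constraints the schema expresses. The crawl flags and `useChangeLog` are typed as strings in the schema, so they are written as `"true"` and `"false"` rather than Booleans. The secret ARN, the field names `example_index_field` and `example_ticket_field`, and the helper `missing_required_keys` are hypothetical placeholders.

```python
import json
import re

zendesk_template = {
    "connectionConfiguration": {
        "repositoryEndpointMetadata": {"hostUrl": "https://yoursubdomain.zendesk.com"}
    },
    "repositoryConfigurations": {
        "ticket": {
            "fieldMappings": [
                {
                    "indexFieldName": "example_index_field",        # hypothetical
                    "indexFieldType": "STRING",
                    "dataSourceFieldName": "example_ticket_field",  # hypothetical
                }
            ]
        }
    },
    # Placeholder ARN; use the ARN of your own Secrets Manager secret.
    "secretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-zendesk-secret",
    "additionalProperties": {
        "sinceDate": "2023-01-15 08:30:00",
        "isCrawTicket": "true",  # key spelled as it appears in the schema
        "isCrawlArticle": "false",
    },
    "type": "ZENDESK",
    "useChangeLog": "false",
}

# Top-level keys listed under "required" in the schema above.
REQUIRED = [
    "connectionConfiguration", "repositoryConfigurations",
    "additionalProperties", "useChangeLog", "secretArn", "type",
]


def missing_required_keys(template):
    """Return the required top-level keys that the template does not define."""
    return [key for key in REQUIRED if key not in template]


host_url = zendesk_template["connectionConfiguration"]["repositoryEndpointMetadata"]["hostUrl"]

print("missing required keys:", missing_required_keys(zendesk_template))
print("hostUrl matches 'https:.*':", re.search(r"https:.*", host_url) is not None)
print("useChangeLog is valid:", zendesk_template["useChangeLog"] in ("true", "false"))
print(json.dumps(zendesk_template, indent=2))
```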

# Adobe Experience Manager
<a name="data-source-aem"></a>

**Note**  
Adobe Experience Manager connector remains fully supported for existing customers through May 31, 2026. While this connector is no longer available for new users, current users can continue to use it without interruption. We are continuously evolving our connector portfolio to offer more scalable and customizable solutions. For future integrations, we recommend exploring the Amazon Kendra Custom Connector Framework, designed to support a broader range of enterprise use cases with enhanced flexibility.

Adobe Experience Manager is a content management system that's used for creating website or mobile app content. You can use Amazon Kendra to connect to Adobe Experience Manager and index your pages and content assets.

Amazon Kendra supports Adobe Experience Manager (AEM) as a Cloud Service author instances and Adobe Experience Manager On-Premise author and publish instances.

You can connect Amazon Kendra to your Adobe Experience Manager data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) or the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API.

For troubleshooting your Amazon Kendra Adobe Experience Manager data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-aem)
+ [Prerequisites](#prerequisites-aem)
+ [Connection instructions](#data-source-procedure-aem)

## Supported features
<a name="supported-features-aem"></a>

The Adobe Experience Manager data source connector supports the following features:
+ Field mappings
+ User access control
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ OAuth 2.0 and basic authentication
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-aem"></a>

Before you can use Amazon Kendra to index your Adobe Experience Manager data source, make these changes in your Adobe Experience Manager and AWS accounts.

**In Adobe Experience Manager, make sure you have**:
+ Access to an account with administrative privileges, or an admin user.
+ Copied your Adobe Experience Manager host URL.
**Note**  
(On-premise/server) Amazon Kendra checks if the endpoint information included in AWS Secrets Manager is the same as the endpoint information specified in your data source configuration details. This helps protect against the [confused deputy problem](https://docs.aws.amazon.com/IAM/latest/UserGuide/confused-deputy.html), which is a security issue where a user doesn’t have permission to perform an action but uses Amazon Kendra as a proxy to access the configured secret and perform the action. If you later change your endpoint information, you must create a new secret to sync this information.
+ Noted your basic authentication credentials of admin user name and password.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ **Optional**: Configured OAuth 2.0 credentials in Adobe Experience Manager (AEM) as a Cloud Service or AEM On-Premise. If you use AEM On-Premise, the credentials include client ID, client secret, and private key. If you use AEM as a Cloud Service, the credentials include client ID, client secret, private key, organization ID, technical account ID, and Adobe Identity Management System (IMS) host. For more information about how to generate these credentials for AEM as a Cloud Service, see [Adobe Experience Manager documentation](https://experienceleague.adobe.com/docs/experience-manager-learn/getting-started-with-aem-headless/authentication/service-credentials.html). For AEM On-Premise, Adobe Granite OAuth 2.0 server implementation (com.adobe.granite.oauth.server) provides the support for OAuth 2.0 server functionalities in AEM.
+ Checked that each document is unique in Adobe Experience Manager and across other data sources you plan to use for the same index. The data sources that you use for an index must not contain the same document. Document IDs are global to an index and must be unique per index.
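
The uniqueness requirement in the last item can be checked before you add a data source to an index. The following Python sketch assumes that you can export the document IDs from each repository yourself; the function name `find_shared_document_ids` and the sample source names and IDs are hypothetical and are not part of Amazon Kendra.

```python
from collections import defaultdict


def find_shared_document_ids(ids_by_source):
    """Map each document ID found in more than one data source to those sources."""
    sources_by_id = defaultdict(set)
    for source, doc_ids in ids_by_source.items():
        for doc_id in doc_ids:
            sources_by_id[doc_id].add(source)
    # Keep only the IDs that would collide inside a single index.
    return {
        doc_id: sorted(sources)
        for doc_id, sources in sources_by_id.items()
        if len(sources) > 1
    }


# Hypothetical exports of document IDs, one list per planned data source.
shared = find_shared_document_ids({
    "aem": ["/content/site/en/home", "/content/dam/report.pdf"],
    "s3": ["/content/dam/report.pdf", "manuals/setup.pdf"],
})
print(shared)
```

An empty result means that no ID appears in more than one of the exported lists.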

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Adobe Experience Manager authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Adobe Experience Manager data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-aem"></a>

To connect Amazon Kendra to your Adobe Experience Manager data source, you must provide the necessary details of your Adobe Experience Manager data source so that Amazon Kendra can access your data. If you have not yet configured Adobe Experience Manager for Amazon Kendra, see [Prerequisites](#prerequisites-aem).

------
#### [ Console ]

**To connect Amazon Kendra to Adobe Experience Manager** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Adobe Experience Manager connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **Adobe Experience Manager connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. **Source**—Choose either **AEM On-Premise** or **AEM as a Cloud Service**.

      Enter your Adobe Experience Manager host URL. For example, if you use AEM On-Premise, you include the hostname and port: *https://hostname:port*. Or, if you use AEM as a Cloud Service, you can use the author URL: *https://author-xxxxxx-xxxxxxx.adobeaemcloud.com*.

   1. **SSL certificate location**—Enter the path to the SSL certificate stored in an Amazon S3 bucket. You use this to connect to AEM On-Premise with a secure SSL connection.

   1. **Authorization**—Turn on or off access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).

   1. **Authentication**—Choose **Basic authentication** or **OAuth 2.0 authentication**. Then choose an existing AWS Secrets Manager secret or create a new secret to store your Adobe Experience Manager credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

      If you chose **Basic authentication**, enter a name for the secret, the Adobe Experience Manager site user name and password. The user must have admin permission or be an admin user.

      If you chose **OAuth 2.0 authentication** and you use AEM On-Premise, enter a name for the secret, client ID, client secret, and private key. If you use AEM as a Cloud Service, enter a name for the secret, client ID, client secret, private key, organization ID, technical account ID, and Adobe Identity Management System (IMS) host.

      Save and add your secret.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. **Sync scope**—Set limits for crawling certain content types, page components, and root paths, and filter content using regular expression patterns.

      1. **Content types**—Choose whether to crawl only pages or assets, or both.

      1. (Optional) **Additional configuration**—Configure the following settings:
         + **Page components**—The specific names of page components. The Page Component is an extensible page component designed to work with the Adobe Experience Manager template editor and allows page header/footer and structure components to be assembled with the template editor.
         + **Content fragment variations**—The specific names of content fragment variations. Content Fragments allow you to design, create, curate and publish page-independent content in Adobe Experience Manager. They allow you to prepare content ready for use in multiple locations/over multiple channels.
         + **Root paths**—The root paths to specific content.
         + **Regex patterns**—The regular expression patterns to include or exclude certain pages and assets.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified sync: Index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. **Time zone ID**—If you use AEM On-Premise and the time zone of your server is different than the time zone of the Amazon Kendra AEM connector or index, you can specify the server time zone to align with the AEM connector or index. The default time zone for AEM On-Premise is the time zone of the Amazon Kendra AEM connector or index. The default time zone for AEM as a Cloud Service is Greenwich Mean Time.

   1. **Sync run schedule**, for **Frequency**—Choose how often to sync your data source content and update your index.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. Select from the Amazon Kendra generated default data source fields you want to map to your index. To add custom data source fields, create an index field name to map to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to Adobe Experience Manager**

You must specify a JSON of the [data source schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-aem-schema) using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API. You must provide the following information:
+ **Data source**—Specify the data source type as `AEM` when you use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **AEM host URL**—Specify the Adobe Experience Manager host URL. For example, if you use AEM On-Premise, you include the hostname and port: *https://hostname:port*. Or, if you use AEM as a Cloud Service, you can use the author URL: *https://author-xxxxxx-xxxxxxx.adobeaemcloud.com*.
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
  + `CHANGE_LOG` to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Authentication type**—Specify which type of authentication you want to use, either `Basic` or `OAuth2`.
+ **AEM type**—Specify which type of Adobe Experience Manager you use, either `CLOUD` or `ON_PREMISE`.
+ **Secret Amazon Resource Name (ARN)**—If you want to use basic authentication for either AEM On-Premise or Cloud, you provide a secret that stores your authentication credentials of your user name and password. You provide the Amazon Resource Name (ARN) of an AWS Secrets Manager secret. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "aemUrl": "Adobe Experience Manager On-Premise host URL",
      "username": "user name with admin permissions",
      "password": "password with admin permissions"
  }
  ```

  If you want to use OAuth 2.0 authentication for AEM On-Premise, the secret is stored in a JSON structure with the following keys:

  ```
  {
      "aemUrl": "Adobe Experience Manager host URL",
      "clientId": "client ID",
      "clientSecret": "client secret",
      "privateKey": "private key"
  }
  ```

  If you want to use OAuth 2.0 authentication for AEM as a Cloud Service, the secret is stored in a JSON structure with the following keys:

  ```
  {
      "clientId": "client ID",
      "clientSecret": "client secret",
      "privateKey": "private key",
      "orgId": "organization ID",
      "technicalAccountId": "technical account ID",
      "imsHost": "Adobe Identity Management System (IMS) host"
  }
  ```
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Adobe Experience Manager connector and Amazon Kendra. For more information, see [IAM roles for Adobe Experience Manager data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
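
The secret structures above can be assembled and checked locally before you store them in AWS Secrets Manager. The following Python sketch is illustrative only: the helper names `build_aem_secret` and `endpoint_matches` are hypothetical, the sample URL and credentials are placeholders, and the trailing-slash handling is an assumption about a convenient local check, not a description of how Amazon Kendra compares endpoints. The key names and the `Basic`, `OAuth2`, `CLOUD`, and `ON_PREMISE` values are the ones listed in this section.

```python
import json

# Keys expected in the secret for each (authentication type, AEM type) pair,
# copied from the JSON structures shown in this section.
REQUIRED_KEYS = {
    ("Basic", "ON_PREMISE"): {"aemUrl", "username", "password"},
    ("Basic", "CLOUD"): {"aemUrl", "username", "password"},
    ("OAuth2", "ON_PREMISE"): {"aemUrl", "clientId", "clientSecret", "privateKey"},
    ("OAuth2", "CLOUD"): {
        "clientId", "clientSecret", "privateKey",
        "orgId", "technicalAccountId", "imsHost",
    },
}


def build_aem_secret(auth_type, aem_type, **fields):
    """Return the secret as a JSON string, or raise ValueError if keys are missing."""
    try:
        required = REQUIRED_KEYS[(auth_type, aem_type)]
    except KeyError:
        raise ValueError(f"unsupported combination: {auth_type}, {aem_type}") from None
    missing = required - set(fields)
    if missing:
        raise ValueError(f"secret is missing keys: {sorted(missing)}")
    return json.dumps(fields)


def endpoint_matches(secret_json, host_url):
    """Local pre-check: does the aemUrl in the secret equal the configured host URL?"""
    secret_url = json.loads(secret_json).get("aemUrl", "")
    # Assumption: ignore a trailing slash so that otherwise equal URLs compare equal.
    return secret_url.rstrip("/") == host_url.rstrip("/")


# Placeholder values for an AEM On-Premise instance with basic authentication.
secret = build_aem_secret(
    "Basic", "ON_PREMISE",
    aemUrl="https://aem.example.com:4502",
    username="example-admin",
    password="example-password",
)
print(endpoint_matches(secret, "https://aem.example.com:4502"))
```

Running a check like this before you create the data source can surface a missing key or a mismatched endpoint early, rather than during the first sync.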

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+ **Time zone ID**—If you use AEM On-Premise and the time zone of your server is different than the time zone of the Amazon Kendra AEM connector or index, you can specify the server time zone to align with the AEM connector or index.

  The default time zone for AEM On-Premise is the time zone of the Amazon Kendra AEM connector or index. The default time zone for AEM as a Cloud Service is Greenwich Mean Time.

  For information about the supported time zone IDs, see [Adobe Experience Manager JSON schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#aem-json).
+ **Inclusion and exclusion filters**—Specify whether to include or exclude certain pages and assets.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+ **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.
+  **Field mappings**—Choose to map your Adobe Experience Manager data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.
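
The inclusion and exclusion rules described in the filter note above can be sketched in a few lines of Python. This is a model of the stated precedence only. The function name `is_indexed` and the sample patterns are hypothetical, and the use of `re.search` (a match anywhere in the path) is an assumption, because this section does not specify how the connector anchors its patterns.

```python
import re


def is_indexed(path, include_patterns=(), exclude_patterns=()):
    """Apply the precedence described above: an exclusion match always wins."""
    if any(re.search(pattern, path) for pattern in exclude_patterns):
        return False
    if include_patterns:
        # With an inclusion filter, only matching content is indexed.
        return any(re.search(pattern, path) for pattern in include_patterns)
    # No inclusion filter: everything that was not excluded is indexed.
    return True


# Hypothetical patterns: index the site, but skip anything under a drafts folder.
include = [r"^/content/site/"]
exclude = [r"/drafts/"]
for candidate in ("/content/site/en/home.html", "/content/site/drafts/new.html"):
    print(candidate, is_indexed(candidate, include, exclude))
```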

For a list of other important JSON keys to configure, see [Adobe Experience Manager template schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-aem-schema).
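
As a rough illustration of how the required values fit together, the following Python sketch assembles the arguments for a `CreateDataSource` call without sending anything. The function name `build_aem_request` is hypothetical, and the key names and nesting inside the template (`syncMode`, `secretArn`, `connectionConfiguration`, `repositoryEndpointMetadata`, `authType`, `deploymentType`) are assumptions that you should confirm against the Adobe Experience Manager template schema. The values `AEM`, `TEMPLATE`, the sync modes, the authentication types, and the AEM types are the ones listed in this section.

```python
# Values listed in this section.
SYNC_MODES = {"FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG"}
AUTH_TYPES = {"Basic", "OAuth2"}
AEM_TYPES = {"CLOUD", "ON_PREMISE"}


def build_aem_request(index_id, name, role_arn, secret_arn, aem_url,
                      auth_type="Basic", aem_type="ON_PREMISE",
                      sync_mode="FORCED_FULL_CRAWL", vpc_configuration=None):
    """Return keyword arguments for a CreateDataSource call (nothing is sent)."""
    if sync_mode not in SYNC_MODES:
        raise ValueError(f"unknown sync mode: {sync_mode}")
    if auth_type not in AUTH_TYPES:
        raise ValueError(f"unknown authentication type: {auth_type}")
    if aem_type not in AEM_TYPES:
        raise ValueError(f"unknown AEM type: {aem_type}")

    template = {
        # ASSUMPTION: the key names and nesting below are illustrative.
        # Confirm them against the Adobe Experience Manager template schema.
        "type": "AEM",
        "syncMode": sync_mode,
        "secretArn": secret_arn,
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {
                "aemUrl": aem_url,
                "authType": auth_type,
                "deploymentType": aem_type,
            }
        },
    }
    request = {
        "IndexId": index_id,
        "Name": name,
        "Type": "TEMPLATE",
        "RoleArn": role_arn,
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }
    if vpc_configuration is not None:
        # Expected shape: {"SubnetIds": [...], "SecurityGroupIds": [...]}
        request["VpcConfiguration"] = vpc_configuration
    return request


# Placeholder identifiers; replace them with your own index ID and ARNs.
request = build_aem_request(
    index_id="example-index-id",
    name="aem-data-source",
    role_arn="arn:aws:iam::111122223333:role/example-kendra-role",
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    aem_url="https://aem.example.com:4502",
)
print(sorted(request))
```

If you use the AWS SDK for Python (Boto3), you could pass the result to `boto3.client("kendra").create_data_source(**request)`. Keeping the request assembly separate from the call makes it possible to review or validate the template before anything is created.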

------

# Alfresco
<a name="data-source-alfresco"></a>

**Note**  
Alfresco connector remains fully supported for existing customers through May 31, 2026. While this connector is no longer available for new users, current users can continue to use it without interruption. We are continuously evolving our connector portfolio to offer more scalable and customizable solutions. For future integrations, we recommend exploring the Amazon Kendra Custom Connector Framework, designed to support a broader range of enterprise use cases with enhanced flexibility.

Alfresco is a content management service that helps customers store and manage their content. You can use Amazon Kendra to index your Alfresco Document library, Wiki, and Blog.

Amazon Kendra supports Alfresco On-Premises and Alfresco Cloud (Platform as a Service).

You can connect Amazon Kendra to your Alfresco data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) or the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API.

For troubleshooting your Amazon Kendra Alfresco data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-alfresco)
+ [Prerequisites](#prerequisites-alfresco)
+ [Connection instructions](#data-source-procedure-alfresco)
+ [Learn more](#alfresco-learn-more)

## Supported features
<a name="supported-features-alfresco"></a>

Amazon Kendra Alfresco data source connector supports the following features:
+ Field mappings
+ User access control
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ OAuth 2.0 and basic authentication
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-alfresco"></a>

Before you can use Amazon Kendra to index your Alfresco data source, make these changes in your Alfresco and AWS accounts.

**In Alfresco, make sure you have:**
+ Copied your Alfresco repository URL and web application URL. If you only want to index a specific Alfresco site, then also copy the site ID.
+ Noted your Alfresco authentication credentials, which include a user name and password with at least read permissions. If you want to use OAuth 2.0 authentication, you should add the user to the Alfresco administrators group.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ **Optional**: Configured OAuth 2.0 credentials in Alfresco. The credentials include client ID, client secret, and token URL. For more information on how to configure clients for Alfresco On-Premises, see [Alfresco documentation](https://docs.alfresco.com/identity-service/latest/tutorial/sso/saml/). If you use Alfresco Cloud (PaaS), you must contact [Hyland support](https://community.hyland.com/) for Alfresco OAuth 2.0 authentication.
+ Checked that each document is unique in Alfresco and across other data sources you plan to use for the same index. The data sources that you use for an index must not contain the same document. Document IDs are global to an index and must be unique per index.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Alfresco authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Alfresco data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-alfresco"></a>

To connect Amazon Kendra to your Alfresco data source, you must provide the necessary details of your Alfresco data source so that Amazon Kendra can access your data. If you have not yet configured Alfresco for Amazon Kendra, see [Prerequisites](#prerequisites-alfresco).

------
#### [ Console ]

**To connect Amazon Kendra to Alfresco**

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Alfresco connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **Alfresco connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. **Alfresco type**—Choose whether you use Alfresco On-Premises/server or Alfresco Cloud (Platform as a Service).

   1. **Alfresco repository URL**—Enter your Alfresco repository URL. For example, if you use Alfresco Cloud (PaaS), the repository URL could be *https://company.alfrescocloud.com*. Or, if you use Alfresco On-Premises, the repository URL could be *https://company-alfresco-instance.company-domain.suffix:port*.

   1. **Alfresco user application URL**—Enter your Alfresco user interface URL. You can get this URL from your Alfresco administrator. For example, the user interface URL could be *https://example.com*.

   1. **SSL certificate location**—Enter the path to the SSL certificate stored in an Amazon S3 bucket. You use this to connect to Alfresco On-Premises with a secure SSL connection.

   1. **Authorization**—Turn on or off access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).

   1. **Authentication**—Choose **Basic authentication** or **OAuth 2.0 authentication**. Then choose an existing Secrets Manager secret or create a new secret to store your Alfresco credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

      If you chose **Basic authentication**, enter a name for the secret, the Alfresco user name, and password.

      If you chose **OAuth 2.0 authentication**, enter a name for the secret, client ID, client secret, and token URL.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. **Sync scope**—Set limits for crawling certain content and filter content using regular expression patterns.

      1. **Content**—Choose whether to crawl content marked with 'Aspects' in Alfresco, content within a specific Alfresco site, or content across all your Alfresco sites.

      1. (Optional) **Additional configuration**—Configure the following settings:
         + **Include comments**—Choose to include comments in Alfresco Document library and Blog.
         + **Regex patterns**—Regular expression patterns to include or exclude certain files.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. In **Sync run schedule**, for **Frequency**—Choose how often to sync your data source content and update your index.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. Select from the Amazon Kendra generated default data source fields that you want to map to your index.

   1. To add custom data source fields, create an index field name to map to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to Alfresco**

You must specify a JSON of the [data source schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-alfresco-schema) using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API. You must provide the following information:
+ **Data source**—Specify the data source type as `ALFRESCO` when you use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **Alfresco site ID**—Specify the Alfresco site ID.
+ **Alfresco repository URL**—Specify the Alfresco repository URL. You can get the repository URL from your Alfresco administrator. For example, if you use Alfresco Cloud (PaaS), the repository URL could be *https://company.alfrescocloud.com*. Or, if you use Alfresco On-Premises, the repository URL could be *https://company-alfresco-instance.company-domain.suffix:port*.
+ **Alfresco web application URL**—Specify the Alfresco user interface URL. You can get the user interface URL from your Alfresco administrator. For example, the user interface URL could be *https://example.com*.
+ **Authentication type**—Specify which type of authentication you want to use, whether `OAuth2` or `Basic`.
+ **Alfresco type**—Specify which type of Alfresco you use, whether `PAAS` (Cloud/Platform as a Service) or `ON_PREM` (On-Premises).
+ **Secret Amazon Resource Name (ARN)**—If you want to use basic authentication, you provide a secret that stores your authentication credentials of your user name and password. You provide the Amazon Resource Name (ARN) of an AWS Secrets Manager secret. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "username": "user name",
      "password": "password"
  }
  ```

  If you want to use OAuth 2.0 authentication, the secret is stored in a JSON structure with the following keys:

  ```
  {
      "clientId": "client ID",
      "clientSecret": "client secret",
      "tokenUrl": "token URL"
  }
  ```
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Alfresco connector and Amazon Kendra. For more information, see [IAM roles for Alfresco data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
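
As a minimal sketch of the two secret layouts above, the following Python builds the JSON string for either authentication type. The helper name `build_alfresco_secret` and the sample values are illustrative assumptions; only the key names come from the structures shown above.

```python
import json


def build_alfresco_secret(auth_type, **values):
    """Return the secret JSON string for the given authentication type.

    Hypothetical helper: only the key names are taken from this guide.
    """
    required = {
        "Basic": ("username", "password"),
        "OAuth2": ("clientId", "clientSecret", "tokenUrl"),
    }
    if auth_type not in required:
        raise ValueError(f"Unsupported authentication type: {auth_type}")
    missing = [key for key in required[auth_type] if key not in values]
    if missing:
        raise ValueError(f"Missing secret keys: {missing}")
    # Keep only the keys that belong to this authentication type.
    return json.dumps({key: values[key] for key in required[auth_type]})


# Placeholder credentials for illustration only.
secret_string = build_alfresco_secret(
    "Basic", username="example-user", password="example-password"
)
print(secret_string)
```

One way to store the result is to pass it as `SecretString` when you create the secret in Secrets Manager, and then note the ARN of the new secret.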

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+ **Content type**—The type of content that you want to crawl, whether content marked with 'Aspects' in Alfresco, content within a specific Alfresco site, or content across all your Alfresco sites. You can also list specific 'Aspects' content.
+ **Inclusion and exclusion filters**—Specify whether to include or exclude certain files.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.
+  **Field mappings**—Choose to map your Alfresco data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.
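
The precedence between inclusion and exclusion filters described in the note above can be sketched in a few lines of Python. This is an illustration only: it assumes Python `re.search` semantics, which might differ from the pattern matching that Amazon Kendra applies, and the function name and sample paths are hypothetical.

```python
import re


def is_indexed(path, inclusion_patterns=(), exclusion_patterns=()):
    """Apply the filter precedence described in the note above."""
    # Exclusion wins: content that matches an exclusion filter is never indexed.
    if any(re.search(p, path) for p in exclusion_patterns):
        return False
    # With inclusion filters, only matching content is indexed.
    if inclusion_patterns:
        return any(re.search(p, path) for p in inclusion_patterns)
    # No inclusion filter: everything that is not excluded is indexed.
    return True


include = [r"\.pdf$"]
exclude = [r"/drafts/"]
for path in ["/site/report.pdf", "/site/drafts/report.pdf", "/site/notes.txt"]:
    print(path, is_indexed(path, include, exclude))
```

Testing patterns this way before a sync can help you spot a filter that excludes more content than you intended.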

For a list of other important JSON keys to configure, see [Alfresco template schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-alfresco-schema).

------

## Learn more
<a name="alfresco-learn-more"></a>

To learn more about integrating Amazon Kendra with your Alfresco data source, see:
+ [Intelligently search Alfresco content using Amazon Kendra](https://aws.amazon.com/blogs/machine-learning/intelligently-search-alfresco-content-using-amazon-kendra/)

# Aurora (MySQL)
<a name="data-source-aurora-mysql"></a>

**Note**  
The Aurora (MySQL) connector remains fully supported for existing customers through May 31, 2026. While this connector is no longer available for new users, current users can continue to use it without interruption. We are continuously evolving our connector portfolio to offer more scalable and customizable solutions. For future integrations, we recommend exploring the Amazon Kendra Custom Connector Framework, designed to support a broader range of enterprise use cases with enhanced flexibility.

Aurora is a relational database management system (RDBMS) built for the cloud. If you are an Aurora user, you can use Amazon Kendra to index your Aurora (MySQL) data source. The Amazon Kendra Aurora (MySQL) data source connector supports Aurora MySQL 3 and Aurora Serverless MySQL 8.0.

You can connect Amazon Kendra to your Aurora (MySQL) data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) and the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API.

For troubleshooting your Amazon Kendra Aurora (MySQL) data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-aurora-mysql)
+ [Prerequisites](#prerequisites-aurora-mysql)
+ [Connection instructions](#data-source-procedure-aurora-mysql)
+ [Notes](#aurora-mysql-notes)

## Supported features
<a name="supported-features-aurora-mysql"></a>
+ Field mappings
+ User context filtering
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-aurora-mysql"></a>

Before you can use Amazon Kendra to index your Aurora (MySQL) data source, make these changes in your Aurora (MySQL) and AWS accounts.

**In Aurora (MySQL), make sure you have:**
+ Noted your database user name and password.
**Important**  
As a best practice, provide Amazon Kendra with read-only database credentials.
+ Copied your database host URL, port, and instance. You can find this information on the Amazon RDS console.
+ Checked that each document is unique in Aurora (MySQL) and across other data sources you plan to use for the same index. The data sources that you use for an index must not contain the same document. Document IDs are global to an index and must be unique per index.
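
One way to check the uniqueness requirement before you sync is to compare the document IDs that each data source would contribute. The sketch below assumes you can export those IDs yourself; the function name, source names, and IDs are hypothetical.

```python
from collections import defaultdict


def find_shared_document_ids(ids_by_source):
    """Map each document ID found in more than one data source to those sources."""
    sources_by_id = defaultdict(set)
    for source, doc_ids in ids_by_source.items():
        for doc_id in doc_ids:
            sources_by_id[doc_id].add(source)
    return {
        doc_id: sorted(sources)
        for doc_id, sources in sources_by_id.items()
        if len(sources) > 1
    }


# Hypothetical ID lists exported from two data sources planned for the same index.
ids_by_source = {
    "aurora-mysql": ["1001", "1002", "1003"],
    "s3-bucket": ["report.pdf", "1002"],
}
print(find_shared_document_ids(ids_by_source))
```

Any ID that this check reports would need to be renamed or removed from one of the data sources before they share an index.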

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Aurora (MySQL) authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Aurora (MySQL) data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-aurora-mysql"></a>

To connect Amazon Kendra to your Aurora (MySQL) data source, you must provide details of your Aurora (MySQL) credentials so that Amazon Kendra can access your data. If you have not yet configured Aurora (MySQL) for Amazon Kendra, see [Prerequisites](#prerequisites-aurora-mysql).

------
#### [ Console ]

**To connect Amazon Kendra to Aurora (MySQL)** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Aurora (MySQL) connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **Aurora (MySQL) connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. In **Source**, enter the following information:

   1.  **Host** – Enter the database host URL, for example: `http://instance URL.region.rds.amazonaws.com`.

   1.  **Port** – Enter the database port, for example, `3306`.

   1.  **Instance** – Enter the database instance.

   1. In **Authentication**—enter the following information:

      1. **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your Aurora (MySQL) authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

        1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

           1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-Aurora (MySQL)-’ is automatically added to your secret name.

           1. For **Database user name**, and **Password**—Enter the authentication credential values you copied from your database. 

        1. Choose **Save**.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. In **Sync scope**, choose from the following options:
      + **SQL query**—Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB and not contain any semicolons (;). Amazon Kendra will crawl all database content that matches your query.
      + **Primary key column**—Provide the primary key for the database table. This identifies a table within your database.
      + **Title column**—Provide the name of the document title column within your database table.
      + **Body column**—Provide the name of the document body column within your database table.

   1. In **Additional configuration – *optional***, choose from the following options to sync specific content instead of syncing all files:
      + **Change-detecting columns**—Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns.
      + **Users' IDs column**—Enter the name of the column which contains User IDs to be allowed access to content.
      + **Groups column**—Enter the name of the column that contains groups to be allowed access to content.
      + **Source URLs column**—Enter the name of the column which contains Source URLs to be indexed.
      + **Time stamps column**—Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content. 
      + **Time zones column**—Enter the name of the column which contains time zones for the content to be crawled.
      + **Time stamps format**—Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified sync: Index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. In **Sync run schedule**, for **Frequency**—How often Amazon Kendra will sync with your data source.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. Select from the generated default data source fields—**Document IDs**, **Document titles**, and **Source URLs**—that you want to map to your Amazon Kendra index.

   1.  **Add field**—To add custom data source fields, create an index field name to map to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to Aurora (MySQL)**

You must specify the following using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API:
+ **Data source**—Specify the data source type as `JDBC` when you use the [https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **Database type**—You must specify the database type as `mySql`.
+ **SQL query**—Specify SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
  + `CHANGE_LOG` to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials you created in your Aurora (MySQL) account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "user name": "database user name",
      "password": "password"
  }
  ```
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Aurora (MySQL) connector and Amazon Kendra. For more information, see [IAM roles for Aurora (MySQL) data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
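
Before you submit a SQL query, you might check it against the documented limits. The following sketch applies the 32KB size limit from the list above together with the no-semicolon rule from the console instructions. It assumes that 32KB means 32 * 1024 bytes of UTF-8 text; the function name and the table and column names in the sample query are hypothetical.

```python
MAX_QUERY_BYTES = 32 * 1024  # assumption: "32KB" means 32 * 1024 bytes of UTF-8 text


def validate_sql_query(query):
    """Return a list of problems found in the query; an empty list means none."""
    problems = []
    if len(query.encode("utf-8")) >= MAX_QUERY_BYTES:
        problems.append("query must be less than 32 KB")
    # Simple text check: a semicolon inside a string literal is also flagged.
    if ";" in query:
        problems.append("query must not contain semicolons")
    return problems


query = (
    "SELECT a.id, a.title, a.body FROM articles a "
    "JOIN authors w ON a.author_id = w.id"
)
print(validate_sql_query(query))
```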

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+  **Inclusion and exclusion filters**—You can specify whether to include specific content using user IDs, groups, source URLs, time stamps, and time zones. 
+  **User context filtering and access control**—Amazon Kendra crawls the access control list (ACL) for your documents, if you have an ACL for your documents. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#datasource-context-filter).
+  **Field mappings**—Choose to map your Aurora (MySQL) data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.

For a list of other important JSON keys to configure, see [Aurora (MySQL) template schema](ds-schemas.md#ds-aurora-mysql-schema).
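
As a rough sketch of how the settings above fit into one `CreateDataSource` request, the following Python assembles the request parameters as a dictionary. The helper name, ARNs, and IDs are placeholders. The top-level template keys `type`, `syncMode`, and `secretArn` are assumptions that you should confirm against the template schema, and the remaining keys (database type, host, SQL query, and column names) are left to the caller because their exact names are defined in that schema.

```python
def build_create_data_source_request(
    name,
    index_id,
    role_arn,
    secret_arn,
    sync_mode,
    extra_template_keys=None,
    subnet_ids=None,
    security_group_ids=None,
):
    """Assemble keyword arguments for the Amazon Kendra CreateDataSource API."""
    if sync_mode not in {"FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG"}:
        raise ValueError(f"Unknown sync mode: {sync_mode}")
    # Assumed top-level template key names; confirm them in the template schema.
    template = {"type": "JDBC", "syncMode": sync_mode, "secretArn": secret_arn}
    template.update(extra_template_keys or {})
    request = {
        "Name": name,
        "IndexId": index_id,
        "Type": "TEMPLATE",
        "RoleArn": role_arn,
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }
    # The VPC settings are optional and sit at the top level of the request.
    if subnet_ids and security_group_ids:
        request["VpcConfiguration"] = {
            "SubnetIds": subnet_ids,
            "SecurityGroupIds": security_group_ids,
        }
    return request


# Placeholder identifiers for illustration only.
request = build_create_data_source_request(
    name="aurora-mysql-docs",
    index_id="example-index-id",
    role_arn="arn:aws:iam::111122223333:role/example-kendra-role",
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    sync_mode="CHANGE_LOG",
)
print(request["Type"], request["Configuration"]["TemplateConfiguration"]["Template"]["type"])
```

With the AWS SDK for Python (Boto3), you could send the result with `boto3.client("kendra").create_data_source(**request)` after you add the schema keys that your database requires.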

------

## Notes
<a name="aurora-mysql-notes"></a>
+ Deleted database rows will not be tracked when Amazon Kendra checks for updated content.
+ The size of field names and values in a row of your database can't exceed 400KB.
+ If you have a large amount of data in your database data source, and do not want Amazon Kendra to index all your database content after the first sync, you can choose to sync only new, modified, or deleted documents.
+ As a best practice, provide Amazon Kendra with read-only database credentials.
+ As a best practice, avoid adding tables with sensitive data or personally identifiable information (PII).
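
To find rows that might exceed the 400KB limit noted above, you could estimate each row's size before syncing. This sketch assumes that the size is the UTF-8 byte length of every field name plus its value rendered as text, and that 400KB means 400 * 1024 bytes. Amazon Kendra might measure row size differently, so treat the result as an approximation. The function names and sample rows are hypothetical.

```python
MAX_ROW_BYTES = 400 * 1024  # assumption: "400KB" means 400 * 1024 bytes


def row_size_bytes(row):
    """Approximate a row's size as the UTF-8 bytes of its field names and values."""
    return sum(
        len(str(name).encode("utf-8")) + len(str(value).encode("utf-8"))
        for name, value in row.items()
    )


def oversized_rows(rows):
    """Return the positions of rows whose approximate size exceeds the limit."""
    return [i for i, row in enumerate(rows) if row_size_bytes(row) > MAX_ROW_BYTES]


# Hypothetical rows: the second one has a body larger than the limit.
rows = [
    {"id": 1, "title": "Short", "body": "A small document."},
    {"id": 2, "title": "Long", "body": "x" * (500 * 1024)},
]
print(oversized_rows(rows))
```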

# Aurora (PostgreSQL)
<a name="data-source-aurora-postgresql"></a>

**Note**  
The Aurora (PostgreSQL) connector remains fully supported for existing customers through May 31, 2026. While this connector is no longer available for new users, current users can continue to use it without interruption. We are continuously evolving our connector portfolio to offer more scalable and customizable solutions. For future integrations, we recommend exploring the Amazon Kendra Custom Connector Framework, designed to support a broader range of enterprise use cases with enhanced flexibility.

Aurora is a relational database management system (RDBMS) built for the cloud. If you are an Aurora user, you can use Amazon Kendra to index your Aurora (PostgreSQL) data source. The Amazon Kendra Aurora (PostgreSQL) data source connector supports Aurora PostgreSQL 1.

You can connect Amazon Kendra to your Aurora (PostgreSQL) data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) and the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API.

For troubleshooting your Amazon Kendra Aurora (PostgreSQL) data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-aurora-postgresql)
+ [Prerequisites](#prerequisites-aurora-postgresql)
+ [Connection instructions](#data-source-procedure-aurora-postgresql)
+ [Notes](#aurora-postgresql-notes)

## Supported features
<a name="supported-features-aurora-postgresql"></a>
+ Field mappings
+ User context filtering
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-aurora-postgresql"></a>

Before you can use Amazon Kendra to index your Aurora (PostgreSQL) data source, make these changes in your Aurora (PostgreSQL) and AWS accounts.

**In Aurora (PostgreSQL), make sure you have:**
+ Noted your database user name and password.
**Important**  
As a best practice, provide Amazon Kendra with read-only database credentials.
+ Copied your database host URL, port, and instance.
+ Checked that each document is unique in Aurora (PostgreSQL) and across other data sources you plan to use for the same index. The data sources that you use for an index must not contain the same document. Document IDs are global to an index and must be unique per index.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Aurora (PostgreSQL) authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Aurora (PostgreSQL) data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-aurora-postgresql"></a>

To connect Amazon Kendra to your Aurora (PostgreSQL) data source, you must provide details of your Aurora (PostgreSQL) credentials so that Amazon Kendra can access your data. If you have not yet configured Aurora (PostgreSQL) for Amazon Kendra, see [Prerequisites](#prerequisites-aurora-postgresql).

------
#### [ Console ]

**To connect Amazon Kendra to Aurora (PostgreSQL)** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Aurora (PostgreSQL) connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **Aurora (PostgreSQL) connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. In **Source**, enter the following information:

   1.  **Host** – Enter the database host URL, for example: `http://instance URL.region.rds.amazonaws.com`.

   1.  **Port** – Enter the database port, for example, `5432`.

   1.  **Instance** – Enter the database instance, for example `postgres`.

   1. **Enable SSL certificate location**—Choose to enter the Amazon S3 path to your SSL certificate file.

   1. In **Authentication**—enter the following information:

      1. **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your Aurora (PostgreSQL) authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

        1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

           1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-Aurora (PostgreSQL)-’ is automatically added to your secret name.

           1. For **Database user name**, and **Password**—Enter the authentication credential values you copied from your database. 

        1. Choose **Save**.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. In **Sync scope**, choose from the following options:
      + **SQL query**—Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB and not contain any semicolons (;). Amazon Kendra will crawl all database content that matches your query.
      + **Primary key column**—Provide the primary key for the database table. This identifies a table within your database.
      + **Title column**—Provide the name of the document title column within your database table.
      + **Body column**—Provide the name of the document body column within your database table.

   1. In **Additional configuration – *optional***, choose from the following options to sync specific content instead of syncing all files:
      + **Change-detecting columns**—Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns.
      + **Users' IDs column**—Enter the name of the column which contains User IDs to be allowed access to content.
      + **Groups column**—Enter the name of the column that contains groups to be allowed access to content.
      + **Source URLs column**—Enter the name of the column which contains Source URLs to be indexed.
      + **Time stamps column**—Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content. 
      + **Time zones column**—Enter the name of the column which contains time zones for the content to be crawled.
      + **Time stamps format**—Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified sync: Index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. In **Sync run schedule**, for **Frequency**—How often Amazon Kendra will sync with your data source.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. Select from the generated default data source fields—**Document IDs**, **Document titles**, and **Source URLs**—that you want to map to your Amazon Kendra index.

   1.  **Add field**—To add custom data source fields, create an index field name to map to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.
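
If you configure the connector in the console and later automate it through the API, it can help to keep the console sync mode labels and their API values side by side. The mapping below pairs the labels in this procedure with the `FORCED_FULL_CRAWL`, `FULL_CRAWL`, and `CHANGE_LOG` values whose descriptions match in this guide. The dictionary and function names are hypothetical.

```python
# Console sync mode labels paired with the API values that share their descriptions.
CONSOLE_TO_API_SYNC_MODE = {
    "Full sync": "FORCED_FULL_CRAWL",
    "New, modified sync": "CHANGE_LOG",
    "New, modified, deleted sync": "FULL_CRAWL",
}


def api_sync_mode(console_label):
    """Translate a console sync mode label into its API value."""
    try:
        return CONSOLE_TO_API_SYNC_MODE[console_label]
    except KeyError:
        raise ValueError(f"Unknown sync mode label: {console_label!r}") from None


print(api_sync_mode("New, modified sync"))
```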

------
#### [ API ]

**To connect Amazon Kendra to Aurora (PostgreSQL)**

You must specify the following using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API:
+ **Data source**—Specify the data source type as `JDBC` when you use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **Database type**—You must specify the database type as `postgresql`.
+ **SQL query**—Specify SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
  + `CHANGE_LOG` to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the authentication credentials you created in your Aurora (PostgreSQL) account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "user name": "database user name",
      "password": "password"
  }
  ```
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Aurora (PostgreSQL) connector and Amazon Kendra. For more information, see [IAM roles for Aurora (PostgreSQL) data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
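
The required settings above can be checked locally before you call the API. The following Python code is a minimal sketch, not an official validator. The helper names `validate_sync_mode`, `validate_sql_query`, and `build_secret_string` are hypothetical, and the code assumes that the 32 KB limit applies to the UTF-8 encoded size of the query, which this guide does not specify.

```python
import json

# Sync modes listed above for the Aurora (PostgreSQL) connector.
ALLOWED_SYNC_MODES = {"FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG"}

# Assumption: "less than 32KB" is measured on the UTF-8 encoded query.
MAX_SQL_QUERY_BYTES = 32 * 1024


def validate_sync_mode(sync_mode: str) -> str:
    """Return sync_mode unchanged, or raise if it is not a listed mode."""
    if sync_mode not in ALLOWED_SYNC_MODES:
        raise ValueError(f"Unsupported sync mode: {sync_mode!r}")
    return sync_mode


def validate_sql_query(sql_query: str) -> str:
    """Return sql_query unchanged, or raise if it is empty or too large."""
    if not sql_query.strip():
        raise ValueError("SQL query must not be empty")
    size = len(sql_query.encode("utf-8"))
    if size >= MAX_SQL_QUERY_BYTES:
        raise ValueError(
            f"SQL query is {size} bytes; it must be less than {MAX_SQL_QUERY_BYTES}"
        )
    return sql_query


def build_secret_string(user_name: str, password: str) -> str:
    """Serialize credentials using the secret keys shown above."""
    return json.dumps({"user name": user_name, "password": password})
```

You could pass the string returned by `build_secret_string` as the secret value when you create the secret in AWS Secrets Manager.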

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+  **Inclusion and exclusion filters**—You can specify whether to include specific content using user IDs, groups, source URLs, time stamps, and time zones. 
+  **User context filtering and access control**—Amazon Kendra crawls the access control list (ACL) for your documents, if you have an ACL for your documents. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#datasource-context-filter).
+  **Field mappings**—Choose to map your Aurora (PostgreSQL) data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.

For a list of other important JSON keys to configure, see [Aurora (PostgreSQL) template schema](ds-schemas.md#ds-aurora-postgresql-schema).
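
The following Python code sketches how the settings above might be assembled into a `CreateDataSource` request with the AWS SDK for Python (Boto3). It is an illustration under stated assumptions, not a complete configuration. The values `JDBC`, `postgresql`, and `TEMPLATE` come from this page, but the template key names (for example `connectionConfiguration` and `sqlQuery`) are assumptions that you should check against the template schema, which also lists other keys that you might need. The helper function names are hypothetical.

```python
from typing import Any, Dict, Optional


def build_jdbc_template(
    secret_arn: str,
    db_host: str,
    db_port: str,
    db_instance: str,
    sql_query: str,
    sync_mode: str = "FULL_CRAWL",
) -> Dict[str, Any]:
    """Build a partial template document for an Aurora (PostgreSQL) source.

    Key names are assumptions; verify them against the template schema.
    """
    return {
        "type": "JDBC",
        "syncMode": sync_mode,
        "secretArn": secret_arn,
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {
                "dbType": "postgresql",
                "dbHost": db_host,
                "dbPort": db_port,
                "dbInstance": db_instance,
            }
        },
        "additionalProperties": {"sqlQuery": sql_query},
    }


def build_create_data_source_request(
    name: str,
    index_id: str,
    role_arn: str,
    template: Dict[str, Any],
    vpc_configuration: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for the Boto3 create_data_source call."""
    request: Dict[str, Any] = {
        "Name": name,
        "IndexId": index_id,
        "Type": "TEMPLATE",
        "RoleArn": role_arn,
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }
    if vpc_configuration is not None:
        # Optional: {"SubnetIds": [...], "SecurityGroupIds": [...]}
        request["VpcConfiguration"] = vpc_configuration
    return request


def create_data_source(request: Dict[str, Any]) -> str:
    """Send the request to Amazon Kendra and return the data source ID."""
    import boto3  # imported here so the builders work without AWS access

    client = boto3.client("kendra")
    return client.create_data_source(**request)["Id"]
```

Keeping the request builders separate from the network call lets you inspect or unit test the request before you send it with `create_data_source`.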

------

## Notes
<a name="aurora-postgresql-notes"></a>
+ Deleted database rows are not tracked when Amazon Kendra checks for updated content.
+ The size of field names and values in a row of your database can't exceed 400KB.
+ If you have a large amount of data in your database data source, and do not want Amazon Kendra to index all your database content after the first sync, you can choose to sync only new, modified, or deleted documents.
+ As a best practice, provide Amazon Kendra with read-only database credentials.
+ As a best practice, avoid adding tables with sensitive data or personally identifiable information (PII).

# Amazon FSx (Windows)
<a name="data-source-fsx"></a>

Amazon FSx (Windows) is a fully managed, cloud-based file server system that offers shared storage capabilities. If you're an Amazon FSx (Windows) user, you can use Amazon Kendra to index your Amazon FSx (Windows) data source.

**Note**  
Amazon Kendra now supports an upgraded Amazon FSx (Windows) connector.  
The console has been automatically upgraded for you. Any new connectors you create on the console will use the upgraded architecture. If you use the API, you must now use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object instead of the `FSxConfiguration` object to configure your connector.  
Connectors configured using the older console and API architecture will continue to function as configured. However, you won’t be able to edit or update them. If you want to edit or update your connector configuration, you must create a new connector.  
We recommend migrating your connector workflow to the upgraded version. Support for connectors configured using the older architecture is scheduled to end by June 2024.

You can connect Amazon Kendra to your Amazon FSx (Windows) data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) or the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API.

For troubleshooting your Amazon Kendra Amazon FSx (Windows) data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-fsx)
+ [Prerequisites](#prerequisites-fsx)
+ [Connection instructions](#data-source-procedure-fsx)
+ [Learn more](#fsx-learn-more)

## Supported features
<a name="supported-features-fsx"></a>

The Amazon Kendra Amazon FSx (Windows) data source connector supports the following features:
+ Field mappings
+ User access control
+ User identity crawling
+ Inclusion and exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-fsx"></a>

Before you can use Amazon Kendra to index your Amazon FSx (Windows) data source, check the details of your Amazon FSx (Windows) and AWS accounts.

**For Amazon FSx (Windows), make sure you have**:
+ Set up Amazon FSx (Windows) with read and mounting permissions.
+ Noted your file system ID. You can find your file system ID on the File Systems dashboard in the Amazon FSx (Windows) console.
+ Configured a virtual private cloud using Amazon VPC where your Amazon FSx (Windows) file system resides.
+ Noted your Amazon FSx (Windows) authentication credentials for an Active Directory user account. This includes your Active Directory user name with your DNS domain name (for example, *user@corp.example.com*) and password.
**Note**  
Use only the necessary credentials required for the connector to function. Do not use privileged credentials like domain admin.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ Checked that each document is unique in Amazon FSx (Windows) and across other data sources you plan to use for the same index. The data sources that you use for an index must not contain the same document. Document IDs are global to an index and must be unique per index.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Amazon FSx (Windows) authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Amazon FSx (Windows) data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-fsx"></a>

To connect Amazon Kendra to your Amazon FSx (Windows) data source, you must provide the necessary details of your Amazon FSx (Windows) data source so that Amazon Kendra can access your data. If you have not yet configured Amazon FSx (Windows) for Amazon Kendra, see [Prerequisites](#prerequisites-fsx).

------
#### [ Console ]

**To connect Amazon Kendra to your Amazon FSx (Windows) file system** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Amazon FSx (Windows) connector**, and then choose **Add connector**. If you are using version 2 (where applicable), choose **Amazon FSx (Windows) connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter a description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. **Amazon FSx (Windows) file system ID**—Select from the dropdown your existing file system ID, fetched from Amazon FSx (Windows). Or, create an [Amazon FSx (Windows) file system](https://console.aws.amazon.com/fsx/). You can find your file system ID on the File Systems dashboard in the Amazon FSx (Windows) console.

   1. **Authorization**—Turn on or off access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).

   1. **Authentication**—Choose an existing AWS Secrets Manager secret, or create a new secret to store your file system credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

      Provide a secret that stores your user name and password. The user name must include your DNS domain name, for example, *user@corp.example.com*.

      Save and add your secret.

   1. **Virtual Private Cloud (VPC)**—You must select the Amazon VPC where your Amazon FSx (Windows) file system resides. You must also include the VPC subnet and security groups. See [Configuring an Amazon VPC](https://docs.aws.amazon.com/kendra/latest/dg/vpc-configuration.html).

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. **Sync scope, Regex patterns**—Add regular expression patterns to include or exclude certain files.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. **Sync run schedule**—For **Frequency**, choose how often to sync your data source content and update your index.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. From the default fields that Amazon Kendra generates for your files, select the fields that you want to map to your index. To add custom data source fields, create an index field name to map to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to your Amazon FSx (Windows) file system**

You must specify a JSON of the [data source schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-fsx-schema) using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API. You must provide the following information:
+ **Data source**—Specify the data source type as `FSX` when you use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **File system ID**—The identifier of the Amazon FSx (Windows) file system. You can find your file system ID on the File Systems dashboard in the Amazon FSx (Windows) console.
+ **File system type**—Specify the type of file system as `WINDOWS`.
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
**Note**  
You must select the Amazon VPC where your Amazon FSx (Windows) file system resides. You must also include the VPC subnet and security groups.
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the authentication credentials for your Amazon FSx (Windows) account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "username": "user@corp.example.com",
      "password": "password"
  }
  ```
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Amazon FSx (Windows) connector and Amazon Kendra. For more information, see [IAM roles for Amazon FSx (Windows) data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
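
The secret structure above can be built and stored programmatically. The following Python code is a minimal sketch with hypothetical helper names (`build_fsx_secret_string` and `store_secret`). The check that the user name has the form `user@domain` is a simplifying assumption about what "includes your DNS domain name" means, and the sketch uses the Boto3 Secrets Manager `create_secret` call to store the value.

```python
import json


def build_fsx_secret_string(username: str, password: str) -> str:
    """Serialize Active Directory credentials using the keys shown above.

    Assumption: a user name that includes the DNS domain name looks like
    user@corp.example.com, so require non-empty parts around "@".
    """
    local_part, separator, domain = username.partition("@")
    if not (local_part and separator and domain):
        raise ValueError(
            "User name must include the DNS domain name, "
            "for example user@corp.example.com"
        )
    return json.dumps({"username": username, "password": password})


def store_secret(secret_name: str, secret_string: str) -> str:
    """Create the secret in AWS Secrets Manager and return its ARN."""
    import boto3  # imported here so the builder works without AWS access

    client = boto3.client("secretsmanager")
    response = client.create_secret(Name=secret_name, SecretString=secret_string)
    return response["ARN"]
```

The ARN returned by `store_secret` is the value that you provide as the secret ARN for the connector.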

You can also add the following optional features:
+ **Inclusion and exclusion filters**—Specify whether to include or exclude certain files.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+ **Access control list (ACL)**—Specify whether to crawl ACL information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).
**Note**  
To test user context filtering on a user, you must include the DNS domain name as part of the user name when you issue the query. You must have administrative permissions for the Active Directory domain. You can also test user context filtering on a group name.
+  **Field mappings**—Choose to map your Amazon FSx (Windows) data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.
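
The precedence rules for inclusion and exclusion filters described in the note above can be sketched in Python as follows. This is only an illustration of the documented behavior, not the connector's implementation. The function name `is_indexed` is hypothetical, and the use of Python's `re.search` is an assumption, because this guide does not say which regular expression dialect or matching mode the connector uses.

```python
import re
from typing import Sequence


def is_indexed(
    path: str,
    inclusion_patterns: Sequence[str] = (),
    exclusion_patterns: Sequence[str] = (),
) -> bool:
    """Return True if a file path would be indexed under the filter rules."""
    # An exclusion match wins, even if the path also matches an inclusion filter.
    if any(re.search(pattern, path) for pattern in exclusion_patterns):
        return False
    # With inclusion filters present, only matching content is indexed.
    if inclusion_patterns:
        return any(re.search(pattern, path) for pattern in inclusion_patterns)
    # With no inclusion filters, everything not excluded is indexed.
    return True
```

For example, with the inclusion pattern `\.pdf$` and the exclusion pattern `^drafts/`, the path `reports/q1.pdf` is indexed, but `drafts/q1.pdf` and `reports/q1.txt` are not.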

For a list of other important JSON keys to configure, see [Amazon FSx (Windows) template schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-fsx-windows-schema).
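
The following Python code sketches how the Amazon FSx (Windows) settings above might be assembled into a `CreateDataSource` request for the AWS SDK for Python (Boto3). Treat it as an illustration under stated assumptions. The values `FSX`, `WINDOWS`, and `TEMPLATE` and the sync modes come from this page, but the template key names (for example `fileSystemId`, `isCrawlAcl`, and `enableIdentityCrawler`) are assumptions that you should verify against the template schema. The helper function names are hypothetical.

```python
from typing import Any, Dict, Optional, Sequence

# Sync modes listed above for the Amazon FSx (Windows) connector.
FSX_SYNC_MODES = {"FORCED_FULL_CRAWL", "FULL_CRAWL"}


def build_fsx_windows_template(
    file_system_id: str,
    secret_arn: str,
    sync_mode: str = "FULL_CRAWL",
    crawl_acl: bool = True,
    enable_identity_crawler: bool = True,
    inclusion_patterns: Optional[Sequence[str]] = None,
    exclusion_patterns: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Build a partial template document; key names are assumptions."""
    if sync_mode not in FSX_SYNC_MODES:
        raise ValueError(f"Unsupported sync mode: {sync_mode!r}")
    return {
        "type": "FSX",
        "syncMode": sync_mode,
        "secretArn": secret_arn,
        "enableIdentityCrawler": enable_identity_crawler,
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {
                "fileSystemId": file_system_id,
                "fileSystemType": "WINDOWS",
            }
        },
        "additionalProperties": {
            "isCrawlAcl": crawl_acl,
            "inclusionPatterns": list(inclusion_patterns or []),
            "exclusionPatterns": list(exclusion_patterns or []),
        },
    }


def build_create_data_source_request(
    name: str,
    index_id: str,
    role_arn: str,
    template: Dict[str, Any],
    subnet_ids: Sequence[str],
    security_group_ids: Sequence[str],
) -> Dict[str, Any]:
    """Build keyword arguments for the Boto3 create_data_source call."""
    return {
        "Name": name,
        "IndexId": index_id,
        "Type": "TEMPLATE",
        "RoleArn": role_arn,
        "Configuration": {"TemplateConfiguration": {"Template": template}},
        # The VPC where the file system resides, with its subnet and security groups.
        "VpcConfiguration": {
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        },
    }
```

You could then send the request with `boto3.client("kendra").create_data_source(**request)`, which returns the ID of the new data source.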

------

## Learn more
<a name="fsx-learn-more"></a>

To learn more about integrating Amazon Kendra with your Amazon FSx (Windows) data source, see:
+ [Securely search unstructured data on Windows file systems with the Amazon Kendra connector for Amazon FSx for Windows File Server](https://aws.amazon.com/blogs/machine-learning/securely-search-unstructured-data-on-windows-file-systems-with-amazon-kendra-connector-for-amazon-fsx-for-windows-file-server/).

# Amazon FSx (NetApp ONTAP)
<a name="data-source-fsx-ontap"></a>

Amazon FSx (NetApp ONTAP) is a fully managed, cloud-based file server system that offers shared storage capabilities. If you're an Amazon FSx (NetApp ONTAP) user, you can use Amazon Kendra to index your Amazon FSx (NetApp ONTAP) data source.

You can connect Amazon Kendra to your Amazon FSx (NetApp ONTAP) data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) or the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API.

For troubleshooting your Amazon Kendra Amazon FSx (NetApp ONTAP) data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-fsx-ontap)
+ [Prerequisites](#prerequisites-fsx-ontap)
+ [Connection instructions](#data-source-procedure-fsx-ontap)

## Supported features
<a name="supported-features-fsx-ontap"></a>

The Amazon Kendra Amazon FSx (NetApp ONTAP) data source connector supports the following features:
+ Field mappings
+ User access control
+ Inclusion and exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-fsx-ontap"></a>

Before you can use Amazon Kendra to index your Amazon FSx (NetApp ONTAP) data source, check the details of your Amazon FSx (NetApp ONTAP) and AWS accounts.

**For Amazon FSx (NetApp ONTAP), make sure you have**:
+ Set up Amazon FSx (NetApp ONTAP) with read and mounting permissions.
+ Noted your file system ID. You can find your file system ID on the File Systems dashboard in the Amazon FSx (NetApp ONTAP) console.
+ Noted the storage virtual machine (SVM) ID used with your file system. You can find your SVM ID by going to the File Systems dashboard in the Amazon FSx (NetApp ONTAP) console, selecting your file system ID, and then selecting **Storage virtual machines**.
+ Configured a virtual private cloud using Amazon VPC where your Amazon FSx (NetApp ONTAP) file system resides.
+ Noted your Amazon FSx (NetApp ONTAP) authentication credentials for an Active Directory user account. This includes your Active Directory user name with your DNS domain name (for example, *user@corp.example.com*) and password. If you use the Network File System (NFS) protocol for your Amazon FSx (NetApp ONTAP) file system, the authentication credentials include a left ID, right ID, and pre-shared key.
**Note**  
Use only the necessary credentials required for the connector to function. Do not use privileged credentials like domain admin.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ Checked that each document is unique in Amazon FSx (NetApp ONTAP) and across other data sources you plan to use for the same index. The data sources that you use for an index must not contain the same document. Document IDs are global to an index and must be unique per index.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Amazon FSx (NetApp ONTAP) authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Amazon FSx (NetApp ONTAP) data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-fsx-ontap"></a>

To connect Amazon Kendra to your Amazon FSx (NetApp ONTAP) data source, you must provide the necessary details of your Amazon FSx (NetApp ONTAP) data source so that Amazon Kendra can access your data. If you have not yet configured Amazon FSx (NetApp ONTAP) for Amazon Kendra, see [Prerequisites](#prerequisites-fsx-ontap).

------
#### [ Console ]

**To connect Amazon Kendra to your Amazon FSx (NetApp ONTAP) file system** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Amazon FSx (NetApp ONTAP) connector**, and then choose **Add connector**. If you are using version 2 (where applicable), choose **Amazon FSx (NetApp ONTAP) connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter a description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. **Source**—Provide your file system information.
      + **File system protocol**—Choose the protocol of your Amazon FSx (NetApp ONTAP) file system. You can choose either the Common Internet File System (CIFS) protocol or the Network File System (NFS) protocol for Linux.
      + **Amazon FSx (NetApp ONTAP) file system ID**—Select from the dropdown your existing file system ID, fetched from Amazon FSx (NetApp ONTAP). Or, create an [Amazon FSx (NetApp ONTAP) file system](https://console.aws.amazon.com/fsx/). You can find your file system ID on the File Systems dashboard in the Amazon FSx (NetApp ONTAP) console.
      + **SVM ID**—Provide the storage virtual machine (SVM) ID of your Amazon FSx (NetApp ONTAP) file system. You can find your SVM ID by going to the File Systems dashboard in the Amazon FSx (NetApp ONTAP) console, selecting your file system ID, and then selecting **Storage virtual machines**.

   1. **Authorization**—Turn on or off access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).

   1. **Authentication**—Choose an existing AWS Secrets Manager secret, or create a new secret to store your file system credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

      Provide a secret that stores your user name and password. The user name must include your DNS domain name, for example, *user@corp.example.com*.

      If you use the NFS protocol for your Amazon FSx (NetApp ONTAP) file system, provide a secret that stores your left ID, right ID, and pre-shared key.

      Save and add your secret.

   1. **Virtual Private Cloud (VPC)**—You must select the Amazon VPC where your Amazon FSx (NetApp ONTAP) file system resides. You must also include the VPC subnet and security groups. See [Configuring an Amazon VPC](https://docs.aws.amazon.com/kendra/latest/dg/vpc-configuration.html).

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. **Sync scope, Regex patterns**—Add regular expression patterns to include or exclude certain files.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. **Sync run schedule**—For **Frequency**, choose how often to sync your data source content and update your index.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. Select from the Amazon Kendra generated default fields of your files that you want to map to your index. To add custom data source fields, create an index field name to map to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to your Amazon FSx (NetApp ONTAP) file system**

You must specify a JSON of the [data source schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-fsx-ontap-schema) using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API. You must provide the following information:
+ **Data source**—Specify the data source type as `FSXONTAP` when you use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **File system ID**—The identifier of the Amazon FSx (NetApp ONTAP) file system. You can find your file system ID on the File Systems dashboard in the Amazon FSx (NetApp ONTAP) console.
+ **SVM ID**—The storage virtual machine (SVM) ID used with your file system. You can find your SVM ID by going to the File Systems dashboard in the Amazon FSx (NetApp ONTAP) console, selecting your file system ID, and then selecting **Storage virtual machines**.
+ **Protocol type**—Specify whether you use the Common Internet File System (CIFS) protocol, or the Network File System (NFS) protocol for Linux.
+ **File system type**—Specify the type of file system as `FSXONTAP`.
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
**Note**  
You must select an Amazon VPC where your Amazon FSx (NetApp ONTAP) resides. You include the VPC subnet and security groups.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the authentication credentials for your Amazon FSx (NetApp ONTAP) account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "username": "user@corp.example.com",
      "password": "password"
  }
  ```

  If you use the NFS protocol for your Amazon FSx (NetApp ONTAP) file system, the secret is stored in a JSON structure with the following keys:

  ```
  {
      "leftId": "left ID",
      "rightId": "right ID",
      "preSharedKey": "pre-shared key"
  }
  ```
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Amazon FSx (NetApp ONTAP) connector and Amazon Kendra. For more information, see [IAM roles for Amazon FSx (NetApp ONTAP) data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
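
The two secret structures above can be sketched as a small helper that builds and checks the JSON before you store it in AWS Secrets Manager. This is a minimal illustration, not part of any AWS SDK: the function name `build_fsx_ontap_secret` and the `"CIFS"` and `"NFS"` labels are hypothetical, and it assumes the user name and password form applies when you don't use the NFS protocol.

```python
import json

# Keys each secret must contain, per the JSON structures shown above.
# The "CIFS" and "NFS" labels are illustrative names for this sketch only.
REQUIRED_SECRET_KEYS = {
    "CIFS": ("username", "password"),
    "NFS": ("leftId", "rightId", "preSharedKey"),
}


def build_fsx_ontap_secret(protocol, **credentials):
    """Return the secret as a JSON string for the given protocol."""
    if protocol not in REQUIRED_SECRET_KEYS:
        raise ValueError(f"Unsupported protocol: {protocol!r}")
    keys = REQUIRED_SECRET_KEYS[protocol]
    missing = [key for key in keys if not credentials.get(key)]
    if missing:
        raise ValueError(f"Missing secret keys for {protocol}: {missing}")
    # Keep only the expected keys so stray arguments never reach the secret.
    return json.dumps({key: credentials[key] for key in keys})


cifs_secret = build_fsx_ontap_secret(
    "CIFS", username="user@corp.example.com", password="password"
)
nfs_secret = build_fsx_ontap_secret(
    "NFS", leftId="left ID", rightId="right ID", preSharedKey="pre-shared key"
)
print(cifs_secret)
print(nfs_secret)
```

Validating the keys locally before you create the secret can surface a missing value earlier than a failed sync would.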

You can also add the following optional features:
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Inclusion and exclusion filters**—Specify whether to include or exclude certain files.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+ **Access control list (ACL)**—Specify whether to crawl ACL information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).
**Note**  
To test user context filtering on a user, you must include the DNS domain name as part of the user name when you issue the query. You must have administrative permissions of the Active Directory domain. You can also test user context filtering on a group name.
+  **Field mappings**—Choose to map your Amazon FSx (NetApp ONTAP) data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.

For a list of other important JSON keys to configure, see [Amazon FSx (NetApp ONTAP) template schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-fsx-ontap-schema).
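
The inclusion and exclusion filter rules described above can be illustrated with a short sketch. It only models the precedence (exclusion wins over inclusion). The function name `is_indexed` is hypothetical, and using `re.search` is an assumption, because the exact matching semantics that Amazon Kendra applies to patterns are not described here.

```python
import re


def is_indexed(path, inclusion_patterns=(), exclusion_patterns=()):
    """Return True if a document path would pass the filters."""
    # A document that matches any exclusion filter is never indexed,
    # even if it also matches an inclusion filter.
    if any(re.search(pattern, path) for pattern in exclusion_patterns):
        return False
    # When inclusion filters exist, only matching documents are indexed.
    if inclusion_patterns:
        return any(re.search(pattern, path) for pattern in inclusion_patterns)
    # With no inclusion filters, everything not excluded is indexed.
    return True


include = [r"\.pdf$"]
exclude = [r"^drafts/"]
for candidate in ("reports/q1.pdf", "drafts/q1.pdf", "reports/q1.txt"):
    print(candidate, is_indexed(candidate, include, exclude))
```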

------

# Amazon RDS/Aurora
<a name="data-source-database"></a>

You can index documents that are stored in a database using a database data source. After you provide connection information for the database, Amazon Kendra connects and indexes documents.

Amazon Kendra supports the following databases:
+ Amazon Aurora MySQL
+ Amazon Aurora PostgreSQL
+ Amazon RDS for MySQL
+ Amazon RDS for PostgreSQL

**Note**  
Serverless Aurora databases are not supported.

**Important**  
This Amazon RDS/Aurora connector is scheduled for deprecation by the end of 2023.  
Amazon Kendra now supports new database data source connectors. For an improved experience, we recommend you choose from the following new connectors for your use case:  
+ [Aurora (MySQL)](https://docs.aws.amazon.com/kendra/latest/dg/data-source-aurora-mysql.html)
+ [Aurora (PostgreSQL)](https://docs.aws.amazon.com/kendra/latest/dg/data-source-aurora-postgresql.html)
+ [Amazon RDS (MySQL)](https://docs.aws.amazon.com/kendra/latest/dg/data-source-rds-mysql.html)
+ [Amazon RDS (Microsoft SQL Server)](https://docs.aws.amazon.com/kendra/latest/dg/data-source-rds-ms-sql-server.html)
+ [Amazon RDS (Oracle)](https://docs.aws.amazon.com/kendra/latest/dg/data-source-rds-oracle.html)
+ [Amazon RDS (PostgreSQL)](https://docs.aws.amazon.com/kendra/latest/dg/data-source-rds-postgresql.html)
+ [IBM DB2](https://docs.aws.amazon.com/kendra/latest/dg/data-source-ibm-db2.html)
+ [Microsoft SQL Server](https://docs.aws.amazon.com/kendra/latest/dg/data-source-ms-sql-server.html)
+ [MySQL](https://docs.aws.amazon.com/kendra/latest/dg/data-source-mysql.html)
+ [Oracle Database](https://docs.aws.amazon.com/kendra/latest/dg/data-source-oracle-database.html)
+ [PostgreSQL](https://docs.aws.amazon.com/kendra/latest/dg/data-source-postgresql.html)

You can connect Amazon Kendra to your database data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) and the [DatabaseConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_DatabaseConfiguration.html) API.

For troubleshooting your Amazon Kendra database data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-database)
+ [Prerequisites](#prerequisites-database)
+ [Connection instructions](#data-source-procedure-database)

## Supported features
<a name="supported-features-database"></a>

The Amazon Kendra database data source connector supports the following features:
+ Field mappings
+ User context filtering
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-database"></a>

Before you can use Amazon Kendra to index your database data source, make these changes in your database and AWS accounts.

**In your database, make sure you have:**
+ Noted your basic authentication credentials of user name and password for your database.
+ Copied the host name, port number, host address, the name of the database, and the name of the data table that contains the document data. For PostgreSQL, the data table must be a public table or public schema.
**Note**  
The host and port tell Amazon Kendra where to find the database server on the internet. The database name and table name tell Amazon Kendra where to find the document data on the database server.
+ Copied the names of the columns in the data table that contain the document data. You must include the document ID, document body, columns to detect if a document has changed (for example, last updated column), and optional data table columns that map to custom index fields. You can also map any of the [Amazon Kendra reserved field names](https://docs.aws.amazon.com/kendra/latest/dg/hiw-document-attributes.html#index-reserved-fields) to a table column.
+ Copied the database engine type information such as whether you use Amazon RDS for MySQL or another type.
+ Checked that each document is unique in your database and across other data sources you plan to use for the same index. Each data source that you want to use for an index must not contain the same document across the data sources. Document IDs are global to an index and must be unique per index.
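
Because document IDs are global to an index, it can help to check for collisions before you sync. The following is a minimal sketch of such a check. The function name `find_duplicate_ids` and the sample data are hypothetical, and the sketch assumes you have already exported the document IDs from each data source yourself.

```python
from collections import Counter


def find_duplicate_ids(ids_by_source):
    """Return document IDs that appear more than once within or across sources."""
    counts = Counter()
    for ids in ids_by_source.values():
        # Compare IDs as strings so that 3 and "3" count as the same ID.
        counts.update(str(doc_id) for doc_id in ids)
    return sorted(doc_id for doc_id, seen in counts.items() if seen > 1)


duplicates = find_duplicate_ids({
    "database": [1, 2, 3],
    "other-source": ["3", "a"],
})
print(duplicates)
```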

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your database authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your database data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-database"></a>

To connect Amazon Kendra to your database data source, you must provide the necessary details of your database data source so that Amazon Kendra can access your data. If you have not yet configured your database for Amazon Kendra, see [Prerequisites](#prerequisites-database).

------
#### [ Console ]

**To connect Amazon Kendra to a database** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **database connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **database connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. **Endpoint**—A DNS host name, an IPv4 address, or an IPv6 address.

   1. **Port**—A port number.

   1. **Database**—Database name.

   1. **Table name**—Table name.

   1. For **Type of authentication**, choose between **Existing** and **New** to store your database authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens. 

      1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

        1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-database-’ is automatically added to your secret name.

        1. For **User name** and **Password**—Enter the authentication credential values from your database account.

        1. Choose **Save authentication**.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.
**Note**  
You must use a private subnet. If your RDS instance is in a public subnet in your VPC, you can create a private subnet that has outbound access to a NAT gateway in the public subnet. The subnets provided in the VPC configuration must be in US West (Oregon), US East (N. Virginia), or EU (Ireland).

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. Select between **Aurora MySQL**, **MySQL**, **Aurora PostgreSQL**, and **PostgreSQL** based on your use case.

   1. **Enclose SQL identifiers with double quotes**—Select to enclose SQL identifiers in double quotes. For example, `"columnName"`.

   1. **ACL column** and **Change detecting columns**—Configure the columns that Amazon Kendra uses for change detection (for example, last updated column) and your access control list.

   1. In **Sync run schedule**, for **Frequency**—Choose how often Amazon Kendra will sync with your data source.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. **Amazon Kendra default field mappings**—Select from the Amazon Kendra generated default data source fields you want to map to your index. You must add the **Database column** values for `document_id` and `document_body`.

   1.  **Custom field mappings**—To add custom data source fields, create an index field name to map to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to a database**

You must specify the following using the [DatabaseConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_DatabaseConfiguration.html) API:
+ **ColumnConfiguration**—Information about where the index should get the document information from the database. For more details, see [ColumnConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_ColumnConfiguration.html). You must specify the `DocumentDataColumnName` (document body or main text), `DocumentIdColumnName`, and `ChangeDetectingColumns` (for example, last updated column) fields. The column mapped to the `DocumentIdColumnName` field must be an integer column. The following example shows a simple column configuration for a database data source: 

  ```
  "ColumnConfiguration": {
      "ChangeDetectingColumns": [
          "LastUpdateDate",
          "LastUpdateTime"
      ],
      "DocumentDataColumnName": "TextColumn",
      "DocumentIdColumnName": "IdentifierColumn",
      "DocumentTitleColumnName": "TitleColumn",
      "FieldMappings": [
          {
              "DataSourceFieldName": "AbstractColumn",
              "IndexFieldName": "Abstract"
          }
      ]
  }
  ```
+ **ConnectionConfiguration**—Configuration information that's required to connect to a database. For more details, see [ConnectionConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_ConnectionConfiguration.html).
+ **DatabaseEngineType**—The type of database engine that runs the database. The `DatabaseHost` field for `ConnectionConfiguration` must be the Amazon Relational Database Service (Amazon RDS) instance endpoint for the database. Don't use the cluster endpoint.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the authentication credentials for your database account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "username": "user name",
      "password": "password"
  }
  ```

  The following example shows a database configuration, including the secret ARN.

  ```
  "DatabaseConfiguration": {
      "ConnectionConfiguration": {
          "DatabaseHost": "host.subdomain.domain.tld",
          "DatabaseName": "DocumentDatabase",
          "DatabasePort": 3306,
          "SecretArn": "arn:aws:secretsmanager:region:account ID:secret:secret name",
          "TableName": "DocumentTable"
      }
  }
  ```
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the database connector and Amazon Kendra. For more information, see [IAM roles for database data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
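
The required settings above can be combined into one configuration object. The following is a minimal sketch that assembles the structure as a Python dictionary. The helper name `build_database_configuration` and all example values (host, ARN, column names) are placeholders for illustration. The sketch only builds and prints the JSON; it does not call AWS.

```python
import json

# DatabaseEngineType values accepted by the DatabaseConfiguration API.
ENGINE_TYPES = {
    "RDS_AURORA_MYSQL",
    "RDS_AURORA_POSTGRESQL",
    "RDS_MYSQL",
    "RDS_POSTGRESQL",
}


def build_database_configuration(engine_type, host, port, database, table,
                                 secret_arn, id_column, data_column,
                                 change_columns):
    """Assemble the required parts of a database data source configuration."""
    if engine_type not in ENGINE_TYPES:
        raise ValueError(f"Unsupported engine type: {engine_type!r}")
    if not change_columns:
        raise ValueError("At least one change-detecting column is required")
    return {
        "DatabaseConfiguration": {
            "DatabaseEngineType": engine_type,
            "ConnectionConfiguration": {
                "DatabaseHost": host,  # instance endpoint, not the cluster endpoint
                "DatabaseName": database,
                "DatabasePort": port,
                "SecretArn": secret_arn,
                "TableName": table,
            },
            "ColumnConfiguration": {
                "DocumentIdColumnName": id_column,
                "DocumentDataColumnName": data_column,
                "ChangeDetectingColumns": list(change_columns),
            },
        }
    }


configuration = build_database_configuration(
    engine_type="RDS_MYSQL",
    host="host.subdomain.domain.tld",
    port=3306,
    database="DocumentDatabase",
    table="DocumentTable",
    # Placeholder ARN for illustration only.
    secret_arn="arn:aws:secretsmanager:us-west-2:111122223333:secret:example",
    id_column="IdentifierColumn",
    data_column="TextColumn",
    change_columns=["LastUpdateDate", "LastUpdateTime"],
)
print(json.dumps(configuration, indent=4))
```

If you use the AWS SDK for Python (Boto3), a dictionary like this one could be passed as the `Configuration` argument of `create_data_source` together with `Type="DATABASE"`, your index ID, and `RoleArn`.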

You can also add the following optional features:
+ **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` as part of the data source configuration. See [Configuring Amazon Kendra to use a VPC](https://docs.aws.amazon.com/kendra/latest/dg/vpc-configuration.html).
**Note**  
You must use only a private subnet. If your RDS instance is in a public subnet in your VPC, you can create a private subnet that has outbound access to a NAT gateway in the public subnet. The subnets provided in the VPC configuration must be in US West (Oregon), US East (N. Virginia), or EU (Ireland).
+  **Field mappings**—Choose to map your database data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.
+  **User context filtering and access control**—Amazon Kendra crawls the access control list (ACL) for your documents, if you have an ACL for your documents. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#datasource-context-filter).
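
The optional VPC and access control settings can be layered onto a database configuration as shown in the following sketch. The helper name `add_optional_settings` and the subnet, security group, and column values are hypothetical. The `VpcConfiguration` and `AclConfiguration` key names follow the `DatabaseConfiguration` API, but check the API reference before you rely on them.

```python
def add_optional_settings(database_configuration, subnet_ids=None,
                          security_group_ids=None, acl_groups_column=None):
    """Return a copy of the configuration with optional settings added."""
    config = dict(database_configuration)  # shallow copy, input is untouched
    if subnet_ids or security_group_ids:
        # A VPC configuration needs both subnets and security groups.
        if not (subnet_ids and security_group_ids):
            raise ValueError("Provide both subnet IDs and security group IDs")
        config["VpcConfiguration"] = {
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        }
    if acl_groups_column:
        # Column that lists the groups allowed to access each document.
        config["AclConfiguration"] = {
            "AllowedGroupsColumnName": acl_groups_column,
        }
    return config


base = {"DatabaseEngineType": "RDS_MYSQL"}
extended = add_optional_settings(
    base,
    subnet_ids=["subnet-example1"],        # placeholder private subnet ID
    security_group_ids=["sg-example1"],    # placeholder security group ID
    acl_groups_column="AllowedGroups",     # placeholder column name
)
print(extended)
```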

------

# Amazon RDS (Microsoft SQL Server)
<a name="data-source-rds-ms-sql-server"></a>

**Note**  
The Amazon RDS (Microsoft SQL Server) connector remains fully supported for existing customers through May 31, 2026. While this connector is no longer available for new users, current users can continue to use it without interruption. We are continuously evolving our connector portfolio to offer more scalable and customizable solutions. For future integrations, we recommend exploring the Amazon Kendra Custom Connector Framework, designed to support a broader range of enterprise use cases with enhanced flexibility.

SQL Server is a database management system developed by Microsoft. Amazon RDS for SQL Server makes it easy to set up, operate, and scale SQL Server deployments in the cloud. If you are an Amazon RDS (Microsoft SQL Server) user, you can use Amazon Kendra to index your Amazon RDS (Microsoft SQL Server) data source. The Amazon Kendra JDBC data source connector supports Microsoft SQL Server 2019.

You can connect Amazon Kendra to your Amazon RDS (Microsoft SQL Server) data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) and the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API.

For troubleshooting your Amazon Kendra Amazon RDS (Microsoft SQL Server) data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-rds-ms-sql-server)
+ [Prerequisites](#prerequisites-rds-ms-sql-server)
+ [Connection instructions](#data-source-procedure-rds-ms-sql-server)
+ [Notes](#rds-ms-sql-server-notes)

## Supported features
<a name="supported-features-rds-ms-sql-server"></a>
+ Field mappings
+ User context filtering
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-rds-ms-sql-server"></a>

Before you can use Amazon Kendra to index your Amazon RDS (Microsoft SQL Server) data source, make these changes in your Amazon RDS (Microsoft SQL Server) and AWS accounts.

**In Amazon RDS (Microsoft SQL Server), make sure you have:**
+ Noted your database user name and password.
**Important**  
As a best practice, provide Amazon Kendra with read-only database credentials.
+ Copied your database host URL, port, and instance.
+ Checked that each document is unique in Amazon RDS (Microsoft SQL Server) and across other data sources you plan to use for the same index. Each data source that you want to use for an index must not contain the same document across the data sources. Document IDs are global to an index and must be unique per index.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Amazon RDS (Microsoft SQL Server) authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Amazon RDS (Microsoft SQL Server) data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-rds-ms-sql-server"></a>

To connect Amazon Kendra to your Amazon RDS (Microsoft SQL Server) data source, you must provide details of your Amazon RDS (Microsoft SQL Server) credentials so that Amazon Kendra can access your data. If you have not yet configured Amazon RDS (Microsoft SQL Server) for Amazon Kendra, see [Prerequisites](#prerequisites-rds-ms-sql-server).

------
#### [ Console ]

**To connect Amazon Kendra to Amazon RDS (Microsoft SQL Server)** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Amazon RDS (Microsoft SQL Server) connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **Amazon RDS (Microsoft SQL Server) connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. In **Source**, enter the following information:

   1.  **Host**— Enter the database host name.

   1.  **Port**— Enter the database port.

   1.  **Instance**— Enter the database instance.

   1. **Enable SSL certificate location**—Choose to enter the Amazon S3 path to your SSL certificate file.

   1. In **Authentication**, enter the following information:

      1. **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your Amazon RDS (Microsoft SQL Server) authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

        1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

           1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-Amazon RDS (Microsoft SQL Server)-’ is automatically added to your secret name.

           1. For **Database user name**, and **Password**—Enter the authentication credential values you copied from your database. 

        1. Choose **Save**.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. In **Sync scope**, choose from the following options:
      + **SQL query**—Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
**Note**  
If a table name includes special characters (non-alphanumeric) in the name, you must use square brackets around the table name. For example, *select \* from [my-database-table]*
      + **Primary key column**—Provide the primary key for the database table. This identifies a table within your database.
      + **Title column**—Provide the name of the document title column within your database table.
      + **Body column**—Provide the name of the document body column within your database table.

   1. In **Additional configuration – *optional***, choose from the following options to sync specific content instead of syncing all files:
      + **Change-detecting columns**—Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns.
      + **User IDs column**—Enter the name of the column which contains User IDs to be allowed access to content.
      + **Groups column**—Enter the name of the column that contains groups to be allowed access to content.
      + **Source URLs column**—Enter the name of the column which contains Source URLs to be indexed.
      + **Time stamps column**—Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content. 
      + **Time zones column**—Enter the name of the column which contains time zones for the content to be crawled.
      + **Time stamps format**—Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified sync: Index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. In **Sync run schedule**, for **Frequency**—How often Amazon Kendra will sync with your data source.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. Select from the generated default data source fields—**Document IDs**, **Document titles**, and **Source URLs**—you want to map to your Amazon Kendra index.

   1.  **Add field**—To add custom data source fields, create an index field name to map to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to Amazon RDS (Microsoft SQL Server)**

You must specify the following using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API:
+ **Data source**—Specify the data source type as `JDBC` when you use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **Database type**—You must specify the database type as `sqlserver`.
+ **SQL query**—Specify SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
**Note**  
If a table name includes special (non-alphanumeric) characters, you must enclose the table name in square brackets. For example, `select * from [my-database-table]`
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
  + `CHANGE_LOG` to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the authentication credentials you created in your Amazon RDS (Microsoft SQL Server) account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "user name": "database user name",
      "password": "password"
  }
  ```
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Amazon RDS (Microsoft SQL Server) connector and Amazon Kendra. For more information, see [IAM roles for Amazon RDS (Microsoft SQL Server) data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+  **Inclusion and exclusion filters**—You can specify whether to include specific content using user IDs, groups, source URLs, time stamps, and time zones. 
+  **User context filtering and access control**—Amazon Kendra crawls the access control list (ACL) for your documents, if you have an ACL for your documents. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#datasource-context-filter).
+  **Field mappings**—Choose to map your Amazon RDS (Microsoft SQL Server) data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.

For a list of other important JSON keys to configure, see [Amazon RDS (Microsoft SQL Server) template schema](ds-schemas.md#ds-rds-ms-sql-server-schema).
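
The bracket rule in the note above can be applied before you pass a query to the connector. The following Python sketch is illustrative only: the helper names `quote_table_name` and `build_select_all` are hypothetical and are not part of any Amazon Kendra API. Doubling a closing bracket inside the name follows the SQL Server convention for delimited identifiers.

```python
import re


def quote_table_name(name: str) -> str:
    """Wrap a table name in square brackets if it has non-alphanumeric characters."""
    if re.fullmatch(r"[A-Za-z0-9]+", name):
        return name
    # SQL Server escapes a closing bracket inside a delimited identifier by doubling it.
    return "[" + name.replace("]", "]]") + "]"


def build_select_all(table: str) -> str:
    """Build a simple query for the SQL query setting (hypothetical helper)."""
    return f"select * from {quote_table_name(table)}"


print(build_select_all("my-database-table"))  # select * from [my-database-table]
print(build_select_all("orders"))             # select * from orders
```

This sketch treats an underscore as a special character and brackets the name, which SQL Server accepts.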

------

## Notes
<a name="rds-ms-sql-server-notes"></a>
+ Deleted database rows will not be tracked when Amazon Kendra checks for updated content.
+ The size of field names and values in a row of your database can't exceed 400KB.
+ If you have a large amount of data in your database data source, and do not want Amazon Kendra to index all your database content after the first sync, you can choose to sync only new, modified, or deleted documents.
+ As a best practice, provide Amazon Kendra with read-only database credentials.
+ As a best practice, avoid adding tables with sensitive data or personally identifiable information (PII).
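
To find rows that might exceed the 400KB limit before you sync, you can estimate each row's size from its field names and values. The following Python sketch is a rough approximation under stated assumptions: it reads 400KB as 400 × 1024 bytes and measures text as UTF-8. How Amazon Kendra measures row size is not specified here, and the function names are hypothetical.

```python
MAX_ROW_BYTES = 400 * 1024  # assumption: 400KB interpreted as 400 * 1024 bytes


def estimated_row_bytes(row: dict) -> int:
    """Estimate the combined size of field names and values in one row."""
    total = 0
    for name, value in row.items():
        total += len(name.encode("utf-8"))
        if value is None:
            continue
        if isinstance(value, (bytes, bytearray)):
            total += len(value)
        else:
            total += len(str(value).encode("utf-8"))
    return total


def oversized_rows(rows, limit=MAX_ROW_BYTES):
    """Return the positions of rows whose estimated size exceeds the limit."""
    return [i for i, row in enumerate(rows) if estimated_row_bytes(row) > limit]


rows = [
    {"id": 1, "title": "Short", "body": "hello"},
    {"id": 2, "title": "Large", "body": "x" * (401 * 1024)},
]
print(oversized_rows(rows))  # [1]
```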

# Amazon RDS (MySQL)
<a name="data-source-rds-mysql"></a>

**Note**  
The Amazon RDS (MySQL) connector remains fully supported for existing customers through May 31, 2026. While this connector is no longer available for new users, current users can continue to use it without interruption. We are continuously evolving our connector portfolio to offer more scalable and customizable solutions. For future integrations, we recommend exploring the Amazon Kendra Custom Connector Framework, designed to support a broader range of enterprise use cases with enhanced flexibility.

Amazon RDS (Amazon Relational Database Service) is a web service that makes it easier to set up, operate, and scale a relational database in the AWS Cloud. If you are an Amazon RDS user, you can use Amazon Kendra to index your Amazon RDS (MySQL) data source. The Amazon Kendra data source connector supports Amazon RDS MySQL 5.6, 5.7, and 8.0.

You can connect Amazon Kendra to your Amazon RDS (MySQL) data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) and the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API.

For troubleshooting your Amazon Kendra Amazon RDS (MySQL) data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-rds-mysql)
+ [Prerequisites](#prerequisites-rds-mysql)
+ [Connection instructions](#data-source-procedure-rds-mysql)
+ [Notes](#rds-mysql-notes)

## Supported features
<a name="supported-features-rds-mysql"></a>
+ Field mappings
+ User context filtering
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-rds-mysql"></a>

Before you can use Amazon Kendra to index your Amazon RDS (MySQL) data source, make these changes in your Amazon RDS (MySQL) and AWS accounts.

**In Amazon RDS (MySQL), make sure you have:**
+ Noted your database user name and password.
**Important**  
As a best practice, provide Amazon Kendra with read-only database credentials.
+ Copied your database host URL, port, and instance. You can find this information on the Amazon RDS console.
+ Checked each document is unique in Amazon RDS (MySQL) and across other data sources you plan to use for the same index. Each data source that you want to use for an index must not contain the same document across the data sources. Document IDs are global to an index and must be unique per index.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Amazon RDS (MySQL) authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Amazon RDS (MySQL) data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.
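
If you create the secret yourself for use with the API, the secret value is a JSON structure with the keys `user name` and `password`, as described in the API connection instructions for this connector. The following Python sketch only builds that JSON string. The function name and the example credentials are placeholders.

```python
import json


def build_secret_string(user_name: str, password: str) -> str:
    """Serialize database credentials using the key names the connector expects."""
    return json.dumps({"user name": user_name, "password": password})


# Placeholder values; use your own read-only database credentials.
secret_string = build_secret_string("kendra_readonly", "example-password")
print(sorted(json.loads(secret_string)))  # ['password', 'user name']
```

You can then store the string in AWS Secrets Manager, for example by passing it as the `SecretString` value when you create the secret, and note the ARN of the secret that is returned.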

## Connection instructions
<a name="data-source-procedure-rds-mysql"></a>

To connect Amazon Kendra to your Amazon RDS (MySQL) data source, you must provide details of your Amazon RDS (MySQL) credentials so that Amazon Kendra can access your data. If you have not yet configured Amazon RDS (MySQL) for Amazon Kendra, see [Prerequisites](#prerequisites-rds-mysql).

------
#### [ Console ]

**To connect Amazon Kendra to Amazon RDS (MySQL)** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Amazon RDS (MySQL) connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **Amazon RDS (MySQL) connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. In **Source**, enter the following information:

   1.  **Host** – Enter the database host URL, for example: `http://instance URL.region.rds.amazonaws.com`.

   1.  **Port** – Enter the database port, for example, `3306`.

   1.  **Instance** – Enter the database instance name.

   1. **Enable SSL certificate location**—Choose to enter the Amazon S3 path to your SSL certificate file.

   1. In **Authentication**, enter the following information:

      1. **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your Amazon RDS (MySQL) authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

        1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

           1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-Amazon RDS (MySQL)-’ is automatically added to your secret name.

           1. For **Database user name** and **Password**—Enter the authentication credential values you copied from your database.

        1. Choose **Save**.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. In **Sync scope**, choose from the following options:
      + **SQL query**—Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB and not contain any semi-colons (;). Amazon Kendra will crawl all database content that matches your query.
      + **Primary key column**—Provide the primary key for the database table. This identifies a table within your database.
      + **Title column**—Provide the name of the document title column within your database table.
      + **Body column**—Provide the name of the document body column within your database table.

   1. In **Additional configuration – *optional***, choose from the following options to sync specific content instead of syncing all files:
      + **Change-detecting columns**—Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns.
      + **Users' IDs column**—Enter the name of the column which contains User IDs to be allowed access to content.
      + **Groups column**—Enter the name of the column that contains groups to be allowed access to content.
      + **Source URLs column**—Enter the name of the column which contains Source URLs to be indexed.
      + **Time stamps column**—Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content. 
      + **Time zones column**—Enter the name of the column which contains time zones for the content to be crawled.
      + **Time stamps format**—Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified sync: Index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. In **Sync run schedule**, for **Frequency**—Choose how often Amazon Kendra syncs with your data source.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. Select from the generated default data source fields—**Document IDs**, **Document titles**, and **Source URLs**—that you want to map to your Amazon Kendra index.

   1.  **Add field**—Add custom data source fields, and for each one specify the index field name to map to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.
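
You can check the SQL query constraints from the sync scope step (less than 32KB, no semicolons) before you enter the query in the console. The following Python sketch is a simple pre-check under stated assumptions: it reads 32KB as 32 × 1024 bytes of UTF-8 text, and the function name is hypothetical. It also flags a semicolon that appears inside a string literal, which errs on the side of caution.

```python
MAX_QUERY_BYTES = 32 * 1024  # assumption: 32KB interpreted as 32 * 1024 bytes


def query_problems(sql: str) -> list:
    """Return a list of constraint violations for a sync scope SQL query."""
    problems = []
    if len(sql.encode("utf-8")) >= MAX_QUERY_BYTES:
        problems.append("query must be less than 32KB")
    if ";" in sql:
        problems.append("query must not contain semicolons")
    return problems


print(query_problems("SELECT id, title, body FROM articles"))  # []
print(query_problems("SELECT * FROM articles;"))
# ['query must not contain semicolons']
```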

------
#### [ API ]

**To connect Amazon Kendra to Amazon RDS (MySQL)**

You must specify the following using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API:
+ **Data source**—Specify the data source type as `JDBC` when you use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **Database type**—You must specify the database type as `mySql`.
+ **SQL query**—Specify SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
  + `CHANGE_LOG` to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the authentication credentials you created in your Amazon RDS (MySQL) account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "user name": "database user name",
      "password": "password"
  }
  ```
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Amazon RDS (MySQL) connector and Amazon Kendra. For more information, see [IAM roles for Amazon RDS (MySQL) data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+  **Inclusion and exclusion filters**—You can specify whether to include specific content using user IDs, groups, source URLs, time stamps, and time zones. 
+  **Field mappings**—Choose to map your Amazon RDS (MySQL) data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.
+  **User context filtering and access control**—Amazon Kendra crawls the access control list (ACL) for your documents, if you have an ACL for your documents. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#datasource-context-filter).

For a list of other important JSON keys to configure, see [Amazon RDS (MySQL) template schema](ds-schemas.md#ds-rds-mysql-schema).
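
The required and optional settings above can be assembled into a single `CreateDataSource` request. The following Python sketch only builds the request as a dictionary and does not call AWS. The values `JDBC`, `mySql`, `TEMPLATE`, and the sync mode names come from this section. The JSON key names inside the template (such as `connectionConfiguration`, `dbType`, and `sqlQuery`) are assumptions, so confirm them against the Amazon RDS (MySQL) template schema before use. All ARNs, IDs, and host names are placeholders.

```python
import json

VALID_SYNC_MODES = {"FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG"}


def build_create_data_source_request(*, name, index_id, role_arn, secret_arn,
                                     host, port, instance, sql_query,
                                     sync_mode="FULL_CRAWL", vpc=None):
    """Build keyword arguments for a CreateDataSource call (sketch, not validated by AWS)."""
    if sync_mode not in VALID_SYNC_MODES:
        raise ValueError(f"Unsupported sync mode: {sync_mode!r}")
    template = {
        "type": "JDBC",
        "syncMode": sync_mode,
        "secretArn": secret_arn,
        # Assumed key names; verify against the template schema.
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {
                "dbType": "mySql",
                "dbHost": host,
                "dbPort": str(port),
                "dbInstance": instance,
            }
        },
        "additionalProperties": {"sqlQuery": sql_query},
    }
    request = {
        "Name": name,
        "IndexId": index_id,
        "Type": "TEMPLATE",
        "RoleArn": role_arn,
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }
    if vpc is not None:
        request["VpcConfiguration"] = vpc  # optional
    return request


request = build_create_data_source_request(
    name="rds-mysql-articles",
    index_id="example-index-id",
    role_arn="arn:aws:iam::111122223333:role/ExampleKendraDataSourceRole",
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    host="mydb.example.us-east-1.rds.amazonaws.com",
    port=3306,
    instance="articles",
    sql_query="SELECT id, title, body FROM articles",
)
print(json.dumps(request, indent=2))
```

With the AWS SDK for Python (Boto3) installed and credentials configured, you could pass the result as `boto3.client("kendra").create_data_source(**request)`.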

------

## Notes
<a name="rds-mysql-notes"></a>
+ Deleted database rows will not be tracked when Amazon Kendra checks for updated content.
+ The size of field names and values in a row of your database can't exceed 400KB.
+ If you have a large amount of data in your database data source, and do not want Amazon Kendra to index all your database content after the first sync, you can choose to sync only new, modified, or deleted documents.
+ As a best practice, provide Amazon Kendra with read-only database credentials.
+ As a best practice, avoid adding tables with sensitive data or personally identifiable information (PII).

# Amazon RDS (Oracle)
<a name="data-source-rds-oracle"></a>

**Note**  
The Amazon RDS (Oracle) connector remains fully supported for existing customers through May 31, 2026. While this connector is no longer available for new users, current users can continue to use it without interruption. We are continuously evolving our connector portfolio to offer more scalable and customizable solutions. For future integrations, we recommend exploring the Amazon Kendra Custom Connector Framework, designed to support a broader range of enterprise use cases with enhanced flexibility.

Amazon RDS (Amazon Relational Database Service) is a web service that makes it easier to set up, operate, and scale a relational database in the AWS Cloud. If you are an Amazon RDS (Oracle) user, you can use Amazon Kendra to index your Amazon RDS (Oracle) data source. The Amazon Kendra Amazon RDS (Oracle) data source connector supports Amazon RDS Oracle Database 21c, Oracle Database 19c, and Oracle Database 12c.

You can connect Amazon Kendra to your Amazon RDS (Oracle) data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) and the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API.

For troubleshooting your Amazon Kendra Amazon RDS (Oracle) data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-rds-oracle)
+ [Prerequisites](#prerequisites-rds-oracle)
+ [Connection instructions](#data-source-procedure-rds-oracle)
+ [Notes](#rds-oracle-notes)

## Supported features
<a name="supported-features-rds-oracle"></a>
+ Field mappings
+ User context filtering
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-rds-oracle"></a>

Before you can use Amazon Kendra to index your Amazon RDS (Oracle) data source, make these changes in your Amazon RDS (Oracle) and AWS accounts.

**In Amazon RDS (Oracle), make sure you have:**
+ Noted your database user name and password.
**Important**  
As a best practice, provide Amazon Kendra with read-only database credentials.
+ Copied your database host URL, port, and instance.
+ Checked each document is unique in Amazon RDS (Oracle) and across other data sources you plan to use for the same index. Each data source that you want to use for an index must not contain the same document across the data sources. Document IDs are global to an index and must be unique per index.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Amazon RDS (Oracle) authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Amazon RDS (Oracle) data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.
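
Because document IDs are global to an index, it can help to compare the IDs you plan to index from each data source before you connect them. The following Python sketch assumes you have already exported the document IDs (for example, primary key values) from each source into lists. The source names and the function name are hypothetical.

```python
from collections import defaultdict


def find_shared_document_ids(ids_by_source: dict) -> dict:
    """Map each document ID found in more than one data source to those sources."""
    sources_by_id = defaultdict(set)
    for source, ids in ids_by_source.items():
        for doc_id in ids:
            sources_by_id[str(doc_id)].add(source)
    return {
        doc_id: sorted(sources)
        for doc_id, sources in sources_by_id.items()
        if len(sources) > 1
    }


ids_by_source = {
    "rds-oracle": ["1001", "1002", "1003"],
    "s3-archive": ["a-17", "1002"],
}
print(find_shared_document_ids(ids_by_source))
# {'1002': ['rds-oracle', 's3-archive']}
```

This sketch reports only IDs shared between sources. It does not detect an ID repeated within a single source.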

## Connection instructions
<a name="data-source-procedure-rds-oracle"></a>

To connect Amazon Kendra to your Amazon RDS (Oracle) data source, you must provide details of your Amazon RDS (Oracle) credentials so that Amazon Kendra can access your data. If you have not yet configured Amazon RDS (Oracle) for Amazon Kendra, see [Prerequisites](#prerequisites-rds-oracle).

------
#### [ Console ]

**To connect Amazon Kendra to Amazon RDS (Oracle)** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Amazon RDS (Oracle) connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **Amazon RDS (Oracle) connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. In **Source**, enter the following information:

   1.  **Host**—Enter the database host name.

   1.  **Port**—Enter the database port.

   1.  **Instance**—Enter the database instance.

   1. **Enable SSL certificate location**—Choose to enter the Amazon S3 path to your SSL certificate file.

   1. In **Authentication**, enter the following information:

      1. **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your Amazon RDS (Oracle) authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

        1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

           1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-Amazon RDS (Oracle)-’ is automatically added to your secret name.

           1. For **Database user name** and **Password**—Enter the authentication credential values you copied from your database.

        1. Choose **Save**.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. In **Sync scope**, choose from the following options:
      + **SQL query**—Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
      + **Primary key column**—Provide the primary key for the database table. This identifies a table within your database.
      + **Title column**—Provide the name of the document title column within your database table.
      + **Body column**—Provide the name of the document body column within your database table.

   1. In **Additional configuration – *optional***, choose from the following options to sync specific content instead of syncing all files:
      + **Change-detecting columns**—Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns.
      + **Users' IDs column**—Enter the name of the column which contains User IDs to be allowed access to content.
      + **Groups column**—Enter the name of the column that contains groups to be allowed access to content.
      + **Source URLs column**—Enter the name of the column which contains Source URLs to be indexed.
      + **Time stamps column**—Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content. 
      + **Time zones column**—Enter the name of the column which contains time zones for the content to be crawled.
      + **Time stamps format**—Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified sync: Index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. In **Sync run schedule**, for **Frequency**—Choose how often Amazon Kendra syncs with your data source.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. Select from the generated default data source fields—**Document IDs**, **Document titles**, and **Source URLs**—that you want to map to your Amazon Kendra index.

   1.  **Add field**—Add custom data source fields, and for each one specify the index field name to map to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to Amazon RDS (Oracle)**

You must specify the following using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API:
+ **Data source**—Specify the data source type as `JDBC` when you use the [https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **Database type**—You must specify the database type as `oracle`.
+ **SQL query**—Specify SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
  + `CHANGE_LOG` to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the authentication credentials you created in your Amazon RDS (Oracle) account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "user name": "database user name",
      "password": "password"
  }
  ```
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, or across connector versions 1.0 and 2.0 (where applicable).
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Amazon RDS (Oracle) connector and Amazon Kendra. For more information, see [IAM roles for Amazon RDS (Oracle) data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+  **Inclusion and exclusion filters**—You can specify whether to include specific content using user IDs, groups, source URLs, time stamps, and time zones. 
+  **User context filtering and access control**—Amazon Kendra crawls the access control list (ACL) for your documents, if you have an ACL for your documents. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#datasource-context-filter).
+  **Field mappings**—Choose to map your Amazon RDS (Oracle) data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.

For a list of other important JSON keys to configure, see [Amazon RDS (Oracle) template schema](ds-schemas.md#ds-rds-oracle-schema).
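
The required settings above can be sketched as a request body. This is a minimal, incomplete illustration rather than the definitive schema: the function name, data source name, ARNs, and endpoint values are hypothetical, and the nested key names (`connectionConfiguration`, `repositoryEndpointMetadata`, `dbType`, `dbHost`, `dbPort`, `dbInstance`, `additionalProperties`, `sqlQuery`) are assumptions that you should verify against the Amazon RDS (Oracle) template schema. The sync mode table pairs each console label with the API value that has the same description.

```python
# Console sync mode labels paired with the API values described above.
SYNC_MODES = {
    "Full sync": "FORCED_FULL_CRAWL",
    "New, modified, deleted sync": "FULL_CRAWL",
    "New, modified sync": "CHANGE_LOG",
}


def build_create_data_source_request(index_id, role_arn, secret_arn,
                                     host, port, instance, sql_query,
                                     console_sync_mode):
    """Return keyword arguments shaped for a CreateDataSource call.

    The nested template key names are assumptions; other keys that the
    template schema may require are omitted from this sketch.
    """
    template = {
        "type": "JDBC",                              # data source type in the template
        "syncMode": SYNC_MODES[console_sync_mode],
        "secretArn": secret_arn,                     # assumed key name
        "connectionConfiguration": {                 # assumed key names below
            "repositoryEndpointMetadata": {
                "dbType": "oracle",
                "dbHost": host,
                "dbPort": port,
                "dbInstance": instance,
            }
        },
        "additionalProperties": {"sqlQuery": sql_query},
    }
    return {
        "Name": "rds-oracle-example",                # hypothetical name
        "IndexId": index_id,
        "Type": "TEMPLATE",                          # data source type for CreateDataSource
        "RoleArn": role_arn,
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }


request = build_create_data_source_request(
    index_id="example-index-id",
    role_arn="arn:aws:iam::111122223333:role/example-kendra-role",
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    host="example-host",
    port="1521",
    instance="ORCL",
    sql_query="SELECT id, title, body FROM documents",
    console_sync_mode="New, modified sync",
)
```

With an AWS SDK such as boto3, a dictionary of this shape could be passed to the Amazon Kendra client as `create_data_source(**request)` once the remaining template keys are filled in.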

------

## Notes
<a name="rds-oracle-notes"></a>
+ Deleted database rows will not be tracked when Amazon Kendra checks for updated content.
+ The size of field names and values in a row of your database can't exceed 400KB.
+ If you have a large amount of data in your database data source, and do not want Amazon Kendra to index all your database content after the first sync, you can choose to sync only new, modified, or deleted documents.
+ As a best practice, provide Amazon Kendra with read-only database credentials.
+ As a best practice, avoid adding tables with sensitive data or personally identifiable information (PII).
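
The 400KB row limit above can be checked before a sync with a rough pre-flight pass over your query results. The sketch below is an approximation under stated assumptions: it treats 400KB as 400 × 1024 bytes and measures a row as the UTF-8 length of every field name plus every value rendered as text, which may not match how Amazon Kendra measures row size. The function names and the `id` key are hypothetical.

```python
MAX_ROW_BYTES = 400 * 1024  # assumed interpretation of the 400KB limit


def row_size_bytes(row):
    """Approximate a row's size as UTF-8 bytes of all field names and values."""
    return sum(
        len(str(name).encode("utf-8")) + len(str(value).encode("utf-8"))
        for name, value in row.items()
    )


def oversized_rows(rows, key="id"):
    """Return the key of every row whose approximate size exceeds the limit."""
    return [row[key] for row in rows if row_size_bytes(row) > MAX_ROW_BYTES]


rows = [
    {"id": 1, "title": "Short", "body": "A small document."},
    {"id": 2, "title": "Large", "body": "x" * (400 * 1024)},
]
too_large = oversized_rows(rows)
```

Rows flagged this way are candidates for trimming or for exclusion in your SQL query.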

# Amazon RDS (PostgreSQL)
<a name="data-source-rds-postgresql"></a>

**Note**  
Amazon RDS (PostgreSQL) connector remains fully supported for existing customers through May 31, 2026. While this connector is no longer available for new users, current users can continue to use it without interruption. We are continuously evolving our connector portfolio to offer more scalable and customizable solutions. For future integrations, we recommend exploring the Amazon Kendra Custom Connector Framework, designed to support a broader range of enterprise use cases with enhanced flexibility.

Amazon RDS is a web service that makes it easier to set up, operate, and scale a relational database in the AWS Cloud. If you are an Amazon RDS user, you can use Amazon Kendra to index your Amazon RDS (PostgreSQL) data source. The Amazon Kendra Amazon RDS (PostgreSQL) data source connector supports PostgreSQL 9.6.

You can connect Amazon Kendra to your Amazon RDS (PostgreSQL) data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) or the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API.

For troubleshooting your Amazon Kendra Amazon RDS (PostgreSQL) data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-rds-postgresql)
+ [Prerequisites](#prerequisites-rds-postgresql)
+ [Connection instructions](#data-source-procedure-rds-postgresql)
+ [Notes](#rds-postgresql-notes)

## Supported features
<a name="supported-features-rds-postgresql"></a>
+ Field mappings
+ User context filtering
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-rds-postgresql"></a>

Before you can use Amazon Kendra to index your Amazon RDS (PostgreSQL) data source, make these changes in your Amazon RDS (PostgreSQL) and AWS accounts.

**In Amazon RDS (PostgreSQL), make sure you have:**
+ Noted your database user name and password.
**Important**  
As a best practice, provide Amazon Kendra with read-only database credentials.
+ Copied your database host URL, port, and instance. You can find this information on the Amazon RDS console.
+ Checked that each document is unique in Amazon RDS (PostgreSQL) and across other data sources you plan to use for the same index. The data sources that you use for an index must not contain the same document. Document IDs are global to an index and must be unique per index.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Amazon RDS (PostgreSQL) authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, or across connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Amazon RDS (PostgreSQL) data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-rds-postgresql"></a>

To connect Amazon Kendra to your Amazon RDS (PostgreSQL) data source, you must provide details of your Amazon RDS (PostgreSQL) credentials so that Amazon Kendra can access your data. If you have not yet configured Amazon RDS (PostgreSQL) for Amazon Kendra, see [Prerequisites](#prerequisites-rds-postgresql).

------
#### [ Console ]

**To connect Amazon Kendra to Amazon RDS (PostgreSQL)** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Amazon RDS (PostgreSQL) connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **Amazon RDS (PostgreSQL) connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. In **Source**, enter the following information:

   1.  **Host** – Enter the database host URL, for example: `http://instance URL.region.rds.amazonaws.com`.

   1.  **Port** – Enter the database port, for example, `5432`.

   1.  **Instance** – Enter the database instance, for example `postgres`.

   1. **Enable SSL certificate location**—Choose to enter the Amazon S3 path to your SSL certificate file.

   1. In **Authentication**—enter the following information:

      1. **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your Amazon RDS (PostgreSQL) authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

        1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

           1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-Amazon RDS (PostgreSQL)-’ is automatically added to your secret name.

           1. For **Database user name**, and **Password**—Enter the authentication credential values you copied from your database. 

        1. Choose **Save**.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. In **Sync scope**, choose from the following options:
      + **SQL query**—Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB and must not contain any semicolons (;). Amazon Kendra will crawl all database content that matches your query.
      + **Primary key column**—Provide the primary key for the database table. This identifies a table within your database.
      + **Title column**—Provide the name of the document title column within your database table.
      + **Body column**—Provide the name of the document body column within your database table.

   1. In **Additional configuration – *optional***, choose from the following options to sync specific content instead of syncing all files:
      + **Change-detecting columns**—Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns.
      + **Users' IDs column**—Enter the name of the column which contains User IDs to be allowed access to content.
      + **Groups column**—Enter the name of the column that contains groups to be allowed access to content.
      + **Source URLs column**—Enter the name of the column which contains Source URLs to be indexed.
      + **Time stamps column**—Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content. 
      + **Time zones column**—Enter the name of the column which contains time zones for the content to be crawled.
      + **Time stamps format**—Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified sync: Index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. In **Sync run schedule**, for **Frequency**—Choose how often Amazon Kendra syncs with your data source.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. Select the generated default data source fields—**Document IDs**, **Document titles**, and **Source URLs**—that you want to map to your Amazon Kendra index.

   1.  **Add field**—Add custom data source fields, specifying the index field name to map each one to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.
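
The two SQL query constraints from the sync scope step (less than 32KB, no semicolons) are easy to check before you paste a query into the console. The following is a minimal sketch; it assumes that 32KB means 32 × 1024 bytes of UTF-8 text, and the function name `validate_sync_query` is hypothetical.

```python
MAX_QUERY_BYTES = 32 * 1024  # assumed interpretation of the 32KB limit


def validate_sync_query(query):
    """Return a list of problems found in a sync scope SQL query."""
    problems = []
    # The query must be strictly smaller than the limit.
    if len(query.encode("utf-8")) >= MAX_QUERY_BYTES:
        problems.append("query must be less than 32KB")
    # Any semicolon is rejected, including one inside a string literal.
    if ";" in query:
        problems.append("query must not contain semicolons")
    return problems


ok = validate_sync_query(
    "SELECT d.id, d.title, d.body FROM documents d "
    "JOIN authors a ON a.id = d.author_id"
)
bad = validate_sync_query("SELECT id, title, body FROM documents;")
```

An empty list means the query passes both checks. It does not mean that the query is valid SQL for your database.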

------
#### [ API ]

**To connect Amazon Kendra to Amazon RDS (PostgreSQL)**

You must specify the following using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API:
+ **Data source**—Specify the data source type as `JDBC` when you use the [https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **Database type**—You must specify the database type as `postgresql`.
+ **SQL query**—Specify SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
  + `CHANGE_LOG` to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the authentication credentials you created in your Amazon RDS (PostgreSQL) account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "user name": "database user name",
      "password": "password"
  }
  ```
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, or across connector versions 1.0 and 2.0 (where applicable).
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Amazon RDS (PostgreSQL) connector and Amazon Kendra. For more information, see [IAM roles for Amazon RDS (PostgreSQL) data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+  **Inclusion and exclusion filters**—You can specify whether to include specific content using user IDs, groups, source URLs, time stamps, and time zones.
+  **User context filtering and access control**—Amazon Kendra crawls the access control list (ACL) for your documents, if you have an ACL for your documents. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#datasource-context-filter).
+  **Field mappings**—Choose to map your Amazon RDS (PostgreSQL) data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.

For a list of other important JSON keys to configure, see [Amazon RDS (PostgreSQL) template schema](ds-schemas.md#ds-rds-postgresql-schema).
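
The secret structure shown above can be produced and checked programmatically before you store it. This is a small sketch that uses only the two keys from the JSON structure in this section (`user name` and `password`); the function names and the example credentials are hypothetical.

```python
import json

REQUIRED_SECRET_KEYS = ("user name", "password")


def build_secret_string(user_name, password):
    """Serialize database credentials into the JSON structure shown above."""
    if not user_name or not password:
        raise ValueError("both the database user name and password are required")
    return json.dumps({"user name": user_name, "password": password})


def missing_secret_keys(secret_string):
    """Return the required keys that are absent or empty in a stored secret."""
    stored = json.loads(secret_string)
    return [key for key in REQUIRED_SECRET_KEYS if not stored.get(key)]


secret_string = build_secret_string("kendra_readonly", "example-password")
```

A string of this shape could be supplied as the `SecretString` value when you create the secret with the Secrets Manager `CreateSecret` API. In line with the best practice in this section, the user name should belong to a read-only database account.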

------

## Notes
<a name="rds-postgresql-notes"></a>
+ Deleted database rows will not be tracked when Amazon Kendra checks for updated content.
+ The size of field names and values in a row of your database can't exceed 400KB.
+ If you have a large amount of data in your database data source, and do not want Amazon Kendra to index all your database content after the first sync, you can choose to sync only new, modified, or deleted documents.
+ As a best practice, provide Amazon Kendra with read-only database credentials.
+ As a best practice, avoid adding tables with sensitive data or personally identifiable information (PII).

# Amazon S3
<a name="data-source-s3"></a>

Amazon S3 is an object storage service that stores data as objects within buckets. You can use Amazon Kendra to index your Amazon S3 bucket repository of documents.

**Warning**  
Amazon Kendra doesn't use a bucket policy that grants permissions to an Amazon Kendra principal to interact with an S3 bucket. Instead, it uses IAM roles. Make sure that Amazon Kendra isn't included as a trusted member in your bucket policy to avoid any data security issues in accidentally granting permissions to arbitrary principals. However, you can add a bucket policy to use an Amazon S3 bucket across different accounts. For more information, see [Policies to use Amazon S3 across accounts](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds-s3-cross-accounts) (within the S3 IAM roles tab, under **IAM roles for data sources**). For information about IAM roles for S3 data sources, see [IAM roles](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds-s3).

**Note**  
Amazon Kendra now supports an upgraded Amazon S3 connector.  
The console has been automatically upgraded for you. Any new connectors you create in the console will use the upgraded architecture. If you use the API, you must now use the [https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object instead of the `S3DataSourceConfiguration` object to configure your connector.  
Connectors configured using the older console and API architecture will continue to function as configured. However, you won’t be able to edit or update them. If you want to edit or update your connector configuration, you must create a new connector.  
We recommend migrating your connector workflow to the upgraded version. Support for connectors configured using the older architecture is scheduled to end by June 2024.

You can connect to your Amazon S3 data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) or the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API.

**Note**  
To generate a sync status report for your Amazon S3 data source, see [Troubleshooting data sources](https://docs.aws.amazon.com/kendra/latest/dg/troubleshooting-data-sources.html#troubleshooting-data-sources-sync-status-manifest).

For troubleshooting your Amazon Kendra S3 data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-s3)
+ [Prerequisites](#prerequisites-s3)
+ [Connection instructions](#data-source-procedure-s3)
+ [Creating an Amazon S3 data source](create-ds-s3.md)
+ [Amazon S3 document metadata](s3-metadata.md)
+ [Access control for Amazon S3 data sources](s3-acl.md)
+ [Using Amazon VPC with an Amazon S3 data source](s3-vpc-example-1.md)

## Supported features
<a name="supported-features-s3"></a>
+ Field mappings
+ User access control
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-s3"></a>

Before you can use Amazon Kendra to index your S3 data source, make these changes in your S3 and AWS accounts.

**In S3, make sure you have:**
+ Copied the name of your Amazon S3 bucket.
**Note**  
Your bucket must be in the same region as your Amazon Kendra index and your index must have permission to access the bucket that contains your documents.
+ Checked that each document is unique in S3 and across other data sources you plan to use for the same index. The data sources that you use for an index must not contain the same document. Document IDs are global to an index and must be unique per index.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.

If you don’t have an existing IAM role, you can use the console to create a new IAM role when you connect your S3 data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and an index ID.

## Connection instructions
<a name="data-source-procedure-s3"></a>

To connect Amazon Kendra to your S3 data source, you must provide the necessary details of your S3 data source so that Amazon Kendra can access your data. If you have not yet configured S3 for Amazon Kendra, see [Prerequisites](#prerequisites-s3).

------
#### [ Console ]

**To connect Amazon Kendra to Amazon S3**

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **S3 connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **S3 connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following optional information:

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. For **Data source location**—Specify the path to the Amazon S3 bucket where your data is stored. Select **Browse S3** to choose your S3 bucket.

   1. For **Maximum file size**—Specify a limit in MB to only crawl files under this limit. The maximum file size Amazon Kendra can allow is 50 MB.

   1. For (Optional) **Metadata files prefix folder location**—Specify the path to the folder in which your fields/attributes and other document metadata are stored. Select **Browse S3** to locate your metadata folder.

   1. For (Optional) **Access control list configuration file location**—Specify the path to the file that contains a JSON structure of your users and their access to documents. Select **Browse S3** to locate your ACL file.

   1. (Optional) **Select decryption key**—Select to use a decryption key. You can choose to use an existing AWS KMS key.

   1. For (Optional) **Additional configuration**—Add patterns to include or exclude certain files. All paths are relative to the data source location S3 bucket.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. In **Sync run schedule**, for **Frequency**—Choose how often to sync your data source content and update your index.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following optional information:

   1. **Default field mappings**—Select from the Amazon Kendra generated default data source fields you want to map to your index. 

   1.  **Add field**—Choose to add custom data source fields to create an index field name to map to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.
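
To preview which objects the maximum file size and the include and exclude patterns from the sync settings would select, you can approximate the filtering locally. The sketch below is an illustration under assumptions, not the connector's actual matching logic: it uses Python's `fnmatch` glob rules, in which `*` also matches `/`; it lets an exclude pattern win when a key matches both lists; and it treats a file exactly at the limit as allowed. The function name and object keys are hypothetical.

```python
from fnmatch import fnmatchcase

MAX_FILE_SIZE_MB = 50  # largest file size Amazon Kendra allows, per the step above


def should_crawl(key, size_bytes, include=(), exclude=(), max_size_mb=MAX_FILE_SIZE_MB):
    """Approximate whether an object would be selected for crawling."""
    limit_mb = min(max_size_mb, MAX_FILE_SIZE_MB)
    if size_bytes > limit_mb * 1024 * 1024:
        return False
    # Assumption: an exclude match takes precedence over an include match.
    if any(fnmatchcase(key, pattern) for pattern in exclude):
        return False
    # With no include patterns, every remaining key is selected.
    if include and not any(fnmatchcase(key, pattern) for pattern in include):
        return False
    return True


objects = {
    "docs/guide.pdf": 2 * 1024 * 1024,
    "docs/internal/plan.pdf": 1 * 1024 * 1024,
    "docs/notes.txt": 4 * 1024,
    "docs/archive.pdf": 60 * 1024 * 1024,
}
selected = sorted(
    key for key, size in objects.items()
    if should_crawl(key, size, include=["*.pdf"], exclude=["*/internal/*"])
)
```

Keys are written relative to the bucket here, which matches the statement that all paths are relative to the data source location.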

------
#### [ API ]

**To connect Amazon Kendra to Amazon S3**

You must specify a JSON of the [data source schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html) using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API. You must provide the following information:
+ **Data source**—Specify the data source type as `S3` when you use the [https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **BucketName**—The name of the bucket that contains the documents.
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the S3 connector and Amazon Kendra. For more information, see [IAM roles for S3 data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+ **Inclusion and exclusion filters**—Specify whether to include or exclude certain file names, file types, and file paths. You use glob patterns (patterns that can expand a wildcard pattern into a list of path names that match the given pattern). For examples, see [Use of Exclude and Include Filters](https://docs.aws.amazon.com/cli/latest/reference/s3/#use-of-exclude-and-include-filters) in the AWS CLI Command Reference.
+ **Document metadata and access control configuration**—Add document metadata and access control files that contain information such as the source URI, document author, or custom document attributes/fields, and your users and which documents they can access. Each metadata file contains metadata about a single document.
+  **Field mappings**—Choose to map your S3 data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.

For a list of other important JSON keys to configure, see [S3 template schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-s3-schema).
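
As a rough illustration of the required settings above, the following Python sketch assembles the arguments for a `CreateDataSource` call that uses `TemplateConfiguration`. The key names inside the template (`type`, `syncMode`, `connectionConfiguration`, `repositoryEndpointMetadata`) are assumptions based on the S3 template schema and should be checked against that schema before use. The sketch deliberately omits other keys, such as field mappings. The helper name `build_s3_template_request` and all placeholder values are hypothetical.

```python
# Sync modes listed above for the S3 connector.
VALID_SYNC_MODES = {"FORCED_FULL_CRAWL", "FULL_CRAWL"}


def build_s3_template_request(index_id, name, role_arn, bucket_name,
                              sync_mode="FORCED_FULL_CRAWL"):
    """Return keyword arguments for create_data_source (hypothetical helper)."""
    if sync_mode not in VALID_SYNC_MODES:
        raise ValueError(f"Unsupported sync mode: {sync_mode}")

    # Assumed template layout; verify the key names against the S3 template schema.
    template = {
        "type": "S3",
        "syncMode": sync_mode,
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {"BucketName": bucket_name}
        },
    }
    return {
        "IndexId": index_id,
        "Name": name,
        "Type": "TEMPLATE",  # the data source type is TEMPLATE, not S3
        "RoleArn": role_arn,
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }


request = build_s3_template_request(
    index_id="index-id",
    name="example-s3-template-data-source",
    role_arn="arn:aws:iam::account-id:role/role-name",
    bucket_name="bucket-name",
    sync_mode="FULL_CRAWL",
)
print(request["Type"])
```

With a Boto3 client, the result could then be passed as `boto3.client("kendra").create_data_source(**request)`.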

------

### Learn more
<a name="s3-learn-more"></a>

To learn more about integrating Amazon Kendra with your S3 data source, see:
+ [Search for answers accurately using Amazon Kendra S3 Connector with VPC support](https://aws.amazon.com/blogs/machine-learning/search-for-answers-accurately-using-amazon-kendra-s3-connector-with-vpc-support/)

# Creating an Amazon S3 data source
<a name="create-ds-s3"></a>

The following examples demonstrate creating an Amazon S3 data source. The examples assume that you have already created an index and an IAM role with permission to read the data from the Amazon S3 bucket. For more information about the IAM role, see [IAM access roles](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds). For more information about creating an index, see [Creating an index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html).

------
#### [ CLI ]

```
aws kendra create-data-source \
 --index-id index-id \
 --name example-data-source \
 --type S3 \
 --configuration '{"S3Configuration":{"BucketName":"bucket-name"}}' \
 --role-arn 'arn:aws:iam::account-id:role/role-name'
```

------
#### [ Python ]

The following snippet of Python code creates an Amazon S3 data source. For the complete example, see [Getting started (AWS SDK for Python (Boto3))](gs-python.md).

```
import boto3

kendra = boto3.client("kendra")

print("Create an Amazon S3 data source.")

# Provide the ID of the index to add the data source to
index_id = "index-id"
# Provide a name for the data source
name = "getting-started-data-source"
# Provide an optional description for the data source
description = "Getting started data source."
# Provide the IAM role ARN required for data sources
role_arn = "arn:aws:iam::${accountID}:role/${roleName}"
# Provide the data source connection information
s3_bucket_name = "S3-bucket-name"
data_source_type = "S3"
# Configure the data source
configuration = {
    "S3Configuration": {
        "BucketName": s3_bucket_name
    }
}

data_source_response = kendra.create_data_source(
    Configuration=configuration,
    Name=name,
    Description=description,
    RoleArn=role_arn,
    Type=data_source_type,
    IndexId=index_id
)
```

------

It can take some time to create your data source. You can monitor the progress by using the [DescribeDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_DescribeDataSource.html) API. When the data source status is `ACTIVE`, the data source is ready to use.

The following examples demonstrate getting the status of a data source.

------
#### [ CLI ]

```
aws kendra describe-data-source \
 --index-id index-id \
 --id data-source-id
```

------
#### [ Python ]

The following snippet of Python code gets information about an S3 data source. For the complete example, see [Getting started (AWS SDK for Python (Boto3))](gs-python.md).

```
import time

print("Wait for Amazon Kendra to create the data source.")

# Assumes `kendra` is a Boto3 Amazon Kendra client, as in the previous example
while True:
    data_source_description = kendra.describe_data_source(
        Id="data-source-id",
        IndexId="index-id"
    )
    status = data_source_description["Status"]
    print(" Creating data source. Status: " + status)
    # Stop polling as soon as the status is no longer CREATING
    if status != "CREATING":
        break
    time.sleep(60)
```

------

This data source doesn't have a schedule, so it doesn't run automatically. To index the data source, you call [StartDataSourceSyncJob](https://docs.aws.amazon.com/kendra/latest/APIReference/API_StartDataSourceSyncJob.html) to synchronize the index with the data source.

The following examples demonstrate synchronizing a data source.

------
#### [ CLI ]

```
aws kendra start-data-source-sync-job \
 --index-id index-id \
 --id data-source-id
```

------
#### [ Python ]

The following snippet of Python code synchronizes an Amazon S3 data source. For the complete example, see [Getting started (AWS SDK for Python (Boto3))](gs-python.md).

```
print("Synchronize the data source.")

sync_response = kendra.start_data_source_sync_job(
    Id="data-source-id",
    IndexId="index-id"
)
```

------

# Amazon S3 document metadata
<a name="s3-metadata"></a>

You can add metadata, additional information about a document, to documents in an Amazon S3 bucket using a metadata file. Each metadata file is associated with an indexed document. 

Your metadata files must be stored in the same bucket as your indexed files. You can specify a location within the bucket for your metadata files using the console or the `S3Prefix` field of the `DocumentsMetadataConfiguration` parameter when you create an Amazon S3 data source. If you don't specify an Amazon S3 prefix, your metadata files must be stored in the same location as your indexed documents.

If you specify an Amazon S3 prefix for your metadata files, they are in a directory structure parallel to your indexed documents. Amazon Kendra looks only in the specified directory for your metadata. If the metadata isn't read, check that the directory location matches the location of your metadata.

The following examples show how the indexed document location maps to the metadata file location. The document's Amazon S3 key is appended to the metadata's Amazon S3 prefix and then suffixed with `.metadata.json` to form the metadata file's Amazon S3 path. The combined Amazon S3 key, including the metadata's Amazon S3 prefix and the `.metadata.json` suffix, must be no more than 1024 characters in total. We recommend that you keep your Amazon S3 key below 1000 characters to account for the additional characters added when the key is combined with the prefix and suffix.

```
Bucket name:
     s3://bucketName
Document path:
     documents
Metadata path:
     none
File mapping
     s3://bucketName/documents/file.txt -> 
        s3://bucketName/documents/file.txt.metadata.json
```

```
Bucket name:
     s3://bucketName
Document path:
     documents/legal
Metadata path:
     metadata
File mapping
     s3://bucketName/documents/legal/file.txt -> 
        s3://bucketName/metadata/documents/legal/file.txt.metadata.json
```

Your document metadata is defined in a JSON file. The file must be a UTF-8 text file without a byte order mark (BOM). The file name of the JSON file must be `<document>.<extension>.metadata.json`, where "document" is the name of the document that the metadata applies to and "extension" is the file extension for the document. The document ID must be unique in `<document>.<extension>.metadata.json`.

The content of the JSON file follows this template. All of the attributes/fields are optional, so it's not necessary to include all attributes. You must provide a value for each attribute you want to include; the value cannot be empty. If you don't specify the `_source_uri`, then the links returned by Amazon Kendra in the search results point to the Amazon S3 bucket that contains the document. `DocumentId` is mapped to the field `s3_document_id` and is the absolute path to the document in S3.

```
{
    "DocumentId": "S3 document ID, the S3 path to doc",
    "Attributes": {
        "_category": "document category",
        "_created_at": "ISO 8601 encoded string",
        "_last_updated_at": "ISO 8601 encoded string",
        "_source_uri": "document URI",
        "_version": "file version",
        "_view_count": number of times document has been viewed,
        "custom attribute key": "custom attribute value",
        additional custom attributes
    },
    "AccessControlList": [
         {
             "Name": "user name",
             "Type": "GROUP | USER",
             "Access": "ALLOW | DENY"
         }
    ],
    "Title": "document title",
    "ContentType": "document content type, for example HTML | PDF"
}
```

For supported content types, see [Types of documents](https://docs.aws.amazon.com/kendra/latest/dg/index-document-types.html).

The `_created_at` and `_last_updated_at` metadata fields are ISO 8601 encoded dates. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25, 2012, at 12:30 PM (plus 10 seconds) in the Central European Time time zone.

You can add additional information to the `Attributes` field about a document that you use to filter queries or to group query responses. For more information, see [Creating custom document fields](custom-attributes.md).

You can use the `AccessControlList` field to filter the response from a query. This way, only certain users and groups have access to documents. For more information, see [Filtering on user context](user-context-filter.md).
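
As a minimal sketch of how a metadata file might be produced, the following Python code builds the JSON structure described above, drops fields that have no value, and serializes the result as UTF-8 without a BOM. The helper names `build_metadata` and `encode_metadata` are hypothetical, and the document ID and attribute values are placeholders.

```python
import json


def build_metadata(document_id, title=None, content_type=None,
                   attributes=None, access_control_list=None):
    """Build a metadata document, dropping fields that have no value."""
    candidate = {
        "DocumentId": document_id,
        "Attributes": attributes,
        "AccessControlList": access_control_list,
        "Title": title,
        "ContentType": content_type,
    }
    # Included fields must not be empty, so leave out None and empty values.
    return {
        key: value
        for key, value in candidate.items()
        if value not in (None, "", {}, [])
    }


def encode_metadata(metadata):
    """Serialize to UTF-8 bytes; Python's "utf-8" codec does not emit a BOM."""
    return json.dumps(metadata, indent=4).encode("utf-8")


metadata = build_metadata(
    document_id="s3://bucketName/documents/file.txt",  # placeholder document ID
    title="Example document",
    content_type="PDF",
    attributes={
        "_category": "legal",
        "_created_at": "2012-03-25T12:30:10+01:00",
        "_view_count": 3,
    },
    access_control_list=[{"Name": "user1", "Type": "USER", "Access": "ALLOW"}],
)
payload = encode_metadata(metadata)
print(payload.decode("utf-8"))
```

The resulting bytes could then be uploaded to the metadata location for the document, for example as `documents/file.txt.metadata.json`.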

# Access control for Amazon S3 data sources
<a name="s3-acl"></a>

You can control access to documents in an Amazon S3 data source using a configuration file. You specify the file in the console or as the `AccessControlListConfiguration` parameter when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html) or [UpdateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_UpdateDataSource.html) API.

The configuration file contains a JSON structure that identifies an S3 prefix and lists the access settings for the prefix. The prefix can be a path, or it can be an individual file. If the prefix is a path, the access settings apply to all of the files in that path. There is a maximum number of S3 prefixes in the JSON configuration file and a default maximum file size. For more information, see [Quotas for Amazon Kendra](quotas.md).

You can specify both users and groups in the access settings. When you query the index, you specify user and group information. For more information, see [Filtering by user attribute](user-context-filter.md#context-filter-attribute).

The JSON structure for the configuration file must be in the following format:

```
[
    {
        "keyPrefix": "s3://BUCKETNAME/prefix1/",
        "aclEntries": [
            {
                "Name": "user1",
                "Type": "USER",
                "Access": "ALLOW"
            },
            {
                "Name": "group1",
                "Type": "GROUP",
                "Access": "DENY"
            }
        ]
    },
    {
        "keyPrefix": "s3://BUCKETNAME/prefix2/",
        "aclEntries": [
            {
                "Name": "user2",
                "Type": "USER",
                "Access": "ALLOW"
            },
            {
                "Name": "user1",
                "Type": "USER",
                "Access": "DENY"
            },
            {
                "Name": "group1",
                "Type": "GROUP",
                "Access": "DENY"
            }
        ]
    }
]
```
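
Before uploading an access control configuration file, it can help to check its shape. The following Python sketch validates a structure like the one above. The function name `validate_acl_config` is hypothetical, and the rules it checks (an `s3://` prefix, `USER` or `GROUP` types, `ALLOW` or `DENY` access) are inferred from the example format rather than taken from a formal schema.

```python
VALID_TYPES = {"USER", "GROUP"}
VALID_ACCESS = {"ALLOW", "DENY"}


def validate_acl_config(config):
    """Return a list of problems found in an ACL configuration structure."""
    if not isinstance(config, list):
        return ["top-level structure must be a JSON array"]
    problems = []
    for index, item in enumerate(config):
        if not item.get("keyPrefix", "").startswith("s3://"):
            problems.append(f"item {index}: keyPrefix must start with s3://")
        for entry in item.get("aclEntries", []):
            if not entry.get("Name"):
                problems.append(f"item {index}: ACL entry is missing Name")
            if entry.get("Type") not in VALID_TYPES:
                problems.append(f"item {index}: Type must be USER or GROUP")
            if entry.get("Access") not in VALID_ACCESS:
                problems.append(f"item {index}: Access must be ALLOW or DENY")
    return problems


acl_config = [
    {
        "keyPrefix": "s3://BUCKETNAME/prefix1/",
        "aclEntries": [
            {"Name": "user1", "Type": "USER", "Access": "ALLOW"},
            {"Name": "group1", "Type": "GROUP", "Access": "DENY"},
        ],
    }
]
print(validate_acl_config(acl_config))
# []
```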

# Using Amazon VPC with an Amazon S3 data source
<a name="s3-vpc-example-1"></a>

This topic provides a step-by-step example that shows how to connect to an Amazon S3 bucket by using an Amazon S3 connector through Amazon VPC. The example assumes that you're starting with an existing S3 bucket. We recommend that you upload just a few documents to your S3 bucket to test the example.

You can connect Amazon Kendra to your Amazon S3 bucket through Amazon VPC. To do so, you must specify the Amazon VPC subnet and Amazon VPC security groups when creating your Amazon S3 data source connector.

**Important**  
So that an Amazon Kendra Amazon S3 connector can access your Amazon S3 bucket, make sure that you have assigned an Amazon S3 endpoint to your virtual private cloud (VPC).

For Amazon Kendra to sync documents from your Amazon S3 bucket through Amazon VPC, you must complete the following steps:
+ Set up an Amazon S3 endpoint for Amazon VPC. For more information about how to set up an Amazon S3 endpoint, see [Gateway endpoints for Amazon S3](https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints-s3.html) in the *AWS PrivateLink Guide*.
+ (Optional) Check your Amazon S3 bucket policies to make sure that the Amazon S3 bucket is accessible from the virtual private cloud (VPC) that you assigned to Amazon Kendra. For more information, see [Controlling access from VPC endpoints with bucket policies](https://docs.aws.amazon.com/AmazonS3/latest/userguide/example-bucket-policies-vpc-endpoint.html) in the *Amazon S3 User Guide*.

**Topics**
+ [Step 1: Configure an Amazon VPC](#s3-configure-vpc)
+ [(Optional) Step 2: Configure Amazon S3 bucket policy](#s3-configure-bucket-policy)
+ [Step 3: Create a test Amazon S3 data source connector](#s3-connect-vpc)

## Step 1: Configure an Amazon VPC
<a name="s3-configure-vpc"></a>

Create a VPC network including a private subnet with an Amazon S3 gateway endpoint and a security group for Amazon Kendra to use later.

**To configure a VPC with a private subnet, an S3 endpoint, and a security group**

1. Sign in to the AWS Management Console and open the Amazon VPC console at [https://console.aws.amazon.com/vpc/](https://console.aws.amazon.com/vpc/).

1. **Create a VPC with a private subnet and an S3 endpoint for Amazon Kendra to use:**

   From the navigation pane, choose **Your VPCs**, and then choose **Create VPC**.

   1. For **Resources to create**, choose **VPC and more**.

   1. For **Name tag**, enable **Auto-generate**, then enter **kendra-s3-example**.

   1. For **IPv4 / IPv6 CIDR block**, keep the default values.

   1. For **Number of Availability Zones (AZs)**, choose **1**.

   1. Select **Customize AZs**, and then select an Availability Zone from the **First availability zone** list.

      Amazon Kendra only supports a specific set of Availability Zones.

   1. For **Number of public subnets**, choose **0**.

   1. For **Number of private subnets**, choose **1**.

   1. For **NAT gateways**, choose **None**.

   1. For **VPC endpoints**, choose **Amazon S3 gateway**.

   1. Leave the rest of the values at their default settings.

   1. Select **Create VPC**.

      Wait until the **Create VPC** workflow finishes. Then, choose **View VPC** to check the **VPC** you just created.

   You have now created a VPC network with a private subnet, which does not have access to the public internet.

1. **Copy the VPC endpoint ID of your Amazon S3 endpoint:**

   1. From the navigation pane, choose **Endpoints**.

   1. In the **Endpoints** list, find the Amazon S3 endpoint `kendra-s3-example-vpce-s3` that you just created together with your VPC.

   1. Make a note of the **VPC endpoint ID**.

   You have now created an Amazon S3 gateway endpoint to access your Amazon S3 bucket through a subnet.

1. **Create a security group for Amazon Kendra to use:**

   1. From the navigation pane, choose **Security Groups**, then select **Create security group**.

   1. For **Security group name**, enter **s3-data-source-security-group**.

   1. Choose your VPC from the **Amazon VPC** list.

   1. Leave **inbound rules** and **outbound rules** as the default.

   1. Choose **Create security group**.

   You have now created a VPC security group.

You assign the subnet and security group that you created to your Amazon Kendra Amazon S3 data source connector during the connector configuration process.

## (Optional) Step 2: Configure Amazon S3 bucket policy
<a name="s3-configure-bucket-policy"></a>

In this optional step, learn how to configure an Amazon S3 bucket policy so that your Amazon S3 bucket is only accessible from the VPC that you assign to Amazon Kendra.

Amazon Kendra uses IAM roles to access your Amazon S3 bucket and doesn't require that you configure an Amazon S3 bucket policy. However, you might find it useful to create a bucket policy if you want to configure an Amazon S3 connector using an Amazon S3 bucket that has existing policies restricting access to it from the public internet.

**To configure your Amazon S3 bucket policy**

1. Open the Amazon S3 console at [https://console.aws.amazon.com/s3/](https://console.aws.amazon.com/s3/).

1. From the navigation pane, choose **Buckets**.

1. Choose the name of the Amazon S3 bucket that you want to sync with Amazon Kendra.

1. Choose the **Permissions** tab, scroll down to **Bucket policy**, and then choose **Edit**.

1. Add or modify your bucket policy to allow access only from the VPC endpoint that you created.

   The following is an example bucket policy. Replace *`bucket-name`* and *`vpce-id`* with your Amazon S3 bucket name and the Amazon S3 endpoint ID that you noted earlier.

1. Select **Save changes**.

Your S3 bucket is now accessible only from the specific VPC that you created.
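
Step 5 refers to an example bucket policy. One common shape for such a policy, following the VPC endpoint pattern described in the *Amazon S3 User Guide*, denies all Amazon S3 actions unless the request arrives through a given VPC endpoint. The following Python sketch builds that policy as JSON. It is an assumption about what a suitable policy could look like, not necessarily the exact policy intended for this procedure, and the function name `vpc_endpoint_only_policy` is hypothetical.

```python
import json


def vpc_endpoint_only_policy(bucket_name, vpce_id):
    """Build a bucket policy that denies access from outside one VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Access-to-specific-VPCE-only",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            }
        ],
    }


# Replace the placeholders with your bucket name and the endpoint ID you noted.
policy = vpc_endpoint_only_policy("bucket-name", "vpce-id")
print(json.dumps(policy, indent=4))
```

Because a policy like this denies every request that does not come through the endpoint, it can also block access from the Amazon S3 console, so test it on a non-critical bucket first.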

## Step 3: Create a test Amazon S3 data source connector
<a name="s3-connect-vpc"></a>

To test your Amazon VPC configuration, create an Amazon S3 connector. Then, configure it with the VPC that you created by following the steps outlined in [Amazon S3](https://docs.aws.amazon.com/kendra/latest/dg/data-source-s3.html).

For Amazon VPC configuration values, choose the values that you created during this example:
+ **Virtual Private Cloud (VPC)** – `kendra-s3-example-vpc`
+ **Subnets** – `kendra-s3-example-subnet-private1-[availability zone]`
+ **Security groups** – `s3-data-source-security-group`
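
If you create the connector through the API instead of the console, the same choices are expressed as the `VpcConfiguration` parameter of `CreateDataSource`, which takes subnet and security group IDs rather than names. The following Python sketch adds that parameter to an existing set of request arguments. The helper name `with_vpc_configuration` and the placeholder IDs are hypothetical.

```python
def with_vpc_configuration(request, subnet_ids, security_group_ids):
    """Return a copy of create_data_source arguments with VpcConfiguration added."""
    if not subnet_ids or not security_group_ids:
        raise ValueError("Provide at least one subnet ID and one security group ID")
    updated = dict(request)  # leave the caller's dictionary unchanged
    updated["VpcConfiguration"] = {
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
    }
    return updated


base_request = {
    "IndexId": "index-id",
    "Name": "example-data-source",
    "Type": "S3",
    "RoleArn": "arn:aws:iam::account-id:role/role-name",
    "Configuration": {"S3Configuration": {"BucketName": "bucket-name"}},
}
vpc_request = with_vpc_configuration(
    base_request,
    subnet_ids=["subnet-id"],                 # ID of the private subnet
    security_group_ids=["security-group-id"],  # ID of s3-data-source-security-group
)
print(vpc_request["VpcConfiguration"])
```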

Wait for your connector to finish creating. After the Amazon S3 connector has been created, choose **Sync now** to initiate a sync.

It might take several minutes to several hours to finish the sync, depending on how many documents are in your Amazon S3 bucket. To test the example, we recommend that you upload just a few documents to your S3 bucket. If your configuration is correct, you should eventually see a **Sync status** of **Completed**.

If you encounter any errors, see [Troubleshooting Amazon VPC connection](https://docs.aws.amazon.com/kendra/latest/dg/vpc-connector-troubleshoot.html).

# Amazon Kendra Web Crawler
<a name="data-source-web-crawler"></a>

You can use Amazon Kendra Web Crawler to crawl and index web pages.

You can only crawl public facing websites or internal company websites that use the secure communication protocol Hypertext Transfer Protocol Secure (HTTPS). If you receive an error when crawling a website, it could be that the website is blocked from crawling. To crawl internal websites, you can set up a web proxy. The web proxy must be public facing. You can also use authentication to access and crawl websites.

*When selecting websites to index, you must adhere to the [Amazon Acceptable Use Policy](https://aws.amazon.com/aup/) and all other Amazon terms. Remember that you must only use Amazon Kendra Web Crawler to index your own web pages, or web pages that you have authorization to index. To learn how to stop Amazon Kendra Web Crawler from indexing your website(s), please see [Configuring the `robots.txt` file for Amazon Kendra Web Crawler](stop-web-crawler.md).*

**Note**  
Abusing Amazon Kendra Web Crawler to aggressively crawl websites or web pages you don't own is **not** considered acceptable use.

Amazon Kendra has two versions of the web crawler connector. Supported features of each version include:

**Amazon Kendra Web Crawler connector v1.0 / [WebCrawlerConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_WebCrawlerConfiguration.html) API**
+ Web proxy
+ Inclusion/exclusion filters

**Amazon Kendra Web Crawler connector v2.0 / [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API**
+ Field mappings
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Web proxy
+ Basic, NTLM/Kerberos, SAML, and form authentication for your websites
+ Virtual private cloud (VPC)

**Important**  
Web Crawler v2.0 connector creation is not supported by CloudFormation. Use the Web Crawler v1.0 connector if you need CloudFormation support.

For troubleshooting your Amazon Kendra web crawler data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Amazon Kendra Web Crawler connector v1.0](data-source-v1-web-crawler.md)
+ [Amazon Kendra Web Crawler connector v2.0](data-source-v2-web-crawler.md)
+ [Configuring the `robots.txt` file for Amazon Kendra Web Crawler](stop-web-crawler.md)

# Amazon Kendra Web Crawler connector v1.0
<a name="data-source-v1-web-crawler"></a>

You can use Amazon Kendra Web Crawler to crawl and index web pages.

You can only crawl public facing websites and websites that use the secure communication protocol Hypertext Transfer Protocol Secure (HTTPS). If you receive an error when crawling a website, it could be that the website is blocked from crawling. To crawl internal websites, you can set up a web proxy. The web proxy must be public facing.

*When selecting websites to index, you must adhere to the [Amazon Acceptable Use Policy](https://aws.amazon.com/aup/) and all other Amazon terms. Remember that you must only use Amazon Kendra Web Crawler to index your own web pages, or web pages that you have authorization to index. To learn how to stop Amazon Kendra Web Crawler from indexing your website(s), please see [Configuring the `robots.txt` file for Amazon Kendra Web Crawler](stop-web-crawler.md).*

**Note**  
Abusing Amazon Kendra Web Crawler to aggressively crawl websites or web pages you don't own is **not** considered acceptable use.

For troubleshooting your Amazon Kendra web crawler data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-v1-web-crawler)
+ [Prerequisites](#prerequisites-v1-web-crawler)
+ [Connection instructions](#data-source-v1-procedure-web-crawler)
+ [Learn more](#web-crawler-learn-more)

## Supported features
<a name="supported-features-v1-web-crawler"></a>
+ Web proxy
+ Inclusion/exclusion filters

## Prerequisites
<a name="prerequisites-v1-web-crawler"></a>

Before you can use Amazon Kendra to index your websites, check the details of your websites and AWS accounts.

**For your websites, make sure you have:**
+ Copied the seed or sitemap URLs of the websites you want to index.
+ **For websites that require basic authentication**: Noted the user name and password, and copied the host name of the website and the port number.
+ **Optional:** Copied the host name of the website and the port number if you want to use a web proxy to connect to internal websites you want to crawl. The web proxy must be public facing. Amazon Kendra supports connecting to web proxy servers that are backed by basic authentication or you can connect with no authentication.
+ Checked that each web page document you want to index is unique, both within this data source and across other data sources you plan to use for the same index. Each data source that you want to use for an index must not contain the same document across the data sources. Document IDs are global to an index and must be unique per index.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ For websites that require authentication, or if using a web proxy with authentication, stored your authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don't have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your web crawler data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-v1-procedure-web-crawler"></a>

To connect Amazon Kendra to your web crawler data source, you must provide the necessary details of your web crawler data source so that Amazon Kendra can access your data. If you have not yet configured web crawler for Amazon Kendra, see [Prerequisites](#prerequisites-v1-web-crawler).

------
#### [ Console ]

**To connect Amazon Kendra to web crawler** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **web crawler connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **web crawler connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. For **Source**, choose between **Source URLs** and **Source sitemaps** depending on your use case and enter the values for each.

      You can add up to 10 source URLs and three sitemaps.
**Note**  
If you want to crawl a sitemap, check that the base or root URL is the same as the URLs listed on your sitemap page. For example, if your sitemap URL is *https://example.com/sitemap-page.html*, the URLs listed on this sitemap page should also use the base URL "https://example.com/".

   1. (Optional) For **Web proxy**, enter the following information:

      1. **Host name**—The host name where web proxy is required.

      1. **Port number**—The port used by the host URL transport protocol. The port number should be a numeric value between 0 and 65535.

      1. For **Web proxy credentials**—If your web proxy connection requires authentication, choose an existing secret or create a new secret to store your authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

      1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

         1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-WebCrawler-’ is automatically added to your secret name.

         1. For **User name** and **Password**—Enter these basic authentication credentials for your websites.

         1. Choose **Save**.

   1. (Optional) **Hosts with authentication**—Select to add additional hosts with authentication.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. **Crawl range**—Choose the kind of web pages you want to crawl.

   1. **Crawl depth**—Select the number of levels from the seed URL that Amazon Kendra should crawl.

   1. For **Advanced crawl settings** and **Additional configuration**, enter the following information:

      1. **Maximum file size**—The maximum web page or attachment size to crawl. Minimum 0.000001 MB (1 byte). Maximum 50 MB.

      1. **Maximum links per page**—The maximum number of links crawled per page. Links are crawled in order of appearance. Minimum 1 link/page. Maximum 1000 links/page.

      1. **Maximum throttling**—The maximum number of URLs crawled per host name per minute. Minimum 1 URL/host name/minute. Maximum 300 URLs/host name/minute.

      1. **Regex patterns**—Add regular expression patterns to include or exclude certain URLs. You can add up to 100 patterns.

   1. In **Sync run schedule**, for **Frequency**—Choose how often Amazon Kendra will sync with your data source.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to web crawler**

You must specify the following using the [WebCrawlerConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_WebCrawlerConfiguration.html) API:
+ **URLs**—Specify the seed or starting point URLs of the websites or the sitemap URLs of the websites you want to crawl using [SeedUrlConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_SeedUrlConfiguration.html) and [SiteMapsConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_SiteMapsConfiguration.html).
**Note**  
If you want to crawl a sitemap, check that the base or root URL is the same as the URLs listed on your sitemap page. For example, if your sitemap URL is *https://example.com/sitemap-page.html*, the URLs listed on this sitemap page should also use the base URL "https://example.com/".
+ **Secret Amazon Resource Name (ARN)**—If a website requires basic authentication, you provide the host name, the port number, and a secret that stores your basic authentication credentials (your user name and password). You provide the secret ARN using the [AuthenticationConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_AuthenticationConfiguration.html) API. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "username": "user name",
      "password": "password"
  }
  ```

  You can also provide web proxy credentials using an AWS Secrets Manager secret. You use the [ProxyConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_ProxyConfiguration.html) API to provide the website host name and port number, and optionally the secret that stores your web proxy credentials.
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the web crawler connector and Amazon Kendra. For more information, see [IAM roles for web crawler data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).

You can also add the following optional features:
+ **Crawl mode**—Choose whether to crawl website host names only, or host names with subdomains, or also crawl other domains the web pages link to.
+ The 'depth' or number of levels from the seed level to crawl. For example, the seed URL page is depth 1 and any hyperlinks on this page that are also crawled are depth 2.
+ The maximum number of URLs on a single web page to crawl.
+ The maximum size in MB of a web page to crawl.
+ The maximum number of URLs crawled per website host per minute.
+ The web proxy host and port number to connect to and crawl internal websites. For example, the host name of *https://a.example.com/page1.html* is "a.example.com" and the port number is 443, the standard port for HTTPS. If web proxy credentials are required to connect to a website host, you can create an AWS Secrets Manager secret that stores the credentials.
+ The authentication information to access and crawl websites that require user authentication.
+ You can extract HTML meta tags as fields using the *Custom Document Enrichment* tool. For more information, see [Customizing document metadata during the ingestion process](https://docs.aws.amazon.com/kendra/latest/dg/custom-document-enrichment.html). For an example of extracting HTML meta tags, see [CDE examples](https://github.com/aws-samples/amazon-kendra-cde-examples).
+  **Inclusion and exclusion filters**—Specify whether to include or exclude certain URLs.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
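
The required and optional settings above could be assembled for `CreateDataSource` roughly as follows. This is a sketch under stated assumptions: it uses the AWS SDK for Python (boto3), the helper names `build_web_crawler_configuration` and `create_web_crawler_data_source` are hypothetical, and all URLs, ARNs, and IDs you pass in are your own values. Check the field names against the `WebCrawlerConfiguration` API reference before relying on them.

```python
def build_web_crawler_configuration(seed_urls, crawl_mode="HOST_ONLY", crawl_depth=2,
                                    include_patterns=None, exclude_patterns=None,
                                    basic_auth=None):
    """Build a WebCrawlerConfiguration dictionary (hypothetical helper).

    basic_auth, if given, is a dict with the keys host, port, and secret_arn.
    """
    config = {
        "Urls": {
            "SeedUrlConfiguration": {
                "SeedUrls": list(seed_urls),
                # HOST_ONLY, SUBDOMAINS, or EVERYTHING
                "WebCrawlerMode": crawl_mode,
            }
        },
        "CrawlDepth": crawl_depth,
    }
    if include_patterns:
        config["UrlInclusionPatterns"] = list(include_patterns)
    if exclude_patterns:
        config["UrlExclusionPatterns"] = list(exclude_patterns)
    if basic_auth:
        config["AuthenticationConfiguration"] = {
            "BasicAuthentication": [{
                "Host": basic_auth["host"],
                "Port": basic_auth["port"],
                "Credentials": basic_auth["secret_arn"],
            }]
        }
    return config


def create_web_crawler_data_source(index_id, role_arn, name, configuration):
    """Call CreateDataSource; requires boto3 and AWS credentials."""
    import boto3  # imported here so the builder above runs without the SDK

    kendra = boto3.client("kendra")
    return kendra.create_data_source(
        IndexId=index_id,
        Name=name,
        Type="WEBCRAWLER",
        RoleArn=role_arn,
        Configuration={"WebCrawlerConfiguration": configuration},
    )
```

Keeping the dictionary builder separate from the API call makes it possible to inspect or review the configuration before any request is sent.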

------

## Learn more
<a name="web-crawler-learn-more"></a>

To learn more about integrating Amazon Kendra with your web crawler data source, see:
+ [Reimagine knowledge discovery using Amazon Kendra's Web Crawler](https://aws.amazon.com/blogs/machine-learning/reimagine-knowledge-discovery-using-amazon-kendras-web-crawler/)

# Amazon Kendra Web Crawler connector v2.0
<a name="data-source-v2-web-crawler"></a>

You can use Amazon Kendra Web Crawler to crawl and index web pages.

You can only crawl public facing websites or internal company websites that use the secure communication protocol Hypertext Transfer Protocol Secure (HTTPS). If you receive an error when crawling a website, it could be that the website is blocked from crawling. To crawl internal websites, you can set up a web proxy. The web proxy must be public facing. You can also use authentication to access and crawl websites.

Amazon Kendra Web Crawler v2.0 uses the Selenium web crawler package and a Chromium driver. Amazon Kendra automatically updates the version of Selenium and the Chromium driver using Continuous Integration (CI).

*When selecting websites to index, you must adhere to the [Amazon Acceptable Use Policy](https://aws.amazon.com/aup/) and all other Amazon terms. Remember that you must only use Amazon Kendra Web Crawler to index your own web pages, or web pages that you have authorization to index. To learn how to stop Amazon Kendra Web Crawler from indexing your website(s), please see [Configuring the `robots.txt` file for Amazon Kendra Web Crawler](stop-web-crawler.md).* Abusing Amazon Kendra Web Crawler to aggressively crawl websites or web pages you don't own is **not** considered acceptable use.

For troubleshooting your Amazon Kendra web crawler data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Note**  
Web Crawler connector v2.0 does *not* support crawling web site lists from AWS KMS encrypted Amazon S3 buckets. It supports only server-side encryption with Amazon S3 managed keys.

**Important**  
Web Crawler v2.0 connector creation is not supported by CloudFormation. Use the Web Crawler v1.0 connector if you need CloudFormation support.

**Topics**
+ [Supported features](#supported-features-v2-web-crawler)
+ [Prerequisites](#prerequisites-v2-web-crawler)
+ [Connection instructions](#data-source-v2-procedure-web-crawler)

## Supported features
<a name="supported-features-v2-web-crawler"></a>
+ Field mappings
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Web proxy
+ Basic, NTLM/Kerberos, SAML, and form authentication for your websites
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-v2-web-crawler"></a>

Before you can use Amazon Kendra to index your websites, check the details of your websites and AWS accounts.

**For your websites, make sure you have:**
+ Copied the seed or sitemap URLs of the websites you want to index. You can store the URLs in a text file and upload this to an Amazon S3 bucket. Each URL in the text file must be formatted on a separate line. If you want to store your sitemaps in an Amazon S3 bucket, make sure you have copied the sitemap XML and saved this in an XML file. You can also combine multiple sitemap XML files into a ZIP file.
**Note**  
(On-premise/server) Amazon Kendra checks if the endpoint information included in AWS Secrets Manager is the same as the endpoint information specified in your data source configuration details. This helps protect against the [confused deputy problem](https://docs.aws.amazon.com/IAM/latest/UserGuide/confused-deputy.html), which is a security issue where a user doesn’t have permission to perform an action but uses Amazon Kendra as a proxy to access the configured secret and perform the action. If you later change your endpoint information, you must create a new secret to sync this information.
+ **For websites that require basic, NTLM, or Kerberos authentication**:
  + Noted your website authentication credentials, which include a user name and password.
**Note**  
Amazon Kendra Web Crawler v2.0 supports the NTLM authentication protocol that includes password hashing, and Kerberos authentication protocol that includes password encryption.
+ **For websites that require SAML or login form authentication**:
  + Noted your website authentication credentials, which include a user name and password.
  + Copied the XPaths (XML Path Language) of the user name field (and the user name button if using SAML), password field and button, and copied the login page URL. You can find the XPaths of elements using your web browser’s developer tools. XPaths usually follow this format: `//tagname[@Attribute='Value']`.
**Note**  
Amazon Kendra Web Crawler v2.0 uses a headless Chrome browser and the information from the form to authenticate and authorize access with an OAuth 2.0 protected URL.
+ **Optional**: Copied the host name and the port number of the web proxy server if you want to use a web proxy to connect to internal websites you want to crawl. The web proxy must be public facing. Amazon Kendra supports connecting to web proxy servers that are backed by basic authentication or you can connect with no authentication.
+ **Optional**: Copied the virtual private cloud (VPC) subnet ID if you want to use a VPC to connect to internal websites you want to crawl. For more information, see [Configuring an Amazon VPC](https://docs.aws.amazon.com/kendra/latest/dg/vpc-configuration.html).
+ Checked that each web page document you want to index is unique, including across other data sources you plan to use for the same index. Each data source that you want to use for an index must not contain the same document across the data sources. Document IDs are global to an index and must be unique per index.
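
The seed URL text file and the sitemap ZIP file described in the list above can be prepared with a short script. The following is an illustrative sketch: the function names are hypothetical, the HTTPS check reflects the requirement that crawled websites use HTTPS, and the limit of 100 URLs is the limit the console instructions in this section give for a seed URL file. Uploading the results to your Amazon S3 bucket is left out.

```python
import io
import zipfile
from urllib.parse import urlparse

MAX_SEED_URLS = 100  # limit for a seed URL file, per the console instructions


def render_seed_url_file(urls):
    """Return the text of a seed URL file with one HTTPS URL per line."""
    cleaned = []
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank entries
        parsed = urlparse(url)
        if parsed.scheme != "https" or not parsed.netloc:
            raise ValueError(f"not an HTTPS URL: {url!r}")
        if url not in cleaned:
            cleaned.append(url)  # drop exact duplicates, keep order
    if len(cleaned) > MAX_SEED_URLS:
        raise ValueError(f"too many seed URLs: {len(cleaned)} > {MAX_SEED_URLS}")
    return "\n".join(cleaned) + "\n"


def zip_sitemaps(sitemaps):
    """Bundle sitemap XML documents ({file name: XML text}) into ZIP bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, xml_text in sitemaps.items():
            archive.writestr(name, xml_text)
    return buffer.getvalue()
```

Both functions return data in memory, so you can write the result to a local file or pass it to whichever upload tool you use.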

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the Amazon Resource Name of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ For websites that require authentication, or if using a web proxy with authentication, stored your authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don't have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your web crawler data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-v2-procedure-web-crawler"></a>

To connect Amazon Kendra to your web crawler data source, you must provide the necessary details of your web crawler data source so that Amazon Kendra can access your data. If you have not yet configured web crawler for Amazon Kendra, see [Prerequisites](#prerequisites-v2-web-crawler).

------
#### [ Console ]

**To connect Amazon Kendra to web crawler** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **web crawler connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **web crawler connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. **Source**—Choose **Source URLs**, **Source sitemaps**, **Source URLs file**, or **Source sitemaps file**. If you choose to use a text file that includes a list of up to 100 seed URLs, you specify the path to the Amazon S3 bucket where your file is stored. If you choose to use a sitemap XML file, you specify the path to the Amazon S3 bucket where your file is stored. You can also combine multiple sitemap XML files into a ZIP file. Otherwise, you can manually enter up to 10 seed or starting point URLs, and up to three sitemap URLs.
**Note**  
If you want to crawl a sitemap, check that the base or root URL is the same as the URLs listed on your sitemap page. For example, if your sitemap URL is *https://example.com/sitemap-page.html*, the URLs listed on this sitemap page should also use the base URL "https://example.com/".

      If your websites require authentication, you can choose either basic, NTLM/Kerberos, SAML, or form authentication. Otherwise, choose the option for no authentication.
**Note**  
If you want to later edit your data source to change your seed URLs with authentication to sitemaps, you must create a new data source. Amazon Kendra configures the data source using the seed URLs endpoint information in the Secrets Manager secret for authentication, and therefore cannot re-configure the data source when changing to sitemaps.

      1. **AWS Secrets Manager secret**—If your websites require the same authentication to access the websites, choose an existing secret or create a new Secrets Manager secret to store your website credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

        If you chose **Basic** or **NTLM/Kerberos** authentication, enter a name for the secret, plus the user name and password. NTLM authentication protocol includes password hashing, and Kerberos authentication protocol includes password encryption.

        If you chose **SAML** or **Form** authentication, enter a name for the secret, plus the user name and password. Use XPath for the user name field (and XPath for the user name button if using SAML). Use XPaths for the password field and button, and login page URL. You can find the XPaths (XML Path Language) of elements using your web browser's developer tools. XPaths usually follow this format: `//tagname[@Attribute='Value']`.

   1. (Optional) **Web proxy**—Enter the host name and the port number of the proxy server you want to use to connect to internal websites. For example, the host name of *https://a.example.com/page1.html* is "a.example.com" and the port number is 443, the standard port for HTTPS. If web proxy credentials are required to connect to a website host, you can create an AWS Secrets Manager secret that stores the credentials.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. **Sync scope**—Set limits for crawling web pages including their domains, file sizes and links; and filter URLs using regex patterns.

      1. (Optional) **Crawl domain range**—Choose whether to crawl website domains only, domains with subdomains, or also crawl other domains that the web pages link to. By default, Amazon Kendra only crawls the domains of the websites you want to crawl.

      1. (Optional) **Additional configuration**—Set the following settings:
         + **Crawl depth**—The 'depth' or number of levels from the seed level to crawl. For example, the seed URL page is depth 1 and any hyperlinks on this page that are also crawled are depth 2.
         + **Maximum file size**—The maximum size in MB of a web page or attachment to crawl.
         + **Maximum links per page**—The maximum number of URLs on a single webpage to crawl.
         + **Maximum throttling of crawling speed**—The maximum number of URLs crawled per website host per minute.
         + **Files**—Choose to crawl files that the web pages link to.
         + **Crawl and index URLs**—Add regular expression patterns to include or exclude crawling certain URLs, and indexing any hyperlinks on these URL web pages.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. **Sync run schedule**—For **Frequency**, choose how often Amazon Kendra will sync with your data source.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. Select from the Amazon Kendra generated default fields of web pages and files that you want to map to your index.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.
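
The **Crawl depth** and **Maximum links per page** settings in the procedure above interact, and a small model can help when choosing values. The following sketch is not Amazon Kendra's crawler: it walks a hypothetical in-memory link graph breadth first, treats the seed URL page as depth 1 as described above, and assumes that links on a page are followed in order of appearance.

```python
from collections import deque


def pages_within_depth(links, seed_url, crawl_depth, max_links_per_page):
    """Return {url: depth} for pages reached within crawl_depth levels.

    links maps each page URL to the list of URLs it links to, in page order.
    """
    depths = {seed_url: 1}  # the seed URL page is depth 1
    queue = deque([seed_url])
    while queue:
        page = queue.popleft()
        if depths[page] >= crawl_depth:
            continue  # do not follow links beyond the configured depth
        for target in links.get(page, [])[:max_links_per_page]:
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths
```

With a seed page that links to three pages, a crawl depth of 2, and a maximum of two links per page, the model reaches the seed page and only the first two linked pages.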

------
#### [ API ]

**To connect Amazon Kendra to web crawler**

You must specify a JSON of the [data source schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-web-crawler-schema) using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API. You must provide the following information:
+ **Data source**—Specify the data source type as `WEBCRAWLERV2` when you use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **URLs**—Specify the seed or starting point URLs of the websites or the sitemap URLs of the websites you want to crawl. You can specify the path to an Amazon S3 bucket that stores your list of seed URLs. Each URL in the text file for seed URLs must be formatted on a separate line. You can also specify the path to an Amazon S3 bucket that stores your sitemap XML files. You can combine multiple sitemap files into a ZIP file and store the ZIP file in your Amazon S3 bucket.
**Note**  
If you want to crawl a sitemap, check that the base or root URL is the same as the URLs listed on your sitemap page. For example, if your sitemap URL is *https://example.com/sitemap-page.html*, the URLs listed on this sitemap page should also use the base URL "https://example.com/".
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Authentication**—If your websites require the same authentication, specify either `BasicAuth`, `NTLM_Kerberos`, `SAML`, or `Form` authentication. If your websites don't require authentication, specify `NoAuthentication`.
+ **Secret Amazon Resource Name (ARN)**—If your websites require basic, NTLM, or Kerberos authentication, you provide a secret that stores your authentication credentials of your user name and password. You provide the Amazon Resource Name (ARN) of an AWS Secrets Manager secret. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "seedUrlsHash": "Hash representation of all seed URLs",
      "userName": "user name",
      "password": "password"
  }
  ```

  If your websites require SAML authentication, the secret is stored in a JSON structure with the following keys:

  ```
  {
      "seedUrlsHash": "Hash representation of all seed URLs",                                
      "userName": "user name",
      "password": "password",
      "userNameFieldXpath": "XPath for user name field",
      "userNameButtonXpath": "XPath for user name button",
      "passwordFieldXpath": "XPath for password field",
      "passwordButtonXpath": "XPath for password button",
      "loginPageUrl": "Full URL for website login page"
  }
  ```

  If your websites require form authentication, the secret is stored in a JSON structure with the following keys:

  ```
  {
      "seedUrlsHash": "Hash representation of all seed URLs",
      "userName": "user name",
      "password": "password",
      "userNameFieldXpath": "XPath for user name field",
      "passwordFieldXpath": "XPath for password field",
      "passwordButtonXpath": "XPath for password button",
      "loginPageUrl": "Full URL for website login page"
  }
  ```

  You can find the XPaths (XML Path Language) of elements using your web browser's developer tools. XPaths usually follow this format: `//tagname[@Attribute='Value']`.

  You can also provide web proxy credentials using an AWS Secrets Manager secret.
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the web crawler connector and Amazon Kendra. For more information, see [IAM roles for web crawler data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
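
The secret JSON structures shown above differ only in which keys they require. The following sketch builds the secret string for each authentication type and rejects missing or unexpected keys. The helper name `build_secret_string` is hypothetical. The key names and authentication type names are taken from this section, and the value of `seedUrlsHash` is passed in as-is because this section does not describe how it is computed.

```python
import json

_BASIC = ("seedUrlsHash", "userName", "password")
_FORM = _BASIC + ("userNameFieldXpath", "passwordFieldXpath",
                  "passwordButtonXpath", "loginPageUrl")
_SAML = _BASIC + ("userNameFieldXpath", "userNameButtonXpath",
                  "passwordFieldXpath", "passwordButtonXpath", "loginPageUrl")

# Required secret keys for each authentication type named in this section.
REQUIRED_KEYS = {
    "BasicAuth": _BASIC,
    "NTLM_Kerberos": _BASIC,
    "Form": _FORM,
    "SAML": _SAML,
}


def build_secret_string(authentication, **values):
    """Return the JSON text to store in the secret for the given auth type."""
    required = REQUIRED_KEYS[authentication]
    missing = [key for key in required if not values.get(key)]
    unexpected = [key for key in values if key not in required]
    if missing or unexpected:
        raise ValueError(f"missing keys: {missing}; unexpected keys: {unexpected}")
    return json.dumps({key: values[key] for key in required})
```

The returned string is what you would store as the secret value, for example through the Secrets Manager console.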

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+ **Domain range**—Choose whether to crawl website domains only, domains with subdomains, or also crawl other domains the web pages link to. By default, Amazon Kendra only crawls the domains of the websites you want to crawl.
+ The 'depth' or number of levels from the seed level to crawl. For example, the seed URL page is depth 1 and any hyperlinks on this page that are also crawled are depth 2.
+ The maximum number of URLs on a single web page to crawl.
+ The maximum size in MB of a web page or attachment to crawl.
+ The maximum number of URLs crawled per website host per minute.
+ The web proxy host and port number to connect to and crawl internal websites. For example, the host name of *https://a.example.com/page1.html* is "a.example.com" and the port number is 443, the standard port for HTTPS. If web proxy credentials are required to connect to a website host, you can create an AWS Secrets Manager secret that stores the credentials.
+ **Inclusion and exclusion filters**—Specify whether to include or exclude crawling certain URLs and indexing any hyperlinks on these URL web pages.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+ **Field mappings**—Choose to map the fields of web pages and web page files to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).

For a list of other important JSON keys to configure, see [Amazon Kendra Web Crawler template schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-schema-web-crawler).
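
The way inclusion and exclusion filters combine, as described in the note above, can be modeled in a few lines. This is a sketch of the stated rules only. It uses Python's `re` module and matches a pattern anywhere in the URL, both of which are assumptions: the regular expression dialect and matching mode that Amazon Kendra applies may differ.

```python
import re


def is_indexed(url, inclusion_patterns=(), exclusion_patterns=()):
    """Apply the filter rules: exclusion wins, then inclusion must match if set."""
    # A document that matches an exclusion filter is never indexed,
    # even if it also matches an inclusion filter.
    if any(re.search(pattern, url) for pattern in exclusion_patterns):
        return False
    # With an inclusion filter, only matching content is indexed.
    if inclusion_patterns:
        return any(re.search(pattern, url) for pattern in inclusion_patterns)
    # With no filters, nothing is filtered out.
    return True
```

For example, with the inclusion pattern `/docs/` and the exclusion pattern `/docs/private/`, a URL under `/docs/private/` is not indexed although it matches the inclusion pattern.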

------

# Configuring the `robots.txt` file for Amazon Kendra Web Crawler
<a name="stop-web-crawler"></a>

Amazon Kendra is an intelligent search service that AWS customers use to index and search documents of their choice. In order to index documents on the web, customers may use Amazon Kendra Web Crawler, indicating which URL(s) should be indexed and other operational parameters. Amazon Kendra customers are required to obtain authorization before indexing any particular website.

Amazon Kendra Web Crawler respects standard robots.txt directives like `Allow` and `Disallow`. You can modify the `robots.txt` file of your website to control how Amazon Kendra Web Crawler crawls your website.

## Configuring how Amazon Kendra Web Crawler accesses your website
<a name="configure-web-crawler-website-access"></a>

You can control how the Amazon Kendra Web Crawler indexes your website using `Allow` and `Disallow` directives. You can also control which web pages are indexed and which web pages are not crawled.

**To allow Amazon Kendra Web Crawler to crawl all web pages except disallowed web pages, use the following directive:**

```
User-agent: amazon-kendra    # Amazon Kendra Web Crawler
Disallow: /credential-pages/ # disallow access to specific pages
```

**To allow Amazon Kendra Web Crawler to crawl only specific web pages, use the following directive:**

```
User-agent: amazon-kendra    # Amazon Kendra Web Crawler
Allow: /pages/ # allow access to specific pages
```

**To allow Amazon Kendra Web Crawler to crawl all website content and disallow crawling for any other robots, use the following directive:**

```
User-agent: amazon-kendra # Amazon Kendra Web Crawler
Allow: / # allow access to all pages
User-agent: * # any (other) robot
Disallow: / # disallow access to any pages
```
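
Before you publish a `robots.txt` file, you can check how directives like the ones above are interpreted for the `amazon-kendra` user agent. The following sketch uses the `urllib.robotparser` module from the Python standard library. It shows how a standard parser reads the rules, which is an approximation: Amazon Kendra Web Crawler's own evaluation may differ in edge cases. The `allowed` helper and the example URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

BLOCK_SOME_PAGES = """\
User-agent: amazon-kendra    # Amazon Kendra Web Crawler
Disallow: /credential-pages/ # disallow access to specific pages
"""

KENDRA_ONLY = """\
User-agent: amazon-kendra # Amazon Kendra Web Crawler
Allow: / # allow access to all pages
User-agent: * # any (other) robot
Disallow: / # disallow access to any pages
"""


def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/docs/index.html", "/credential-pages/login.html"):
        url = "https://example.com" + path
        print(path, allowed(BLOCK_SOME_PAGES, "amazon-kendra", url))
```

Parsing the text directly, rather than fetching it from a live site, lets you test a draft file before deploying it.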

## Stopping Amazon Kendra Web Crawler from crawling your website
<a name="stop-web-crawler-access"></a>

You can stop Amazon Kendra Web Crawler from indexing your website using the `Disallow` directive. You can also control which web pages are crawled and which are not.

**To stop Amazon Kendra Web Crawler from crawling the website, use the following directive:**

```
User-agent: amazon-kendra # Amazon Kendra Web Crawler
Disallow: / # disallow access to any pages
```

If you have any questions or concerns regarding Amazon Kendra Web Crawler, you can reach out to the [AWS support team](https://aws.amazon.com/contact-us/?nc1=f_m).

# Box
<a name="data-source-box"></a>

Box is a cloud storage service that offers file hosting capabilities. You can use Amazon Kendra to index your Box content, including comments, tasks, and web links.

You can connect Amazon Kendra to your Box data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) and the [BoxConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_BoxConfiguration.html) API.

For troubleshooting your Amazon Kendra Box data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-box)
+ [Prerequisites](#prerequisites-box)
+ [Connection instructions](#data-source-procedure-box)
+ [Learn more](#box-learn-more)
+ [Notes](#box-notes)

## Supported features
<a name="supported-features-box"></a>

Amazon Kendra Box data source connector supports the following features:
+ Field mappings
+ User access control
+ Inclusion/exclusion filters
+ Change log, full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-box"></a>

Before you can use Amazon Kendra to index your Box data source, make these changes in your Box and AWS accounts.

**In Box, make sure you have:**
+ A Box Enterprise or Box Enterprise Plus account.
+ Configured a Box custom app in the Box Developer Console, with Server-side authentication using JSON Web Tokens (JWT). See [Box documentation on creating a Custom App](https://developer.box.com/guides/applications/app-types/platform-apps/) and [Box documentation of configuring JWT Auth](https://developer.box.com/guides/authentication/jwt/) for more details.
+ Set your **App Access Level** to **App + Enterprise Access** and allowed it to **Make API calls using the as-user header**.
+ Used the admin user to add the following **Application Scopes** in your Box app:
  + Write all files and folders stored in Box
  + Manage users
  + Manage groups
  + Manage enterprise properties
+ Configured a public/private key pair, including a client ID, a client secret, a public key ID, a private key ID, a pass phrase, and an enterprise ID to use as your authentication credentials. See [Public and private key pair](https://developer.box.com/guides/authentication/jwt/jwt-setup/#public-and-private-key-pair) for more details.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ Copied your Box enterprise ID either from your Box Developer Console settings or from your Box app. For example, *801234567*.
+ Checked each document is unique in Box and across other data sources you plan to use for the same index. Each data source that you want to use for an index must not contain the same document across the data sources. Document IDs are global to an index and must be unique per index.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Box authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Box data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-box"></a>

To connect Amazon Kendra to your Box data source, you must provide the necessary details of your Box data source so that Amazon Kendra can access your data. If you have not yet configured Box for Amazon Kendra, see [Prerequisites](#prerequisites-box).

------
#### [ Console ]

**To connect Amazon Kendra to Box** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Box connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **Box connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter a description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. **Box enterprise ID**—Enter your Box Enterprise ID. For example, *801234567*.

   1. **Authorization**—Turn on or off access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents that users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).

   1. **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your Box authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens. Enter the following information in the window:

      1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-Box-’ is automatically added to your secret name.

      1. For **Client ID**, **Client Secret**, **Public Key ID**, **Private Key ID**, and **Pass Phrase**—Enter the values from the Public/Private Key you configured in Box.

      1. Add and save your secret.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. **Box files**—Choose whether to crawl web links, comments, and tasks.

   1. For **Additional configuration**—Add regular expression patterns to include or exclude certain content.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified sync: Index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. In **Sync run schedule** for **Frequency**—Choose how often to sync your data source content and update your index.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. **Default data source fields**—Select from the Amazon Kendra generated default data source fields you want to map to your index.

   1. **Add field**—Add custom data source fields by creating an index field name to map to and choosing the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to Box**

You must specify the following using the [BoxConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_BoxConfiguration.html) API:

+ **Box enterprise ID**—Provide your Box Enterprise ID. You can find the enterprise ID in the Box Developer Console settings or when you configure an app in Box.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials for your Box account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "clientID": "client-id",
      "clientSecret": "client-secret",
      "publicKeyID": "public-key-id",
      "privateKey": "private-key",
      "passphrase": "pass-phrase"
  }
  ```
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Box connector and Amazon Kendra. For more information, see [IAM roles for Box data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
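
As a minimal sketch of how the required values above fit together, the following Python builds the secret JSON string and the keyword arguments for a `CreateDataSource` call that uses `BoxConfiguration`. The helper names, the data source name, the index ID, and the ARNs are hypothetical placeholders. The sketch only builds the request. Sending it with the AWS SDK for Python (Boto3) is shown in a comment and assumes that credentials are already configured.

```python
import json


def build_box_secret(client_id, client_secret, public_key_id, private_key, passphrase):
    """Return a secret string that uses the JSON keys listed above."""
    return json.dumps({
        "clientID": client_id,
        "clientSecret": client_secret,
        "publicKeyID": public_key_id,
        "privateKey": private_key,
        "passphrase": passphrase,
    })


def build_box_data_source_request(index_id, role_arn, secret_arn, enterprise_id,
                                  name="my-box-data-source"):
    """Return keyword arguments for CreateDataSource (hypothetical helper)."""
    return {
        "IndexId": index_id,
        "Name": name,  # hypothetical data source name
        "Type": "BOX",
        "RoleArn": role_arn,
        "Configuration": {
            "BoxConfiguration": {
                "EnterpriseId": enterprise_id,
                "SecretArn": secret_arn,
            }
        },
    }


# All identifiers below are placeholders, not real resources.
request = build_box_data_source_request(
    index_id="index-id",
    role_arn="arn:aws:iam::111122223333:role/example-kendra-box-role",
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:AmazonKendra-Box-example",
    enterprise_id="801234567",
)

# With Boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("kendra").create_data_source(**request)
```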

You can also add the following optional features:
+ **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` as part of the data source configuration. See [Configuring Amazon Kendra to use a VPC](https://docs.aws.amazon.com/kendra/latest/dg/vpc-configuration.html). 
+  **Change log**—Whether Amazon Kendra should use the Box data source change log mechanism to determine if a document must be updated in the index.
**Note**  
Use the change log if you don’t want Amazon Kendra to scan all of the documents. If your change log is large, it might take Amazon Kendra less time to scan the documents in the Box data source than to process the change log. If you are syncing your Box data source with your index for the first time, all documents are scanned. 
+  **Comments, tasks, web links**—Specify whether to crawl these types of content.
+  **Inclusion and exclusion filters**—Specify whether to include or exclude certain Box files and folders.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+  **User context filtering and access control**—Amazon Kendra crawls the access control list (ACL) for your documents, if you have an ACL for your documents. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#datasource-context-filter).
+  **Field mappings**—Choose to map your Box data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.
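
The inclusion and exclusion rules described in the filter note above can be sketched as a small Python function. This is an illustration of the stated precedence only, not the connector's implementation: it uses Python's `re.search` as a stand-in for the service's own pattern matching, which may behave differently, and the patterns and paths are hypothetical.

```python
import re


def is_indexed(path, inclusion_patterns=(), exclusion_patterns=()):
    """Apply the inclusion/exclusion precedence described in the note above."""
    # An exclusion match always wins, even if an inclusion pattern also matches.
    if any(re.search(p, path) for p in exclusion_patterns):
        return False
    # When inclusion patterns are present, only matching content is indexed.
    if inclusion_patterns:
        return any(re.search(p, path) for p in inclusion_patterns)
    # With no inclusion filter, everything that is not excluded is indexed.
    return True


include = [r"\.pdf$", r"\.docx$"]  # hypothetical inclusion patterns
exclude = [r"/drafts/"]            # hypothetical exclusion pattern

print(is_indexed("/reports/q1.pdf", include, exclude))         # True
print(is_indexed("/reports/drafts/q1.pdf", include, exclude))  # False
print(is_indexed("/reports/q1.txt", include, exclude))         # False
```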

------

## Learn more
<a name="box-learn-more"></a>

To learn more about integrating Amazon Kendra with your Box data source, see:
+ [Getting started with the Amazon Kendra Box connector](https://aws.amazon.com/blogs/machine-learning/getting-started-with-the-amazon-kendra-box-connector/)

## Notes
<a name="box-notes"></a>
+ When Access Control Lists (ACLs) are enabled, the "Sync only new or modified content" option is not available due to Box API limitations. We recommend using the "Full sync" or "New, modified, or deleted content sync" mode instead, or disabling ACLs if you need to use this sync mode.

# Confluence
<a name="data-source-confluence"></a>

Confluence is a collaborative work-management tool designed for sharing, storing, and working on project planning, software development, and product management. Amazon Kendra supports both Confluence Server/Data Center and Confluence Cloud. You can use Amazon Kendra to index the following Confluence entities:
+ **Spaces** – Top-level designated areas for organizing related content. Each space serves as a container, capable of holding multiple pages, blogs, and attachments.
+ **Pages** – Individual documents within a space where users create and manage content. Pages can contain text, images, tables, and multimedia elements, and can have nested sub-pages. Each page is considered a single document.
+ **Blogs** – Content similar to pages, typically used for updates or announcements. Each blog post is considered as a single document.
+ **Comments** – Allows users to give feedback or engage in discussions on specific content within pages or blog posts.
+ **Attachments** – Files uploaded to pages or blog posts in Confluence, such as images, documents, or other file types.

By default, Amazon Kendra doesn't index Confluence archives and personal spaces. You can choose to index them when you create the data source. If you don't want Amazon Kendra to index a space, mark it private in Confluence.

You can connect Amazon Kendra to your Confluence data source using either the [Amazon Kendra console](https://console.aws.amazon.com/kendra/), the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API, or the [ConfluenceConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_ConfluenceConfiguration.html) API.

Amazon Kendra has two versions of the Confluence connector. The following features are supported.

**Confluence connector V2.0 / [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API**
+ Field mappings
+ User access control
+ Inclusion/exclusion patterns
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

**Confluence connector V1.0 / [ConfluenceConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_ConfluenceConfiguration.html) API (no longer supported)**
+ Field mappings
+ User access control
+ Inclusion/exclusion filters
+ (Confluence Server only) Virtual private cloud (VPC)

**Note**  
Support for Confluence connector V1.0 / ConfluenceConfiguration API ended in 2023. We recommend migrating to or using Confluence connector V2.0 / TemplateConfiguration API.

For troubleshooting your Amazon Kendra Confluence data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [ACLs in Confluence Connector](#data-source-confluence-acls)
+ [Confluence connector V2.0](data-source-v2-confluence.md)
+ [Confluence connector V1.0](data-source-v1-confluence.md)

## ACLs in Confluence Connector
<a name="data-source-confluence-acls"></a>

Connectors support crawling Access Control Lists (ACLs) and identifying information where applicable based on the data source. If you index documents without ACLs, all documents are considered public. Indexing documents with ACLs ensures data security.

The Amazon Kendra Confluence connector scans spaces to collect pages and blog posts along with their ACLs. If no restriction is applied to a page or blog, it inherits permissions from its space. If a specific user or group restriction is applied to a page, only those users can access that page. If a page is nested and has no restrictions of its own, it inherits the permissions of its parent page. A similar permissions model applies to blogs; however, Confluence does not support nested blogs.

In addition, the Amazon Kendra Confluence connector crawls user principal information (local user alias, local group, and federated group identity configurations) and its permissions for each configured space.

**Note**  
The Confluence Cloud connector does not support crawling macros, whiteboards, or databases. 

The Amazon Kendra Confluence connector updates ACL changes each time it crawls your data source content. To ensure the correct users have access to the correct content, regularly re-sync your data source to capture any ACL updates.
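
The inheritance rules described in this section can be sketched with a simplified model. The `Node` class, the `effective_principals` function, and the principal names are hypothetical and exist only to illustrate the description above; they are not the connector's implementation and don't cover every Confluence permission setting.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Node:
    """A space, page, or blog post in a simplified, hypothetical model."""
    name: str
    parent: Optional["Node"] = None  # parent page, or the containing space
    restrictions: Set[str] = field(default_factory=set)  # principals set explicitly here


def effective_principals(node):
    """Walk up until a node with explicit restrictions, or the space, is reached."""
    current = node
    while not current.restrictions and current.parent is not None:
        current = current.parent
    return current.restrictions


space = Node("ENG", restrictions={"group:engineering"})
page = Node("Roadmap", parent=space)                                    # inherits from the space
restricted = Node("Salaries", parent=space, restrictions={"user:alice"})
nested = Node("Salaries 2024", parent=restricted)                       # inherits from its parent page

print(effective_principals(page))        # {'group:engineering'}
print(effective_principals(restricted))  # {'user:alice'}
print(effective_principals(nested))      # {'user:alice'}
```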

# Confluence connector V2.0
<a name="data-source-v2-confluence"></a>

Confluence is a collaborative work-management tool designed for sharing, storing, and working on project planning, software development, and product management. You can use Amazon Kendra to index your Confluence spaces, pages (including nested pages), blogs, and comments and attachments to indexed pages and blogs.

For troubleshooting your Amazon Kendra Confluence data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-v2-confluence)
+ [Prerequisites](#prerequisites-v2-confluence)
+ [Connection instructions](#data-source-procedure-v2-confluence)

## Supported features
<a name="supported-features-v2-confluence"></a>

Amazon Kendra Confluence data source connector supports the following features:
+ Field mappings
+ User access control
+ Inclusion/exclusion patterns
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-v2-confluence"></a>

Before you can use Amazon Kendra to index your Confluence data source, make these changes in your Confluence and AWS accounts.

**In Confluence, make sure you have:**
+ Copied your Confluence instance URL. For example: *https://example.confluence.com*, or *https://www.example.confluence.com/*, or *https://atlassian.net/*. You need your Confluence instance URL to connect to Amazon Kendra.

  If you're using Confluence Cloud, your host URL must end with *atlassian.net/*.
**Note**  
The following URL formats are **not** supported:  
*https://example.confluence.com/xyz*
*https://www.example.confluence.com//wiki/spacekey/xxx*
*https://atlassian.net/xyz*
**Note**  
(On-premise/server) Amazon Kendra checks if the endpoint information included in AWS Secrets Manager is the same as the endpoint information specified in your data source configuration details. This helps protect against the [confused deputy problem](https://docs.aws.amazon.com/IAM/latest/UserGuide/confused-deputy.html), which is a security issue where a user doesn’t have permission to perform an action but uses Amazon Kendra as a proxy to access the configured secret and perform the action. If you later change your endpoint information, you must create a new secret to sync this information.
+ Configured basic authentication credentials containing a user name (email ID used to log into Confluence) and password (Confluence API token as the password). See [Manage API tokens for your Atlassian account](https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/#Create-an-API-token).
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ **Optional:** Configured OAuth 2.0 credentials containing a Confluence app key, Confluence app secret, Confluence access token, and Confluence refresh token to allow Amazon Kendra to connect to your Confluence instance. If your access token expires, you can either use the refresh token to regenerate your access token and refresh token pair, or repeat the authorization process. For more information on access tokens, see [Manage OAuth access tokens](https://support.atlassian.com/confluence-cloud/docs/manage-oauth-access-tokens/).
+ (For Confluence Server/Data Center only) **Optional:** Configured a Personal Access Token (PAT) in Confluence. See [Using Personal Access Tokens](https://confluence.atlassian.com/enterprise/using-personal-access-tokens-1026032365.html).
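
The host URL rules in this list (no path after the host, and a Confluence Cloud URL that ends with *atlassian.net/*) can be checked before you create the data source. The following is a minimal sketch using only the Python standard library. The function name is hypothetical, requiring `https` is an assumption based on the examples above, and Amazon Kendra's own validation may differ.

```python
from urllib.parse import urlparse


def is_supported_host_url(url, cloud=False):
    """Return True if the URL follows the host URL rules listed above."""
    parsed = urlparse(url)
    # Assumption: every example above uses https, so other schemes are rejected.
    if parsed.scheme != "https" or not parsed.netloc:
        return False
    # Only the host is allowed: no path beyond an optional trailing slash,
    # and no query string or fragment.
    if parsed.path not in ("", "/") or parsed.query or parsed.fragment:
        return False
    # Confluence Cloud host URLs must end with "atlassian.net/".
    if cloud:
        return url.endswith("atlassian.net/")
    return True


print(is_supported_host_url("https://example.confluence.com"))                         # True
print(is_supported_host_url("https://www.example.confluence.com/"))                    # True
print(is_supported_host_url("https://example.confluence.com/xyz"))                     # False
print(is_supported_host_url("https://www.example.confluence.com//wiki/spacekey/xxx"))  # False
print(is_supported_host_url("https://example.atlassian.net/", cloud=True))             # True
```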

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Confluence authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Confluence data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-v2-confluence"></a>

To connect Amazon Kendra to your Confluence data source, you must provide the necessary details of your Confluence data source so that Amazon Kendra can access your data. If you have not yet configured Confluence for Amazon Kendra, see [Prerequisites](#prerequisites-v2-confluence).

------
#### [ Console ]

**To connect Amazon Kendra to Confluence** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Confluence connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **Confluence connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter a description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. In **Source**, choose either **Confluence Cloud** or **Confluence Server/Data Center**.

   1. **Confluence URL**—Enter the Confluence host URL. For example, *https://example.confluence.com*.

   1. (For Confluence Server/Data Center only) **SSL certificate location - *optional***—Enter the Amazon S3 path to your SSL certificate file for Confluence Server.

   1. (For Confluence Server/Data Center only) **Web proxy - *optional***—Enter the web proxy host name (without the `http://` or `https://` protocol) and port number (port used by the host URL transport protocol). The port number should be a numeric value between 0 and 65535.

   1. **Authorization**—Turn on or off access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents that users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).

   1. **Authentication**—Choose **Basic authentication**, **OAuth 2.0 authentication**, or (For Confluence Server/Data Center only) **Personal Access Token authentication**.

   1. **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your Confluence authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens. Enter the following information in the window:

      1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-Confluence-’ is automatically added to your secret name.

      1. If using **Basic Authentication**—Enter the secret name, user name, and password (Confluence API token as the password) you configured in Confluence.

         If using **OAuth 2.0 authentication**—Enter the secret name, app key, app secret, access token, and refresh token you configured in Confluence.

         (Confluence Server/Data Center only) If using **Personal Access Token authentication**—Enter the secret name and the Confluence token you configured in Confluence.

      1. Save and add your secret.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. In **Sync scope**, for **Sync contents**—Choose to sync from the following content types: Pages, page comments, page attachments, blogs, blog comments, blog attachments, personal spaces, and archived spaces.
**Note**  
Page comments and page attachments can only be selected if you choose to sync **Pages**. Blog comments and blog attachments can only be selected if you choose to sync **Blogs**.
**Important**  
If you don't specify a space key regex pattern in **Additional configuration**, all pages and blogs will be crawled by default.

   1. In **Additional configuration**, for **Maximum file size**—Specify the file size limit in MBs that Amazon Kendra will crawl. Amazon Kendra will crawl only the files within the size limit you define. The default file size is 50 MB. The maximum file size should be greater than 0 MB and less than or equal to 50 MB.

      For **Spaces regex patterns**—Specify whether to include or exclude specific spaces in your index using:
      + Space key (for example, *my-space-123*)
**Note**  
If you don't specify a space key regex pattern, all pages and blogs will be crawled by default.
      + URL (for example, `.*/MySite/MyDocuments/`)
      + File type (for example, `.*\.pdf`, `.*\.txt`)

      For **Entity title regex patterns**—Specify regular expression patterns to include or exclude certain blogs, pages, comments, and attachments by titles.
**Note**  
If you want to include or exclude crawling a specific page or subpage, you can use page title regex patterns.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. In **Sync run schedule**, for **Frequency**—Choose how often to sync your data source content and update your index.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. Select from the Amazon Kendra generated default data source fields you want to map to your index. To add custom data source fields, create an index field name to map to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to Confluence**

You must specify a JSON of the [data source schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-confluence-schema) using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API. You must provide the following information:
+ **Data source**—Specify the data source type as `CONFLUENCEV2` when you use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **Host URL**—Specify the Confluence host URL instance. For example, *https://example.confluence.com*.
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Authentication type**—Specify the type of authentication: `Basic`, `OAuth2`, or (Confluence Server only) `Personal-token`.
+ (Optional–For Confluence Server only) **SSL certificate location**—Specify the `S3bucketName` and `s3certificateName` you used to store your SSL certificate.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials you configured in Confluence. If you use basic authentication, the secret is stored in a JSON structure with the following keys: 

  ```
  {
      "username": "email ID or user name",
      "password": "Confluence API token"
  }
  ```

  If you use OAuth 2.0 authentication, the secret is stored in a JSON structure with the following keys:

  ```
  {
      "confluenceAppKey": "app key",
      "confluenceAppSecret": "app secret",
      "confluenceAccessToken": "access token",
      "confluenceRefreshToken": "refresh token"
  }
  ```

  (For Confluence Server only) If you use basic authentication, the secret is stored in a JSON structure with the following keys:

  ```
  {
      "hostUrl": "Confluence Server host URL",
      "username": "Confluence Server user name",
      "password": "Confluence Server password"
  }
  ```

  (For Confluence Server only) If you use Personal Access Token authentication, the secret is stored in a JSON structure with the following keys:

  ```
  {
      "hostUrl": "Confluence Server host URL",
      "patToken": "personal access token"
  }
  ```
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Confluence connector and Amazon Kendra. For more information, see [IAM roles for Confluence data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
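
As a rough sketch of how the required values above fit together, the following Python builds the secret JSON for each authentication type and wraps a template in the keyword arguments for `CreateDataSource`. The helper names, the data source name, and all identifiers are hypothetical. The template shown is deliberately incomplete: the `type` and `syncMode` field names are assumptions to verify against the Confluence template schema, which also defines the remaining required fields.

```python
import json


def build_confluence_secret(auth_type, values, server=False):
    """Return a secret string with the keys listed above for the given auth type."""
    if auth_type == "Basic":
        keys = ("hostUrl", "username", "password") if server else ("username", "password")
    elif auth_type == "OAuth2":
        keys = ("confluenceAppKey", "confluenceAppSecret",
                "confluenceAccessToken", "confluenceRefreshToken")
    elif auth_type == "Personal-token":
        if not server:
            raise ValueError("Personal-token is for Confluence Server only")
        keys = ("hostUrl", "patToken")
    else:
        raise ValueError(f"unknown authentication type: {auth_type}")
    missing = [k for k in keys if k not in values]
    if missing:
        raise ValueError(f"missing secret keys: {missing}")
    return json.dumps({k: values[k] for k in keys})


def build_template_request(index_id, role_arn, template, name="my-confluence-data-source"):
    """Return keyword arguments for CreateDataSource (hypothetical helper)."""
    return {
        "IndexId": index_id,
        "Name": name,  # hypothetical data source name
        "Type": "TEMPLATE",
        "RoleArn": role_arn,
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }


basic_secret = build_confluence_secret(
    "Basic", {"username": "user@example.com", "password": "api-token"}
)

# Incomplete template: field names are assumptions, and the full schema
# requires more fields than are shown here.
template = {"type": "CONFLUENCEV2", "syncMode": "FULL_CRAWL"}

request = build_template_request(
    "index-id", "arn:aws:iam::111122223333:role/example-kendra-confluence-role", template
)

# With Boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("kendra").create_data_source(**request)
```

Keeping the secret keys in one function makes it harder to store a secret whose shape doesn't match the selected authentication type.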

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+  **File size**—Specify the maximum file size to crawl.
+  **Document/content types**—Specify whether to crawl pages, page comments, page attachments, blogs, blog comments, blog attachments, spaces and archived spaces.
+ **Inclusion and exclusion filters**—Specify whether to include or exclude certain spaces, pages, blogs, and their comments and attachments.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+ **Web proxy**—Specify your web proxy information if you want to connect to your Confluence URL instance via a web proxy. You can use this option for Confluence Server.
+ **Access control list (ACL)**—Specify whether to crawl ACL information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).
+ **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.
+  **Field mappings**—Choose to map your Confluence data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.
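
The inclusion and exclusion filter rules described in the note above can be modeled with a short sketch. This illustrates the stated precedence only and is not Amazon Kendra's implementation: the class `FilterRules`, the method `isIndexed`, and the sample patterns are hypothetical, and the sketch assumes that a pattern matches when it is found anywhere in the document path.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical model of the filter precedence described in the note:
// exclusion wins over inclusion, and a non-empty inclusion list is restrictive.
class FilterRules {

    static boolean isIndexed(String path, List<Pattern> inclusions, List<Pattern> exclusions) {
        // A document that matches any exclusion filter is never indexed.
        for (Pattern p : exclusions) {
            if (p.matcher(path).find()) {
                return false;
            }
        }
        // With no inclusion filters, everything that is not excluded is indexed.
        if (inclusions.isEmpty()) {
            return true;
        }
        // Otherwise the document must match at least one inclusion filter.
        for (Pattern p : inclusions) {
            if (p.matcher(path).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Pattern> include = List.of(Pattern.compile("^ENG/"));
        List<Pattern> exclude = List.of(Pattern.compile("/drafts/"));

        System.out.println(isIndexed("ENG/design/overview", include, exclude)); // true
        System.out.println(isIndexed("ENG/drafts/notes", include, exclude));    // false
        System.out.println(isIndexed("HR/policies/leave", include, exclude));   // false
    }
}
```

The second call shows the precedence rule: the path matches the inclusion filter but is still skipped because it also matches the exclusion filter.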

For a list of other important JSON keys to configure, see [Confluence template schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-confluence-schema).

------

### Notes
<a name="confluence-notes"></a>
+ Personal Access Token (PAT) is not available for Confluence Cloud.

# Confluence connector V1.0
<a name="data-source-v1-confluence"></a>

Confluence is a collaborative work-management tool designed for sharing, storing, and working on project planning, software development, and product management. You can use Amazon Kendra to index your Confluence spaces, pages (including nested pages), blogs, and comments and attachments to indexed pages and blogs.

**Note**  
Support for Confluence connector V1.0 / ConfluenceConfiguration API ended in 2023. We recommend migrating to or using Confluence connector V2.0 / TemplateConfiguration API.

For troubleshooting your Amazon Kendra Confluence data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-v1-confluence)
+ [Prerequisites](#prerequisites-v1-confluence)
+ [Connection instructions](#data-source-procedure-v1-confluence)
+ [Learn more](#confluence-v1-learn-more)

## Supported features
<a name="supported-features-v1-confluence"></a>

Amazon Kendra Confluence data source connector supports the following features:
+ Field mappings
+ User access control
+ Inclusion/exclusion filters
+ (For Confluence Server only) Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-v1-confluence"></a>

Before you can use Amazon Kendra to index your Confluence data source, make these changes in your Confluence and AWS accounts.

**In Confluence, make sure you have:**
+ Granted Amazon Kendra permissions to view all content within your Confluence instance by:
  + Making Amazon Kendra a member of `confluence-administrators` group.
  + Granting site-admin permissions for all existing spaces, blogs, and pages.
+ Copied the URL of your Confluence instance.
+ **For SSO (Single Sign-On) users:** Activated the **Show on login page** option for the user name and password when you configure Confluence **Authentication methods** in Confluence Data Center.
+ **For Confluence Server**
  + Noted your basic authentication credentials containing your Confluence administrative account user name and password to connect to Amazon Kendra.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
  + **Optional:** Generated a personal access token in your Confluence account to connect to Amazon Kendra. For more information, see [Confluence documentation on generating personal access tokens](https://confluence.atlassian.com/enterprise/using-personal-access-tokens-1026032365.html).
+ **For Confluence Cloud**
  + Noted your basic authentication credentials containing your Confluence administrative account user name and password to connect to Amazon Kendra.
+ Checked each document is unique in Confluence and across other data sources you plan to use for the same index. Each data source that you want to use for an index must not contain the same document across the data sources. Document IDs are global to an index and must be unique per index.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Confluence authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Confluence data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-v1-confluence"></a>

To connect Amazon Kendra to your Confluence data source, you must provide details of your Confluence credentials so that Amazon Kendra can access your data. If you have not yet configured Confluence for Amazon Kendra, see [Prerequisites](#prerequisites-v1-confluence).

------
#### [ Console ]

**To connect Amazon Kendra to Confluence** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Confluence connector V1.0**, and then choose **Add data source**.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter a description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. Choose between **Confluence Cloud** and **Confluence Server**.

   1. If you choose **Confluence Cloud**, enter the following information:

      1. **Confluence URL**—Your Confluence URL.

      1. **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your Confluence authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

         1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

           1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-Confluence-’ is automatically added to your secret name.

           1. For **User name** and **Password**—Enter your Confluence user name and password.

           1. Choose **Save authentication**.

   1. If you choose **Confluence Server**, enter the following information:

      1. **Confluence URL**—Your Confluence URL.

      1. (Optional) For **Web proxy** enter the following information:

         1.  **Host name**—Host name for your Confluence account.

         1.  **Port number**—Port used by the host URL transport protocol.

      1. For **Authentication**, choose either **Basic authentication** or (Confluence Server only) **Personal Access Token**.

      1. **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your Confluence authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

         1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

           1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-Confluence-’ is automatically added to your secret name.

           1. For **User name** and **Password**—Enter the authentication credential values you configured in Confluence. If using basic authentication, use your Confluence user name (email ID) and password (API token). If using a personal access token, enter the details of the **Personal Access Token** you configured in your Confluence account.

           1. Save and add your secret.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. For **Include personal spaces** and **Include archived spaces**—Choose the optional space types to include in this data source.

   1. For **Additional configuration**—Specify regular expression patterns to include or exclude certain content. You can add up to 100 patterns.

   1. You can also choose to **Crawl attachments within chosen spaces**.

   1. In **Sync run schedule**, for **Frequency**—Choose how often Amazon Kendra will sync with your data source.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. For **Space**, **Page**, **Blog**—Select from the Amazon Kendra generated default data source fields or **Additional suggested field mappings** to add index fields.

   1.  **Add field**—Add custom data source fields, and specify the index field name to map each one to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to Confluence**

You must specify the following using [ConfluenceConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_ConfluenceConfiguration.html) API:
+ **Confluence version**—Specify the version of the Confluence instance you are using as `CLOUD` or `SERVER`.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains your Confluence authentication credentials.

  If you use Confluence Server, you can use either your Confluence user name and password, or your personal access token as the authentication credentials.

  If you use your Confluence user name and password as authentication credentials, you store the following credentials as a JSON structure in your Secrets Manager secret:

  ```
  {
      "username": "user name",
      "password": "password"
  }
  ```

  If you use a personal access token to connect Confluence Server to Amazon Kendra, you store the following credentials as a JSON structure in your Secrets Manager secret:

  ```
  {
      "patToken": "personal access token"
  }
  ```

  If you use Confluence Cloud, you use your Confluence user name and an API token, configured in Confluence, as your password. You store the following credentials as a JSON structure in your Secrets Manager secret:

  ```
  {
      "username": "user name",
      "password": "API token"
  }
  ```
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Confluence connector and Amazon Kendra. For more information, see [IAM roles for Confluence data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).

You can also add the following optional features:
+ **Web proxy**—Whether to connect to your Confluence URL instance via a web proxy. You can use this option for Confluence Server.
+ (For Confluence Server only) **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` as part of the data source configuration. See [Configuring Amazon Kendra to use a VPC](https://docs.aws.amazon.com/kendra/latest/dg/vpc-configuration.html).
+  **Inclusion and exclusion filters**—Specify regular expression patterns to include or exclude certain spaces, blog posts, pages, and attachments. If you choose to index attachments, only attachments to the indexed pages and blogs are indexed.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+  **Field mappings**—Choose to map your Confluence data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.
+  **User context filtering and access control**—Amazon Kendra crawls the access control list (ACL) for your documents, if you have an ACL for your documents. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#datasource-context-filter).

------

## Learn more
<a name="confluence-v1-learn-more"></a>

To learn more about integrating Amazon Kendra with your Confluence data source, see:
+ [Configuring your Amazon Kendra Confluence Server connector ](https://aws.amazon.com/blogs/machine-learning/configuring-your-amazon-kendra-confluence-server-connector/)

# Custom data source connector
<a name="data-source-custom"></a>

Use a custom data source when you have a repository that Amazon Kendra doesn’t yet provide a data source connector for. You can use it to see the same run history metrics that Amazon Kendra data sources provide even when you can't use Amazon Kendra's data sources to sync your repositories. Use this to create a consistent sync monitoring experience between Amazon Kendra data sources and custom ones. Specifically, use a custom data source to see sync metrics for a data source connector that you created using the [BatchPutDocument](https://docs.aws.amazon.com/kendra/latest/APIReference/API_BatchPutDocument.html) and [BatchDeleteDocument](https://docs.aws.amazon.com/kendra/latest/APIReference/API_BatchDeleteDocument.html) APIs.

For troubleshooting your Amazon Kendra custom data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

When you create a custom data source, you have complete control over how the documents to index are selected. Amazon Kendra only provides metric information that you can use to monitor your data source sync jobs. You must create and run the crawler that determines the documents your data source indexes.

You must specify the main title of your documents using the [Document](https://docs.aws.amazon.com/kendra/latest/APIReference/API_Document.html) object, and `_source_uri` in [DocumentAttribute](https://docs.aws.amazon.com/kendra/latest/APIReference/API_DocumentAttribute.html) in order to have `DocumentTitle` and `DocumentURI` included in the response of the `Query` result.

You create an identifier for your custom data source using the console or by using the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html) API. To use the console, give your data source a name, and optionally a description and resource tags. After the data source is created, a data source ID is shown. Copy this ID to use when you synchronize the data source with the index.

![\[Form for specifying data source details, including name, description, and optional tags.\]](http://docs.aws.amazon.com/kendra/latest/dg/images/CustomDataSource.png)


You can also create a custom data source using the `CreateDataSource` API. The API returns an ID to use when you synchronize the data source. When you use the `CreateDataSource` API to create a custom data source, you can't set the `Configuration`, `RoleArn` or `Schedule` parameters. If you set these parameters, Amazon Kendra returns a `ValidationException` exception.

To use a custom data source, create an application that is responsible for updating the Amazon Kendra index. The application depends on a crawler that you create. The crawler reads the documents in your repository and determines which should be sent to Amazon Kendra. Your application should perform the following steps: 

1. Crawl your repository and make a list of the documents in your repository that are added, updated, or deleted.

1. Call the [StartDataSourceSyncJob](https://docs.aws.amazon.com/kendra/latest/APIReference/API_StartDataSourceSyncJob.html) API to signal that a sync job is starting. You provide a data source ID to identify the data source that is synchronizing. Amazon Kendra returns an execution ID to identify a particular sync job.

1. Call the [BatchDeleteDocument](https://docs.aws.amazon.com/kendra/latest/APIReference/API_BatchDeleteDocument.html) API to remove documents from the index. You provide the data source ID and execution ID to identify the data source that is synchronizing and the job that this update is associated with.

1. Call the [BatchPutDocument](https://docs.aws.amazon.com/kendra/latest/APIReference/API_BatchPutDocument.html) API to add or update documents in the index. You provide the data source ID and execution ID as document attributes to identify the data source that is synchronizing and the job that this update is associated with.

1. Call the [StopDataSourceSyncJob](https://docs.aws.amazon.com/kendra/latest/APIReference/API_StopDataSourceSyncJob.html) API to signal the end of the sync job. After you call the `StopDataSourceSyncJob` API, the associated execution ID is no longer valid.

1. Call the [ListDataSourceSyncJobs](https://docs.aws.amazon.com/kendra/latest/APIReference/API_ListDataSourceSyncJobs.html) API with the index and data source identifiers to list the sync jobs for the data source and to see metrics for the sync jobs.
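
The first step above leaves change detection to your crawler. One way to sketch it, assuming your crawler can produce a map from document ID to a content fingerprint (for example, a hash or a last-modified timestamp) on each run, is to compare the current snapshot with the previous one. The names `CrawlDiff`, `Changes`, and `diff` are hypothetical and are not part of any Amazon Kendra API. Added and updated documents would then be submitted with `BatchPutDocument`, and deleted ones with `BatchDeleteDocument`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical change detection for a custom crawler: compares two snapshots
// that map document ID -> fingerprint (for example, a content hash).
class CrawlDiff {

    static final class Changes {
        final List<String> added = new ArrayList<>();
        final List<String> updated = new ArrayList<>();
        final List<String> deleted = new ArrayList<>();
    }

    static Changes diff(Map<String, String> previous, Map<String, String> current) {
        Changes changes = new Changes();
        for (Map.Entry<String, String> entry : current.entrySet()) {
            String oldFingerprint = previous.get(entry.getKey());
            if (oldFingerprint == null) {
                changes.added.add(entry.getKey());      // new since the last crawl
            } else if (!oldFingerprint.equals(entry.getValue())) {
                changes.updated.add(entry.getKey());    // content changed
            }
        }
        for (String id : previous.keySet()) {
            if (!current.containsKey(id)) {
                changes.deleted.add(id);                // no longer in the repository
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Map<String, String> previous = Map.of("doc-1", "a1", "doc-2", "b1", "doc-3", "c1");
        Map<String, String> current = Map.of("doc-1", "a1", "doc-2", "b2", "doc-4", "d1");

        Changes changes = diff(previous, current);
        System.out.println("added=" + changes.added);     // added=[doc-4]
        System.out.println("updated=" + changes.updated); // updated=[doc-2]
        System.out.println("deleted=" + changes.deleted); // deleted=[doc-3]
    }
}
```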

After you end a sync job, you can start a new synchronization job. There can be a period of time before all of the submitted documents are added to the index. Use the `ListDataSourceSyncJobs` API to see the status of the sync job. If the `Status` returned for the sync job is `SYNCING_INDEXING`, some documents are still being indexed. You can start a new sync job when the status of the previous job is `FAILED` or `SUCCEEDED`.
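
The status check described above is often written as a bounded polling loop. The following sketch uses only the Java standard library. The `Supplier<String>` stands in for a call to `ListDataSourceSyncJobs` that returns the latest job's `Status`, and `SyncJobPoller` and `waitUntilFinished` are hypothetical names. It treats only `FAILED` and `SUCCEEDED` as finished, as stated above, and gives up after a fixed number of attempts instead of looping forever.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical polling helper; the supplier is a stand-in for a real status lookup.
class SyncJobPoller {

    // Statuses after which the text above says a new sync job can start.
    static final Set<String> FINISHED = Set.of("FAILED", "SUCCEEDED");

    // Polls until a finished status is seen or maxAttempts is reached.
    // Returns the last status observed.
    static String waitUntilFinished(Supplier<String> latestStatus, int maxAttempts, long delayMillis) {
        String status = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            status = latestStatus.get();
            if (status != null && FINISHED.contains(status)) {
                return status;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                    return status;
                }
            }
        }
        return status;
    }

    public static void main(String[] args) {
        // Canned statuses stand in for successive ListDataSourceSyncJobs results.
        Iterator<String> canned = List.of("SYNCING", "SYNCING_INDEXING", "SUCCEEDED").iterator();
        String result = waitUntilFinished(canned::next, 5, 10L);
        System.out.println(result); // SUCCEEDED
    }
}
```

If the returned status is still `SYNCING` or `SYNCING_INDEXING`, the attempt limit was reached before the job finished, and the caller can decide whether to keep waiting.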

After you call the `StopDataSourceSyncJob` API, you can't use a sync job identifier in a call to the `BatchPutDocument` or `BatchDeleteDocument` APIs. If you do, all of the documents submitted are returned in the `FailedDocuments` response message from the API.

## Required attributes
<a name="custom-required-attributes"></a>

When you submit a document to Amazon Kendra using the `BatchPutDocument` API, each document requires two attributes to identify the data source and synchronization run that it belongs to. You must provide the following two attributes to map documents from your custom data source correctly to an Amazon Kendra index:
+ `_data_source_id`—The identifier of the data source. This is returned when you create the data source with the console or the `CreateDataSource` API.
+ `_data_source_sync_job_execution_id`—The identifier of the sync run. This is returned when you start the index synchronization with the `StartDataSourceSyncJob` API.

The following is the JSON required to index a document using a custom data source.

```
{
    "Documents": [
        {
            "Attributes": [
                {
                    "Key": "_data_source_id",
                    "Value": {
                        "StringValue": "data source identifier"
                    }
                },
                {
                    "Key": "_data_source_sync_job_execution_id",
                    "Value": {
                        "StringValue": "sync job identifier"
                    }
                }
            ],
            "Blob": "document content",
            "ContentType": "content type",
            "Id": "document identifier",
            "Title": "document title"
        }
    ],
    "IndexId": "index identifier",
    "RoleArn": "IAM role ARN"
}
```

When you remove a document from the index using the `BatchDeleteDocument` API, you need to specify the following two fields in the `DataSourceSyncJobMetricTarget` parameter:
+ `DataSourceId`—The identifier of the data source. This is returned when you create the data source with the console or the `CreateDataSource` API.
+ `DataSourceSyncJobId`—The identifier of the sync run. This is returned when you start the index synchronization with the `StartDataSourceSyncJob` API.

The following is the JSON required to delete a document from the index using the `BatchDeleteDocument` API.

```
{
    "DataSourceSyncJobMetricTarget": {
        "DataSourceId": "data source identifier",
        "DataSourceSyncJobId": "sync job identifier"
    },
    "DocumentIdList": [
        "document identifier"
    ],
    "IndexId": "index identifier"
}
```

## Viewing metrics
<a name="custom-metrics"></a>

After a sync job is finished, you can use the [DataSourceSyncJobMetrics](https://docs.aws.amazon.com/kendra/latest/APIReference/API_DataSourceSyncJobMetrics.html) API to get the metrics associated with the sync job. Use this to monitor your custom data source syncs.

If you submit the same document multiple times, whether through the `BatchPutDocument` API, the `BatchDeleteDocument` API, or both (the document is submitted for both addition and deletion), the document is counted only once in the metrics.
+ `DocumentsAdded`—The number of documents submitted using the `BatchPutDocument` API associated with this sync job added to the index for the first time. If a document is submitted for addition more than once in a sync, the document is only counted once in the metrics.
+ `DocumentsDeleted`—The number of documents submitted using the `BatchDeleteDocument` API associated with this sync job deleted from the index. If a document is submitted for deletion more than once in a sync, the document is only counted once in the metrics.
+ `DocumentsFailed`—The number of documents associated with this sync job that failed indexing. These are documents that were accepted by Amazon Kendra for indexing but could not be indexed or deleted. If a document isn't accepted by Amazon Kendra, the identifier for the document is returned in the `FailedDocuments` response property of the `BatchPutDocument` and `BatchDeleteDocument` APIs.
+ `DocumentsModified`—The number of modified documents submitted using the `BatchPutDocument` API associated with this sync job that were modified in the Amazon Kendra index.

Amazon Kendra also emits Amazon CloudWatch metrics while indexing documents. For more information, see [Monitoring Amazon Kendra with Amazon CloudWatch](https://docs.aws.amazon.com/kendra/latest/dg/cloudwatch-metrics.html).

Amazon Kendra doesn't return the `DocumentsScanned` metric for custom data sources. It also emits the CloudWatch metrics listed in the document [Metrics for Amazon Kendra data sources](https://docs.aws.amazon.com/kendra/latest/dg/cloudwatch-metrics.html#cloudwatch-metrics-data-source).

## Learn more
<a name="custom-learn-more"></a>

To learn more about integrating Amazon Kendra with your custom data source, see:
+ [Adding custom data sources to Amazon Kendra](https://aws.amazon.com/blogs/machine-learning/adding-custom-data-sources-to-amazon-kendra/)

# Custom data source (Java)
<a name="custom-java-sample"></a>

The following code provides a sample implementation of a custom data source using Java. The program first creates a custom data source and then synchronizes newly added documents to the index with the custom data source.

The following code demonstrates creating and using a custom data source. Creating the data source is a one-off process, so your application doesn't need to create a new data source each time that you synchronize your index with your data source. You use the index ID and data source ID to synchronize your data.

```
package com.amazonaws.kendra;

import java.util.concurrent.TimeUnit;
import software.amazon.awssdk.services.kendra.KendraClient;
import csoftware.amazon.awssdk.services.kendra.model.BatchPutDocumentRequest;
import csoftware.amazon.awssdk.services.kendra.model.BatchPutDocumentResponse;
import software.amazon.awssdk.services.kendra.model.CreateDataSourceRequest;
import software.amazon.awssdk.services.kendra.model.CreateDataSourceResponse;
import software.amazon.awssdk.services.kendra.model.DataSourceType;
import software.amazon.awssdk.services.kendra.model.Document;
import software.amazon.awssdk.services.kendra.model.ListDataSourceSyncJobsRequest;
import software.amazon.awssdk.services.kendra.model.ListDataSourceSyncJobsResponse;
import software.amazon.awssdk.services.kendra.model.StartDataSourceSyncJobRequest;
import software.amazon.awssdk.services.kendra.model.StartDataSourceSyncJobResponse;
import software.amazon.awssdk.services.kendra.model.StopDataSourceSyncJobRequest;
import software.amazon.awssdk.services.kendra.model.StopDataSourceSyncJobResponse;

public class SampleSyncForCustomDataSource {
  public static void main(String[] args) {
    KendraClient kendra = KendraClient.builder().build();

    String myIndexId = "yourIndexId";
    String dataSourceName = "custom data source";
    String dataSourceDescription = "Amazon Kendra custom data source connector"
	
    // Create custom data source
    CreateDataSourceRequest createDataSourceRequest = CreateDataSourceRequest
        .builder()
        .indexId(myIndexId)
        .name(dataSourceName)
        .description(dataSourceDescription)
        .type(DataSourceType.CUSTOM)
        .build();
    	
    CreateDataSourceResponse createDataSourceResponse = kendra.createDataSource(createDataSourceRequest);
    System.out.println(String.format("Response of creating data source: %s", createDataSourceResponse));
	
    // Get the data source ID from createDataSourceResponse
    String dataSourceId = createDataSourceResponse.Id();

    // Wait for the custom data source to become active
    System.out.println(String.format("Waiting for Amazon Kendra to create the data source %s", dataSourceId));
    // You can use the DescribeDataSource API to check the status
    DescribeDataSourceRequest describeDataSourceRequest = DescribeDataSourceRequest
        .builder()
        .indexId(myIndexId)
        .id(dataSourceId)
        .build();

    while (true) {
        DescribeDataSourceResponse describeDataSourceResponse = kendra.describeDataSource(describeDataSourceRequest);

        DataSourceStatus status = describeDataSourceResponse.status();
        System.out.println(String.format("Creating data source. Status: %s", status));
        if (status != DataSourceStatus.CREATING) {
            break;
        }
        
        TimeUnit.SECONDS.sleep(60);
    }
    
    // Start syncing yor data source by calling StartDataSourceSyncJob and providing your index ID 
    // and your custom data source ID
    System.out.println(String.format("Synchronize the data source %s", dataSourceId));
    StartDataSourceSyncJobRequest startDataSourceSyncJobRequest = StartDataSourceSyncJobRequest
        .builder()
        .indexId(myIndexId)
        .id(dataSourceId)
        .build();
    StartDataSourceSyncJobResponse startDataSourceSyncJobResponse = kendra.startDataSourceSyncJob(startDataSourceSyncJobRequest);
    
    // Get the  sync job execution ID from startDataSourceSyncJobResponse
    String executionId = startDataSourceSyncJobResponse.ExecutionId();
	System.out.println(String.format("Waiting for the data source to sync with the index %s for execution ID %s", indexId, startDataSourceSyncJobResponse.executionId()));
    
    // Add 2 documents uploaded to S3 bucket to your index using the BatchPutDocument API
    // The added documents should sync with your custom data source
    Document pollyDoc = Document
        .builder()
        .s3Path(
            S3Path.builder()
            .bucket("amzn-s3-demo-bucket")
            .key("what_is_Amazon_Polly.docx")
            .build())
        .title("What is Amazon Polly?")
        .id("polly_doc_1")
        .build();
    
    Document rekognitionDoc = Document
        .builder()
        .s3Path(
            S3Path.builder()
            .bucket("amzn-s3-demo-bucket")
            .key("what_is_amazon_rekognition.docx")
            .build())
        .title("What is Amazon Rekognition?")
        .id("rekognition_doc_1")
        .build();
    
    BatchPutDocumentRequest batchPutDocumentRequest = BatchPutDocumentRequest
        .builder()
        .indexId(myIndexId)
        .documents(pollyDoc, rekognitionDoc)
        .build();
    
    BatchPutDocumentResponse result = kendra.batchPutDocument(batchPutDocumentRequest);
    System.out.println(String.format("BatchPutDocument result: %s", result));
    
    // Once the custom data source has synced, stop the sync job using the StopDataSourceSyncJob API
    StopDataSourceSyncJobResponse stopDataSourceSyncJobResponse = kendra.stopDataSourceSyncJob(
        StopDataSourceSyncJobRequest
            .builder()
            .indexId(myIndexId)
            .id(dataSourceId)
            .build()
    );

    // List your sync jobs
    ListDataSourceSyncJobsRequest listDataSourceSyncJobsRequest = ListDataSourceSyncJobsRequest
        .builder()
        .indexId(myIndexId)
        .id(dataSourceId)
        .build();
    
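    // Poll until the most recent sync job is no longer in progress
    while (true) {
        ListDataSourceSyncJobsResponse listDataSourceSyncJobsResponse = kendra.listDataSourceSyncJobs(listDataSourceSyncJobsRequest);
        DataSourceSyncJob job = listDataSourceSyncJobsResponse.history().get(0);
        String jobStatus = job.statusAsString();
        System.out.println(String.format("Status: %s", jobStatus));
        if (!"SYNCING".equals(jobStatus) && !"SYNCING_INDEXING".equals(jobStatus) && !"STOPPING".equals(jobStatus)) {
            break;
        }

        TimeUnit.SECONDS.sleep(60);
    }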
  }
}
```
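
The polling loops in the example above wait indefinitely if a status never changes. The following is a minimal sketch of a bounded polling helper, written with only the Java standard library so it can run without the AWS SDK. The class name `PollingSketch`, the method `pollUntil`, and its parameters are hypothetical and are not part of the Amazon Kendra API. The simulated status sequence stands in for repeated `DescribeDataSource` calls.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

final class PollingSketch {

    // Calls fetch until done accepts the result or maxAttempts is reached,
    // sleeping delayMillis between attempts. All names here are hypothetical.
    static <T> T pollUntil(Supplier<T> fetch, Predicate<T> done, int maxAttempts, long delayMillis) {
        T last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            last = fetch.get();
            if (done.test(last)) {
                return last;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop polling
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while polling", e);
                }
            }
        }
        throw new IllegalStateException("Gave up after " + maxAttempts + " attempts; last value: " + last);
    }

    public static void main(String[] args) {
        // Simulated status sequence standing in for repeated DescribeDataSource calls
        Iterator<String> statuses = List.of("CREATING", "CREATING", "ACTIVE").iterator();
        String finalStatus = PollingSketch.<String>pollUntil(statuses::next, s -> !s.equals("CREATING"), 10, 0L);
        System.out.println("Final status: " + finalStatus);
    }
}
```

In real code, the supplier would wrap the SDK call (for example, a lambda that calls `kendra.describeDataSource(...)` and returns the status), and the delay would be closer to the 60 seconds used in the example above.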

# Dropbox
<a name="data-source-dropbox"></a>

Dropbox is a file hosting service that offers cloud storage, document organization, and document templating services. If you are a Dropbox user, you can use Amazon Kendra to index your Dropbox files, Dropbox Paper, Dropbox Paper Templates, and stored shortcuts to web pages. You can also configure Amazon Kendra to index specific Dropbox files, Dropbox Paper, Dropbox Paper Templates, and stored shortcuts to web pages.

Amazon Kendra supports both Dropbox and Dropbox Advanced for Dropbox Business.

You can connect Amazon Kendra to your Dropbox data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) and the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API.

For troubleshooting your Amazon Kendra Dropbox data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-dropbox)
+ [Prerequisites](#prerequisites-dropbox)
+ [Connection instructions](#data-source-procedure-dropbox)
+ [Learn more](#dropbox-learn-more)
+ [Notes](#dropbox-notes)

## Supported features
<a name="supported-features-dropbox"></a>

The Amazon Kendra Dropbox data source connector supports the following features:
+ Field mappings
+ User access control
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-dropbox"></a>

Before you can use Amazon Kendra to index your Dropbox data source, make these changes in your Dropbox and AWS accounts.

**In Dropbox, make sure you have:**
+ Created a Dropbox Advanced account and set up an admin user.
+ Configured a Dropbox app with a unique **App name** and activated **Scoped Access**. See [Dropbox documentation on creating an app](https://www.dropbox.com/developers/reference/getting-started#app%20console).
+ Activated **Full Dropbox** permissions on the Dropbox console and added the following permissions:
  + files.content.read
  + files.metadata.read
  + sharing.read
  + file_requests.read
  + groups.read
  + team_info.read
  + team_data.content.read
+ Noted your Dropbox app key, Dropbox app secret, and Dropbox access token for basic authentication credentials.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ Configured and copied a temporary OAuth 2.0 access token for your Dropbox app. This token is temporary and expires after 4 hours. See [Dropbox documentation on OAuth authentication](https://developers.dropbox.com/oauth-guide).
**Note**  
It is recommended that you create a Dropbox refresh access token that never expires, rather than relying on a one-time access token that expires after 4 hours. A refresh access token is permanent, so you can continue to sync your data source in the future.
+ **Recommended:** Configured a Dropbox permanent refresh token that never expires to allow Amazon Kendra to continue to sync your data source without any disruptions. See [Dropbox documentation on refresh tokens](https://developers.dropbox.com/oauth-guide).
+ Checked each document is unique in Dropbox and across other data sources you plan to use for the same index. Each data source that you want to use for an index must not contain the same document across the data sources. Document IDs are global to an index and must be unique per index.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Dropbox authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Dropbox data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-dropbox"></a>

To connect Amazon Kendra to your Dropbox data source, you must provide the necessary details of your Dropbox data source so that Amazon Kendra can access your data. If you have not yet configured Dropbox for Amazon Kendra, see [Prerequisites](#prerequisites-dropbox).

------
#### [ Console ]

**To connect Amazon Kendra to Dropbox** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Dropbox connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **Dropbox connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. **Authorization**—Turn on or off access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).

   1. **Type of authentication token**—Choose either a permanent token (recommended) or a temporary access token.

   1. **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your Dropbox authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

      1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

         1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-Dropbox-’ is automatically added to your secret name.

         1. For **App key**, **App secret**, and token information (permanent or temporary)—Enter the authentication credential values configured in Dropbox.

      1. Save and add your secret.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. For **Select entities or content types**—Choose Dropbox entities or content types you want to crawl.

   1. In **Additional configuration** for **Regex patterns**—Add regular expression patterns to include or exclude certain files.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified sync: Index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. In **Sync run schedule**, for **Frequency**—Choose how often to sync your data source content and update your index.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. **Files**, **Dropbox Paper**, and **Dropbox Paper templates**—Select from the Amazon Kendra generated default data source fields you want to map to your index. 

   1. **Add field**—Add custom data source fields, and specify the index field name to map each one to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to Dropbox**

You must specify a JSON of the [data source schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-dropbox-schema) using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API. You must provide the following information:
+ **Data source**—Specify the data source type as `DROPBOX` when you use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
  + `CHANGE_LOG` to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Access token type**—Specify whether you want to use a permanent or temporary access token for your AWS Secrets Manager secret that stores your authentication credentials.
**Note**  
It's recommended that you create a refresh access token that never expires in Dropbox rather than relying on a one-time access token that expires after 4 hours. You create an app and a refresh access token in the Dropbox developer console and provide the access token in your secret.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials for your Dropbox account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "appKey": "Dropbox app key",
      "appSecret": "Dropbox app secret",
      "accesstoken": "temporary access token or refresh access token"
  }
  ```
+ **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Dropbox connector and Amazon Kendra. For more information, see [IAM roles for Dropbox data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
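
As an illustration of the secret structure listed above, the following minimal sketch assembles the JSON string with the three keys `appKey`, `appSecret`, and `accesstoken`. It uses only the Java standard library. The class name `DropboxSecretSketch` and its methods are hypothetical, the values are placeholders, and the escaping handles only backslashes and double quotes, so a JSON library is the safer choice for real credentials.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class DropboxSecretSketch {

    // Minimal escaping for backslashes and double quotes only
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds the secret string with the three keys listed in the secret structure
    static String toSecretJson(String appKey, String appSecret, String accessToken) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("appKey", appKey);
        fields.put("appSecret", appSecret);
        fields.put("accesstoken", accessToken);
        return fields.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\": \"" + escape(e.getValue()) + "\"")
                .collect(Collectors.joining(", ", "{", "}"));
    }

    public static void main(String[] args) {
        // Placeholder values only; do not hard-code or print real credentials
        System.out.println(toSecretJson("example-app-key", "example-app-secret", "example-token"));
    }
}
```

The resulting string is what you would store as the secret value in AWS Secrets Manager before passing the secret ARN to the connector configuration.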

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+ **Document/content types**—Specify whether to crawl files in your Dropbox, Dropbox Paper documents, Dropbox Paper templates, and web page shortcuts stored in your Dropbox.
+ **Inclusion and exclusion filters**—Specify whether to include or exclude certain files.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+ **Access control list (ACL)**—Specify whether to crawl ACL information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).
+  **Field mappings**—Choose to map your Dropbox data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.
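
The inclusion and exclusion filter rules described in the note above can be sketched locally before you configure the connector. The following example is an approximation under stated assumptions: it uses Java's `java.util.regex` syntax and partial matching with `find()`, which may differ from how Amazon Kendra evaluates patterns, and it assumes that a document is indexed when no inclusion filter is specified and no exclusion filter matches. The class name `FilterSketch` and the sample patterns are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

final class FilterSketch {

    // Returns true if a document name would be indexed under the rules in the note:
    // an exclusion match always wins; otherwise, when inclusion patterns exist,
    // the name must match at least one of them.
    static boolean isIndexed(String name, List<String> inclusionPatterns, List<String> exclusionPatterns) {
        for (String pattern : exclusionPatterns) {
            if (Pattern.compile(pattern).matcher(name).find()) {
                return false;
            }
        }
        if (inclusionPatterns.isEmpty()) {
            // Assumption: with no inclusion filter, everything not excluded is indexed
            return true;
        }
        for (String pattern : inclusionPatterns) {
            if (Pattern.compile(pattern).matcher(name).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> include = List.of(".*\\.pdf$");
        List<String> exclude = List.of("^draft-.*");
        System.out.println("report.pdf: " + isIndexed("report.pdf", include, exclude));
        System.out.println("draft-report.pdf: " + isIndexed("draft-report.pdf", include, exclude));
        System.out.println("notes.txt: " + isIndexed("notes.txt", include, exclude));
    }
}
```

Checking exclusions first mirrors the rule that a document matching an exclusion filter is not indexed even if it also matches an inclusion filter.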

For a list of other important JSON keys to configure, see [Dropbox template schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-dropbox-schema).

------

## Learn more
<a name="dropbox-learn-more"></a>

To learn more about integrating Amazon Kendra with your Dropbox data source, see:
+ [Index your Dropbox content using the Dropbox connector for Amazon Kendra](https://aws.amazon.com/blogs/machine-learning/index-your-dropbox-content-using-the-dropbox-connector-for-amazon-kendra/)

## Notes
<a name="dropbox-notes"></a>
+ When access control lists (ACLs) are enabled, the "Sync only new or modified content" option is not available due to Dropbox API limitations. We recommend using the "Full sync" or "New, modified, or deleted content sync" mode instead, or disabling ACLs if you need to use this sync mode.
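
The restriction in the note above can be expressed as a small pre-flight check. The following sketch assumes that the "new or modified content" console option corresponds to the `CHANGE_LOG` sync mode value described in the API instructions for this connector. The class name `DropboxSyncModeCheck` and the method `isAllowed` are hypothetical and are not part of the Amazon Kendra API.

```java
final class DropboxSyncModeCheck {

    // Sync mode values as listed in the API instructions for this connector
    enum SyncMode { FORCED_FULL_CRAWL, FULL_CRAWL, CHANGE_LOG }

    // CHANGE_LOG (new and modified content only) is the mode the note rules out when ACLs are enabled
    static boolean isAllowed(SyncMode mode, boolean aclEnabled) {
        return !(aclEnabled && mode == SyncMode.CHANGE_LOG);
    }

    public static void main(String[] args) {
        for (SyncMode mode : SyncMode.values()) {
            System.out.println(mode + " with ACLs enabled: " + isAllowed(mode, true));
        }
    }
}
```

A check like this could run before you submit a connector configuration, so that an unsupported combination is caught locally.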

# Drupal
<a name="data-source-drupal"></a>

**Note**  
The Drupal connector remains fully supported for existing customers through May 31, 2026. While this connector is no longer available for new users, current users can continue to use it without interruption. We are continuously evolving our connector portfolio to offer more scalable and customizable solutions. For future integrations, we recommend exploring the Amazon Kendra Custom Connector Framework, designed to support a broader range of enterprise use cases with enhanced flexibility.

Drupal is an open-source content management system (CMS) that you can use to create websites and web applications. You can use Amazon Kendra to index the following in Drupal:
+ Content—Articles, Basic pages, Basic blocks, User defined content types, User defined block types, Custom content types, Custom block types
+ Comment—For any Content type and Block type
+ Attachments—For any Content type and Block type

You can connect Amazon Kendra to your Drupal data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) or the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API.

For troubleshooting your Amazon Kendra Drupal data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-drupal)
+ [Prerequisites](#prerequisites-drupal)
+ [Connection instructions](#data-source-procedure-drupal)
+ [Notes](#drupal-notes)

## Supported features
<a name="supported-features-drupal"></a>

The Amazon Kendra Drupal data source connector supports the following features:
+ Field mappings
+ User context filtering
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-drupal"></a>

Before you can use Amazon Kendra to index your Drupal data source, make these changes in your Drupal and AWS accounts.

**In Drupal, make sure you have:**
+ Created a Drupal (Standard) Suite account and a user with an administrator role.
+ Copied your Drupal site name and configured a host URL. For example, *https://<hostname>/<drupalsitename>*.
+ Configured basic authentication credentials containing a user name (Drupal website login user name) and password (Drupal website password).
+ **Recommended:** Configured an OAuth 2.0 credential token. Use this token along with your Drupal password grant, client id, client secret, user name (Drupal website login user name) and password (Drupal website password) to connect to Amazon Kendra.
+ Added the following permissions in your Drupal account using an administrator role:
  + administer blocks
  + administer block_content display
  + administer block_content fields
  + administer block_content form display
  + administer views
  + view user email addresses
  + view own unpublished content
  + view page revisions
  + view article revisions
  + view all revisions
  + view the administration theme
  + access content
  + access content overview
  + access comments
  + search content
  + access files overview
  + access contextual links
**Note**  
If there are user defined content types or user defined block types, or any views and blocks are added to the Drupal website, they must be provided with administrator access.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Drupal authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Drupal data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-drupal"></a>

To connect Amazon Kendra to your Drupal data source, you must provide details of your Drupal credentials so that Amazon Kendra can access your data. If you have not yet configured Drupal for Amazon Kendra, see [Prerequisites](#prerequisites-drupal).

------
#### [ Console ]

**To connect Amazon Kendra to Drupal** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Drupal connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **Drupal connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. In **Source**, for **Host URL**—The host URL of your Drupal site. For example, *https://<hostname>/<drupalsitename>*.

   1. For **SSL certificate location**—Enter the path to the SSL certificate stored in your Amazon S3 bucket.

   1. **Authorization**—Turn on or off access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).

   1. For **Authentication**—Choose between **Basic authentication** and **OAuth 2.0 authentication** based on your use case.

   1. **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your Drupal authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

      1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

         1. If you chose **Basic authentication**, enter a **Secret Name**, the **User name** (Drupal site user name), and the **Password** (Drupal site password) that you copied, and then choose **Save and add secret**.

         1. If you chose **OAuth 2.0 authentication**, enter a **Secret Name**, **User name** (Drupal site user name), **Password** (Drupal site password), **Client ID**, and **Client secret** generated in your Drupal account and choose **Save and add secret**.

      1. Choose **Save**.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. For **Sync scope**, choose from the following options:
**Note**  
When you choose to crawl **Articles**, **Basic pages**, and **Basic blocks**, their default fields will be synced automatically. You can also choose to sync their comments, attachments, custom fields and other custom entities.

      1. For **Select entities**:
        + **Articles**—Choose whether to crawl **Articles**, their **Comments**, and their **Attachments**.
        + **Basic pages**—Choose whether to crawl **Basic pages**, their **Comments**, and their **Attachments**.
        + **Basic blocks**—Choose whether to crawl **Basic blocks**, their **Comments**, and their **Attachments**.
        + You can also choose to add **Custom content types** and **Custom Blocks**.

   1. For **Additional configuration – optional**:
      + For **Regex pattern**—Add regular expression patterns to include or exclude specific entity titles and file names. You can add up to 100 patterns.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. In **Sync run schedule**, **Frequency**—How often Amazon Kendra will sync with your data source.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. For **Contents**, **Comments**, and **Attachments**—Select from the Amazon Kendra generated default data source fields you want to map to your index. 

   1. **Add field**—Add custom data source fields, and specify the index field name to map each one to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.
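
The regex pattern step above allows up to 100 patterns for entity titles and file names. The following minimal sketch checks a list of patterns locally before you enter them in the console. It assumes Java's `java.util.regex` syntax, which may not match exactly what Amazon Kendra accepts. The class name `RegexPatternCheck` and the sample patterns are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class RegexPatternCheck {

    // Limit stated in the sync settings step for the Drupal connector
    static final int MAX_PATTERNS = 100;

    // Returns a list of problems; an empty list means the patterns passed both checks
    static List<String> validate(List<String> patterns) {
        List<String> problems = new ArrayList<>();
        if (patterns.size() > MAX_PATTERNS) {
            problems.add("Too many patterns: " + patterns.size() + " (limit " + MAX_PATTERNS + ")");
        }
        for (String pattern : patterns) {
            try {
                Pattern.compile(pattern);
            } catch (PatternSyntaxException e) {
                problems.add("Invalid pattern: " + pattern);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Hypothetical patterns: one valid, one with an unclosed character class
        List<String> patterns = List.of("^Release notes.*", "[");
        validate(patterns).forEach(System.out::println);
    }
}
```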

------
#### [ API ]

**To connect Amazon Kendra to Drupal**

You must specify a JSON of the [data source schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-drupal-schema) using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API. You must provide the following information:
+ **Data source**—Specify the data source type as `DRUPAL` when you use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
  + `CHANGE_LOG` to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials you created in your Drupal account. 

  If you use basic authentication, the secret is stored in a JSON structure with the following keys: 

  ```
  {
      "username": "user name",
      "password": "password"
  }
  ```

  If you use OAuth 2.0 authentication, the secret is stored in a JSON structure with the following keys: 

  ```
  {
      "username": "user name",
      "password": "password",
      "clientId": "client id",
      "clientSecret": "client secret"
  }
  ```
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Drupal connector and Amazon Kendra. For more information, see [IAM roles for Drupal data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+  **Inclusion and exclusion filters**—You can specify whether to include contents, comments, and attachments. You can also specify regular expression patterns to include or exclude contents, comments, and attachments.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+ **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.
+  **Field mappings**—Choose to map your Drupal data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.

For a list of other important JSON keys to configure, see [Drupal template schema](ds-schemas.md#ds-drupal-schema).
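
As a minimal sketch of the two secret layouts described above, the following Python builds the secret JSON string for either authentication type. The function name `build_drupal_secret` and the placeholder values are hypothetical; only the four key names come from this page.

```python
import json


def build_drupal_secret(username, password, client_id=None, client_secret=None):
    """Return the secret JSON string for a Drupal data source.

    Basic authentication needs only username and password. OAuth 2.0
    additionally needs clientId and clientSecret, which must be given together.
    """
    if (client_id is None) != (client_secret is None):
        raise ValueError("clientId and clientSecret must be provided together")

    secret = {"username": username, "password": password}
    if client_id is not None:
        secret["clientId"] = client_id
        secret["clientSecret"] = client_secret
    return json.dumps(secret)


# Placeholder values for illustration only.
basic_secret = build_drupal_secret("user name", "password")
oauth_secret = build_drupal_secret(
    "user name", "password", "client id", "client secret"
)
```

The resulting string could be stored as the `SecretString` of a Secrets Manager secret, and the ARN of that secret is what you provide for **Secret Amazon Resource Name (ARN)**.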

------

## Notes
<a name="drupal-notes"></a>
+ Drupal APIs have no official throttling limits.
+ Java SDKs are not available for Drupal.
+ Drupal data can be fetched only using native JSON APIs.
+ Content types not associated with any Drupal **View** cannot be crawled.
+ You need administrator access to crawl data from Drupal **Blocks**.
+ There is no JSON API available to create the user defined content type using HTTP verbs.
+ The document body and comments for **Articles**, **Basic pages**, **Basic blocks**, user defined content type, and user defined block type, are displayed in HTML format. If the HTML content is not well-formed, then the HTML related tags will appear in the document body and comments and will be visible in Amazon Kendra search results.
+ Content types and **Block** types without description or body will not be ingested into Amazon Kendra. Only **Comments** and **Attachments** of such **Content** or **Block** types will be ingested into your Amazon Kendra index.

# GitHub
<a name="data-source-github"></a>

GitHub is a web-based hosting service for software development providing code storage and management services with version control. You can use Amazon Kendra to index your GitHub Enterprise Cloud (SaaS) and GitHub Enterprise Server (On Prem) repository files, issues and pull requests, issue and pull request comments, and issue and pull request comment attachments. You can also choose to include or exclude certain files.

**Note**  
Amazon Kendra now supports an upgraded GitHub connector.  
The console has been automatically upgraded for you. Any new connectors you create in the console will use the upgraded architecture. If you use the API, you must now use the [https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object instead of the `GitHubConfiguration` object to configure your connector.  
Connectors configured using the older console and API architecture will continue to function as configured. However, you won’t be able to edit or update them. If you want to edit or update your connector configuration, you must create a new connector.  
We recommend migrating your connector workflow to the upgraded version. Support for connectors configured using the older architecture is scheduled to end by June 2024.

You can connect Amazon Kendra to your GitHub data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) and the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API.

For troubleshooting your Amazon Kendra GitHub data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-github)
+ [Prerequisites](#prerequisites-github)
+ [Connection instructions](#data-source-procedure-github)
+ [Learn more](#github-learn-more)

## Supported features
<a name="supported-features-github"></a>

Amazon Kendra GitHub data source connector supports the following features:
+ Field mappings
+ User access control
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-github"></a>

Before you can use Amazon Kendra to index your GitHub data source, make these changes in your GitHub and AWS accounts.

**In GitHub, make sure you have:**
+ Created a GitHub user with administrative permissions to the GitHub organization.
+ Configured a personal access token in GitHub to use as your authentication credentials. See [GitHub documentation on creating a personal access token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token).
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ **Recommended:** Configured an OAuth token for authentication credentials. Use an OAuth token for better API throttle limits and connector performance. See [GitHub documentation on OAuth authorization](https://docs.github.com/en/rest/apps/oauth-applications?apiVersion=2022-11-28#about-oauth-apps-and-oauth-authorizations-of-github-apps).
+ Noted the GitHub host URL for the type of GitHub service that you use. For example, the host URL for GitHub cloud could be *https://api.github.com* and the host URL for GitHub server could be *https://on-prem-host-url/api/v3/*.
+ Noted the name of your organization for the GitHub Enterprise Cloud (SaaS) account or GitHub Enterprise Server (on-premises) account you want to connect to. You can find your organization name by logging into GitHub desktop and selecting **Your organizations** under your profile picture dropdown.
+ **Optional (server only):** Generated an SSL certificate and copied the path to the certificate stored in an Amazon S3 bucket. You use this to connect to GitHub if you require a secure SSL connection. You can generate a self-signed X509 certificate on any computer using OpenSSL. For an example of using OpenSSL to create an X509 certificate, see [Create and sign an X509 certificate](https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/configuring-https-ssl.html).
+ Added the following permissions:

  **For GitHub Enterprise Cloud (SaaS)**
  + `repo:status` – Grants read/write access to commit statuses in public and private repositories. This scope is only necessary to grant other users or services access to private repository commit statuses without granting access to the code.
  + `repo_deployment` – Grants access to deployment statuses for public and private repositories. This scope is only necessary to grant other users or services access to deployment statuses, without granting access to the code.
  + `public_repo` – Limits access to public repositories. That includes read/write access to code, commit statuses, repository projects, collaborators, and deployment statuses for public repositories and organizations. Also required for starring public repositories.
  + `repo:invite` – Grants accept/decline abilities for invitations to collaborate on a repository. This scope is only necessary to grant other users or services access to invites without granting access to the code.
  + `security_events` – Grants read and write access to security events in the code scanning API. This scope is only necessary to grant other users or services access to security events without granting access to the code.
  + `read:org` – Read-only access to organization membership, organization projects, and team membership.
  + `user:email` – Grants read access to a user's email addresses. Required by Amazon Kendra to crawl ACLs.
  + `user:follow` – Grants access to follow or unfollow other users. Required by Amazon Kendra to crawl ACLs.
  + `read:user` – Grants access to read a user's profile data. Required by Amazon Kendra to crawl ACLs.
  + `workflow` – Grants the ability to add and update GitHub Actions workflow files. Workflow files can be committed without this scope if the same file (with both the same path and contents) exists on another branch in the same repository.

  For more information, see [Scopes for OAuth apps](https://docs.github.com/en/apps/oauth-apps/building-oauth-apps/scopes-for-oauth-apps) in GitHub Docs.

  **For GitHub Enterprise Server (On Prem)**
  + `repo:status` – Grants read/write access to commit statuses in public and private repositories. This scope is only necessary to grant other users or services access to private repository commit statuses without granting access to the code.
  + `repo_deployment` – Grants access to deployment statuses for public and private repositories. This scope is only necessary to grant other users or services access to deployment statuses, without granting access to the code.
  + `public_repo` – Limits access to public repositories. That includes read/write access to code, commit statuses, repository projects, collaborators, and deployment statuses for public repositories and organizations. Also required for starring public repositories.
  + `repo:invite` – Grants accept/decline abilities for invitations to collaborate on a repository. This scope is only necessary to grant other users or services access to invites without granting access to the code.
  + `security_events` – Grants read and write access to security events in the code scanning API. This scope is only necessary to grant other users or services access to security events without granting access to the code.
  + `read:user` – Grants access to read a user's profile data. Required by Amazon Kendra to crawl ACLs.
  + `user:email` – Grants read access to a user's email addresses. Required by Amazon Kendra to crawl ACLs.
  + `user:follow` – Grants access to follow or unfollow other users. Required by Amazon Kendra to crawl ACLs.
  + `site_admin` – Grants site administrators access to GitHub Enterprise Server Administration API endpoints.
  + `workflow` – Grants the ability to add and update GitHub Actions workflow files. Workflow files can be committed without this scope if the same file (with both the same path and contents) exists on another branch in the same repository.

  For more information, see [Scopes for OAuth apps](https://docs.github.com/en/apps/oauth-apps/building-oauth-apps/scopes-for-oauth-apps) in GitHub Docs and [Understanding scopes for OAuth Apps](https://developer.github.com/enterprise/2.16/apps/building-oauth-apps/understanding-scopes-for-oauth-apps/#available-scopes) in GitHub Developer.
+ Checked each document is unique in GitHub and across other data sources you plan to use for the same index. Each data source that you want to use for an index must not contain the same document across the data sources. Document IDs are global to an index and must be unique per index.
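
One way to sanity-check the permissions listed above is to compare them with the scopes GitHub reports for a token. The sketch below assumes a classic personal access token or OAuth token, for which GitHub API responses carry an `X-OAuth-Scopes` header. The function name and the sample header value are hypothetical, and the comparison is literal: it does not account for a parent scope (such as `repo` or `user`) implying its child scopes.

```python
# Scopes listed on this page for GitHub Enterprise Cloud (SaaS).
REQUIRED_SAAS_SCOPES = {
    "repo:status", "repo_deployment", "public_repo", "repo:invite",
    "security_events", "read:org", "user:email", "user:follow",
    "read:user", "workflow",
}

# Scopes listed on this page for GitHub Enterprise Server (On Prem).
REQUIRED_SERVER_SCOPES = {
    "repo:status", "repo_deployment", "public_repo", "repo:invite",
    "security_events", "read:user", "user:email", "user:follow",
    "site_admin", "workflow",
}


def missing_scopes(oauth_scopes_header, required=REQUIRED_SAAS_SCOPES):
    """Return the required scopes absent from an X-OAuth-Scopes header value."""
    granted = {s.strip() for s in oauth_scopes_header.split(",") if s.strip()}
    return sorted(set(required) - granted)


# Hypothetical header value, as it might appear on an API response.
header = "public_repo, read:org, read:user, repo:status, user:email"
print(missing_scopes(header))
```

An empty list means every listed scope appears in the header. Anything else names the scopes still to be added to the token.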

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your GitHub authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your GitHub data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.
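
If you create the secret yourself for API use, the following sketch shows one way to prepare it. The `personalToken` key is the one documented for the GitHub secret in the API instructions on this page. The function names, secret name, and token value are hypothetical placeholders.

```python
import json


def build_github_secret(token):
    """Return the secret JSON string with the personalToken key."""
    if not token or not token.strip():
        raise ValueError("token must be a non-empty string")
    return json.dumps({"personalToken": token.strip()})


def create_secret_request(name, token):
    """Build keyword arguments for the Secrets Manager CreateSecret operation."""
    return {"Name": name, "SecretString": build_github_secret(token)}


# Hypothetical secret name and placeholder token.
request = create_secret_request("kendra-github-credentials", "token")
```

With the AWS SDK for Python (Boto3), these arguments could be passed as `boto3.client("secretsmanager").create_secret(**request)`. The `ARN` field of the response is the value to note for the data source.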

## Connection instructions
<a name="data-source-procedure-github"></a>

To connect Amazon Kendra to your GitHub data source, you must provide the necessary details of your GitHub data source so that Amazon Kendra can access your data. If you have not yet configured GitHub for Amazon Kendra, see [Prerequisites](#prerequisites-github).

------
#### [ Console ]

**To connect Amazon Kendra to GitHub** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **GitHub connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **GitHub connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. **GitHub source**—Choose between **GitHub Enterprise Cloud** and **GitHub Enterprise Server**.

   1. **GitHub host URL**—For example, the host URL for GitHub cloud could be *https://api.github.com* and the host URL for GitHub server could be *https://on-prem-host-url/api/v3/*.

   1. **GitHub organization name**—Enter your GitHub organization name. You can find your organization information in your GitHub account.
**Note**  
GitHub connector supports crawling a single organization per data source connector instance.

   1. **Authorization**—Turn on or off access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).

   1. **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your GitHub authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

      1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

         1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-GitHub-’ is automatically added to your secret name.

         1. For **GitHub token**—Enter the authentication credential value configured in GitHub.

      1. Save and add your secret.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. **Select repositories**—Choose to crawl all repositories or only select repositories.

      If you choose to crawl select repositories, add the names for the repositories and, optionally, the name of any specific branches.

   1. **Content types**—Choose the content types you want to crawl from files, issues, pull requests, and more.

   1. **Regex patterns**—Add regular expression patterns to include or exclude certain files.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified sync: Index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. In **Sync run schedule** for **Frequency**—Choose how often to sync your data source content and update your index.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. **Default data source fields**—Select from the Amazon Kendra generated default data source fields you want to map to your index.

   1.  **Add field**—Add custom data source fields, specifying the index field name to map to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to GitHub**

You must specify a JSON of the [data source schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-github-schema) using the [https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API. You must provide the following information:
+ **Data source**—Specify the data source type as `GITHUB` when you use the [https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **GitHub type**—Specify the type as either `SAAS` or `ON_PREMISE`.
+ **Host URL**—Specify the GitHub host URL or API endpoint URL. For example, if you use GitHub SaaS/Enterprise Cloud, the host URL could be `https://api.github.com`, and for GitHub on-premises/Enterprise Server the host URL could be `https://on-prem-host-url/api/v3/`.
+ **Organization name**—Specify the name of the organization of the GitHub account. You can find your organization name by logging into GitHub desktop and selecting **Your organizations** under your profile picture dropdown.
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don’t choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
  + `CHANGE_LOG` to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials for your GitHub account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "personalToken": "token"
  }
  ```
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the GitHub connector and Amazon Kendra. For more information, see [IAM roles for GitHub data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
**Note**  
If you use GitHub server, you must use an Amazon VPC to connect to your GitHub server.
+  **Repository filter**—Filter repositories by their name and branch names.
+  **Document/content types**—Specify whether to crawl repository documents, issues, issue comments, issue comment attachments, pull requests, pull request comments, and pull request comment attachments.
+  **Inclusion and exclusion filters**—Specify whether to include or exclude certain files and folders.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+ **Access control list (ACL)**—Specify whether to crawl ACL information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).
+  **Field mappings**—Choose to map your GitHub data source fields to your Amazon Kendra index fields. You can include fields of documents, commits, issues, issue attachments, issue comments, pull requests, pull request attachments, and pull request comments. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.
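
The inclusion and exclusion rule described in the filters note above can be sketched as a small predicate. This is an illustration of the stated precedence only: the function name is hypothetical, and the use of Python's `re.search` (partial matching) is an assumption, not a statement about how Amazon Kendra evaluates patterns.

```python
import re


def is_indexed(name, inclusion_patterns=(), exclusion_patterns=()):
    """Apply the inclusion/exclusion precedence to one file or folder name."""
    if any(re.search(p, name) for p in exclusion_patterns):
        return False  # exclusion wins even when an inclusion pattern matches
    if inclusion_patterns:
        # With an inclusion filter, only matching content is indexed.
        return any(re.search(p, name) for p in inclusion_patterns)
    return True  # no inclusion filter: everything not excluded is indexed


include = [r"\.md$"]
exclude = [r"^docs/drafts/"]
for path in ("docs/guide.md", "docs/drafts/wip.md", "src/main.py"):
    print(path, is_indexed(path, include, exclude))
```

Checking exclusion first keeps the rule in one place: a document that matches both filters is never indexed.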

For a list of other important JSON keys to configure, see [GitHub template schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-github-schema).
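
To show how the required settings above fit together, here is a deliberately incomplete sketch that assembles arguments for `CreateDataSource`. The data source types (`TEMPLATE`, `GITHUB`), the GitHub types, and the sync mode values come from this page. The template key names (`type`, `syncMode`, `secretArn`, `connectionConfiguration`, `repositoryEndpointMetadata`, `hostUrl`, `organizationName`) are assumptions to verify against the GitHub template schema, which also lists further keys. The function name, IDs, and ARNs are hypothetical placeholders.

```python
SYNC_MODES = {"FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG"}
GITHUB_TYPES = {"SAAS", "ON_PREMISE"}


def build_github_data_source_request(index_id, name, role_arn, secret_arn,
                                     github_type, host_url, organization,
                                     sync_mode="FORCED_FULL_CRAWL"):
    """Assemble keyword arguments for CreateDataSource (partial sketch)."""
    if github_type not in GITHUB_TYPES:
        raise ValueError(f"github_type must be one of {sorted(GITHUB_TYPES)}")
    if sync_mode not in SYNC_MODES:
        raise ValueError(f"sync_mode must be one of {sorted(SYNC_MODES)}")

    # Key names below are assumptions; check them against the template schema.
    template = {
        "type": "GITHUB",
        "syncMode": sync_mode,
        "secretArn": secret_arn,
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {
                "type": github_type,
                "hostUrl": host_url,
                "organizationName": organization,
            }
        },
    }
    return {
        "IndexId": index_id,
        "Name": name,
        "Type": "TEMPLATE",
        "RoleArn": role_arn,
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }


# Placeholder identifiers for illustration only.
request = build_github_data_source_request(
    index_id="index-id",
    name="github-data-source",
    role_arn="arn:aws:iam::111122223333:role/example-kendra-role",
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    github_type="SAAS",
    host_url="https://api.github.com",
    organization="example-org",
)
```

Once the template is complete, the arguments could be passed with the AWS SDK for Python (Boto3) as `boto3.client("kendra").create_data_source(**request)`. Validating the sync mode and GitHub type locally catches typos before the request is sent.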

------

## Learn more
<a name="github-learn-more"></a>

To learn more about integrating Amazon Kendra with your GitHub data source, see:
+ [Reimagine search on GitHub repositories with the power of the Amazon Kendra GitHub connector](https://aws.amazon.com/blogs/machine-learning/reimagine-search-on-github-repositories-with-the-power-of-the-amazon-kendra-github-connector/)

# Gmail
<a name="data-source-gmail"></a>

Gmail is an email client developed by Google through which you can send email messages with file attachments. Gmail messages can be sorted and stored inside your email inbox using folders and labels. You can use Amazon Kendra to index your email messages and message attachments. You can also configure Amazon Kendra to include or exclude specific email messages, message attachments, and labels for indexing.

You can connect Amazon Kendra to your Gmail data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) and the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API.

For troubleshooting your Amazon Kendra Gmail data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-gmail)
+ [Prerequisites](#prerequisites-gmail)
+ [Connection instructions](#data-source-procedure-gmail)
+ [Learn more](#gmail-learn-more)
+ [Notes](#gmail-notes)

## Supported features
<a name="supported-features-gmail"></a>
+ Field mappings
+ User access control
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-gmail"></a>

Before you can use Amazon Kendra to index your Gmail data source, make these changes in your Gmail and AWS accounts.

**In Gmail, make sure you have:**
+ Created a Google Cloud Platform admin account and a Google Cloud project.
+ Activated Gmail API and Admin SDK API in your admin account.
+ Created a service account and downloaded a JSON private key for your Gmail. For information on how to create and access your private key, see Google Cloud documentation on how to [Create a service account key](https://cloud.google.com/iam/docs/keys-create-delete#creating) and [Service account credentials](https://cloud.google.com/iam/docs/service-account-creds#key-types).
+ Copied your admin account email, your service account email, and your private key to use as your authentication credentials.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ Added the following OAuth scopes (using an admin role) for your user and the shared directories you want to index:
  + https://www.googleapis.com/auth/admin.directory.user.readonly
  + https://www.googleapis.com/auth/gmail.readonly
+ Checked each document is unique in Gmail and across other data sources you plan to use for the same index. Each data source that you want to use for an index must not contain the same document across the data sources. Document IDs are global to an index and must be unique per index.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Gmail authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Gmail data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-gmail"></a>

To connect Amazon Kendra to your Gmail data source, you must provide details of your Gmail credentials so that Amazon Kendra can access your data. If you have not yet configured Gmail for Amazon Kendra, see [Prerequisites](#prerequisites-gmail).

------
#### [ Console ]

**To connect Amazon Kendra to Gmail** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Gmail connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **Gmail connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. **Authorization**—Turn on or off access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).

   1. In **Authentication** for **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your Gmail authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

      1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

        1. **Secret Name**—A name for your secret.

        1. **Client email**—The client email that you copied from your Google service account.

        1. **Admin account email**—The admin account email that you would like to use.

        1. **Private key**—The private key you copied from your Google service account.

        1. Save and add your secret.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. For **Entity types**—Choose to sync message attachments.

   1. (Optional) For **Additional configuration**, enter the following information:

      1. **Date range**—Enter a date range to specify the start and end date of emails you want to crawl.

      1. **Email domains**—Include or exclude certain emails based on "to", "from", "cc" and "bcc" email domains.

      1. **Keywords in subjects**—Include or exclude emails based on keywords in their email subjects.
**Note**  
You can also choose to include any documents that match all the subject keywords you have entered.

      1. **Labels**—Add regular expression patterns to include or exclude certain email labels.

      1. **Attachments**—Add regular expression patterns to include or exclude certain email attachments.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
**Important**  
Because there is no API to update permanently deleted Gmail messages, new, modified, or deleted content sync:  
Won't remove messages that were permanently deleted from Gmail from your Amazon Kendra index
Won't sync changes in Gmail email labels
To sync your Gmail data source label changes and permanently deleted email messages to your Amazon Kendra index, you must run full crawls periodically.

   1. In **Sync run schedule**, for **Frequency**—Choose how often to sync your data source content and update your index.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. **Default data source fields**—Select from the Amazon Kendra generated default data source fields you want to map to your index.
**Note**  
Amazon Kendra Gmail data source connector does not support creating custom index fields due to API limitations.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to Gmail**

You must specify the JSON of the [data source schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-gmail-schema) using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API. You must provide the following information:
+ **Data source**—Specify the data source type as `GMAIL` when you use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html) API.
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
**Important**  
Because there is no API to update permanently deleted Gmail messages, new, modified, or deleted content sync:  
Won't remove messages that were permanently deleted from Gmail from your Amazon Kendra index
Won't sync changes in Gmail email labels
To sync your Gmail data source label changes and permanently deleted email messages to your Amazon Kendra index, you must run full crawls periodically.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials for your Gmail account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "adminAccountEmailId": "admin account email",
      "clientEmailId": "service account email",
      "privateKey": "private key"
  }
  ```
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Gmail connector and Amazon Kendra. For more information, see [IAM roles for Gmail data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
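
The secret structure described above can be assembled from the JSON key file downloaded for the Google service account. The following is a minimal sketch, not a definitive implementation. It assumes that the key file carries `client_email` and `private_key` fields and, following the console steps, that the client email is the service account email. The function name and the example values are hypothetical.

```python
import json


def build_gmail_secret(service_account_key: dict, admin_email: str) -> str:
    """Return a JSON string with the three keys the Gmail secret uses.

    Assumes `service_account_key` is the parsed service account key file,
    which is expected to contain `client_email` and `private_key`.
    """
    secret = {
        # Admin account email the connector uses (assumption: a Workspace admin).
        "adminAccountEmailId": admin_email,
        # Client email copied from the Google service account.
        "clientEmailId": service_account_key["client_email"],
        # Private key copied from the Google service account.
        "privateKey": service_account_key["private_key"],
    }
    return json.dumps(secret)


# Hypothetical values, for illustration only.
example_key = {
    "client_email": "kendra-gmail@example-project.iam.gserviceaccount.com",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
}
secret_string = build_gmail_secret(example_key, "admin@example.com")
```

The resulting string could then be stored as the `SecretString` of a Secrets Manager secret, for example through the console or the `CreateSecret` API.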

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+  **Inclusion and exclusion filters**—Specify whether to include or exclude certain "to", "from", "cc", "bcc" emails.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+  **User context filtering and access control**—Amazon Kendra crawls the access control list (ACL) for your documents, if you have an ACL for your documents. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#datasource-context-filter).
+  **Field mappings**—Choose to map your Gmail data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.
**Note**  
Amazon Kendra Gmail data source connector does not support creating custom index fields due to API limitations.

For a list of other important JSON keys to configure, see [Gmail template schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-gmail-schema).
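
The pieces above can be combined into the arguments for a `CreateDataSource` call. The following sketch only builds the request as a Python dictionary and makes no AWS calls. The template key names `type`, `syncMode`, and `secretArn` are assumptions to check against the Gmail template schema, which lists further keys that a working template needs. The function name, ARNs, and index ID are hypothetical.

```python
import json

# Sync modes named in this section for the Gmail connector.
VALID_SYNC_MODES = {"FORCED_FULL_CRAWL", "FULL_CRAWL"}


def build_gmail_data_source_request(name, index_id, role_arn, secret_arn,
                                    sync_mode="FORCED_FULL_CRAWL"):
    """Build keyword arguments for a CreateDataSource call (sketch only)."""
    if " " in name:
        raise ValueError("data source names can include hyphens but not spaces")
    if sync_mode not in VALID_SYNC_MODES:
        raise ValueError(f"unsupported sync mode: {sync_mode}")

    # Assumed template keys; verify against the Gmail template schema.
    template = {
        "type": "GMAIL",
        "syncMode": sync_mode,
        "secretArn": secret_arn,
    }
    return {
        "Name": name,
        "IndexId": index_id,
        "Type": "TEMPLATE",  # template-based connectors use TEMPLATE
        "RoleArn": role_arn,
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }


# Hypothetical identifiers, for illustration only.
request = build_gmail_data_source_request(
    name="gmail-data-source",
    index_id="example-index-id",
    role_arn="arn:aws:iam::111122223333:role/example-kendra-gmail-role",
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:example-gmail",
)
print(json.dumps(request, indent=2))
```

With the AWS SDK for Python installed and credentials configured, a dictionary like this could be passed as `boto3.client("kendra").create_data_source(**request)`.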

------

## Learn more
<a name="gmail-learn-more"></a>

To learn more about integrating Amazon Kendra with your Gmail data source, see:
+ [Perform intelligent search across emails in your Google workspace using the Gmail connector for Amazon Kendra](https://aws.amazon.com/blogs/machine-learning/perform-intelligent-search-across-emails-in-your-google-workspace-using-the-gmail-connector-for-amazon-kendra/).

## Notes
<a name="gmail-notes"></a>
+ Because there is no API to update permanently deleted Gmail messages, a `FULL_CRAWL`/**New, modified, or deleted content sync**:
  + Won’t remove messages that were permanently deleted from Gmail from your Amazon Kendra index
  + Won’t sync changes in Gmail email labels

  To sync your Gmail data source label changes and permanently deleted email messages to your Amazon Kendra index, you must run full crawls periodically.
+ Amazon Kendra Gmail data source connector does not support creating custom index fields due to API limitations.
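
One way to act on the first note is to run incremental syncs most of the time and switch to a full crawl at a fixed interval. The following is a minimal sketch of that decision only. The seven-day interval and the function name are assumptions for illustration, not Amazon Kendra defaults, and the sketch does not call any AWS API.

```python
from datetime import datetime, timedelta, timezone


def choose_sync_mode(last_full_crawl, now, max_age=timedelta(days=7)):
    """Pick a sync mode so that full crawls still happen periodically.

    Returns FORCED_FULL_CRAWL when no full crawl is recorded or the last
    one is at least `max_age` old; otherwise returns FULL_CRAWL, which
    indexes only new, modified, and deleted content.
    """
    if last_full_crawl is None or now - last_full_crawl >= max_age:
        return "FORCED_FULL_CRAWL"
    return "FULL_CRAWL"


now = datetime(2024, 1, 15, tzinfo=timezone.utc)
print(choose_sync_mode(None, now))                      # FORCED_FULL_CRAWL
print(choose_sync_mode(now - timedelta(days=2), now))   # FULL_CRAWL
print(choose_sync_mode(now - timedelta(days=10), now))  # FORCED_FULL_CRAWL
```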

# Google Drive
<a name="data-source-google-drive"></a>

Google Drive is a cloud-based file storage service. You can use Amazon Kendra to index documents stored in shared drives, My Drives, and Shared with me folders in your Google Drive data source. You can index both Google Workspace documents and documents listed in [Types of documentation](https://docs.aws.amazon.com/kendra/latest/dg/index-document-types.html). You can also use inclusion and exclusion filters to index content by file name, file type, and file path.

You can connect Amazon Kendra to your Google Drive data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/), the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API, or the [GoogleDriveConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_GoogleDriveConfiguration.html) API.

Amazon Kendra has two versions of the Google Drive connector. Supported features of each version include:

**Google Drive connector V1.0 / [GoogleDriveConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_GoogleDriveConfiguration.html) API**
+ Field mappings
+ User access control
+ Inclusion/exclusion filters

**Google Drive connector V2.0 / [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API**
+ Field mappings
+ User access control
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

**Note**  
Support for the Google Drive connector V1.0 / GoogleDriveConfiguration API ended in 2023. We recommend migrating to or using Google Drive connector V2.0 / TemplateConfiguration API.

For troubleshooting your Amazon Kendra Google Drive data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Google Drive connector V1.0](data-source-v1-google-drive.md)
+ [Google Drive connector V2.0](data-source-v2-google-drive.md)

# Google Drive connector V1.0
<a name="data-source-v1-google-drive"></a>

Google Drive is a cloud-based file storage service. You can use Amazon Kendra to index documents and comments stored in shared drives, My Drives, and Shared with me folders in your Google Drive data source. You can index Google Workspace documents, as well as documents listed in [Types of documentation](https://docs.aws.amazon.com/kendra/latest/dg/index-document-types.html). You can also use inclusion and exclusion filters to index content by file name, file type, and file path.

**Note**  
Support for the Google Drive connector V1.0 / GoogleDriveConfiguration API ended in 2023. We recommend migrating to or using Google Drive connector V2.0 / TemplateConfiguration API.

For troubleshooting your Amazon Kendra Google Drive data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-v1-google-drive)
+ [Prerequisites](#prerequisites-v1-google-drive)
+ [Connection instructions](#data-source-v1-procedure-google-drive)
+ [Learn more](#google-drive-learn-more)

## Supported features
<a name="supported-features-v1-google-drive"></a>
+ Field mappings
+ User access control
+ Inclusion/exclusion filters

## Prerequisites
<a name="prerequisites-v1-google-drive"></a>

Before you can use Amazon Kendra to index your Google Drive data source, make these changes in your Google Drive and AWS accounts.

**In Google Drive, make sure you have:**
+ **Either** been granted access by a super admin role **or** are a user with administrative privileges. You do not need a super admin role for yourself if you have been granted access by a super admin role.
+ Created a service account with **Enable G Suite Domain-wide Delegation** activated, and generated a JSON private key using the account.
+ Copied your user account email and your service account email. When you connect to Amazon Kendra you enter your user account email as admin account email and your service account email as client email in your AWS Secrets Manager secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ Added Admin SDK API and Google Drive API in your account.
+ Added (or asked a user with a super admin role to add) the following permissions to your service account using a super admin role:
  + https://www.googleapis.com/auth/drive.readonly
  + https://www.googleapis.com/auth/drive.metadata.readonly
  + https://www.googleapis.com/auth/admin.directory.user.readonly
  + https://www.googleapis.com/auth/admin.directory.group.readonly
+ Checked each document is unique in Google Drive and across other data sources you plan to use for the same index. Each data source that you want to use for an index must not contain the same document across the data sources. Document IDs are global to an index and must be unique per index.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Google Drive authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Google Drive data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-v1-procedure-google-drive"></a>

To connect Amazon Kendra to your Google Drive data source, you must provide the necessary details of your Google Drive data source so that Amazon Kendra can access your data. If you have not yet configured Google Drive for Amazon Kendra, see [Prerequisites](#prerequisites-v1-google-drive).

------
#### [ Console ]

**To connect Amazon Kendra to Google Drive** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Google Drive connector V1.0**, and then choose **Add connector**.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. For **Type of authentication**—Choose between **Existing** and **New**. If you choose to use an existing secret, use **Select secret** to choose your secret.

   1. If you choose to create a new secret, an AWS Secrets Manager secret option opens.

      1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

        1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-Google Drive-’ is automatically added to your secret name.

        1. For **Admin account email**, **Client email**, and **Private key**—Enter the authentication credential values you generated and downloaded from your Google Drive account. 

        1. Choose **Save authentication**.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. **Exclude user accounts**—The Google Drive users you want to exclude from the index. You can add up to 100 user accounts.

   1. **Exclude shared drives**—The Google Drive shared drives you want to exclude from your index. You can add up to 100 shared drives.

   1. **Exclude file types drives**—The Google Drive file types you want to exclude from your index. You can also choose to edit MIME type selections.

   1. **Additional configurations**—Regular expression patterns to include or exclude certain content. You can add up to 100 patterns.

   1. **Frequency**—How often Amazon Kendra will sync with your data source.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. For **GoogleDrive field name** and **Additional suggested field mappings**—Select from the Amazon Kendra generated default data source fields you want to map to your index. 

   1.  **Add field**—Add custom data source fields, then specify the index field name to map each field to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to Google Drive**

You must specify the following using the [GoogleDriveConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_GoogleDriveConfiguration.html) API:
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials for your Google Drive account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "clientAccount": "service account email",
      "adminAccount": "user account email",
      "privateKey": "private key"
  }
  ```
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Google Drive connector and Amazon Kendra. For more information, see [IAM roles for Google Drive data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
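
As a rough illustration of the two required items above, the following sketch builds the secret string and the arguments for a `CreateDataSource` call that uses `GoogleDriveConfiguration`. It assumes that the downloaded service account key file carries `client_email` and `private_key` fields. The function names, ARNs, and index ID are hypothetical, and the sketch makes no AWS calls.

```python
import json


def build_drive_secret(service_account_key: dict, admin_email: str) -> str:
    """Return a JSON string with the three keys the Google Drive secret uses."""
    return json.dumps({
        # Service account email, entered as the client email.
        "clientAccount": service_account_key["client_email"],
        # User account email, entered as the admin account email.
        "adminAccount": admin_email,
        "privateKey": service_account_key["private_key"],
    })


def build_drive_v1_request(name, index_id, role_arn, secret_arn):
    """Build keyword arguments for a CreateDataSource call (sketch only)."""
    return {
        "Name": name,
        "IndexId": index_id,
        "Type": "GOOGLEDRIVE",
        "RoleArn": role_arn,
        "Configuration": {
            "GoogleDriveConfiguration": {"SecretArn": secret_arn},
        },
    }


# Hypothetical values, for illustration only.
drive_key = {
    "client_email": "kendra-drive@example-project.iam.gserviceaccount.com",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
}
drive_secret = build_drive_secret(drive_key, "admin@example.com")
drive_request = build_drive_v1_request(
    name="google-drive-v1",
    index_id="example-index-id",
    role_arn="arn:aws:iam::111122223333:role/example-kendra-drive-role",
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:example-drive",
)
print(json.dumps(drive_request, indent=2))
```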

You can also add the following optional features:
+  **Inclusion and exclusion filters**—By default Amazon Kendra indexes all documents in Google Drive. You can specify whether to include or exclude certain content in shared drives, user accounts, document MIME types, and files. If you choose to exclude user accounts, none of the files in the My Drive owned by the account are indexed. Files shared with the user are indexed unless the owner of the file is also excluded.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+  **Field mappings**—Choose to map your Google Drive data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.
+  **User context filtering and access control**—Amazon Kendra crawls the access control list (ACL) for your documents, if you have an ACL for your documents. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#datasource-context-filter).
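
The precedence between inclusion and exclusion filters described in the note above can be sketched with Python regular expressions. This is an illustration of the rule only. How the connector applies patterns internally (for example, partial versus full matches) is not stated here, so the use of `re.search` is an assumption, and the function name and paths are hypothetical.

```python
import re


def is_indexed(path, inclusion_patterns=(), exclusion_patterns=()):
    """Apply the filter rule: exclusion wins, then inclusion must match."""
    # A document that matches any exclusion filter is never indexed.
    if any(re.search(p, path) for p in exclusion_patterns):
        return False
    # With inclusion filters present, only matching documents are indexed.
    if inclusion_patterns:
        return any(re.search(p, path) for p in inclusion_patterns)
    # With no filters, everything is indexed by default.
    return True


include = [r"\.pdf$"]
exclude = [r"^drafts/"]
print(is_indexed("reports/q1.pdf", include, exclude))   # True
print(is_indexed("drafts/q1.pdf", include, exclude))    # False
print(is_indexed("reports/q1.docx", include, exclude))  # False
print(is_indexed("notes.txt"))                          # True
```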

------

## Learn more
<a name="google-drive-learn-more"></a>

To learn more about integrating Amazon Kendra with your Google Drive data source, see:
+ [Getting started with the Amazon Kendra Google Drive connector](https://aws.amazon.com/blogs/machine-learning/getting-started-with-the-amazon-kendra-google-drive-connector/)

# Google Drive connector V2.0
<a name="data-source-v2-google-drive"></a>

Google Drive is a cloud-based file storage service. You can use Amazon Kendra to index documents and comments stored in shared drives, My Drives, and Shared with me folders in your Google Drive data source. You can index Google Workspace documents, as well as documents listed in [Types of documentation](https://docs.aws.amazon.com/kendra/latest/dg/index-document-types.html). You can also use inclusion and exclusion filters to index content by file name, file type, and file path.

**Note**  
Support for the Google Drive connector V1.0 / GoogleDriveConfiguration API ended in 2023. We recommend migrating to or using Google Drive connector V2.0 / TemplateConfiguration API.

For troubleshooting your Amazon Kendra Google Drive data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-v2-google-drive)
+ [Prerequisites](#prerequisites-v2-google-drive)
+ [Connection instructions](#data-source-procedure-v2-google-drive)
+ [Notes](#google-drive-notes)

## Supported features
<a name="supported-features-v2-google-drive"></a>
+ Field mappings
+ User access control
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-v2-google-drive"></a>

Before you can use Amazon Kendra to index your Google Drive data source, make these changes in your Google Drive and AWS accounts.

**In Google Drive, make sure you have:**
+ **Either** been granted access by a super admin role **or** are a user with administrative privileges. You do not need a super admin role for yourself if you have been granted access by a super admin role.
+ Configured Google Drive Service Account connection credentials containing your admin account email, client email (service account email), and private key. See [Google Cloud documentation on creating and deleting service account keys](https://cloud.google.com/iam/docs/keys-create-delete).
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ Created a Google Cloud Service Account (an account with delegated authority to assume a user identity) with **Enable G Suite Domain-wide Delegation** activated for server-to-server authentication, and then generated a JSON private key using the account.
**Note**  
The private key should be generated after the creation of the service account.
+ Added Admin SDK API and Google Drive API in your user account.
+ **Optional:** Configured Google Drive OAuth 2.0 connection credentials containing client ID, client secret, and refresh token as connection credentials for a specific user. You need this to crawl individual account data. See [Google documentation on using OAuth 2.0 to access APIs](https://developers.google.com/identity/protocols/oauth2).
+ Added (or asked a user with a super admin role to add) the following OAuth scopes to your service account. These API scopes are needed to crawl all documents and the access control list (ACL) information for all users in a Google Workspace domain:
  + https://www.googleapis.com/auth/drive.readonly—View and download all your Google Drive files
  + https://www.googleapis.com/auth/drive.metadata.readonly—View metadata for files in your Google Drive
  + https://www.googleapis.com/auth/admin.directory.group.readonly—Scope for only retrieving group, group alias, and member information. This is needed for the Amazon Kendra Identity Crawler.
  + https://www.googleapis.com/auth/admin.directory.user.readonly—Scope for only retrieving users or user aliases. This is needed for listing users in the Amazon Kendra Identity Crawler and for setting ACLs.
  + https://www.googleapis.com/auth/cloud-platform—Scope for generating access token for fetching content of large Google Drive files.
  + https://www.googleapis.com/auth/forms.body.readonly—Scope for fetching data from Google Forms.

  **To support the Forms API, add the following additional scope:**
  + https://www.googleapis.com/auth/forms.body.readonly
+ Checked each document is unique in Google Drive and across other data sources you plan to use for the same index. Each data source that you want to use for an index must not contain the same document across the data sources. Document IDs are global to an index and must be unique per index.
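
When you authorize domain-wide delegation for the service account, the Google Admin console typically expects the OAuth scopes as a single comma-delimited value. The following is a minimal sketch that joins the scopes listed above into that form. The names `KENDRA_GOOGLE_DRIVE_SCOPES` and `scopes_for_delegation` are illustrative and are not part of any Amazon Kendra or Google API.

```python
# OAuth scopes copied from the prerequisites list above.
KENDRA_GOOGLE_DRIVE_SCOPES = [
    "https://www.googleapis.com/auth/drive.readonly",
    "https://www.googleapis.com/auth/drive.metadata.readonly",
    "https://www.googleapis.com/auth/admin.directory.group.readonly",
    "https://www.googleapis.com/auth/admin.directory.user.readonly",
    "https://www.googleapis.com/auth/cloud-platform",
    "https://www.googleapis.com/auth/forms.body.readonly",
]


def scopes_for_delegation(scopes):
    """Return the scopes as one comma-delimited string, dropping duplicates
    while keeping the original order."""
    unique = list(dict.fromkeys(scopes))
    return ",".join(unique)


if __name__ == "__main__":
    # Paste the printed value into the OAuth scopes field for the service
    # account's client ID.
    print(scopes_for_delegation(KENDRA_GOOGLE_DRIVE_SCOPES))
```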

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Google Drive authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Google Drive data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-v2-google-drive"></a>

To connect Amazon Kendra to your Google Drive data source, you must provide the necessary details of your Google Drive data source so that Amazon Kendra can access your data. If you have not yet configured Google Drive for Amazon Kendra, see [Prerequisites](#prerequisites-v2-google-drive).

------
#### [ Console ]

**To connect Amazon Kendra to Google Drive** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Google Drive connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **Google Drive connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. **Authorization**—Turn on or off access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).

   1. For **Authentication**—Choose between **Google service account** and **OAuth 2.0 authentication** based on your use case.

   1. **AWS Secrets Manager secret**—Choose an existing secret, or create a new Secrets Manager secret to store your Google Drive authentication credentials. If you choose to create a new secret an AWS Secrets Manager secret window opens.

      1. If you chose **Google service account**, enter a name for your secret, the email ID of the admin user or "Service Account User" in your service account configuration (admin email), the email ID of the service account (client email), and the private key that you created in your service account.

         Save and add your secret.

      1. If you chose **OAuth 2.0 authentication**, enter a name for your secret, client ID, client secret, and refresh token that you created in your OAuth account. The email ID of the user whose connection details are configured is set as the ACL. The connector doesn't set other user or group principal information as the ACL because of API limitations.

         Save and add your secret.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. (For Google service account authentication users only)

      **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. **Sync contents**—Select the content that you want to crawl. You can choose to crawl My Drive (personal folders), Shared Drive (folders shared with you), or both. You can also include file comments.

   1. In **Additional configuration - optional**, you can also enter the following optional information:

      1. **Maximum file size**—Set the maximum size limit in MBs of files to crawl.

      1. **User email**—Add user emails that you want to include or exclude.

      1. **Shared drives**—Add the shared drive names that you want to include or exclude.

      1. **Mime types**—Add MIME types that you want to include or exclude.

      1. **Entity regex patterns**—Add regular expression patterns to include or exclude certain attachments for all supported entities. You can add up to 100 patterns.

         You can configure include/exclude regex patterns for **File name**, **File type**, and **File path**.
         + **File name** – The name of the file to include or exclude. For example, to index a file with name `teamroster.txt`, provide `teamroster`.
         + **File type** – The type of the file to include or exclude. For example, .pdf .txt .docx.
         + **File path** – The path of the file to include or exclude. For example, to index files only inside the folder `Products list` of a drive, provide `/Products list`.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified sync: Index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
**Important**  
Google Drive API does not support retrieving comments from a permanently deleted file. Comments from trashed files are retrievable. When a file is trashed, the connector will delete comments from the Amazon Kendra index.

   1. In **Sync run schedule**, for **Frequency**—choose how often to sync your data source content and update your index.

   1. In **Sync run history**, choose to store auto-generated reports in an Amazon S3 bucket when syncing your data source. This is useful for tracking issues when syncing your data source.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. For **Files**—Select from the Amazon Kendra generated default data source fields that you want to map to your index.
**Note**  
Google Drive API does not support creating custom fields. Custom field mapping is not available for the Google Drive connector.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to Google Drive**

You must specify a JSON of the [data source schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html) using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API. You must provide the following information:
+ **Data source**—Specify the data source type as `GOOGLEDRIVEV2` when you use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **Authentication type**—Specify whether to use service account authentication or OAuth 2.0 authentication.
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
  + `CHANGE_LOG` to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
**Important**  
Google Drive API does not support retrieving comments from a permanently deleted file. Comments from trashed files are retrievable. When a file is trashed, the connector will delete comments from the Amazon Kendra index.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials you created in your Google Drive account. If you use Google service account authentication, the secret is stored in a JSON structure with the following keys: 

  ```
  {
      "clientEmail": "service account email",
      "adminAccountEmail": "admin account email",
      "privateKey": "private key"
  }
  ```

  If you use OAuth 2.0 authentication, the secret is stored in a JSON structure with the following keys:

  ```
  {
      "clientID": "OAuth client ID",
      "clientSecret": "client secret",
      "refreshToken": "refresh token"
  }
  ```
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Google Drive connector and Amazon Kendra. For more information, see [IAM roles for Google Drive data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
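
As an illustration of the two secret structures above, the following sketch serializes each one to the JSON string that you would store in the Secrets Manager secret. The key names come from the structures shown above. The function names and the placeholder values are hypothetical, and `clientEmail` is assumed to hold the service account email, as described in the prerequisites.

```python
import json


def service_account_secret(admin_email, client_email, private_key):
    """Build the secret string for Google service account authentication.

    client_email is the service account email; admin_email is the admin
    (or "Service Account User") email, per the prerequisites.
    """
    return json.dumps({
        "clientEmail": client_email,
        "adminAccountEmail": admin_email,
        "privateKey": private_key,
    })


def oauth_secret(client_id, client_secret, refresh_token):
    """Build the secret string for OAuth 2.0 authentication."""
    return json.dumps({
        "clientID": client_id,
        "clientSecret": client_secret,
        "refreshToken": refresh_token,
    })


if __name__ == "__main__":
    # Placeholder values only. Never hard-code real credentials.
    secret_string = service_account_secret(
        admin_email="admin@example.com",
        client_email="kendra-sa@example-project.iam.gserviceaccount.com",
        private_key="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    )
    # The string could then be stored as the SecretString of a secret, for
    # example with the Secrets Manager CreateSecret API (not called here).
    print(sorted(json.loads(secret_string)))
```

Building the string with `json.dumps` rather than by hand keeps the newlines in the private key correctly escaped.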

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+ **My Drives, Shared Drives, Comments**—You can specify whether to crawl these types of content.
+  **Inclusion and exclusion filters**—You can specify whether to include or exclude certain user accounts, shared drives, and MIME types.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+ **Access control list (ACL)**—Specify whether to crawl ACL information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).
+ **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.
+  **Field mappings**—Choose to map your Google Drive data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.
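
The inclusion and exclusion filter rule described in the note above can be sketched as follows. This is only an illustration of the precedence rule, not Amazon Kendra's implementation: the function name `is_indexed` is hypothetical, and the use of `re.search` (partial matching) is an assumption, because the matching mode is not specified here.

```python
import re


def is_indexed(value, include_patterns=(), exclude_patterns=()):
    """Return True if a value (for example, a file path) would be indexed
    under the inclusion/exclusion rule described above."""
    # Exclusion wins, even when an inclusion pattern also matches.
    if any(re.search(p, value) for p in exclude_patterns):
        return False
    # With an inclusion filter, only matching content is indexed.
    if include_patterns:
        return any(re.search(p, value) for p in include_patterns)
    # No inclusion filter: everything that is not excluded is indexed.
    return True


if __name__ == "__main__":
    include = [r"^/Products list/"]
    exclude = [r"\.tmp$"]
    for path in ("/Products list/catalog.pdf",
                 "/Products list/draft.tmp",
                 "/Archive/catalog.pdf"):
        print(path, is_indexed(path, include, exclude))
```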

For a list of other important JSON keys to configure, see [Google Drive template schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-google-drive-schema).
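
To show how these pieces fit together, here is a minimal sketch that assembles the arguments for a `CreateDataSource` call with a Google Drive template. The top-level argument names follow the `CreateDataSource` request shape. The template key names `type`, `syncMode`, and `secretArn` are assumptions that you should check against the Google Drive template schema, and the remaining required template keys are deliberately left out. The function name and default data source name are hypothetical.

```python
ALLOWED_SYNC_MODES = {"FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG"}


def google_drive_data_source_request(index_id, role_arn, secret_arn,
                                     name="my-google-drive",
                                     sync_mode="FULL_CRAWL"):
    """Build keyword arguments for a CreateDataSource call (sketch only)."""
    if sync_mode not in ALLOWED_SYNC_MODES:
        raise ValueError(f"Unsupported sync mode: {sync_mode!r}")

    # Assumed key names; verify them against the Google Drive template
    # schema, which also lists further keys (authentication type, content
    # selection, field mappings) that this sketch omits.
    template = {
        "type": "GOOGLEDRIVEV2",
        "syncMode": sync_mode,
        "secretArn": secret_arn,
    }

    return {
        "IndexId": index_id,
        "Name": name,
        "Type": "TEMPLATE",
        "RoleArn": role_arn,
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }


if __name__ == "__main__":
    request = google_drive_data_source_request(
        index_id="example-index-id",
        role_arn="arn:aws:iam::111122223333:role/example-kendra-ds-role",
        secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    )
    # With the AWS SDK for Python, the request could be sent as:
    #   boto3.client("kendra").create_data_source(**request)
    print(request["Type"], request["Configuration"]["TemplateConfiguration"]["Template"]["type"])
```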

------

## Notes
<a name="google-drive-notes"></a>
+ Custom field mapping is not available for Google Drive connector as the Google Drive UI does not support creating custom fields.
+ Google Drive API does not support retrieving comments from a permanently deleted file. Comments are retrievable, however, for trashed files. When a file is trashed, the Amazon Kendra connector will delete comments from the Amazon Kendra index.
+ Google Drive API does not return comments present in a .docx file.
+ If permission for a particular Google document (document, spreadsheet, slide, and so on) is set to **General access: Anyone with the link** or **Shared to your specific company domain**, the document will not be visible to Amazon Kendra search users until the user making the query has accessed the document.

# IBM DB2
<a name="data-source-ibm-db2"></a>

**Note**  
IBM DB2 connector remains fully supported for existing customers through May 31, 2026. While this connector is no longer available for new users, current users can continue to use it without interruption. We are continuously evolving our connector portfolio to offer more scalable and customizable solutions. For future integrations, we recommend exploring the Amazon Kendra Custom Connector Framework, designed to support a broader range of enterprise use cases with enhanced flexibility.

IBM DB2 is a relational database management system developed by IBM. If you are an IBM DB2 user, you can use Amazon Kendra to index your IBM DB2 data source. The Amazon Kendra IBM DB2 data source connector supports DB2 11.5.7.

You can connect Amazon Kendra to your IBM DB2 data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) and the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API.

For troubleshooting your Amazon Kendra IBM DB2 data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-ibm-db2)
+ [Prerequisites](#prerequisites-ibm-db2)
+ [Connection instructions](#data-source-procedure-ibm-db2)
+ [Notes](#ibm-db2-notes)

## Supported features
<a name="supported-features-ibm-db2"></a>
+ Field mappings
+ User context filtering
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-ibm-db2"></a>

Before you can use Amazon Kendra to index your IBM DB2 data source, make these changes in your IBM DB2 and AWS accounts.

**In IBM DB2, make sure you have:**
+ Noted your database user name and password.
**Important**  
As a best practice, provide Amazon Kendra with read-only database credentials.
+ Copied your database host URL, port, and instance.
+ Checked each document is unique in IBM DB2 and across other data sources you plan to use for the same index. Each data source that you want to use for an index must not contain the same document across the data sources. Document IDs are global to an index and must be unique per index.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your IBM DB2 authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your IBM DB2 data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-ibm-db2"></a>

To connect Amazon Kendra to your IBM DB2 data source, you must provide details of your IBM DB2 credentials so that Amazon Kendra can access your data. If you have not yet configured IBM DB2 for Amazon Kendra, see [Prerequisites](#prerequisites-ibm-db2).

------
#### [ Console ]

**To connect Amazon Kendra to IBM DB2** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **IBM DB2 connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **IBM DB2 connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. In **Source**, enter the following information:

   1.  **Host**— Enter the database host name.

   1.  **Port**— Enter the database port.

   1.  **Instance**— Enter the database instance.

   1. **Enable SSL certificate location**—Choose to enter the Amazon S3 path to your SSL certificate file.

   1. In **Authentication**—enter the following information:

      1. **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your IBM DB2 authentication credentials. If you choose to create a new secret an AWS Secrets Manager secret window opens.

        1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

           1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-IBM DB2-’ is automatically added to your secret name.

           1. For **Database user name**, and **Password**—Enter the authentication credential values you copied from your database. 

        1. Choose **Save**.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. In **Sync scope**, choose from the following options:
      + **SQL query**—Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
      + **Primary key column**—Provide the primary key for the database table. This identifies a table within your database.
      + **Title column**—Provide the name of the document title column within your database table.
      + **Body column**—Provide the name of the document body column within your database table.

   1. In **Additional configuration – *optional***, choose from the following options to sync specific content instead of syncing all files:
      + **Change-detecting columns**—Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns.
      + **User IDs column**—Enter the name of the column which contains User IDs to be allowed access to content.
      + **Groups column**—Enter the name of the column that contains groups to be allowed access to content.
      + **Source URLs column**—Enter the name of the column which contains Source URLs to be indexed.
      + **Time stamps column**—Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content. 
      + **Time zones column**—Enter the name of the column which contains time zones for the content to be crawled.
      + **Time stamps format**—Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified sync: Index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. In **Sync run schedule**, for **Frequency**—How often Amazon Kendra will sync with your data source.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. Select from the generated default data source fields—**Document IDs**, **Document titles**, and **Source URLs**—that you want to map to your Amazon Kendra index.

   1.  **Add field**—Add custom data source fields, and for each one specify the index field name to map to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to IBM DB2**

You must specify the following using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API:
+ **Data source**—Specify the data source type as `JDBC` when you use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **Database type**—You must specify the database type as `db2`.
+ **SQL query**—Specify SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
  + `CHANGE_LOG` to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials you created in your IBM DB2 account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "user name": "database user name",
      "password": "password"
  }
  ```
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the IBM DB2 connector and Amazon Kendra. For more information, see [IAM roles for IBM DB2 data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
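
The sync mode values listed above appear to pair with the console's sync mode options when you match their descriptions. The following sketch records that pairing in a lookup table. The pairing is inferred from the wording of the descriptions, and the names `SYNC_MODE_BY_CONSOLE_LABEL` and `sync_mode_for` are hypothetical.

```python
# Console label -> API value, paired by matching the descriptions:
# "freshly index all content" / "new and modified" / "new, modified, and deleted".
SYNC_MODE_BY_CONSOLE_LABEL = {
    "Full sync": "FORCED_FULL_CRAWL",
    "New, modified sync": "CHANGE_LOG",
    "New, modified, deleted sync": "FULL_CRAWL",
}


def sync_mode_for(label):
    """Return the API sync mode value for a console sync mode label."""
    try:
        return SYNC_MODE_BY_CONSOLE_LABEL[label]
    except KeyError:
        raise ValueError(f"Unknown sync mode label: {label!r}") from None


if __name__ == "__main__":
    for label in SYNC_MODE_BY_CONSOLE_LABEL:
        print(f"{label} -> {sync_mode_for(label)}")
```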

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+  **Inclusion and exclusion filters**—You can specify whether to include specific content using user IDs, groups, source URLs, time stamps, and time zones. 
+  **User context filtering and access control**—Amazon Kendra crawls the access control list (ACL) for your documents, if you have an ACL for your documents. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#datasource-context-filter).
+  **Field mappings**—Choose to map your IBM DB2 data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.

For a list of other important JSON keys to configure, see [IBM DB2 template schema](ds-schemas.md#ds-ibm-db2-schema).

------

## Notes
<a name="ibm-db2-notes"></a>
+ Deleted database rows are not tracked when Amazon Kendra checks for updated content.
+ The size of field names and values in a row of your database can't exceed 400KB.
+ If you have a large amount of data in your database data source, and do not want Amazon Kendra to index all your database content after the first sync, you can choose to sync only new, modified, or deleted documents.
+ As a best practice, provide Amazon Kendra with read-only database credentials.
+ As a best practice, avoid adding tables with sensitive data or personally identifiable information (PII).

# Jira
<a name="data-source-jira"></a>

Jira is a project management tool for software development, product management, and bug tracking. You can use Amazon Kendra to index your Jira projects, issues, comments, attachments, worklogs, and statuses.

Amazon Kendra currently only supports Jira Cloud.

You can connect Amazon Kendra to your Jira data source using either the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) or the [JiraConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_JiraConfiguration.html) API. For a list of features supported by each, see [Supported features](#supported-features-jira).

For troubleshooting your Amazon Kendra Jira data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-jira)
+ [Prerequisites](#prerequisites-jira)
+ [Connection instructions](#data-source-procedure-jira)
+ [Learn more](#jira-learn-more)

## Supported features
<a name="supported-features-jira"></a>

The Amazon Kendra Jira data source connector supports the following features:
+ Field mappings
+ User access control
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-jira"></a>

Before you can use Amazon Kendra to index your Jira data source, make these changes in your Jira and AWS accounts.

**In Jira, make sure you have:**
+ Configured API token authentication credentials, which include a Jira ID (user name or email) and a Jira credential (Jira API token). See [Atlassian documentation on managing API tokens](https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/).
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ Noted the Jira account URL from your Jira account settings. For example, *https://company.atlassian.net/*.
+ Checked that each document is unique in Jira and across the other data sources you plan to use for the same index. The data sources that you use for an index must not contain the same document. Document IDs are global to an index and must be unique per index.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Jira authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Jira data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-jira"></a>

To connect Amazon Kendra to your Jira data source, you must provide the necessary details of your Jira data source so that Amazon Kendra can access your data. If you have not yet configured Jira for Amazon Kendra, see [Prerequisites](#prerequisites-jira).

------
#### [ Console ]

**To connect Amazon Kendra to Jira** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Jira connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **Jira connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. **Jira account URL**—Enter your Jira Account URL. For example: *https://company.atlassian.net/*.

   1. **Authorization**—Turn on or off access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents that users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).

   1. **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your Jira authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

      1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

         1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-Jira-’ is automatically added to your secret name.

         1. For **Jira ID**—Enter the Jira user name or email.

         1. For **Password/Token**—Enter the Jira API token configured in Jira.

      1. Save and add your secret.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. **Select which Jira projects to index**—Choose to crawl all projects or specific projects.

   1. **Additional configuration**—Specify certain statuses and issue types. Choose to crawl comments, attachments, and worklogs. Use regular expression patterns to include or exclude certain content.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified sync: Index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. In **Sync run schedule**, for **Frequency**—Choose how often to sync your data source content and update your index.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. **Default data source fields**—Select from the Amazon Kendra generated default data source fields you want to map to your index. 

   1.  **Add field**—Add custom data source fields, then specify the index field name to map each one to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to Jira**

You must specify the following using the [JiraConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_JiraConfiguration.html) API:
+ **Data source URL**—Specify your Jira account URL. For example, *company.atlassian.net*.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials for your Jira account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "jiraId": "Jira user name or email",
      "jiraCredential": "Jira API token"
  }
  ```
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Jira connector and Amazon Kendra. For more information, see [IAM roles for Jira data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
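
The secret structure above can be produced programmatically. The following is a minimal sketch, not a definitive implementation: the helper names `build_jira_secret_string` and `create_jira_secret` are hypothetical, and `secrets_client` is assumed to be a Secrets Manager client such as the one returned by `boto3.client("secretsmanager")`.

```python
import json


def build_jira_secret_string(jira_id: str, jira_api_token: str) -> str:
    """Serialize Jira credentials using the key names shown above."""
    if not jira_id or not jira_api_token:
        raise ValueError("Both the Jira ID and the Jira API token are required.")
    return json.dumps({"jiraId": jira_id, "jiraCredential": jira_api_token})


def create_jira_secret(secrets_client, name: str, jira_id: str, jira_api_token: str) -> str:
    """Create the secret and return its ARN.

    Assumption: secrets_client is a boto3 Secrets Manager client.
    """
    response = secrets_client.create_secret(
        Name=name,
        SecretString=build_jira_secret_string(jira_id, jira_api_token),
    )
    return response["ARN"]
```

The returned ARN is the value to supply as the secret ARN for the Jira data source. Passing the client in as a parameter keeps the helper easy to exercise without AWS credentials.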

You can also add the following optional features:
+ **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` as part of the data source configuration. See [Configuring Amazon Kendra to use a VPC](https://docs.aws.amazon.com/kendra/latest/dg/vpc-configuration.html).
+  **Change log**—Whether Amazon Kendra should use the Jira data source change log mechanism to determine if a document must be updated in the index.
**Note**  
Use the change log if you don’t want Amazon Kendra to scan all of the documents. If your change log is large, it might take Amazon Kendra less time to scan the documents in the Jira data source than to process the change log. If you are syncing your Jira data source with your index for the first time, all documents are scanned. 
+  **Inclusion and exclusion filters**—You can specify whether to include or exclude certain files.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+ **Comments, attachments, and work logs**—You can specify whether to crawl certain comments, attachments, and work logs of issues.
+ **Projects, Issues, Statuses**—You can specify whether to crawl certain project IDs, issue types, and statuses.
+  **User context filtering and access control**—Amazon Kendra crawls the access control list (ACL) for your documents, if you have an ACL for your documents. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#datasource-context-filter).
+  **Field mappings**—Choose to map your Jira data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.
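
As an illustration of how the required and optional settings above fit together, the following sketch assembles the arguments for a `CreateDataSource` call with a `JiraConfiguration`. The function name `build_jira_data_source_request` and its parameters are hypothetical. The sketch only builds the request and does not call AWS, so treat it as a starting point and check the field names against the JiraConfiguration API reference.

```python
# Sub-entity values accepted by the IssueSubEntityFilter field.
VALID_SUB_ENTITIES = {"COMMENTS", "ATTACHMENTS", "WORKLOGS"}


def build_jira_data_source_request(
    index_id, name, account_url, secret_arn, role_arn,
    projects=None, issue_types=None, statuses=None, sub_entities=None,
    inclusion_patterns=None, exclusion_patterns=None, use_change_log=False,
):
    """Return keyword arguments for CreateDataSource (hypothetical helper)."""
    unknown = set(sub_entities or []) - VALID_SUB_ENTITIES
    if unknown:
        raise ValueError(f"Unsupported sub-entities: {sorted(unknown)}")

    jira_config = {
        "AccountUrl": account_url,
        "SecretArn": secret_arn,
        "UseChangeLog": use_change_log,
    }
    # Optional filters are added only when a value is supplied.
    optional = {
        "Project": projects,
        "IssueType": issue_types,
        "Status": statuses,
        "IssueSubEntityFilter": sub_entities,
        "InclusionPatterns": inclusion_patterns,
        "ExclusionPatterns": exclusion_patterns,
    }
    for key, value in optional.items():
        if value:
            jira_config[key] = list(value)

    return {
        "IndexId": index_id,
        "Name": name,
        "Type": "JIRA",
        "RoleArn": role_arn,
        "Configuration": {"JiraConfiguration": jira_config},
    }
```

Assuming `kendra` is a client created with `boto3.client("kendra")`, the result can be passed as `kendra.create_data_source(**request)`.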

------

## Learn more
<a name="jira-learn-more"></a>

To learn more about integrating Amazon Kendra with your Jira data source, see:
+ [Intelligently search your Jira projects with Amazon Kendra Jira Cloud connector](https://aws.amazon.com/blogs/machine-learning/intelligently-search-your-jira-projects-with-amazon-kendra-jira-cloud-connector/)

# Microsoft Exchange
<a name="data-source-exchange"></a>

Microsoft Exchange is an enterprise collaboration tool for messaging, meetings and file sharing. If you are a Microsoft Exchange user, you can use Amazon Kendra to index your Microsoft Exchange data source.

You can connect Amazon Kendra to your Microsoft Exchange data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) and the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API.

For troubleshooting your Amazon Kendra Microsoft Exchange data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

## Supported features
<a name="supported-features-exchange"></a>
+ Field mappings
+ User access control
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-exchange"></a>

Before you can use Amazon Kendra to index your Microsoft Exchange data source, make these changes in your Microsoft Exchange and AWS accounts.

**In Microsoft Exchange, make sure you have:**
+ Created a Microsoft Exchange account in Office 365.
+ Noted your Microsoft 365 tenant ID. You can find your tenant ID in the Properties of your Azure Active Directory Portal or in your OAuth application.
+ Configured an OAuth application in the Azure portal and noted the client ID and client secret or client credentials. See [Microsoft tutorial](https://learn.microsoft.com/en-us/power-apps/developer/data-platform/walkthrough-register-app-azure-active-directory) and [Registered app example](https://learn.microsoft.com/en-us/azure/healthcare-apis/register-application) for more information.
**Note**  
When you create or register an app in the Azure portal, the secret ID represents the actual secret value. You must take note or save the actual secret value immediately when creating the secret and app. You can access your secret by selecting the name of your application in the Azure portal and then navigating to the menu option on certificates and secrets.  
You can access your client ID by selecting the name of your application in the Azure portal and then navigating to the overview page. The Application (client) ID is the client ID.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ Added the following permissions for the connector application:    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/data-source-exchange.html)
+ Checked that each document is unique in Microsoft Exchange and across the other data sources you plan to use for the same index. The data sources that you use for an index must not contain the same document. Document IDs are global to an index and must be unique per index.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Microsoft Exchange authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Microsoft Exchange data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-exchange"></a>

To connect Amazon Kendra to your Microsoft Exchange data source, you must provide the necessary details of your Microsoft Exchange data source so that Amazon Kendra can access your data. If you have not yet configured Microsoft Exchange for Amazon Kendra, see [Prerequisites](#prerequisites-exchange).

------
#### [ Console ]

**To connect Amazon Kendra to Microsoft Exchange** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Microsoft Exchange connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **Microsoft Exchange connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. **Tenant ID**—Enter your Microsoft 365 tenant ID. You can find your tenant ID in the Properties of your Azure Active Directory Portal or in your OAuth application.

   1. **Authorization**—Turn on or off access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents that users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).

   1. **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your Microsoft Exchange authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

      1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

         1. **Secret name**—A name for your secret. The prefix 'AmazonKendra-Microsoft Exchange

         1. For **Client ID**, **Client secret**—Enter the authentication credentials configured in Microsoft Exchange in the Azure portal.

      1. Save and add your secret.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. **User IDs**—Provide the user emails if you want to filter content by certain emails.

   1. **Additional configuration**—Specify the types of content you want to crawl.
      + **Entity types**—You can choose to crawl calendar, OneNotes, or contacts content.
      + **Calendar crawling**—Enter the start and end date to crawl content between certain dates.
      + **Include email**—Enter "to", "from", and email subject lines to filter certain emails you want to crawl.
      + **Shared folders access**—Choose to enable crawling of access control list for access control of your Microsoft Exchange data source.
      + **Regex for domains**—Add regular expression patterns to include or exclude certain email domains.
      + **Regex patterns**—Add regular expression patterns to include or exclude certain files.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified sync: Index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. In **Sync run schedule**, for **Frequency**—Choose how often to sync your data source content and update your index.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. **Default data source fields**—Select from the Amazon Kendra generated default data source fields you want to map to your index.
**Note**  
The Amazon Kendra Microsoft Exchange data source connector doesn't support custom field mappings.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to Microsoft Exchange**

You must specify a JSON of the [data source schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-msexchange-schema) using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API. You must provide the following information:
+ **Data source**—Specify the data source type as `MSEXCHANGE` when you use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html) API.
+ **Tenant ID**—You can find your tenant ID in the Properties of your Azure Active Directory Portal or in your OAuth application.
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
  + `CHANGE_LOG` to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials for your Microsoft Exchange account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "clientId": "client ID",
      "clientSecret": "client secret"
  }
  ```
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Microsoft Exchange connector and Amazon Kendra. For more information, see [IAM roles for Microsoft Exchange data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
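
The required settings above might be combined as in the following sketch, which builds the arguments for a `CreateDataSource` call of type `TEMPLATE`. The helper name is hypothetical. The template key names used here (`type`, `syncMode`, `secretArn`, and `connectionConfiguration.repositoryEndpointMetadata.tenantId`) are assumptions that should be verified against the Microsoft Exchange template schema before use. Only the `MSEXCHANGE` and `TEMPLATE` type values and the three sync mode values come from the text above.

```python
VALID_SYNC_MODES = {"FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG"}


def build_exchange_data_source_request(
    index_id, name, role_arn, secret_arn, tenant_id,
    sync_mode="FORCED_FULL_CRAWL", extra_template_keys=None,
):
    """Return keyword arguments for CreateDataSource (hypothetical helper)."""
    if sync_mode not in VALID_SYNC_MODES:
        raise ValueError(f"sync_mode must be one of {sorted(VALID_SYNC_MODES)}")

    # Assumption: these key names match the Exchange template schema.
    template = {
        "type": "MSEXCHANGE",
        "syncMode": sync_mode,
        "secretArn": secret_arn,
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {"tenantId": tenant_id}
        },
    }
    # Remaining schema keys (filters, field mappings, and so on) are
    # supplied by the caller rather than guessed here.
    template.update(extra_template_keys or {})

    return {
        "IndexId": index_id,
        "Name": name,
        "Type": "TEMPLATE",
        "RoleArn": role_arn,
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }
```

Validating the sync mode before the call surfaces a typo locally instead of as a service error. Assuming `kendra` is a client created with `boto3.client("kendra")`, the result can be passed as `kendra.create_data_source(**request)`.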

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+ **Inclusion and exclusion filters**—Specify whether to include or exclude certain content.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+ **Access control list (ACL)**—Specify whether to crawl ACL information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents that users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).
+  **Field mappings**—Choose to map your Microsoft Exchange data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.

For a list of other important JSON keys to configure, see [Microsoft Exchange template schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-msexchange-schema).
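
The precedence rule in the inclusion and exclusion filter note above can be sketched as a small predicate. This only illustrates the documented behavior and is not the connector's implementation. The function name `is_indexed` is hypothetical, and the use of `re.search` (a match anywhere in the name) is an assumption about how patterns are applied.

```python
import re


def is_indexed(name, inclusion_patterns=(), exclusion_patterns=()):
    """Return True if content with this name would be indexed."""
    # An exclusion match always wins, even if an inclusion filter also matches.
    if any(re.search(pattern, name) for pattern in exclusion_patterns):
        return False
    # With inclusion filters present, only matching content is indexed.
    if inclusion_patterns:
        return any(re.search(pattern, name) for pattern in inclusion_patterns)
    # With no filters at all, everything is indexed.
    return True
```

For example, with the inclusion pattern `\.pdf$` and the exclusion pattern `^draft-`, `report.pdf` is indexed, while `notes.txt` and `draft-report.pdf` are not.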

------

## Learn more
<a name="exchange-learn-more"></a>

To learn more about integrating Amazon Kendra with your Microsoft Exchange data source, see:
+ [Index your Microsoft Exchange content using the Exchange connector for Amazon Kendra](https://aws.amazon.com/blogs/machine-learning/index-your-microsoft-exchange-content-using-the-exchange-connector-for-amazon-kendra/)

## Notes
<a name="exchange-notes"></a>
+ When Access Control Lists (ACLs) are enabled, the "Sync only new or modified content" option is not available due to Microsoft Exchange API limitations. We recommend using the "Full sync" or "New, modified, or deleted content sync" mode instead, or disabling ACLs if you need to use this sync mode.

# Microsoft OneDrive
<a name="data-source-onedrive"></a>

Microsoft OneDrive is a cloud-based storage service that you can use to store, share, and host your content. You can use Amazon Kendra to index your OneDrive data source.

You can connect Amazon Kendra to your OneDrive data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) and the [OneDriveConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_OneDriveConfiguration.html) API.

Amazon Kendra has two versions of the OneDrive connector. Supported features of each version include:

**Microsoft OneDrive connector V1.0 / [OneDriveConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_OneDriveConfiguration.html) API**
+ Field mappings
+ Inclusion/exclusion filters

**Microsoft OneDrive connector V2.0 / [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API**
+ User context filtering
+ User identity crawler
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

**Note**  
Support for OneDrive connector V1.0 / OneDriveConfiguration API is scheduled to end by June 2023. We recommend using OneDrive connector V2.0 / TemplateConfiguration API.

For troubleshooting your Amazon Kendra OneDrive data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Microsoft OneDrive connector V1.0](data-source-v1-onedrive.md)
+ [Microsoft OneDrive connector V2.0](data-source-v2-onedrive.md)
+ [Learn more](#onedrive-learn-more)
+ [Notes](#onedrive-notes)

# Microsoft OneDrive connector V1.0
<a name="data-source-v1-onedrive"></a>

Microsoft OneDrive is a cloud-based storage service that you can use to store, share, and host your content. You can use Amazon Kendra to index your Microsoft OneDrive data source. 

**Note**  
Support for OneDrive connector V1.0 / OneDriveConfiguration API is scheduled to end by June 2023. We recommend using OneDrive connector V2.0 / TemplateConfiguration API.

For troubleshooting your Amazon Kendra OneDrive data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-v1-onedrive)
+ [Prerequisites](#prerequisites-v1-onedrive)
+ [Connection instructions](#data-source-v1-procedure-onedrive)

## Supported features
<a name="supported-features-v1-onedrive"></a>
+ Field mappings
+ Inclusion/exclusion filters

## Prerequisites
<a name="prerequisites-v1-onedrive"></a>

Before you can use Amazon Kendra to index your OneDrive data source, make these changes in your OneDrive and AWS accounts.

**In your Azure Active Directory (AD), make sure you have:**
+ Created an Azure Active Directory (AD) application.
+ Used the AD application ID to register a secret key for the application on the AD site. The secret key must contain the application ID and a secret key.
+ Copied the AD domain of the organization.
+ Added the following application permissions to your AD application on the Microsoft Graph option:
  + Read files in all site collections (File.Read.All)
  + Read all users' full profile (User.Read.All)
  + Read directory data (Directory.Read.All)
  + Read all groups (Group.Read.All)
  + Read items in all site collections (Site.Read.All)
+ Copied the list of users whose documents must be indexed. You can choose to provide a list of user names, or you can provide the user names in a file stored in an Amazon S3 bucket. After you create the data source, you can:
  + Modify the list of users.
  + Change from a list of users to a list stored in an Amazon S3 bucket.
  + Change the Amazon S3 bucket location of a list of users. If you change the bucket location, you must also update the IAM role for the data source so that it has access to the bucket.
**Note**  
If you store the list of user names in an Amazon S3 bucket, the IAM policy for the data source must provide access to the bucket and access to the key that the bucket was encrypted with, if any.
+ Checked that each document is unique in OneDrive and across other data sources you plan to use for the same index. The data sources that you use for an index must not contain the same document. Document IDs are global to an index and must be unique per index.
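
As a minimal sketch of the note above about storing the user list in Amazon S3, the following Python builds the two IAM policy statements that the note describes: read access to the list file and, if the bucket is encrypted with a key, decrypt access to that key. The function name `user_list_policy`, the bucket name, the object key, and the key ARN are hypothetical placeholders. A complete data source role needs further statements that are not shown here.

```python
import json


def user_list_policy(bucket, key, kms_key_arn=None):
    """Build IAM policy statements for reading a user list file from S3.

    Only covers the S3 object and the optional KMS key. This is an
    illustrative fragment, not a complete data source role policy.
    """
    statements = [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{bucket}/{key}"],
        }
    ]
    if kms_key_arn:
        # Needed only when the bucket is encrypted with a KMS key.
        statements.append(
            {
                "Effect": "Allow",
                "Action": ["kms:Decrypt"],
                "Resource": [kms_key_arn],
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}


# Placeholder bucket and object key, for illustration only.
policy = user_list_policy("amzn-s3-demo-bucket", "onedrive/users.txt")
print(json.dumps(policy, indent=2))
```

If you later change the bucket location of the user list, regenerate the statements with the new bucket and key and update the role accordingly.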

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your OneDrive authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your OneDrive data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-v1-procedure-onedrive"></a>

To connect Amazon Kendra to your OneDrive data source, you must provide details of your OneDrive credentials so that Amazon Kendra can access your data. If you have not yet configured OneDrive for Amazon Kendra, see [Prerequisites](#prerequisites-v1-onedrive).

------
#### [ Console ]

**To connect Amazon Kendra to OneDrive** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **OneDrive connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **OneDrive connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. **OneDrive tenant ID**—Enter the OneDrive tenant ID without the protocol.

   1. **Type of authentication**—Choose between **New** and **Existing**.

      1. If you choose **Existing**, select an existing secret for **Select secret**.

      1. If you choose **New**, enter the following information in the **New AWS Secrets Manager secret** section:

         1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-OneDrive-’ is automatically added to your secret name.

         1. For **Application ID** and **Application password**—Enter the authentication credential values from your OneDrive account and then choose **Save authentication**. 

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. Choose between **List file** and **Names list** based on your use case.

      1. If you choose **List file**, enter the following information:

         1.  **Select location**—Enter the path to your Amazon S3 bucket. 

            **Add user list file to Amazon S3**—Select to add your user list files to your Amazon S3 bucket. 

            **User local group mappings**—Select to use local group mapping to filter your content.

      1. If you choose **Names list**, enter the following information:

         1.  **User name**—Enter up to 10 user drives to index. To add more than 10 users, create a file that contains the names.

            **Add another**—Choose to add more users.

            **User local group mappings**—Select to use local group mapping to filter your content.

   1. For **Additional configurations**—Add regular expression patterns to include or exclude certain files. You can add up to 100 patterns.

   1. In **Sync run schedule**, for **Frequency**—Choose how often Amazon Kendra will sync with your data source.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. For **Default data source fields** and **Additional suggested field mappings**—Select from the Amazon Kendra generated default data source fields you want to map to your index. 

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to OneDrive**

You must specify the following using the [OneDriveConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_OneDriveConfiguration.html) API:
+ **Tenant ID**—Specify the Azure Active Directory domain of the organization.
+ **OneDrive Users**—Specify the list of user accounts whose documents should be indexed.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials for your OneDrive account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "username": "OAuth client ID",
      "password": "client secret"
  }
  ```
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the OneDrive connector and Amazon Kendra. For more information, see [IAM roles for OneDrive data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
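
The required settings above can be assembled into a request as in the following minimal sketch. It only builds the secret string and the request dictionary and does not call AWS. The helper names `onedrive_v1_secret` and `onedrive_v1_request`, and all IDs, ARNs, and domain values, are hypothetical placeholders. The resulting dictionary is shaped so that it could be passed as keyword arguments to the Boto3 Kendra client's `create_data_source` method.

```python
import json


def onedrive_v1_secret(client_id, client_secret):
    """Return the secret string using the keys shown in the JSON structure above."""
    return json.dumps({"username": client_id, "password": client_secret})


def onedrive_v1_request(index_id, name, role_arn, tenant_domain, secret_arn,
                        users=None, s3_bucket=None, s3_key=None):
    """Build CreateDataSource arguments for the V1.0 OneDrive connector.

    Provide either an inline list of users or the S3 location of a user list file.
    """
    if users:
        one_drive_users = {"OneDriveUserList": list(users)}
    elif s3_bucket and s3_key:
        one_drive_users = {"OneDriveUserS3Path": {"Bucket": s3_bucket, "Key": s3_key}}
    else:
        raise ValueError("Provide a user list or an S3 bucket and key")
    return {
        "IndexId": index_id,
        "Name": name,
        "Type": "ONEDRIVE",
        "RoleArn": role_arn,
        "Configuration": {
            "OneDriveConfiguration": {
                "TenantDomain": tenant_domain,
                "SecretArn": secret_arn,
                "OneDriveUsers": one_drive_users,
            }
        },
    }


# Placeholder values, for illustration only.
request = onedrive_v1_request(
    index_id="index-id-placeholder",
    name="my-onedrive-source",
    role_arn="arn:aws:iam::111122223333:role/example-kendra-ds-role",
    tenant_domain="example.onmicrosoft.com",
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    users=["user1@example.com", "user2@example.com"],
)
print(json.dumps(request, indent=2))
# With Boto3 installed and credentials configured, this could be sent with:
#   boto3.client("kendra").create_data_source(**request)
```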

You can also add the following optional features:
+  **Inclusion and exclusion filters**—Specify whether to include or exclude certain documents.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+  **Field mappings**—Choose to map your OneDrive data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.
+  **User context filtering and access control**—Amazon Kendra crawls the access control list (ACL) for your documents, if you have an ACL for your documents. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#datasource-context-filter).
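
The way inclusion and exclusion filters combine, as described in the note above, can be illustrated with a short sketch. This is not Amazon Kendra's implementation: the function name `is_indexed` is hypothetical, and the use of `re.search` (a match anywhere in the path) is an assumption made for illustration.

```python
import re


def is_indexed(path, inclusion_patterns=(), exclusion_patterns=()):
    """Approximate the documented filter behavior for a single document path."""
    # Exclusion wins, even if the document also matches an inclusion filter.
    if any(re.search(p, path) for p in exclusion_patterns):
        return False
    # If inclusion filters are given, only matching content is indexed.
    if inclusion_patterns:
        return any(re.search(p, path) for p in inclusion_patterns)
    # No filters: everything is indexed.
    return True


include = [r"\.pdf$"]
exclude = [r"^drafts/"]
for doc in ["reports/q1.pdf", "drafts/q1.pdf", "notes.txt"]:
    print(doc, is_indexed(doc, include, exclude))
```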

------

# Microsoft OneDrive connector V2.0
<a name="data-source-v2-onedrive"></a>

Microsoft OneDrive is a cloud-based storage service that you can use to store, share, and host your content. You can use Amazon Kendra to index your OneDrive data source.

You can connect Amazon Kendra to your OneDrive data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) and the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API.

**Note**  
Support for OneDrive Connector V1.0 / OneDriveConfiguration API is scheduled to end by June 2023. We recommend using OneDrive Connector V2.0 / TemplateConfiguration API. Version 2.0 provides additional ACLs and identity crawler functionality.

For troubleshooting your Amazon Kendra OneDrive data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-v2-onedrive)
+ [Prerequisites](#prerequisites-v2-onedrive)
+ [Connection instructions](#data-source-procedure-v2-onedrive)

## Supported features
<a name="supported-features-v2-onedrive"></a>

The Amazon Kendra OneDrive data source connector supports the following features:
+ Field mappings
+ User access control
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-v2-onedrive"></a>

Before you can use Amazon Kendra to index your OneDrive data source, make these changes in your OneDrive and AWS accounts.

**In OneDrive, make sure you have:**
+ Created a OneDrive account in Office 365.
+ Noted your Microsoft 365 tenant ID. You can find your tenant ID in the Properties of your Azure Active Directory Portal or in your OAuth application.
+ Created an OAuth application in the Azure portal and noted the client ID and client secret or client credentials used for authentication with an AWS Secrets Manager secret. See [Microsoft tutorial](https://learn.microsoft.com/en-us/power-apps/developer/data-platform/walkthrough-register-app-azure-active-directory) and [Registered app example](https://learn.microsoft.com/en-us/azure/healthcare-apis/register-application) for more information.
**Note**  
When you create or register an app in the Azure portal, the secret ID represents the actual secret value. You must take note or save the actual secret value immediately when creating the secret and app. You can access your secret by selecting the name of your application in the Azure portal and then navigating to the menu option on certificates and secrets.  
You can access your client ID by selecting the name of your application in the Azure portal and then navigating to the overview page. The Application (client) ID is the client ID.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ Used the AD application ID to register a secret key for the application on the AD site. The secret key must contain the application ID and a secret key.
+ Copied the AD domain of the organization.
+ Added the following permissions to your AD application on the Microsoft Graph option:
  + Read files in all site collections (File.Read.All)
  + Read all users' full profiles (User.Read.All)
  + Read all groups (Group.Read.All)
  + Read all notes (Notes.Read.All)
+ Copied the list of users whose documents must be indexed. You can choose to provide a list of user names, or you can provide the user names in a file stored in an Amazon S3 bucket. After you create the data source, you can:
  + Modify the list of users.
  + Change from a list of users to a list stored in an Amazon S3 bucket.
  + Change the Amazon S3 bucket location of a list of users. If you change the bucket location, you must also update the IAM role for the data source so that it has access to the bucket.
**Note**  
If you store the list of user names in an Amazon S3 bucket, the IAM policy for the data source must provide access to the bucket and access to the key that the bucket was encrypted with, if any.  
The OneDrive connector uses the **Email from Contact Information** present in the **OneDrive User Properties**. Make sure that each user whose data you want to crawl has the email field configured on the **Contact Information** page, because this field might be blank for new users.

**In your AWS account, make sure you have:**
+ Created an Amazon Kendra index and, if using the API, noted the index ID.
+ Created an IAM role for your data source and, if using the API, noted the ARN of the IAM role.
+ Stored your OneDrive authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your OneDrive data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-v2-onedrive"></a>

To connect Amazon Kendra to your OneDrive data source, you must provide details of your OneDrive credentials so that Amazon Kendra can access your data. If you have not yet configured OneDrive for Amazon Kendra, see [Prerequisites](#prerequisites-v2-onedrive).

------
#### [ Console ]

**To connect Amazon Kendra to OneDrive** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **OneDrive connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **OneDrive connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. **OneDrive tenant ID**—Enter the OneDrive tenant ID without the protocol.

   1. **Authorization**—Turn on or off access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).

   1. In **Authentication**—Choose between **New** and **Existing**.

      1. If you choose **Existing**, select an existing secret for **Select secret**.

      1. If you choose **New**, enter the following information in the **New AWS Secrets Manager secret** section:

         1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-OneDrive-’ is automatically added to your secret name.

         1. For **Client ID** and **Client Secret**—Enter the client ID and client secret.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. For **Sync scope**—Choose which users' OneDrive data to index. You can add a maximum of 10 users manually.

   1. For **Additional configurations**—Add regular expression patterns to include or exclude certain content. You can add up to 100 patterns.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified sync: Index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. In **Sync run schedule**, for **Frequency**—Choose how often to sync your data source content and update your index.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. **Default data source fields**—Select from the Amazon Kendra generated default data source fields that you want to map to your index.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to OneDrive**

You must specify a JSON of the [data source schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-onedrive-schema) using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API. You must provide the following information:
+ **Data source**—Specify the data source type as `ONEDRIVEV2` when you use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **Tenant ID**—Specify the Microsoft 365 tenant ID. You can find your tenant ID in the Properties of your Azure Active Directory Portal or in your OAuth application.
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
  + `CHANGE_LOG` to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials you created in your OneDrive account.

  If you use OAuth 2.0 authentication, the secret is stored in a JSON structure with the following keys:

  ```
  {
      "clientId": "client ID",
      "clientSecret": "client secret"
  }
  ```
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the OneDrive connector and Amazon Kendra. For more information, see [IAM roles for OneDrive data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
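
As a minimal sketch, the required settings above could be assembled as follows. The code only builds dictionaries and does not call AWS. The helper names, IDs, and ARNs are hypothetical placeholders. The template key names used here (`type`, `syncMode`, `secretArn`, and `connectionConfiguration.repositoryEndpointMetadata.tenantId`) are assumptions that you should check against the OneDrive template schema, which also lists further keys that a working template is likely to need.

```python
import json

# Sync mode values listed in the bullets above.
SYNC_MODES = {"FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG"}


def onedrive_v2_secret(client_id, client_secret):
    """Return the secret string using the OAuth 2.0 keys shown above."""
    return json.dumps({"clientId": client_id, "clientSecret": client_secret})


def onedrive_v2_request(index_id, name, role_arn, tenant_id, secret_arn,
                        sync_mode="FULL_CRAWL"):
    """Build CreateDataSource arguments for the V2.0 OneDrive connector."""
    if sync_mode not in SYNC_MODES:
        raise ValueError(f"Unknown sync mode: {sync_mode}")
    # Assumed key names; verify against the OneDrive template schema.
    template = {
        "type": "ONEDRIVEV2",
        "syncMode": sync_mode,
        "secretArn": secret_arn,
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {"tenantId": tenant_id}
        },
    }
    return {
        "IndexId": index_id,
        "Name": name,
        "Type": "TEMPLATE",
        "RoleArn": role_arn,
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }


# Placeholder values, for illustration only.
request = onedrive_v2_request(
    index_id="index-id-placeholder",
    name="my-onedrive-v2-source",
    role_arn="arn:aws:iam::111122223333:role/example-kendra-ds-role",
    tenant_id="00000000-0000-0000-0000-000000000000",
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
)
print(json.dumps(request, indent=2))
```

Keeping the template in a plain dictionary makes it straightforward to add the optional features described below before sending the request.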

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+  **Inclusion and exclusion filters**—You can specify whether to include or exclude certain files, OneNote sections, and OneNote pages.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+ **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.
+  **Field mappings**—You can only map built-in or common index fields for the Amazon Kendra OneDrive connector. Custom field mapping is not available for the OneDrive connector because of API limitations. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).

For a list of other important JSON keys to configure, see [OneDrive template schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-onedrive-schema).

------

## Learn more
<a name="onedrive-learn-more"></a>

To learn more about integrating Amazon Kendra with your OneDrive data source, see:
+ [Announcing the updated Microsoft OneDrive connector (V2) for Amazon Kendra](https://aws.amazon.com/blogs/machine-learning/announcing-the-updated-microsoft-onedrive-connector-v2-for-amazon-kendra/).

## Notes
<a name="onedrive-notes"></a>
+ When Access Control Lists (ACLs) are enabled, the "Sync only new or modified content" option is not available due to OneDrive API limitations. We recommend using the "Full sync" or "New, modified, or deleted content sync" mode instead, or disabling ACLs if you need to use this sync mode.
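
The restriction in the note above could be checked before a data source is created, as in this minimal sketch. The function name `resolve_sync_mode` is hypothetical, and the choice to fall back to `FULL_CRAWL` (new, modified, and deleted content) rather than raising an error is an illustrative design decision, not documented connector behavior.

```python
# API sync mode values from the V2.0 connection instructions.
VALID_MODES = ("FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG")


def resolve_sync_mode(requested, acl_enabled):
    """Return a sync mode that is compatible with the ACL setting."""
    if requested not in VALID_MODES:
        raise ValueError(f"Unknown sync mode: {requested}")
    # CHANGE_LOG (new and modified content only) is unavailable with ACLs on.
    if acl_enabled and requested == "CHANGE_LOG":
        return "FULL_CRAWL"
    return requested


print(resolve_sync_mode("CHANGE_LOG", acl_enabled=True))
print(resolve_sync_mode("CHANGE_LOG", acl_enabled=False))
```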

# Microsoft SharePoint
<a name="data-source-sharepoint"></a>

SharePoint is a collaborative website building service that you can use to customize web content and create pages, sites, document libraries, and lists. You can use Amazon Kendra to index your SharePoint data source.

Amazon Kendra currently supports SharePoint Online and SharePoint Server (versions 2013, 2016, 2019, and Subscription Edition).

You can connect Amazon Kendra to your SharePoint data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/), the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API, or the [SharePointConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_SharePointConfiguration.html) API.

Amazon Kendra has two versions of the SharePoint connector. Supported features of each version include:

**SharePoint Connector V1.0 / [SharePointConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_SharePointConfiguration.html) API**
+ Field mappings
+ User access control
+ Inclusion/exclusion filters
+ Change log
+ Virtual private cloud (VPC)

**SharePoint Connector V2.0 / [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API**
+ Field mappings
+ User access control
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

**Note**  
Support for SharePoint connector V1.0 / SharePointConfiguration API ended in 2023. We recommend migrating to or using SharePoint connector V2.0 / TemplateConfiguration API.

For troubleshooting your Amazon Kendra SharePoint data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [SharePoint connector V1.0](data-source-v1-sharepoint.md)
+ [SharePoint connector V2.0](data-source-v2-sharepoint.md)

# SharePoint connector V1.0
<a name="data-source-v1-sharepoint"></a>

SharePoint is a collaborative website building service that you can use to customize web content and create pages, sites, document libraries, and lists. If you are a SharePoint user, you can use Amazon Kendra to index your SharePoint data source.

**Note**  
Support for SharePoint connector V1.0 / SharePointConfiguration API ended in 2023. We recommend migrating to or using SharePoint connector V2.0 / TemplateConfiguration API.

For troubleshooting your Amazon Kendra SharePoint data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-v1-sharepoint)
+ [Prerequisites](#prerequisites-v1-sharepoint)
+ [Connection instructions](#data-source-procedure-v1-sharepoint)
+ [Learn more](#sharepoint-v1-learn-more)

## Supported features
<a name="supported-features-v1-sharepoint"></a>
+ Field mappings
+ User access control
+ Inclusion/exclusion filters
+ Change log
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-v1-sharepoint"></a>

Before you can use Amazon Kendra to index your SharePoint data source, make these changes in your SharePoint and AWS accounts.

You are required to provide authentication credentials, which you securely store in an AWS Secrets Manager secret.

**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

**In SharePoint, make sure you have:**
+ Noted the URL of the SharePoint sites you want to index.
+ **For SharePoint Online:**
  + Noted your basic authentication credentials containing a user name and password with site admin permissions.
  + **Optional:** Generated OAuth 2.0 credentials containing a user name, password, client ID, and client secret.
  + Deactivated **Security Defaults** in your Azure portal using an administrative user. For more information on managing security default settings in the Azure portal, see [Microsoft documentation on how to enable/disable security defaults](https://learn.microsoft.com/en-us/microsoft-365/business-premium/m365bp-conditional-access?view=o365-worldwide&tabs=secdefaults#security-defaults-1).
+ **For SharePoint Server:**
  + Noted your SharePoint Server domain name (the NetBIOS name in your Active Directory). You use this, along with your SharePoint basic authentication user name and password, to connect SharePoint Server to Amazon Kendra.
**Note**  
If you use SharePoint Server and need to convert your Access Control List (ACL) to email format for filtering on user context, provide the LDAP server URL and LDAP search base. Alternatively, you can use the directory domain override. The LDAP server URL is the full domain name and the port number (for example, ldap://example.com:389). The LDAP search base consists of the domain components 'example' and 'com' (for example, dc=example,dc=com). With the directory domain override, you can use the email domain instead of the LDAP server URL and LDAP search base. For example, the email domain for username@example.com is 'example.com'. You can use this override if you aren't concerned about validating your domain and simply want to use your email domain.
+ Added the following permissions to your SharePoint account:

  **For SharePoint lists**
  + Open Items—View the source of documents with server-side file handlers.
  + View Application Pages—View forms, views, and application pages. Enumerate lists.
  + View Items—View items in lists and documents in document libraries.
  + View Versions—View past versions of a list item or document.

  **For SharePoint websites**
  + Browse Directories—Enumerate files and folders in a website using SharePoint Designer and WebDAV interfaces.
  + Browse User Information—View information about users of the website.
  + Enumerate Permissions—Enumerate permissions on the website, list, folder, document, or list item.
  + Open—Open a website, list, or folder to access items inside the container.
  + Use Client Integration Features—Use features that launch client applications.
  + Use Remote Interfaces—Use SOAP, WebDAV, the client object model, or SharePoint Designer interfaces to access the website.
  + View Pages—View pages on a website.
+ Checked that each document is unique in SharePoint and across other data sources you plan to use for the same index. Data sources that you use for the same index must not contain the same document. Document IDs are global to an index and must be unique per index.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your SharePoint authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your SharePoint data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.
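
The LDAP values described in the SharePoint Server note above follow a regular shape, so they can be checked before you store them. The following Python sketch is illustrative only: the helper names `ldap_search_base`, `check_ldap_url`, and `email_domain` are hypothetical, and accepting the `ldaps` scheme is an assumption that this guide does not confirm.

```python
from urllib.parse import urlparse


def ldap_search_base(domain: str) -> str:
    """Turn a DNS domain such as 'example.com' into 'dc=example,dc=com'."""
    labels = [label for label in domain.strip().lower().split(".") if label]
    if not labels:
        raise ValueError("domain must contain at least one label")
    return ",".join(f"dc={label}" for label in labels)


def check_ldap_url(url: str) -> str:
    """Return the URL unchanged if it has an LDAP scheme, a host name, and a port."""
    parsed = urlparse(url)
    # Assumption: 'ldaps' is accepted here; the guide only shows 'ldap'.
    if parsed.scheme not in ("ldap", "ldaps"):
        raise ValueError("LDAP server URL must start with ldap:// or ldaps://")
    if not parsed.hostname or parsed.port is None:
        raise ValueError("LDAP server URL must include a host name and a port number")
    return url


def email_domain(address: str) -> str:
    """Extract the email domain that the directory domain override would use."""
    if "@" not in address:
        raise ValueError("not an email address")
    return address.rsplit("@", 1)[1].lower()


print(ldap_search_base("example.com"))
print(check_ldap_url("ldap://example.com:389"))
print(email_domain("username@example.com"))
```

Running the checks locally before you create the secret can catch a missing port number or a malformed search base early.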

## Connection instructions
<a name="data-source-procedure-v1-sharepoint"></a>

To connect Amazon Kendra to your SharePoint data source, you must provide details of your SharePoint credentials so that Amazon Kendra can access your data. If you have not yet configured SharePoint for Amazon Kendra, see [Prerequisites](#prerequisites-v1-sharepoint).

------
#### [ Console ]

**To connect Amazon Kendra to SharePoint** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **SharePoint connector v1.0**, and then choose **Add data source**.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. For **Hosting method**—Choose between **SharePoint Online** and **SharePoint Server**.

      1. For **SharePoint Online**—Enter the **Site URLs specific to your SharePoint repository**.

      1. For **SharePoint Server**—Choose your **SharePoint version**, enter **Site URLs specific to your SharePoint repository**, and enter the Amazon S3 path to your **SSL certificate location**.

   1. (SharePoint Server only) For **Web proxy**—Enter the **Host name** and **Port number** of your internal SharePoint instance. The port number should be a numeric value between 0 and 65535.

   1. For **Authentication**—Choose between the following options based on your use case:

      1. For SharePoint Online—Choose between **Basic authentication** and **OAuth 2.0 authentication**.

      1. For SharePoint Server—Choose between **None**, **LDAP**, and **Manual**.

   1. For **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your SharePoint authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens. You must enter a **Secret name**. The prefix ‘AmazonKendra-SharePoint-’ is automatically added to your secret name.

   1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

      1. Choose from the following SharePoint Online authentication options, based on your use case:

         1. **Basic authentication**—Enter your SharePoint account user name as **User name** and SharePoint account password as **Password**.

         1. **OAuth 2.0 authentication**—Enter your SharePoint account user name as **User name**, SharePoint account password as **Password**, your auto-generated unique SharePoint ID as **Client ID**, and the shared secret string used by both SharePoint and Amazon Kendra as **Client secret**.

      1. Choose from the following SharePoint Server authentication options, based on your use case:

         1. **None**—Enter your SharePoint account user name as **User name**, your SharePoint account password as **Password**, and your **Server Domain Name**.

         1. **LDAP**—Enter your SharePoint account user name as **User name**, SharePoint account password as **Password**, your **LDAP Server Endpoint** (including protocol and port number, for example *ldap://example.com:389*), and your **LDAP Search Base** (for example, *dc=example, dc=com*).

         1. **Manual**—Enter your SharePoint account user name as **User name**, your SharePoint account password as **Password**, and your **Email Domain Override** (email domain of directory user or group).

      1. Choose **Save**.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.
**Note**  
You must use a VPC if you use SharePoint Server. Amazon VPC is optional for other SharePoint versions.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. **Use Change log**—Select to update your index instead of syncing all your files.

   1. **Crawl attachments**—Select to crawl attachments.

   1. **Use local group mappings**—Select to make sure that documents are properly filtered.

   1. **Additional configuration**—Add regular expression patterns to include or exclude certain files. You can add up to 100 patterns.

   1. In **Sync run schedule**, for **Frequency**—Choose how often Amazon Kendra syncs with your data source.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. **Amazon Kendra default field mappings**—Select from the Amazon Kendra generated default data source fields you want to map to your index. 

   1. For **Custom field mappings**—Add custom data source fields to create an index field name to map to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.
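
As a rough illustration of how the inclusion and exclusion patterns entered under **Additional configuration** interact, the following Python sketch applies the documented rule that a document matching an exclusion pattern is not indexed, even if it also matches an inclusion pattern. The function name `is_indexed`, the use of `re.search`, and matching against a file path are assumptions made for illustration. They are not taken from the connector itself.

```python
import re


def is_indexed(path, inclusion_patterns=(), exclusion_patterns=()):
    """Return True if a document path would pass the filters in this sketch."""
    # An exclusion match always wins, even when an inclusion pattern also matches.
    if any(re.search(pattern, path) for pattern in exclusion_patterns):
        return False
    # With inclusion patterns present, only matching content is kept.
    if inclusion_patterns:
        return any(re.search(pattern, path) for pattern in inclusion_patterns)
    # No inclusion patterns: everything that was not excluded is kept.
    return True


include = [r"\.pdf$"]
exclude = [r"^drafts/"]
for candidate in ("reports/q1.pdf", "drafts/q1.pdf", "reports/q1.docx"):
    print(candidate, is_indexed(candidate, include, exclude))
```

Testing patterns this way against a few sample paths can help confirm that a filter set keeps the content you expect before you start a sync.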

------
#### [ API ]

**To connect Amazon Kendra to SharePoint**

You must specify the following using [SharePointConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_SharePointConfiguration.html) API:
+ **SharePoint Version**—Specify the SharePoint version you use when configuring SharePoint. This is required whether you use SharePoint Server 2013, SharePoint Server 2016, SharePoint Server 2019, or SharePoint Online.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials you created in your SharePoint account. The secret is stored in a JSON structure.

  For **SharePoint Online basic authentication**, the following is the minimum JSON structure that must be in your secret:

  ```
  {
      "userName": "user name",
      "password": "password"
  }
  ```

  For **SharePoint Online OAuth 2.0 authentication**, the following is the minimum JSON structure that must be in your secret:

  ```
  {
      "userName": "SharePoint account user name",
      "password": "SharePoint account password",
      "clientId": "SharePoint auto-generated unique client id",
      "clientSecret": "secret string shared by Amazon Kendra and SharePoint to authorize communications"
  }
  ```

  For **SharePoint Server basic authentication**, the following is the minimum JSON structure that must be in your secret:

  ```
  {
      "userName": "user name",
      "password": "password",
      "domain": "server domain name"
  }
  ```

  For **SharePoint Server LDAP authentication** (if you need to convert your access control list (ACL) to email format for filtering on user context you can include the LDAP server URL and LDAP search base in your secret), the following is the minimum JSON structure that must be in your secret:

  ```
  {
      "userName": "user name",
      "password": "password",
      "domain": "server domain name",
      "ldapServerUrl": "ldap://example.com:389",
      "ldapSearchBase": "dc=example,dc=com"
  }
  ```

  For **SharePoint Server Manual authentication**, the following is the minimum JSON structure that must be in your secret:

  ```
  {
      "userName": "user name",
      "password": "password",
      "domain": "server domain name",
      "emailDomainOverride": "example.com"
  }
  ```
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the SharePoint connector and Amazon Kendra. For more information, see [IAM roles for SharePoint data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
+  **Amazon VPC**—If you use SharePoint Server, specify `VpcConfiguration` as part of the data source configuration. See [Configuring Amazon Kendra to use a VPC](https://docs.aws.amazon.com/kendra/latest/dg/vpc-configuration.html).
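
The secret structures above can be assembled and serialized programmatically so that the JSON is always well formed. The following Python sketch is one possible approach under stated assumptions: `build_sharepoint_secret` and `store_secret` are hypothetical helper names, and the secret name you pass to `store_secret` is your own choice. The key names come from the JSON structures shown above, and `create_secret` is the AWS Secrets Manager operation exposed by boto3.

```python
import json

# Optional keys taken from the JSON structures shown above.
OPTIONAL_KEYS = {
    "clientId", "clientSecret", "domain",
    "ldapServerUrl", "ldapSearchBase", "emailDomainOverride",
}


def build_sharepoint_secret(user_name, password, **extra):
    """Return the secret as a JSON string containing only the supplied keys."""
    unknown = set(extra) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unexpected secret keys: {sorted(unknown)}")
    secret = {"userName": user_name, "password": password}
    secret.update({key: value for key, value in extra.items() if value is not None})
    return json.dumps(secret, indent=4)


def store_secret(name, secret_string, region_name=None):
    """Create the secret in AWS Secrets Manager and return its ARN.

    Requires boto3 and AWS credentials; imported lazily so that the
    builder above can be used without boto3 installed.
    """
    import boto3
    client = boto3.client("secretsmanager", region_name=region_name)
    return client.create_secret(Name=name, SecretString=secret_string)["ARN"]


# SharePoint Server LDAP authentication, using the placeholder values from above.
ldap_secret = build_sharepoint_secret(
    "user name",
    "password",
    domain="server domain name",
    ldapServerUrl="ldap://example.com:389",
    ldapSearchBase="dc=example,dc=com",
)
print(ldap_secret)
```

Serializing with `json.dumps` avoids hand-editing mistakes such as a missing comma or a stray quotation mark in the secret value.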

You can also add the following optional features:
+ **Web proxy**—Whether to connect to your SharePoint site URLs via a web proxy. You can use this option only for SharePoint Server.
+ **Indexing lists**—Whether Amazon Kendra should index the contents of attachments to SharePoint list items.
+  **Change log**—Whether Amazon Kendra should use the SharePoint data source change log mechanism to determine if a document must be updated in the index.
**Note**  
Use the change log if you don’t want Amazon Kendra to scan all of the documents. If your change log is large, it might take Amazon Kendra less time to scan the documents in the SharePoint data source than to process the change log. If you are syncing your SharePoint data source with your index for the first time, all documents are scanned. 
+  **Inclusion and exclusion filters**—You can specify whether to include or exclude certain content.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+  **Field mappings**—Choose to map your SharePoint data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.
+  **User context filtering and access control**—Amazon Kendra crawls the access control list (ACL) for your documents, if you have an ACL for your documents. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#datasource-context-filter).
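
To show how the required and optional settings above fit together, the following Python sketch builds the arguments for a `CreateDataSource` call that uses `SharePointConfiguration`. Treat it as an outline rather than a definitive implementation: `build_sharepoint_v1_request` is a hypothetical helper, every ARN, ID, and URL in the usage example is a placeholder, and the VPC check simply mirrors the statement above that SharePoint Server requires a VPC.

```python
SERVER_VERSIONS = {"SHAREPOINT_2013", "SHAREPOINT_2016", "SHAREPOINT_2019"}
ALL_VERSIONS = SERVER_VERSIONS | {"SHAREPOINT_ONLINE"}


def build_sharepoint_v1_request(name, index_id, role_arn, secret_arn, site_urls,
                                version="SHAREPOINT_ONLINE", subnet_ids=None,
                                security_group_ids=None, use_change_log=True,
                                crawl_attachments=True, inclusion_patterns=(),
                                exclusion_patterns=()):
    """Return keyword arguments for the Amazon Kendra CreateDataSource operation."""
    if version not in ALL_VERSIONS:
        raise ValueError(f"unsupported SharePoint version: {version}")

    config = {
        "SharePointVersion": version,
        "Urls": list(site_urls),
        "SecretArn": secret_arn,
        "UseChangeLog": use_change_log,
        "CrawlAttachments": crawl_attachments,
    }
    if inclusion_patterns:
        config["InclusionPatterns"] = list(inclusion_patterns)
    if exclusion_patterns:
        config["ExclusionPatterns"] = list(exclusion_patterns)

    if subnet_ids and security_group_ids:
        config["VpcConfiguration"] = {
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        }
    elif version in SERVER_VERSIONS:
        # SharePoint Server must be reached through a VPC.
        raise ValueError("SharePoint Server requires subnet and security group IDs")

    return {
        "Name": name,
        "IndexId": index_id,
        "Type": "SHAREPOINT",
        "RoleArn": role_arn,
        "Configuration": {"SharePointConfiguration": config},
    }


# Placeholder values for illustration only.
request = build_sharepoint_v1_request(
    name="sharepoint-online-example",
    index_id="example-index-id",
    role_arn="arn:aws:iam::111122223333:role/example-kendra-datasource-role",
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    site_urls=["https://example.sharepoint.com/sites/mysite"],
)
print(request["Configuration"]["SharePointConfiguration"]["SharePointVersion"])

# With boto3 installed and credentials configured, the request could be sent with:
#   import boto3
#   response = boto3.client("kendra").create_data_source(**request)
#   data_source_id = response["Id"]
```

Keeping the request construction separate from the API call makes it possible to review or test the configuration before anything is created in your account.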

------

## Learn more
<a name="sharepoint-v1-learn-more"></a>

To learn more about integrating Amazon Kendra with your SharePoint data source, see:
+ [Getting started with the Amazon Kendra SharePoint Online connector](https://aws.amazon.com/blogs/machine-learning/getting-started-with-the-amazon-kendra-sharepoint-online-connector/)

# SharePoint connector V2.0
<a name="data-source-v2-sharepoint"></a>

SharePoint is a collaborative website building service that you can use to customize web content and create pages, sites, document libraries, and lists. You can use Amazon Kendra to index your SharePoint data source.

Amazon Kendra currently supports SharePoint Online and SharePoint Server (2013, 2016, 2019, and Subscription Edition).

**Note**  
Support for SharePoint connector V1.0 / SharePointConfiguration API ended in 2023. We recommend migrating to or using SharePoint connector V2.0 / TemplateConfiguration API.

For troubleshooting your Amazon Kendra SharePoint data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-v2-sharepoint)
+ [Prerequisites](#prerequisites-v2-sharepoint)
+ [Connection instructions](#data-source-procedure-v2-sharepoint)
+ [Notes](#sharepoint-notes)

## Supported features
<a name="supported-features-v2-sharepoint"></a>

Amazon Kendra SharePoint data source connector supports the following features:
+ Field mappings
+ User access control
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-v2-sharepoint"></a>

Before you can use Amazon Kendra to index your SharePoint data source, make these changes in your SharePoint and AWS accounts.

You are required to provide authentication credentials, which you securely store in an AWS Secrets Manager secret.

**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

**In SharePoint Online, make sure you have:**
+ Copied your SharePoint instance URLs. The format for the host URL you enter is *https://yourdomain.com/sites/mysite*. Your URL must start with `https`.
+ Copied the domain name of your SharePoint instance URL.
+ Noted your basic authentication credentials containing the user name and password with site admin permissions to connect to SharePoint Online.
+ Deactivated **Security Defaults** in your Azure portal using an administrative user. For more information on managing security default settings in the Azure portal, see [Microsoft documentation on how to enable/disable security defaults](https://learn.microsoft.com/en-us/microsoft-365/business-premium/m365bp-conditional-access?view=o365-worldwide&tabs=secdefaults#security-defaults-1).
+ Deactivated multi-factor authentication (MFA) in your SharePoint account, so that Amazon Kendra is not blocked from crawling your SharePoint content.
+ **If using authentication type other than Basic authentication:** Copied the tenant ID of your SharePoint instance. For details on how to find your tenant ID, see [Find your Microsoft 365 tenant ID](https://learn.microsoft.com/en-us/sharepoint/find-your-office-365-tenant-id).
+ If you need to migrate to cloud user authentication with Microsoft Entra, see [Microsoft documentation on cloud authentication](https://learn.microsoft.com/en-us/entra/identity/hybrid/connect/migrate-from-federation-to-cloud-authentication).
+ **For OAuth 2.0 authentication and OAuth 2.0 refresh token authentication:** Noted your **Basic authentication** credentials containing the user name and password you use to connect to SharePoint Online and the client ID and client secret generated after registering SharePoint with Azure AD.
  + **If you're not using ACL**, added the following permissions:    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/data-source-v2-sharepoint.html)
**Note**  
Notes.Read.All and Sites.Read.All are required only if you want to crawl OneNote documents.  
If you want to crawl specific sites, the permission can be restricted to specific sites rather than all sites available in the domain. You configure **Sites.Selected (Application)** permission. With this API permission, you need to set access permission on every site explicitly through Microsoft Graph API. For more information, see [Microsoft's blog on Sites.Selected permissions](https://techcommunity.microsoft.com/t5/microsoft-sharepoint-blog/develop-applications-that-use-sites-selected-permissions-for-spo/ba-p/3790476).
  + **If you're using ACL**, added the following permissions:    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/data-source-v2-sharepoint.html)
**Note**  
GroupMember.Read.All and User.Read.All are required only if **Identity crawler** is activated.  
If you want to crawl specific sites, the permission can be restricted to specific sites rather than all sites available in the domain. You configure **Sites.Selected (Application)** permission. With this API permission, you need to set access permission on every site explicitly through Microsoft Graph API. For more information, see [Microsoft's blog on Sites.Selected permissions](https://techcommunity.microsoft.com/t5/microsoft-sharepoint-blog/develop-applications-that-use-sites-selected-permissions-for-spo/ba-p/3790476).
+ **For Azure AD App-Only authentication:** Noted the private key and the client ID you generated after registering SharePoint with Azure AD, and the X.509 certificate.
  + **If you're not using ACL**, added the following permissions:    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/data-source-v2-sharepoint.html)
**Note**  
If you want to crawl specific sites, the permission can be restricted to specific sites rather than all sites available in the domain. You configure **Sites.Selected (Application)** permission. With this API permission, you need to set access permission on every site explicitly through Microsoft Graph API. For more information, see [Microsoft's blog on Sites.Selected permissions](https://techcommunity.microsoft.com/t5/microsoft-sharepoint-blog/develop-applications-that-use-sites-selected-permissions-for-spo/ba-p/3790476).
  + **If you're using ACL**, added the following permissions:    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/data-source-v2-sharepoint.html)
**Note**  
If you want to crawl specific sites, the permission can be restricted to specific sites rather than all sites available in the domain. You configure **Sites.Selected (Application)** permission. With this API permission, you need to set access permission on every site explicitly through Microsoft Graph API. For more information, see [Microsoft's blog on Sites.Selected permissions](https://techcommunity.microsoft.com/t5/microsoft-sharepoint-blog/develop-applications-that-use-sites-selected-permissions-for-spo/ba-p/3790476).
+ **For SharePoint App-Only authentication:** Noted your SharePoint client ID and client secret generated while granting permission to SharePoint App Only, and your Client ID and Client secret generated when you registered your SharePoint app with Azure AD.
**Note**  
SharePoint App-Only Authentication is *not* supported for SharePoint 2013 version.
  + **(Optional) If you're crawling OneNote documents and using Identity crawler**, added the following permissions:    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/data-source-v2-sharepoint.html)
**Note**  
No API permissions are required for crawling entities using **Basic authentication** and SharePoint **App-only authentication**.

**In SharePoint Server, make sure you have:**
+ Copied your SharePoint instance URLs and the domain name of your SharePoint URLs. The format for the host URL you enter is *https://yourcompany/sites/mysite*. Your URL must start with `https`.
**Note**  
(On-premises/server) Amazon Kendra checks if the endpoint information included in AWS Secrets Manager is the same as the endpoint information specified in your data source configuration details. This helps protect against the [confused deputy problem](https://docs.aws.amazon.com/IAM/latest/UserGuide/confused-deputy.html), which is a security issue where a user doesn’t have permission to perform an action but uses Amazon Kendra as a proxy to access the configured secret and perform the action. If you later change your endpoint information, you must create a new secret to sync this information.
+ Deactivated multi-factor authentication (MFA) in your SharePoint account, so that Amazon Kendra is not blocked from crawling your SharePoint content.
+ If using **SharePoint App-Only authentication** for access control:
  + Copied the SharePoint client ID generated when you registered App Only at Site Level. Client ID format is ClientId@TenantId. For example, *ffa956f3-8f89-44e7-b0e4-49670756342c@888d0b57-69f1-4fb8-957f-e1f0bedf82fe*.
  + Copied the SharePoint client secret generated when you registered App Only at Site Level.

  **Note:** Because client IDs and client secrets are generated for single sites only when you register SharePoint Server for App Only authentication, only one site URL is supported for SharePoint App Only authentication.
**Note**  
SharePoint App-Only Authentication is *not* supported for SharePoint 2013 version.
+ If using **Email ID with Custom Domain** for access control:
  + Noted your custom email domain value—for example: "*amazon.com*".
+ If using **Email ID with Domain from IDP** authorization, copied your:
  + LDAP Server Endpoint (endpoint of LDAP server including protocol and port number). For example: *ldap://example.com:389*.
  + LDAP Search Base (search base of the LDAP user). For example: *CN=Users,DC=sharepoint,DC=com*.
  + LDAP user name and LDAP password.
+ Either configured NTLM authentication credentials **or** configured Kerberos authentication credentials containing a user name (SharePoint account user name) and password (SharePoint account password).

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your SharePoint authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your SharePoint data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.
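
Several of the prerequisites above have a fixed shape that can be checked before you configure the connector: site URLs must start with `https`, and the SharePoint App-Only client ID uses the format ClientId@TenantId. The following Python sketch is illustrative only. The helper names are hypothetical, and treating both halves of the client ID as GUIDs is an assumption based on the example shown above. The domain helper follows the SharePoint Online example given later in this topic, where the domain for *https://yourdomain.sharepoint.com/sites/mysite* is *yourdomain*.

```python
import re
from urllib.parse import urlparse

# Assumption: both parts of ClientId@TenantId are GUIDs, as in the example above.
_GUID = r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"
_CLIENT_ID = re.compile(rf"^{_GUID}@{_GUID}$")


def check_site_url(url: str) -> str:
    """Return the URL unchanged if it uses https and has a host name."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"site URL must start with https: {url!r}")
    return url


def sharepoint_online_domain(url: str) -> str:
    """Return the first host label, for example 'yourdomain'."""
    host = urlparse(check_site_url(url)).hostname
    return host.split(".")[0]


def is_app_only_client_id(value: str) -> bool:
    """Check that a value looks like ClientId@TenantId."""
    return bool(_CLIENT_ID.match(value))


print(sharepoint_online_domain("https://yourdomain.sharepoint.com/sites/mysite"))
print(is_app_only_client_id(
    "ffa956f3-8f89-44e7-b0e4-49670756342c@888d0b57-69f1-4fb8-957f-e1f0bedf82fe"
))
```

Checks like these are a convenience for catching copy-and-paste mistakes; they do not confirm that the URL or client ID exists in your SharePoint tenant.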

## Connection instructions
<a name="data-source-procedure-v2-sharepoint"></a>

To connect Amazon Kendra to your SharePoint data source, you must provide details of your SharePoint credentials so that Amazon Kendra can access your data. If you have not yet configured SharePoint for Amazon Kendra, see [Prerequisites](#prerequisites-v2-sharepoint).

------
#### [ Console: SharePoint Online ]

**To connect Amazon Kendra to SharePoint Online** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **SharePoint connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **SharePoint connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. **Hosting Method**—Choose **SharePoint Online**.

   1. **Site URLs specific to your SharePoint repository**—Enter the SharePoint host URLs. The format for the host URLs you enter is *https://yourdomain.sharepoint.com/sites/mysite*. The URL must start with `https` protocol. Separate URLs with a new line. You can add up to 100 URLs.

   1. **Domain**—Enter the SharePoint domain. For example, the domain in the URL *https://yourdomain.sharepoint.com/sites/mysite* is *yourdomain*. 

   1. **Authorization**—Turn on or off access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).

      You can also choose the type of user ID, whether the user principal name or the user email fetched from the Azure Portal. If you don't specify, email is used by default.

   1. **Authentication**—Choose either basic, OAuth 2.0, Azure AD App-Only authentication, SharePoint App-Only authentication, or OAuth 2.0 refresh token authentication. You either choose an existing AWS Secrets Manager secret to store your authentication credentials, or create a secret.

      1. If using **Basic Authentication**, your secret must include a secret name, SharePoint user name and password.

      1. If using **OAuth 2.0 authentication**, your secret must include the SharePoint tenant ID, secret name, SharePoint user name, password, Azure AD client ID generated when you register SharePoint in Azure AD, and Azure AD client secret generated when you register SharePoint in Azure AD.

      1. If using **Azure AD App-Only authentication**, your secret must include the SharePoint tenant ID, Azure AD self-signed X.509 certificate, secret name, Azure AD client ID generated when you register SharePoint in Azure AD, and private key to authenticate the connector for Azure AD.

      1. If using **SharePoint App-Only authentication**, your secret must include the SharePoint tenant ID, secret name, SharePoint client ID generated when you registered App Only at Tenant Level, SharePoint client secret generated when you registered App Only at Tenant Level, Azure AD client ID generated when you register SharePoint in Azure AD, and Azure AD client secret generated when you register SharePoint in Azure AD.

         The SharePoint client ID format is *ClientID@TenantId*. For example, *ffa956f3-8f89-44e7-b0e4-49670756342c@888d0b57-69f1-4fb8-957f-e1f0bedf82fe*.

      1. If using **OAuth 2.0 refresh token authentication**, your secret must include the SharePoint tenant ID, secret name, unique Azure AD client ID generated when you register SharePoint in Azure AD, Azure AD client secret generated when you register SharePoint in Azure AD, and the refresh token generated to connect Amazon Kendra to SharePoint.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.

      You can also choose to crawl local group mapping or Azure Active Directory group mapping.
**Note**  
AD Group mapping crawling is available only for OAuth 2.0, OAuth 2.0 refresh token, and SharePoint App Only authentication. 

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. In **Sync scope**, choose from the following options:

      1. **Select entities**—Choose the entities you want to crawl. You can select to crawl **All** entities or any combination of **Files**, **Attachments**, **Links**, **Pages**, **Events**, **Comments**, and **List Data**.

      1. In **Additional configuration**, for **Entity regex patterns**—Add regular expression patterns for **Links**, **Pages**, and **Events** to include specific entities instead of syncing all your documents.

      1. **Regex patterns**—Add regular expression patterns to include or exclude files by **File path**, **File name**, **File type**, **OneNote section name**, and **OneNote page name** instead of syncing all your documents. You can add up to 100.
**Note**  
OneNote crawling is available only for OAuth 2.0, OAuth 2.0 refresh token, and SharePoint App Only authentication.

   1. For **Sync mode** choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is synced by default.
      + **Full sync**—Sync all content regardless of the previous sync status.
      + **New or modified documents sync**—Sync only new or modified documents.
      + **New, modified, or deleted documents sync**—Sync only new, modified, and deleted documents.

   1. In **Sync run schedule**, for **Frequency**—Choose how often to sync your data source content and update your index.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. **Default data source fields**—Select from the Amazon Kendra generated default data source fields that you want to map to your index. 

   1.  **Add field**—Add custom data source fields. For each field, create an index field name to map it to and choose the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ Console: SharePoint Server ]

**To connect Amazon Kendra to SharePoint** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **SharePoint connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **SharePoint connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. **Hosting Method**—Choose **SharePoint Server**.

   1. **Choose SharePoint Version**—Choose **SharePoint 2013**, **SharePoint 2016**, **SharePoint 2019**, or **SharePoint (Subscription Edition)**.

   1. **Site URLs specific to your SharePoint repository**—Enter the SharePoint host URLs. The format for the host URLs you enter is *https://yourcompany/sites/mysite*. The URL must start with the `https` protocol. Separate URLs with a new line. You can add up to 100 URLs.

   1. **Domain**—Enter the SharePoint domain. For example, the domain in the URL *https://yourcompany/sites/mysite* is *yourcompany*.

   1. **SSL certificate location**—Enter the Amazon S3 path to your SSL certificate file.

   1. (Optional) For **Web proxy**—Enter the host name (without the `http://` or `https://` protocol), and the port number used by the host URL transport protocol. The numeric value of the port number must be between 0 and 65535.

   1. **Authorization**—Turn on or off access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents that users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).

      For SharePoint Server you can choose from the following ACL options:

      1. **Email ID with Domain from IDP**—User ID is based on email IDs with their domains fetched from the underlying identity provider (IDP). You provide the IDP connection details in your Secrets Manager secret as part of **Authentication**.

      1. **Email ID with Custom Domain**—User ID is based on the custom email domain value. For example, "*amazon.com*". The email domain will be used to construct the email ID for access control. You must enter your custom email domain.

      1. **Domain\\User with Domain**—User ID is constructed using the Domain\\User ID format. You must provide a valid domain name, for example *"sharepoint2019"*, to construct access control.

   1. For **Authentication**, choose either SharePoint App-Only authentication, NTLM authentication, or Kerberos authentication. You either choose an existing AWS Secrets Manager secret to store your authentication credentials, or create a secret.

      1. If using **NTLM authentication** or **Kerberos authentication**, your secret must include a secret name, SharePoint user name, and password.

         If using **Email ID with Domain from IDP**, also enter your:
         +  **LDAP Server Endpoint**—Endpoint of LDAP server, including protocol and port number. For example: *ldap://example.com:389*.
         + **LDAP Search Base**—Search base of LDAP user. For example: *CN=Users,DC=sharepoint,DC=com*.
         + **LDAP username**—Your LDAP user name.
         + **LDAP Password**—Your LDAP password.

      1. If using **SharePoint App-Only authentication**, your secret must include a secret name, the SharePoint client ID generated when you register App Only at Site Level, and the SharePoint client secret generated when you register App Only at Site Level.

         The SharePoint client ID format is *ClientID@TenantId*. For example, *ffa956f3-8f89-44e7-b0e4-49670756342c@888d0b57-69f1-4fb8-957f-e1f0bedf82fe*.

         **Note:** Because client IDs and client secrets are generated for single sites only when you register SharePoint Server for App Only authentication, only one site URL is supported for SharePoint App Only authentication.

         If using **Email ID with Domain from IDP**, also enter your:
         +  **LDAP Server Endpoint**—Endpoint of LDAP server, including protocol and port number. For example: *ldap://example.com:389*.
         + **LDAP Search Base**—Search base of LDAP user. For example: *CN=Users,DC=sharepoint,DC=com*.
         + **LDAP username**—Your LDAP user name.
         + **LDAP Password**—Your LDAP password.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.

      You can also choose to crawl local group mapping or Azure Active Directory group mapping.
**Note**  
AD Group mapping crawling is available only for SharePoint App Only authentication.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. In **Sync scope**, choose from the following options:

      1. **Select entities**—Choose the entities you want to crawl. You can select to crawl **All** entities or any combination of **Files**, **Attachments**, **Links**, **Pages**, **Events**, and **List Data**.

      1. In **Additional configuration**, for **Entity regex patterns**—Add regular expression patterns for **Links**, **Pages**, and **Events** to include specific entities instead of syncing all your documents.

      1. **Regex patterns**—Add regular expression patterns to include or exclude files by **File path**, **File name**, **File type**, **OneNote section name**, and **OneNote page name** instead of syncing all your documents. You can add up to 100.
**Note**  
OneNote crawling is available only for SharePoint App Only authentication.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified sync: Index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. In **Sync run schedule**, for **Frequency**—Choose how often to sync your data source content and update your index.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. **Default data source fields**—Select from the Amazon Kendra generated default data source fields that you want to map to your index.

   1.  **Add field**—Add custom data source fields. For each field, create an index field name to map it to and choose the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to SharePoint**

You must specify a JSON of the [data source schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html) using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API. You must provide the following information:
+ **Data source**—Specify the data source type as `SHAREPOINTV2` when you use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **Repository Endpoint Metadata**—Specify the `tenantID`, `domain`, and `siteUrls` of your SharePoint instance.
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
  + `CHANGE_LOG` to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.
**Note**  
Identity crawler is available only when you set `crawlAcl` to `true`.
+ **Repository Additional Properties**—Specify the:
  + (For Azure AD) `s3bucketName` and `s3certificateName` you use to store your Azure AD self-signed X.509 certificate.
  + Authentication type (`auth_Type`) you use, whether `OAuth2`, `OAuth2App`, `OAuth2Certificate`, `Basic`, `OAuth2_RefreshToken`, `NTLM`, or `Kerberos`.
  + Version (`version`) you use, whether `Server` or `Online`. If you use `Server`, you can further specify the `onPremVersion` as `2013`, `2016`, `2019`, or `SubscriptionEdition`.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials you created in your SharePoint account.

  If you use SharePoint Online, you can choose between Basic, OAuth 2.0, Azure AD App-Only, SharePoint App-Only, and OAuth 2.0 refresh token authentication. The following is the minimum JSON structure that must be in your secret for each authentication option:
  + **Basic authentication**

    ```
    {
        "userName": "SharePoint account user name",
        "password": "SharePoint account password"
    }
    ```
  + **OAuth 2.0 authentication**

    ```
    {
        "clientId": "client id generated when registering SharePoint with Azure AD",
        "clientSecret": "client secret generated when registering SharePoint with Azure AD",
        "userName": "SharePoint account user name",
        "password": "SharePoint account password"
    }
    ```
  + **Azure AD App-Only authentication**

    ```
    {
        "clientId": "client id generated when registering SharePoint with Azure AD",
        "privateKey": "private key to authorize connection with Azure AD"
    }
    ```
  + **SharePoint App-Only authentication**

    ```
    {
        "clientId": "client id generated when registering SharePoint for App Only at Tenant Level",
        "clientSecret": "client secret generated when registering SharePoint for App Only at Tenant Level",
        "adClientId": "client id generated while registering SharePoint with Azure AD",
        "adClientSecret": "client secret generated while registering SharePoint with Azure AD"
    }
    ```
  + **OAuth 2.0 refresh token authentication**

    ```
    {
        "clientId": "client id generated when registering SharePoint with Azure AD",
        "clientSecret": "client secret generated when registering SharePoint with Azure AD",
        "refreshToken": "refresh token generated to connect to SharePoint"
    }
    ```

  If you use SharePoint Server, you can choose between SharePoint App-Only authentication, NTLM authentication, and Kerberos authentication. The following is the minimum JSON structure that must be in your secret for each authentication option:
  + **SharePoint App-Only authentication**

    ```
    {
        "siteUrlsHash": "Hash representation of SharePoint site URLs",
        "clientId": "client id generated when registering SharePoint for App Only at Site Level",
        "clientSecret": "client secret generated when registering SharePoint for App Only at Site Level" 
    }
    ```
  + **SharePoint App-Only authentication with domain from IDP authorization**

    ```
    {
        "siteUrlsHash": "Hash representation of SharePoint site URLs",
        "clientId": "client id generated when registering SharePoint for App Only at Site Level",
        "clientSecret": "client secret generated when registering SharePoint for App Only at Site Level",
        "ldapUrl": "LDAP Account url eg. ldap://example.com:389",
        "baseDn": "LDAP Account base dn eg. CN=Users,DC=sharepoint,DC=com",
        "ldapUser": "LDAP account user name",
        "ldapPassword": "LDAP account password"
    }
    ```
  + **(Server only) NTLM or Kerberos authentication**

    ```
    {
        "siteUrlsHash": "Hash representation of SharePoint site URLs",
        "userName": "SharePoint account user name",
        "password": "SharePoint account password"
    }
    ```
  + **(Server only) NTLM or Kerberos authentication with domain from IDP authorization**

    ```
    {
        "siteUrlsHash": "Hash representation of SharePoint site URLs",
        "userName": "SharePoint account user name",
        "password": "SharePoint account password",
        "ldapUrl": "ldap://example.com:389",
        "baseDn": "CN=Users,DC=sharepoint,DC=com",
        "ldapUser": "LDAP account user name",
        "ldapPassword": "LDAP account password"
    }
    ```
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the SharePoint connector and Amazon Kendra. For more information, see [IAM roles for SharePoint data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
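
The minimum secret structures listed above can be checked before you store a secret in Secrets Manager. The following is a minimal Python sketch, not part of the connector: the key names are copied from the JSON structures above, while the dictionary labels (such as `online_basic`) and the function name `missing_secret_keys` are hypothetical names chosen for this illustration.

```python
import json

# Minimum secret keys per authentication option, copied from the JSON
# structures listed above. The labels are hypothetical names for this
# sketch; they are not Amazon Kendra identifiers.
REQUIRED_SECRET_KEYS = {
    "online_basic": {"userName", "password"},
    "online_oauth2": {"clientId", "clientSecret", "userName", "password"},
    "online_azure_ad_app_only": {"clientId", "privateKey"},
    "online_sharepoint_app_only": {
        "clientId", "clientSecret", "adClientId", "adClientSecret",
    },
    "online_oauth2_refresh_token": {"clientId", "clientSecret", "refreshToken"},
    "server_sharepoint_app_only": {"siteUrlsHash", "clientId", "clientSecret"},
    "server_ntlm_or_kerberos": {"siteUrlsHash", "userName", "password"},
}

# Extra keys for SharePoint Server with domain from IDP authorization.
LDAP_KEYS = {"ldapUrl", "baseDn", "ldapUser", "ldapPassword"}


def missing_secret_keys(secret_string, auth_option, with_ldap=False):
    """Return the sorted list of required keys absent from a secret JSON string."""
    secret = json.loads(secret_string)
    required = set(REQUIRED_SECRET_KEYS[auth_option])
    if with_ldap:
        required |= LDAP_KEYS
    return sorted(required - set(secret))


if __name__ == "__main__":
    example = json.dumps({"userName": "example-user", "password": "example-password"})
    print(missing_secret_keys(example, "online_basic"))
    print(missing_secret_keys(example, "server_ntlm_or_kerberos", with_ldap=True))
```

An empty list means the secret contains at least the minimum keys for that option. The check does not validate the values themselves.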

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+  **Inclusion and exclusion filters**—You can specify whether to include or exclude certain files, OneNotes, and other content.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+  **Field mappings**—Choose to map your SharePoint data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.
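
The inclusion and exclusion rules described in the filter note above can be sketched in a few lines of Python. This is a simplified illustration that assumes each pattern is tested with `re.search` against a single string such as a file path; how the connector itself applies patterns may differ. The function name `is_indexed` is hypothetical.

```python
import re


def is_indexed(value, inclusion_patterns=(), exclusion_patterns=()):
    """Apply the inclusion/exclusion filter rules to one candidate value."""
    # Exclusion wins: a match is not indexed, even if it also matches
    # an inclusion filter.
    if any(re.search(pattern, value) for pattern in exclusion_patterns):
        return False
    # With inclusion filters present, only matching content is indexed.
    if inclusion_patterns:
        return any(re.search(pattern, value) for pattern in inclusion_patterns)
    # No inclusion filters: everything that is not excluded is indexed.
    return True


if __name__ == "__main__":
    include = [r"\.pdf$"]
    exclude = [r"^drafts/"]
    for path in ("reports/q1.pdf", "reports/q1.docx", "drafts/q1.pdf"):
        print(path, is_indexed(path, include, exclude))
```

Checking the exclusion filter first makes the precedence explicit: a document that matches both filters is not indexed.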

For a list of other important JSON keys to configure, see [SharePoint template schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-schema-sharepoint).
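
The following Python sketch shows one way the pieces above might fit together for SharePoint Online. Treat it as an outline under stated assumptions rather than a complete configuration: the nesting of keys (`connectionConfiguration`, `repositoryEndpointMetadata`, `repositoryAdditionalProperties`, `additionalProperties`) and the top-level key names `syncMode`, `secretArn`, and `enableIdentityCrawler` are assumptions, the key spellings `tenantID` and `auth_Type` follow the text above, and the sketch omits field mappings and any other keys the schema requires. Verify every key against the SharePoint template schema before use. The helper names are hypothetical; the `create_data_source` call uses the AWS SDK for Python (Boto3) and needs AWS credentials.

```python
ALLOWED_SYNC_MODES = {"FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG"}


def build_sharepoint_template(site_urls, domain, tenant_id, secret_arn,
                              auth_type="OAuth2", sync_mode="FULL_CRAWL"):
    """Assemble a partial SharePoint Online template (assumed key layout)."""
    if sync_mode not in ALLOWED_SYNC_MODES:
        raise ValueError(f"sync_mode must be one of {sorted(ALLOWED_SYNC_MODES)}")
    return {
        "type": "SHAREPOINTV2",
        "syncMode": sync_mode,          # assumed key name
        "secretArn": secret_arn,        # assumed key name
        "enableIdentityCrawler": True,  # assumed key name
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {
                # Spelling follows the text above; check the schema's casing.
                "tenantID": tenant_id,
                "domain": domain,
                "siteUrls": list(site_urls),
                "repositoryAdditionalProperties": {
                    "auth_Type": auth_type,
                    "version": "Online",
                },
            }
        },
        # Identity crawler is available only when crawlAcl is true.
        "additionalProperties": {"crawlAcl": True},
    }


def create_sharepoint_data_source(index_id, role_arn, name, template):
    """Call CreateDataSource with Type=TEMPLATE and return the data source ID."""
    import boto3  # imported here so the builder above runs without Boto3

    kendra = boto3.client("kendra")
    response = kendra.create_data_source(
        IndexId=index_id,
        Name=name,
        Type="TEMPLATE",
        RoleArn=role_arn,
        Configuration={"TemplateConfiguration": {"Template": template}},
    )
    return response["Id"]
```

Building the template as a plain dictionary first lets you inspect or validate it locally before making the API call.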

------

## Notes
<a name="sharepoint-notes"></a>
+ The connector supports custom field mappings only for the **Files** entity.
+ For all SharePoint Server versions, the ACL token must be in lower case. For **Email ID with Domain from IDP** and **Email ID with Custom Domain** ACL, for example: *user@sharepoint2019.com*. For **Domain\\User with Domain** ACL, for example: *sharepoint2013\\user*.
+ When Access Control Lists (ACLs) are enabled, the "Sync only new or modified content" option is not available due to SharePoint API limitations. We recommend using the "Full sync" or "New, modified, or deleted content sync" mode instead, or disabling ACLs if you need to use this sync mode.
+ The connector does not support change log mode/**New or modified content sync** for SharePoint 2013.
+ If an entity name contains a `%` character, the connector skips that entity due to API limitations.
+ OneNote can only be crawled by the connector using a Tenant ID, and with OAuth 2.0, OAuth 2.0 refresh token, or SharePoint App Only authentication activated for SharePoint Online.
+ The connector crawls the first section of a OneNote document using its default name only, even if the document is renamed.
+ The connector crawls links in SharePoint 2019, SharePoint Online, and Subscription Edition, only if **Pages** and **Files** are selected as entities to be crawled in addition to **Links**.
+ The connector crawls links in SharePoint 2013 and SharePoint 2016 if **Links** is selected as an entity to be crawled.
+ The connector crawls list attachments and comments only when **List Data** is also selected as an entity to be crawled.
+ The connector crawls event attachments only when **Events** is also selected as an entity to be crawled.
+ For SharePoint Online version, the ACL token will be in lower case. For example, if **User principal name** is *MaryMajor@domain.com* in Azure portal, the ACL token in the SharePoint Connector will be *marymajor@domain.com*.
+ In **Identity Crawler** for SharePoint Online and Server, if you want to crawl nested groups, you have to activate Local as well as AD Group Crawling.
+ If you're using SharePoint Online, and the User Principal Name in your Azure Portal is a combination of upper case and lower case, the SharePoint API internally converts it to lower case. Because of this, the Amazon Kendra SharePoint connector sets ACL in lower case.
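
Because the connector stores ACL tokens in lower case, a user ID that you compare against those tokens (for example, one you pass for user context filtering) generally needs the same normalization. The following minimal Python sketch illustrates this; the function name `to_acl_token` is hypothetical and is not part of any Amazon Kendra API.

```python
def to_acl_token(user_id):
    """Lower-case a user ID so that it matches the connector's ACL tokens."""
    return user_id.strip().lower()


if __name__ == "__main__":
    # User principal name as it might appear in the Azure portal.
    print(to_acl_token("MaryMajor@domain.com"))
    # Domain\User format used with SharePoint Server.
    print(to_acl_token("SharePoint2013\\User"))
```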

# Microsoft SQL Server
<a name="data-source-ms-sql-server"></a>

**Note**  
Microsoft SQL Server connector remains fully supported for existing customers through May 31, 2026. While this connector is no longer available for new users, current users can continue to use it without interruption. We are continuously evolving our connector portfolio to offer more scalable and customizable solutions. For future integrations, we recommend exploring the Amazon Kendra Custom Connector Framework, designed to support a broader range of enterprise use cases with enhanced flexibility.

Microsoft SQL Server is a relational database management system (RDBMS) developed by Microsoft. If you are a Microsoft SQL Server user, you can use Amazon Kendra to index your Microsoft SQL Server data source. The Amazon Kendra Microsoft SQL Server data source connector supports MS SQL Server 2019.

You can connect Amazon Kendra to your Microsoft SQL Server data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) and the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API.

For troubleshooting your Amazon Kendra Microsoft SQL Server data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-ms-sql-server)
+ [Prerequisites](#prerequisites-ms-sql-server)
+ [Connection instructions](#data-source-procedure-ms-sql-server)
+ [Notes](#ms-sql-server-notes)

## Supported features
<a name="supported-features-ms-sql-server"></a>
+ Field mappings
+ User context filtering
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-ms-sql-server"></a>

Before you can use Amazon Kendra to index your Microsoft SQL Server data source, make these changes in your Microsoft SQL Server and AWS accounts.

**In Microsoft SQL Server, make sure you have:**
+ Noted your database user name and password.
**Important**  
As a best practice, provide Amazon Kendra with read-only database credentials.
+ Copied your database host URL, port, and instance.
+ Checked that each document is unique in Microsoft SQL Server and across other data sources you plan to use for the same index. Each data source that you want to use for an index must not contain the same document across the data sources. Document IDs are global to an index and must be unique per index.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Microsoft SQL Server authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Microsoft SQL Server data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-ms-sql-server"></a>

To connect Amazon Kendra to your Microsoft SQL Server data source you must provide details of your Microsoft SQL Server credentials so that Amazon Kendra can access your data. If you have not yet configured Microsoft SQL Server for Amazon Kendra see [Prerequisites](#prerequisites-ms-sql-server).

------
#### [ Console ]

**To connect Amazon Kendra to Microsoft SQL Server** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Microsoft SQL Server connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **Microsoft SQL Server connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. In **Source**, enter the following information:

   1.  **Host**—Enter the database host name.

   1.  **Port**—Enter the database port.

   1.  **Instance**—Enter the database instance.

   1. **Enable SSL certificate location**—Choose to enter the Amazon S3 path to your SSL certificate file.

   1. In **Authentication**, enter the following information:

      1. **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your Microsoft SQL Server authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

        1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

           1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-Microsoft SQL Server-’ is automatically added to your secret name.

           1. For **Database user name**, and **Password**—Enter the authentication credential values you copied from your database. 

        1. Choose **Save**.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. In **Sync scope**, choose from the following options:
      + **SQL query**—Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
**Note**  
If a table name includes special characters (non-alphanumeric), you must use square brackets around the table name. For example, *select \* from [my-database-table]*
      + **Primary key column**—Provide the primary key for the database table. This identifies a table within your database.
      + **Title column**—Provide the name of the document title column within your database table.
      + **Body column**—Provide the name of the document body column within your database table.

   1. In **Additional configuration – *optional***, choose from the following options to sync specific content instead of syncing all files:
      + **Change-detecting columns**—Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns.
      + **User IDs column**—Enter the name of the column which contains User IDs to be allowed access to content.
      + **Groups column**—Enter the name of the column that contains groups to be allowed access to content.
      + **Source URLs column**—Enter the name of the column which contains Source URLs to be indexed.
      + **Time stamps column**—Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content. 
      + **Time zones column**—Enter the name of the column which contains time zones for the content to be crawled.
      + **Time stamps format**—Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified sync: Index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. In **Sync run schedule**, for **Frequency**—How often Amazon Kendra will sync with your data source.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. Select from the generated default data source fields—**Document IDs**, **Document titles**, and **Source URLs**—that you want to map to your Amazon Kendra index.

   1. **Add field**—Add custom data source fields, and for each one specify an index field name to map to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to Microsoft SQL Server**

You must specify the following using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API:
+ **Data source**—Specify the data source type as `JDBC` when you use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **Database type**—You must specify the database type as `sqlserver`.
+ **SQL query**—Specify SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
**Note**  
If a table name includes special characters (non-alphanumeric), you must use square brackets around the table name. For example, *select \* from [my-database-table]*
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
  + `CHANGE_LOG` to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials you created in your Microsoft SQL Server account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "user name": "database user name",
      "password": "password"
  }
  ```
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Microsoft SQL Server connector and Amazon Kendra. For more information, see [IAM roles for Microsoft SQL Server data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
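
The SQL query requirements above (square brackets around table names with special characters, and a query size under 32KB) can be checked before you submit a query. The following is a minimal sketch, not part of the Amazon Kendra API. The helper names are hypothetical, and it assumes that 32KB means 32 × 1024 bytes of UTF-8 encoded text.

```python
import re

# Assumption: the documented "less than 32KB" limit is measured in UTF-8 bytes.
MAX_QUERY_BYTES = 32 * 1024


def quote_table_name(name: str) -> str:
    """Wrap a table name in square brackets if it has non-alphanumeric characters."""
    if re.fullmatch(r"[A-Za-z0-9]+", name):
        return name
    # Inside a bracketed SQL Server identifier, a literal "]" is escaped as "]]".
    return "[" + name.replace("]", "]]") + "]"


def check_query_size(query: str) -> str:
    """Return the query unchanged, or raise ValueError if it is 32KB or larger."""
    size = len(query.encode("utf-8"))
    if size >= MAX_QUERY_BYTES:
        raise ValueError(
            f"SQL query is {size} bytes; it must be less than {MAX_QUERY_BYTES}"
        )
    return query


query = check_query_size(f"SELECT * FROM {quote_table_name('my-database-table')}")
print(query)  # SELECT * FROM [my-database-table]
```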

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+  **Inclusion and exclusion filters**—You can specify whether to include specific content using user IDs, groups, source URLs, time stamps, and time zones. 
+  **User context filtering and access control**—Amazon Kendra crawls the access control list (ACL) for your documents, if you have an ACL for your documents. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#datasource-context-filter).
+  **Field mappings**—Choose to map your Microsoft SQL Server data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.

For a list of other important JSON keys to configure, see [Microsoft SQL Server template schema](ds-schemas.md#ds-ms-sql-server-schema).
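
As an illustration of how these settings fit together, the following sketch builds a template for a Microsoft SQL Server data source and passes it to `CreateDataSource` using the AWS SDK for Python (Boto3). The JSON key names (`connectionConfiguration`, `repositoryEndpointMetadata`, `dbType`, `additionalProperties`, and so on) are assumptions that you should confirm against the Microsoft SQL Server template schema. The host, ARN, and column values are placeholders, and the helper functions are hypothetical.

```python
import json

# API sync mode values and the console options they correspond to in this topic.
SYNC_MODES = {
    "FORCED_FULL_CRAWL": "Full sync",
    "FULL_CRAWL": "New, modified, deleted sync",
    "CHANGE_LOG": "New, modified sync",
}


def build_sqlserver_template(host, port, instance, secret_arn, sql_query,
                             primary_key, title_column, body_column,
                             sync_mode="FORCED_FULL_CRAWL"):
    """Return a template dict for a Microsoft SQL Server (JDBC) data source.

    All key names are assumptions; check them against the template schema.
    """
    if sync_mode not in SYNC_MODES:
        raise ValueError(f"sync_mode must be one of {sorted(SYNC_MODES)}")
    return {
        "type": "JDBC",
        "syncMode": sync_mode,
        "secretArn": secret_arn,
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {
                "dbType": "sqlserver",
                "dbHost": host,
                "dbPort": str(port),
                "dbInstance": instance,
            }
        },
        "additionalProperties": {
            "sqlQuery": sql_query,
            "primaryKey": primary_key,
            "titleColumn": title_column,
            "bodyColumn": body_column,
        },
    }


def create_data_source(index_id, name, role_arn, template):
    """Call CreateDataSource with Type TEMPLATE and return the data source ID."""
    import boto3  # imported lazily so the builder above runs without the AWS SDK

    kendra = boto3.client("kendra")
    response = kendra.create_data_source(
        IndexId=index_id,
        Name=name,
        Type="TEMPLATE",
        RoleArn=role_arn,
        Configuration={"TemplateConfiguration": {"Template": template}},
    )
    return response["Id"]


template = build_sqlserver_template(
    host="db.example.com", port=1433, instance="exampledb",
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    sql_query="SELECT id, title, body FROM documents",
    primary_key="id", title_column="title", body_column="body",
)
print(json.dumps(template, indent=2))
```

Keeping the template builder separate from the API call lets you inspect or validate the JSON before you create the data source.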

------

## Notes
<a name="ms-sql-server-notes"></a>
+ Deleted database rows will not be tracked when Amazon Kendra checks for updated content.
+ The size of field names and values in a row of your database can't exceed 400KB.
+ If you have a large amount of data in your database data source, and do not want Amazon Kendra to index all your database content after the first sync, you can choose to sync only new, modified, or deleted documents.
+ As a best practice, provide Amazon Kendra with read-only database credentials.
+ As a best practice, avoid adding tables with sensitive data or personally identifiable information (PII).

# Microsoft Teams
<a name="data-source-teams"></a>

Microsoft Teams is an enterprise collaboration tool for messaging, meetings and file sharing. If you are a Microsoft Teams user, you can use Amazon Kendra to index your Microsoft Teams data source.

You can connect Amazon Kendra to your Microsoft Teams data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) and the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API.

For troubleshooting your Amazon Kendra Microsoft Teams data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-teams)
+ [Prerequisites](#prerequisites-teams)
+ [Connection instructions](#data-source-procedure-teams)
+ [Learn more](#teams-learn-more)
+ [Notes](#teams-notes)

## Supported features
<a name="supported-features-teams"></a>
+ Field mappings
+ User access control
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-teams"></a>

Before you can use Amazon Kendra to index your Microsoft Teams data source, make these changes in your Microsoft Teams and AWS accounts.

**In Microsoft Teams, make sure you have:**
+ Created a Microsoft Teams account in Office 365.
+ Noted your Microsoft 365 tenant ID. You can find your tenant ID in the Properties of your Azure Active Directory Portal or in your OAuth application.
+ Configured an OAuth application in the Azure portal and noted the client ID and client secret or client credentials. See [Microsoft tutorial](https://learn.microsoft.com/en-us/power-apps/developer/data-platform/walkthrough-register-app-azure-active-directory) and [Registered app example](https://learn.microsoft.com/en-us/azure/healthcare-apis/register-application) for more information.
**Note**  
When you create or register an app in the Azure portal, the secret ID represents the actual secret value. You must take note or save the actual secret value immediately when creating the secret and app. You can access your secret by selecting the name of your application in the Azure portal and then navigating to the menu option on certificates and secrets.  
You can access your client ID by selecting the name of your application in the Azure portal and then navigating to the overview page. The Application (client) ID is the client ID.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ Added the necessary permissions. You can choose to add all permissions, or you can limit the scope by selecting fewer permissions based on which entities you'd like to crawl. The following table lists the application level permissions by corresponding entity:    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/data-source-teams.html)
+ Checked that each document is unique in Microsoft Teams and across other data sources you plan to use for the same index. Each data source that you want to use for an index must not contain the same document across the data sources. Document IDs are global to an index and must be unique per index.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Microsoft Teams authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Microsoft Teams data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-teams"></a>

To connect Amazon Kendra to your Microsoft Teams data source, you must provide the necessary details of your Microsoft Teams data source so that Amazon Kendra can access your data. If you have not yet configured Microsoft Teams for Amazon Kendra, see [Prerequisites](#prerequisites-teams).

------
#### [ Console ]

**To connect Amazon Kendra to Microsoft Teams** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Microsoft Teams connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **Microsoft Teams connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. **Tenant ID**—Enter your Microsoft 365 tenant ID. You can find your tenant ID in the Properties of your Azure Active Directory Portal or in your OAuth application.

   1. **Authorization**—Turn on or off access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).

   1. **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your Microsoft Teams authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

      1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

         1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-Microsoft Teams-’ is automatically added to your secret name.

         1. For **Client ID** and **Client secret**—Enter the authentication credentials configured in Microsoft Teams in the Azure portal.

      1. Save and add your secret.

   1. **Payment model**—You can choose a licensing and payment model for your Microsoft Teams account. Model A payment models are restricted to licensing and payment models that require security compliance. Model B payment models are suitable for licensing and payment models that do not require security compliance.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. **Sync contents**—Select the types of content to crawl. You can choose to crawl chat, teams, and calendar content.

   1. **Additional configuration**—Specify calendar start and end dates, user emails, team names, channel names, attachments, and OneNotes.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified sync: Index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. In **Sync run schedule**, for **Frequency**—Choose how often to sync your data source content and update your index.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. **Default data source fields**—Select from the Amazon Kendra generated default data source fields you want to map to your index. 

   1. **Add field**—Add custom data source fields, and for each one specify an index field name to map to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to Microsoft Teams**

You must specify a JSON of the [data source schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html) using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API. You must provide the following information:
+ **Data source**—Specify the data source type as `MSTEAMS` when you use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **Tenant ID**—You can find your tenant ID in the Properties of your Azure Active Directory Portal or in your OAuth application.
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
  + `CHANGE_LOG` to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials for your Microsoft Teams account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "clientId": "client ID",
      "clientSecret": "client secret"
  }
  ```
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Microsoft Teams connector and Amazon Kendra. For more information, see [IAM roles for Microsoft Teams data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
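
If you don't yet have a secret, the JSON structure above can be created with the AWS SDK for Python (Boto3). The following is a minimal sketch under stated assumptions: the helper names and the example credential values are hypothetical, and only the `clientId` and `clientSecret` keys come from this topic.

```python
import json


def build_teams_secret(client_id: str, client_secret: str) -> str:
    """Return the secret JSON string with the keys this topic describes."""
    if not client_id or not client_secret:
        raise ValueError("client_id and client_secret are both required")
    return json.dumps({"clientId": client_id, "clientSecret": client_secret})


def store_secret(name: str, secret_string: str) -> str:
    """Create the secret in AWS Secrets Manager and return its ARN."""
    import boto3  # imported lazily so build_teams_secret runs without the AWS SDK

    client = boto3.client("secretsmanager")
    response = client.create_secret(Name=name, SecretString=secret_string)
    return response["ARN"]


# Placeholder values; use the client ID and secret from your Azure OAuth application.
secret_string = build_teams_secret("example-client-id", "example-client-secret")
print(sorted(json.loads(secret_string)))  # ['clientId', 'clientSecret']
```

The ARN that `store_secret` returns is the value to provide as the secret ARN in the data source template.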

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+ **Document/content types**—Specify whether to crawl chat messages and attachments, channel posts and attachments, channel wikis, calendar content, and meeting chats, files, and notes.
+ **Calendar content**—Specify a start and end date-time to crawl calendar content.
+ **Inclusion and exclusion filters**—Specify whether to include or exclude certain content in Microsoft Teams. You can include or exclude team names, channel names, file names and file types, user email, OneNote sections, and OneNote pages.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+ **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.
+  **Field mappings**—Choose to map your Microsoft Teams data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.
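
The inclusion and exclusion filter rules described in the note above can be modeled in a few lines. This sketch only illustrates the stated precedence, where exclusion wins over inclusion. It is not how Amazon Kendra is implemented, and it assumes that a pattern matches when it is found anywhere in the name (`re.search`). The function name and sample file names are hypothetical.

```python
import re


def is_indexed(name, inclusion_patterns=(), exclusion_patterns=()):
    """Apply the filter precedence described in the note above."""
    # A document that matches an exclusion filter is never indexed,
    # even if it also matches an inclusion filter.
    if any(re.search(pattern, name) for pattern in exclusion_patterns):
        return False
    # With inclusion filters set, only matching documents are indexed.
    if inclusion_patterns:
        return any(re.search(pattern, name) for pattern in inclusion_patterns)
    # With no inclusion filter, everything not excluded is indexed.
    return True


files = ["report.pdf", "draft-report.pdf", "notes.txt"]
indexed = [
    f for f in files
    if is_indexed(f, inclusion_patterns=[r"\.pdf$"], exclusion_patterns=[r"^draft-"])
]
print(indexed)  # ['report.pdf']
```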

For a list of other important JSON keys to configure, see [Microsoft Teams template schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-msteams-schema).
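
The following sketch shows one way to assemble a Microsoft Teams template and the arguments for `CreateDataSource`, including the optional `VpcConfiguration`. The template key names (`connectionConfiguration`, `repositoryEndpointMetadata`, `tenantId`, `enableIdentityCrawler`, `isCrawlAcl`) are assumptions to confirm against the Microsoft Teams template schema, and the helper functions and sample values are hypothetical. The check on `CHANGE_LOG` reflects the restriction listed in the Notes for this connector.

```python
VALID_SYNC_MODES = {"FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG"}


def build_teams_template(tenant_id, secret_arn, sync_mode="FORCED_FULL_CRAWL",
                         use_acl=True, identity_crawler=True):
    """Return a template dict for a Microsoft Teams data source.

    All key names are assumptions; check them against the template schema.
    """
    if sync_mode not in VALID_SYNC_MODES:
        raise ValueError(f"sync_mode must be one of {sorted(VALID_SYNC_MODES)}")
    # Syncing only new or modified content is not available when ACLs are enabled.
    if use_acl and sync_mode == "CHANGE_LOG":
        raise ValueError("CHANGE_LOG is not available when ACLs are enabled")
    return {
        "type": "MSTEAMS",
        "syncMode": sync_mode,
        "secretArn": secret_arn,
        "enableIdentityCrawler": identity_crawler,
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {"tenantId": tenant_id}
        },
        "additionalProperties": {"isCrawlAcl": use_acl},
    }


def build_create_request(index_id, name, role_arn, template,
                         subnet_ids=None, security_group_ids=None):
    """Return keyword arguments for the Boto3 create_data_source call."""
    request = {
        "IndexId": index_id,
        "Name": name,
        "Type": "TEMPLATE",
        "RoleArn": role_arn,
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }
    # VpcConfiguration is optional; include it only when both lists are given.
    if subnet_ids and security_group_ids:
        request["VpcConfiguration"] = {
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        }
    return request


template = build_teams_template(
    tenant_id="example-tenant-id",
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    sync_mode="FULL_CRAWL",
)
request = build_create_request(
    index_id="example-index-id",
    name="teams-data-source",
    role_arn="arn:aws:iam::111122223333:role/example-kendra-role",
    template=template,
    subnet_ids=["subnet-example"],
    security_group_ids=["sg-example"],
)
print(sorted(request))
```

With Boto3 installed and credentials configured, you could then pass the result to `boto3.client("kendra").create_data_source(**request)`.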

------

## Learn more
<a name="teams-learn-more"></a>

To learn more about integrating Amazon Kendra with your Microsoft Teams data source, see:
+ [Intelligently search your organization’s Microsoft Teams data source with the Amazon Kendra connector for Microsoft Teams](https://aws.amazon.com/blogs/machine-learning/intelligently-search-your-organizations-microsoft-teams-data-source-with-the-amazon-kendra-connector-for-microsoft-teams/)

## Notes
<a name="teams-notes"></a>
+ When access control lists (ACLs) are enabled, the "Sync only new or modified content" option is not available due to Microsoft Teams API limitations. Use the "Full sync" or "New, modified, or deleted content sync" mode instead, or disable ACLs if you need to sync only new or modified content.

# Microsoft Yammer
<a name="data-source-yammer"></a>

**Note**  
Microsoft Yammer connector remains fully supported for existing customers through May 31, 2026. While this connector is no longer available for new users, current users can continue to use it without interruption. We are continuously evolving our connector portfolio to offer more scalable and customizable solutions. For future integrations, we recommend exploring the Amazon Kendra Custom Connector Framework, designed to support a broader range of enterprise use cases with enhanced flexibility.

Microsoft Yammer is an enterprise collaboration tool for messaging, meetings and file sharing. If you are a Microsoft Yammer user, you can use Amazon Kendra to index your Microsoft Yammer data source.

You can connect Amazon Kendra to your Microsoft Yammer data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) and the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API.

For troubleshooting your Amazon Kendra Microsoft Yammer data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

## Supported features
<a name="supported-features-yammer"></a>
+ Field mappings
+ User access control
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-yammer"></a>

Before you can use Amazon Kendra to index your Microsoft Yammer data source, make these changes in your Microsoft Yammer and AWS accounts.

**In Microsoft Yammer, make sure you have:**
+ Created a Microsoft Yammer administrative account in Office 365.
+ Noted your Microsoft Yammer user name and password.
+ Noted your Microsoft 365 tenant ID. You can find your tenant ID in the Properties of your Azure Active Directory Portal or in your OAuth application.
+ Configured an OAuth application in the Azure portal and noted the client ID and client secret or client credentials. See [Microsoft tutorial](https://learn.microsoft.com/en-us/power-apps/developer/data-platform/walkthrough-register-app-azure-active-directory) and [Registered app example](https://learn.microsoft.com/en-us/azure/healthcare-apis/register-application) for more information.
**Note**  
When you create or register an app in the Azure portal, the secret ID represents the actual secret value. You must take note or save the actual secret value immediately when creating the secret and app. You can access your secret by selecting the name of your application in the Azure portal and then navigating to the menu option on certificates and secrets.  
You can access your client ID by selecting the name of your application in the Azure portal and then navigating to the overview page. The Application (client) ID is the client ID.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ Checked that each document is unique in Microsoft Yammer and across other data sources you plan to use for the same index. Each data source that you want to use for an index must not contain the same document across the data sources. Document IDs are global to an index and must be unique per index.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Microsoft Yammer authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Microsoft Yammer data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-yammer"></a>

To connect Amazon Kendra to your Microsoft Yammer data source, you must provide the necessary details of your Microsoft Yammer data source so that Amazon Kendra can access your data. If you have not yet configured Microsoft Yammer for Amazon Kendra, see [Prerequisites](#prerequisites-yammer).

------
#### [ Console ]

**To connect Amazon Kendra to Microsoft Yammer** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Microsoft Yammer connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **Microsoft Yammer connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. **Authorization**—Turn on or off access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).

   1. **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your Microsoft Yammer authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

      1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

         1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-Microsoft Yammer-’ is automatically added to your secret name.

         1. For **Username**, **Password**—Enter your Microsoft Yammer user name and password.

         1. For **Client ID**, **Client secret**—Enter the authentication credentials configured in Microsoft Yammer in the Azure portal.

      1. Save and add your secret.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. **Since date**—Specify the date to begin crawling your data in Microsoft Yammer.

   1. **Sync contents**—Select the type of content to crawl. For example, public messages, private messages, and attachments.

   1. **Additional configuration**—Specify certain community names you want to crawl, and also use regular expression patterns to include or exclude certain content.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified sync: Index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. In **Sync run schedule**, for **Frequency**—Choose how often to sync your data source content and update your index.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. **Default data source fields**—Select from the Amazon Kendra generated default data source fields you want to map to your index. 

   1. **Add field**—Add custom data source fields by specifying an index field name to map to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API  ]

**To connect Amazon Kendra to Microsoft Yammer**

You must specify a JSON of the [data source schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html) using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API. You must provide the following information:
+ **Data source**—Specify the data source type as `YAMMER` when you use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
  + `CHANGE_LOG` to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials for your Microsoft Yammer account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "username": "user name",
      "password": "password",
      "clientId": "client ID",
      "clientSecret": "client secret"
  }
  ```
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Microsoft Yammer connector and Amazon Kendra. For more information, see [IAM roles for Microsoft Yammer data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
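
As a minimal sketch of how the required items above fit together, the following Python builds the secret JSON and the request parameters for `CreateDataSource`. The data source name, index ID, and ARNs are placeholders. The top-level template keys `type`, `syncMode`, and `secretArn` are assumptions that should be checked against the Microsoft Yammer template schema. The sketch only builds dictionaries and does not call AWS.

```python
import json

REQUIRED_SECRET_KEYS = ("username", "password", "clientId", "clientSecret")
SYNC_MODES = ("FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG")


def build_yammer_secret(username, password, client_id, client_secret):
    """Return the secret string in the JSON structure shown above."""
    secret = {
        "username": username,
        "password": password,
        "clientId": client_id,
        "clientSecret": client_secret,
    }
    # Reject empty values so an incomplete secret is caught early.
    missing = [key for key in REQUIRED_SECRET_KEYS if not secret[key]]
    if missing:
        raise ValueError(f"Missing secret values: {missing}")
    return json.dumps(secret)


def build_yammer_request(index_id, role_arn, secret_arn,
                         sync_mode="FORCED_FULL_CRAWL",
                         name="my-yammer-data-source"):
    """Assemble CreateDataSource parameters for a template data source."""
    if sync_mode not in SYNC_MODES:
        raise ValueError(f"Unknown sync mode: {sync_mode}")
    # Assumed template key names; verify against the Yammer template schema.
    template = {
        "type": "YAMMER",
        "syncMode": sync_mode,
        "secretArn": secret_arn,
    }
    return {
        "Name": name,  # placeholder name
        "IndexId": index_id,
        "Type": "TEMPLATE",
        "RoleArn": role_arn,
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }
```

With the AWS SDK for Python (Boto3), a dictionary like this could be passed as keyword arguments to the Amazon Kendra client's `create_data_source` method. A complete template usually needs further keys from the schema.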

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+ **Document/content types**—Specify whether to crawl community content, messages and attachments, and private messages.
+ **Inclusion and exclusion filters**—Specify whether to include or exclude certain content.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+ **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.
+  **Field mappings**—Choose to map your Microsoft Yammer data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.

For a list of other important JSON keys to configure, see [Microsoft Yammer template schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-schema-yammer).
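
The inclusion and exclusion filter rules in the note above can be sketched in Python as follows. This is an illustration of the stated precedence only, not the connector's implementation. The function name is hypothetical, and the use of `re.search` (a match anywhere in the string) is an assumption, because the text does not say how patterns are anchored.

```python
import re


def is_indexed(name, inclusion_patterns=(), exclusion_patterns=()):
    """Return True if content with this name would be indexed."""
    # An exclusion match wins, even if an inclusion pattern also matches.
    if any(re.search(pattern, name) for pattern in exclusion_patterns):
        return False
    # When inclusion patterns exist, only matching content is indexed.
    if inclusion_patterns:
        return any(re.search(pattern, name) for pattern in inclusion_patterns)
    # With no inclusion patterns, everything not excluded is indexed.
    return True
```

For example, with the inclusion pattern `report` and the exclusion pattern `draft`, `annual-report` is indexed, while `draft-report` and `meeting-notes` are not.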

------

## Learn more
<a name="yammer-learn-more"></a>

To learn more about integrating Amazon Kendra with your Microsoft Yammer data source, see:
+ [Announcing the Yammer connector for Amazon Kendra](https://aws.amazon.com/blogs/machine-learning/announcing-the-yammer-connector-for-amazon-kendra/)

# MySQL
<a name="data-source-mysql"></a>

**Note**  
MySQL connector remains fully supported for existing customers through May 31, 2026. While this connector is no longer available for new users, current users can continue to use it without interruption. We are continuously evolving our connector portfolio to offer more scalable and customizable solutions. For future integrations, we recommend exploring the Amazon Kendra Custom Connector Framework[1], designed to support a broader range of enterprise use cases with enhanced flexibility.

MySQL is an open source relational database management system. If you are a MySQL user, you can use Amazon Kendra to index your MySQL data source. The Amazon Kendra MySQL data source connector supports MySQL 8.0.21.

You can connect Amazon Kendra to your MySQL data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) and the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API.

For troubleshooting your Amazon Kendra MySQL data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-mysql)
+ [Prerequisites](#prerequisites-mysql)
+ [Connection instructions](#data-source-procedure-mysql)
+ [Notes](#mysql-notes)

## Supported features
<a name="supported-features-mysql"></a>
+ Field mappings
+ User context filtering
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-mysql"></a>

Before you can use Amazon Kendra to index your MySQL data source, make these changes in your MySQL and AWS accounts.

**In MySQL, make sure you have:**
+ Noted your database user name and password.
**Important**  
As a best practice, provide Amazon Kendra with read-only database credentials.
+ Copied your database host URL, port, and instance.
+ Checked each document is unique in MySQL and across other data sources you plan to use for the same index. Each data source that you want to use for an index must not contain the same document across the data sources. Document IDs are global to an index and must be unique per index.
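
Because document IDs must be unique per index, it can help to check for overlaps before you add a data source. The following is a minimal sketch, assuming you can export the document IDs (for example, primary key values) of each data source yourself. The function name and the input shape are hypothetical.

```python
from collections import defaultdict


def find_shared_document_ids(ids_by_source):
    """Map each document ID found in more than one source to those sources.

    ids_by_source: dict of data source name -> iterable of document IDs.
    """
    sources_by_id = defaultdict(set)
    for source, document_ids in ids_by_source.items():
        for document_id in document_ids:
            sources_by_id[document_id].add(source)
    # Keep only the IDs that appear in two or more data sources.
    return {
        document_id: sorted(sources)
        for document_id, sources in sources_by_id.items()
        if len(sources) > 1
    }
```

An empty result means no document ID is shared between the data sources you checked.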

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your MySQL authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your MySQL data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-mysql"></a>

To connect Amazon Kendra to your MySQL data source, you must provide details of your MySQL credentials so that Amazon Kendra can access your data. If you have not yet configured MySQL for Amazon Kendra, see [Prerequisites](#prerequisites-mysql).

------
#### [ Console ]

**To connect Amazon Kendra to MySQL** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **MySQL connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **MySQL connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. In **Source**, enter the following information:

      1. **Host**—Enter the database host name.

      1. **Port**—Enter the database port.

      1. **Instance**—Enter the database instance.

   1. **Enable SSL certificate location**—Choose to enter the Amazon S3 path to your SSL certificate file.

   1. In **Authentication**, enter the following information:

      1. **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your MySQL authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

         1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

            1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-MySQL-’ is automatically added to your secret name.

            1. For **Database user name** and **Password**—Enter the authentication credential values you copied from your database.

         1. Choose **Save**.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. In **Sync scope**, choose from the following options:
      + **SQL query**—Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
      + **Primary key column**—Provide the primary key for the database table. The primary key uniquely identifies each row in the table.
      + **Title column**—Provide the name of the document title column within your database table.
      + **Body column**—Provide the name of the document body column within your database table.

   1. In **Additional configuration – *optional***, choose from the following options to sync specific content instead of syncing all files:
      + **Change-detecting columns**—Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns.
      + **Users' IDs column**—Enter the name of the column which contains User IDs to be allowed access to content.
      + **Groups column**—Enter the name of the column that contains groups to be allowed access to content.
      + **Source URLs column**—Enter the name of the column which contains Source URLs to be indexed.
      + **Time stamps column**—Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content. 
      + **Time zones column**—Enter the name of the column which contains time zones for the content to be crawled.
      + **Time stamps format**—Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified sync: Index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. In **Sync run schedule**, for **Frequency**—Choose how often Amazon Kendra syncs with your data source.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. Select the generated default data source fields (**Document IDs**, **Document titles**, and **Source URLs**) that you want to map to your Amazon Kendra index.

   1. **Add field**—Add custom data source fields by specifying an index field name to map to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to MySQL**

You must specify the following using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API:
+ **Data source**—Specify the data source type as `JDBC` when you use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **Database type**—You must specify the database type as `mySql`.
+ **SQL query**—Specify SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
  + `CHANGE_LOG` to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials you created in your MySQL account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "user name": "database user name",
      "password": "password"
  }
  ```
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the MySQL connector and Amazon Kendra. For more information, see [IAM roles for MySQL data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
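
The sync modes listed above correspond to the sync mode options in the console procedure. The following sketch maps the console labels to the API values and places them in a template dictionary for a MySQL data source. The nested keys `connectionConfiguration`, `repositoryEndpointMetadata`, `dbType`, `additionalProperties`, and `sqlQuery` are assumptions about the template layout and should be verified against the data source template schema. The function name and arguments are hypothetical.

```python
# Console sync mode label -> API sync mode value, as described above.
CONSOLE_TO_API_SYNC_MODE = {
    "Full sync": "FORCED_FULL_CRAWL",
    "New, modified, deleted sync": "FULL_CRAWL",
    "New, modified sync": "CHANGE_LOG",
}


def build_mysql_template(secret_arn, sql_query, console_sync_mode="Full sync"):
    """Return a partial template dictionary for a MySQL (JDBC) data source."""
    if console_sync_mode not in CONSOLE_TO_API_SYNC_MODE:
        raise ValueError(f"Unknown sync mode label: {console_sync_mode}")
    return {
        "type": "JDBC",
        "syncMode": CONSOLE_TO_API_SYNC_MODE[console_sync_mode],
        "secretArn": secret_arn,
        # Assumed nesting and key names; check the template schema.
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {"dbType": "mySql"},
        },
        "additionalProperties": {"sqlQuery": sql_query},
    }
```

The returned dictionary would be the `Template` value inside `TemplateConfiguration` when you call `CreateDataSource` with the type `TEMPLATE`.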

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+  **Inclusion and exclusion filters**—You can specify whether to include specific content using user IDs, groups, source URLs, time stamps, and time zones. 
+  **User context filtering and access control**—Amazon Kendra crawls the access control list (ACL) for your documents, if you have an ACL for your documents. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#datasource-context-filter).
+  **Field mappings**—Choose to map your MySQL data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.

------

## Notes
<a name="mysql-notes"></a>
+ Deleted database rows will not be tracked when Amazon Kendra checks for updated content.
+ The size of field names and values in a row of your database can't exceed 400KB.
+ If you have a large amount of data in your database data source, and do not want Amazon Kendra to index all your database content after the first sync, you can choose to sync only new, modified, or deleted documents.
+ As a best practice, provide Amazon Kendra with read-only database credentials.
+ As a best practice, avoid adding tables with sensitive data or personally identifiable information (PII).
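
The size limits mentioned in this topic (SQL queries under 32KB, and field names and values in a row under 400KB) can be checked before a sync. The following is a rough sketch under stated assumptions: it treats 1KB as 1,024 bytes and measures sizes as UTF-8 bytes of the string form of each name and value. How Amazon Kendra measures these sizes is not specified here, so treat the results as an estimate.

```python
MAX_QUERY_BYTES = 32 * 1024  # assumes 1KB = 1,024 bytes
MAX_ROW_BYTES = 400 * 1024   # assumes 1KB = 1,024 bytes


def query_within_limit(sql_query):
    """Return True if the query is smaller than the 32KB limit."""
    return len(sql_query.encode("utf-8")) < MAX_QUERY_BYTES


def row_within_limit(row):
    """Return True if field names plus values do not exceed 400KB.

    row: dict of field name -> value, as returned for one database row.
    """
    size = sum(
        len(str(name).encode("utf-8")) + len(str(value).encode("utf-8"))
        for name, value in row.items()
    )
    return size <= MAX_ROW_BYTES
```

Rows that fail the check could be excluded in the SQL query, or their large columns left out of the `SELECT` list.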

# Oracle Database
<a name="data-source-oracle-database"></a>

**Note**  
Oracle Database connector remains fully supported for existing customers through May 31, 2026. While this connector is no longer available for new users, current users can continue to use it without interruption. We are continuously evolving our connector portfolio to offer more scalable and customizable solutions. For future integrations, we recommend exploring the Amazon Kendra Custom Connector Framework[1], designed to support a broader range of enterprise use cases with enhanced flexibility.

Oracle Database is a database management system. If you are an Oracle Database user, you can use Amazon Kendra to index your Oracle Database data source. The Amazon Kendra Oracle Database data source connector supports Oracle Database 18c, 19c, and 21c.

You can connect Amazon Kendra to your Oracle Database data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) and the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API.

For troubleshooting your Amazon Kendra Oracle Database data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-oracle-database)
+ [Prerequisites](#prerequisites-oracle-database)
+ [Connection instructions](#data-source-procedure-oracle-database)
+ [Notes](#oracle-database-notes)

## Supported features
<a name="supported-features-oracle-database"></a>
+ Field mappings
+ User context filtering
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-oracle-database"></a>

Before you can use Amazon Kendra to index your Oracle Database data source, make these changes in your Oracle Database and AWS accounts.

**In Oracle Database, make sure you have:**
+ Noted your database user name and password.
**Important**  
As a best practice, provide Amazon Kendra with read-only database credentials.
+ Copied your database host URL, port, and instance.
+ Checked each document is unique in Oracle Database and across other data sources you plan to use for the same index. Each data source that you want to use for an index must not contain the same document across the data sources. Document IDs are global to an index and must be unique per index.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Oracle Database authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Oracle Database data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-oracle-database"></a>

To connect Amazon Kendra to your Oracle Database data source, you must provide details of your Oracle Database credentials so that Amazon Kendra can access your data. If you have not yet configured Oracle Database for Amazon Kendra, see [Prerequisites](#prerequisites-oracle-database).

------
#### [ Console ]

**To connect Amazon Kendra to Oracle Database** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Oracle Database connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **Oracle Database connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. In **Source**, enter the following information:

   1.  **Host**—Enter the database host name.

   1.  **Port**—Enter the database port.

   1.  **Instance**—Enter the database instance.

   1. **Enable SSL certificate location**—Choose to enter the Amazon S3 path to your SSL certificate file.

   1. In **Authentication**—enter the following information:

      1. **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your Oracle Database authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

        1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

           1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-Oracle Database-’ is automatically added to your secret name.

           1. For **Database user name**, and **Password**—Enter the authentication credential values you copied from your database. 

        1. Choose **Save**.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. In **Sync scope**, choose from the following options:
      + **SQL query**—Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
      + **Primary key column**—Provide the primary key for the database table. This identifies a table within your database.
      + **Title column**—Provide the name of the document title column within your database table.
      + **Body column**—Provide the name of the document body column within your database table.

   1. In **Additional configuration – *optional***, choose from the following options to sync specific content instead of syncing all files:
      + **Change-detecting columns**—Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns.
      + **User IDs column**—Enter the name of the column which contains User IDs to be allowed access to content.
      + **Groups column**—Enter the name of the column that contains groups to be allowed access to content.
      + **Source URLs column**—Enter the name of the column which contains Source URLs to be indexed.
      + **Time stamps column**—Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content. 
      + **Time zones column**—Enter the name of the column which contains time zones for the content to be crawled.
      + **Time stamps format**—Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified sync: Index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. In **Sync run schedule**, for **Frequency**—Choose how often Amazon Kendra syncs with your data source.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. Select from the generated default data source fields—**Document IDs**, **Document titles**, and **Source URLs**—that you want to map to your Amazon Kendra index.

   1.  **Add field**—Add custom data source fields by specifying an index field name to map to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to Oracle Database**

You must specify the following using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API:
+ **Data source**—Specify the data source type as `JDBC` when you use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **Database type**—You must specify the database type as `oracle`.
+ **SQL query**—Specify SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
  + `CHANGE_LOG` to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials you created in your Oracle Database account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "user name": "database user name",
      "password": "password"
  }
  ```
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Oracle Database connector and Amazon Kendra. For more information, see [IAM roles for Oracle Database data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
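
The secret structure above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the key names are copied from the JSON structure shown in this section, while the function name `build_database_secret` and the sample values are hypothetical.

```python
import json


def build_database_secret(user_name: str, password: str) -> str:
    """Return the secret value as a JSON string with the two keys shown above."""
    if not user_name or not password:
        raise ValueError("Both the database user name and the password are required.")
    # Key names mirror the JSON structure documented in this section.
    return json.dumps({"user name": user_name, "password": password})


# Placeholder values for illustration only; never hard-code real credentials.
secret_string = build_database_secret("kendra_readonly", "example-password")
print(secret_string)
```

The resulting string is the kind of value you could store as the content of your Secrets Manager secret.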

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+  **Inclusion and exclusion filters**—You can specify whether to include specific content using user IDs, groups, source URLs, time stamps, and time zones. 
+  **User context filtering and access control**—Amazon Kendra crawls the access control list (ACL) for your documents, if you have an ACL for your documents. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#datasource-context-filter).
+  **Field mappings**—Choose to map your Oracle Database data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.

For a list of other important JSON keys to configure, see [Oracle Database template schema](ds-schemas.md#ds-oracle-database-schema).
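
The required settings listed above can be sketched as a small Python helper that assembles a template and checks two constraints stated in this section: the sync mode must be one of the three listed values, and the SQL query must be less than 32KB. Treat this as a hedged sketch. The JSON key names (`connectionConfiguration`, `repositoryEndpointMetadata`, `dbHost`, `additionalProperties`, and so on) are assumptions about the template layout and should be verified against the Oracle Database template schema. The function name, host, ARN, and column names are hypothetical, and "32KB" is assumed to mean 32 * 1024 bytes of UTF-8 text.

```python
import json

ALLOWED_SYNC_MODES = {"FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG"}
MAX_SQL_QUERY_BYTES = 32 * 1024  # assumption: "32KB" means 32 * 1024 bytes


def build_oracle_template(*, host, port, instance, secret_arn, sql_query,
                          primary_key, title_column, body_column,
                          sync_mode="FORCED_FULL_CRAWL"):
    """Assemble a template dictionary for an Oracle Database data source."""
    if sync_mode not in ALLOWED_SYNC_MODES:
        raise ValueError(f"Unsupported sync mode: {sync_mode!r}")
    if len(sql_query.encode("utf-8")) >= MAX_SQL_QUERY_BYTES:
        raise ValueError("SQL queries must be less than 32KB.")
    return {
        "type": "JDBC",  # data source type stated in this section
        "syncMode": sync_mode,
        "secretArn": secret_arn,
        # Key names below are assumptions; check them against the schema.
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {
                "dbType": "oracle",  # database type stated in this section
                "dbHost": host,
                "dbPort": str(port),  # assumption: the port is passed as a string
                "dbInstance": instance,
            }
        },
        "additionalProperties": {
            "primaryKey": primary_key,
            "titleColumn": title_column,
            "bodyColumn": body_column,
            "sqlQuery": sql_query,
        },
    }


# Hypothetical values for illustration.
template = build_oracle_template(
    host="oracle.example.com",
    port=1521,
    instance="ORCL",
    secret_arn="arn:aws:secretsmanager:us-west-2:111122223333:secret:example",
    sql_query="SELECT id, title, body FROM documents",
    primary_key="id",
    title_column="title",
    body_column="body",
    sync_mode="CHANGE_LOG",
)
print(json.dumps(template, indent=2))
```

Validating the sync mode and query size before you call the API surfaces configuration mistakes locally instead of at sync time.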

------

## Notes
<a name="oracle-database-notes"></a>
+ Deleted database rows will not be tracked when Amazon Kendra checks for updated content.
+ The size of field names and values in a row of your database can't exceed 400KB.
+ If you have a large amount of data in your database data source, and do not want Amazon Kendra to index all your database content after the first sync, you can choose to sync only new, modified, or deleted documents.
+ As a best practice, provide Amazon Kendra with read-only database credentials.
+ As a best practice, avoid adding tables with sensitive data or personally identifiable information (PII).
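
To make the 400KB row limit above concrete, the following Python sketch estimates the combined size of the field names and values in a row. How Amazon Kendra measures row size is not specified here, so this is only an approximation: it assumes UTF-8 encoding and that "400KB" means 400 * 1024 bytes. The function names and sample rows are hypothetical.

```python
MAX_ROW_BYTES = 400 * 1024  # assumption: "400KB" means 400 * 1024 bytes


def row_size_bytes(row: dict) -> int:
    """Approximate the size of all field names and values, encoded as UTF-8."""
    total = 0
    for name, value in row.items():
        total += len(str(name).encode("utf-8"))
        if isinstance(value, (bytes, bytearray)):
            total += len(value)
        elif value is not None:
            total += len(str(value).encode("utf-8"))
    return total


def exceeds_row_limit(row: dict) -> bool:
    """Return True when the approximate row size is over the limit."""
    return row_size_bytes(row) > MAX_ROW_BYTES


small_row = {"id": 1, "title": "Quarterly report", "body": "Summary text"}
large_row = {"id": 2, "title": "Archive", "body": "x" * (500 * 1024)}

print(exceeds_row_limit(small_row))
print(exceeds_row_limit(large_row))
```

A check like this could run against a sample of rows returned by your SQL query before you configure the data source.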

# PostgreSQL
<a name="data-source-postgresql"></a>

**Note**  
PostgreSQL connector remains fully supported for existing customers through May 31, 2026. While this connector is no longer available for new users, current users can continue to use it without interruption. We are continuously evolving our connector portfolio to offer more scalable and customizable solutions. For future integrations, we recommend exploring the Amazon Kendra Custom Connector Framework[1], designed to support a broader range of enterprise use cases with enhanced flexibility.

PostgreSQL is an open source database management system. If you are a PostgreSQL user, you can use Amazon Kendra to index your PostgreSQL data source. The Amazon Kendra PostgreSQL data source connector supports PostgreSQL 9.6.

You can connect Amazon Kendra to your PostgreSQL data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) and the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API.

For troubleshooting your Amazon Kendra PostgreSQL data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-postgresql)
+ [Prerequisites](#prerequisites-postgresql)
+ [Connection instructions](#data-source-procedure-postgresql)
+ [Notes](#postgresql-notes)

## Supported features
<a name="supported-features-postgresql"></a>
+ Field mappings
+ User context filtering
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-postgresql"></a>

Before you can use Amazon Kendra to index your PostgreSQL data source, make these changes in your PostgreSQL and AWS accounts.

**In PostgreSQL, make sure you have:**
+ Noted your database user name and password.
**Important**  
As a best practice, provide Amazon Kendra with read-only database credentials.
+ Copied your database host URL, port, and instance.
+ Checked each document is unique in PostgreSQL and across other data sources you plan to use for the same index. Each data source that you want to use for an index must not contain the same document across the data sources. Document IDs are global to an index and must be unique per index.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your PostgreSQL authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your PostgreSQL data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-postgresql"></a>

To connect Amazon Kendra to your PostgreSQL data source, you must provide details of your PostgreSQL credentials so that Amazon Kendra can access your data. If you have not yet configured PostgreSQL for Amazon Kendra, see [Prerequisites](#prerequisites-postgresql).

------
#### [ Console ]

**To connect Amazon Kendra to PostgreSQL** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **PostgreSQL connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **PostgreSQL connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. In **Source**, enter the following information:

   1.  **Host**—Enter the database host name.

   1.  **Port**—Enter the database port.

   1.  **Instance**—Enter the database instance.

   1. **Enable SSL certificate location**—Choose to enter the Amazon S3 path to your SSL certificate file.

   1. In **Authentication**—enter the following information:

      1. **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your PostgreSQL authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

        1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

           1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-PostgreSQL-’ is automatically added to your secret name.

           1. For **Database user name**, and **Password**—Enter the authentication credential values you copied from your database. 

        1. Choose **Save**.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. In **Sync scope**, choose from the following options:
      + **SQL query**—Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
      + **Primary key column**—Provide the primary key for the database table. This identifies a table within your database.
      + **Title column**—Provide the name of the document title column within your database table.
      + **Body column**—Provide the name of the document body column within your database table.

   1. In **Additional configuration – *optional***, choose from the following options to sync specific content instead of syncing all files:
      + **Change-detecting columns**—Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns.
      + **Users' IDs column**—Enter the name of the column which contains User IDs to be allowed access to content.
      + **Groups column**—Enter the name of the column that contains groups to be allowed access to content.
      + **Source URLs column**—Enter the name of the column which contains Source URLs to be indexed.
      + **Time stamps column**—Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content. 
      + **Time zones column**—Enter the name of the column which contains time zones for the content to be crawled.
      + **Time stamps format**—Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified sync: Index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. In **Sync run schedule**, for **Frequency**—Choose how often Amazon Kendra syncs with your data source.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. Select from the generated default data source fields—**Document IDs**, **Document titles**, and **Source URLs**—that you want to map to your Amazon Kendra index.

   1.  **Add field**—Add custom data source fields by specifying an index field name to map to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to PostgreSQL**

You must specify the following using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) API:
+ **Data source**—Specify the data source type as `JDBC` when you use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **Database type**—You must specify the database type as `postgresql`.
+ **SQL query**—Specify SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
  + `CHANGE_LOG` to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials you created in your PostgreSQL account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "user name": "database user name",
      "password": "password"
  }
  ```
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the PostgreSQL connector and Amazon Kendra. For more information, see [IAM roles for PostgreSQL data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
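
The three API sync mode values above correspond to the sync mode options offered in the console. The following Python sketch pairs them up. The pairing is inferred from the matching descriptions in this section rather than from a documented mapping, and the helper name `api_sync_mode` is hypothetical.

```python
# Console option label -> API sync mode value, paired by matching descriptions.
CONSOLE_TO_API_SYNC_MODE = {
    "Full sync": "FORCED_FULL_CRAWL",
    "New, modified sync": "CHANGE_LOG",
    "New, modified, deleted sync": "FULL_CRAWL",
}


def api_sync_mode(console_label: str) -> str:
    """Return the API sync mode value for a console sync mode label."""
    try:
        return CONSOLE_TO_API_SYNC_MODE[console_label]
    except KeyError:
        raise ValueError(f"Unknown sync mode option: {console_label!r}") from None


print(api_sync_mode("New, modified sync"))
```

Note that `FULL_CRAWL`, despite its name, is the value described as indexing only new, modified, and deleted content, while `FORCED_FULL_CRAWL` is the one that re-indexes everything.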

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+  **Inclusion and exclusion filters**—You can specify whether to include specific content using user IDs, groups, source URLs, time stamps, and time zones. 
+  **User context filtering and access control**—Amazon Kendra crawls the access control list (ACL) for your documents, if you have an ACL for your documents. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#datasource-context-filter).
+  **Field mappings**—Choose to map your PostgreSQL data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.

For a list of other important JSON keys to configure, see [PostgreSQL template schema](ds-schemas.md#ds-postgresql-schema).
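
As a sketch of how the pieces above fit together, the following Python builds the request parameters for `CreateDataSource`, with the data source type set to `TEMPLATE`, a field mapping for `_document_body`, and an optional `VpcConfiguration`. The field mapping key names (`indexFieldName`, `indexFieldType`, `dataSourceFieldName`) and the nesting under `repositoryConfigurations` are assumptions to verify against the PostgreSQL template schema. The template is deliberately partial, and all IDs, ARNs, and column names are hypothetical.

```python
import json

# Hypothetical column names from an example PostgreSQL table.
field_mappings = [
    {
        "indexFieldName": "_document_body",  # required mapping, per the note above
        "indexFieldType": "STRING",
        "dataSourceFieldName": "body",
    },
    {
        "indexFieldName": "_document_title",
        "indexFieldType": "STRING",
        "dataSourceFieldName": "title",
    },
]

# Partial template: other required keys are omitted for brevity.
template = {
    "type": "JDBC",
    "connectionConfiguration": {
        "repositoryEndpointMetadata": {"dbType": "postgresql"}
    },
    "repositoryConfigurations": {"document": {"fieldMappings": field_mappings}},
}


def build_create_data_source_request(index_id, name, role_arn, template,
                                     vpc_configuration=None):
    """Build keyword arguments for a CreateDataSource call."""
    request = {
        "IndexId": index_id,
        "Name": name,
        "Type": "TEMPLATE",
        "RoleArn": role_arn,
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }
    if vpc_configuration is not None:
        request["VpcConfiguration"] = vpc_configuration
    return request


request = build_create_data_source_request(
    index_id="example-index-id",
    name="postgresql-docs",
    role_arn="arn:aws:iam::111122223333:role/ExampleKendraDataSourceRole",
    template=template,
    vpc_configuration={
        "SubnetIds": ["subnet-0example"],
        "SecurityGroupIds": ["sg-0example"],
    },
)
print(json.dumps(request, indent=2))

# With boto3 installed and AWS credentials configured, the call would look like:
#   import boto3
#   boto3.client("kendra").create_data_source(**request)
```

Building the request as a plain dictionary first lets you inspect or log the configuration before sending it.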

------

## Notes
<a name="postgresql-notes"></a>
+ Deleted database rows will not be tracked when Amazon Kendra checks for updated content.
+ The size of field names and values in a row of your database can't exceed 400KB.
+ If you have a large amount of data in your database data source, and do not want Amazon Kendra to index all your database content after the first sync, you can choose to sync only new, modified, or deleted documents.
+ As a best practice, provide Amazon Kendra with read-only database credentials.
+ As a best practice, avoid adding tables with sensitive data or personally identifiable information (PII).

# Quip
<a name="data-source-quip"></a>

**Note**  
Quip connector remains fully supported for existing customers through May 31, 2026. While this connector is no longer available for new users, current users can continue to use it without interruption. We are continuously evolving our connector portfolio to offer more scalable and customizable solutions. For future integrations, we recommend exploring the Amazon Kendra Custom Connector Framework[1], designed to support a broader range of enterprise use cases with enhanced flexibility.

Quip is collaborative productivity software that offers real-time document-authoring capabilities. You can use Amazon Kendra to index your Quip folders, files, file comments, chatrooms, and attachments.

You can connect Amazon Kendra to your Quip data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) and the [QuipConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_QuipConfiguration.html) API.

For troubleshooting your Amazon Kendra Quip data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-quip)
+ [Prerequisites](#prerequisites-quip)
+ [Connection instructions](#data-source-procedure-quip)
+ [Learn more](#quip-learn-more)

## Supported features
<a name="supported-features-quip"></a>

Amazon Kendra Quip data source connector supports the following features:
+ Field mappings
+ User access control
+ Inclusion/exclusion filters
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-quip"></a>

Before you can use Amazon Kendra to index your Quip data source, make these changes in your Quip and AWS accounts.

**In Quip, make sure you have:**
+ A Quip account with administrative permissions.
+ Created Quip authentication credentials that include a personal access token. The token is used as your authentication credential stored in an AWS Secrets Manager secret. See [Quip documentation on authentication](https://quip.com/dev/admin/documentation/current#section/Authentication) for more information.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ Copied your Quip site domain. For example, *https://quip-company.quipdomain.com/browse* where *quipdomain* is the domain.
+ Checked each document is unique in Quip and across other data sources you plan to use for the same index. Each data source that you want to use for an index must not contain the same document across the data sources. Document IDs are global to an index and must be unique per index.
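
The site domain example above can be sketched in Python. This helper pulls the domain out of a Quip site URL, assuming the host always has the form `<site>.<domain>.<tld>` with a single-label top-level domain, as in the example. The function name `quip_domain_from_url` is hypothetical.

```python
from urllib.parse import urlparse


def quip_domain_from_url(site_url: str) -> str:
    """Return the domain label from a Quip site URL such as the example above."""
    host = urlparse(site_url).hostname
    if not host:
        raise ValueError(f"No host name found in URL: {site_url!r}")
    labels = host.split(".")
    if len(labels) < 3:
        raise ValueError(f"Expected a host like site.domain.tld, got: {host!r}")
    # Assumption: the domain is the label immediately before the top-level domain.
    return labels[-2]


print(quip_domain_from_url("https://quip-company.quipdomain.com/browse"))
```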

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Quip authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Quip data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-quip"></a>

To connect Amazon Kendra to your Quip data source, you must provide the necessary details of your Quip data source so that Amazon Kendra can access your data. If you have not yet configured Quip for Amazon Kendra, see [Prerequisites](#prerequisites-quip).

------
#### [ Console ]

**To connect Amazon Kendra to Quip** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Quip connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **Quip connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. **Quip domain name**—Enter the Quip domain name you copied from your Quip account.

   1. **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your Quip authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

      1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

         1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-Quip-’ is automatically added to your secret name.

         1. **Quip token**—Enter the Quip personal access token configured in Quip.

      1. Add and save your secret.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. **Add Quip folder IDs to crawl**—The Quip folder IDs you want to crawl.
**Note**  
To crawl a root folder, including all sub-folders and documents inside it, add the root folder ID. To crawl specific sub-folders, add the specific sub-folder IDs.

   1. **Additional configuration (content types)**—Enter the content types you want to crawl.

   1. **Regex patterns**—Regular expression patterns to include or exclude certain files. You can add up to 100 patterns.

   1. In **Sync run schedule**, for **Frequency**—Choose how often to sync your data source content and update your index.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. Select the generated default data source fields that you want to map to your Amazon Kendra index.

   1.  **Add field**—Add custom data source fields, specifying the index field name to map to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to Quip**

You must specify the following using the [QuipConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_QuipConfiguration.html) API:
+ **Quip site domain**—For example, *https://quip-company.quipdomain.com/browse* where *quipdomain* is the domain.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials for your Quip account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "accessToken": "token"
  }
  ```
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Quip connector and Amazon Kendra. For more information, see [IAM roles for Quip data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
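
The required settings above can be sketched in Python. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes the AWS SDK for Python (Boto3) is installed and AWS credentials are configured before `create_quip_data_source` is called. The helper function names and the data source name `quip-data-source` are hypothetical. The `Domain` and `SecretArn` fields and the `QUIP` data source type come from the `QuipConfiguration` and `CreateDataSource` APIs.

```python
import json


def build_quip_secret(access_token: str) -> str:
    """Return the secret JSON with the single key the Quip connector expects."""
    if not access_token:
        raise ValueError("access_token must not be empty")
    return json.dumps({"accessToken": access_token})


def build_quip_request(index_id: str, role_arn: str, secret_arn: str, domain: str) -> dict:
    """Assemble the keyword arguments for CreateDataSource with a QuipConfiguration."""
    return {
        "Name": "quip-data-source",  # hypothetical name: hyphens allowed, no spaces
        "IndexId": index_id,
        "Type": "QUIP",
        "RoleArn": role_arn,
        "Configuration": {
            "QuipConfiguration": {
                "Domain": domain,  # for example, "quipdomain"
                "SecretArn": secret_arn,
            }
        },
    }


def create_quip_data_source(request: dict) -> str:
    """Call CreateDataSource and return the new data source ID.

    Requires Boto3 and AWS credentials, so it is not run by the builders above.
    """
    import boto3  # imported lazily so the builders work without the SDK

    kendra = boto3.client("kendra")
    return kendra.create_data_source(**request)["Id"]
```

Keeping the request builder separate from the API call lets you inspect or log the request before anything is sent to Amazon Kendra.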

You can also add the following optional features:
+ **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` as part of the data source configuration. See [Configuring Amazon Kendra to use a VPC](https://docs.aws.amazon.com/kendra/latest/dg/vpc-configuration.html).
+  **Inclusion and exclusion filters**—Specify whether to include or exclude certain files.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+ **Folders**—Specify the Quip folders and subfolders you want to index.
**Note**  
To crawl a root folder, including all sub-folders and documents inside it, input the root folder ID. To crawl specific sub-folders, add the specific sub-folder IDs.
+ **Attachments, chat rooms, file comments**—Choose whether to crawl attachments, chat room content, and file comments.
+  **User context filtering and access control**—Amazon Kendra crawls the access control list (ACL) for your documents, if you have an ACL for your documents. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#datasource-context-filter).
+  **Field mappings**—Choose to map your Quip data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.
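
The optional features above map to optional fields of `QuipConfiguration`. The following is a minimal sketch of how they could be layered onto a required configuration. The function names are hypothetical, and the source field name `body` in the usage example is a placeholder rather than a confirmed Quip field name. The field names `FolderIds`, `InclusionPatterns`, `ExclusionPatterns`, `CrawlAttachments`, `CrawlChatRooms`, `CrawlFileComments`, `ThreadFieldMappings`, and `VpcConfiguration` are taken from the `QuipConfiguration` API.

```python
def field_mapping(source_field: str, index_field: str) -> dict:
    """Build one data-source-to-index field mapping entry."""
    return {"DataSourceFieldName": source_field, "IndexFieldName": index_field}


def with_quip_options(
    quip_config: dict,
    folder_ids=(),
    inclusion_patterns=(),
    exclusion_patterns=(),
    crawl_attachments=False,
    crawl_chat_rooms=False,
    crawl_file_comments=False,
    thread_field_mappings=(),
    subnet_ids=(),
    security_group_ids=(),
) -> dict:
    """Return a copy of quip_config with the optional features applied."""
    config = dict(quip_config)  # leave the caller's dict untouched
    config["CrawlAttachments"] = crawl_attachments
    config["CrawlChatRooms"] = crawl_chat_rooms
    config["CrawlFileComments"] = crawl_file_comments
    if folder_ids:
        config["FolderIds"] = list(folder_ids)
    if inclusion_patterns:
        config["InclusionPatterns"] = list(inclusion_patterns)
    if exclusion_patterns:
        config["ExclusionPatterns"] = list(exclusion_patterns)
    if thread_field_mappings:
        config["ThreadFieldMappings"] = list(thread_field_mappings)
    if subnet_ids or security_group_ids:
        # A VPC needs both subnets and security groups.
        if not (subnet_ids and security_group_ids):
            raise ValueError("provide both subnet_ids and security_group_ids")
        config["VpcConfiguration"] = {
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        }
    return config
```

For example, `with_quip_options(base, folder_ids=["folder-id-1"], crawl_attachments=True, thread_field_mappings=[field_mapping("body", "_document_body")])` returns a configuration that crawls one folder with attachments and maps a document body field to `_document_body`.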

------

## Learn more
<a name="quip-learn-more"></a>

To learn more about integrating Amazon Kendra with your Quip data source, see:
+ [Search for knowledge in Quip documents with intelligent search using the Quip connector for Amazon Kendra](https://aws.amazon.com/blogs/machine-learning/search-for-knowledge-in-quip-documents-with-intelligent-search-using-the-quip-connector-for-amazon-kendra/)

# Salesforce
<a name="data-source-salesforce"></a>

Salesforce is a customer relationship management (CRM) tool for managing support, sales, and marketing teams. You can use Amazon Kendra to index your Salesforce standard objects and even custom objects. 

You can connect Amazon Kendra to your Salesforce data source using either the [Amazon Kendra console](https://console.aws.amazon.com/kendra/), the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API, or the [SalesforceConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_SalesforceConfiguration.html) API.

Amazon Kendra has two versions of the Salesforce connector. Supported features of each version include:

**Salesforce connector V1.0 / [SalesforceConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_SalesforceConfiguration.html) API**
+ Field mappings
+ User access control
+ Inclusion/exclusion filters

**Salesforce connector V2.0 / [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API**
+ Field mappings
+ User access control
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

**Note**  
Support for Salesforce connector V1.0 / SalesforceConfiguration API ended in 2023. We recommend migrating to or using Salesforce connector V2.0 / TemplateConfiguration API.

For troubleshooting your Amazon Kendra Salesforce data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Salesforce connector V1.0](data-source-v1-salesforce.md)
+ [Salesforce connector V2.0](data-source-v2-salesforce.md)

# Salesforce connector V1.0
<a name="data-source-v1-salesforce"></a>

Salesforce is a customer relationship management (CRM) tool for managing support, sales, and marketing teams. You can use Amazon Kendra to index your Salesforce standard objects and even custom objects.

**Important**  
Amazon Kendra uses the Salesforce API version 48. The Salesforce API limits the number of requests that you can make per day. If Amazon Kendra exceeds that limit, it retries until it is able to continue.

**Note**  
Support for Salesforce connector V1.0 / SalesforceConfiguration API ended in 2023. We recommend migrating to or using Salesforce connector V2.0 / TemplateConfiguration API.

For troubleshooting your Amazon Kendra Salesforce data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-v1-salesforce)
+ [Prerequisites](#prerequisites-v1-salesforce)
+ [Connection instructions](#data-source-procedure-v1-salesforce)

## Supported features
<a name="supported-features-v1-salesforce"></a>

Amazon Kendra Salesforce data source connector supports the following features:
+ Field mappings
+ User access control
+ Inclusion/exclusion filters

## Prerequisites
<a name="prerequisites-v1-salesforce"></a>

Before you can use Amazon Kendra to index your Salesforce data source, make these changes in your Salesforce and AWS accounts.

**In Salesforce, make sure you have:**
+ Created a Salesforce account and have noted the user name and password you use to connect to Salesforce.
+ Created a Salesforce Connected App account with OAuth activated and have copied the consumer key (client ID) and consumer secret (client secret) assigned to your Salesforce Connected App. The client ID and client secret are used as your authentication credentials stored in an AWS Secrets Manager secret. See [Salesforce documentation on Connected Apps](https://help.salesforce.com/s/articleView?id=sf.connected_app_overview.htm&type=5) for more information.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ Copied the Salesforce security token associated with the account used to connect to Salesforce.
+ Copied the URL of the Salesforce instance that you want to index. Typically, this is *https://<company>.salesforce.com/*. The server must be running a Salesforce connected app.
+ Added credentials to your Salesforce server for a user with read-only access to Salesforce by cloning the ReadOnly profile and then adding the View All Data and Manage Articles permissions. These credentials identify the user making the connection and the Salesforce connected app that Amazon Kendra connects to.
+ Checked each document is unique in Salesforce and across other data sources you plan to use for the same index. Each data source that you want to use for an index must not contain the same document across the data sources. Document IDs are global to an index and must be unique per index.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Salesforce authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Salesforce data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.
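
If you use the API, you can prepare the secret in code before you create the data source. The following is a minimal sketch, assuming Boto3 is installed and AWS credentials are configured before `store_salesforce_secret` is called. The function names and the secret name passed in are hypothetical. The six key names match the JSON structure that the Salesforce connector V1.0 API instructions list for the secret.

```python
import json

# Keys the Salesforce connector V1.0 secret is documented to contain.
REQUIRED_KEYS = (
    "authenticationUrl",
    "consumerKey",
    "consumerSecret",
    "password",
    "securityToken",
    "username",
)


def build_salesforce_secret(**credentials: str) -> str:
    """Validate the credentials and return them as a secret JSON string."""
    missing = [key for key in REQUIRED_KEYS if not credentials.get(key)]
    if missing:
        raise ValueError("missing credential values: " + ", ".join(missing))
    # Keep only the documented keys so nothing unexpected is stored.
    return json.dumps({key: credentials[key] for key in REQUIRED_KEYS})


def store_salesforce_secret(name: str, secret_string: str) -> str:
    """Create the secret in AWS Secrets Manager and return its ARN.

    Requires Boto3 and AWS credentials.
    """
    import boto3  # imported lazily so validation works without the SDK

    client = boto3.client("secretsmanager")
    return client.create_secret(Name=name, SecretString=secret_string)["ARN"]
```

The ARN returned by `store_salesforce_secret` is the value to note for the secret ARN that the API requires.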

## Connection instructions
<a name="data-source-procedure-v1-salesforce"></a>

To connect Amazon Kendra to your Salesforce data source, you must provide the necessary details of your Salesforce data source so that Amazon Kendra can access your data. If you have not yet configured Salesforce for Amazon Kendra, see [Prerequisites](#prerequisites-v1-salesforce).

------
#### [ Console ]

**To connect Amazon Kendra to Salesforce** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Salesforce connector V1.0**, and then choose **Add connector**.

1. On the **Specify data source details** page, enter the following information: 

   1. **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. **Add new tag**—Tags to search and filter your resources or track your shared costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. **Salesforce URL**—Enter the instance URL for the Salesforce site that you want to index.

   1. For **Type of authentication**, choose between **Existing** and **New** to store your Salesforce authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

      1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

        1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-Salesforce-’ is automatically added to your secret name.

        1. For **User name**, **Password**, **Security token**, **Consumer key**, **Consumer secret**, and **Authentication URL**—Enter the authentication credential values you created in your Salesforce account. 

        1. Choose **Save authentication**.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. For **Crawl attachments**—Select to crawl all attached objects, articles, and feeds.

   1. For **Standard objects**, **Knowledge articles**, and **Chatter feeds**—Select Salesforce entities or content types you want to crawl.
**Note**  
You must provide configuration information for indexing at least one of standard objects, knowledge articles, or chatter feeds. If you choose to crawl **Knowledge articles**, you must specify the types of knowledge articles to index, the name of the articles, and whether to index the standard fields of all knowledge articles or only the fields of a custom article type. If you choose to index custom articles, you must specify the internal name of the article type. You can specify up to 10 article types.

   1. **Frequency**—How often Amazon Kendra will sync with your data source.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. For **Standard knowledge article**, **Standard object attachments**, and **Additional suggested field mappings**—Select the default data source fields generated by Amazon Kendra that you want to map to your index.
**Note**  
An index mapping to `_document_body` is required. You can't change the mapping between the `Salesforce ID` field and the Amazon Kendra `_document_id` field.

   1.  **Add field**—Add custom data source fields, specifying the index field name to map to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to Salesforce**

You must specify the following using the [SalesforceConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_SalesforceConfiguration.html) API:
+ **Server URL**—The instance URL for the Salesforce site that you want to index.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials for your Salesforce account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "authenticationUrl": "OAuth endpoint that Amazon Kendra connects to in order to get an OAuth token",
      "consumerKey": "Application public key generated when you created your Salesforce application",
      "consumerSecret": "Application private key generated when you created your Salesforce application",
      "password": "Password associated with the user logging in to the Salesforce instance",
      "securityToken": "Token associated with the user account logging in to the Salesforce instance",
      "username": "User name of the user logging in to the Salesforce instance"
  }
  ```
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Salesforce connector and Amazon Kendra. For more information, see [IAM roles for Salesforce data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
+ You must provide configuration information for indexing at least one of standard objects, knowledge articles, or chatter feeds.
  + **Standard objects**—If you choose to crawl **Standard objects**, you must specify the name of the standard object and the name of the field in the standard object table that contains the document contents.
  + **Knowledge articles**—If you choose to crawl **Knowledge articles**, you must specify the types of knowledge articles to index, the states of the knowledge articles to index, and whether to index the standard fields of all knowledge articles or only the fields of a custom article type.
  + **Chatter feeds**—If you choose to crawl **Chatter feeds**, you must specify the name of the column in the Salesforce FeedItem table that contains the content to index.
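
The rule that at least one of standard objects, knowledge articles, or chatter feeds must be configured can be sketched as follows. This is a minimal sketch, not a definitive implementation. The function name is hypothetical, and the Salesforce field names in the usage example (`Description`, `Body`) are illustrative values that you would replace with the fields holding your document contents. The configuration keys come from the `SalesforceConfiguration` API.

```python
def build_salesforce_v1_config(
    server_url: str,
    secret_arn: str,
    standard_objects=None,
    knowledge_articles=None,
    chatter_feed=None,
) -> dict:
    """Assemble a SalesforceConfiguration, enforcing the at-least-one rule."""
    if not (standard_objects or knowledge_articles or chatter_feed):
        raise ValueError(
            "configure at least one of standard objects, "
            "knowledge articles, or chatter feeds"
        )
    config = {"ServerUrl": server_url, "SecretArn": secret_arn}
    if standard_objects:
        # Each entry names a standard object and the field with the document contents.
        config["StandardObjectConfigurations"] = list(standard_objects)
    if knowledge_articles:
        # Includes the article states and the standard or custom article type settings.
        config["KnowledgeArticleConfiguration"] = dict(knowledge_articles)
    if chatter_feed:
        # Names the FeedItem column that contains the content to index.
        config["ChatterFeedConfiguration"] = dict(chatter_feed)
    return config
```

For example, `build_salesforce_v1_config(url, secret_arn, standard_objects=[{"Name": "ACCOUNT", "DocumentDataFieldName": "Description"}], chatter_feed={"DocumentDataFieldName": "Body"})` returns a dictionary that can be passed as `Configuration={"SalesforceConfiguration": ...}` with `Type="SALESFORCE"` when you call `CreateDataSource`.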

You can also add the following optional features:
+  **Inclusion and exclusion filters**—Specify whether to include or exclude certain file attachments.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+  **Field mappings**—Choose to map your Salesforce data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.
+  **User context filtering and access control**—Amazon Kendra crawls the access control list (ACL) for your documents, if you have an ACL for your documents. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#datasource-context-filter).
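
The filter precedence described in the note above can be illustrated with a short sketch. This is only an illustration of the stated rules, not the connector's implementation: the function name is hypothetical, and the use of `re.search` against a file name is an assumption about how a pattern is matched.

```python
import re


def is_indexed(name: str, inclusion=(), exclusion=()) -> bool:
    """Apply the documented filter rules to a single file name."""
    # Exclusion wins: a match here blocks indexing even if an inclusion pattern matches.
    if any(re.search(pattern, name) for pattern in exclusion):
        return False
    # With inclusion filters present, only matching content is indexed.
    if inclusion:
        return any(re.search(pattern, name) for pattern in inclusion)
    # With no filters at all, nothing is filtered out.
    return True
```

With `inclusion=[r"\.pdf$"]` and `exclusion=[r"draft"]`, a file named `report.pdf` is indexed, `draft-report.pdf` is not indexed because the exclusion filter takes precedence, and `notes.txt` is not indexed because it doesn't match the inclusion filter.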

------

# Salesforce connector V2.0
<a name="data-source-v2-salesforce"></a>

Salesforce is a customer relationship management (CRM) tool for managing support, sales, and marketing teams. You can use Amazon Kendra to index your Salesforce standard objects and even custom objects.

The Amazon Kendra Salesforce data source connector supports the following Salesforce editions: Developer Edition and Enterprise Edition.

**Note**  
Support for Salesforce connector V1.0 / SalesforceConfiguration API ended in 2023. We recommend migrating to or using Salesforce connector V2.0 / TemplateConfiguration API.

For troubleshooting your Amazon Kendra Salesforce data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-v2-salesforce)
+ [Prerequisites](#prerequisites-v2-salesforce)
+ [Connection instructions](#data-source-procedure-v2-salesforce)
+ [Learn more](#salesforce-v2-learn-more)
+ [Notes](#salesforce-notes)

## Supported features
<a name="supported-features-v2-salesforce"></a>

Amazon Kendra Salesforce data source connector supports the following features:
+ Field mappings
+ User access control
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-v2-salesforce"></a>

Before you can use Amazon Kendra to index your Salesforce data source, make these changes in your Salesforce and AWS accounts.

**In Salesforce, make sure you have:**
+ Created a Salesforce administrative account and have noted the user name and password you use to connect to Salesforce.
+ Copied the Salesforce security token associated with the account used to connect to Salesforce.
+ Created a Salesforce Connected App account with OAuth activated and have copied the consumer key (client ID) and consumer secret (client secret) assigned to your Salesforce Connected App. The client ID and client secret are used as your authentication credentials stored in an AWS Secrets Manager secret. See [Salesforce documentation on Connected Apps](https://help.salesforce.com/s/articleView?id=sf.connected_app_overview.htm&type=5) for more information.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ Copied the URL of the Salesforce instance that you want to index. Typically, this is *https://<company>.salesforce.com/*. The server must be running a Salesforce connected app.
+ Added credentials to your Salesforce server for a user with read-only access to Salesforce by cloning the ReadOnly profile and then adding the View All Data and Manage Articles permissions. These credentials identify the user making the connection and the Salesforce connected app that Amazon Kendra connects to.
+ Checked each document is unique in Salesforce and across other data sources you plan to use for the same index. Each data source that you want to use for an index must not contain the same document across the data sources. Document IDs are global to an index and must be unique per index.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Salesforce authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Salesforce data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-v2-salesforce"></a>

To connect Amazon Kendra to your Salesforce data source, you must provide the necessary details of your Salesforce data source so that Amazon Kendra can access your data. If you have not yet configured Salesforce for Amazon Kendra, see [Prerequisites](#prerequisites-v2-salesforce).

------
#### [ Console ]

**To connect Amazon Kendra to Salesforce**

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Salesforce connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **Salesforce connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. **Salesforce URL**—Enter the instance URL for the Salesforce site that you want to index.

   1. **Authorization**—Turn on or off access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).

   1. Choose an existing secret or create a new one. If you create a new secret, an AWS Secrets Manager secret window opens.

      1. **Authentication**—Enter the following information in the **Create an AWS Secrets Manager secret** window:

        1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-Salesforce-’ is automatically added to your secret name.

        1. For **User name**, **Password**, **Security token**, **Consumer key**, **Consumer secret**, and **Authentication URL**—Enter the authentication credential values you generated and downloaded from your Salesforce account. 
**Note**  
If you use Salesforce Developer Edition, use `https://login.salesforce.com/services/oauth2/token` or the My Domain login URL (for example, *https://MyCompany.my.salesforce.com*) as the **Authentication URL**. If you use Salesforce Sandbox Edition, use `https://test.salesforce.com/services/oauth2/token` or the My Domain login URL (for example, *MyDomainName--SandboxName.sandbox.my.salesforce.com*) as the **Authentication URL**.

        1. Choose **Save authentication**.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. For **Crawl attachments**—Select to crawl all attached Salesforce objects.

   1. For **Standard objects**, **Standard objects with attachments**, **Standard object without attachment**, and **Knowledge Articles**—Select the Salesforce entities or content types you want to crawl.

   1. You must provide configuration information for indexing at least one of standard objects, knowledge articles, or chatter feeds. If you choose to crawl **Knowledge articles**, you must specify the types of knowledge articles to index. You can choose published, archived, drafts, and attachments.

      **Regex filter**—Specify a regex pattern to include specific catalog items.

1. For **Additional configuration**:
   + **ACL information**—All access control lists are included by default. Deselecting an access control list will make all files in that category public.
   + **Regex patterns**—Add regular expression patterns to include or exclude certain files. You can add up to 100 patterns.

   **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
   + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
   + New, modified sync: Index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
   + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. For **Standard knowledge article**, **Standard object attachments**, and **Additional suggested field mappings** —Select from the Amazon Kendra generated default data source fields you want to map to your index.
**Note**  
An index mapping to `_document_body` is required. You can't change the mapping between the `Salesforce ID` field and the Amazon Kendra `_document_id` field. You can map any Salesforce field to the document title or document body Amazon Kendra reserved/default index fields.  
If you map any Salesforce field to Amazon Kendra document title and document body fields, Amazon Kendra will use data from the document title and body fields in search responses.

   1.  **Add field**—Add custom data source fields, specifying the index field name to map to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to Salesforce**

You must specify a JSON of the [data source schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html) using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API. You must provide the following information:
+ **Data source**—Specify the data source type as `SALESFORCEV2` when you use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **Host URL**—Specify the Salesforce instance host URL.
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
  + `CHANGE_LOG` to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials for your Salesforce account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "authenticationUrl": "OAUTH endpoint that Amazon Kendra connects to get an OAUTH token",
      "consumerKey": "Application public key generated when you created your Salesforce application",
      "consumerSecret": "Application private key generated when you created your Salesforce application",
      "password": "Password associated with the user logging in to the Salesforce instance",
      "securityToken": "Token associated with the user account logging in to the Salesforce instance",
      "username": "User name of the user logging in to the Salesforce instance"
  }
  ```
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Salesforce connector and Amazon Kendra. For more information, see [IAM roles for Salesforce data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
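
The secret structure described above can be assembled and checked before it is stored. The following is a minimal sketch, not an official tool: the function name `build_salesforce_secret` and every credential value are hypothetical placeholders. Only the six key names come from the structure listed above.

```python
import json

# Keys the secret must contain, as listed in the JSON structure above.
REQUIRED_SECRET_KEYS = (
    "authenticationUrl",
    "consumerKey",
    "consumerSecret",
    "password",
    "securityToken",
    "username",
)


def build_salesforce_secret(**credentials: str) -> str:
    """Return the secret as a JSON string, rejecting missing or empty values."""
    missing = [key for key in REQUIRED_SECRET_KEYS if not credentials.get(key)]
    if missing:
        raise ValueError("missing secret keys: " + ", ".join(missing))
    # Copy only the expected keys so unrelated arguments never reach the secret.
    return json.dumps({key: credentials[key] for key in REQUIRED_SECRET_KEYS})


# Placeholder values for illustration only (assumptions, not real credentials).
secret_string = build_salesforce_secret(
    authenticationUrl="https://login.salesforce.com/services/oauth2/token",
    consumerKey="example-consumer-key",
    consumerSecret="example-consumer-secret",
    password="example-password",
    securityToken="example-security-token",
    username="kendra-crawler@example.com",
)
print(secret_string)
```

The resulting string is the kind of value you would store as the secret's string content in Secrets Manager. The ARN of that secret is what you then provide to Amazon Kendra.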

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+ **Inclusion and exclusion filters**—You can specify whether to include or exclude certain documents, accounts, campaigns, cases, contacts, leads, opportunities, solutions, tasks, groups, chatters, and custom entity files.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+ **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.
+  **Field mappings**—Choose to map your Salesforce data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.
**Note**  
An index mapping to `_document_body` is required. You can't change the mapping between the `Salesforce ID` field and the Amazon Kendra `_document_id` field. You can map any Salesforce field to the document title or document body Amazon Kendra reserved/default index fields.  
If you map any Salesforce field to Amazon Kendra document title and document body fields, Amazon Kendra will use data from the document title and body fields in search responses.

For a list of other important JSON keys to configure, see [Salesforce template schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-salesforce-schema).
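
As a rough illustration of how the required settings fit together, the sketch below assembles a `CreateDataSource` request body as a plain Python dictionary. The outer parameter names follow the `CreateDataSource` API. The keys inside the template (`syncMode`, `secretArn`, and the `connectionConfiguration` nesting) are assumptions that you should verify against the Salesforce template schema, and the template is intentionally incomplete. The function name, IDs, ARNs, and host URL are hypothetical placeholders.

```python
import json

# Sync mode values described in the list above.
SYNC_MODES = {"FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG"}


def build_create_data_source_request(
    index_id: str,
    name: str,
    role_arn: str,
    secret_arn: str,
    host_url: str,
    sync_mode: str = "FULL_CRAWL",
) -> dict:
    """Assemble a CreateDataSource request for a Salesforce template source."""
    if sync_mode not in SYNC_MODES:
        raise ValueError(f"unsupported sync mode: {sync_mode}")
    # Assumption: these template key names must be checked against the
    # Salesforce template schema; other required keys are omitted here.
    template = {
        "type": "SALESFORCEV2",
        "syncMode": sync_mode,
        "secretArn": secret_arn,
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {"hostUrl": host_url}
        },
    }
    return {
        "IndexId": index_id,
        "Name": name,
        "Type": "TEMPLATE",  # the data source type for template-based connectors
        "RoleArn": role_arn,
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }


# Placeholder identifiers for illustration only.
request = build_create_data_source_request(
    index_id="example-index-id",
    name="salesforce-example",
    role_arn="arn:aws:iam::111122223333:role/example-kendra-role",
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    host_url="https://example.my.salesforce.com",
)
print(json.dumps(request, indent=2))
```

A dictionary of this shape could be passed as keyword arguments to an SDK call for `CreateDataSource` once the template has been completed from the schema.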

------

## Learn more
<a name="salesforce-v2-learn-more"></a>

To learn more about integrating Amazon Kendra with your Salesforce data source, see:
+ [Announcing the updated Salesforce connector (V2) for Amazon Kendra](https://aws.amazon.com/blogs/machine-learning/announcing-the-updated-salesforce-connector-v2-for-amazon-kendra/)

## Notes
<a name="salesforce-notes"></a>
+ When access control lists (ACLs) are enabled, the "Sync only new or modified content" option is not available due to Salesforce API limitations. We recommend using the "Full sync" or "New, modified, or deleted content sync" mode instead, or disabling ACLs if you need to use this sync mode.
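
If you generate connector configurations programmatically, a small guard can catch this combination early. The sketch below is illustrative only. It assumes that the console's "new or modified content" option corresponds to the `CHANGE_LOG` sync mode described in the API instructions, and the function name `check_sync_mode` is hypothetical.

```python
SYNC_MODES = {"FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG"}


def check_sync_mode(sync_mode: str, acl_enabled: bool) -> None:
    """Raise ValueError for a sync mode the note above rules out."""
    if sync_mode not in SYNC_MODES:
        raise ValueError(f"unknown sync mode: {sync_mode}")
    # Assumption: CHANGE_LOG is the API equivalent of syncing only new or
    # modified content, which is unavailable while ACLs are enabled.
    if acl_enabled and sync_mode == "CHANGE_LOG":
        raise ValueError(
            "CHANGE_LOG is not available with ACLs enabled; "
            "use FORCED_FULL_CRAWL or FULL_CRAWL, or disable ACLs"
        )


check_sync_mode("FULL_CRAWL", acl_enabled=True)   # accepted
check_sync_mode("CHANGE_LOG", acl_enabled=False)  # accepted
```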

# ServiceNow
<a name="data-source-servicenow"></a>

ServiceNow provides a cloud-based service management system to create and manage organization-level workflows, such as IT services, ticketing systems, and support. You can use Amazon Kendra to index your ServiceNow catalogs, knowledge articles, incidents, and their attachments.

You can connect Amazon Kendra to your ServiceNow data source using either the [Amazon Kendra console](https://console.aws.amazon.com/kendra/), the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API, or the [ServiceNowConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_ServiceNowConfiguration.html) API.

Amazon Kendra has two versions of the ServiceNow connector. Supported features of each version include:

**ServiceNow connector V1.0 / [ServiceNowConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_ServiceNowConfiguration.html) API**
+ Field mappings
+ ServiceNow instance versions: London, Others
+ Inclusion/exclusion filters

**ServiceNow connector V2.0 / [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API**
+ Field mappings
+ User access control
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ ServiceNow instance versions: Rome, Sandiego, Tokyo, Others
+ Virtual private cloud (VPC)

**Note**  
Support for ServiceNow connector V1.0 / ServiceNowConfiguration API ended in 2023. We recommend migrating to or using ServiceNow connector V2.0 / TemplateConfiguration API.

For troubleshooting your Amazon Kendra ServiceNow data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [ServiceNow connector V1.0](data-source-v1-servicenow.md)
+ [ServiceNow connector V2.0](data-source-v2-servicenow.md)
+ [Specifying documents to index with a query](servicenow-query.md)

# ServiceNow connector V1.0
<a name="data-source-v1-servicenow"></a>

ServiceNow provides a cloud-based service management system to create and manage organization-level workflows, such as IT services, ticketing systems, and support. You can use Amazon Kendra to index your ServiceNow catalogs, knowledge articles, and their attachments.

**Note**  
Support for ServiceNow connector V1.0 / ServiceNowConfiguration API ended in 2023. We recommend migrating to or using ServiceNow connector V2.0 / TemplateConfiguration API.

For troubleshooting your Amazon Kendra ServiceNow data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-v1-servicenow)
+ [Prerequisites](#prerequisites-v1-servicenow)
+ [Connection instructions](#data-source-procedure-v1-servicenow)
+ [Learn more](#servicenow-v1-learn-more)

## Supported features
<a name="supported-features-v1-servicenow"></a>

Amazon Kendra ServiceNow data source connector supports the following features:
+ ServiceNow instance versions: London, Others
+ Inclusion/exclusion patterns: Service catalogs, knowledge articles, and their attachments

## Prerequisites
<a name="prerequisites-v1-servicenow"></a>

Before you can use Amazon Kendra to index your ServiceNow data source, make these changes in your ServiceNow and AWS accounts.

**In ServiceNow, make sure you have:**
+ Created a ServiceNow administrator account and a ServiceNow instance.
+ Copied the host of your ServiceNow instance URL. For example, if the URL of the instance is *https://your-domain.service-now.com*, the format for the host URL you enter is *your-domain.service-now.com*.
+ Noted your basic authentication credentials containing a user name and password to allow Amazon Kendra to connect to your ServiceNow instance.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ **Optional:** Configured an OAuth 2.0 credential token that can identify Amazon Kendra and generate a user name, password, a client ID, and a client secret. The user name and password must provide access to the ServiceNow knowledge base and service catalog. See [ServiceNow documentation on OAuth 2.0 authentication](https://www.servicenow.com/docs/bundle/utah-platform-security/page/integrate/single-sign-on/concept/c_Authentication.html) for more information.
+ Added the following permissions:
  + kb\_category
  + kb\_knowledge
  + kb\_knowledge\_base
  + kb\_uc\_cannot\_read\_mtom
  + kb\_uc\_can\_read\_mtom
  + sc\_catalog
  + sc\_category
  + sc\_cat\_item
  + sys\_attachment
  + sys\_attachment\_doc
  + sys\_user\_role
+ Checked that each document is unique in ServiceNow and across other data sources you plan to use for the same index. Each data source that you want to use for an index must not contain the same document across the data sources. Document IDs are global to an index and must be unique per index.
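
The host format described in the prerequisites above (the instance URL without its scheme) can be derived with the Python standard library. This is a minimal sketch, and the helper name `servicenow_host` is hypothetical.

```python
from urllib.parse import urlsplit


def servicenow_host(url_or_host: str) -> str:
    """Return only the host part of a ServiceNow instance URL."""
    value = url_or_host.strip()
    # urlsplit only fills in the host when a scheme separator is present.
    if "://" not in value:
        value = "https://" + value
    host = urlsplit(value).hostname
    if not host:
        raise ValueError(f"cannot find a host in {url_or_host!r}")
    return host


print(servicenow_host("https://your-domain.service-now.com"))
# prints: your-domain.service-now.com
```

The helper also accepts a value that is already in host form and returns it unchanged, so it can be applied to input from either source.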

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your ServiceNow authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your ServiceNow data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-v1-servicenow"></a>

To connect Amazon Kendra to your ServiceNow data source, you must provide the necessary details of your ServiceNow data source so that Amazon Kendra can access your data. If you have not yet configured ServiceNow for Amazon Kendra, see [Prerequisites](#prerequisites-v1-servicenow).

------
#### [ Console ]

**To connect Amazon Kendra to ServiceNow** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **ServiceNow connector V1.0**, and then choose **Add data source**.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. **ServiceNow host**—Enter the ServiceNow host URL.

   1. **ServiceNow version**—Select your ServiceNow version.

   1. Choose between **Basic authentication** and **OAuth 2.0 authentication** based on your use case.

   1. **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your ServiceNow authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

      1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-ServiceNow-’ is automatically added to your secret name.

      1. If using Basic Authentication—Enter the **Secret name**, **Username**, and **Password** for your ServiceNow account.

         If using OAuth2 Authentication—Enter the **Secret name**, **Username**, **Password**, **Client ID**, and **Client Secret** you created in your ServiceNow account.

      1. Choose **Save and add secret**.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. **Include knowledge articles**—Choose to index knowledge articles.

   1. **Type of knowledge articles**—Choose between **Include only public articles** and **Include articles based on ServiceNow filter query** based on your use case. If you select **Include articles based on ServiceNow filter query**, you must enter a **Filter query** copied from your ServiceNow account.

   1. **Include knowledge articles attachments**—Choose to index knowledge article attachments. You can also select specific file types to index.

   1. **Include catalog items**—Choose to index catalog items.

   1. **Include catalog item attachments**—Choose to index catalog item attachments. You can also select specific file types to index.

   1. **Frequency**—How often Amazon Kendra will sync with your data source.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. **Knowledge articles** and **Service catalog** —Select from the Amazon Kendra generated default data source fields and additional suggested field mappings that you want to map to your index. 

   1.  **Add field**—Add custom data source fields, specifying the index field name to map to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to ServiceNow**

You must specify the following using [ServiceNowConfiguration API](https://docs.aws.amazon.com/kendra/latest/APIReference/API_ServiceNowConfiguration.html):
+ **Data source URL**—Specify the ServiceNow URL. The host endpoint should look like the following: *your-domain.service-now.com*.
+ **Data source host instance**—Specify the ServiceNow host instance version as either `LONDON` or `OTHERS`.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials you created in your ServiceNow account.

   If you are using basic authentication, the secret is stored in a JSON structure with the following keys:

  ```
  {
      "username": "user name",
      "password": "password"
  }
  ```

  If you are using OAuth2 authentication, the secret is stored in a JSON structure with the following keys:

  ```
  {
      "username": "user name",
      "password": "password",
      "clientId": "client id",
      "clientSecret": "client secret"
  }
  ```
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the ServiceNow connector and Amazon Kendra. For more information, see [IAM roles for ServiceNow data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
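
The required settings above can be sketched as two small helpers: one that builds the secret for either authentication type, and one that assembles the connector configuration. The field names `HostUrl`, `SecretArn`, `ServiceNowBuildVersion`, and `AuthenticationType`, and the values `HTTP_BASIC` and `OAUTH2`, reflect the `ServiceNowConfiguration` API as we understand it, so verify them against the API reference. The function names, ARN, and credential values are hypothetical placeholders.

```python
import json
from typing import Optional


def build_servicenow_secret(
    username: str,
    password: str,
    client_id: Optional[str] = None,
    client_secret: Optional[str] = None,
) -> str:
    """Return the secret JSON for basic or OAuth2 authentication."""
    if (client_id is None) != (client_secret is None):
        raise ValueError("OAuth2 needs both client_id and client_secret")
    secret = {"username": username, "password": password}
    if client_id is not None:
        # OAuth2 secrets carry two extra keys on top of the basic ones.
        secret.update({"clientId": client_id, "clientSecret": client_secret})
    return json.dumps(secret)


def build_servicenow_configuration(
    host: str, secret_arn: str, build_version: str = "OTHERS", oauth2: bool = False
) -> dict:
    """Assemble the required part of a ServiceNowConfiguration structure."""
    if build_version not in {"LONDON", "OTHERS"}:
        raise ValueError(f"unsupported instance version: {build_version}")
    return {
        "HostUrl": host,
        "SecretArn": secret_arn,
        "ServiceNowBuildVersion": build_version,
        "AuthenticationType": "OAUTH2" if oauth2 else "HTTP_BASIC",
    }


# Placeholder values for illustration only.
basic_secret = build_servicenow_secret("example-user", "example-password")
config = build_servicenow_configuration(
    "your-domain.service-now.com",
    "arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
)
print(basic_secret)
print(json.dumps(config, indent=2))
```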

You can also add the following optional features:
+  **Field mappings**—Choose to map your ServiceNow data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.
+  **Inclusion and exclusion filters**—Specify whether to include or exclude certain file attachments of catalogs and knowledge articles.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+ **Indexing parameters**—You can also choose to specify whether to:
  + Index knowledge articles, service catalogs, or both. If you choose to index knowledge articles and service catalog items, you must provide the name of the ServiceNow field that is mapped to the index document contents field in the Amazon Kendra index.
  + Index attachments to knowledge articles and catalog items.
  + Use a ServiceNow query that selects documents from one or more knowledge bases. The knowledge bases can be public or private. For more information, see [Specifying documents to index with a query](https://docs.aws.amazon.com/kendra/latest/dg/servicenow-query.html).
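
The interaction between inclusion and exclusion filters described in the note above can be modeled locally to preview which attachments a pair of pattern lists would select. This is an illustrative sketch only: it uses Python's `re.search`, which may not match Amazon Kendra's exact pattern-matching rules, and the function name `is_indexed` and the sample patterns are hypothetical.

```python
import re
from typing import Sequence


def is_indexed(
    name: str,
    include_patterns: Sequence[str] = (),
    exclude_patterns: Sequence[str] = (),
) -> bool:
    """Apply the filter rules: exclusion wins, then inclusion must match."""
    # A match on any exclusion pattern removes the document,
    # even if it also matches an inclusion pattern.
    if any(re.search(pattern, name) for pattern in exclude_patterns):
        return False
    # With inclusion patterns present, only matching content is indexed.
    if include_patterns:
        return any(re.search(pattern, name) for pattern in include_patterns)
    # With no inclusion patterns, everything not excluded is indexed.
    return True


include = [r"\.pdf$", r"\.docx$"]
exclude = [r"^draft-"]
print(is_indexed("handbook.pdf", include, exclude))    # True
print(is_indexed("draft-policy.pdf", include, exclude))  # False
print(is_indexed("notes.txt", include, exclude))       # False
```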

------

## Learn more
<a name="servicenow-v1-learn-more"></a>

To learn more about integrating Amazon Kendra with your ServiceNow data source, see:
+ [Getting started with Amazon Kendra ServiceNow Online connector](https://aws.amazon.com/blogs/machine-learning/getting-started-with-amazon-kendra-servicenow-online-connector/)

# ServiceNow connector V2.0
<a name="data-source-v2-servicenow"></a>

ServiceNow provides a cloud-based service management system to create and manage organization-level workflows, such as IT services, ticketing systems, and support. You can use Amazon Kendra to index your ServiceNow catalogs, knowledge articles, incidents, and their attachments.

For troubleshooting your Amazon Kendra ServiceNow data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-v2-servicenow)
+ [Prerequisites](#prerequisites-v2-servicenow)
+ [Connection instructions](#data-source-procedure-v2-servicenow)
+ [Learn more](#servicenow-learn-more)

## Supported features
<a name="supported-features-v2-servicenow"></a>

Amazon Kendra ServiceNow data source connector supports the following features:
+ Field mappings
+ User access control
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ ServiceNow instance versions: Rome, Sandiego, Tokyo, Others
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-v2-servicenow"></a>

Before you can use Amazon Kendra to index your ServiceNow data source, make these changes in your ServiceNow and AWS accounts.

**In ServiceNow, make sure you have:**
+ Created a Personal or Enterprise Developer Instance and have a ServiceNow instance with an administrative role.
+ Copied the host of your ServiceNow instance URL. The format for the host URL you enter is *your-domain.service-now.com*. You need your ServiceNow instance URL to connect to Amazon Kendra.
+ Noted your basic authentication credentials of a user name and password to allow Amazon Kendra to connect to your ServiceNow instance.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ **Optional:** Configured OAuth 2.0 client credentials that can identify Amazon Kendra using a user name, a password, a generated client ID, and a client secret. See [ServiceNow documentation on OAuth 2.0 authentication](https://www.servicenow.com/docs/bundle/utah-platform-security/page/integrate/single-sign-on/concept/c_Authentication.html) for more information.
+ Added the following permissions:
  + kb\_category
  + kb\_knowledge
  + kb\_knowledge\_base
  + kb\_uc\_cannot\_read\_mtom
  + kb\_uc\_can\_read\_mtom
  + sc\_catalog
  + sc\_category
  + sc\_cat\_item
  + sys\_attachment
  + sys\_attachment\_doc
  + sys\_user\_role
+ Checked that each document is unique in ServiceNow and across other data sources you plan to use for the same index. Each data source that you want to use for an index must not contain the same document across the data sources. Document IDs are global to an index and must be unique per index.

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your ServiceNow authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your ServiceNow data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-v2-servicenow"></a>

To connect Amazon Kendra to your ServiceNow data source, you must provide the necessary details of your ServiceNow data source so that Amazon Kendra can access your data. If you have not yet configured ServiceNow for Amazon Kendra, see [Prerequisites](#prerequisites-v2-servicenow).

------
#### [ Console ]

**To connect Amazon Kendra to ServiceNow** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **ServiceNow connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **ServiceNow connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. **ServiceNow host**—Enter the ServiceNow host URL. The format for the host URL you enter is *your-domain.service-now.com*.

   1. **ServiceNow version**—Select your ServiceNow instance version. You can select from Rome, Sandiego, Tokyo, or Others.

   1. **Authorization**—Turn on or off access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).

   1. **Authentication**—Choose between **Basic authentication** and **OAuth 2.0 authentication**.

   1. **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your ServiceNow authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens. Enter the following information in the window:

      1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-ServiceNow-’ is automatically added to your secret name.

      1. If using Basic Authentication—Enter the **Secret name**, **Username**, and **Password** for your ServiceNow account.

         If using OAuth2.0 Authentication—Enter the **Secret name**, **Username**, **Password**, **Client ID**, and **Client Secret** you created in your ServiceNow account.

      1. Save and add your secret.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. For **Knowledge articles**, choose from the following options:
      +  **Knowledge articles**—Choose to index knowledge articles.
      + **Knowledge article attachments**—Choose to index knowledge article attachments.
      + **Type of knowledge articles**—Choose between **Only public articles** and **Knowledge articles based on ServiceNow filter query** based on your use case. If you select **Knowledge articles based on ServiceNow filter query**, you must enter a **Filter query** copied from your ServiceNow account. Example filter queries include: *workflow\_state=draft^EQ*, *kb\_knowledge\_base=dfc19531bf2021003f07e2c1ac0739ab^text ISNOTEMPTY^EQ*, *article\_type=text^active=true^EQ*.
**Important**  
If you choose to crawl **Only public articles**, Amazon Kendra crawls only knowledge articles assigned a public access role in ServiceNow.
      + **Include articles based on short description filter**—Specify regular expression patterns to include or exclude specific articles.

   1. For **Service catalog items**:
      +  **Service catalog items**—Choose to index service catalog items.
      + **Service catalog item attachments**—Choose to index service catalog item attachments.
      + **Active service catalog items**—Choose to index active service catalog items.
      + **Inactive service catalog items**—Choose to index inactive service catalog items.
      + **Filter query**—Choose to include service catalog items based on a filter defined in your ServiceNow instance. Example filter queries include: *short\_descriptionLIKEAccess^category=2809952237b1300054b6a3549dbe5dd4^EQ*, *nameSTARTSWITHService^active=true^EQ*.
      + **Include service catalog items based on short description filter**—Specify a regex pattern to include specific catalog items.

   1. For **Incidents**:
      + **Incidents**—Choose to index service incidents.
      + **Incident attachments**—Choose to index incident attachments.
      + **Active incidents**—Choose to index active incidents.
      + **Inactive incidents**—Choose to index inactive incidents.
      + **Active incident type**—Choose between **All incidents**, **Open incidents**, **Open - unassigned incidents**, and **Resolved incidents** depending on your use case.
      + **Filter query**—Choose to include incidents based on a filter defined in your ServiceNow instance. Example filter queries include: *short\_descriptionLIKETest^urgency=3^state=1^EQ*, *priority=2^category=software^EQ*.
      + **Include incidents based on short description filter**—Specify a regex pattern to include specific incidents.

   1. For **Additional configuration**:
      + **ACL information**—Access control lists for entities you have selected are included by default. Deselecting an access control list makes all files in that category public. ACL options are automatically deactivated for entities that are not selected. ACL is not applied to public articles.
      + **Maximum file size** – Specify the maximum file size, in MB, that Amazon Kendra will crawl. Amazon Kendra crawls only the files within the size limit you define. The default file size is 50 MB. The maximum file size must be greater than 0 MB and less than or equal to 50 MB.
      + **Attachment regex patterns**—Add regular expression patterns to include or exclude certain attached files of catalogs, knowledge articles, and incidents. You can add up to 100 patterns.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. In **Sync run schedule**, for **Frequency**—Choose how often to sync your data source content and update your index.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. **Default field mappings**—Select from the Amazon Kendra generated default data source fields that you want to map to your index.

   1.  **Add field**—Add custom data source fields, and for each one specify the index field name to map it to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to ServiceNow**

You must specify a JSON of the [data source schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html) using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API. You must provide the following information:
+ **Data source**—Specify the data source type as `SERVICENOWV2` when you use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html) API.
+ **Host URL**—Specify the host URL of your ServiceNow instance. For example, *your-domain.service-now.com*.
+ **Authentication type**—Specify the type of authentication you use, whether `basicAuth` or `OAuth2` for your ServiceNow instance.
+ **ServiceNow instance version**—Specify the ServiceNow instance version you use, whether `Tokyo`, `Sandiego`, `Rome`, or `Others`.
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials you created in your ServiceNow account.

  If you use basic authentication, the secret is stored in a JSON structure with the following keys:

  ```
  {
      "username": "user name",
      "password": "password"
  }
  ```

  If you use OAuth2 client credentials, the secret is stored in a JSON structure with the following keys:

  ```
  {
      "username": "user name",
      "password": "password",
      "clientId": "client id",
      "clientSecret": "client secret"
  }
  ```
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the ServiceNow connector and Amazon Kendra. For more information, see [IAM roles for ServiceNow data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
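
The two secret layouts described above can be assembled programmatically before you store them. The following is a minimal sketch that uses only the key names shown in this section; the helper name `build_servicenow_secret` and the placeholder values are hypothetical, and the validation rules are an assumption rather than documented behavior.

```python
import json


def build_servicenow_secret(auth_type, username, password,
                            client_id=None, client_secret=None):
    """Return the secret JSON string for the chosen authentication type.

    Hypothetical helper: the key names come from the secret structures
    shown above; the validation itself is illustrative.
    """
    secret = {"username": username, "password": password}
    if auth_type == "OAuth2":
        if not client_id or not client_secret:
            raise ValueError("OAuth2 requires clientId and clientSecret")
        secret["clientId"] = client_id
        secret["clientSecret"] = client_secret
    elif auth_type != "basicAuth":
        raise ValueError(f"unsupported authentication type: {auth_type}")
    return json.dumps(secret)


# Placeholder values only; never hard-code real credentials.
basic_secret = build_servicenow_secret("basicAuth", "user name", "password")
oauth_secret = build_servicenow_secret(
    "OAuth2", "user name", "password", "client id", "client secret")
print(basic_secret)
```

The resulting string is the kind of value you would store as the secret's contents in Secrets Manager, and then reference by ARN in your data source configuration.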

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+  **Inclusion and exclusion filters**—You can specify whether to include or exclude certain attached files using the file names and the file types of knowledge articles, service catalogs, and incidents. 
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+  **Specific documents to index**—You can use a ServiceNow query to specify the documents you want from one or more knowledge bases, including private knowledge bases. Access to the knowledge bases is determined by the user that you use to connect to the ServiceNow instance. For more information, see [Specifying documents to index with a query](https://docs.aws.amazon.com/kendra/latest/dg/servicenow-query.html).
+ **Indexing parameters**—You can also choose whether to:
  + Index knowledge articles, service catalog items, incidents, or all of these. If you choose to index knowledge articles, service catalog items, and incidents, you must provide the name of the ServiceNow field that is mapped to the index document contents field in the Amazon Kendra index.
  + Index attachments to knowledge articles, service catalog items, and incidents.
  + Include knowledge articles, service catalog items, and incidents based on the `short description` filter pattern.
  + Filter active and inactive service catalog items and incidents.
  + Filter incidents based on incident type.
  + Crawl the ACL for specific entities.
+ **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.
+  **Field mappings**—Choose to map your ServiceNow data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.
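
The inclusion and exclusion filter rules in the note above can be expressed as a small predicate. This is a sketch of the stated precedence only (exclusion wins, and an inclusion filter, when present, is required). The function name `is_indexed` is hypothetical, and the use of `re.search` (match anywhere in the name) is an assumption; the connector's exact matching semantics are not described here.

```python
import re


def is_indexed(name, inclusion_patterns=(), exclusion_patterns=()):
    """Decide whether a file name passes the filters.

    Hypothetical helper that mirrors the documented precedence:
    1. Anything matching an exclusion pattern is not indexed.
    2. If inclusion patterns exist, only matching names are indexed.
    3. With no patterns at all, everything is indexed.
    """
    if any(re.search(p, name) for p in exclusion_patterns):
        return False
    if inclusion_patterns:
        return any(re.search(p, name) for p in inclusion_patterns)
    return True


include = [r"\.pdf$"]   # example pattern: PDF attachments only
exclude = [r"^draft"]   # example pattern: skip names starting with "draft"

print(is_indexed("report.pdf", include, exclude))        # True
print(is_indexed("draft-report.pdf", include, exclude))  # False
print(is_indexed("notes.txt", include, exclude))         # False
print(is_indexed("notes.txt"))                           # True
```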

For a list of other important JSON keys to configure, see [ServiceNow template schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-servicenow-schema).
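
As a rough illustration of how the required values fit together, the following sketch builds the keyword arguments for a `CreateDataSource` call. The outer request fields (`IndexId`, `Name`, `Type`, `RoleArn`, `Configuration.TemplateConfiguration.Template`) follow the `CreateDataSource` API. The key names inside the template (`connectionConfiguration`, `repositoryEndpointMetadata`, `hostUrl`, `authType`, `servicenowInstanceVersion`, `syncMode`, `secretArn`) are assumptions based on the template schema and should be checked against it. The template is deliberately incomplete (for example, it omits field mappings), and all IDs and ARNs are placeholders.

```python
import json

ALLOWED_SYNC_MODES = ("FORCED_FULL_CRAWL", "FULL_CRAWL")


def build_servicenow_request(index_id, role_arn, secret_arn, host_url,
                             auth_type="basicAuth",
                             instance_version="Others",
                             sync_mode="FORCED_FULL_CRAWL"):
    """Assemble CreateDataSource keyword arguments (illustrative only)."""
    if sync_mode not in ALLOWED_SYNC_MODES:
        raise ValueError(f"unsupported sync mode: {sync_mode}")
    # Template key names are assumed; verify them against the schema.
    template = {
        "type": "SERVICENOWV2",
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {
                "hostUrl": host_url,
                "authType": auth_type,
                "servicenowInstanceVersion": instance_version,
            }
        },
        "syncMode": sync_mode,
        "secretArn": secret_arn,
    }
    return {
        "IndexId": index_id,
        "Name": "servicenow-example",  # hypothetical data source name
        "Type": "TEMPLATE",
        "RoleArn": role_arn,
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }


# Placeholder identifiers, not real resources.
request = build_servicenow_request(
    index_id="index-id-placeholder",
    role_arn="arn:aws:iam::111122223333:role/example-role",
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    host_url="your-domain.service-now.com",
)
print(json.dumps(request, indent=2))
```

With an AWS SDK such as boto3, a dictionary like this could be passed to the Kendra client's `create_data_source` call as keyword arguments once the template is complete.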

------

## Learn more
<a name="servicenow-learn-more"></a>

To learn more about integrating Amazon Kendra with your ServiceNow data source, see:
+ [Announcing the updated ServiceNow connector (V2) for Amazon Kendra](https://aws.amazon.com/blogs/machine-learning/announcing-the-updated-servicenow-connector-v2-for-amazon-kendra/)

# Specifying documents to index with a query
<a name="servicenow-query"></a>

You can use a ServiceNow query to specify the documents you want to include in an Amazon Kendra index. When you use a query, you can specify multiple knowledge bases, including private knowledge bases. Access to the knowledge bases is determined by the user that you use to connect to the ServiceNow instance.

To build a query, you use the ServiceNow query builder. You can use the builder to create the query and to test that the query returns the correct list of documents.

**To create a query using the ServiceNow console**

1. Log in to the ServiceNow console.

1. From the left menu, choose **Knowledge**, then **Articles**, and then choose **All**.

1. At the top of the page, choose the filter icon.

1. Use the query builder to create the query.

1. When the query is complete, right-click the query and choose **Copy query** to copy the query from the query builder. Save this query to use in Amazon Kendra.  
![\[Query builder interface showing Knowledge base filters with options to run, save, and copy query.\]](http://docs.aws.amazon.com/kendra/latest/dg/images/ServiceNowQuery.png)

Make sure that you don't change any query parameter when you copy the query. If any of the query parameters are not recognized, ServiceNow treats the parameter as empty and doesn't use it to filter the results.
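
To inspect a copied query without altering it, you can split it into its conditions and confirm that it reassembles to the identical string. This is a minimal sketch that assumes the usual encoded-query layout seen in the examples in this guide: conditions separated by `^` and a trailing `EQ` end-of-query marker. The function names are hypothetical.

```python
def split_encoded_query(query):
    """Split a copied query into its '^'-separated conditions.

    The trailing 'EQ' marker, if present, is dropped from the result.
    """
    parts = query.strip().split("^")
    if parts and parts[-1] == "EQ":
        parts = parts[:-1]
    return parts


def join_encoded_query(conditions):
    """Rebuild the query string from its conditions."""
    return "^".join(conditions) + "^EQ"


copied = "article_type=text^active=true^EQ"
conditions = split_encoded_query(copied)
print(conditions)  # ['article_type=text', 'active=true']

# Round trip: the rebuilt string must equal what was copied.
assert join_encoded_query(conditions) == copied
```

The round-trip check is the useful part: if the rebuilt string differs from the copied one, something changed a parameter along the way.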

# Slack
<a name="data-source-slack"></a>

Slack is an enterprise communications app that lets users send messages and attachments through various public and private channels. You can use Amazon Kendra to index your Slack public and private channels, bot and archive messages, files and attachments, direct and group messages. You can also choose specific content to filter.

**Note**  
Amazon Kendra now supports an upgraded Slack connector.  
The console has been automatically upgraded for you. Any new connectors you create in the console will use the upgraded architecture. If you use the API, you must now use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) object instead of the `SlackConfiguration` object to configure your connector.  
Connectors configured using the older console and API architecture will continue to function as configured. However, you won’t be able to edit or update them. If you want to edit or update your connector configuration, you must create a new connector.  
We recommend migrating your connector workflow to the upgraded version. Support for connectors configured using the older architecture is scheduled to end by June 2024.

You can connect Amazon Kendra to your Slack data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) or the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API.

For troubleshooting your Amazon Kendra Slack data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-slack)
+ [Prerequisites](#prerequisites-slack)
+ [Connection instructions](#data-source-procedure-slack)
+ [Learn more](#slack-learn-more)

## Supported features
<a name="supported-features-slack"></a>

The Amazon Kendra Slack data source connector supports the following features:
+ Field mappings
+ User access control
+ Inclusion/exclusion filters
+ Full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-slack"></a>

Before you can use Amazon Kendra to index your Slack data source, make these changes in your Slack and AWS accounts.

**In Slack, make sure you have:**
+ Configured a Slack Bot User OAuth token or Slack User OAuth token. You can choose either token to connect Amazon Kendra to your Slack data source. A token is required to use as your authentication credentials. See [Slack documentation on access tokens](https://api.slack.com/authentication/token-types) for more information.
**Note**  
If you use the bot token as part of your Slack credentials, you cannot index direct messages and group messages and you must add the bot token to the channel you want to index.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ Noted your Slack workspace team ID from your Slack workspace main page URL. For example, *https://app.slack.com/client/T0123456789/...* where *T0123456789* is the team ID.
+ Added the following OAuth scopes/permissions:    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kendra/latest/dg/data-source-slack.html)
+ Checked that each document is unique in Slack and across other data sources you plan to use for the same index. The data sources that you use for an index must not contain the same document. Document IDs are global to an index and must be unique per index.
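
The team ID mentioned in the list above can be read from the workspace URL with a few lines of code. This sketch assumes the URL shape shown in the example (`/client/<team ID>/...`); the function name and the channel-like segment in the sample URL are hypothetical.

```python
from urllib.parse import urlparse


def slack_team_id(workspace_url):
    """Return the path segment that follows 'client' in the URL."""
    segments = [s for s in urlparse(workspace_url).path.split("/") if s]
    if len(segments) >= 2 and segments[0] == "client":
        return segments[1]
    raise ValueError("URL does not look like a Slack workspace main page URL")


# The trailing segment is a made-up placeholder.
url = "https://app.slack.com/client/T0123456789/C0000000000"
print(slack_team_id(url))  # T0123456789
```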

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Slack authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Slack data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-slack"></a>

To connect Amazon Kendra to your Slack data source, you must provide the necessary details of your Slack data source so that Amazon Kendra can access your data. If you have not yet configured Slack for Amazon Kendra, see [Prerequisites](#prerequisites-slack).

------
#### [ Console ]

**To connect Amazon Kendra to Slack** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Slack connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **Slack connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. For **Slack workspace team ID**—The team ID of your Slack workspace. You can find your team ID in your Slack workspace main page URL. For example, *https://app.slack.com/client/T0123456789/...* where *T0123456789* is the team ID.

   1. **Authorization**—Turn on or off access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).

   1. **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your Slack authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

      1. Enter the following information in the **Create an AWS Secrets Manager secret** window:

         1. **Secret name**—A name for your secret. The prefix ‘AmazonKendra-Slack-’ is automatically added to your secret name.

         1. For **Slack token**—Enter the authentication credential values you configured in Slack.

      1. Save and add your secret.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. **Select type of content**—Select the Slack entities or content types you want to crawl. You can choose from all channels, public channels, private channels, group messages, and private messages.

   1. **Select crawl start date**—Enter the date you want to start crawling your content.

   1. For **Additional configuration**—Choose to include bot and archived messages and use regular expression patterns to include or exclude certain content.
**Note**  
If you choose to include both channel IDs and channel names, the Amazon Kendra Slack connector will prioritize channel IDs over channel names.  
If you've chosen to include certain private and group messages, the Amazon Kendra Slack connector will ignore all other private and group messages and crawl only the private and group messages you specify.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. In **Sync run schedule**, for **Frequency**—Choose how often to sync your data source content and update your index.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. **Default data source fields**—Select from the Amazon Kendra generated default data source fields you want to map to your index.

   1.  **Add field**—Add custom data source fields, and for each one specify the index field name to map it to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.

------
#### [ API ]

**To connect Amazon Kendra to Slack**

You must specify a JSON of the [data source schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-schema-slack) using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API. You must provide the following information:
+ **Data source**—Specify the data source type as `SLACK` when you use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/APIReference/API_CreateDataSource.html) API.
+ **Slack workspace team ID**—The Slack team ID you copied from your Slack main page URL.
+ **Since date**—The date to start crawling your data from your Slack workspace team. The date must follow this format: yyyy-mm-dd.
+ **Sync mode**—Specify how Amazon Kendra should update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option. You can choose between:
  + `FORCED_FULL_CRAWL` to freshly index all content, replacing existing content each time your data source syncs with your index.
  + `FULL_CRAWL` to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
  + `CHANGE_LOG` to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking content changes and index content that changed since the last sync.
+ **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials for your Slack account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "slackToken": "token"
  }
  ```
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Slack connector and Amazon Kendra. For more information, see [IAM roles for Slack data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).
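
Two of the required values above are easy to get subtly wrong: the since date format and the secret's JSON layout. The following sketch checks a date against the `yyyy-mm-dd` format and builds the secret string with the `slackToken` key shown above. The function names are hypothetical, and the strictness of the date check (zero-padded fields, real calendar dates) is an assumption about what the format requires.

```python
import json
import re
from datetime import datetime


def validate_since_date(value):
    """Return the value unchanged if it is a real date in yyyy-mm-dd form."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        raise ValueError(f"expected yyyy-mm-dd, got: {value!r}")
    # strptime raises ValueError for impossible dates such as 2023-02-30.
    datetime.strptime(value, "%Y-%m-%d")
    return value


def build_slack_secret(token):
    """Return the secret JSON string with the documented 'slackToken' key."""
    return json.dumps({"slackToken": token})


since_date = validate_since_date("2023-01-31")
secret_string = build_slack_secret("token")  # placeholder token value
print(since_date, secret_string)
```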

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+ **Specific channels**—Filter by public or private channels, and specify certain channels by their ID.
+ **Types of channels and messages**—Whether Amazon Kendra should index your public and private channels, your group and direct messages, and your bot and archived messages. If you use a bot token as part of your Slack authentication credentials, you must add the bot token to the channel you want to index. You cannot index direct messages and group messages using a bot token.
+ **Look back**—You can choose to configure a `lookBack` parameter so that the Slack connector crawls updated or deleted content up to a specified number of hours before your last connector sync.
+  **Inclusion and exclusion filters**—Specify whether to include or exclude certain Slack content. If you use a bot token as part of your Slack authentication credentials, you must add the bot token to the channel you want to index. You cannot index direct messages and group messages using a bot token.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+  **Field mappings**—Choose to map your Slack data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.
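
The look back setting in the list above is described as a number of hours before the last connector sync. The arithmetic can be sketched as follows. This only illustrates how such a window start would be computed; the function name is hypothetical, and the connector's internal handling of `lookBack` is not described in this guide.

```python
from datetime import datetime, timedelta, timezone


def look_back_start(last_sync, look_back_hours):
    """Return the earliest timestamp covered by the look back window."""
    return last_sync - timedelta(hours=look_back_hours)


# Example values chosen for illustration.
last_sync = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
window_start = look_back_start(last_sync, 24)
print(window_start.isoformat())  # 2024-01-09T12:00:00+00:00
```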

For a list of other important JSON keys to configure, see [Slack template schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-schema-slack).

------

## Learn more
<a name="slack-learn-more"></a>

To learn more about integrating Amazon Kendra with your Slack data source, see:
+ [Unravel the knowledge in Slack workspaces with intelligent search using the Amazon Kendra Slack connector](https://aws.amazon.com/blogs/machine-learning/unravel-the-knowledge-in-slack-workspaces-with-intelligent-search-using-the-amazon-kendra-slack-connector/)

# Zendesk
<a name="data-source-zendesk"></a>

Zendesk is a customer relationship management system that helps businesses automate and enhance customer support interactions. You can use Amazon Kendra to index your Zendesk support tickets, ticket comments, ticket attachments, help center articles, article comments, article comment attachments, guide community topics, community posts, and community post comments.

You can filter by organization name if you want to index tickets that are only within a specific organization. You can also choose to set a crawl date for when you want to start crawling data from Zendesk.

You can connect Amazon Kendra to your Zendesk data source using the [Amazon Kendra console](https://console.aws.amazon.com/kendra/) or the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API.

For troubleshooting your Amazon Kendra Zendesk data source connector, see [Troubleshooting data sources](troubleshooting-data-sources.md).

**Topics**
+ [Supported features](#supported-features-zendesk)
+ [Prerequisites](#prerequisites-zendesk)
+ [Connection instructions](#data-source-procedure-zendesk)
+ [Learn more](#zendesk-learn-more)
+ [Notes](#zendesk-notes)

## Supported features
<a name="supported-features-zendesk"></a>

Amazon Kendra Zendesk data source connector supports the following features:
+ Field mappings
+ User access control
+ Inclusion/exclusion filters
+ Change log, full and incremental content syncs
+ Virtual private cloud (VPC)

## Prerequisites
<a name="prerequisites-zendesk"></a>

Before you can use Amazon Kendra to index your Zendesk data source, make these changes in your Zendesk and AWS accounts.

**In Zendesk, make sure you have:**
+ Created a Zendesk Suite (Professional/Enterprise) administrative account.
+ Noted your Zendesk host URL. For example, *https://{sub-domain}.zendesk.com/*.
**Note**  
(On-premise/server) Amazon Kendra checks if the endpoint information included in AWS Secrets Manager is the same as the endpoint information specified in your data source configuration details. This helps protect against the [confused deputy problem](https://docs.aws.amazon.com/IAM/latest/UserGuide/confused-deputy.html), which is a security issue where a user doesn’t have permission to perform an action but uses Amazon Kendra as a proxy to access the configured secret and perform the action. If you later change your endpoint information, you must create a new secret to sync this information.
+ Set up OAuth 2.0 Authentication using the authorization code grant flow:

  1. In Admin Center, navigate to Apps and integrations > APIs > Zendesk API.

  1. Select the OAuth Clients tab and click "Add OAuth client".

  1. Configure the OAuth client details: set the client name and description, set the client kind to "Confidential", and add appropriate redirect URLs (for example, `https://localhost/callback` for testing). Then save, and securely store the generated client ID and client secret.

  1.  Ensure the OAuth client has the required "read" scope (or "read write" if you need write access). 

  1.  Generate an Access Token using the authorization code grant flow: 
     + In a browser, navigate to: `https://{subdomain}.zendesk.com/oauth/authorizations/new?response_type=code&client_id={your_client_id}&redirect_uri={your_redirect_uri}&scope=read`
     +  Authenticate and authorize the application when prompted. 
     +  After authorization, Zendesk redirects to the `redirect_uri` with a `code` parameter (for example, `https://localhost/callback?code={authorization_code}`). Copy the authorization code. 
     +  Exchange the authorization code for an access token by sending a POST request to Zendesk's token endpoint: 

       ```
       curl -X POST https://{subdomain}.zendesk.com/oauth/tokens \
         -H "Content-Type: application/x-www-form-urlencoded" \
         -d "grant_type=authorization_code&code={authorization_code}&client_id={your_client_id}&client_secret={your_client_secret}&redirect_uri={your_redirect_uri}&scope=read"
       ```
     +  Zendesk responds with a JSON object containing the `access_token`. Extract and securely store this access token. 

  1. Keep the access token available. You provide it to Amazon Kendra through an AWS Secrets Manager secret when you connect your Zendesk data source.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).
+ **Optional:** Installed an SSL certificate to allow Amazon Kendra to connect.
+ Checked that each document is unique in Zendesk and across other data sources you plan to use for the same index. Data sources that you use for the same index must not contain the same document. Document IDs are global to an index and must be unique per index.
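The authorization URL and the token request from the OAuth steps above can also be assembled programmatically. The following is a minimal Python sketch that uses only the standard library. The function names and argument values are hypothetical, and the sketch builds the request without sending it.

```python
from urllib.parse import urlencode


def build_authorization_url(subdomain, client_id, redirect_uri, scope="read"):
    """Return the URL that a user opens in a browser to authorize the OAuth client."""
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
    })
    return f"https://{subdomain}.zendesk.com/oauth/authorizations/new?{query}"


def build_token_request(subdomain, code, client_id, client_secret,
                        redirect_uri, scope="read"):
    """Return (url, headers, body) for the authorization code exchange POST."""
    url = f"https://{subdomain}.zendesk.com/oauth/tokens"
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    # Same form fields as the curl example, URL-encoded for safe transport.
    body = urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": redirect_uri,
        "scope": scope,
    }).encode("utf-8")
    return url, headers, body
```

To send the request, one option is to pass the three values to `urllib.request.Request(url, data=body, headers=headers, method="POST")` and read `access_token` from the JSON response. Because `urlencode` percent-encodes the redirect URI, the value survives even if it contains reserved characters.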

**In your AWS account, make sure you have:**
+ [Created an Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/create-index.html) and, if using the API, noted the index ID.
+ [Created an IAM role](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds) for your data source and, if using the API, noted the ARN of the IAM role.
**Note**  
If you change your authentication type and credentials, you must update your IAM role to access the correct AWS Secrets Manager secret ID.
+ Stored your Zendesk authentication credentials in an AWS Secrets Manager secret and, if using the API, noted the ARN of the secret.
**Note**  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do **not** recommend that you re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0 (where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role and Secrets Manager secret when you connect your Zendesk data source to Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret, and an index ID.

## Connection instructions
<a name="data-source-procedure-zendesk"></a>

To connect Amazon Kendra to your Zendesk data source, you must provide the necessary details of your Zendesk data source so that Amazon Kendra can access your data. If you have not yet configured Zendesk for Amazon Kendra, see [Prerequisites](#prerequisites-zendesk).

------
#### [ Console ]

**To connect Amazon Kendra to Zendesk** 

1. Sign in to the AWS Management Console and open the [Amazon Kendra console](https://console.aws.amazon.com/kendra/).

1. From the left navigation pane, choose **Indexes** and then choose the index you want to use from the list of indexes.
**Note**  
You can choose to configure or edit your **User access control** settings under **Index settings**. 

1. On the **Getting started** page, choose **Add data source**.

1. On the **Add data source** page, choose **Zendesk connector**, and then choose **Add connector**. If using version 2 (if applicable), choose **Zendesk connector** with the "V2.0" tag.

1. On the **Specify data source details** page, enter the following information:

   1. In **Name and description**, for **Data source name**—Enter a name for your data source. You can include hyphens but not spaces.

   1. (Optional) **Description**—Enter an optional description for your data source.

   1. In **Default language**—Choose a language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. Language specified in the document metadata overrides the selected language.

   1. In **Tags**, for **Add new tag**—Include optional tags to search and filter your resources or track your AWS costs.

   1. Choose **Next**.

1. On the **Define access and security** page, enter the following information:

   1. **Zendesk URL**—Enter your Zendesk URL. For example, *https://\$1sub-domain\$1.zendesk.com/*.

   1. **Authorization**—Turn on or off access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources).

   1. **AWS Secrets Manager secret**—Choose an existing secret or create a new Secrets Manager secret to store your Zendesk authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

      1. Create a new secret with the following structure:

         ```
         {
                  "hostUrl": "https://yoursubdomain.zendesk.com/",
                  "accessToken": "your_access_token"
         }
         ```
**Note**  
For Kendra integration, the secret name should start with 'AmazonKendra-Zendesk-' followed by your chosen identifier (e.g., 'AmazonKendra-Zendesk-MyConnector').

      1. Save and add your secret.

   1. **Virtual Private Cloud (VPC)**—You can choose to use a VPC. If so, you must add **Subnets** and **VPC security groups**.

   1. **Identity crawler**—Specify whether to turn on Amazon Kendra’s identity crawler. The identity crawler uses the access control list (ACL) information for your documents to filter search results based on the user or their group access to documents. If you have an ACL for your documents and choose to use your ACL, you can then also choose to turn on Amazon Kendra’s identity crawler to configure [user context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#context-filter-user-incl-datasources) of search results. Otherwise, if identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the [PutPrincipalMapping](https://docs.aws.amazon.com/kendra/latest/APIReference/API_PutPrincipalMapping.html) API to upload user and group access information for user context filtering.

   1. **IAM role**—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose **Create a new role** to avoid errors.

   1. Choose **Next**.

1. On the **Configure sync settings** page, enter the following information:

   1. **Select contents**—Select the types of content you want to crawl, from tickets to help center articles, community topics, and more.

   1. **Organization name**—Enter the Zendesk organization names to filter content.

   1. **Sync start date**—Enter the date from which you want to start crawling your content.

   1. **Regex patterns**—Add regular expression patterns to include or exclude certain files. You can add up to 100 patterns.

   1. **Sync mode**—Choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Kendra for the first time, all content is crawled and indexed by default. You must run a full sync of your data if your initial sync failed, even if you don't choose full sync as your sync mode option.
      + Full sync: Freshly index all content, replacing existing content each time your data source syncs with your index.
      + New, modified sync: Index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
      + New, modified, deleted sync: Index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

   1. In **Sync run schedule** for **Frequency**—Choose how often to sync your data source content and update your index.

   1. Choose **Next**.

1. On the **Set field mappings** page, enter the following information:

   1. **Default data source fields**—Select from the Amazon Kendra generated default data source fields you want to map to your index.

   1.  **Add field**—Add custom data source fields, and for each one specify the index field name to map it to and the field data type.

   1. Choose **Next**.

1. On the **Review and create** page, check that the information you have entered is correct and then select **Add data source**. You can also choose to edit your information from this page. Your data source will appear on the **Data sources** page after the data source has been added successfully.
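The secret structure and naming convention from the **AWS Secrets Manager secret** step can be prepared in code before you create the secret. The following is a minimal Python sketch. The helper name `build_zendesk_secret` is hypothetical, and the `https://` check is an assumption added for illustration rather than a documented requirement.

```python
import json

# Prefix taken from the note in the console instructions.
SECRET_NAME_PREFIX = "AmazonKendra-Zendesk-"


def build_zendesk_secret(identifier, host_url, access_token):
    """Return (secret_name, secret_string) for the Zendesk connector secret."""
    # Assumption: reject non-HTTPS host URLs early to catch copy/paste mistakes.
    if not host_url.startswith("https://"):
        raise ValueError("host_url should be an https:// Zendesk URL")
    secret_name = SECRET_NAME_PREFIX + identifier
    # Keys match the structure shown in the console instructions.
    secret_string = json.dumps({
        "hostUrl": host_url,
        "accessToken": access_token,
    })
    return secret_name, secret_string
```

The returned values correspond to the `Name` and `SecretString` parameters of the Secrets Manager `CreateSecret` operation.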

------
#### [ API ]

**To connect Amazon Kendra to Zendesk**

You must specify a JSON of the [data source schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html) using the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/APIReference/API_TemplateConfiguration.html) API. You must provide the following information:
+ **Data source**—Specify the data source type as `ZENDESK` when you use the [TemplateConfiguration](https://docs.aws.amazon.com/kendra/latest/dg/API_TemplateConfiguration.html) JSON schema. Also specify the data source as `TEMPLATE` when you call the [CreateDataSource](https://docs.aws.amazon.com/kendra/latest/dg/API_CreateDataSource.html) API.
+ **Host URL**—Provide your Zendesk host URL as part of the connection configuration or repository endpoint details. For example, *https://yoursubdomain.zendesk.com*.
+  **Change log**—Whether Amazon Kendra should use the Zendesk data source change log mechanism to determine if a document must be updated in the index.
**Note**  
Use the change log if you don’t want Amazon Kendra to scan all of the documents. If your change log is large, it might take Amazon Kendra less time to scan the documents in the Zendesk data source than to process the change log. If you are syncing your Zendesk data source with your index for the first time, all documents are scanned. 
+ **Secret Amazon Resource Name (ARN)**—Provide the Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the authentication credentials for your Zendesk account. The secret is stored in a JSON structure with the following keys:

  ```
  {
      "hostUrl": "https://yoursubdomain.zendesk.com",
      "clientId": "client ID",
      "clientSecret": "Zendesk client secret",
      "userName": "Zendesk user name",
      "password": "Zendesk password"
  }
  ```
+ **IAM role**—Specify `RoleArn` when you call `CreateDataSource` to provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the Zendesk connector and Amazon Kendra. For more information, see [IAM roles for Zendesk data sources](https://docs.aws.amazon.com/kendra/latest/dg/iam-roles.html#iam-roles-ds).

You can also add the following optional features:
+  **Virtual Private Cloud (VPC)**—Specify `VpcConfiguration` when you call `CreateDataSource`. For more information, see [Configuring Amazon Kendra to use an Amazon VPC](vpc-configuration.md).
+  **Document/content types**—Specify whether to crawl:
  + Support tickets, ticket comments, and/or ticket comment attachments
  + Help center articles, article attachments, and article comments
  + Guide community topics, posts, or post comments
+  **Inclusion and exclusion filters**—Specify whether to include or exclude certain Zendesk content.
**Note**  
Most data sources use regular expression patterns, which are inclusion or exclusion patterns referred to as filters. If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.
+  **User context filtering and access control**—Amazon Kendra crawls the access control list (ACL) for your documents, if you have an ACL for your documents. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see [User context filtering](https://docs.aws.amazon.com/kendra/latest/dg/user-context-filter.html#datasource-context-filter).
+  **Field mappings**—Choose to map your Zendesk data source fields to your Amazon Kendra index fields. For more information, see [Mapping data source fields](https://docs.aws.amazon.com/kendra/latest/dg/field-mapping.html).
**Note**  
The document body field or the document body equivalent for your documents is required in order for Amazon Kendra to search your documents. You must map your document body field name in your data source to the index field name `_document_body`. All other fields are optional.
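The inclusion and exclusion filter rules described in the note above can be expressed as a short predicate. The following Python sketch is illustrative only. The function name is hypothetical, and the use of `re.search` (match anywhere in the name) is an assumption, because the exact matching behavior of the connector is not specified here.

```python
import re


def is_indexed(name, inclusion_patterns=(), exclusion_patterns=()):
    """Decide whether a document would be indexed under the filter rules."""
    # An exclusion match always wins, even if an inclusion pattern also matches.
    if any(re.search(p, name) for p in exclusion_patterns):
        return False
    # With inclusion filters set, only matching content is indexed.
    if inclusion_patterns:
        return any(re.search(p, name) for p in inclusion_patterns)
    # With no inclusion filters, everything not excluded is indexed.
    return True
```

For example, with the inclusion pattern `ticket` and the exclusion pattern `^internal`, a document named `internal-ticket-7` is skipped even though it matches the inclusion pattern.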

For a list of other important JSON keys to configure, see [Zendesk template schema](https://docs.aws.amazon.com/kendra/latest/dg/ds-schemas.html#ds-schema-zendesk).
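The required API settings described above can be gathered into a single `CreateDataSource` request. The following Python sketch only builds the request dictionary. The function name is hypothetical, and the template is deliberately incomplete. Apart from `type`, the nested key names inside the template (`secretArn`, `connectionConfiguration`, `repositoryEndpointMetadata`, `hostUrl`) are assumptions that you should verify against the Zendesk template schema.

```python
def build_create_data_source_request(index_id, name, role_arn, secret_arn, host_url):
    """Return keyword arguments for a Kendra CreateDataSource call."""
    # Assumed template layout; confirm key names in the Zendesk template schema.
    template = {
        "type": "ZENDESK",
        "secretArn": secret_arn,
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {"hostUrl": host_url},
        },
    }
    return {
        "IndexId": index_id,
        "Name": name,
        "Type": "TEMPLATE",  # Template-based connectors use TEMPLATE here.
        "RoleArn": role_arn,
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }
```

With the AWS SDK for Python (Boto3), a dictionary like this could be passed as `boto3.client("kendra").create_data_source(**request)` once the remaining schema keys are filled in.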

------

## Learn more
<a name="zendesk-learn-more"></a>

To learn more about integrating Amazon Kendra with your Zendesk data source, see:
+ [Discover insights from Zendesk with Amazon Kendra intelligent search](https://aws.amazon.com/blogs/machine-learning/discover-insights-from-zendesk-with-amazon-kendra-intelligent-search/)

## Notes
<a name="zendesk-notes"></a>
+ When access control lists (ACLs) are enabled, the "Sync only new or modified content" option is not available due to Zendesk API limitations. We recommend using the "Full sync" or "New, modified, or deleted content sync" mode instead, or disabling ACLs if you need to use this sync mode.
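The restriction in the note above can be checked before you configure a sync. The following Python sketch uses the sync mode labels from the console instructions. It assumes that "Sync only new or modified content" in the note refers to the console option labeled "New, modified sync", and the function name is hypothetical.

```python
# Labels as they appear in the console's Sync mode step.
SYNC_MODES = (
    "Full sync",
    "New, modified sync",
    "New, modified, deleted sync",
)


def allowed_sync_modes(acl_enabled):
    """Return the sync modes that remain available for the given ACL setting."""
    if acl_enabled:
        # Assumption: this label is the mode the note describes as unavailable.
        return tuple(m for m in SYNC_MODES if m != "New, modified sync")
    return SYNC_MODES
```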