# Knowledge bases
<a name="knowledge-base-integrations"></a>

A knowledge base is an organized, indexed collection of documents or content from data sources optimized for generative AI-powered retrieval and question answering. Whether your team stores documentation in Confluence, collaborates through SharePoint, or manages files in cloud storage, you can bring all this information into one unified search experience by creating knowledge bases.

Built-in integrations for Google Drive, OneDrive, Confluence, SharePoint, Amazon S3, and Web Crawler can be set up in a few clicks. They sync your data into Quick so that you can search your organization's knowledge in one place.

## How knowledge bases work
<a name="how-knowledge-bases-work"></a>

A knowledge base is an indexed collection of documents or content from data sources such as Google Drive, optimized for generative AI-powered retrieval and question answering. Multiple knowledge bases can be created from the same source, and all can reside within a shared Quick Index. For example, suppose you sync two folders from Google Drive and create two knowledge bases: one for “Policy Documents” to answer queries such as *“What’s our refund policy?”* and one for “Customer feedback” to answer queries such as *“What are the common customer complaints?”* Both can be part of the same index. Quick distinguishes between them using the knowledge base ID, so queries can be filtered to retrieve only the relevant documents from the desired knowledge base. This allows users to organize, secure, and retrieve information relevant to different domains or use cases, even though the underlying data is indexed together.

Your knowledge bases can be used individually or shared with team members through Amazon Quick spaces. Coarse-grained access control applies security at the knowledge base level, so users only receive information from knowledge bases they're authorized to access.
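
The following is a minimal sketch of the idea described above: several knowledge bases share one index, and a query is restricted to the knowledge bases a user is authorized to access. It is not how Quick is implemented. The `IndexedDocument` class, the `knowledge_base_id` field, the `search` function, and the sample documents are all hypothetical, and a simple keyword match stands in for semantic search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDocument:
    knowledge_base_id: str  # hypothetical field standing in for the knowledge base ID
    title: str
    text: str


# Made-up sample data: two knowledge bases stored in one shared index.
SHARED_INDEX = [
    IndexedDocument("kb-policy", "Refund policy", "Refunds are issued within 30 days."),
    IndexedDocument("kb-feedback", "Survey summary", "Customers complain about slow refunds."),
]


def search(index, query, authorized_kb_ids):
    """Keyword search limited to the knowledge bases the caller may access."""
    needle = query.casefold()
    return [
        doc
        for doc in index
        if doc.knowledge_base_id in authorized_kb_ids
        and needle in doc.text.casefold()
    ]
```

In this sketch, `search(SHARED_INDEX, "refund", {"kb-policy"})` returns only the policy document, even though the feedback document in the same index also mentions refunds.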

### Creation process
<a name="knowledge-base-creation-process"></a>

You can create a knowledge base while setting up a new data access integration, or use an existing integration to create additional knowledge bases. The process has the following stages:

1. **Data access integration setup** - Connect to your external data source

1. **Content selection** - Choose which content to include through filters and scope settings

1. **Indexing** - Amazon Quick processes and indexes the selected content

1. **Availability** - The knowledge base becomes available for use in spaces and by AI agents

### Capabilities
<a name="knowledge-base-capabilities"></a>

Each knowledge base provides the following capabilities:
+ **Content indexing** - Processes text, documents, and structured data from external sources
+ **Semantic search** - Enables AI-powered search across indexed content
+ **Automatic synchronization** - Keeps content up-to-date with configurable sync schedules
+ **Coarse-grained access control** - Ensures that users only receive information from knowledge bases they're authorized to access
+ **Multi-space usage** - Can be used across multiple spaces and by different AI agents

## General workflow
<a name="general-workflow"></a>

The typical workflow for working with knowledge bases follows these steps:

1. **Set up data source integration** - Connect to your external application (such as SharePoint, Google Drive, or Confluence) with appropriate authentication. For more information, see [Integration-specific guides](integration-guides.md).

1. **Create a knowledge base** - You can create a knowledge base while configuring your new integration. Configure your content filters by setting up include filters, file type restrictions, and folder selections to focus on relevant content.

1. **Set sync schedule** - Data refresh frequency is set to daily by default. You can edit the sync frequency to configure how often the knowledge base should be updated with new content from the source.

1. **Monitor and manage** - Review sync status and manage access permissions.

## Common configuration settings
<a name="common-configuration-settings"></a>

Knowledge bases share common configuration patterns across different data source integrations. Understanding these settings helps you optimize content indexing and manage sync behavior effectively.

**Note**  
While these configuration options are available across most integrations, specific settings and available options may vary depending on your chosen data source integration.

### File size and content limits
<a name="file-size-and-content-limits"></a>

Configure file size limits to optimize processing performance and manage storage costs. The specific limits vary by content type and are displayed in the console when you configure your knowledge base.

**Standard text documents**  
Applies to documents like PDFs, Word files, and text files. File size limit is 500 MB.

**Video files**  
Available when video processing is enabled. Supported formats include `.mp4`, `.mov`, `.m4v`. File size limit is 10 GB (10240 MB). Quick Index supports up to **10 video files per GB of storage**. If your use case requires higher video volumes, please open a ticket with AWS support to extend this limit.

**Audio files**  
Available when audio processing is enabled. Supported formats include `.mp3`, `.wav`, `.m4a`, `.flac`, and `.ogg`. File size limit is 2 GB (2048 MB).  
The maximum amount of text that can be extracted from a single document is 30 MB. Files whose extracted text exceeds this system limit are not indexed, regardless of the original file size.

**Images**  
Quick Index applies the following limits for images:  
+ **Per-document limit**: 500 images per document
+ **Per-GB limit**: 10,000 images per GB of index storage
+ **Per-index limit**: 2 million images per index

If your use case requires higher image volumes, please open a ticket with AWS support to extend these limits.
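
As a rough illustration of the limits above, the following sketch checks a file against the size limit for its content type and estimates how many images or videos an index of a given size can hold. The function names are hypothetical. Treating the size limits as inclusive, and assuming that the smaller of the per-GB and per-index image limits applies, are assumptions rather than documented behavior.

```python
from pathlib import PurePath

# Limits taken from this section, in MB.
SIZE_LIMIT_MB = {"document": 500, "video": 10_240, "audio": 2_048}
VIDEO_EXTENSIONS = {".mp4", ".mov", ".m4v"}
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}

IMAGES_PER_GB = 10_000
IMAGES_PER_INDEX = 2_000_000
VIDEOS_PER_GB = 10


def content_type(filename):
    """Classify a file by extension; anything else is treated as a document."""
    ext = PurePath(filename).suffix.lower()
    if ext in VIDEO_EXTENSIONS:
        return "video"
    if ext in AUDIO_EXTENSIONS:
        return "audio"
    return "document"


def within_size_limit(filename, size_mb):
    """Assumption: a file exactly at the limit is accepted."""
    return size_mb <= SIZE_LIMIT_MB[content_type(filename)]


def max_images(index_storage_gb):
    """Assumption: the smaller of the per-GB and per-index limits applies."""
    return min(int(index_storage_gb * IMAGES_PER_GB), IMAGES_PER_INDEX)


def max_videos(index_storage_gb):
    return int(index_storage_gb * VIDEOS_PER_GB)
```

Under these assumptions, an index with 50 GB of storage could hold up to 500,000 images and 500 video files, while an index with 500 GB would be capped at the per-index limit of 2 million images.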

### Sync schedule and safeguards
<a name="sync-schedule-and-safeguards"></a>

Configure how often your knowledge base updates and protect against unintended content deletion:

#### Sync frequency
<a name="sync-frequency"></a>

Data refresh frequency is set to daily by default. You can edit the sync frequency to configure how often the knowledge base updates with new content from the source.

#### Document deletion safeguard
<a name="document-deletion-safeguard"></a>

Protect your indexed content from accidental mass deletion by setting a maximum deletion percentage threshold. If a sync job would delete more documents than your threshold allows, the deletion phase is skipped, preserving your existing indexed content.

This safeguard protects against temporary network issues, permission changes, or source system problems that might make content temporarily unavailable.
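
The threshold check can be sketched as follows. This is an illustration of the rule described above, not Quick's implementation. The function and parameter names are hypothetical, and the sketch assumes that the percentage is measured against the number of currently indexed documents.

```python
def should_skip_deletion(indexed_count, pending_deletions, max_deletion_percent):
    """Return True when a sync would delete more than the allowed percentage.

    Assumption: the percentage is computed against the documents
    currently in the index.
    """
    if indexed_count == 0:
        return False  # nothing indexed, so nothing to protect
    deletion_percent = 100 * pending_deletions / indexed_count
    return deletion_percent > max_deletion_percent
```

For example, with 1,000 indexed documents and a 25 percent threshold, a sync that would remove 400 documents skips its deletion phase, while a sync that would remove 100 documents proceeds normally.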

# Best practices for managing ACLs in knowledge bases
<a name="acl-best-practices-kb"></a>

When using knowledge bases with access control lists (ACLs), you're responsible for keeping user identities and permissions accurate. This ensures the right people can access the right documents. Quick automatically syncs identity and document-level ACL changes every 24 hours by default. Updates to users or permissions can therefore take up to 24 hours to take effect, unless you've configured a different refresh schedule for your knowledge bases.

For more information about configuring ACLs for a specific data source, see [Amazon S3 integration](s3-integration.md).

**Note**  
Quick treats all email addresses as case-insensitive. `JohnDoe@example.com`, `johndoe@example.com`, and `JOHNDOE@example.com` are all considered the same user.
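
If you generate ACL files with a script, you might normalize email addresses before writing them so that the same user is not listed under several spellings. The following is a small sketch of that idea. The `normalize_email` helper is hypothetical and is not part of any Quick API.

```python
def normalize_email(email):
    """Fold case so that differently cased spellings compare equal."""
    return email.casefold()


spellings = ["JohnDoe@example.com", "johndoe@example.com", "JOHNDOE@example.com"]
unique_users = {normalize_email(address) for address in spellings}
```

Here `unique_users` contains a single entry, `johndoe@example.com`, which matches how Quick treats the three spellings as one user.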

## Important user management scenarios
<a name="acl-user-management-scenarios"></a>

**Understanding email binding**

Email addresses are bound to Quick users dynamically when users initiate chat interactions. This binding follows a first-come, first-served approach. The first user to chat with a given email address establishes the binding for that identity within the namespace.

**When an employee leaves your organization**

When an employee leaves, clean up their access promptly:

1. Update the ACL configuration files to remove references to their email address. For example, in Amazon S3, update the global ACL file or metadata files.

1. Refresh the knowledge bases to apply the changes.

This prevents potential security issues if the email is later reassigned to someone else.

**When an email address is reassigned to a new employee**
+ ACL-aware knowledge base access is automatically locked for the reassigned email address to protect data security.
+ Contact Quick support to clean up the previous user's access before the new employee can access documents associated with that email.

## Limitations
<a name="acl-limitations"></a>

When configuring document-level ACLs for your knowledge bases, be aware of these limitations:
+ **Document-level ACL configuration is permanent** – You cannot enable ACLs on knowledge bases created without ACL support. You also cannot disable ACLs once enabled. To change ACL configuration, create a new knowledge base with your desired setting from the start.
+ **Shared email addresses within a namespace** – If multiple Quick users share the same email address within a namespace, the system denies access to everyone using that shared email. This safeguard prevents accidentally granting document access to the wrong person.
+ **ACL resolution scope** – All ACLs are resolved within the Quick namespace of the knowledge base creator. This applies whether ACLs are specified by email address or group name. Quick looks up identities in the creator's organizational context to ensure consistent identity resolution.
+ **Email address recycling timing** – If your organization reassigns an email address from one employee to another, there's an important timing consideration. If the previous employee never used Quick for chat or AI interactions, and the email is reassigned before the next ACL refresh, the new employee may temporarily access documents intended for the previous employee.

  To avoid this, complete the following steps in order:

  1. Update your ACLs (if applicable, such as in Amazon S3) to remove the old user and add the new user.

  1. Manually refresh your knowledge base, or wait for the automatic daily refresh.

  1. Assign the email address to the new employee.

  This ensures access permissions are properly synchronized before the new user begins using Quick.
+ **Research compatibility** – Knowledge bases with document-level ACLs enabled are not currently compatible with Quick Research. If you need to use documents from an ACL-enabled knowledge base for research purposes, create a separate knowledge base without ACLs for those documents.
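
Because a shared email address causes access to be denied for everyone who uses it, it can help to check an exported user list for collisions before you enable ACLs. The following sketch assumes that you have a list of `(user_name, email)` pairs from your own identity tooling. The function name and the input format are hypothetical.

```python
from collections import defaultdict


def find_shared_emails(users):
    """Return emails used by more than one user, compared case-insensitively.

    `users` is an iterable of (user_name, email) pairs.
    """
    names_by_email = defaultdict(set)
    for user_name, email in users:
        names_by_email[email.casefold()].add(user_name)
    return {
        email: sorted(names)
        for email, names in names_by_email.items()
        if len(names) > 1
    }
```

Any email address that appears in the result is worth resolving before you rely on document-level ACLs for those users.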

# Troubleshooting knowledge bases
<a name="troubleshooting-knowledge-bases"></a>

When you encounter issues with your Quick knowledge base, you can use this troubleshooting guide to identify and resolve common problems. Knowledge base issues typically involve document synchronization, refresh job failures, or access permissions.

## Documents don't appear in your knowledge base
<a name="documents-not-appearing"></a>

When documents you expect to see don't appear in your knowledge base, several factors might cause this issue.

**Common causes:**
+ **Sync in progress** – Documents might still be processing. Check the refresh status to confirm the refresh is complete.
+ **Unsupported file format** – Verify your documents are in supported formats: Word, Excel, PowerPoint, PDF, CSV, TXT, RTF, JSON, XML, HTML.
+ **File size too large** – Each file must be less than 50 MB.
+ **Insufficient access permissions** – Confirm the knowledge base has proper permissions to access the document source.
+ **Document filtering** – Check if filters or exclusion rules prevent certain documents from being indexed.

**To troubleshoot:**

1. Review the refresh history for error messages related to specific documents that failed to sync.

1. Verify your document formats and file sizes meet requirements.

1. Check your access permissions and connection settings.

## Refresh job fails
<a name="refresh-job-fails"></a>

A refresh job typically fails when there's a configuration error in the knowledge base or data source connection.

**Common causes:**
+ **Permission issues** – The integration lacks sufficient permissions to access the data source.
+ **Configuration errors** – Incorrect URLs or data source connection settings.
+ **Resource limitations** – Rate limiting from the source system.

**To resolve:**

1. Check the refresh history details for specific error messages.

1. Verify all connection settings and permissions are correctly configured.

1. Take the recommended action based on the error message.

## Refresh job completes with issues
<a name="refresh-job-completes-with-issues"></a>

When a refresh job completes with issues, the job processed successfully but encountered problems with some documents.

**What this means:**
+ **Partial success** – Some documents synced successfully while others failed.
+ **Document-level errors** – Individual files might have formatting issues, corruption, or access problems.
+ **Metadata issues** – Problems with document metadata or associated information.
+ **Size or format violations** – Some files might exceed size limits or be in unsupported formats.

**To resolve:**

1. Review the detailed refresh reports to identify which documents encountered issues.

1. Address the individual document problems.

1. Run another refresh after resolving the issues.

## Refresh job succeeds but no documents appear
<a name="refresh-job-succeeds-no-documents"></a>

If a refresh job shows as successful but no documents appear in your knowledge base, check these potential causes.

**Common causes:**
+ **Empty source** – The configured data source location contains no documents.
+ **Incorrect path configuration** – The source path or connection settings don't point to the correct location.
+ **Document filters** – Inclusion or exclusion criteria might filter out all documents.
+ **Read permissions missing** – The job connected successfully but lacked permissions to read the actual documents.

**To resolve:**

1. Verify your data source configuration points to the correct location.

1. Confirm documents are present in the specified location.

1. Check that appropriate access permissions are configured.

1. Review any document filters that might exclude content.

## File format issues during refresh
<a name="file-format-issues"></a>

Quick knowledge bases support specific file formats. Files must meet format, size, and character limit requirements.

**Requirements:**
+ **Supported formats:** Word, Excel, PowerPoint, PDF, CSV, TXT, RTF, JSON, XML, HTML
+ **File size limit:** 50 MB per file
+ **File condition:** Not corrupted or password-protected

**To resolve format issues:**

1. Verify your files meet the format and size requirements.

1. Convert unsupported formats to supported ones.

1. Remove password protection from files.

1. Check that files aren't corrupted.
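
Before you run a refresh, you can screen files against the format and size requirements in this section with a short script such as the following sketch. The mapping from format names to file extensions is an assumption, as is treating the size limit as exclusive. The sketch does not detect corrupted or password-protected files.

```python
from pathlib import PurePath

# Assumed extensions for the supported formats listed in this section.
SUPPORTED_EXTENSIONS = {
    ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx",
    ".pdf", ".csv", ".txt", ".rtf", ".json", ".xml", ".html",
}


def check_file(name, size_bytes, max_size_mb=50):
    """Return a list of problems found for one file (empty if none)."""
    problems = []
    if PurePath(name).suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append("unsupported format")
    # Conservative assumption: a file exactly at the limit is flagged.
    if size_bytes >= max_size_mb * 1024 * 1024:
        problems.append("file too large")
    return problems
```

For example, `check_file("movie.mkv", 1024)` reports an unsupported format, and a 60 MB PDF is reported as too large. Files that pass this check can still fail during a refresh for other reasons, such as corruption.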

## Access denied errors
<a name="access-denied-errors"></a>

Access denied errors typically occur due to authentication or authorization issues.

**Common causes:**
+ **Invalid credentials** – Authentication tokens or passwords might have expired.
+ **Insufficient permissions** – The account used in the integration lacks read access to the data source.
+ **Network restrictions** – Firewall or security policies block access.
+ **SSL/TLS issues** – Certificate problems with secure connections.

**To resolve:**

1. **Verify authentication credentials** – Confirm that authentication credentials are current and valid. Edit the integration to re-authenticate and generate a new token.

1. **For web crawler data sources** – Verify that secure connections are properly configured and that SSL certificates are valid and trusted.

1. **Contact your system administrator** – If you continue experiencing access issues, contact your system administrator. They might need to adjust permissions or security settings.