

# Managing lexicons


Pronunciation lexicons enable you to customize the pronunciation of words. Amazon Polly provides API operations that you can use to store lexicons in an AWS Region. Those lexicons are then specific to that Region. You can use one or more of the lexicons from that Region when synthesizing text with the `SynthesizeSpeech` operation, which applies the specified lexicons to the input text before synthesis begins. For more information, see [SynthesizeSpeech](API_SynthesizeSpeech.md).

**Note**  
These lexicons must conform with the Pronunciation Lexicon Specification (PLS) W3C recommendation. For more information, see [Pronunciation Lexicon Specification (PLS) Version 1.0](https://www.w3.org/TR/pronunciation-lexicon/) on the W3C website. 

The following are examples of ways to use lexicons with speech synthesis engines:
+ Common words are sometimes stylized with numbers taking the place of letters, as with "g3t sm4rt" (get smart). Humans can read these words correctly. However, a Text-to-Speech (TTS) engine reads the text literally, pronouncing each word exactly as it is spelled. You can use a lexicon to customize how Amazon Polly synthesizes such text. In this example, you can specify an alias (get smart) for "g3t sm4rt" in the lexicon.
+ Your text might include an acronym, such as W3C. You can use a lexicon to define an alias for the word W3C so that it is read in the full, expanded form (World Wide Web Consortium).

Lexicons give you additional control over how Amazon Polly pronounces words uncommon to the selected language. For example, you can specify the pronunciation using a phonetic alphabet. For more information, see [Pronunciation Lexicon Specification (PLS) Version 1.0](https://www.w3.org/TR/pronunciation-lexicon/) on the W3C website.

**Topics**
+ [Using multiple lexicons](lexicons-applying.md)
+ [Uploading a lexicon](managing-lexicons-console-upload.md)
+ [Applying lexicons (Synthesizing Speech)](managing-lexicons-console-synthesize-speech.md)
+ [Filtering the lexicon list on the console](managing-lexicons-console-filter.md)
+ [Downloading lexicons on the console](managing-lexicons-console-download.md)
+ [Deleting a lexicon](managing-lexicons-console-delete.md)

# Using multiple lexicons


You can apply up to five lexicons to your text. If the same grapheme appears in more than one lexicon that you apply to your text, the order in which the lexicons are applied can change the resulting speech. For example, consider the text "Hello, my name is Bob." and two lexemes in different lexicons that both use the grapheme `Bob`.

**LexA**

```
<lexeme>
   <grapheme>Bob</grapheme>
   <alias>Robert</alias>
</lexeme>
```

**LexB**

```
<lexeme>
   <grapheme>Bob</grapheme>
   <alias>Bobby</alias>
</lexeme>
```

If the lexicons are listed in the order LexA and then LexB, the synthesized speech is "Hello, my name is Robert." If they are listed in the order LexB and then LexA, the synthesized speech is "Hello, my name is Bobby."

**Example – Applying LexA before LexB**  

```
aws polly synthesize-speech \
--lexicon-names LexA LexB \
--output-format mp3 \
--text 'Hello, my name is Bob' \
--voice-id Justin \
bobAB.mp3
```
**Speech output:** "Hello, my name is Robert."

**Example – Applying LexB before LexA**  

```
aws polly synthesize-speech \
--lexicon-names LexB LexA \
--output-format mp3 \
--text 'Hello, my name is Bob' \
--voice-id Justin \
bobBA.mp3
```
**Speech output:** "Hello, my name is Bobby."
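
The ordering rule in the two examples above can be modeled in a few lines of Python. This is only an illustrative sketch, not how Amazon Polly is implemented: it assumes each lexicon is a plain dictionary of grapheme to alias, and the helper name `apply_lexicons` is hypothetical.

```python
def apply_lexicons(text, lexicons):
    """Replace each word using the first lexicon (in list order) that defines it."""
    merged = {}
    for lexicon in lexicons:
        for grapheme, alias in lexicon.items():
            # setdefault keeps the entry from the earliest lexicon in the list.
            merged.setdefault(grapheme, alias)
    out = []
    for word in text.split():
        core = word.strip(".,!?")  # ignore surrounding punctuation when matching
        out.append(word.replace(core, merged[core]) if core in merged else word)
    return " ".join(out)


lex_a = {"Bob": "Robert"}
lex_b = {"Bob": "Bobby"}

print(apply_lexicons("Hello, my name is Bob.", [lex_a, lex_b]))
print(apply_lexicons("Hello, my name is Bob.", [lex_b, lex_a]))
```

Swapping the order of the list swaps which alias is used, which mirrors the effect of reordering the values passed to `--lexicon-names`.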

For information about applying lexicons using the Amazon Polly console, see [Applying lexicons (Synthesizing Speech)](managing-lexicons-console-synthesize-speech.md).

# Uploading a lexicon


The lexicons you use must conform to the Pronunciation Lexicon Specification (PLS) W3C recommendation. For more information, see [Pronunciation Lexicon Specification (PLS) Version 1.0](https://www.w3.org/TR/pronunciation-lexicon/#S4.7) on the W3C website.

------
#### [ Console - Lexicons tab ]

To use a pronunciation lexicon, you must first upload it. There are two locations on the console from which you can upload a lexicon, the **Text-to-Speech** tab and the **Lexicons** tab.

The following processes describe how to add lexicons that you can use to customize how words and phrases uncommon to the chosen language are pronounced. <a name="upload-lexicon-lexicons-tab"></a>

**To add a lexicon from the Lexicons tab**

1. Sign in to the AWS Management Console and open the Amazon Polly console at [https://console.aws.amazon.com/polly/](https://console.aws.amazon.com/polly/).

1. Choose the **Lexicons** tab.

1. Choose **Upload lexicon**.

1. Provide a name for the lexicon and then use **Choose a lexicon file** to find the lexicon to upload. You can only upload PLS files with .pls or .xml extensions.

1. Choose **Upload lexicon**. If a lexicon with the same name (whether a .pls or .xml file) already exists, uploading the lexicon overwrites the existing lexicon.

------
#### [ Console - TTS tab ]<a name="upload-lexicon-tts-tab"></a>

**To add a lexicon from the Text-to-Speech tab**

1. Sign in to the AWS Management Console and open the Amazon Polly console at [https://console.aws.amazon.com/polly/](https://console.aws.amazon.com/polly/).

1. Choose the **Text-to-Speech** tab.

1. Expand **Additional settings**, turn on **Customize pronunciation**, and then choose **Upload lexicon**.

1. Provide a name for the lexicon and then use **Choose a lexicon file** to find the lexicon to upload. You can only use PLS files with .pls or .xml extensions. 

1. Choose **Upload lexicon**. If a lexicon with the same name (whether a .pls or .xml file) already exists, uploading the lexicon overwrites the existing lexicon.

------
#### [ AWS CLI - one lexeme ]

With Amazon Polly, you can use [PutLexicon](API_PutLexicon.md) to store pronunciation lexicons in a specific AWS Region for your account. Then, you can specify one or more of these stored lexicons in your [SynthesizeSpeech](API_SynthesizeSpeech.md) request that you want to apply before the service starts synthesizing the text. For more information, see [Managing lexicons](managing-lexicons.md).

Consider the following W3C PLS-compliant lexicon. 

```
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" 
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon 
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
      alphabet="ipa" 
      xml:lang="en-US">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>
```

Note the following:
+ The two attributes specified in the `<lexicon>` element:
  + The `xml:lang` attribute specifies the language code, `en-US`, to which the lexicon applies. Amazon Polly can use this example lexicon if the voice you specify in the `SynthesizeSpeech` call has the same language code (en-US). 
**Note**  
You can use the `DescribeVoices` operation to find the language code associated with a voice.

     
  + The `alphabet` attribute specifies `ipa`, which means that the International Phonetic Alphabet (IPA) is used for pronunciations. IPA is one of the alphabets for writing pronunciations. Amazon Polly also supports the Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA).

     
+ The `<lexeme>` element describes the mapping between `<grapheme>` (that is, a textual representation of the word) and `<alias>`. 
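
Before uploading, it can be useful to confirm locally which language, alphabet, and lexemes a lexicon file declares. The following sketch uses only the Python standard library. The function name `summarize_lexicon` is hypothetical, and the check covers well-formedness and the root element only, not full PLS schema validation.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def summarize_lexicon(pls_text):
    """Return the language, alphabet, and (grapheme, replacement) pairs of a PLS document."""
    root = ET.fromstring(pls_text.encode("utf-8"))
    if root.tag != f"{{{PLS_NS}}}lexicon":
        raise ValueError("root element is not a PLS <lexicon>")
    ns = {"pls": PLS_NS}
    entries = []
    for lexeme in root.findall("pls:lexeme", ns):
        grapheme = lexeme.findtext("pls:grapheme", namespaces=ns)
        alias = lexeme.findtext("pls:alias", namespaces=ns)
        phoneme = lexeme.findtext("pls:phoneme", namespaces=ns)
        # A lexeme in these examples carries either an alias or a phoneme.
        entries.append((grapheme, alias if alias is not None else phoneme))
    return {
        "language": root.get(XML_LANG),
        "alphabet": root.get("alphabet"),
        "entries": entries,
    }


SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa"
      xml:lang="en-US">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>"""

print(summarize_lexicon(SAMPLE))
```

The reported language code is the value to compare against the language of the voice you plan to use.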

To test this lexicon, do the following:

1. Save the lexicon as `example.pls`.

1. Run the `put-lexicon` AWS CLI command to store the lexicon (with the name `w3c`), in the us-east-2 region.

   ```
   aws polly put-lexicon \
   --name w3c \
   --content file://example.pls
   ```

1. Run the `synthesize-speech` command to synthesize sample text to an audio stream (`speech.mp3`), and specify the optional `lexicon-names` parameter. 

   ```
   aws polly synthesize-speech \
   --text 'W3C is a Consortium' \
   --voice-id Joanna \
   --output-format mp3 \
   --lexicon-names="w3c" \
   speech.mp3
   ```

1. Play the resulting `speech.mp3`, and notice that the word W3C in the text is replaced by World Wide Web Consortium.

The preceding example lexicon uses an alias. The IPA alphabet mentioned in the lexicon is not used. The following lexicon specifies a phonetic pronunciation using the `<phoneme>` element with the IPA alphabet.

```
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" 
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon 
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
      alphabet="ipa" 
      xml:lang="en-US">
  <lexeme>
    <grapheme>pecan</grapheme>
    <phoneme>pɪˈkɑːn</phoneme>
  </lexeme>
</lexicon>
```

Follow the same steps to test this lexicon. Make sure you specify input text that has the word "pecan" (for example, "Pecan pie is delicious").
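
The same upload-and-test steps can be sketched in Python. The example assumes `polly` is a Boto3 Amazon Polly client (for example, `boto3.client("polly")`) with credentials and a default Region already configured. The function name `upload_and_synthesize` and the file names are illustrative.

```python
def upload_and_synthesize(polly, lexicon_name, pls_path, text, out_path, voice_id="Joanna"):
    """Store a lexicon, then synthesize text with it and save the MP3 audio."""
    with open(pls_path, encoding="utf-8") as f:
        # PutLexicon overwrites any existing lexicon with the same name.
        polly.put_lexicon(Name=lexicon_name, Content=f.read())

    response = polly.synthesize_speech(
        Text=text,
        VoiceId=voice_id,
        OutputFormat="mp3",
        LexiconNames=[lexicon_name],
    )
    with open(out_path, "wb") as f:
        # AudioStream is a streaming body; read() returns the audio bytes.
        f.write(response["AudioStream"].read())
    return out_path


# Usage (requires boto3 and AWS credentials):
# import boto3
# upload_and_synthesize(boto3.client("polly"), "w3c", "example.pls",
#                       "W3C is a Consortium", "speech.mp3")
```

Passing the client as a parameter keeps the function easy to exercise with a stub in tests.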

See the following resources for additional code samples for the PutLexicon API operation:
+ Java Sample: [PutLexicon](PutLexiconSample.md)
+ Python (Boto3) Sample: [PutLexicon](PutLexiconSamplePython.md)

------
#### [ AWS CLI - multiple lexemes ]

With Amazon Polly, you can use [PutLexicon](API_PutLexicon.md) to store pronunciation lexicons in a specific AWS Region for your account. Then, you can specify one or more of these stored lexicons in your [SynthesizeSpeech](API_SynthesizeSpeech.md) request that you want to apply before the service starts synthesizing the text. For more information, see [Managing lexicons](managing-lexicons.md).

In this example, the lexeme that you specify in the lexicon applies exclusively to the input text for the synthesis. Consider the following lexicon: 

```
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
      alphabet="ipa" xml:lang="en-US">

  <lexeme> 
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
  <lexeme> 
    <grapheme>W3C</grapheme>
    <alias>WWW Consortium</alias>
  </lexeme>
  <lexeme> 
    <grapheme>Consortium</grapheme>
    <alias>Community</alias>
  </lexeme>
</lexicon>
```

The lexicon specifies three lexemes, two of which define an alias for the grapheme W3C as follows:
+ The first `<lexeme>` element defines an alias (World Wide Web Consortium).
+ The second `<lexeme>` defines an alternative alias (WWW Consortium). 

Amazon Polly uses the first replacement for any given grapheme in a lexicon.

The third `<lexeme>` defines a replacement (Community) for the word Consortium.

First, let's test this lexicon. Suppose you want to synthesize the following sample text to an audio file (`speech.mp3`), and you specify the lexicon in a call to `SynthesizeSpeech`.

```
The W3C is a Consortium
```

`SynthesizeSpeech` first applies the lexicon as follows: 
+ As per the first lexeme, the word W3C is revised as World Wide Web Consortium. The revised text appears as follows:

  ```
  The World Wide Web Consortium is a Consortium
  ```
+ The alias defined in the third lexeme applies only to the word Consortium that was part of the original text, resulting in the following text:

  ```
  The World Wide Web Consortium is a Community
  ```
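
The two rules described above (the first lexeme for a grapheme wins, and aliases apply only to words in the original text) can be illustrated with a simplified Python model. This sketch is not the service's implementation: it matches whole whitespace-separated words only, and the name `apply_lexemes` is hypothetical.

```python
def apply_lexemes(text, lexemes):
    """Apply (grapheme, alias) pairs to the original words in a single pass."""
    first = {}
    for grapheme, alias in lexemes:
        # Keep only the first alias seen for each grapheme.
        first.setdefault(grapheme, alias)
    # Each original word is looked up once, so words introduced by an
    # alias (such as "Consortium") are never replaced again.
    return " ".join(first.get(word, word) for word in text.split())


lexemes = [
    ("W3C", "World Wide Web Consortium"),
    ("W3C", "WWW Consortium"),
    ("Consortium", "Community"),
]

print(apply_lexemes("The W3C is a Consortium", lexemes))
```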

You can test this using the AWS CLI as follows:

1. Save the lexicon as `example.pls`.

1. Run the `put-lexicon` command to store the lexicon with name w3c in the us-east-2 region.

   ```
   aws polly put-lexicon \
   --name w3c \
   --content file://example.pls
   ```

1. Run the `list-lexicons` command to verify that the w3c lexicon is in the list of lexicons returned.

   ```
   aws polly list-lexicons
   ```

1. Run the `synthesize-speech` command to synthesize sample text to an audio file (`speech.mp3`), and specify the optional `lexicon-names` parameter. 

   ```
   aws polly synthesize-speech \
   --text 'W3C is a Consortium' \
   --voice-id Joanna \
   --output-format mp3 \
   --lexicon-names="w3c" \
   speech.mp3
   ```

1. Play the resulting `speech.mp3` file to verify that the synthesized speech reflects the text changes.

See the following resources for additional code samples for the PutLexicon API operation:
+ Java Sample: [PutLexicon](PutLexiconSample.md)
+ Python (Boto3) Sample: [PutLexicon](PutLexiconSamplePython.md)

------

# Applying lexicons (Synthesizing Speech)


The lexicons you use must conform to the Pronunciation Lexicon Specification (PLS) W3C recommendation. For more information, see [Pronunciation Lexicon Specification (PLS) Version 1.0](https://www.w3.org/TR/pronunciation-lexicon/#S4.7) on the W3C website.

------
#### [ Console ]

The following procedure demonstrates how to apply a lexicon to your input text by applying the `W3C.pls` lexicon to substitute "World Wide Web Consortium" for "W3C". If you apply multiple lexicons to your text, they are applied in top-down order, with the first match taking precedence over later matches. A lexicon is applied to the text only if the language specified in the lexicon is the same as the language chosen.

You can apply a lexicon to plain text or SSML input.

**Example – Applying the W3C.pls Lexicon**  
To create the lexicon you'll need for this exercise, see [Uploading a lexicon](managing-lexicons-console-upload.md). Use a plain text editor to create the W3C.pls lexicon shown at the top of the topic. Remember where you save this file.  

**To apply the W3C.pls lexicon to your input**

In this example we introduce a lexicon to substitute "World Wide Web Consortium" for "W3C". Compare the results of this exercise with that of [Using SSML on the console](ssml-to-speech-console.md) for both US English and another language.

1. Sign in to the AWS Management Console and open the Amazon Polly console at [https://console.aws.amazon.com/polly/](https://console.aws.amazon.com/polly/).

1. Do one of the following:
   + Turn off **SSML** and then type or paste this text into the text input box.

     ```
     He was caught up in the game. 
     In the middle of the 10/3/2014 W3C meeting 
     he shouted, "Score!" quite loudly.
     ```
   + Turn on **SSML** and then type or paste this text into the text input box.

     ```
     <speak>He wasn't paying attention.<break time="1s"/>
     In the middle of the 10/3/2014 W3C meeting 
     he shouted, "Score!" quite loudly.</speak>
     ```

1. From the **Language** list, choose **English, US**, then choose the voice you want to use for this text.

1. Expand **Additional settings** and turn on **Customize pronunciation.**

1. From the list of lexicons, choose `W3C (English, US)`.

   If the `W3C (English, US)` lexicon is not listed, choose **Upload lexicon** and upload it, then choose it from the list. To create this lexicon, see [Uploading a lexicon](managing-lexicons-console-upload.md).

1. To listen to the speech immediately, choose **Listen**.

1. To save the speech to a file, do the following:

   1. Choose **Download**.

   1. To change to a different file format, turn on **Speech file format settings**, choose the file format you want, and then choose **Download**.

Repeat the previous steps, but choose a different language and notice the difference in the output.

------
#### [ AWS CLI ]

In a call to `SynthesizeSpeech`, you can specify multiple lexicons. In this case, the first lexicon specified (in order from left to right) takes precedence over any lexicons that follow it.

Consider the following two lexicons. Note that each lexicon describes different aliases for the same grapheme W3C. 
+ Lexicon 1: `w3c.pls`

  ```
  <?xml version="1.0" encoding="UTF-8"?>
  <lexicon version="1.0" 
        xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
        xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon 
          http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
        alphabet="ipa" xml:lang="en-US">
    <lexeme>
      <grapheme>W3C</grapheme>
      <alias>World Wide Web Consortium</alias>
    </lexeme>
  </lexicon>
  ```
+ Lexicon 2: `w3cAlternate.pls`

  ```
  <?xml version="1.0" encoding="UTF-8"?>
  <lexicon version="1.0"
        xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
          http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
        alphabet="ipa" xml:lang="en-US">
  
    <lexeme> 
      <grapheme>W3C</grapheme>
      <alias>WWW Consortium</alias>
    </lexeme>
  </lexicon>
  ```

  

Suppose you store these lexicons as `w3c` and `w3cAlternate` respectively. If you specify lexicons in order (`w3c` followed by `w3cAlternate`) in a `SynthesizeSpeech` call, the alias for W3C defined in the first lexicon has precedence over the second. To test the lexicons, do the following:

1. Save the lexicons locally in files called `w3c.pls` and `w3cAlternate.pls`.

1. Upload these lexicons using the `put-lexicon` AWS CLI command.
   + Upload the `w3c.pls` lexicon and store it as `w3c`.

     ```
     aws polly put-lexicon \
     --name w3c \
     --content file://w3c.pls
     ```
   + Upload the `w3cAlternate.pls` lexicon and store it as `w3cAlternate`.

     ```
     aws polly put-lexicon \
     --name w3cAlternate \
     --content file://w3cAlternate.pls
     ```

1. Run the `synthesize-speech` command to synthesize sample text to an audio stream (`speech.mp3`), and specify both lexicons using the `lexicon-names` parameter. 

   ```
   aws polly synthesize-speech \
   --text 'PLS is a W3C recommendation' \
   --voice-id Joanna \
   --output-format mp3 \
   --lexicon-names '["w3c","w3cAlternate"]' \
   speech.mp3
   ```

1. Test the resulting `speech.mp3`. It should read as follows:

   ```
   PLS is a World Wide Web Consortium recommendation
   ```

------

# Filtering the lexicon list on the console


The following procedure describes how to filter the lexicons list so that only lexicons of a chosen language are displayed.

------
#### [ Console ]<a name="filter-and-choose-lexicons"></a>

**To filter the lexicons listed by language**

1. Sign in to the AWS Management Console and open the Amazon Polly console at [https://console.aws.amazon.com/polly/](https://console.aws.amazon.com/polly/).

1. Choose the **Lexicons** tab.

1. Choose **Any language**.

1. From the list of languages, choose the language you want to filter on.

   The list displays only the lexicons for the chosen language.

------
#### [ AWS CLI ]

Amazon Polly provides the [ListLexicons](API_ListLexicons.md) API operation that you can use to get the list of pronunciation lexicons in your account in a specific AWS Region. The following AWS CLI call lists the lexicons in your account in the us-east-2 region.



```
aws polly list-lexicons
```

The following is an example response, showing two lexicons named `w3c` and `tomato`. For each lexicon, the response returns metadata such as the language code to which the lexicon applies, the number of lexemes defined in the lexicon, the size in bytes, and so on. The language code describes a language and locale to which the lexemes defined in the lexicon apply. 

```
{
    "Lexicons": [
        {
            "Attributes": {
                "LanguageCode": "en-US",
                "LastModified": 1474222543.989,
                "Alphabet": "ipa",
                "LexemesCount": 1,
                "LexiconArn": "arn:aws:polly:aws-region:account-id:lexicon/w3c",
                "Size": 495
            },
            "Name": "w3c"
        },
        {
            "Attributes": {
                "LanguageCode": "en-US",
                "LastModified": 1473099290.858,
                "Alphabet": "ipa",
                "LexemesCount": 1,
                "LexiconArn": "arn:aws:polly:aws-region:account-id:lexicon/tomato",
                "Size": 645
            },
            "Name": "tomato"
        }
    ]
}
```
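
The console's language filter has no direct CLI equivalent in this topic, but the JSON response can be filtered locally. The following sketch assumes you have captured the `list-lexicons` output as text. The function name `lexicons_for_language` is hypothetical, and the sample data is a trimmed copy of the response above.

```python
import json
from datetime import datetime, timezone


def lexicons_for_language(response, language_code):
    """Return (name, lexeme count, last-modified date) for matching lexicons."""
    rows = []
    for item in response["Lexicons"]:
        attrs = item["Attributes"]
        if attrs["LanguageCode"] == language_code:
            # The CLI prints LastModified as seconds since the Unix epoch.
            modified = datetime.fromtimestamp(attrs["LastModified"], tz=timezone.utc)
            rows.append((item["Name"], attrs["LexemesCount"], modified.date().isoformat()))
    return rows


# Trimmed copy of the example response shown above.
cli_output = """
{"Lexicons": [
  {"Attributes": {"LanguageCode": "en-US", "LastModified": 1474222543.989,
                  "Alphabet": "ipa", "LexemesCount": 1, "Size": 495},
   "Name": "w3c"},
  {"Attributes": {"LanguageCode": "en-US", "LastModified": 1473099290.858,
                  "Alphabet": "ipa", "LexemesCount": 1, "Size": 645},
   "Name": "tomato"}
]}
"""

for row in lexicons_for_language(json.loads(cli_output), "en-US"):
    print(row)
```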

The following resources contain additional information for the ListLexicons operation:
+ Java Sample: [ListLexicons](ListLexiconsSample.md)
+ Python (Boto3) Sample: [ListLexicons](ListLexiconSamplePython.md)

------

# Downloading lexicons on the console


The following process describes how to download one or more lexicons. You can add, remove, or modify lexicon entries in the file and then upload it again to keep your lexicon up-to-date. 

------
#### [ Console ]<a name="download-lexicon"></a>

**To download one or more lexicons**

1. Sign in to the AWS Management Console and open the Amazon Polly console at [https://console.aws.amazon.com/polly/](https://console.aws.amazon.com/polly/).

1. Choose the **Lexicons** tab.

1. Choose the lexicon or lexicons you want to download.

   1. To download a single lexicon, choose its name from the list.

   1. To download multiple lexicons as a single compressed archive file, select the check box next to each entry in the list that you want to download.

1. Choose **Download**.

1. Open the folder where you want to download the lexicon.

1. Choose **Save**.

------
#### [ AWS CLI ]

Amazon Polly provides the [GetLexicon](API_GetLexicon.md) API operation to retrieve the content of a pronunciation lexicon you stored in your account in a specific region. 

The following `get-lexicon` AWS CLI command retrieves the content of the `example` lexicon.

```
aws polly get-lexicon \
--name example
```

If you don't already have a lexicon stored in your account, you can use the `PutLexicon` operation to store one. For more information, see [Uploading a lexicon](managing-lexicons-console-upload.md).

The following is a sample response. In addition to the lexicon content, the response returns the metadata, such as the language code to which the lexicon applies, number of lexemes defined in the lexicon, the Amazon Resource Name (ARN) of the resource, and the size of the lexicon in bytes. The `LastModified` value is a Unix timestamp.

```
{
    "Lexicon": {
        "Content": "lexicon content in plain text PLS format",
        "Name": "example"
    },
    "LexiconAttributes": {
        "LanguageCode": "en-US",
        "LastModified": 1474222543.989,
        "Alphabet": "ipa",
        "LexemesCount": 1,
        "LexiconArn": "arn:aws:polly:us-east-2:account-id:lexicon/example",
        "Size": 495
    }
}
```
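
To work with the lexicon as a file, as in the console download, you can write the `Content` field to disk. The following sketch assumes `polly` is a Boto3 Amazon Polly client. The function name `download_lexicon` and the output file name are illustrative.

```python
def download_lexicon(polly, name, out_path):
    """Save a stored lexicon's PLS content to a local file and return its lexeme count."""
    response = polly.get_lexicon(Name=name)
    with open(out_path, "w", encoding="utf-8") as f:
        # Content holds the PLS document as a string.
        f.write(response["Lexicon"]["Content"])
    return response["LexiconAttributes"]["LexemesCount"]


# Usage (requires boto3 and AWS credentials):
# import boto3
# count = download_lexicon(boto3.client("polly"), "example", "example.pls")
```

You can then edit the saved file and upload it again under the same name to replace the stored lexicon.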

The following resources contain additional code samples for the GetLexicon operation:
+ Java Sample: [GetLexicon](GetLexiconSample.md)
+ Python (Boto3) Sample: [GetLexicon](GetLexiconSamplePython.md)

------

# Deleting a lexicon


The following process describes how to delete a lexicon. After deleting the lexicon, you must add it back before you can use it again. You can delete one or more lexicons at the same time by selecting the check boxes next to individual lexicons.

------
#### [ Console ]<a name="delete-lexicon"></a>

**To delete a lexicon**

1. Sign in to the AWS Management Console and open the Amazon Polly console at [https://console.aws.amazon.com/polly/](https://console.aws.amazon.com/polly/).

1. Choose the **Lexicons** tab.

1. Choose one or more lexicons that you want to delete from the list.

1. Choose **Delete**.

1. Enter confirmation text and then choose **Delete** to remove the lexicon from the Region or **Cancel** to keep it.

------
#### [ AWS CLI ]

Amazon Polly provides the [DeleteLexicon](API_DeleteLexicon.md) API operation to delete a pronunciation lexicon from a specific AWS Region in your account. The following AWS CLI command deletes the specified lexicon. 

The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows, replace the backslash (\\) Unix continuation character at the end of each line with a caret (^) and use full quotation marks (") around the input text with single quotes (') for interior tags.

```
aws polly delete-lexicon \
--name example
```

The following resources contain additional information for the DeleteLexicon operation:
+ Java Sample: [DeleteLexicon](DeleteLexiconSample.md)
+ Python (Boto3) Sample: [DeleteLexicon](DeleteLexiconPython.md)

------