# Generating speech from SSML documents

You can use Amazon Polly to generate speech from either plain text or from documents marked up with Speech Synthesis Markup Language (SSML). Using SSML-enhanced text gives you additional control over how Amazon Polly generates speech from the text you provide. With SSML tags, you can customize and control aspects of speech such as pronunciation, volume, and speech rate.

In the AWS Management Console, the SSML-enhanced text that you want to convert to audio is entered on the SSML tab of the Text-to-Speech page. Although text entered in plain text relies on default settings for the language and voice you've chosen, text enhanced with SSML tells Amazon Polly not only what you want to say, but how you want to say it. Except for the added SSML tags, Amazon Polly synthesizes SSML-enhanced text in the same way as it synthesizes plain text. See [Synthesizing speech with Amazon Polly example](synthesize-example.md) for more information.

When using SSML, you enclose the entire text in a `<speak>` tag to let Amazon Polly know that you're using SSML. For example:

```
<speak>Hi! My name is Joanna. I will read any text you type here.</speak>
```

You then use specific SSML tags on the text inside the `<speak>` tags to customize the way you want the text to sound. You can add a pause, change the pace of the speech, lower or raise the volume of the voice, or add many other customizations so that the text sounds right for you. For a full list of the SSML tags that you can use, see [Supported SSML tags](supportedtags.md).

For example, you can include a long pause within your text, or change the speech rate or pitch. Other options include:
+ emphasizing specific words or phrases
+ using phonetic pronunciation
+ including breathing sounds
+ whispering
+ using the Newscaster speaking style.

For complete details on the SSML tags supported by Amazon Polly and how to use them, see [Supported SSML tags](supportedtags.md).

When using SSML, there are several reserved characters that require special treatment. This is because SSML uses these characters as part of its code. To use them, you *escape* them with a specific entity. For more information, see [Reserved characters in SSML](escapees.md).

Amazon Polly provides these types of control with a subset of the SSML markup tags that are defined by [Speech Synthesis Markup Language (SSML) Version 1.1, W3C Recommendation](https://www.w3.org/TR/2010/REC-speech-synthesis11-20100907/).

You can use SSML within the Amazon Polly console or by using the AWS CLI. The following topics show you how you can use SSML to generate speech and control the output so that it precisely fits your needs.

**Topics**
+ [Reserved characters in SSML](escapees.md)
+ [Using SSML on the console](ssml-to-speech-console.md)
+ [Using SSML with the Synthesize-Speech command](example-ssml-synthesize-speech-cli.md)
+ [Synthesizing an SSML-enhanced document](example-ssml-synthesize-document.md)
+ [Supported SSML tags](supportedtags.md)

# Reserved characters in SSML

There are five predefined characters that can't normally be used within an SSML statement. These characters are reserved by the language specification:

| Name | Character | Escape code |
| --- | --- | --- |
| quotation mark (double quotation mark) | " | `&quot;` |
| ampersand | & | `&amp;` |
| apostrophe or single quotation mark | ' | `&apos;` |
| less than sign | < | `&lt;` |
| greater than sign | > | `&gt;` |

Because SSML uses these characters as part of its code, you must *escape* each of them when you use it in SSML. You use the escape code instead of the actual character so that it displays properly while still creating a valid SSML document. For example, the following sentence

```
We're using the lawyer at Peabody & Chambers, attorneys-at-law.
```

would be rendered in SSML as

```
We&apos;re using the lawyer at Peabody &amp; Chambers, attorneys-at-law.
```

In this case, the special characters for the apostrophe and ampersand are escaped so the SSML document remains valid.

For the **&**, **<**, and **>** symbols, escape codes are always necessary when you use SSML. Additionally, when you use the apostrophe/single quotation mark (**'**) as an apostrophe, you must also use the escape code.

However, when you use the double quotation mark (**"**), or the apostrophe/single quotation mark (**'**) as a quotation mark, whether you need the escape code depends on context.

Double quotation marks
+ Must be escaped when in an attribute value delimited by double quotes. For example, in the following AWS CLI code

  ```
  --text "Pete &quot;Maverick&quot; Mitchell"
  ```
+ Do not need to be escaped when in textual context. For example, in the following

  ```
  He said, "Turn right at the corner."
  ```
+ Do not need to be escaped when in an attribute value delimited by single quotes. For example, in the following AWS CLI code

  ```
  --text 'Pete "Maverick" Mitchell'
  ```

Single quotation marks
+ Must be escaped when used as an apostrophe. For example, in the following

  ```
  We&apos;ve got to leave quickly.
  ```
+ Do not need to be escaped when in textual context. For example, in the following

  ```
  "And then I said, 'Don&apos;t quote me.'"
  ```
+ Do not need to be escaped when in an attribute value delimited by double quotes. For example, in the following AWS CLI code

  ```
  --text "Pete 'Maverick' Mitchell"
  ```

# Using SSML on the console

In the following example, you use an SSML tag to tell Amazon Polly to substitute "World Wide Web Consortium" for "W3C" when it speaks a short paragraph. You also use tags to introduce a pause and whisper a word. Compare the results of this exercise with that of [Applying lexicons (Synthesizing Speech)](managing-lexicons-console-synthesize-speech.md).
For more information on SSML, with examples, see [Supported SSML tags](supportedtags.md).

**To synthesize speech from SSML-enhanced text (console)**

1. Sign in to the AWS Management Console and open the Amazon Polly console at [https://console.aws.amazon.com/polly/](https://console.aws.amazon.com/polly/).

1. If it isn't already displayed, choose the **Text-to-Speech** tab.

1. Turn on **SSML**.

1. Type or paste the following text in the text box:

   ```
   <speak>
        He was caught up in the game.<break time="1s"/> In the middle of the
        10/3/2014 <sub alias="World Wide Web Consortium">W3C</sub> meeting,
        he shouted, "Nice job!" quite loudly. When his boss stared at him, he repeated
        "<amazon:effect name="whispered">Nice job</amazon:effect>," in a whisper.
   </speak>
   ```

   The SSML tags tell Amazon Polly how to render the text:
   + `<break time="1s"/>` tells Amazon Polly to pause 1 second between the first two sentences.
   + `<sub alias="World Wide Web Consortium">W3C</sub>` tells Amazon Polly to substitute World Wide Web Consortium for the acronym W3C.
   + `<amazon:effect name="whispered">Nice job</amazon:effect>` tells Amazon Polly to whisper the second instance of "Nice job."
**Note**
When you use the AWS CLI, you enclose the input text in quotation marks to differentiate it from the surrounding code. The Amazon Polly console doesn't show you code, so you don't enclose input text in quotation marks when you use it.

1. For **Language**, choose **English, US**, then choose a voice.

1. To listen to the speech, choose **Listen**.

1. To save the speech file, choose **Download**. If you want to save it in a different format, expand **Additional settings**, turn on **Speech file format settings** and choose the format that you want, then choose **Download**.

# Using SSML with the Synthesize-Speech command

This example shows how to use the `synthesize-speech` command with an SSML string. When you use the `synthesize-speech` command, you typically provide the following:
+ The input text (required)
+ Opening and closing tags (required)
+ The output format
+ A voice

In this example, you specify a simple text string in quotation marks along with the required opening and closing `<speak></speak>` tags.

**Important**
Although you don't use quotation marks around input text in the Amazon Polly console, you must use them when you use the AWS CLI. It's also important that you differentiate between the quotation marks around input text and the quotation marks required for individual tags.
For example, you can use standard quotation marks (") to enclose the input text, and single quotation marks (') for interior tags, or vice versa. Either option works for Unix, Linux, and macOS. However, with Windows you must enclose the input text in standard quotation marks and use single quotation marks for the tags.

For all operating systems, you can use standard quotation marks (") to enclose the input text, and single quotation marks (') for interior tags. For example:

```
--text "<speak>Hello <break time='300ms'/> World</speak>"
```

For Unix, Linux, and macOS, you can also use the reverse, with single quotation marks (') enclosing the input text and standard quotation marks (") for interior tags:

```
--text '<speak>Hello <break time="300ms"/> World</speak>'
```

The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows, replace the backslash (\) Unix continuation character at the end of each line with a caret (^) and use full quotation marks (") around the input text with single quotes (') for interior tags.

```
aws polly synthesize-speech \
--text-type ssml \
--text '<speak>Hello <break time="300ms"/> world</speak>' \
--output-format mp3 \
--voice-id Joanna \
speech.mp3
```

To hear the synthesized speech, play the resulting `speech.mp3` file using any audio player.

# Synthesizing an SSML-enhanced document

For longer input text, you may find it easier to save your SSML content to a file and simply specify the file name in the `synthesize-speech` command. For example, you could save the following to a file called `example.xml`:

```
<speak xml:lang="en-US">Hello World</speak>
```

The `xml:lang` attribute specifies `en-US` (US English) as the language of the input text.
For information about how the language of the input text and the language of the chosen voice affect the `SynthesizeSpeech` operation, see [Specifying another language for specific words](lang-tag.md).

**To run an SSML-enhanced file**

1. Save the SSML to a file (for example, `example.xml`).

1. Run the following `synthesize-speech` command from the path where the XML file is stored and specify the SSML file as input by substituting `file://example.xml` for the input text. Because this command points to a file instead of containing the actual input text, you don't use quotation marks.
**Note**
The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows, replace the backslash (\) Unix continuation character at the end of each line with a caret (^).

   ```
   aws polly synthesize-speech \
   --text-type ssml \
   --text file://example.xml \
   --output-format mp3 \
   --voice-id Joanna \
   speech.mp3
   ```

1. To hear the synthesized speech, play the resulting `speech.mp3` file using any audio player.

# Supported SSML tags

All tags except for `<amazon:domain name="news">` are supported for Standard voices. Tag availability for other voices is provided in the following table.

Amazon Polly supports the following SSML tags:

| Action | SSML tag | Neural voice availability | Long-form voice availability | Generative voice availability |
| --- | --- | --- | --- | --- |
| [Adding a pause](break-tag.md) | `<break>` | Full availability | Full availability | Full availability |
| [Emphasizing words](emphasis-tag.md) | `<emphasis>` | Not available | Not available | Not available |
| [Specifying another language for specific words](lang-tag.md) | `<lang>` | Full availability | Full availability | Full availability |
| [Placing a custom tag in your text](custom-tag.md) | `<mark>` | Full availability | Full availability | Partial availability |
| [Adding a pause between paragraphs](p-tag.md) | `<p>` | Full availability | Full availability | Full availability |
| [Using phonetic pronunciation](phoneme-tag.md) | `<phoneme>` | Full availability | Full availability | Partial availability |
| [Controlling volume, speaking rate, and pitch](prosody-tag.md) | `<prosody>` | Partial availability | Partial availability | Partial availability |
| [Setting a maximum duration for synthesized speech](maxduration-tag.md) | `<prosody amazon:max-duration>` | Not available | Not available | Not available |
| [Adding a pause between sentences](s-tag.md) | `<s>` | Full availability | Full availability | Full availability |
| [Controlling how special types of words are spoken](say-as-tag.md) | `<say-as>` | Partial availability | Full availability | Full availability |
| [Identifying SSML-enhanced text](speak-tag.md) | `<speak>` | Full availability | Full availability | Full availability |
| [Pronouncing acronyms and abbreviations](sub-tag.md) | `<sub>` | Full availability | Full availability | Full availability |
| [Improving pronunciation by specifying parts of speech](w-tag.md) | `<w>` | Full availability | Full availability | Full availability |
| [Adding the sound of breathing](breath-tag.md) | `<amazon:breath>` | Not available | Not available | Not available |
| [Newscaster speaking style](newscaster-tag.md) | `<amazon:domain name="news">` | Select neural voices only | Not available | Not available |
| [Adding dynamic range compression](drc-tag.md) | `<amazon:effect name="drc">` | Full availability | Full availability | Not available |
| [Speaking softly](phonation-tag.md) | `<amazon:effect phonation="soft">` | Not available | Not available | Not available |
| [Controlling timbre](vocaltractlength-tag.md) | `<amazon:effect vocal-tract-length>` | Not available | Not available | Not available |
| [Whispering](whispered-tag.md) | `<amazon:effect name="whispered">` | Not available | Not available | Not available |

**Note**
If you use unsupported SSML tags in standard, neural, or long-form format, you will get an error.

# Identifying SSML-enhanced text

**<speak>**

This tag is supported by generative, long-form, neural, and standard TTS formats.

The `<speak>` tag is the root element of all Amazon Polly SSML text. All SSML-enhanced text must be enclosed within a pair of `<speak>` tags.

```
<speak>Mary had a little lamb.</speak>
```
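As a rough illustration of the rule above, the following Python sketch wraps plain text in a `<speak>` root element and confirms that the result is well-formed XML. The helper name `to_ssml` is hypothetical, and escaping the quote characters in addition to `&`, `<`, and `>` is an assumption made here for safety; it is not an Amazon Polly API.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() already handles &, <, and >; the two quote characters are added here.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def to_ssml(text: str) -> str:
    """Wrap plain text in a <speak> root element (hypothetical helper)."""
    ssml = "<speak>" + escape(text, QUOTE_ENTITIES) + "</speak>"
    # Parsing raises xml.etree.ElementTree.ParseError if the markup is malformed.
    ET.fromstring(ssml)
    return ssml


print(to_ssml("Mary had a little lamb."))
```

Building the string in code rather than by hand makes it harder to forget the closing `</speak>` tag.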

# Adding a pause

**<break>**

This tag is supported by generative, long-form, neural, and standard TTS formats.

To add a pause to your text, use the `<break>` tag. You can set a pause based on strength (equivalent to the pause after a comma, a sentence, or a paragraph), or you can set it to a specific length of time in seconds or milliseconds. If you don't specify an attribute to determine the pause length, Amazon Polly uses the default, `<break strength="medium">`, which adds a pause the length of a pause after a comma.

`strength` attribute values:
+ `none`: No pause. Use `none` to remove a normally occurring pause, such as after a period.
+ `x-weak`: Has the same strength as `none`, no pause.
+ `weak`: Sets a pause of the same duration as the pause after a comma.
+ `medium`: Has the same strength as `weak`.
+ `strong`: Sets a pause of the same duration as the pause after a sentence.
+ `x-strong`: Sets a pause of the same duration as the pause after a paragraph.

`time` attribute values:
+ `[number]s`: The duration of the pause, in seconds. The maximum duration is `10s`.
+ `[number]ms`: The duration of the pause, in milliseconds. The maximum duration is `10000ms`.

For example:

```
<speak>
     Mary had a little lamb <break time="3s"/>Whose fleece was white as snow.
</speak>
```

If you don't use an attribute with the `break` tag, the result varies depending on text:
+ If there is no other punctuation next to the `break` tag, it creates a `<break strength="medium"/>` (comma-length pause).
+ If the tag is next to a comma, it upgrades the tag to a `<break strength="strong"/>` (sentence-length pause).
+ If the tag is next to a period, it upgrades the tag to a `<break strength="x-strong"/>` (paragraph-length pause).
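The attribute rules above can be captured in a small validation helper. This is a minimal Python sketch, not part of any AWS SDK: the function name `break_tag` is hypothetical, and accepting only one of `strength` or `time` per tag is a simplification chosen for this example.

```python
from typing import Optional

# Strength values listed in this section.
STRENGTHS = {"none", "x-weak", "weak", "medium", "strong", "x-strong"}
MAX_PAUSE_MS = 10_000  # the documented maximum of 10s / 10000ms


def break_tag(strength: Optional[str] = None, time_ms: Optional[int] = None) -> str:
    """Return a <break> tag as a string, rejecting out-of-range values."""
    if strength is not None and time_ms is not None:
        raise ValueError("pass either strength or time_ms, not both")
    if strength is not None:
        if strength not in STRENGTHS:
            raise ValueError(f"unknown strength: {strength!r}")
        return f'<break strength="{strength}"/>'
    if time_ms is not None:
        if not 0 <= time_ms <= MAX_PAUSE_MS:
            raise ValueError("time must be between 0 and 10000 milliseconds")
        return f'<break time="{time_ms}ms"/>'
    # No attribute: Amazon Polly applies its default pause.
    return "<break/>"


print(break_tag(time_ms=3000))
print(break_tag(strength="strong"))
```

Checking the limits before sending the request avoids a round trip for values that are known to be out of range.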

# Emphasizing words

**<emphasis>**

This tag is supported only by the standard TTS format.

To emphasize words, use the `<emphasis>` tag. Emphasizing words changes the speaking rate and volume. More emphasis makes Amazon Polly speak the text louder and slower. Less emphasis makes it speak quieter and faster. To specify the degree of emphasis, use the `level` attribute.

`level` attribute values:
+ `strong`: Increases the volume and slows the speaking rate so that the speech is louder and slower.
+ `moderate`: Increases the volume and slows the speaking rate, but less than `strong`. `moderate` is the default.
+ `reduced`: Decreases the volume and speeds up the speaking rate. Speech is softer and faster.

**Note**
The normal speaking rate and volume for a voice fall between the `moderate` and `reduced` levels.

For example:

```
<speak>I already told you I <emphasis level="strong">really like</emphasis> that person.</speak>
```
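When emphasis is applied programmatically, it can help to validate the `level` value and let an XML library handle escaping. The following Python sketch uses the standard library's `xml.etree.ElementTree`; the helper name `emphasis` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Level values described in this section.
LEVELS = {"strong", "moderate", "reduced"}


def emphasis(text: str, level: str = "moderate") -> str:
    """Return an <emphasis> element wrapping `text` (hypothetical helper)."""
    if level not in LEVELS:
        raise ValueError(f"unknown emphasis level: {level!r}")
    element = ET.Element("emphasis", {"level": level})
    element.text = text  # ElementTree escapes &, <, and > on serialization
    return ET.tostring(element, encoding="unicode")


sentence = f"<speak>I already told you I {emphasis('really like', 'strong')} that person.</speak>"
print(sentence)
```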

# Specifying another language for specific words

**<lang>**

This tag is supported by generative, long-form, neural, and standard TTS formats. For generative voices, the `<lang>` tag can be used only around full sentences.

Specify another language for a specific word, phrase, or sentence with the `<lang>` tag. Foreign language words and phrases are generally spoken better when they are enclosed within a pair of `<lang>` tags. To specify the language, use the `xml:lang` attribute. For a complete list of available languages, see [Languages in Amazon Polly](supported-languages.md).

Unless you apply the `<lang>` tag, all of the words in the input text are spoken in the language of the voice specified in the `voice-id`. If you apply the `<lang>` tag, the words are spoken in that language.

For example, if the `voice-id` is Joanna (who speaks US English), Amazon Polly speaks the following in the Joanna voice without a French accent:

```
<speak>
     Je ne parle pas français.
</speak>
```

If you use the Joanna voice with the `<lang>` tag, Amazon Polly speaks the sentence in the Joanna voice in American-accented French:

```
<speak>
     <lang xml:lang="fr-FR">Je ne parle pas français.</lang>
</speak>
```

Because Joanna is not a native French voice, pronunciation is based on her native language, US English. For example, although perfect French pronunciation features a uvular trill /R/ in the word *français*, Joanna's US English voice pronounces this phoneme as the corresponding sound /r/.

If you use the `voice-id` of Giorgio, who speaks Italian, with the following text, Amazon Polly speaks the sentence in Giorgio's voice with an Italian pronunciation:

```
<speak>
     Mi piace Bruce Springsteen.
</speak>
```

If you use the same voice with the following `<lang>` tag, Amazon Polly pronounces Bruce Springsteen in Italian-accented English:

```
<speak>
     Mi piace <lang xml:lang="en-US">Bruce Springsteen.</lang>
</speak>
```

This tag can also be used as a substitute for the optional [DefaultLangCode](API_StartSpeechSynthesisTask.html#polly-StartSpeechSynthesisTask-request-DefaultLangCode) option when synthesizing speech. However, doing so requires that you format your text using SSML.
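To check which parts of a document are marked for another language before sending it to Amazon Polly, you can parse the SSML locally. The following Python sketch lists each `<lang>` span and its `xml:lang` value. The function name `lang_spans` is hypothetical; the namespace URI is the standard one that XML parsers assign to the `xml:` prefix.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def lang_spans(ssml: str) -> List[Tuple[Optional[str], str]]:
    """Return (language code, text) for every <lang> element in the document."""
    root = ET.fromstring(ssml)
    return [(el.get(XML_LANG), "".join(el.itertext())) for el in root.iter("lang")]


document = '<speak>Mi piace <lang xml:lang="en-US">Bruce Springsteen.</lang></speak>'
print(lang_spans(document))
```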

# Placing a custom tag in your text

**<mark>**

This tag is supported by long-form, neural, and standard TTS formats. This tag does not do anything for generative voices because speech marks are not available for generative voices.

To put a custom tag within the text, use the `<mark>` tag. Amazon Polly takes no action on the tag, but returns the location of the tag in the SSML metadata. This tag can be anything you want to call out, as long as it maintains the following format:

```
<mark name="tag_name"/>
```

For example, suppose that the tag name is "animal" and the input text is:

```
<speak>
     Mary had a little <mark name="animal"/>lamb.
</speak>
```

Amazon Polly might return the following SSML metadata:

```
{"time":767,"type":"ssml","start":25,"end":46,"value":"animal"}
```
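Metadata like the record above is straightforward to process in code. The following Python sketch assumes the metadata arrives as one JSON object per line and keeps only the records whose `type` is `ssml`. The function name `ssml_marks` is hypothetical.

```python
import json
from typing import List, Tuple


def ssml_marks(metadata: str) -> List[Tuple[str, int]]:
    """Return (mark name, time) pairs for records of type "ssml"."""
    marks = []
    for line in metadata.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        if record.get("type") == "ssml":
            marks.append((record["value"], record["time"]))
    return marks


sample = '{"time":767,"type":"ssml","start":25,"end":46,"value":"animal"}'
print(ssml_marks(sample))
```

An application could use the returned `time` values to trigger an action, such as showing an image, when the audio reaches each mark.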

# Adding a pause between paragraphs

**<p>**

This tag is supported by generative, long-form, neural, and standard TTS formats.

To add a pause between paragraphs in your text, use the `<p>` tag. Using this tag provides a longer pause than native speakers usually place at commas or the end of a sentence. Use the `<p>` tag to enclose the paragraph:

```
<speak>
     <p>This is the first paragraph. There should be a pause after this text is spoken.</p>
     <p>This is the second paragraph.</p>
</speak>
```

This is equivalent to specifying a pause using `<break strength="x-strong"/>`.

# Using phonetic pronunciation

**<phoneme>**

This tag is supported by long-form, neural, and standard TTS formats.

To make Amazon Polly use phonetic pronunciation for specific text, use the `<phoneme>` tag.

Two attributes are required with the `<phoneme>` tag. They indicate the phonetic alphabet Amazon Polly uses and the phonetic symbols of the corrected pronunciation:
+ `alphabet`
  + `ipa`: Indicates that the International Phonetic Alphabet (IPA) will be used.
  + `x-sampa`: Indicates that the Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) will be used.
+ `ph`
  + Specifies the phonetic symbols for pronunciation. For more information, see [Languages in Amazon Polly](supported-languages.md).

With the `<phoneme>` tag, Amazon Polly uses the pronunciation specified by the `ph` attribute instead of the standard pronunciation associated by default with the language used by the selected voice.

For instance, the word "pecan" can be pronounced two ways. In the following example, "pecan" is assigned a different pronunciation in each line. Amazon Polly pronounces pecan as specified in the `ph` attributes, instead of using the default pronunciation.

International Phonetic Alphabet (IPA)

```
You say, pecan. I say, pecan.
```

Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA)

```
You say, pecan. I say, pecan.
```

Mandarin Chinese uses Pinyin for phonetic pronunciation.

Pinyin

```
你说薄。我说薄。
```

Japanese uses Yomigana and Pronunciation Kana.

Yomigana

```
名前は浩一です。名前は浩一です。名前は浩一です。
```

Pronunciation Kana

```
名前は浩一です。
```

# Controlling volume, speaking rate, and pitch

**<prosody>**

Prosody tag attributes are fully supported by the standard TTS voices. Generative, Neural, and Long-Form voices support the `volume` and `rate` attributes, but don't support the `pitch` attribute. For Generative voices, the prosody tag can be used only around full sentences.

To control the volume, rate, or pitch of your selected voice, use the `prosody` tag.

Volume, speech rate, and pitch are dependent on the specific voice selected. In addition to differences between voices for different languages, there are differences between individual voices speaking the same language. Because of this, while attributes are similar across all languages, there are clear variations from language to language and no absolute value is available.

The `prosody` tag has three attributes, each of which has several available values to set the attribute. Each attribute uses the same syntax:

```
<prosody attribute="value"></prosody>
```
+ `volume`
  + `default`: Resets volume to the default level for the current voice.
  + `silent`, `x-soft`, `soft`, `medium`, `loud`, `x-loud`: Sets the volume to a predefined value for the current voice.
  + `+ndB`, `-ndB`: Changes volume relative to the current level. A value of `+0dB` means no change, `+6dB` means approximately twice the current volume, and `-6dB` means approximately half the current volume.

  For example, you could set the volume for a passage as follows:

  ```
  Sometimes it can be useful to increase the volume for a specific speech.
  ```

  Or you could set it this way:

  ```
  And sometimes a lower volume is a more effective way of interacting with your audience.
  ```
+ `rate`
  + `x-slow`, `slow`, `medium`, `fast`, `x-fast`: Sets the rate to a predefined value for the selected voice.
  + `n%`: A non-negative percentage change in the speaking rate. For example, a value of 100% means no change in speaking rate, a value of 200% means a speaking rate twice the default rate, and a value of 50% means a speaking rate of half the default rate. This value has a range of 20-200%.

  For example, you could set the speech rate for a passage as follows:

  ```
  For dramatic purposes, you might wish to slow up the speaking rate of your text.
  ```

  Or you could set it this way:

  ```
  Although in some cases, it might help your audience to slow the speaking rate slightly to aid in comprehension.
  ```
+ `pitch`
  + `default`: Resets pitch to the default level for the current voice.
  + `x-low`, `low`, `medium`, `high`, `x-high`: Sets the pitch to a predefined value for the current voice.
  + `+n%` or `-n%`: Adjusts pitch by a relative percentage. For example, a value of `+0%` means no baseline pitch change, `+5%` gives a slightly higher baseline pitch, and `-5%` results in a slightly lower baseline pitch.

  For example, you could set the pitch for a passage as follows:

  ```
  Do you like synthesized speech with a pitch that is higher than normal?
  ```

  Or you could set it this way:

  ```
  Or do you prefer your speech with a somewhat lower pitch?
  ```

The `<prosody>` tag must contain at least one attribute, but can include more within the same tag.

```
Each morning when I wake up, I speak quite slowly and deliberately until I have my coffee.
```

It can also be combined with nested tags, as follows:

```
Sometimes combining attributes can change the impression your audience has of a voice as well.
```

**Note**
Currently `<prosody>` is partially available for the Generative voices.

# Setting a maximum duration for synthesized speech

**<prosody amazon:max-duration>**

This tag is currently supported only by the standard TTS format.

To control how long you want a speech to take when it is synthesized, use the `<prosody>` tag with the `amazon:max-duration` attribute.

The duration of synthesized speech varies slightly, depending on the voice you select. This can make it difficult to match synthesized speech with visuals or other activities that require precise timing. This issue is magnified for translation applications because the time it takes to say particular phrases can vary widely with different languages.

The `<prosody amazon:max-duration>` tag matches synthesized speech to the amount of time you want it to take (the duration).

This tag uses the following syntax:

```
<prosody amazon:max-duration="time duration">
```

With the `<prosody amazon:max-duration>` tag, you can specify duration in either seconds or milliseconds:
+ `ns`: the maximum duration in seconds
+ `nms`: the maximum duration in milliseconds

For example, the following spoken text has a maximum duration of 2 seconds:

```
<speak>
     <prosody amazon:max-duration="2s">
          Human speech is a powerful way to communicate.
     </prosody>
</speak>
```

Text placed within the tag doesn't exceed the specified duration. If the chosen voice or language would normally take longer than that duration, Amazon Polly speeds up the speech so that it fits into the specified duration.

If the specified duration is longer than it takes to read the text at a normal rate, Amazon Polly reads the speech normally. It doesn't slow down the speech or add silence, so the resulting audio is shorter than requested.

**Note**
Amazon Polly increases the speed no more than 5 times the normal rate. If text is spoken faster than this, it usually doesn't make sense. If a speech cannot fit within your specified duration even when sped up to the maximum, the audio is sped up but lasts longer than the specified duration.

You can include a single sentence or multiple sentences within a `<prosody amazon:max-duration>` tag, and you can use multiple `<prosody amazon:max-duration>` tags within your text.

For example:

```
Human speech is a powerful way to communicate. Even a simple ‘Hello’ can convey a lot of information depending on the pitch, intonation, and tempo. We naturally understand this information, which is why speech is ideal for creating applications where a screen isn’t practical or possible, or simply isn’t convenient.
```

Using the `<prosody amazon:max-duration>` tag can increase latency when Amazon Polly returns synthesized speech. The degree of latency depends on the passage and its length. We recommend using relatively short text passages.

**Limitations**

There are limitations both in how you use the `<prosody amazon:max-duration>` tag and in how it works with other SSML tags:
+ The text inside a `<prosody amazon:max-duration>` tag can't be longer than 1500 characters.
+ You can't nest `<prosody amazon:max-duration>` tags. If you put one `<prosody amazon:max-duration>` tag inside another, Amazon Polly ignores the inner tag.

  For example, in the following, the inner `<prosody amazon:max-duration>` tag is ignored:

  ```
  Human speech is a powerful way to communicate. Even a simple ‘Hello’ can convey a lot of information depending on the pitch, intonation, and tempo. We naturally understand this information, which is why speech is ideal for creating applications where a screen isn’t practical or possible, or simply isn’t convenient.
  ```
+ You can't use `<prosody>` tags with the `rate` attribute within a `<prosody amazon:max-duration>` tag. This is because both affect the speed at which text is spoken.

  In the following example, Amazon Polly ignores the inner `<prosody>` tag that sets the `rate` attribute:

  ```
  Human speech is a powerful way to communicate. Even a simple ‘Hello’ can convey a lot of information depending on the pitch, intonation, and tempo.
  ```

**Pauses and `max-duration`**

When using the `max-duration` tag, you can still insert pauses within your text. However, Amazon Polly includes the length of the pause when calculating the maximum duration for speech. Additionally, Amazon Polly preserves the short pauses that occur where commas and periods are placed within a passage and includes them in the maximum duration.

For example, in the following block, the 600 millisecond break and the breaks caused by the commas and periods occur within the 8-second speech:

```
Human speech is a powerful way to communicate. Even a simple ‘Hello’ can convey a lot of information depending on the pitch, intonation, and tempo.
```

# Adding a pause between sentences

**<s>**

This tag is supported by generative, long-form, neural, and standard TTS formats.

To add a pause between lines or sentences in your text, use the `<s>` tag. Using this tag has the same effect as:
+ Ending a sentence with a period (.)
+ Specifying a pause with `<break strength="strong"/>`

Unlike the `<break>` tag, the `<s>` tag encloses the sentence. This is useful for synthesizing speech that is organized in lines, rather than sentences, such as poetry.

In the following example, the `<s>` tag creates a short pause after both the first and second sentences. The final sentence has no `<s>` tag, but it is also followed by a short pause because it ends with a period.

```
<speak>
     <s>Mary had a little lamb</s>
     <s>Whose fleece was white as snow</s>
     And everywhere that Mary went, the lamb was sure to go.
</speak>
```

# Controlling how special types of words are spoken

**<say-as>**

The `<say-as>` tag is supported by generative, long-form, neural, and standard TTS engines. Note, however, that if Amazon Polly is using a neural voice and encounters the `<say-as>` tag with the `characters` option at runtime, the affected sentence is synthesized using the related standard voice. The affected sentence is still billed as if it uses a neural voice.

Use the `<say-as>` tag with the `interpret-as` attribute to tell Amazon Polly how to say certain characters, words, and numbers. This enables you to provide additional context to eliminate any ambiguity on how Amazon Polly should render the text.

The `<say-as>` tag uses one attribute, `interpret-as`, which accepts a number of values. Each uses the same syntax:

```
<say-as interpret-as="value">[text to be interpreted]</say-as>
```

The following values are available with `interpret-as`:
+ `characters` or `spell-out`: Spells out each letter of the text, as in a-b-c.
**Note**
This option is not currently supported for neural voices. If you're using a neural voice and Amazon Polly encounters this SSML code at runtime, the affected sentence is synthesized using the related standard voice. The sentence is still billed as if it uses a neural voice.
+ `cardinal` or `number`: Interprets the numerical text as a cardinal number, as in 1,234.
+ `ordinal`: Interprets the numerical text as an ordinal number, as in 1,234th.
+ `digits`: Spells out each digit individually, as in 1-2-3-4.
+ `fraction`: Interprets the numerical text as a fraction. This works for both common fractions, such as 3/20, and mixed fractions, such as 2 ½. See below for more information.
+ `unit`: Interprets a numerical text as a measurement. The value should be either a number or a fraction followed by a unit with no space in between, as in `1/2inch`, or by just a unit, as in `1meter`.
+ `date`: Interprets the text as a date. The format of the date must be specified with the format attribute. See below for more information.
+ `time`: Interprets the numerical text as duration, in minutes and seconds, as in `1'21"`.
+ `address`: Interprets the text as part of a street address.
+ `expletive`: "Beeps out" the content included within the tag.
+ `telephone`: Interprets the numerical text as a 7-digit or 10-digit telephone number, as in `2025551212`. You can also use this value to handle telephone extensions, as in `2025551212x345`. See below for more information.
**Note**
Currently the `telephone` option is not available for all languages. However, it is available for voices speaking English language variants (en-AU, en-GB, en-IN, en-US, and en-GB-WLS), Spanish language variants (es-ES, es-MX, and es-US), French language variants (fr-FR and fr-CA), and Portuguese variants (pt-BR and pt-PT), as well as German (de-DE), Italian (it-IT), Japanese (ja-JP), and Russian (ru-RU). In some cases, languages such as Arabic (arb) automatically handle the number set as a telephone number and so don't actually implement the `telephone` SSML tag.

**Fractions**

Amazon Polly interprets values within the `say-as` tag that have the `interpret-as="fraction"` attribute as common fractions. The following is the syntax for fractions:
+ *Fraction*

  Syntax: *cardinal number*/*cardinal number*, such as 2/9.

  For example: `<say-as interpret-as="fraction">2/9</say-as>` is pronounced "two ninths."
+ *Non-negative Mixed Number*

  Syntax: *cardinal number*+*cardinal number*/*cardinal number*, such as 3+1/2.

  For example, `<say-as interpret-as="fraction">3+1/2</say-as>` is pronounced "three and a half."
**Note**
There must be a `+` between the "3" and the "1/2". Amazon Polly doesn't support a mixed number without the `+`, such as "3 1/2".

**Dates**

When `interpret-as` is set to `date`, you also need to indicate the format of the date.

This uses the following syntax:

```
<say-as interpret-as="date" format="format">[date]</say-as>
```

For example:

```
<speak>
     I was born on <say-as interpret-as="date" format="mdy">12-31-1900</say-as>.
</speak>
```

The following formats can be used with the `date` attribute.
+ `mdy`: Month-day-year.
+ `dmy`: Day-month-year.
+ `ymd`: Year-month-day.
+ `md`: Month-day.
+ `dm`: Day-month.
+ `ym`: Year-month.
+ `my`: Month-year.
+ `d`: Day.
+ `m`: Month.
+ `y`: Year.
+ `yyyymmdd`: Year-month-day. If you use this format, you can make Amazon Polly skip parts of the date using question marks.

  For example, Amazon Polly renders the following as "September 22nd":

  ```
  <say-as interpret-as="date">????0922</say-as>
  ```

  In this case, the `format` attribute is not needed.

**Telephone**

Amazon Polly attempts to interpret the text you provide correctly based on the text's formatting even without the `<say-as>` tag. For example, if your text includes "202-555-1212," Amazon Polly interprets it as a 10-digit telephone number and says each digit individually, with a brief pause for each dash. In this case, you don't need to use `<say-as interpret-as="telephone">`. However, if you provide the text "2025551212" and want Amazon Polly to say it as a phone number, you would specify `<say-as interpret-as="telephone">`.

The logic for interpreting each element is language-specific. For example, US and UK English differ in how phone numbers are pronounced (in UK English, sequences of the same digit are grouped together, as in "double five" or "triple four"). To see the difference, test the following example with a US voice and with a UK voice:

```
<speak>
     Richard's number is <say-as interpret-as="telephone">2122241555</say-as>
</speak>
```

# Pronouncing acronyms and abbreviations

**<sub>**

This tag is supported by generative, long-form, neural, and standard TTS formats.

Use the `<sub>` tag with the `alias` attribute to substitute a different word (or pronunciation) for selected text such as an acronym or abbreviation.

This uses the syntax:

```
<sub alias="new word">abbreviation</sub>
```

In the following example, the name "Mercury" is substituted for the element's chemical symbol to make the audio content clearer.

```
<speak>
    My favorite chemical element is <sub alias="Mercury">Hg</sub>, because it looks so shiny.
</speak>
```
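When the substitution text comes from application data, it can help to build the element programmatically so that reserved characters are escaped. The following is a minimal sketch in Python using only the standard library; the helper names `sub` and `speak` are hypothetical and are not part of any Amazon Polly SDK.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET


def sub(text: str, alias: str) -> str:
    """Return a <sub> element so that `alias` is spoken in place of `text`."""
    # quoteattr() adds the surrounding quotes and escapes reserved characters
    # in the attribute value; escape() does the same for element content.
    return f"<sub alias={quoteattr(alias)}>{escape(text)}</sub>"


def speak(body: str) -> str:
    """Wrap an SSML fragment in the <speak> root element."""
    ssml = f"<speak>{body}</speak>"
    ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
    return ssml


ssml = speak(
    "My favorite chemical element is "
    + sub("Hg", "Mercury")
    + ", because it looks so shiny."
)
print(ssml)
```

The well-formedness check only confirms that the XML parses. It does not confirm that Amazon Polly supports the tags used.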

# Improving pronunciation by specifying parts of speech

`<w>`

This tag is supported by generative, long-form, neural, and standard TTS formats.

You can use the `<w>` tag to customize the pronunciation of words by specifying the word’s part of speech or alternate meaning. This is done using the `role` attribute.

This tag uses the following syntax:

```
<w role="attribute">text</w>
```

The following values can be used for the `role` attribute:

To specify the part of speech:
+ `amazon:VB`: interprets the word as a verb (present simple).
+ `amazon:VBD`: interprets the word as a past tense verb.
+ `amazon:DT`: interprets the word as a determiner.
+ `amazon:IN`: interprets the word as a preposition.
+ `amazon:JJ`: interprets the word as an adjective.
+ `amazon:NN`: interprets the word as a noun.

For example, the US English pronunciation of the word "read" varies depending on the part of speech that the tag specifies:

```
<speak>
    The word <say-as interpret-as="characters">read</say-as> may be interpreted
    as either the present simple form <w role="amazon:VB">read</w>, or the past
    participle form <w role="amazon:VBD">read</w>.
</speak>
```

To specify a specific meaning:
+ `amazon:DEFAULT`: uses the default sense of the word.
+ `amazon:SENSE_1`: uses the non-default sense of the word when present. For example, the noun "bass" is pronounced differently depending on its meaning. The default meaning is the lowest part of the musical range. The alternate meaning is a species of freshwater fish, also called "bass" but pronounced differently. Using `<w role="amazon:SENSE_1">bass</w>` renders the non-default pronunciation (freshwater fish) for the audio text.

This difference in pronunciation and meaning can be heard if you synthesize the following:

```
<speak>
    Depending on your meaning, the word <say-as interpret-as="characters">bass</say-as>
    may be interpreted as either a musical element: bass, or as its alternative meaning,
    a freshwater fish <w role="amazon:SENSE_1">bass</w>.
</speak>
```

**Note**
Some languages may have a different selection of supported parts of speech.
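If you generate `<w>` elements from code, validating the `role` value before building the markup catches typos early. The following is a minimal sketch; the helper name `w` and the `ROLES` set are hypothetical, and the set contains only the values listed in this section, so it may need adjusting for other languages.

```python
from xml.sax.saxutils import escape

# Role values listed in this section. Other languages may support a
# different selection, so treat this set as an assumption.
ROLES = {
    "amazon:VB", "amazon:VBD", "amazon:DT", "amazon:IN",
    "amazon:JJ", "amazon:NN", "amazon:DEFAULT", "amazon:SENSE_1",
}


def w(text: str, role: str) -> str:
    """Return a <w> element that tags `text` with a part of speech or sense."""
    if role not in ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    return f'<w role="{role}">{escape(text)}</w>'


print(w("read", "amazon:VBD"))
print(w("bass", "amazon:SENSE_1"))
```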

# Adding the sound of breathing

`<amazon:breath>` and `<amazon:auto-breaths>`

This tag is supported only by the standard TTS format.

Natural-sounding speech includes both correctly spoken words and breathing sounds. By adding breathing sounds to synthesized speech, you can make it sound more natural. The `<amazon:breath>` and `<amazon:auto-breaths>` tags provide breaths. You have the following options:
+ Manual mode: you set the location, length, and volume of a breath sound within the text
+ Automated mode: Amazon Polly automatically inserts breathing sounds into the speech output
+ Mixed mode: both you and Amazon Polly add breathing sounds

**Manual Mode**
In manual mode, you place the `<amazon:breath/>` tag in the input text where you want a breath to occur. You can customize the length and volume of breaths with the `duration` and `volume` attributes, respectively:

+ `duration`: Controls the length of the breath. Valid values are: `default`, `x-short`, `short`, `medium`, `long`, `x-long`. The default value is `medium`.
+ `volume`: Controls how loud the breath sounds. Valid values are: `default`, `x-soft`, `soft`, `medium`, `loud`, `x-loud`. The default value is `medium`.

**Note**
The exact length and volume of each attribute value depends on the specific Amazon Polly voice used.

To set a breath sound using the defaults, use `<amazon:breath/>` without attributes.

For example, to set both the duration and the volume of a breath to medium, specify the attributes as follows:

```
<speak>
    Sometimes you want to insert only <amazon:breath duration="medium" volume="medium"/>a single breath.
</speak>
```

To use the defaults, you would just use the tag:

```
<speak>
    Sometimes you need <amazon:breath/>to insert one or more average breaths <amazon:breath/>so that the
    text sounds correct.
</speak>
```

You can add individual breathing sounds within a passage, as follows:

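```
<speak>
    <amazon:breath duration="long" volume="x-loud"/>Wow! <amazon:breath duration="long" volume="loud"/>That was quite fast.
    <amazon:breath duration="medium" volume="x-loud"/>I almost beat my personal best time on this track.
</speak>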
```
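Because manual mode accepts only a fixed set of attribute values, a small helper can reject invalid values before the SSML is sent for synthesis. The following is a minimal sketch; the function name `breath` and the constant names are hypothetical, and the allowed values are copied from the lists above.

```python
from typing import Optional

# Allowed values, as listed for the manual-mode breath tag.
DURATIONS = ("default", "x-short", "short", "medium", "long", "x-long")
VOLUMES = ("default", "x-soft", "soft", "medium", "loud", "x-loud")


def breath(duration: Optional[str] = None, volume: Optional[str] = None) -> str:
    """Return a self-closing <amazon:breath/> tag with optional attributes."""
    attrs = ""
    if duration is not None:
        if duration not in DURATIONS:
            raise ValueError(f"invalid duration: {duration!r}")
        attrs += f' duration="{duration}"'
    if volume is not None:
        if volume not in VOLUMES:
            raise ValueError(f"invalid volume: {volume!r}")
        attrs += f' volume="{volume}"'
    return f"<amazon:breath{attrs}/>"


print(breath())                  # tag with the default settings
print(breath("long", "x-loud"))  # tag with explicit attributes
```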

**Automated Mode**
In automated mode, you use the `<amazon:auto-breaths>` tag to tell Amazon Polly to automatically create breathing noises at appropriate intervals. You can set the frequency of the intervals, their volume, and their duration. Place the `<amazon:auto-breaths>` tag at the beginning of the text that you want to apply automated breathing to and then close the tag at the end.

**Note**
Unlike the manual mode tag, `<amazon:breath/>`, the `<amazon:auto-breaths>` tag requires a closing tag (`</amazon:auto-breaths>`).

You can use the following optional attributes with the `<amazon:auto-breaths>` tag:
+ `volume`: Controls how loud the breathing sounds. Valid values are: `default`, `x-soft`, `soft`, `medium`, `loud`, `x-loud`. The default value is `medium`.
+ `frequency`: Controls how often breathing sounds occur in the text. Valid values are: `default`, `x-low`, `low`, `medium`, `high`, `x-high`. The default value is `medium`.
+ `duration`: Controls the length of the breath. Valid values are: `default`, `x-short`, `short`, `medium`, `long`, `x-long`. The default value is `medium`.

By default, the frequency of breathing sounds depends on the input text. However, breathing sounds often occur after commas and periods.

The following examples show how to use the `<amazon:auto-breaths>` tag. To decide which options to use for your content, copy the applicable examples to the Amazon Polly console and listen to the differences.
+ Using automated mode without optional parameters.

```
<speak>
    <amazon:auto-breaths>Amazon Polly is a service that turns text into lifelike speech,
    allowing you to create applications that talk and build entirely new categories of
    speech-enabled products. Amazon Polly is a text-to-speech service that uses advanced deep learning
    technologies to synthesize speech that sounds like a human voice. With dozens of lifelike
    voices across a variety of languages, you can select the ideal voice and build
    speech-enabled applications that work in many different countries.</amazon:auto-breaths>
</speak>
```
+ Using automated mode with volume control. The unspecified parameters (`duration` and `frequency`) are set to the default values (`medium`).

```
<speak>
    <amazon:auto-breaths volume="x-loud">Amazon Polly is a service that turns text into lifelike
    speech, allowing you to create applications that talk and build entirely new categories of
    speech-enabled products. Amazon Polly is a text-to-speech service, that uses advanced deep
    learning technologies to synthesize speech that sounds like a human voice. With dozens of
    lifelike voices across a variety of languages, you can select the ideal voice and build
    speech-enabled applications that work in many different countries.</amazon:auto-breaths>
</speak>
```
+ Using automated mode with frequency control. The unspecified parameters (`duration` and `volume`) are set to the default values (`medium`).

```
<speak>
    <amazon:auto-breaths frequency="x-low">Amazon Polly is a service that turns text into lifelike
    speech, allowing you to create applications that talk and build entirely new categories of
    speech-enabled products. Amazon Polly is a text-to-speech service, that uses advanced deep
    learning technologies to synthesize speech that sounds like a human voice. With dozens of
    lifelike voices across a variety of languages, you can select the ideal voice and build
    speech-enabled applications that work in many different countries.</amazon:auto-breaths>
</speak>
```
+ Using automated mode with multiple parameters. For the unspecified `duration` parameter, Amazon Polly uses the default value (`medium`).

```
<speak>
    <amazon:auto-breaths volume="x-loud" frequency="x-low">Amazon Polly is a service that turns
    text into lifelike speech, allowing you to create applications that talk and build entirely new
    categories of speech-enabled products. Amazon Polly is a text-to-speech service, that uses
    advanced deep learning technologies to synthesize speech that sounds like a human voice. With
    dozens of lifelike voices across a variety of languages, you can select the ideal voice and build
    speech-enabled applications that work in many different countries.</amazon:auto-breaths>
</speak>
```
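The automated-mode variants above differ only in which optional attributes are set, so they can be generated from one helper. The following is a minimal sketch; the function name `auto_breaths` and the `ALLOWED` table are hypothetical, and the allowed values are copied from the attribute list for automated mode.

```python
from xml.sax.saxutils import escape

# Optional attributes and their allowed values for automated mode.
ALLOWED = {
    "volume": ("default", "x-soft", "soft", "medium", "loud", "x-loud"),
    "frequency": ("default", "x-low", "low", "medium", "high", "x-high"),
    "duration": ("default", "x-short", "short", "medium", "long", "x-long"),
}


def auto_breaths(text: str, **attrs: str) -> str:
    """Wrap `text` in <amazon:auto-breaths>, including the closing tag."""
    rendered = ""
    for name, value in attrs.items():
        if name not in ALLOWED:
            raise ValueError(f"unknown attribute: {name!r}")
        if value not in ALLOWED[name]:
            raise ValueError(f"invalid {name}: {value!r}")
        rendered += f' {name}="{value}"'
    return f"<amazon:auto-breaths{rendered}>{escape(text)}</amazon:auto-breaths>"


# Attributes that are left out fall back to the service defaults.
print(auto_breaths("Amazon Polly turns text into lifelike speech."))
print(auto_breaths("Amazon Polly turns text into lifelike speech.",
                   volume="x-loud", frequency="x-low"))
```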

# Newscaster speaking style

`<amazon:domain name="news">`

The newscaster style is available only for the Matthew and Joanna voices in US English (en-US), the Lupe voice in US Spanish (es-US), and the Amy voice in British English (en-GB). It is supported only when using the `Neural` format.

To use the newscaster style, you use SSML tags and the following syntax:

```
<amazon:domain name="news">text</amazon:domain>
```

For example, you might use the newscaster style with the Amy voice as follows:

```
<speak>
<amazon:domain name="news">
From the Tuesday, April 16th, 1912 edition of The Guardian newspaper:

The maiden voyage of the White Star liner Titanic, the largest ship ever launched, has ended in disaster.

The Titanic started her trip from Southampton for New York on Wednesday. Late on Sunday night she struck
an iceberg off the Grand Banks of Newfoundland. By wireless telegraphy she sent out signals of distress,
and several liners were near enough to catch and respond to the call.
</amazon:domain>
</speak>
```
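Because the newscaster style works with only a few voices, an application might check the voice before wrapping the text. The following is a minimal sketch; the function name `newscaster` and the `NEWSCASTER_VOICES` set are hypothetical, and the set contains only the voices named in this section.

```python
from xml.sax.saxutils import escape

# Voices named in this section as supporting the newscaster style.
NEWSCASTER_VOICES = {"Matthew", "Joanna", "Lupe", "Amy"}


def newscaster(text: str, voice: str) -> str:
    """Return a complete SSML document that uses the newscaster style."""
    if voice not in NEWSCASTER_VOICES:
        raise ValueError(f"{voice} does not support the newscaster style")
    return (
        '<speak><amazon:domain name="news">'
        + escape(text)
        + "</amazon:domain></speak>"
    )


print(newscaster("The maiden voyage of the Titanic has ended in disaster.", "Amy"))
```

The check covers only the voice. Selecting the neural format is a separate setting on the synthesis request and is outside the scope of this sketch.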

# Adding dynamic range compression

`<amazon:effect name="drc">`

This tag is supported by long-form, neural, and standard TTS formats.

Depending on the text, language, and voice used in an audio file, the sounds range from soft to loud. Environmental sounds, such as the sound of a moving vehicle, can often mask the softer sounds, which makes the audio track difficult to hear clearly. To enhance the volume of certain sounds in your audio file, use the dynamic range compression (`drc`) tag.

The `drc` tag sets a midrange "loudness" threshold for your audio, and increases the volume (the gain) of the sounds around that threshold. It applies the greatest gain increase closest to the threshold, and the gain increase is lessened farther away from the threshold.

![\[Dynamic range compression increases the volume of the sounds around a certain threshold.\]](http://docs.aws.amazon.com/polly/latest/dg/images/drc-on.png)

This makes the middle-range sounds easier to hear in a noisy environment, which makes the entire audio file clearer.

The `drc` tag is a Boolean parameter (it's either present or it isn't). It uses the syntax `<amazon:effect name="drc">` and is closed with `</amazon:effect>`.

You can use the `drc` tag with any voice or language supported by Amazon Polly. You can apply it to an entire section of the recording, or to only a few words. For example:

```
<speak>
    Some audio is difficult to hear in a moving vehicle, but <amazon:effect name="drc">this audio
    is less difficult to hear in a moving vehicle.</amazon:effect>
</speak>
```

**Note**
When you use "`drc`" in the `amazon:effect` syntax, it is case-sensitive.

**Using `drc` with the `prosody volume` Tag**
As the following graphic shows, the `prosody volume` tag evenly increases the volume of an entire audio file from the original level (dotted line) to an adjusted level (solid line). To further increase the volume of certain parts of the file, use the `drc` tag with the `prosody volume` tag. Combining tags doesn't affect the settings of the `prosody volume` tag.

![\[Using the prosody volume tag increases the volume across the entire audio file.\]](http://docs.aws.amazon.com/polly/latest/dg/images/prosodyloud.png)

When you use the `drc` and `prosody volume` tags together, Amazon Polly applies the `drc` tag first, increasing the middle-range sounds (those near the threshold). It then applies the `prosody volume` tag and further increases the volume of the entire audio track evenly.

![\[Using the drc tag with a prosody volume tag increases the volume of the middle-range sounds in addition to the volume of the entire audio track.\]](http://docs.aws.amazon.com/polly/latest/dg/images/prosody+drc.png)

To use the tags together, nest one inside the other. For example:

```
<speak>
    <prosody volume="loud">This text needs to be understandable and loud. <amazon:effect name="drc">
    This text also needs to be more understandable in a moving car.</amazon:effect></prosody>
</speak>
```

In this text, the `prosody volume` tag increases the volume of the entire passage to "loud." The `drc` tag enhances the volume of the middle-range values in the second sentence.

**Note**
When using the `drc` and `prosody volume` tags together, use standard XML practices for nesting tags.
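One way to keep the nesting correct is to build each element with a function that always emits a matching closing tag and then compose the calls. The following is a minimal sketch; the helper names `drc` and `prosody_volume` are hypothetical.

```python
from xml.sax.saxutils import escape


def drc(body: str) -> str:
    """Wrap an SSML fragment in the dynamic range compression effect."""
    return f'<amazon:effect name="drc">{body}</amazon:effect>'


def prosody_volume(body: str, volume: str) -> str:
    """Wrap an SSML fragment in a prosody tag that sets the volume."""
    return f'<prosody volume="{volume}">{body}</prosody>'


# The inner call closes its tag before the outer call closes its own,
# so the elements are nested rather than overlapping.
ssml = "<speak>" + prosody_volume(
    escape("This text needs to be understandable and loud. ")
    + drc(escape("This text also needs to be more understandable in a moving car.")),
    "loud",
) + "</speak>"
print(ssml)
```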

# Speaking softly

`<amazon:effect phonation="soft">`

This tag is currently supported only by the standard TTS format.

To specify that input text should be spoken in a softer-than-normal voice, use the `<amazon:effect phonation="soft">` tag.

This uses the syntax:

```
<amazon:effect phonation="soft">text</amazon:effect>
```

For example, you might use this tag with the Matthew voice as follows:

```
<speak>
    This is Matthew speaking in my normal voice. <amazon:effect phonation="soft">This
    is Matthew speaking in my softer voice.</amazon:effect>
</speak>
```

# Controlling timbre

`<amazon:effect vocal-tract-length>`

This tag is currently supported only by the standard TTS format.

Timbre is the tonal quality of a voice that helps you tell the difference between voices, even when they have the same pitch and loudness. One of the most important physiological features that contributes to speech timbre is the length of the vocal tract. The vocal tract is a cavity of air that spans from the top of the vocal folds up to the edge of the lips.

To control the timbre of output speech in Amazon Polly, use the `vocal-tract-length` tag. This tag has the effect of changing the length of the speaker’s vocal tract, which sounds like a change in the speaker’s size. When you increase the `vocal-tract-length`, the speaker sounds physically bigger. When you decrease it, the speaker sounds smaller. You can use this tag with any of the voices in the Amazon Polly Text-to-Speech portfolio.

To change timbre, use the following values:
+ `+n%` or `-n%`: Adjusts the vocal tract length by a relative percentage change in the current voice. For example, +4% or -2%. Valid values range from +100% to -50%. Values outside this range are clipped. For example, +111% sounds like +100% and -60% sounds like -50%.
+ `n%`: Changes the vocal tract length to an absolute percentage of the tract length of the current voice. For example, 110% or 75%. An absolute value of 110% is equivalent to a relative value of +10%. An absolute value of 100% is the same as the default value for the current voice.

The following example shows how to change the vocal tract length to change timbre:

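```
<speak>
    This is my original voice, without any modifications.
    <amazon:effect vocal-tract-length="+15%">Now, imagine that I am much bigger.</amazon:effect>
    <amazon:effect vocal-tract-length="-15%">Or, perhaps you prefer my voice when I'm very small.</amazon:effect> You can also control the
    timbre of my voice by making minor adjustments.
    <amazon:effect vocal-tract-length="+10%">For example, by making me sound just a little bigger.</amazon:effect> <amazon:effect vocal-tract-length="-10%">Or, making me sound only somewhat smaller.</amazon:effect>
</speak>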
```
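The relationship between relative values, absolute values, and clipping can be expressed as a short calculation. The following is a minimal sketch of the arithmetic described above, not a description of how Amazon Polly implements it; the function name `relative_change` is hypothetical, and the sketch does not clip absolute values because the text describes clipping only for relative ones.

```python
def relative_change(value: str) -> float:
    """Return the relative change, in percent, for a vocal-tract-length value."""
    if not value.endswith("%"):
        raise ValueError(f"expected a percentage, got {value!r}")
    number = float(value[:-1])
    if value[0] in "+-":
        # "+n%" or "-n%": relative change, clipped to the range -50% to +100%.
        return max(-50.0, min(100.0, number))
    # "n%": absolute percentage, where 100% is the voice's default length.
    return number - 100.0


print(relative_change("+111%"))  # 100.0, clipped to the upper limit
print(relative_change("-60%"))   # -50.0, clipped to the lower limit
print(relative_change("110%"))   # 10.0, same as a relative +10%
```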

**Combining Multiple Tags**

You can combine the `vocal-tract-length` tag with any other SSML tag that is supported by Amazon Polly. Because timbre (vocal tract length) and pitch are closely connected, you might get the best results by using both the `vocal-tract-length` and the `<prosody pitch>` tags. To produce the most realistic voice, we recommend that you use different percentages of change for the two tags. Experiment with various combinations to get the results you want.

The following example shows how to combine tags.

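```
<speak>
    The pitch and timbre of a person's voice are connected in human speech.
    <amazon:effect vocal-tract-length="-15%">If you are going to reduce the vocal tract length,</amazon:effect>
    <amazon:effect vocal-tract-length="-15%"><prosody pitch="+20%">you
    might consider increasing the pitch, too.</prosody></amazon:effect>
    <amazon:effect vocal-tract-length="+15%">If you choose to lengthen the vocal tract,</amazon:effect>
    <amazon:effect vocal-tract-length="+15%"><prosody pitch="-10%">
    you might also want to lower the pitch.</prosody></amazon:effect>
</speak>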
```

# Whispering

`<amazon:effect name="whispered">`

This tag is currently supported only by the standard TTS format.

The `<amazon:effect name="whispered">` tag indicates that the input text should be spoken in a whispered voice rather than as normal speech. This can be used with any of the voices in the Amazon Polly Text-to-Speech portfolio.

This uses the following syntax:

```
<amazon:effect name="whispered">text</amazon:effect>
```

For example:

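```
<speak>
    <amazon:effect name="whispered">If you make any noise,</amazon:effect>
    she said, <amazon:effect name="whispered">they will hear us.</amazon:effect>
</speak>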
```

In this case, the synthesized speech spoken by the character is whispered, but the phrase "she said" is spoken in the normal synthesized speech of the selected Amazon Polly voice.
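For dialogue that alternates between whispered and normal speech, the markup can be generated from a list of segments. The following is a minimal sketch; the function name `whisper_segments` and the segment format, a list of (text, whispered) pairs, are hypothetical.

```python
from typing import Iterable, Tuple
from xml.sax.saxutils import escape


def whisper_segments(segments: Iterable[Tuple[str, bool]]) -> str:
    """Build an SSML document in which only the flagged segments are whispered."""
    parts = []
    for text, whispered in segments:
        body = escape(text)
        if whispered:
            body = f'<amazon:effect name="whispered">{body}</amazon:effect>'
        parts.append(body)
    return "<speak>" + "".join(parts) + "</speak>"


ssml = whisper_segments([
    ("If you make any noise, ", True),
    ("she said, ", False),   # narration stays in the normal voice
    ("they will hear us.", True),
])
print(ssml)
```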

You can enhance the "whispered" effect by slowing down the prosody rate by up to 10%, depending on the effect you want.

For example:

```
<speak>
    When any voice is made to whisper, <amazon:effect name="whispered">
    <prosody rate="-10%">the sound is slower and quieter than normal speech
    </prosody></amazon:effect>
</speak>
```

When generating speech marks for a whispered voice, the audio stream must also include the whispered voice to ensure that the speech marks match the audio stream.