

# Speech mark output
<a name="output"></a>

Amazon Polly returns speech mark objects in a line-delimited JSON stream. A speech mark object contains the following fields:
+  **time** – the timestamp in milliseconds from the beginning of the corresponding audio stream
+  **type** – the type of speech mark (sentence, word, viseme, or ssml)
+  **start** – the offset in bytes (not characters) at which the object starts in the input text (not present in viseme marks)
+  **end** – the offset in bytes (not characters) at which the object ends in the input text (not present in viseme marks)
+  **value** – this varies depending on the type of speech mark
  +  **ssml**: the `name` attribute of the `<mark>` SSML tag
  +  **viseme**: the viseme name
  +  **word** or **sentence**: a substring of the input text, as delimited by the start and end fields
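
Because each line of the stream is a complete JSON object, the output can be parsed one line at a time. The following is a minimal Python sketch, not an official client. The `SpeechMark` class and `parse_speech_marks` function are hypothetical names introduced here, and the sample input contains made-up values rather than real Amazon Polly output.

```python
import json
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class SpeechMark:
    """Hypothetical container mirroring the fields listed above."""
    time: int                    # milliseconds from the start of the audio stream
    type: str                    # "sentence", "word", "viseme", or "ssml"
    value: str                   # meaning depends on the type
    start: Optional[int] = None  # byte offset; not present in viseme marks
    end: Optional[int] = None    # byte offset; not present in viseme marks


def parse_speech_marks(stream: str) -> Iterator[SpeechMark]:
    """Yield one SpeechMark per non-empty line of a line-delimited JSON stream."""
    for line in stream.splitlines():
        line = line.strip()
        if line:
            yield SpeechMark(**json.loads(line))


# Made-up sample values for illustration only, not real Amazon Polly output.
sample = (
    '{"time":0,"type":"word","start":0,"end":4,"value":"Mary"}\n'
    '{"time":50,"type":"viseme","value":"p"}\n'
)
marks = list(parse_speech_marks(sample))
words = [m.value for m in marks if m.type == "word"]
```

This sketch assumes that every object carries only the fields listed above. An unexpected field would raise a `TypeError`, so production code might prefer to read the keys it needs from the parsed dictionary instead.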

For example, Amazon Polly generates the following `word` speech mark object from the text "Mary had a little lamb":

```
{"time":373,"type":"word","start":5,"end":8,"value":"had"}
```

The word ("had") is spoken 373 milliseconds after the audio stream begins. In the input text, it starts at byte 5 and ends at byte 8. The end offset is exclusive, so the word occupies bytes 5 through 7.
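
Because the offsets count bytes rather than characters, slicing a string directly by `start` and `end` is only safe for single-byte characters. The following Python sketch recovers the word from the example above by slicing the encoded bytes. It assumes the input text is UTF-8 encoded and that `end` is exclusive, which matches the example. The helper name `slice_by_bytes` is hypothetical.

```python
import json


def slice_by_bytes(text: str, start: int, end: int) -> str:
    """Return the part of text covered by byte offsets [start, end), assuming UTF-8."""
    return text.encode("utf-8")[start:end].decode("utf-8")


text = "Mary had a little lamb"
mark = json.loads('{"time":373,"type":"word","start":5,"end":8,"value":"had"}')
word = slice_by_bytes(text, mark["start"], mark["end"])

# With non-ASCII input, byte offsets and character offsets diverge.
# These offsets are computed locally from the string, not taken from Amazon Polly.
accented = "Zo\u00eb had a little lamb"                  # "ë" occupies 2 bytes in UTF-8
byte_start = accented.encode("utf-8").index(b"had")     # offset counted in bytes
char_start = accented.index("had")                      # offset counted in characters
```

Here `word` equals the mark's `value`. In the second string, `byte_start` is one greater than `char_start`, because the accented character takes two bytes.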

**Note**  
This metadata is for the `Joanna` voice-id. If you use another voice with the same input text, the metadata might differ.