


# Partitioning speakers (diarization)
<a name="diarization"></a>

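With speaker diarization, you can distinguish between different speakers in your transcription output. Amazon Transcribe can differentiate between a maximum of 30 unique speakers and labels the text from each unique speaker with a unique value (`spk_0` through `spk_9`).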

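In addition to the [standard transcript sections](how-input.md#how-it-works-output) (`transcripts` and `items`), requests with speaker partitioning enabled include a `speaker_labels` section. This section is grouped by speaker and contains information on each utterance, including the speaker label and timestamps.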

```
"speaker_labels": {
    "channel_label": "ch_0",
    "speakers": 2,
    "segments": [
         {
            "start_time": "4.87",
            "speaker_label": "spk_0",
            "end_time": "6.88",
            "items": [                                                 
                {
                    "start_time": "4.87",
                    "speaker_label": "spk_0",
                    "end_time": "5.02"
                },
        ...
        {
            "start_time": "8.49",
            "speaker_label": "spk_1",
            "end_time": "9.24",
            "items": [
                {
                    "start_time": "8.49",
                    "speaker_label": "spk_1",
                    "end_time": "8.88"
                },
```

To view a complete example transcript with speaker partitioning (for two speakers), see [Example diarization output (batch)](diarization-output-batch.md).
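
As a rough illustration of how the `speaker_labels` section might be consumed, the following sketch sums the speaking time per speaker from the `segments` list. The helper name `speaking_time_by_speaker` and the inline sample data are illustrative assumptions and not part of Amazon Transcribe; the field names match the excerpt above.

```python
from collections import defaultdict


def speaking_time_by_speaker(speaker_labels):
    """Return the total seconds of speech for each speaker label.

    Timestamps in the transcript are strings, so they are converted to float.
    """
    totals = defaultdict(float)
    for segment in speaker_labels["segments"]:
        duration = float(segment["end_time"]) - float(segment["start_time"])
        totals[segment["speaker_label"]] += duration
    # Round to hide floating-point noise from the subtraction.
    return {speaker: round(seconds, 2) for speaker, seconds in totals.items()}


# Hypothetical sample shaped like the excerpt above (word-level items omitted).
sample = {
    "channel_label": "ch_0",
    "speakers": 2,
    "segments": [
        {"start_time": "4.87", "speaker_label": "spk_0", "end_time": "6.88", "items": []},
        {"start_time": "8.49", "speaker_label": "spk_1", "end_time": "9.24", "items": []},
    ],
}

totals = speaking_time_by_speaker(sample)
print(totals)
```

Only segment-level timestamps are used here; the nested `items` entries carry the same speaker label at word level and are not needed for this total.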

## Partitioning speakers in a batch transcription
<a name="diarization-batch"></a>

To partition speakers in a batch transcription, see the following examples:

### AWS Management Console
<a name="diarization-console-batch"></a>

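1. Sign in to the [AWS Management Console](https://console.aws.amazon.com/transcribe/).

1. In the navigation pane, choose **Transcription jobs**, then select **Create job** (top right). This opens the **Specify job details** page.  
![\[Amazon Transcribe console 'Specify job details' page. In the 'Job settings' panel, you can specify a name for your transcription job, select the model type, and specify your language settings.\]](http://docs.aws.amazon.com/zh_cn/transcribe/latest/dg/images/console-batch-job-details-1.png)

1. Fill in any fields you want to include on the **Specify job details** page, then select **Next**. This takes you to the **Configure job - *optional*** page.

   To enable speaker partitioning, choose **Audio identification** in **Audio settings**. Then choose **Speaker partitioning** and specify the number of speakers.  
![\[Amazon Transcribe console 'Configure job' page. In the 'Audio settings' panel, you can enable 'Speaker partitioning'.\]](http://docs.aws.amazon.com/zh_cn/transcribe/latest/dg/images/diarization-batch.png)

1. Select **Create job** to run your transcription job.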

### AWS CLI
<a name="diarization-cli"></a>

This example uses the [start-transcription-job](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/transcribe/start-transcription-job.html) command. For more information, see [StartTranscriptionJob](https://docs.aws.amazon.com/transcribe/latest/APIReference/API_StartTranscriptionJob.html).

```
aws transcribe start-transcription-job \
--region us-west-2 \
--transcription-job-name my-first-transcription-job \
--media MediaFileUri=s3://amzn-s3-demo-bucket/my-input-files/my-media-file.flac \
--output-bucket-name amzn-s3-demo-bucket \
--output-key my-output-files/ \
--language-code en-US \
--settings ShowSpeakerLabels=true,MaxSpeakerLabels=3
```

Here is another example that uses the [start-transcription-job](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/transcribe/start-transcription-job.html) command, along with a request body that enables speaker partitioning for that job.

```
aws transcribe start-transcription-job \
--region us-west-2 \
--cli-input-json file://my-first-transcription-job.json
```

The file *my-first-transcription-job.json* contains the following request body.

```
{
  "TranscriptionJobName": "my-first-transcription-job",
  "Media": {
        "MediaFileUri": "s3://amzn-s3-demo-bucket/my-input-files/my-media-file.flac"
  },
  "OutputBucketName": "amzn-s3-demo-bucket",
  "OutputKey": "my-output-files/",
  "LanguageCode": "en-US",
  "Settings": {
        "ShowSpeakerLabels": true,
        "MaxSpeakerLabels": 3
  }
}
```
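
The request body can also be generated programmatically. The sketch below builds an equivalent body in Python and serializes it with `json.dumps`, nesting the speaker options under `Settings` in the same way as the `--settings` option in the first CLI example. The function name `build_request` and the 2 to 30 range check are assumptions made for illustration; the upper bound comes from the 30-speaker limit mentioned earlier, and the lower bound assumes that fewer than two speakers leaves nothing to partition.

```python
import json


def build_request(job_name, media_uri, bucket, max_speakers, output_key="my-output-files/"):
    """Build a StartTranscriptionJob request body with speaker partitioning enabled."""
    # Assumed valid range; check the API reference for the authoritative limits.
    if not 2 <= max_speakers <= 30:
        raise ValueError("max_speakers must be between 2 and 30")
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "OutputBucketName": bucket,
        "OutputKey": output_key,
        "LanguageCode": "en-US",
        "Settings": {"ShowSpeakerLabels": True, "MaxSpeakerLabels": max_speakers},
    }


request = build_request(
    "my-first-transcription-job",
    "s3://amzn-s3-demo-bucket/my-input-files/my-media-file.flac",
    "amzn-s3-demo-bucket",
    max_speakers=3,
)

# json.dumps writes Python's True as the lowercase JSON literal true.
body = json.dumps(request, indent=2)
print(body)
```

Writing `body` to *my-first-transcription-job.json* would produce a file suitable for `--cli-input-json`.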

### AWS SDK for Python (Boto3)
<a name="diarization-python-batch"></a>

This example uses the AWS SDK for Python (Boto3) to partition speakers using the [start_transcription_job](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/transcribe.html#TranscribeService.Client.start_transcription_job) method. For more information, see [StartTranscriptionJob](https://docs.aws.amazon.com/transcribe/latest/APIReference/API_StartTranscriptionJob.html).

```
import time

import boto3

# Create an Amazon Transcribe client in the Region that holds the media file.
transcribe = boto3.client('transcribe', 'us-west-2')
job_name = "my-first-transcription-job"
job_uri = "s3://amzn-s3-demo-bucket/my-input-files/my-media-file.flac"

# Start the job with speaker partitioning enabled for up to three speakers.
transcribe.start_transcription_job(
    TranscriptionJobName=job_name,
    Media={
        'MediaFileUri': job_uri
    },
    OutputBucketName='amzn-s3-demo-bucket',
    OutputKey='my-output-files/',
    LanguageCode='en-US',
    Settings={
        'ShowSpeakerLabels': True,
        'MaxSpeakerLabels': 3
    }
)

# Poll every five seconds until the job completes or fails.
while True:
    status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
    if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
        break
    print("Not ready yet...")
    time.sleep(5)
print(status)
```
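
Printing the whole response is verbose. As a possible follow-up, the sketch below reduces a `get_transcription_job` response to a one-line summary. The helper name `summarize_job` and the two sample responses are hypothetical and heavily abbreviated; a real response contains many more fields.

```python
def summarize_job(status):
    """Return a one-line summary of a get_transcription_job response."""
    job = status["TranscriptionJob"]
    state = job["TranscriptionJobStatus"]
    if state == "COMPLETED":
        # The transcript location is reported once the job has completed.
        return "Transcript available at " + job["Transcript"]["TranscriptFileUri"]
    if state == "FAILED":
        return "Job failed: " + job.get("FailureReason", "no reason reported")
    return "Job is still " + state


# Hypothetical, abbreviated responses used only to exercise the helper.
completed = {
    "TranscriptionJob": {
        "TranscriptionJobStatus": "COMPLETED",
        "Transcript": {"TranscriptFileUri": "s3://amzn-s3-demo-bucket/my-output-files/out.json"},
    }
}
failed = {
    "TranscriptionJob": {
        "TranscriptionJobStatus": "FAILED",
        "FailureReason": "Unsupported media format",
    }
}

print(summarize_job(completed))
print(summarize_job(failed))
```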

## Partitioning speakers in a streaming transcription
<a name="diarization-stream"></a>

To partition speakers in a streaming transcription, see the following examples:

### Streaming transcription
<a name="diarization-console-stream"></a>

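1. Sign in to the [AWS Management Console](https://console.aws.amazon.com/transcribe/).

1. In the navigation pane, choose **Real-time transcription**. Scroll down to **Audio settings** and expand this field if it is minimized.  
![\[Amazon Transcribe console screenshot: the 'Audio settings' tab on the 'Real-time transcription' page.\]](http://docs.aws.amazon.com/zh_cn/transcribe/latest/dg/images/diarization-streaming1.png)

1. Toggle on **Speaker partitioning**.  
![\[Amazon Transcribe console screenshot: the expanded 'Audio settings' tab with speaker partitioning enabled.\]](http://docs.aws.amazon.com/zh_cn/transcribe/latest/dg/images/diarization-streaming2.png)

1. You're now ready to transcribe your stream. Select **Start streaming** and begin speaking. To end your dictation, select **Stop streaming**.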

### HTTP/2 stream
<a name="diarization-http2"></a>

This example creates an HTTP/2 request that partitions speakers in your transcription output. For more information on using HTTP/2 streaming with Amazon Transcribe, see [Setting up an HTTP/2 stream](streaming-setting-up.md#streaming-http2). For more detail on parameters and headers specific to Amazon Transcribe, see [StartStreamTranscription](https://docs.aws.amazon.com/transcribe/latest/APIReference/API_streaming_StartStreamTranscription.html).

```
POST /stream-transcription HTTP/2
host: transcribestreaming.us-west-2.amazonaws.com
X-Amz-Target: com.amazonaws.transcribe.Transcribe.StartStreamTranscription
Content-Type: application/vnd.amazon.eventstream
X-Amz-Content-Sha256: string
X-Amz-Date: 20220208T235959Z
Authorization: AWS4-HMAC-SHA256 Credential=access-key/20220208/us-west-2/transcribe/aws4_request, SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date;x-amz-target;x-amz-security-token, Signature=string
x-amzn-transcribe-language-code: en-US
x-amzn-transcribe-media-encoding: flac
x-amzn-transcribe-sample-rate: 16000             
x-amzn-transcribe-show-speaker-label: true
transfer-encoding: chunked
```

Parameter definitions can be found in the [API Reference](https://docs.aws.amazon.com/transcribe/latest/APIReference/API_Reference.html); parameters common to all AWS API operations are listed in the [Common Parameters](https://docs.aws.amazon.com/transcribe/latest/APIReference/CommonParameters.html) section.

### WebSocket stream
<a name="diarization-websocket"></a>

This example creates a presigned URL that partitions speakers in your transcription output. Line breaks have been added for readability. For more information on using WebSocket streams with Amazon Transcribe, see [Setting up a WebSocket stream](streaming-setting-up.md#streaming-websocket). For more detail on parameters, see [StartStreamTranscription](https://docs.aws.amazon.com/transcribe/latest/APIReference/API_streaming_StartStreamTranscription.html).

```
GET wss://transcribestreaming.us-west-2.amazonaws.com:8443/stream-transcription-websocket?
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE%2F20220208%2Fus-west-2%2Ftranscribe%2Faws4_request
&X-Amz-Date=20220208T235959Z
&X-Amz-Expires=300
&X-Amz-Security-Token=security-token
&X-Amz-Signature=string
&X-Amz-SignedHeaders=content-type%3Bhost%3Bx-amz-date
&language-code=en-US
&media-encoding=flac
&sample-rate=16000        
&show-speaker-label=true
```

Parameter definitions can be found in the [API Reference](https://docs.aws.amazon.com/transcribe/latest/APIReference/API_Reference.html); parameters common to all AWS API operations are listed in the [Common Parameters](https://docs.aws.amazon.com/transcribe/latest/APIReference/CommonParameters.html) section.
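
The transcription-specific part of the query string can be assembled with Python's standard library. The following is a minimal sketch that covers only the `language-code`, `media-encoding`, `sample-rate`, and `show-speaker-label` parameters from the example above. It does not sign the request: the `X-Amz-*` parameters must still come from a Signature Version 4 presigning step, which is out of scope here. The function name `transcription_query` is an assumption made for illustration.

```python
from urllib.parse import urlencode


def transcription_query(language_code, media_encoding, sample_rate, show_speaker_label=True):
    """Build the unsigned, transcription-specific query parameters."""
    params = {
        "language-code": language_code,
        "media-encoding": media_encoding,
        "sample-rate": sample_rate,
    }
    if show_speaker_label:
        # Speaker partitioning is requested with the literal string "true".
        params["show-speaker-label"] = "true"
    return urlencode(params)


query = transcription_query("en-US", "flac", 16000)
print(query)
```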

# Example diarization output (batch)
<a name="diarization-output-batch"></a>

Here is an output example for a batch transcription with diarization enabled.

```
{
    "jobName": "my-first-transcription-job",
    "accountId": "111122223333",
    "results": {
        "transcripts": [
            {
                "transcript": "I've been on hold for an hour. Sorry about that."
            }
        ],
        "speaker_labels": {
            "channel_label": "ch_0",
            "speakers": 2,
            "segments": [
                {
                    "start_time": "4.87",
                    "speaker_label": "spk_0",
                    "end_time": "6.88",
                    "items": [                                                 
                        {
                            "start_time": "4.87",
                            "speaker_label": "spk_0",
                            "end_time": "5.02"
                        },
                        {
                            "start_time": "5.02",
                            "speaker_label": "spk_0",
                            "end_time": "5.17"
                        },
                        {
                            "start_time": "5.17",
                            "speaker_label": "spk_0",
                            "end_time": "5.29"
                        },
                        {
                            "start_time": "5.29",
                            "speaker_label": "spk_0",
                            "end_time": "5.64"
                        },
                        {
                            "start_time": "5.64",
                            "speaker_label": "spk_0",
                            "end_time": "5.84"
                        },                     
                        {
                            "start_time": "6.11",
                            "speaker_label": "spk_0",
                            "end_time": "6.26"
                        },
                        {
                            "start_time": "6.26",
                            "speaker_label": "spk_0",
                            "end_time": "6.88"
                        }
                    ]
                },
                {
                    "start_time": "8.49",
                    "speaker_label": "spk_1",
                    "end_time": "9.24",
                    "items": [
                        {
                            "start_time": "8.49",
                            "speaker_label": "spk_1",
                            "end_time": "8.88"
                        },
                        {
                            "start_time": "8.88",
                            "speaker_label": "spk_1",
                            "end_time": "9.05"
                        },
                        {
                            "start_time": "9.05",
                            "speaker_label": "spk_1",
                            "end_time": "9.24"
                        }                                           
                    ]
                }
            ]
        },
        "items": [            
            {
                "id": 0,
                "start_time": "4.87",
                "speaker_label": "spk_0",
                "end_time": "5.02",
                "alternatives": [
                    {
                        "confidence": "1.0",
                        "content": "I've"
                    }
                ],
                "type": "pronunciation"
            },
            {
                "id": 1,
                "start_time": "5.02",
                "speaker_label": "spk_0",
                "end_time": "5.17",
                "alternatives": [
                    {
                        "confidence": "1.0",
                        "content": "been"
                    }
                ],
                "type": "pronunciation"
            },
            {
                "id": 2,
                "start_time": "5.17",
                "speaker_label": "spk_0",
                "end_time": "5.29",
                "alternatives": [
                    {
                        "confidence": "1.0",
                        "content": "on"
                    }
                ],
                "type": "pronunciation"
            },
            {
                "id": 3,
                "start_time": "5.29",
                "speaker_label": "spk_0",
                "end_time": "5.64",
                "alternatives": [
                    {
                        "confidence": "1.0",
                        "content": "hold"
                    }
                ],
                "type": "pronunciation"
            },
            {
                "id": 4,
                "start_time": "5.64",
                "speaker_label": "spk_0",
                "end_time": "5.84",
                "alternatives": [
                    {
                        "confidence": "1.0",
                        "content": "for"
                    }
                ],
                "type": "pronunciation"
            },      
            {
                "id": 5,
                "start_time": "6.11",
                "speaker_label": "spk_0",
                "end_time": "6.26",
                "alternatives": [
                    {
                        "confidence": "1.0",
                        "content": "an"
                    }
                ],
                "type": "pronunciation"
            },
            {
                "id": 6,
                "start_time": "6.26",
                "speaker_label": "spk_0",
                "end_time": "6.88",
                "alternatives": [
                    {
                        "confidence": "1.0",
                        "content": "hour"
                    }
                ],
                "type": "pronunciation"
            },
            {
                "id": 7,
                "speaker_label": "spk_0",
                "alternatives": [
                    {
                        "confidence": "0.0",
                        "content": "."
                    }
                ],
                "type": "punctuation"
            },
            {
                "id": 8,
                "start_time": "8.49",
                "speaker_label": "spk_1",
                "end_time": "8.88",
                "alternatives": [
                    {
                        "confidence": "1.0",
                        "content": "Sorry"
                    }
                ],
                "type": "pronunciation"
            },
            {
                "id": 9,
                "start_time": "8.88",
                "speaker_label": "spk_1",
                "end_time": "9.05",
                "alternatives": [
                    {
                        "confidence": "0.902",
                        "content": "about"
                    }
                ],
                "type": "pronunciation"
            },
            {
                "id": 10,
                "start_time": "9.05",
                "speaker_label": "spk_1",
                "end_time": "9.24",
                "alternatives": [
                    {
                        "confidence": "1.0",
                        "content": "that"
                    }
                ],
                "type": "pronunciation"
            },
            {
                "id": 11,
                "speaker_label": "spk_1",
                "alternatives": [
                    {
                        "confidence": "0.0",
                        "content": "."
                    }
                ],
                "type": "punctuation"
            }
        ],
        "audio_segments": [
            {
                "id": 0,
                "transcript": "I've been on hold for an hour.",
                "start_time": "4.87",
                "end_time": "6.88",
                "speaker_label": "spk_0",
                "items": [
                    0,
                    1,
                    2,
                    3,
                    4,
                    5,
                    6,
                    7
                ]
            },
            {
                "id": 1,
                "transcript": "Sorry about that.",
                "start_time": "8.49",
                "end_time": "9.24",
                "speaker_label": "spk_1",
                "items": [
                    8,
                    9,
                    10,
                    11
                ]
            }
        ]
    },
    "status": "COMPLETED"
}
```
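
One way to turn this output into a readable dialogue is to group consecutive entries of the `items` array by their `speaker_label`. The sketch below assumes every item carries a `speaker_label`, as in the example above; the function name `speaker_turns` and the shortened sample list are illustrative and not part of the service.

```python
def speaker_turns(items):
    """Group consecutive items with the same speaker label into (speaker, text) turns."""
    turns = []
    for item in items:
        speaker = item["speaker_label"]
        word = item["alternatives"][0]["content"]
        if turns and turns[-1][0] == speaker:
            # Punctuation attaches to the previous word without a space.
            separator = "" if item["type"] == "punctuation" else " "
            turns[-1][1] += separator + word
        else:
            # A change of speaker starts a new turn.
            turns.append([speaker, word])
    return [(speaker, text) for speaker, text in turns]


def make_item(speaker, content, kind="pronunciation"):
    """Build a shortened item with only the fields this sketch reads."""
    return {"speaker_label": speaker, "alternatives": [{"content": content}], "type": kind}


# Shortened, hypothetical version of the items array shown above.
sample_items = [
    make_item("spk_0", "I've"),
    make_item("spk_0", "been"),
    make_item("spk_0", ".", "punctuation"),
    make_item("spk_1", "Sorry"),
    make_item("spk_1", ".", "punctuation"),
]

turns = speaker_turns(sample_items)
for speaker, text in turns:
    print(speaker + ": " + text)
```

With a full transcript loaded through `json.load`, the same function could be called on `transcript["results"]["items"]`.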