# Video
<a name="bda-ouput-video"></a>

BDA provides a set of standard outputs for processing videos and generating insights. Each operation type is described in detail below:

## Full video summary
<a name="video-summarization"></a>

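Full video summary generates an overall summary of the entire video file. It distills the key themes, events, and information presented throughout the video into a concise summary. Full video summary is optimized for content with descriptive dialogue, such as product overviews, trainings, news broadcasts, talk shows, and documentaries. In both the full video summary and the scene summaries, BDA attempts to provide a name for each unique speaker based on audio signals (for example, a speaker introducing themselves) or visual signals (for example, a speaker's name shown on a presentation slide). If the name of a unique speaker cannot be resolved, the speaker is represented by a unique number (for example, speaker\_0).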

## Chapter summaries
<a name="video-scene-summarization"></a>

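Video chapter summarization provides descriptive summaries for the individual scenes in a video. A video chapter is a sequence of shots that forms a coherent unit of action or narrative within the video. This feature splits the video into meaningful segments based on visual and audible cues, provides timestamps for those segments, and summarizes each one.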

## IAB taxonomy
<a name="video-iab-classification"></a>

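The Interactive Advertising Bureau (IAB) classification applies a standard advertising taxonomy to classify video scenes based on their visual and audio elements. For the preview, BDA supports 24 top-level (L1) categories and 85 second-level (L2) categories. To download the list of IAB categories supported by BDA, click [here](samples/iab-taxonomy.zip).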

## Full audio transcript
<a name="full-audio-transcript"></a>

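The full audio transcript feature provides a complete text representation of all spoken content in the audio. It uses speech recognition to transcribe dialogue, narration, and other audio elements. The transcript includes speaker identification, which makes it easy to navigate and search the audio content by speaker.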

## Text in video
<a name="text-in-video"></a>

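This feature detects and extracts text that appears visually in the video. It can identify both static text (such as titles or captions) and dynamic text (such as text that moves across the image). Similar to image text detection, this feature provides bounding box information for each detected text element, which allows precise localization within video frames.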

## Logo detection
<a name="video-logo-detection"></a>

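This feature identifies logos in a video and provides bounding box information that indicates the coordinates of each logo detected within the video frame, along with a confidence score. This feature is not enabled by default.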

## Content moderation
<a name="video-content-moderation"></a>

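Content moderation detects inappropriate, unwanted, or offensive content in a video. BDA supports 7 moderation categories: Explicit, Non-Explicit Nudity of Intimate Parts and Kissing, Swimwear or Underwear, Violence, Drugs & Tobacco, Alcohol, and Hate Symbols. Explicit text in videos is not flagged.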

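For relevant features such as text detection, which provides location coordinates and timestamps within the video file, bounding boxes and the associated confidence scores can be enabled or disabled. By default, full video summarization, scene summarization, and video text detection are enabled.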

**Note**  
 Only one audio track is supported per video. Subtitle file formats (for example, SRT and VTT) are not supported.

## Video standard output
<a name="video-standard-output"></a>

The following is an example of the standard output for a video processed through BDA.

```
{
"metadata": {
    "asset_id": "0",
    "semantic_modality": "VIDEO",
    "s3_bucket": "bedrock-data-automation-gamma-assets-us-east-1",
    "s3_key": "demo-assets/Video/MakingTheCut.mp4",
    "format": "QuickTime / MOV",
    "frame_rate": 30,
    "codec": "h264",
    "duration_millis": 378233,
    "frame_width": 852,
    "frame_height": 480
  },
```

This initial section contains metadata about the video, including the bucket location, format, frame rate, and other key information.

```
"shots": [ ...

    {
      "shot_index": 3,
      "start_timecode_smpte": "00:00:08:19",
      "end_timecode_smpte": "00:00:09:25",
      "start_timestamp_millis": 8633,
      "end_timestamp_millis": 9833,
      "start_frame_index": 259,
      "end_frame_index": 295,
      "duration_smpte": "00:00:01:06",
      "duration_millis": 1200,
      "duration_frames": 36,
      "confidence": 0.9956437242589935,
      "chapter_indices": [
        1
      ]
    },
```

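This is an example of a shot element provided in the response. A shot represents a small portion of the video, usually associated with an edit or cut. Each shot contains start and end elements, as well as a chapter\_indices element. This element indicates which larger section of the video, called a chapter, the shot belongs to.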
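The frame index and SMPTE timecode fields in the sample appear to be related by simple integer arithmetic. The following is a minimal sketch, assuming an integer frame rate (30, as in the sample metadata) and non-drop-frame timecode; the function name `frame_index_to_smpte` is hypothetical and not part of BDA.

```python
def frame_index_to_smpte(frame_index: int, fps: int = 30) -> str:
    """Convert a zero-based frame index to a non-drop-frame HH:MM:SS:FF timecode.

    Assumes an integer frame rate; drop-frame rates such as 29.97 are not handled.
    """
    total_seconds, frames = divmod(frame_index, fps)
    total_minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


# Values copied from the sample shot above.
shot = {"start_frame_index": 259, "end_frame_index": 295, "duration_frames": 36}

print(frame_index_to_smpte(shot["start_frame_index"]))  # 00:00:08:19
print(frame_index_to_smpte(shot["end_frame_index"]))    # 00:00:09:25
print(frame_index_to_smpte(shot["duration_frames"]))    # 00:00:01:06
```

For the sample shot, the results match the start_timecode_smpte, end_timecode_smpte, and duration_smpte values in the response.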

```
"chapters": [
    {
      "start_timecode_smpte": "00:00:00:00",
      "end_timecode_smpte": "00:00:08:18",
      "start_timestamp_millis": 0,
      "end_timestamp_millis": 8600,
      "start_frame_index": 0,
      "end_frame_index": 258,
      "duration_millis": 8600,
      "shot_indices": [
        0,
        1,
        2
      ],
      "summary": "At an elegant outdoor venue, a man in a suit and a woman in a patterned dress stand on a raised platform overlooking a reflective pool. The setting is adorned with palm trees and lush greenery, creating a tropical atmosphere. The man initiates the event by asking if they should begin, to which the woman responds affirmatively. As the scene progresses, the focus shifts to a woman wearing a distinctive black and white patterned coat, her hair styled in a bun. She stands alone in a dimly lit room, facing away from the camera. The narrative then moves to a formal setting where a man in a dark suit stands before a curtain backdrop, suggesting he may be about to address an audience or perform. The scene concludes with a view of the entire venue, showcasing its tropical charm with a swimming pool surrounded by palm trees and decorative lighting, indicating it's prepared for a special occasion.",
```

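Chapters are larger segments of the video. Like shots, they contain start and end information, along with a shot\_indices element that lists the shots in the chapter. Finally, the summary element provides a generated summary of the chapter's content.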
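The shot\_indices and chapter\_indices elements describe the same relationship from two directions. The following is a minimal sketch of building a shot-to-chapter lookup, assuming that a chapter's index is its position in the chapters array (the sample chapter has no explicit index field). The helper name `chapters_by_shot` and the second chapter in the sample data are hypothetical.

```python
from collections import defaultdict


def chapters_by_shot(chapters):
    """Map each shot index to the list of chapter positions that contain it."""
    lookup = defaultdict(list)
    for chapter_index, chapter in enumerate(chapters):
        for shot_index in chapter["shot_indices"]:
            lookup[shot_index].append(chapter_index)
    return dict(lookup)


chapters = [
    {"shot_indices": [0, 1, 2]},  # from the sample chapter above
    {"shot_indices": [3]},        # hypothetical second chapter for illustration
]

print(chapters_by_shot(chapters))
```

Under this assumption, shot 3 maps to chapter 1, which agrees with the chapter\_indices value in the sample shot.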

```
 "frames": [...
         {
          "timecode_smpte": "00:00:03:15",
          "timestamp_millis": 3500,
          "frame_index": 105,
          "content_moderation": [],
          "text_words": [
            {
              "id": "266db64a-a7dc-463c-b710-7a178a2cc4cc",
              "type": "TEXT_WORD",
              "confidence": 0.99844897,
              "text": "ANDREA",
              "locations": [
                {
                  "bounding_box": {
                    "left": 0.1056338,
                    "top": 0.7363281,
                    "width": 0.19806337,
                    "height": 0.068359375
                  },
                  "polygon": [
                    {
                      "x": 0.1056338,
                      "y": 0.7363281
                    },
                    {
                      "x": 0.30369717,
                      "y": 0.7363281
                    },
                    {
                      "x": 0.30369717,
                      "y": 0.8046875
                    },
                    {
                      "x": 0.1056338,
                      "y": 0.8046875
                    }
                  ]
                }
              ],
              "line_id": "57b760fc-c410-418e-aee3-7c7ba58a71c2"
            },
```

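The smallest unit of granularity in a video is the frame, which represents a single image in the video. Frames have two notable response elements, content\_moderation and text\_words. The first, content\_moderation, provides information about the frame's content based on the content moderation categories, if such content is detected. The second, text\_words, provides the location and details of any text that appears in the video, such as closed captions.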
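The bounding\_box values in the sample fall between 0 and 1, which suggests they are expressed as fractions of the frame dimensions. The following is a minimal sketch of converting such a box to pixel coordinates, assuming that normalization and using frame_width and frame_height from the sample metadata; the helper name `to_pixel_box` is hypothetical.

```python
def to_pixel_box(bounding_box, frame_width, frame_height):
    """Scale a normalized bounding box to integer pixel coordinates.

    Assumes left/width are fractions of the frame width and
    top/height are fractions of the frame height.
    """
    return {
        "left": round(bounding_box["left"] * frame_width),
        "top": round(bounding_box["top"] * frame_height),
        "width": round(bounding_box["width"] * frame_width),
        "height": round(bounding_box["height"] * frame_height),
    }


# Values copied from the sample frame and metadata above.
box = {
    "left": 0.1056338,
    "top": 0.7363281,
    "width": 0.19806337,
    "height": 0.068359375,
}

print(to_pixel_box(box, frame_width=852, frame_height=480))
```

Rounding to whole pixels is a design choice for drawing overlays; keep the unrounded products if sub-pixel precision matters.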

```
    "statistics": {
    "shot_count": 148,
    "chapter_count": 11,
    "speaker_count": 11
  }
}
```

Finally, statistics provides a breakdown of the detected content, such as the number of shots, speakers, and chapters in a given video.
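When parsing a full response, it can be useful to check the reported counts against the arrays themselves. The following is a minimal sketch, assuming that shot_count and chapter_count equal the lengths of the shots and chapters arrays; the source does not state this explicitly, and the helper name `count_mismatches` is hypothetical.

```python
def count_mismatches(output):
    """Return {field: (reported, actual)} for counts that disagree with array lengths."""
    stats = output.get("statistics", {})
    actual_counts = {
        "shot_count": len(output.get("shots", [])),
        "chapter_count": len(output.get("chapters", [])),
    }
    return {
        field: (stats.get(field), actual)
        for field, actual in actual_counts.items()
        if stats.get(field) != actual
    }


# Tiny hypothetical output, not real BDA data.
output = {
    "shots": [{"shot_index": 0}, {"shot_index": 1}],
    "chapters": [{"shot_indices": [0, 1]}],
    "statistics": {"shot_count": 2, "chapter_count": 2, "speaker_count": 1},
}

print(count_mismatches(output))
```

The sketch skips speaker_count because the sample output shown here contains no per-speaker array to compare against.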