支持 P6 的 DLAMI - AWS Deep Learning AMIs

本文属于机器翻译版本。若本译文内容与英语原文存在差异,则一律以英文原文为准。

支持 P6 的 DLAMI

以下是在亚马逊 P6 实例上 EC2 运行 DLAMI 的详细要求

支持 P6 DLAMIs

以下 DLAMI 支持 P6 实例:

这些 DLAMI 包含操作 P6-B200 实例所需的以下软件:

软件

最低版本要求

英伟达 CUDA 工具包

12.8

英伟达驱动程序

R570

NVLINK 5

R570

Linux 内

6.1

Elastic Fabric Adapter(EFA)

1.41.0

AWS OFI NCCL 插件

1.15.0

确认 GPU 功能

要确认功能,请执行 GPUs以下操作:

  1. 运行以下 Nvidia GPU 设备查询测试

    $ /usr/local/cuda/extras/demo_suite/deviceQuery
  2. 确认运行设备查询的以下输出:

    $ /usr/local/cuda/extras/demo_suite/deviceQuery /usr/local/cuda/extras/demo_suite/deviceQuery Starting... CUDA Device Query (Runtime API) Detected 8 CUDA Capable device(s) ... deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.8, CUDA Runtime Version = 12.8, NumDevs = 8, Device0 = NVIDIA B200, Device1 = NVIDIA B200, Device2 = NVIDIA B200, Device3 = NVIDIA B200, Device4 = NVIDIA B200, Device5 = NVIDIA B200, Device6 = NVIDIA B200, Device7 = NVIDIA B200 Result = PASS

要确认 NVIDIA 驱动程序正常运行,

  1. 运行 Nvidia 系统管理接口

    $ nvidia-smi
  2. 确认系统管理界面的以下输出

    +-----------------------------------------------------------------------------------------+ | NVIDIA-SMI 570.133.20 Driver Version: 570.133.20 CUDA Version: 12.8 | |-----------------------------------------+------------------------+----------------------+ | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |=========================================+========================+======================| | 0 NVIDIA B200 Off | 00000000:51:00.0 Off | 0 | | N/A 32C P0 145W / 1000W | 0MiB / 183359MiB | 0% Default | | | | Disabled | +-----------------------------------------+------------------------+----------------------+ | 1 NVIDIA B200 Off | 00000000:52:00.0 Off | 0 | | N/A 30C P0 140W / 1000W | 0MiB / 183359MiB | 0% Default | | | | Disabled | +-----------------------------------------+------------------------+----------------------+ | 2 NVIDIA B200 Off | 00000000:62:00.0 Off | 0 | | N/A 31C P0 139W / 1000W | 0MiB / 183359MiB | 0% Default | | | | Disabled | +-----------------------------------------+------------------------+----------------------+ | 3 NVIDIA B200 Off | 00000000:63:00.0 Off | 0 | | N/A 29C P0 139W / 1000W | 0MiB / 183359MiB | 0% Default | | | | Disabled | +-----------------------------------------+------------------------+----------------------+ | 4 NVIDIA B200 Off | 00000000:75:00.0 Off | 0 | | N/A 31C P0 141W / 1000W | 0MiB / 183359MiB | 0% Default | | | | Disabled | +-----------------------------------------+------------------------+----------------------+ | 5 NVIDIA B200 Off | 00000000:76:00.0 Off | 0 | | N/A 31C P0 141W / 1000W | 0MiB / 183359MiB | 0% Default | | | | Disabled | +-----------------------------------------+------------------------+----------------------+ | 6 NVIDIA B200 Off | 00000000:86:00.0 Off | 0 | | N/A 32C P0 141W / 1000W | 0MiB / 183359MiB | 0% Default | | | | Disabled | +-----------------------------------------+------------------------+----------------------+ | 7 NVIDIA B200 Off | 00000000:87:00.0 Off | 0 | | N/A 30C P0 138W / 1000W | 0MiB / 183359MiB | 0% Default | | | | Disabled | +-----------------------------------------+------------------------+----------------------+ +-----------------------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=========================================================================================| | No running processes found | +-----------------------------------------------------------------------------------------+

如果您在使用 P6-B200 实例时遇到任何问题,请联系 Su AWS pport。