本文為英文版的機器翻譯版本,如內容有任何歧義或不一致之處,概以英文版為準。
P6 支援的 DLAMI
以下是在 Amazon EC2 P6 執行個體
P6 支援的 DLAMIs
下列 DLAMI 支援 P6 執行個體:
這些 DLAMI 包含操作 P6-B200 執行個體所需的下列軟體:
軟體 |
最低版本需求 |
---|---|
Nvidia CUDA 工具組 |
12.8 |
Nvidia 驅動程式 |
R570 |
NVLINK 5 |
R570 |
Linux 核心 |
6.1 |
Elastic Fabric Adapter (EFA) |
1.41.0 |
AWS OFI NCCL 外掛程式 |
1.15.0 |
確認 GPU 功能
若要確認功能 GPUs:
-
執行下列 Nvidia GPU 裝置查詢測試
$
/usr/local/cuda/extras/demo_suite/deviceQuery -
從 Device Query Run 確認下列輸出:
$ /usr/local/cuda/extras/demo_suite/deviceQuery /usr/local/cuda/extras/demo_suite/deviceQuery Starting... CUDA Device Query (Runtime API) Detected 8 CUDA Capable device(s) ... deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.8, CUDA Runtime Version = 12.8, NumDevs = 8, Device0 = NVIDIA B200, Device1 = NVIDIA B200, Device2 = NVIDIA B200, Device3 = NVIDIA B200, Device4 = NVIDIA B200, Device5 = NVIDIA B200, Device6 = NVIDIA B200, Device7 = NVIDIA B200 Result = PASS
若要確認功能正常的 NVIDIA 驅動程式:
-
執行 Nvidia 系統管理界面
$
nvidia-smi -
從系統管理界面確認下列輸出
+-----------------------------------------------------------------------------------------+ | NVIDIA-SMI 570.133.20 Driver Version: 570.133.20 CUDA Version: 12.8 | |-----------------------------------------+------------------------+----------------------+ | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |=========================================+========================+======================| | 0 NVIDIA B200 Off | 00000000:51:00.0 Off | 0 | | N/A 32C P0 145W / 1000W | 0MiB / 183359MiB | 0% Default | | | | Disabled | +-----------------------------------------+------------------------+----------------------+ | 1 NVIDIA B200 Off | 00000000:52:00.0 Off | 0 | | N/A 30C P0 140W / 1000W | 0MiB / 183359MiB | 0% Default | | | | Disabled | +-----------------------------------------+------------------------+----------------------+ | 2 NVIDIA B200 Off | 00000000:62:00.0 Off | 0 | | N/A 31C P0 139W / 1000W | 0MiB / 183359MiB | 0% Default | | | | Disabled | +-----------------------------------------+------------------------+----------------------+ | 3 NVIDIA B200 Off | 00000000:63:00.0 Off | 0 | | N/A 29C P0 139W / 1000W | 0MiB / 183359MiB | 0% Default | | | | Disabled | +-----------------------------------------+------------------------+----------------------+ | 4 NVIDIA B200 Off | 00000000:75:00.0 Off | 0 | | N/A 31C P0 141W / 1000W | 0MiB / 183359MiB | 0% Default | | | | Disabled | +-----------------------------------------+------------------------+----------------------+ | 5 NVIDIA B200 Off | 00000000:76:00.0 Off | 0 | | N/A 31C P0 141W / 1000W | 0MiB / 183359MiB | 0% Default | | | | Disabled | +-----------------------------------------+------------------------+----------------------+ | 6 NVIDIA B200 Off | 00000000:86:00.0 Off | 0 | | N/A 32C P0 141W / 1000W | 0MiB / 183359MiB | 0% Default | | | | Disabled | +-----------------------------------------+------------------------+----------------------+ | 7 NVIDIA B200 Off | 00000000:87:00.0 Off | 0 | | N/A 30C P0 138W / 1000W | 0MiB / 183359MiB | 0% Default | | | | Disabled | +-----------------------------------------+------------------------+----------------------+ +-----------------------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=========================================================================================| | No running processes found | +-----------------------------------------------------------------------------------------+
如果您遇到 P6-B200 執行個體的任何問題,請聯絡 AWS Support。