Configuring EFA clients
Use the following procedures to set up your Lustre client to access an FSx for Lustre file system via Elastic Fabric Adapter (EFA).
EFA is supported on Lustre clients running the following operating systems:
Amazon Linux 2023 (AL2023)
Red Hat Enterprise Linux (RHEL) 9.5 or newer
Ubuntu 22.04 or newer with kernel version 6.8+
EFA is supported on Lustre clients listed below. For more information, see Installing the Lustre client.
EFA is supported on Nitro v4 (or higher) EC2 instances that support EFA, excluding the trn2 instance family. See Supported instance types in the Amazon EC2 User Guide.
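The kernel requirement for Ubuntu clients can be checked before you start. The following is a minimal sketch, not part of the AWS tooling: `kernel_at_least` is a hypothetical helper that compares only the major and minor parts of a kernel release string. It applies to the Ubuntu requirement listed above; AL2023 and RHEL are qualified by distribution release rather than by kernel version.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if kernel release $1 is at least $2 (major.minor).
# Example release strings: "6.8.0-1021-aws", "5.15.0-1084-aws".
kernel_at_least() {
    have=${1%%-*}               # drop a suffix such as "-1021-aws"
    want=$2
    have_major=${have%%.*}
    have_rest=${have#*.}
    have_minor=${have_rest%%.*}
    want_major=${want%%.*}
    want_rest=${want#*.}
    want_minor=${want_rest%%.*}
    if [ "$have_major" -ne "$want_major" ]; then
        [ "$have_major" -gt "$want_major" ]
        return
    fi
    [ "$have_minor" -ge "$want_minor" ]
}

# Compare the running kernel with the 6.8 minimum listed for Ubuntu clients.
if kernel_at_least "$(uname -r)" 6.8; then
    echo "Kernel $(uname -r) meets the 6.8 minimum"
else
    echo "Kernel $(uname -r) is older than 6.8"
fi
```

The distribution name and release can be read from `/etc/os-release` and compared with the list above.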
Step 1: Install required drivers
Note
If you are using a Deep Learning AMI, you can skip this step as both the EFA driver and NVIDIA GPUDirect Storage (GDS) driver are pre-installed.
Install the EFA driver
Follow the instructions in Step 3: Install the EFA software in the Amazon EC2 User Guide.
Install the GDS driver (optional)
This step is only required if you plan to use NVIDIA GPUDirect Storage (GDS) with FSx for Lustre.
Requirements:
Amazon EC2 P5, P5e, P5en, P6-B200, or P6e-GB200 instance
NVIDIA GDS driver version 2.24.2 or higher
To install the NVIDIA GPUDirect Storage driver on your client instance
Clone the NVIDIA GDS repository:
git clone https://github.com/NVIDIA/gds-nvidia-fs.git
Build and install the driver:
cd gds-nvidia-fs/src/
export NVFS_MAX_PEER_DEVS=128
export NVFS_MAX_PCI_DEPTH=16
sudo -E make
sudo insmod nvidia-fs.ko
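After loading the driver, it can be useful to confirm that the kernel actually lists the module. The following is a sketch under stated assumptions: `module_loaded` is a hypothetical helper, and it assumes the module appears as `nvidia_fs` (the kernel reports hyphens in module names as underscores). The optional second argument exists only so the helper can be pointed at a sample file instead of `/proc/modules`.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if module $1 is listed in a /proc/modules-style
# file. $2 is optional and defaults to /proc/modules.
module_loaded() {
    modules_file=${2:-/proc/modules}
    [ -r "$modules_file" ] || return 1
    # The first field of each line is the module name.
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }' "$modules_file"
}

if module_loaded nvidia_fs; then
    echo "nvidia_fs module is loaded"
else
    echo "nvidia_fs module is not loaded"
fi
```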
Step 2: Configure EFA for the Lustre client
To access an FSx for Lustre file system using an EFA interface, you must install the Lustre EFA modules and configure EFA interfaces.
Quick setup
To quickly configure your Lustre client
Connect to your Amazon EC2 instance.
Download and unzip the file containing the configuration script:
curl -O https://docs.aws.amazon.com/fsx/latest/LustreGuide/samples/configure-efa-fsx-lustre-client.zip
unzip configure-efa-fsx-lustre-client.zip
Change to the configure-efa-fsx-lustre-client folder and run the setup script:
cd configure-efa-fsx-lustre-client
sudo ./setup.sh
The script automatically does the following:
Imports Lustre modules
Configures TCP and EFA interfaces
Creates a systemd service for automatic configuration on reboot
For a list of options and usage examples for the setup.sh script, see the README.md file in the zip file.
Managing the systemd service manually
The systemd service file is created at /etc/systemd/system/configure-efa-fsx-lustre-client.service. The following are some helpful systemd-related commands:
# Check status
sudo systemctl status configure-efa-fsx-lustre-client.service
# View logs
sudo journalctl -u configure-efa-fsx-lustre-client.service
# View warnings/errors from dmesg
sudo dmesg
For more information, see the README.md file in the zip file.
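For scripted checks, the service state can be condensed into a single line. This is a minimal sketch: `service_summary` and the `SYSTEMCTL` override variable are hypothetical names introduced here, while `is-enabled` and `is-active` are standard `systemctl` subcommands. Which states count as healthy depends on how the unit is defined, so treat the output as informational.

```shell
#!/bin/sh
# SYSTEMCTL can be overridden (hypothetical variable) to substitute another command.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

# Hypothetical helper: print a one-line summary of a unit's enabled/active state.
service_summary() {
    unit=$1
    # Both subcommands print the state and may exit non-zero; keep the text only.
    enabled=$("$SYSTEMCTL" is-enabled "$unit" 2>/dev/null) || true
    active=$("$SYSTEMCTL" is-active "$unit" 2>/dev/null) || true
    echo "$unit: enabled=${enabled:-unknown} active=${active:-unknown}"
}

service_summary configure-efa-fsx-lustre-client.service
```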
Auto-mount configuration (optional)
For information on automatically mounting your Amazon FSx for Lustre file system on boot, see Mounting your Amazon FSx file system automatically.
Step 3: EFA interfaces
Each FSx for Lustre file system has a maximum limit of 1024 EFA connections across all client instances.
The configure-efa-fsx-lustre-client.sh script automatically configures EFA interfaces based on the instance type.
Instance Type | Default Number of EFA Interfaces |
---|---|
p6e-gb200.36xlarge | 8 |
p6-b200.48xlarge | 8 |
p5en.48xlarge | 8 |
p5e.48xlarge | 8 |
p5.48xlarge | 8 |
Other instances with multiple network cards | 2 |
Other instances with a single network card | 1 |
Each configured EFA interface on a client instance counts as one connection against the 1024 EFA connection limit when connected to an FSx for Lustre file system.
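Because every configured interface counts against the limit, a fleet's total is the sum of clients multiplied by interfaces per client. The sketch below illustrates that arithmetic; the helper names and the `count:interfaces` argument format are hypothetical, and the fleet sizes are made-up examples.

```shell
#!/bin/sh
EFA_CONNECTION_LIMIT=1024   # per-file-system limit stated above

# Hypothetical helper: total connections for groups given as "clients:interfaces".
efa_connections_needed() {
    total=0
    for group in "$@"; do
        count=${group%%:*}
        ifaces=${group##*:}
        total=$(( total + count * ifaces ))
    done
    echo "$total"
}

# Hypothetical helper: most identical clients that fit with $1 interfaces each.
max_clients_for() {
    echo $(( EFA_CONNECTION_LIMIT / $1 ))
}

# Example fleet: 64 clients with 8 interfaces plus 200 clients with 2 interfaces.
needed=$(efa_connections_needed 64:8 200:2)
if [ "$needed" -le "$EFA_CONNECTION_LIMIT" ]; then
    echo "OK: $needed of $EFA_CONNECTION_LIMIT connections"
else
    echo "Over limit: $needed of $EFA_CONNECTION_LIMIT connections"
fi
```

With the default of 8 interfaces, `max_clients_for 8` gives 128, so at most 128 such instances can connect to one file system at the same time.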
Managing EFA interfaces manually
Instances with more EFA interfaces typically support higher throughput. You can customize the number of interfaces to optimize performance for your specific workloads, as long as you stay within the total EFA connection limit.
You can manually manage EFA interfaces using the following commands:
View available EFA devices:
for interface in /sys/class/infiniband/*; do
    if [ ! -e "$interface/device/driver" ]; then
        continue
    fi
    driver=$(basename "$(realpath "$interface/device/driver")")
    if [ "$driver" != "efa" ]; then
        continue
    fi
    echo $(basename $interface)
done
View currently configured interfaces:
sudo lnetctl net show
Add an EFA interface:
sudo lnetctl net add --net efa --if device_name --peer-credits 32
Replace device_name with an actual device name from the list of available EFA devices.
Remove an EFA interface:
sudo lnetctl net del --net efa --if device_name
Replace device_name with an actual device name from the list of currently configured interfaces.
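When adding several interfaces, the device listing and the add command can be combined. The following is a sketch, not the behavior of the AWS setup script: `list_efa_devices` and `print_add_commands` are hypothetical helpers, and the optional directory argument exists only so the listing can be exercised against a test tree. The sketch prints the `lnetctl` commands instead of running them, so you can review them first.

```shell
#!/bin/sh
# Hypothetical helper: list EFA device names under a sysfs infiniband
# directory ($1, default /sys/class/infiniband).
list_efa_devices() {
    root=${1:-/sys/class/infiniband}
    for interface in "$root"/*; do
        [ -e "$interface/device/driver" ] || continue
        driver=$(basename "$(realpath "$interface/device/driver")")
        [ "$driver" = "efa" ] || continue
        basename "$interface"
    done
}

# Hypothetical helper: print (do not run) the add command for the first $1
# EFA devices. $2 is an optional directory passed to list_efa_devices.
print_add_commands() {
    list_efa_devices "${2:-}" | head -n "$1" | while read -r dev; do
        echo "sudo lnetctl net add --net efa --if $dev --peer-credits 32"
    done
}

# Show the commands for up to two interfaces on this instance.
print_add_commands 2
```

On an instance without EFA devices the sketch prints nothing. After running the printed commands, `sudo lnetctl net show` lists the resulting configuration.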