Struct RdmaBackendConfig

Struct Documentation

struct switchml::RdmaBackendConfig

Configuration options specific to using the RDMA backend.

Public Members

uint32_t msg_numel

With RDMA, the sender posts messages and the NIC splits each message into multiple packets. The number of elements in a message must therefore be a multiple of the number of elements in a packet. Larger messages reduce the overhead of sending packet by packet. However, they also make losses more costly with UC transport, since the loss of a single packet forces retransmission of the whole message. Tune this value until you find the sweet spot.
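The multiple-of-a-packet constraint can be sketched as a small rounding helper. This is only an illustration: `RoundMsgNumel` and `PacketsPerMessage` are hypothetical names that are not part of SwitchML, and the packet size in elements is assumed to be known by the caller.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of SwitchML): round a desired message size
// down to the nearest multiple of the packet size, both counted in elements.
// Never returns less than one packet's worth of elements.
inline uint32_t RoundMsgNumel(uint32_t desired_numel, uint32_t pkt_numel) {
    assert(pkt_numel > 0 && "a packet must hold at least one element");
    if (desired_numel < pkt_numel) {
        return pkt_numel;
    }
    return desired_numel - (desired_numel % pkt_numel);
}

// Hypothetical helper: number of packets the NIC splits one message into.
// Under UC transport, losing any one of them means resending all of them.
inline uint32_t PacketsPerMessage(uint32_t msg_numel, uint32_t pkt_numel) {
    assert(pkt_numel > 0 && msg_numel % pkt_numel == 0);
    return msg_numel / pkt_numel;
}
```

For instance, with an assumed packet size of 256 elements, a requested message size of 1000 elements would be rounded down to 768, which corresponds to 3 packets per message.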

std::string device_name

The name of the InfiniBand device to use, for example mlx5_0. Run the ibv_devices command to list the available devices.

uint16_t device_port_id

Each InfiniBand device can have multiple ports. This value selects a specific port. Use the ibv_devinfo command to list the ports of each device and see their id/index, which is the first number in the description of a port: “port: 1” means you should use 1 for this variable.

uint16_t gid_index

Choose from the following: 0: RoCEv1 with MAC-based GID, 1: RoCEv2 with MAC-based GID, 2: RoCEv1 with IP-based GID, 3: RoCEv2 with IP-based GID.
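As a quick reference, the four values listed above can be written as a lookup. This is a sketch only: `DescribeGidIndex` is a hypothetical helper that is not part of SwitchML, and it simply mirrors the list in this description rather than querying the device's GID table.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of SwitchML): returns the meaning of a
// gid_index value according to the list in this documentation.
// It does not inspect the actual GID table of any device.
inline std::string DescribeGidIndex(uint16_t gid_index) {
    switch (gid_index) {
        case 0: return "RoCEv1 with MAC-based GID";
        case 1: return "RoCEv2 with MAC-based GID";
        case 2: return "RoCEv1 with IP-based GID";
        case 3: return "RoCEv2 with IP-based GID";
        default: return "not listed in this documentation";
    }
}
```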

bool use_gdr

(Not implemented yet) Whether to try to use GPU Direct. If the submitted job’s data resides on the GPU, GPU Direct allows the registered buffer to also live in GPU memory, so data can be sent directly from the GPU instead of first being copied to a registered CPU buffer.
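A minimal sketch of how these members fit together is shown below. It uses a local stand-in struct, `RdmaBackendConfigSketch`, that mirrors the members documented above; the real `switchml::RdmaBackendConfig` is defined in the SwitchML headers and may be populated differently (for example from a configuration file). All values are illustrative assumptions, including the assumed packet size of 256 elements.

```cpp
#include <cstdint>
#include <string>

// Local stand-in that mirrors the documented members. This is NOT the real
// switchml::RdmaBackendConfig; it only illustrates the field types.
struct RdmaBackendConfigSketch {
    uint32_t msg_numel;
    std::string device_name;
    uint16_t device_port_id;
    uint16_t gid_index;
    bool use_gdr;
};

// Hypothetical factory with illustrative values; adjust them to your hardware.
inline RdmaBackendConfigSketch MakeExampleConfig() {
    RdmaBackendConfigSketch cfg;
    cfg.msg_numel = 1024;         // multiple of an assumed 256-element packet
    cfg.device_name = "mlx5_0";   // as reported by ibv_devices
    cfg.device_port_id = 1;       // "port: 1" in ibv_devinfo output
    cfg.gid_index = 3;            // RoCEv2 with IP-based GID, per the list above
    cfg.use_gdr = false;          // GPU Direct is documented as not implemented yet
    return cfg;
}
```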