Struct GeneralConfig¶
Defined in File config.h
Struct Documentation¶
-
struct
switchml::GeneralConfig¶ Struct that groups general configuration options that must always be configured.
Public Members
-
uint16_t
rank¶ A unique identifier for a worker node. Like MPI ranks.
-
uint16_t
num_workers¶ The number of worker nodes in the system.
-
uint16_t
num_worker_threads¶ The number of worker threads to launch on each node.
-
uint32_t
max_outstanding_packets¶ The maximum number of pending packets for this worker (not per worker thread).
This number is divided among the worker threads: each worker thread first sends an initial burst of up to max_outstanding_packets divided by num_worker_threads packets, then sends new packets only as packets are received, until all packets have been sent.
For example, with this set to 256 and num_worker_threads set to 8, each worker thread sends an initial burst of up to 32 packets.
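The split described above can be sketched as a small helper. This is only an illustration: `per_thread_burst` is a hypothetical name, not part of the SwitchML API, and it assumes plain integer division, since this reference does not say how a remainder is handled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of SwitchML): computes the initial burst
// size of one worker thread from two GeneralConfig fields.
// Assumption: integer division; any remainder is simply dropped here.
inline uint32_t per_thread_burst(uint32_t max_outstanding_packets,
                                 uint16_t num_worker_threads) {
    assert(num_worker_threads > 0);  // a node needs at least one worker thread
    return max_outstanding_packets / num_worker_threads;
}
```

With the values from the example above, `per_thread_burst(256, 8)` evaluates to 32.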
-
uint64_t
packet_numel¶ The number of elements in a packet.
-
std::string
backend¶ Which backend should the SwitchML client use? Choose from [‘dummy’, ‘dpdk’, ‘rdma’]. Make sure that the backend you choose has been compiled.
-
std::string
scheduler¶ Which scheduler should we use to dispatch jobs to worker threads? Choose from [‘fifo’].
-
std::string
prepostprocessor¶ Which prepostprocessor should we use to load data into the network and unload it from the network? Choose from [‘bypass’, ‘cpu_exponent_quantizer’].
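Because backend, scheduler, and prepostprocessor are free-form strings, it may help to check them against the listed choices before use. The sketch below is an assumption about how one might do that in client code; the helper names are hypothetical and not part of SwitchML, and the allowed values are copied from the lists above.

```cpp
#include <initializer_list>
#include <string>

// Hypothetical helper (not part of SwitchML): true if `value` matches
// one of the allowed names exactly.
inline bool is_one_of(const std::string& value,
                      std::initializer_list<const char*> allowed) {
    for (const char* candidate : allowed) {
        if (value == candidate) return true;
    }
    return false;
}

// Allowed values are taken from the member descriptions above.
inline bool valid_backend(const std::string& s) {
    return is_one_of(s, {"dummy", "dpdk", "rdma"});
}

inline bool valid_scheduler(const std::string& s) {
    return is_one_of(s, {"fifo"});
}

inline bool valid_prepostprocessor(const std::string& s) {
    return is_one_of(s, {"bypass", "cpu_exponent_quantizer"});
}
```

A name passing this check only means it is spelled as documented; whether the chosen backend was actually compiled in still has to be verified separately.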
-
bool
instant_job_completion¶ If set to true, all jobs are completed instantly regardless of the job type. This is used for debugging to disable all backend communication. The backend is still used for setup and cleanup.
-
std::string
controller_ip_str¶ The IP address of the machine that’s running the controller program. Note: This is not the same as the IP address that is passed to the switch_ip argument when starting the controller.
-
uint16_t
controller_port¶ The port that the controller program is using. This is the value that you passed to the port argument when starting the controller.
-
double
timeout¶ How long, in ms, should we wait before we consider a packet lost?
Each worker thread creates a copy of this value when it starts working on a job slice. From that point on, the timeout value can be increased as a backoff mechanism if the number of timeouts exceeds a threshold.
-
uint64_t
timeout_threshold¶ How many timeouts should occur before we double the timeout time?
-
uint64_t
timeout_threshold_increment¶ By how much should we increment the threshold each time it is exceeded? (This sets the bar higher to avoid doubling the timeout value too often.)
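How timeout, timeout_threshold, and timeout_threshold_increment interact can be sketched as follows. This is a guess at the mechanism based only on the descriptions above, not the library's implementation: `TimeoutBackoff` is a hypothetical type, and it assumes that the timeout counter is never reset and that "exceeds" means strictly greater than the threshold.

```cpp
#include <cstdint>

// Hypothetical per-thread backoff state (not part of SwitchML).
struct TimeoutBackoff {
    double timeout_ms;             // local copy of GeneralConfig::timeout
    uint64_t threshold;            // starts at timeout_threshold
    uint64_t threshold_increment;  // timeout_threshold_increment
    uint64_t timeouts_seen;        // assumption: counted over the whole job slice

    TimeoutBackoff(double timeout, uint64_t thresh, uint64_t increment)
        : timeout_ms(timeout),
          threshold(thresh),
          threshold_increment(increment),
          timeouts_seen(0) {}

    // Records one timeout and returns the timeout to use from now on.
    double on_timeout() {
        ++timeouts_seen;
        if (timeouts_seen > threshold) {       // threshold exceeded
            timeout_ms *= 2.0;                 // double the timeout
            threshold += threshold_increment;  // raise the bar for the next doubling
        }
        return timeout_ms;
    }
};
```

Under these assumptions, a larger timeout_threshold_increment spaces the doublings further apart, which matches the stated goal of not doubling the timeout too often.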
-
uint16_t