Class Context
Defined in File context.h
Class Documentation
class switchml::Context
Singleton class that represents the SwitchML API.
This is the starting point for all SwitchML operations: obtain the context, start it, run your operations, then stop it.
Public Types
enum ContextState
An enum to describe the context's state.
The context goes through all states sequentially during its lifetime.
Values:
enumerator STARTING
In the process of initializing and starting.
enumerator RUNNING
Running and ready to receive job requests.
enumerator STOPPING
In the process of shutting down.
enumerator STOPPED
Shutdown completed.
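As a rough illustration of the sequential lifecycle described above, the sketch below models the four documented states with a stand-in enum. SketchState, NextState and IsNextState are hypothetical names that are not part of SwitchML, and treating STOPPED as a terminal state is an assumption of this sketch rather than documented behavior.

```cpp
#include <cassert>

// Hypothetical stand-in mirroring the documented enumerators, in the documented order.
enum class SketchState { STARTING, RUNNING, STOPPING, STOPPED };

// Returns the state that follows `s` in the sequential lifecycle.
// Assumption of this sketch: STOPPED is terminal and has no successor.
SketchState NextState(SketchState s) {
  assert(s != SketchState::STOPPED);
  return static_cast<SketchState>(static_cast<int>(s) + 1);
}

// True only when `to` is the immediate successor of `from`.
bool IsNextState(SketchState from, SketchState to) {
  return from != SketchState::STOPPED && NextState(from) == to;
}
```

A caller could use a check like IsNextState to confirm that a state observed through GetContextState() never skips a step, for example that RUNNING is not followed directly by STOPPED.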
Public Functions
bool Start(Config *config = NULL)
Performs all initialization needed to make SwitchML ready for use through the context API.
The function performs all of the following:
- Parses configuration files.
- Initializes and allocates variables and structures.
- Sets up the backend (this includes starting worker threads).
- Parameters
config – [in] A pointer to a configuration object to use. If the argument is omitted, the configuration is created and loaded from the default configuration paths using Config::LoadFromFile().
- Returns
true Initialization was successful and you can start using the context.
- Returns
false Initialization failed. Any subsequent call to the context API has undefined behavior.
void Stop()
Performs all steps needed to stop SwitchML and clean up all of its state.
The function performs all of the following:
- Cleans up the backend (this includes stopping worker threads and waiting for them).
- Cleans up all dynamically allocated memory.
std::shared_ptr<Job> AllReduceAsync(void *in_ptr, void *out_ptr, uint64_t numel, DataType data_type, AllReduceOperation all_reduce_operation)
Submits an all-reduce Job to the Context Scheduler, then returns immediately.
The reduced tensor will be stored in place in the same buffer provided. Consider calling WaitForCompletion or GetJobStatus on the returned Job object reference to make sure that it completed.
- Parameters
in_ptr – [in] Pointer to the memory to read data from.
out_ptr – [in] Pointer to the memory to write the processed data (the results) to.
numel – [in] Number of elements (not the size in bytes).
data_type – [in] The type of the data (FLOAT32, INT32).
all_reduce_operation – [in] The kind of all-reduce operation to perform.
- Returns
std::shared_ptr<Job> A shared pointer to the job that was submitted.
std::shared_ptr<Job> AllReduce(void *in_ptr, void *out_ptr, uint64_t numel, DataType data_type, AllReduceOperation all_reduce_operation)
Convenience function equivalent to calling AllReduceAsync and then waiting on the returned job reference.
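The relationship between the blocking and the asynchronous call can be sketched with stand-in types. Everything below is hypothetical and does not use the SwitchML library: SketchJob, SketchAllReduceAsync and SketchAllReduce are illustrative names, the "reduction" is a plain copy (what a sum over a single worker would produce), and the work is simply deferred until the wait call instead of being run by a scheduler and worker threads.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for a job handle; not the SwitchML Job class.
class SketchJob {
 public:
  explicit SketchJob(std::function<void()> work) : work_(std::move(work)) {}

  // Runs the pending work if it has not run yet, then marks the job done.
  void WaitForCompletion() {
    if (!done_) {
      work_();
      done_ = true;
    }
  }

  bool Done() const { return done_; }

 private:
  std::function<void()> work_;
  bool done_ = false;
};

// Submits the work and returns immediately with a handle to it.
std::shared_ptr<SketchJob> SketchAllReduceAsync(const int32_t* in, int32_t* out,
                                                uint64_t numel) {
  assert(in != nullptr && out != nullptr);
  return std::make_shared<SketchJob>([in, out, numel] {
    // Single-worker assumption: the reduced result equals the input.
    for (uint64_t i = 0; i < numel; ++i) out[i] = in[i];
  });
}

// Blocking variant: submit asynchronously, then wait on the returned handle.
std::shared_ptr<SketchJob> SketchAllReduce(const int32_t* in, int32_t* out,
                                           uint64_t numel) {
  auto job = SketchAllReduceAsync(in, out, numel);
  job->WaitForCompletion();
  return job;
}
```

In this sketch the output buffer is untouched until the handle is waited on, which is the reason the asynchronous variant asks callers to wait on, or check the status of, the returned job before reading results.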
void WaitForAllJobs()
Blocks the calling thread until SwitchML finishes all submitted work.
A job also counts as finished when it has failed or been dropped, so check each job's status afterwards.
ContextState GetContextState()
Gets the current context state.
- Returns
ContextState The current state of the context.
Public Static Functions
static Context &GetInstance()
Gets a reference to the single Context object.
A new instance is created (the constructor is called) the first time you call this function. Subsequent calls retrieve the same context object. The instance is only destroyed (the destructor is called) when the program exits, as is the default for any static object.
- Returns
Context& A reference to the context object.
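The behavior described for GetInstance() matches the common function-local static ("Meyers") singleton pattern in C++. The following is a minimal sketch of that pattern, assuming this is roughly how such a singleton could be built; SketchContext, ConstructionCount and CheckSingleInstance are hypothetical names and this is not the SwitchML source.

```cpp
#include <cassert>

// Hypothetical singleton illustrating the function-local static pattern.
class SketchContext {
 public:
  // The instance is constructed on the first call (thread-safe since C++11)
  // and destroyed during static destruction at program exit.
  static SketchContext& GetInstance() {
    static SketchContext instance;
    return instance;
  }

  // Copying is disabled so that only one instance can ever exist.
  SketchContext(const SketchContext&) = delete;
  SketchContext& operator=(const SketchContext&) = delete;

  // Number of times the constructor has run so far.
  static int ConstructionCount() { return Counter(); }

 private:
  SketchContext() { ++Counter(); }

  static int& Counter() {
    static int count = 0;
    return count;
  }
};

// Verifies that repeated calls yield the same object, constructed once.
void CheckSingleInstance() {
  SketchContext& a = SketchContext::GetInstance();
  SketchContext& b = SketchContext::GetInstance();
  assert(&a == &b);
  assert(SketchContext::ConstructionCount() == 1);
}
```

Because the instance lives in a function-local static, callers always bind the result to a reference; copying it is rejected at compile time in this sketch.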