Class Context

Class Documentation

class switchml::Context

Singleton class that represents the SwitchML API.

This is the starting point for all SwitchML operations. Simply create a context, start the context, do your operations, stop the context.

Public Types

enum ContextState

An enum to describe the context’s state.

The context goes through all states sequentially during its lifetime.

Values:

enumerator CREATED

Was just constructed. Start() must be called next.

enumerator STARTING

In the process of initializing and starting.

enumerator RUNNING

Running and ready to receive job requests.

enumerator STOPPING

In the process of shutting down.

enumerator STOPPED

Shutdown completed.
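
The sequential progression described above can be sketched as a small transition function. This is a standalone illustration rather than SwitchML code: the enum below only mirrors the documented enumerator names, and NextState and TransitionsToStopped are hypothetical helpers introduced for the example.

```cpp
#include <cassert>

// Stand-in mirroring the documented enumerators; not the library's definition.
enum class ContextState { CREATED, STARTING, RUNNING, STOPPING, STOPPED };

// Hypothetical helper: returns the state that follows `s` in the documented
// order. STOPPED is terminal, so it maps to itself.
inline ContextState NextState(ContextState s) {
  switch (s) {
    case ContextState::CREATED:  return ContextState::STARTING;
    case ContextState::STARTING: return ContextState::RUNNING;
    case ContextState::RUNNING:  return ContextState::STOPPING;
    case ContextState::STOPPING: return ContextState::STOPPED;
    case ContextState::STOPPED:  return ContextState::STOPPED;
  }
  return ContextState::STOPPED;  // Not reached for valid enumerators.
}

// Hypothetical helper: counts the transitions from CREATED to STOPPED.
inline int TransitionsToStopped() {
  int transitions = 0;
  ContextState s = ContextState::CREATED;
  while (s != ContextState::STOPPED) {
    s = NextState(s);
    ++transitions;
  }
  assert(transitions > 0);
  return transitions;
}
```

In real code the current state would come from GetContextState() rather than from a helper like this one.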

Public Functions

Context(Context const&) = delete
void operator=(Context const&) = delete
Context(Context&&) = delete
Context &operator=(Context&&) = delete
bool Start(Config *config = NULL)

Performs all initialization needed to make SwitchML ready to be used through the Context API.

The function performs all of the following:

  • Parse configuration files

  • Initialize and allocate variables and structures.

  • Set up the backend (this includes starting worker threads)

See

Stop()

Parameters

config[in] A pointer to the configuration object to use. If the argument is omitted, a configuration is created and loaded from the default configuration paths using Config::LoadFromFile().

Returns

true Initialization was successful and you can start using the context.

false Initialization failed. Any subsequent call to the Context API has undefined behavior.
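
Because calls made after a failed Start() are undefined, the return value is worth checking before doing anything else. A minimal sketch of that pattern follows. FakeContext and RunWithContext are hypothetical names introduced here so the example compiles on its own; only Start() and Stop() come from this page.

```cpp
#include <cassert>

// Hypothetical stand-in exposing only the two documented calls used below.
class FakeContext {
 public:
  explicit FakeContext(bool start_result) : start_result_(start_result) {}
  bool Start() { return start_result_; }
  void Stop() { ++stop_calls_; }
  int stop_calls() const { return stop_calls_; }

 private:
  bool start_result_;
  int stop_calls_ = 0;
};

// Runs `work` between Start() and Stop(). If Start() reports failure, the
// context is not touched again, because later calls are documented as undefined.
template <typename ContextT, typename Work>
bool RunWithContext(ContextT& ctx, Work work) {
  if (!ctx.Start()) {
    return false;  // Initialization failed: do not call anything else.
  }
  work(ctx);
  ctx.Stop();
  return true;
}
```

With the real library, the first argument would presumably be the reference returned by switchml::Context::GetInstance(), and Start() would fall back to the default configuration paths when called without an argument.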

void Stop()

Performs all steps needed to stop SwitchML and clean up all of its state.

The function performs all of the following:

  • Clean up the backend (This includes stopping worker threads and waiting for them)

  • Clean up all dynamically allocated memory.

See

Start()

std::shared_ptr<Job> AllReduceAsync(void *in_ptr, void *out_ptr, uint64_t numel, DataType data_type, AllReduceOperation all_reduce_operation)

Submits an all-reduce Job to the Context Scheduler, then returns immediately.

The reduced tensor will be stored in place in the buffer provided. Consider calling WaitToComplete or GetJobStatus on the returned Job object to make sure that it has completed.

See

AllReduce()

Parameters
  • in_ptr[in] Pointer to the memory to read the input data from.

  • out_ptr[in] Pointer to the memory to write the processed data (the results) to.

  • numel[in] Number of elements (not the size in bytes).

  • data_type[in] The type of the data (FLOAT32, INT32).

  • all_reduce_operation[in] The kind of all-reduce operation to perform.

Returns

std::shared_ptr<Job> A shared pointer to the job that was submitted.
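
The submit-then-wait flow can be sketched with stand-in types. FakeJob, FakeContext and ReduceAndWait below are hypothetical and are not the SwitchML classes. The two enum arguments are left out because their values are not fully listed on this page. The fake treats the data as 32-bit floats and models a single worker, for which a sum all-reduce leaves the data unchanged.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a job. Work is deferred until WaitToComplete()
// so that "submitted" and "completed" are visibly different moments.
class FakeJob {
 public:
  explicit FakeJob(std::function<void()> work) : work_(std::move(work)) {}
  void WaitToComplete() {
    if (!done_) {
      work_();
      done_ = true;
    }
  }
  bool Done() const { return done_; }

 private:
  std::function<void()> work_;
  bool done_ = false;
};

// Hypothetical stand-in for the context, keeping the documented pointer and
// element-count parameters in the documented order.
struct FakeContext {
  std::shared_ptr<FakeJob> AllReduceAsync(void* in_ptr, void* out_ptr,
                                          uint64_t numel) {
    // Single-worker model: the "reduction" is a copy of numel floats.
    return std::make_shared<FakeJob>([=] {
      std::memmove(out_ptr, in_ptr, numel * sizeof(float));
    });
  }
};

// Submits one buffer and waits for it. Note that numel is in.size(), an
// element count, not in.size() * sizeof(float).
inline std::vector<float> ReduceAndWait(FakeContext& ctx, std::vector<float> in) {
  std::vector<float> out(in.size(), 0.0f);
  std::shared_ptr<FakeJob> job =
      ctx.AllReduceAsync(in.data(), out.data(), in.size());
  assert(!job->Done());  // Submission returned; `out` must not be read yet.
  job->WaitToComplete();
  return out;
}
```

The point of the sketch is the ordering: the output buffer is only read after the wait on the returned job has finished.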

std::shared_ptr<Job> AllReduce(void *in_ptr, void *out_ptr, uint64_t numel, DataType data_type, AllReduceOperation all_reduce_operation)

Convenience function equivalent to calling AllReduceAsync and then waiting on the returned job.

See

AllReduceAsync()

See

Job::WaitToComplete()

void WaitForAllJobs()

Blocks the calling thread until SwitchML finishes all submitted work.

A job also counts as finished when it has failed or been dropped, so the status of each job should still be checked afterwards.

See

Job::WaitToComplete()
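
Since a job that has finished is not necessarily a job that succeeded, a post-wait check might look like the sketch below. The status values SUCCEEDED and FAILED are placeholders, because the real Job status enumerators are not listed in this section. FakeJob and CountUnsuccessful are likewise hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Placeholder status values; the real enumerators are an assumption here.
enum class FakeStatus { SUCCEEDED, FAILED };

// Hypothetical stand-in for a job that only reports its status.
struct FakeJob {
  FakeStatus status;
  FakeStatus GetJobStatus() const { return status; }
};

// Intended to run after the blocking wait has returned. Every job is done by
// then, but each status is still inspected to find the ones that did not succeed.
inline std::size_t CountUnsuccessful(
    const std::vector<std::shared_ptr<FakeJob>>& jobs) {
  std::size_t unsuccessful = 0;
  for (const auto& job : jobs) {
    if (job->GetJobStatus() != FakeStatus::SUCCEEDED) {
      ++unsuccessful;
    }
  }
  return unsuccessful;
}
```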

ContextState GetContextState()

Get the current Context State.

Returns

ContextState

const Config &GetConfig()

Get a constant reference to the active configuration.

Returns

const Config&

Stats &GetStats()

Get a reference to the statistics object used.

Returns

Stats&

Public Static Functions

static Context &GetInstance()

Gets a reference to the single Context object.

A new instance is created (the constructor is called) the first time this function is called. Subsequent calls return the same Context object. Like any static object, the instance is destroyed (the destructor is called) only when the program exits.

Returns

Context& A reference to the context object.
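
One common way to obtain this behavior in C++ is a function-local static combined with deleted copy and move operations. The sketch below illustrates that general pattern with a hypothetical Registry class. It is not taken from the SwitchML sources, and the library may implement its singleton differently.

```cpp
#include <cassert>

// Standalone illustration of the pattern; Registry is a hypothetical class.
class Registry {
 public:
  // Copying and moving are deleted so that no second instance can exist.
  Registry(Registry const&) = delete;
  void operator=(Registry const&) = delete;
  Registry(Registry&&) = delete;
  Registry& operator=(Registry&&) = delete;

  static Registry& GetInstance() {
    // Function-local static: constructed on the first call (thread-safe
    // since C++11) and destroyed during normal program termination.
    static Registry instance;
    return instance;
  }

  int counter = 0;  // Some state, to show that callers share one object.

 private:
  Registry() = default;  // Only GetInstance() can construct the object.
};

// Returns true when two separate calls observe the same object.
inline bool SameInstance() {
  return &Registry::GetInstance() == &Registry::GetInstance();
}
```

Callers always bind the result to a reference, for example `Registry& r = Registry::GetInstance();`, since copying into a value would not compile.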