cntk.train.distributed module

Distributed learners manage learners in a distributed environment.

class Communicator(*args, **kwargs)[source]

Bases: cntk.cntk_py.DistributedCommunicator

A communicator interface exposing communication primitives that serve as building blocks for distributed training.


barrier()[source]

Sync point to make sure all workers reach the same state.


current_worker()[source]

Returns the worker descriptor of the current process.

Returns: descriptor of the current process.
Return type: WorkerDescriptor
static finalize()[source]

Should be called when all communication is finished. No more communication should happen after this call.


is_main()[source]

Indicates if the current communicator is instantiated on the main node. The node with rank 0 is considered the main node.

static num_workers()[source]

Returns the number of MPI workers.

static rank()[source]

Returns the rank of the current process.


workers()[source]

Returns the workers in this communicator.

Returns: workers in this communicator.
Return type: list of WorkerDescriptor
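
The static helpers are commonly used to shard input data by worker rank when a script is launched under mpiexec. A minimal sketch (the sample list and sharding logic are illustrative, not part of the API):

    from cntk.train.distributed import Communicator

    # Query the MPI topology of the current run (launched e.g. via mpiexec).
    my_rank   = Communicator.rank()          # 0 .. num_workers() - 1
    n_workers = Communicator.num_workers()

    # Illustrative sharding: each worker takes every n-th sample.
    all_samples = list(range(1000))          # hypothetical dataset
    my_shard = all_samples[my_rank::n_workers]
    print("worker", my_rank, "of", n_workers, "handles", len(my_shard), "samples")

    # Call exactly once, after all communication is finished.
    Communicator.finalize()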
class DistributedLearner(*args, **kwargs)[source]

Bases: cntk.cntk_py.DistributedLearner

A distributed learner that handles data such as gradients and momentums across multiple MPI workers.


communicator()[source]

Returns the distributed communicator that talks to other MPI workers.

Returns: the distributed communicator of the current process.
Return type: Communicator

total_number_of_samples_seen

The number of samples seen by the distributed learner.
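
A DistributedLearner is normally obtained from one of the factory functions below and passed to a Trainer in place of the local learner. A brief, hedged sketch (the model, schedules, and values are placeholders):

    import cntk as C
    from cntk.train.distributed import data_parallel_distributed_learner, Communicator

    # Placeholder model and criterion.
    x = C.input_variable(2)
    y = C.input_variable(1)
    z = C.layers.Dense(1)(x)
    loss = C.squared_error(z, y)

    # Wrap a local learner; the wrapper aggregates gradients across MPI workers.
    local_learner = C.sgd(z.parameters,
                          lr=C.learning_rate_schedule(0.01, C.UnitType.minibatch))
    dist_learner = data_parallel_distributed_learner(local_learner)

    # Loss reused as the evaluation metric for brevity.
    trainer = C.Trainer(z, (loss, loss), [dist_learner])

    # The learner exposes its communicator and a global sample counter.
    comm = dist_learner.communicator()
    print("rank", Communicator.rank(), "samples seen:",
          dist_learner.total_number_of_samples_seen)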

class WorkerDescriptor[source]

Bases: cntk.cntk_py.DistributedWorkerDescriptor

Distributed worker descriptor, returned by a Communicator instance.


global_rank

The global rank of the worker.


host_id

The host id of the worker.

block_momentum_distributed_learner(learner, block_size, block_momentum_as_time_constant=None, use_nestrov_momentum=True, reset_sgd_momentum_after_aggregation=True, block_learning_rate=1.0, distributed_after=0)[source]

Creates a block momentum distributed learner. See [1] for more information.

Block Momentum divides the full dataset into M non-overlapping blocks, and each block is partitioned into N non-overlapping splits.

During training, an unprocessed block is randomly selected by the trainer and the N partitions of this block are dispatched to the workers.

Parameters:
  • learner – a local learner (e.g. sgd)
  • block_size (int) – size of the partition in samples
  • block_momentum_as_time_constant (float) – block momentum as time constant
  • use_nestrov_momentum (bool) – use Nesterov momentum
  • reset_sgd_momentum_after_aggregation (bool) – reset SGD momentum after aggregation
  • block_learning_rate (float) – block learning rate
  • distributed_after (int) – number of samples after which distributed training starts

Returns: a distributed learner instance
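
A hedged sketch of wrapping a local momentum SGD learner with block momentum; the model and parameter values are illustrative only, and this learner requires a CNTK build with the 1-bit SGD/BMUF components:

    import cntk as C
    from cntk.train.distributed import block_momentum_distributed_learner

    # Placeholder model and criterion.
    x = C.input_variable(10)
    y = C.input_variable(2)
    z = C.layers.Dense(2)(x)
    loss = C.cross_entropy_with_softmax(z, y)

    local_learner = C.momentum_sgd(
        z.parameters,
        lr=C.learning_rate_schedule(0.01, C.UnitType.minibatch),
        momentum=C.momentum_schedule(0.9))

    # Aggregate and filter model updates once per block of samples.
    # block_size=120000 is an illustrative choice, not a recommendation.
    dist_learner = block_momentum_distributed_learner(
        local_learner,
        block_size=120000,
        block_learning_rate=1.0)

    # Loss reused as the evaluation metric for brevity.
    trainer = C.Trainer(z, (loss, loss), [dist_learner])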

data_parallel_distributed_learner(learner, distributed_after=0, num_quantization_bits=32, use_async_buffered_parameter_update=False)[source]

Creates a data parallel distributed learner.

Parameters:
  • learner – a local learner (e.g. sgd)
  • distributed_after (int) – number of samples after which distributed training starts
  • num_quantization_bits (int) – number of bits for quantization (1 to 32)
  • use_async_buffered_parameter_update (bool) – use async buffered parameter update, currently must be False

Returns: a distributed learner instance
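
A minimal end-to-end sketch with synthetic numpy data sharded by worker rank; the data, model, and loop are placeholders, and 1-bit quantization additionally requires a CNTK build with the 1-bit SGD components:

    import numpy as np
    import cntk as C
    from cntk.train.distributed import data_parallel_distributed_learner, Communicator

    rank, n_workers = Communicator.rank(), Communicator.num_workers()

    # Placeholder regression model.
    x = C.input_variable(4)
    y = C.input_variable(1)
    z = C.layers.Dense(1)(x)
    loss = C.squared_error(z, y)

    local_learner = C.sgd(z.parameters,
                          lr=C.learning_rate_schedule(0.01, C.UnitType.minibatch))
    dist_learner = data_parallel_distributed_learner(
        local_learner,
        num_quantization_bits=32)   # 1 would enable 1-bit SGD, if available

    # Loss reused as the evaluation metric for brevity.
    trainer = C.Trainer(z, (loss, loss), [dist_learner])

    # Synthetic data, sharded so each worker trains on a disjoint slice.
    features = np.random.rand(1024, 4).astype(np.float32)
    labels   = np.random.rand(1024, 1).astype(np.float32)
    shard = slice(rank, None, n_workers)

    for _ in range(10):
        trainer.train_minibatch({x: features[shard], y: labels[shard]})

    Communicator.finalize()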


mpi_communicator()[source]

Creates a non-quantized MPI communicator.
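
A brief sketch of inspecting the MPI topology through such a communicator (purely illustrative):

    from cntk.train.distributed import mpi_communicator, Communicator

    comm = mpi_communicator()

    # Each WorkerDescriptor carries the worker's global rank and host id.
    for w in comm.workers():
        print("worker global_rank =", w.global_rank, "host_id =", w.host_id)

    me = comm.current_worker()
    if comm.is_main():              # rank 0 is considered the main node
        print("main node is worker", me.global_rank)

    comm.barrier()                  # wait until every worker reaches this point
    Communicator.finalize()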