cntk.layers.layers module¶
Blocks in the network that are used layerlike, i.e. layered on top of each other e.g. a fully connected layer with nonlinearity.

Activation
(activation=identity, name='')[source]¶ Layer factory function to create an activation layer. Activation functions can be used directly in CNTK, so there is no difference between
y = relu(x)
andy = Activation(relu)(x)
. This layer is useful if one wants to configure the activation function withdefault_options
, or when its invocation should be named.Example
>>> model = Dense(500) >> Activation(C.relu) >> Dense(10) >>> # is the same as >>> model = Dense(500) >> C.relu >> Dense(10) >>> # and also the same as >>> model = Dense(500, activation=C.relu) >> Dense(10)
Parameters:  activation (
Function
, defaults to identity) – function to apply at the end, e.g. relu  name (str, defaults to '') – the name of the function instance in the network
Returns: A function that accepts one argument and applies the operation to it
Return type:  activation (

AveragePooling
(filter_shape, strides=1, pad=False, name='')[source]¶ Layer factory function to create an averagepooling layer.
Like
Convolution()
,AveragePooling()
processes items arranged on an Ndimensional grid, such as an image. Typically, each item is a vector. For each item, averagepooling computes the elementwise mean over a window (“receptive field”) of items surrounding the item’s position on the grid.The size (spatial extent) of the receptive field is given by
filter_shape
. E.g. for 2D pooling,filter_shape
should be a tuple of two integers, such as (5,5).Example
>>> f = AveragePooling((3,3), strides=2) # reduce dimensionality by 2, pooling over windows of 3x3 >>> h = C.input_variable((32,240,320)) # e.g. 32dim feature map >>> hp = f(h) >>> hp.shape # spatial dimension has been halved due to stride, and lost one due to 3x3 window without padding (32, 119, 159)
>>> f = AveragePooling((2,2), strides=2) >>> f.update_signature((1,4,4)) >>> im = np.array([[[3, 5, 2, 6], [4, 2, 8, 3], [1, 6, 4, 7], [7, 3, 5, 9]]]) # a 4x4 image (featuremap depth 1 for simplicity) >>> im array([[[3, 5, 2, 6], [4, 2, 8, 3], [1, 6, 4, 7], [7, 3, 5, 9]]]) >>> f([[im]]) # due to strides=2, this computes the averages of each 2x2 subblock array([[[[ 3.5 , 4.75], [ 4.25, 6.25]]]], dtype=float32)
Parameters:  filter_shape (int or tuple of ints) – shape (spatial extent) of the receptive field, not including the input featuremap depth. E.g. (3,3) for a 2D convolution.
 strides (int or tuple of ints, defaults to 1) – stride (increment when sliding over the input). Use a tuple to specify a peraxis value.
 pad (bool or tuple of bools, defaults to False) – if False, then the pooling operation will be shifted over the “valid”
area of input, that is, no value outside the area is used. If
pad=True
on the other hand, pooling will be applied to all input positions, and positions outside the valid region will be excluded from the averaging. Use a tuple to specify a peraxis value.  name (str, defaults to '') – the name of the function instance in the network
Returns: A function that accepts one argument and applies the averagepooling operation to it
Return type:

BatchNormalization
(map_rank=None, init_scale=1, normalization_time_constant=5000, blend_time_constant=0, epsilon=0.00001, use_cntk_engine=False, disable_regularization=False, name='')[source]¶ Layer factory function to create a batchnormalization layer.
Batch normalization applies this formula to every input element (elementwise):
y = (x  batch_mean) / (batch_stddev + epsilon) * scale + bias
wherebatch_mean
andbatch_stddev
are estimated on the minibatch andscale
andbias
are learned parameters.During operation, this layer also estimates an aggregate running mean and standard deviation for use in inference.
A
BatchNormalization
layer instance owns its learnable parameter tensors and exposes them as attributes.scale
and.bias
. The aggregate estimates are exposed as attributesaggregate_mean
,aggregate_variance
, andaggregate_count
.Example
>>> # BatchNorm on an image with spatial pooling >>> f = BatchNormalization(map_rank=1) >>> f.update_signature((3,480,640)) >>> f.bias.shape, f.scale.shape # due to spatial pooling (map_rank=1), there are only 3 biases and scales, shared across all pixel positions ((3,), (3,))
Parameters:  map_rank (1 or
None
) – passing 1 means spatiallypooled batchnormalization, where normalization values will be tied across all pixel positions; whileNone
will normalize all elements of the input tensor independently  init_scale (float, default 1) – initial value for the
scale
parameter  normalization_time_constant (int, default 5000) – time constant for smoothing the batch statistics in order to compute aggregate estimates for inference.
 epsilon (float, default 0.00001) – epsilon added to the variance to avoid division by 0
 use_cntk_engine (bool, default
False
) – ifTrue
then use CNTK’s own engine instead of NVidia’s.  disable_regularization (bool, default
False
) – ifTrue
then disable regularization in BatchNormalization  name (str, optional) – the name of the function instance in the network
Returns: A function that accepts one argument and applies the operation to it
Return type:  map_rank (1 or

Convolution
(filter_shape, num_filters=None, sequential=False, activation=identity, init=glorot_uniform(), pad=False, strides=1, sharing=True, bias=True, init_bias=0, reduction_rank=1, transpose_weight=False, dilation=1, groups=1, max_temp_mem_size_in_samples=0, op_name='Convolution', name='')[source]¶ Layer factory function to create a convolution layer.
This implements a convolution operation over items arranged on an Ndimensional grid, such as pixels in an image. Typically, each item is a vector (e.g. pixel: R,G,B), and the result is, in turn, a vector. The itemgrid dimensions are referred to as the spatial dimensions (e.g. dimensions of an image), while the vector dimension of the individual items is often called featuremap depth.
For each item, convolution gathers a window (“receptive field”) of items surrounding the item’s position on the grid, and applies a little fullyconnected network to it (the same little network is applied to all item positions). The size (spatial extent) of the receptive field is given by
filter_shape
. E.g. to specify a 2D convolution,filter_shape
should be a tuple of two integers, such as (5,5); an example for a 3D convolution (e.g. video or an MRI scan) would befilter_shape=(3,3,3)
; while for a 1D convolution (e.g. audio or text),filter_shape
has one element, such as (3,) or just 3.The dimension of the input items (input featuremap depth) is not to be specified. It is known from the input. The dimension of the output items (output featuremap depth) generated for each item position is given by
num_filters
.If the input is a sequence, the sequence elements are by default treated independently. To convolve along the sequence dimension as well, pass
sequential=True
. This is useful for variablelength inputs, such as video or naturallanguage processing (word ngrams). Note, however, that convolution does not support sparse inputs.Both input and output items can be scalars intead of vectors. For scalarvalued input items, such as pixels on a blackandwhite image, or samples of an audio clip, specify
reduction_rank=0
. If the output items are scalar, passnum_filters=()
orNone
.A
Convolution
instance owns its weight parameter tensors W and b, and exposes them as an attributes.W
and.b
. The weights will have the shape(num_filters, input_feature_map_depth, *filter_shape)
Example
>>> # 2D convolution of 5x4 receptive field with output featuremap depth 128: >>> f = Convolution((5,4), 128, activation=C.relu) >>> x = C.input_variable((3,480,640)) # 3channel color image >>> h = f(x) >>> h.shape (128, 476, 637) >>> f.W.shape # will have the form (num_filters, input_depth, *filter_shape) (128, 3, 5, 4)
>>> # 2D convolution over a onechannel blackandwhite image, padding, and stride 2 along width dimension >>> f = Convolution((3,3), 128, reduction_rank=0, pad=True, strides=(1,2), activation=C.relu) >>> x = C.input_variable((480,640)) >>> h = f(x) >>> h.shape (128, 480, 320) >>> f.W.shape (128, 1, 3, 3)
>>> # 3D convolution along dynamic axis over a sequence of 2D color images >>> from cntk.layers.typing import Sequence, Tensor >>> f = Convolution((2,5,4), 128, sequential=True, activation=C.relu) # over 2 consecutive frames >>> x = C.input_variable(**Sequence[Tensor[3,480,640]]) # a variablelength video of 640x480 RGB images >>> h = f(x) >>> h.shape # this is the shape per video frame: 637x476 activation vectors of length 128 each (128, 476, 637) >>> f.W.shape # (output featuer map depth, input depth, and the three filter dimensions) (128, 3, 2, 5, 4)
Parameters:  filter_shape (int or tuple of ints) – shape (spatial extent) of the receptive field, not including the input featuremap depth. E.g. (3,3) for a 2D convolution.
 num_filters (int, defaults to None) – number of filters (output featuremap depth), or
()
to denote scalar output items (output shape will have no depth axis).  sequential (bool, defaults to False) – if True, also convolve along the dynamic axis.
filter_shape[0]
corresponds to dynamic axis.  activation (
Function
, defaults to identity) – optional function to apply at the end, e.g. relu  init (scalar or NumPy array or
cntk.initializer
, defaults toglorot_uniform()
) – initial value of weights W  pad (bool or tuple of bools, defaults to False) – if False, then the filter will be shifted over the “valid”
area of input, that is, no value outside the area is used. If
pad=True
on the other hand, the filter will be applied to all input positions, and positions outside the valid region will be considered containing zero. Use a tuple to specify a peraxis value.  strides (int or tuple of ints, defaults to 1) – stride of the convolution (increment when sliding the filter over the input). Use a tuple to specify a peraxis value.
 sharing (bool, defaults to True) – When True, every position uses the same Convolution kernel. When False, you can have a different Convolution kernel per position, but False is not supported.
 bias (bool, optional, defaults to True) – the layer will have no bias if False is passed here
 init_bias (scalar or NumPy array or
cntk.initializer
, defaults to 0) – initial value of weights b  reduction_rank (int, defaults to 1) – set to 0 if input items are scalars (input has no depth axis), e.g. an audio signal or a blackandwhite image that is stored with tensor shape (H,W) instead of (1,H,W)
 transpose_weight (bool, defaults to False) – When this is True this is convolution, otherwise this is correlation (which is common for most toolkits)
 dilation (tuple, optional) – the dilation value along each axis, default 1 mean no dilation.
 groups (int, default 1) – number of groups during convolution, that controls the connections between input and output channels. Deafult value is 1, which means that all input channels are convolved to produce all output channels. A value of N would mean that the input (and output) channels are divided into N groups with the input channels in one group (say ith input group) contributing to output channels in only one group (ith output group). Number of input and output channels must be divisble by value of groups argument. Also, value of this argument must be strictly positive, i.e. groups > 0.
 max_temp_mem_size_in_samples (int, defaults to 0) – Limits the amount of memory for intermediate convolution results. A value of 0 means, memory is automatically managed.
 name (str, defaults to '') – the name of the function instance in the network
Returns: A function that accepts one argument and applies the convolution operation to it
Return type:

Convolution1D
(filter_shape, num_filters=None, activation=identity, init=glorot_uniform(), pad=False, strides=1, bias=True, init_bias=0, reduction_rank=1, name='')[source]¶ Layer factory function to create a 1D convolution layer with optional nonlinearity. Same as Convolution() except that filter_shape is verified to be 1dimensional. See Convolution() for extensive documentation.
Parameters:  filter_shape (int or tuple of ints) – shape (spatial extent) of the receptive field, not including the input featuremap depth. E.g. (3,3) for a 2D convolution.
 num_filters (int, defaults to None) – number of filters (output featuremap depth), or
()
to denote scalar output items (output shape will have no depth axis).  activation (
Function
, defaults to identity) – optional function to apply at the end, e.g. relu  init (scalar or NumPy array or
cntk.initializer
, defaults toglorot_uniform()
) – initial value of weights W  pad (bool or tuple of bools, defaults to False) – if False, then the filter will be shifted over the “valid”
area of input, that is, no value outside the area is used. If
pad=True
on the other hand, the filter will be applied to all input positions, and positions outside the valid region will be considered containing zero. Use a tuple to specify a peraxis value.  strides (int or tuple of ints, defaults to 1) – stride of the convolution (increment when sliding the filter over the input). Use a tuple to specify a peraxis value.
 bias (bool, defaults to True) – the layer will have no bias if False is passed here
 init_bias (scalar or NumPy array or
cntk.initializer
, defaults to 0) – initial value of weights b  reduction_rank (int, defaults to 1) – set to 0 if input items are scalars (input has no depth axis), e.g. an audio signal or a blackandwhite image that is stored with tensor shape (H,W) instead of (1,H,W)
 dilation (tuple, optional) – the dilation value along each axis, default 1 mean no dilation.
 name (str, defaults to '') – the name of the function instance in the network
Returns: A function that accepts one argument and applies the convolution operation to it
Return type:

Convolution2D
(filter_shape, num_filters=None, activation=identity, init=glorot_uniform(), pad=False, strides=1, bias=True, init_bias=0, reduction_rank=1, name='')[source]¶ Layer factory function to create a 2D convolution layer with optional nonlinearity. Same as Convolution() except that filter_shape is verified to be 2dimensional. See Convolution() for extensive documentation.
Parameters:  filter_shape (int or tuple of ints) – shape (spatial extent) of the receptive field, not including the input featuremap depth. E.g. (3,3) for a 2D convolution.
 num_filters (int, defaults to None) – number of filters (output featuremap depth), or
()
to denote scalar output items (output shape will have no depth axis).  activation (
Function
, defaults to identity) – optional function to apply at the end, e.g. relu  init (scalar or NumPy array or
cntk.initializer
, defaults toglorot_uniform()
) – initial value of weights W  pad (bool or tuple of bools, defaults to False) – if False, then the filter will be shifted over the “valid”
area of input, that is, no value outside the area is used. If
pad=True
on the other hand, the filter will be applied to all input positions, and positions outside the valid region will be considered containing zero. Use a tuple to specify a peraxis value.  strides (int or tuple of ints, defaults to 1) – stride of the convolution (increment when sliding the filter over the input). Use a tuple to specify a peraxis value.
 bias (bool, defaults to True) – the layer will have no bias if False is passed here
 init_bias (scalar or NumPy array or
cntk.initializer
, defaults to 0) – initial value of weights b  reduction_rank (int, defaults to 1) – set to 0 if input items are scalars (input has no depth axis), e.g. an audio signal or a blackandwhite image that is stored with tensor shape (H,W) instead of (1,H,W)
 dilation (tuple, optional) – the dilation value along each axis, default 1 mean no dilation.
 groups (int, default 1) – number of groups during convolution, that controls the connections between input and output channels. Deafult value is 1, which means that all input channels are convolved to produce all output channels. A value of N would mean that the input (and output) channels are divided into N groups with the input channels in one group (say ith input group) contributing to output channels in only one group (ith output group). Number of input and output channels must be divisble by value of groups argument. Also, value of this argument must be strictly positive, i.e. groups > 0.
 name (str, defaults to '') – the name of the function instance in the network
Returns: A function that accepts one argument and applies the convolution operation to it
Return type:

Convolution3D
(filter_shape, num_filters=None, activation=identity, init=glorot_uniform(), pad=False, strides=1, bias=True, init_bias=0, reduction_rank=1, name='')[source]¶ Layer factory function to create a 3D convolution layer with optional nonlinearity. Same as Convolution() except that filter_shape is verified to be 3dimensional. See Convolution() for extensive documentation.
Parameters:  filter_shape (int or tuple of ints) – shape (spatial extent) of the receptive field, not including the input featuremap depth. E.g. (3,3) for a 2D convolution.
 num_filters (int, defaults to None) – number of filters (output featuremap depth), or
()
to denote scalar output items (output shape will have no depth axis).  activation (
Function
, defaults to identity) – optional function to apply at the end, e.g. relu  init (scalar or NumPy array or
cntk.initializer
, defaults toglorot_uniform()
) – initial value of weights W  pad (bool or tuple of bools, defaults to False) – if False, then the filter will be shifted over the “valid”
area of input, that is, no value outside the area is used. If
pad=True
on the other hand, the filter will be applied to all input positions, and positions outside the valid region will be considered containing zero. Use a tuple to specify a peraxis value.  strides (int or tuple of ints, defaults to 1) – stride of the convolution (increment when sliding the filter over the input). Use a tuple to specify a peraxis value.
 bias (bool, defaults to True) – the layer will have no bias if False is passed here
 init_bias (scalar or NumPy array or
cntk.initializer
, defaults to 0) – initial value of weights b  reduction_rank (int, defaults to 1) – set to 0 if input items are scalars (input has no depth axis), e.g. an audio signal or a blackandwhite image that is stored with tensor shape (H,W) instead of (1,H,W)
 dilation (tuple, optional) – the dilation value along each axis, default 1 mean no dilation.
 groups (int, default 1) – number of groups during convolution, that controls the connections between input and output channels. Deafult value is 1, which means that all input channels are convolved to produce all output channels. A value of N would mean that the input (and output) channels are divided into N groups with the input channels in one group (say ith input group) contributing to output channels in only one group (ith output group). Number of input and output channels must be divisble by value of groups argument. Also, value of this argument must be strictly positive, i.e. groups > 0.
 name (str, defaults to '') – the name of the function instance in the network
Returns: A function that accepts one argument and applies the convolution operation to it
Return type:

ConvolutionTranspose
(filter_shape, num_filters, activation=identity, init=glorot_uniform(), pad=False, strides=1, sharing=True, bias=True, init_bias=0, output_shape=None, reduction_rank=1, max_temp_mem_size_in_samples=0, name='')[source]¶ Layer factory function to create a convolution transpose layer.
This implements a convolution_transpose operation over items arranged on an Ndimensional grid, such as pixels in an image. Typically, each item is a vector (e.g. pixel: R,G,B), and the result is, in turn, a vector. The itemgrid dimensions are referred to as the spatial dimensions (e.g. dimensions of an image), while the vector dimensions of the individual items are often called featuremap depth.
Convolution transpose is also known as
fractionally strided convolutional layers
, or,deconvolution
. This operation is used in image and language processing applications. It supports arbitrary dimensions, strides, and padding.The forward and backward computation of convolution transpose is the inverse of convolution. That is, during forward pass the input layer’s items are spread into the output same as the backward spread of gradients in convolution. The backward pass, on the other hand, performs a convolution same as the forward pass of convolution.
The size (spatial extent) of the receptive field for convolution transpose is given by
filter_shape
. E.g. to specify a 2D convolution transpose,filter_shape
should be a tuple of two integers, such as (5,5); an example for a 3D convolution transpose (e.g. video or an MRI scan) would befilter_shape=(3,3,3)
; while for a 1D convolution transpose (e.g. audio or text),filter_shape
has one element, such as (3,).The dimension of the input items (featuremap depth) is not specified, but known from the input. The dimension of the output items generated for each item position is given by
num_filters
.A
ConvolutionTranspose
instance owns its weight parameter tensors W and b, and exposes them as an attributes.W
and.b
. The weights will have the shape(input_feature_map_depth, num_filters, *filter_shape)
.Example
>>> # 2D convolution transpose of 3x4 receptive field with output featuremap depth 128: >>> f = ConvolutionTranspose((3,4), 128, activation=C.relu) >>> x = C.input_variable((3,480,640)) # 3channel color image >>> h = f(x) >>> h.shape (128, 482, 643) >>> f.W.shape # will have the form (input_depth, num_filters, *filter_shape) (3, 128, 3, 4)
Parameters:  filter_shape (int or tuple of ints) – shape (spatial extent) of the receptive field, not including the input featuremap depth. E.g. (3,3) for a 2D convolution.
 num_filters (int) – number of filters (output featuremap depth), or
()
to denote scalar output items (output shape will have no depth axis).  activation (
Function
, optional) – optional function to apply at the end, e.g. relu  init (scalar or
cntk.initializer
, defaultglorot_uniform()
) – initial value of weights W  pad (bool or tuple of bools, default False) – if False, then the filter will be shifted over the “valid”
area of input, that is, no value outside the area is used. If
pad=True
on the other hand, the filter will be applied to all input positions, and positions outside the valid region will be considered containing zero. Use a tuple to specify a peraxis value.  strides (int or tuple of ints, default 1) – stride of the convolution (increment when sliding the filter over the input). Use a tuple to specify a peraxis value.
 sharing (bool, default True) – weight sharing, must be True for now.
 bias (bool, optional, default True) – the layer will have no bias if False is passed here
 init_bias (scalar or NumPy array or
cntk.initializer
) – initial value of weights b  output_shape (int or tuple of ints) – output shape. When strides > 2, the output shape is nondeterministic. User can specify the wanted output shape. Note the specified shape must satisify the condition that if a convolution is perform from the output with the same setting, the result must have same shape as the input. that is stored with tensor shape (H,W) instead of (1,H,W)
 reduction_rank (int, defaults to 1) – set to 0 if input items are scalars (input has no depth axis), e.g. an audio signal or a blackandwhite image.
 dilation (tuple, optional) – the dilation value along each axis, default 1 mean no dilation.
 max_temp_mem_size_in_samples (int, default 0) – set to a positive number to define the maximum workspace memory for convolution.
 name (str, optional) – the name of the Function instance in the network
Returns: Function
that accepts one argument and applies the convolution operation to it

ConvolutionTranspose1D
(filter_shape, num_filters, activation=identity, init=glorot_uniform(), pad=False, strides=1, bias=True, init_bias=0, output_shape=None, name='')[source]¶ Layer factory function to create a 1D convolution transpose layer with optional nonlinearity. Same as ConvolutionTranspose() except that filter_shape is verified to be 1dimensional. See ConvolutionTranspose() for extensive documentation.

ConvolutionTranspose2D
(filter_shape, num_filters, activation=identity, init=glorot_uniform(), pad=False, strides=1, bias=True, init_bias=0, output_shape=None, name='')[source]¶ Layer factory function to create a 2D convolution transpose layer with optional nonlinearity. Same as ConvolutionTranspose() except that filter_shape is verified to be 2dimensional. See ConvolutionTranspose() for extensive documentation.

ConvolutionTranspose3D
(filter_shape, num_filters, activation=identity, init=glorot_uniform(), pad=False, strides=1, bias=True, init_bias=0, output_shape=None, name='')[source]¶ Layer factory function to create a 3D convolution transpose layer with optional nonlinearity. Same as ConvolutionTranspose() except that filter_shape is verified to be 3dimensional. See ConvolutionTranspose() for extensive documentation.

Dense
(shape, activation=identity, init=glorot_uniform(), input_rank=None, map_rank=None, bias=True, init_bias=0, name='')[source]¶ Layer factory function to create an instance of a fullyconnected linear layer of the form activation(input @ W + b) with weights W and bias b, and activation and b being optional. shape may describe a tensor as well.
A
Dense
layer instance owns its parameter tensors W and b, and exposes them as attributes.W
and.b
.Example
>>> f = Dense(5, activation=C.relu) >>> x = C.input_variable(3) >>> h = f(x) >>> h.shape (5,) >>> f.W.shape (3, 5) >>> f.b.value array([ 0., 0., 0., 0., 0.], dtype=float32)
>>> # activation through default options >>> with C.default_options(activation=C.relu): ... f = Dense(500)
The
Dense
layer can be applied to inputs that are tensors, not just vectors. This is useful, e.g., at the top of a imageprocessing cascade, where after many convolutions with padding and strides it is difficult to know the precise dimensions. For this case, CNTK has an extended definition of matrix product, in which the input tensor will be treated as if it had been automatically flattened. The weight matrix will be a tensor that reflects the “flattened” dimensions in its axes.Example
>>> f = Dense(5, activation=C.softmax) # a 5class classifier >>> x = C.input_variable((64,16,16)) # e.g. an image reduced by a convolution stack >>> y = f(x) >>> y.shape (5,) >>> f.W.shape # "row" dimension of "matrix" consists of 3 axes that match the input (64, 16, 16, 5)
This behavior can be modified by telling CNTK either the number of axes that should not be projected (
map_rank
) or the rank of the input (input_rank
). If neither is specified, all input dimensions are projected, as in the example above.Example
>>> f = Dense(5, activation=C.softmax, input_rank=2) # a 5class classifier >>> x = C.input_variable((10, 3, 3)) # e.g. 10 parallel 3x3 objects. Input has input_rank=2 axes >>> y = f(x) >>> y.shape # the 10 parallel objects are classified separately, the "10" dimension is retained (10, 5) >>> f.W.shape # "row" dimension of "matrix" consists of (3,3) matching the input axes to project (3, 3, 5)
>>> f = Dense(5, activation=C.softmax, map_rank=2) >>> x = C.input_variable((4, 6, 3, 3, 3)) # e.g. 24 parallel 3x3x3 objects arranged in a 4x6 grid. The grid is to be retained >>> y = f(x) >>> y.shape # the 4x6 elements are classified separately, the grid structure is retained (4, 6, 5) >>> f.W.shape # "row" dimension of "matrix" consists of (3,3) matching the input axes to project (3, 3, 3, 5) >>> z = y([np.zeros(x.shape)]) >>> assert z.shape == (1, 4, 6, 5)
Parameters:  shape (int or tuple of ints) – vector or tensor dimension of the output of this layer
 activation (
Function
, defaults to identity) – optional function to apply at the end, e.g. relu  init (scalar or NumPy array or
cntk.initializer
, defaults toglorot_uniform()
) – initial value of weights W  input_rank (int, defaults to None) – number of inferred axes to add to W (map_rank must not be given)
 map_rank (int, defaults to None) – expand W to leave exactly map_rank axes (input_rank must not be given)
 bias (bool, optional, defaults to True) – the layer will have no bias if False is passed here
 init_bias (scalar or NumPy array or
cntk.initializer
, defaults to 0) – initial value of weights b  name (str, defaults to '') – the name of the function instance in the network
Returns: A function that accepts one argument and applies the operation to it
Return type:

Dropout
(dropout_rate=None, keep_prob=None, seed=4294967293, name='')[source]¶ Layer factory function to create a dropout layer.
The dropout rate can be specified as the probability of dropping a value (
dropout_rate
). E.g.Dropout(0.3)
means “drop 30% of the activation values.” Alternatively, it can also be specified as the probability of keeping a value (keep_prob
).The dropout operation is only applied during training. During testing, this is a noop. To make sure that this leads to correct results, the dropout operation in training multiplies the result by (1/(1
dropout_rate
)).Example
>>> f = Dropout(0.2) # "drop 20% of activations" >>> h = C.input_variable(3) >>> hd = f(h)
>>> f = Dropout(keep_prob=0.8) # "keep 80%" >>> h = C.input_variable(3) >>> hd = f(h)
Parameters:  dropout_rate (float) – probability of dropping out an element, mutually exclusive with
keep_prob
 keep_prob (float) – probability of keeping an element, mutually exclusive with
dropout_rate
 seed (int) – random seed.
 name (str, defaults to '') – the name of the function instance in the network
Returns: A function that accepts one argument and applies the operation to it
Return type:  dropout_rate (float) – probability of dropping out an element, mutually exclusive with

Embedding
(shape=None, init=glorot_uniform(), weights=None, name='')[source]¶ Layer factory function to create a embedding layer.
An embedding is conceptually a lookup table. For every input token (e.g. a word or any category label), the corresponding entry in the lookup table is returned.
In CNTK, discrete items such as words are represented as onehot vectors. The table lookup is realized as a matrix product, with a matrix whose rows are the embedding vectors. Note that multiplying a matrix from the left with a onehot vector is the same as copying out the row for which the input vector is 1. CNTK has special optimizations to make this operation as efficient as an actual table lookup if the input is sparse.
The lookup table in this layer is learnable, unless a userspecified one is supplied through the
weights
parameter. For example, to use an existing embedding table from a file in numpy format, use this:Embedding(weights=np.load('PATH.npy'))
To initialize a learnable lookup table with a given numpy array that is to be used as the initial value, pass that array to the
init
parameter (notweights
).An
Embedding
instance owns its weight parameter tensor E, and exposes it as an attribute.E
.Example
>>> # learnable embedding >>> f = Embedding(5) >>> x = C.input_variable(3) >>> e = f(x) >>> e.shape (5,) >>> f.E.shape (3, 5)
>>> # usersupplied embedding >>> f = Embedding(weights=[[.5, .3, .1, .4, .2], [.7, .6, .3, .2, .9]]) >>> f.E.value array([[ 0.5, 0.3, 0.1, 0.4, 0.2], [ 0.7, 0.6, 0.3, 0.2, 0.9]], dtype=float32) >>> x = C.input_variable(2, is_sparse=True) >>> e = f(x) >>> e.shape (5,) >>> e(C.Value.one_hot([[1], [0], [0], [1]], num_classes=2)) array([[ 0.7, 0.6, 0.3, 0.2, 0.9], [ 0.5, 0.3, 0.1, 0.4, 0.2], [ 0.5, 0.3, 0.1, 0.4, 0.2], [ 0.7, 0.6, 0.3, 0.2, 0.9]], dtype=float32)
Parameters:  shape (int or tuple of ints) – vector or tensor dimension of the output of this layer
 init (scalar or NumPy array or
cntk.initializer
, defaults toglorot_uniform()
) – (learnable embedding only) initial value of weights E  weights (NumPy array, mutually exclusive with
init
, defuats to None) – (usersupplied embedding only) the lookup table. The matrix rows are the embedding vectors,weights[i,:]
being the embedding that corresponds to input category i.  name (str, defaults to '') – the name of the function instance in the network
Returns: A function that accepts one argument and applies the embedding operation to it
Return type:

GlobalAveragePooling
(name='')[source]¶ Layer factory function to create a global averagepooling layer.
The global averagepooling operation computes the elementwise mean over all items on an Ndimensional grid, such as an image.
This operation is the same as applying
reduce_mean()
to all grid dimensions.Example
>>> f = GlobalAveragePooling() >>> f.update_signature((1,4,4)) >>> im = np.array([[[3, 5, 2, 6], [4, 2, 8, 3], [1, 6, 4, 7], [7, 3, 5, 9]]]) # a 4x4 image (featuremap depth 1 for simplicity) >>> im array([[[3, 5, 2, 6], [4, 2, 8, 3], [1, 6, 4, 7], [7, 3, 5, 9]]]) >>> f([[im]]) array([[[[ 4.6875]]]], dtype=float32)
Parameters: name (str, defaults to '') – the name of the function instance in the network Returns: A function that accepts one argument and applies the operation to it Return type: cntk.ops.functions.Function

GlobalMaxPooling
(name='')[source]¶ Layer factory function to create a global maxpooling layer.
The global maxpooling operation computes the elementwise maximum over all items on an Ndimensional grid, such as an image.
This operation is the same as applying
reduce_max()
to all grid dimensions.Example
>>> f = GlobalMaxPooling() >>> f.update_signature((1,4,4)) >>> im = np.array([[[3, 5, 2, 6], [4, 2, 8, 3], [1, 6, 4, 7], [7, 3, 5, 9]]]) # a 4x4 image (featuremap depth 1 for simplicity) >>> im array([[[3, 5, 2, 6], [4, 2, 8, 3], [1, 6, 4, 7], [7, 3, 5, 9]]]) >>> f([[im]]) array([[[[ 9.]]]], dtype=float32)
Parameters: name (str, defaults to '') – the name of the function instance in the network Returns: A function that accepts one argument and applies the operation to it Return type: cntk.ops.functions.Function

Label
(name)[source]¶ Layer factory function to create a dummy layer with a given name. This can be used to access an intermediate value flowing through computation.
Parameters: name (str) – the name of the function instance in the network Example
>>> model = Dense(500) >> Label('hidden') >> Dense(10) >>> model.update_signature(10) >>> intermediate_val = model.hidden >>> intermediate_val.shape (500,)
Returns: A function that accepts one argument and returns it with the desired name attached Return type: cntk.ops.functions.Function

LayerNormalization
(initial_scale=1, initial_bias=0, epsilon=0.00001, name='')[source]¶ Layer factory function to create a function that implements layer normalization.
Layer normalization applies this formula to every input element (elementwise):
y = (x  mean(x)) / (stddev(x) + epsilon) * scale + bias
wherescale
andbias
are learned parameters of the same dimention as the input/output.Example
>>> f = LayerNormalization(initial_scale=2, initial_bias=1) >>> f.update_signature(4) >>> f([np.array([4,0,0,4])]) # result has mean 1 and standard deviation 2, reflecting the initial values for scale and bias array([[ 2.99999, 0.99999, 0.99999, 2.99999]], dtype=float32)
Parameters:  initial_scale (float, default 1) – initial value for the
scale
parameter  initial_bias (float, default 0) – initial value for the
bias
parameter  epsilon (float, default 0.00001) – epsilon added to the standard deviation to avoid division by 0
 name (str, optional) – the name of the Function instance in the network
Returns: A function that accepts one argument and applies the operation to it
Return type:  initial_scale (float, default 1) – initial value for the

MaxPooling
(filter_shape, strides=1, pad=False, name='')[source]¶ Layer factory function to create a maxpooling layer.
Like
Convolution()
,MaxPooling()
processes items arranged on an Ndimensional grid, such as an image. Typically, each item is a vector. For each item, maxpooling computes the elementwise maximum over a window (“receptive field”) of items surrounding the item’s position on the grid.The size (spatial extent) of the receptive field is given by
filter_shape
. E.g. for 2D pooling,filter_shape
should be a tuple of two integers, such as (5,5).Example
>>> f = MaxPooling((3,3), strides=2) # reduce dimensionality by 2, pooling over windows of 3x3 >>> h = C.input_variable((32,240,320)) # e.g. 32dim feature map >>> hp = f(h) >>> hp.shape # spatial dimension has been halved due to stride, and lost one due to 3x3 window without padding (32, 119, 159)
>>> f = MaxPooling((2,2), strides=2) >>> f.update_signature((1,4,4)) >>> im = np.array([[[3, 5, 2, 6], [4, 2, 8, 3], [1, 6, 4, 7], [7, 3, 5, 9]]]) # a 4x4 image (featuremap depth 1 for simplicity) >>> im array([[[3, 5, 2, 6], [4, 2, 8, 3], [1, 6, 4, 7], [7, 3, 5, 9]]]) >>> f([[im]]) # due to strides=2, this picks the max out of each 2x2 subblock array([[[[ 5., 8.], [ 7., 9.]]]], dtype=float32)
Parameters:  filter_shape (int or tuple of ints) – shape (spatial extent) of the receptive field, not including the input featuremap depth. E.g. (3,3) for a 2D convolution.
 strides (int or tuple of ints, defaults to 1) – stride (increment when sliding over the input). Use a tuple to specify a peraxis value.
 pad (bool or tuple of bools, defaults to False) – if False, then the pooling operation will be shifted over the “valid”
area of input, that is, no value outside the area is used. If
pad=True
on the other hand, pooling will be applied to all input positions, and positions outside the valid region will be considered containing zero. Use a tuple to specify a peraxis value.  name (str, defaults to '') – the name of the function instance in the network
Returns: A function that accepts one argument and applies the maxpooling operation to it
Return type:

SequentialConvolution
(filter_shape, num_filters=None, activation=identity, init=glorot_uniform(), pad=False, strides=1, sharing=True, bias=True, init_bias=0, reduction_rank=1, transpose_weight=False, dilation=1, groups=1, max_temp_mem_size_in_samples=0, op_name='Convolution', name='')[source]¶ Layer factory function to create a sequential convolution layer.
This implements a sequential convolution operation pretty much almost the same as convolution operation, except that this operation convolves also on the dynamic axis(sequence), and filter_shape[0] is applied to that axis.
The dimension of the input items (input featuremap depth) is not to be specified. It is known from the input. The dimension of the output items (output featuremap depth) generated for each item position is given by
num_filters
.This is useful for variablelength inputs, such as video or naturallanguage processing (word ngrams). Note, however, that convolution does not support sparse inputs.
Both input and output items can be scalars intead of vectors. For scalarvalued input items, such as pixels on a blackandwhite image, or samples of an audio clip, specify
reduction_rank=0
. If the output items are scalar, passnum_filters=()
orNone
.A
Convolution
instance owns its weight parameter tensors W and b, and exposes them as an attributes.W
and.b
. The weights will have the shape(num_filters, input_feature_map_depth, *filter_shape)
Example
>>> # 2D sequential convolution of 5x4 receptive field with output featuremap depth 128: >>> f = Convolution((5,4), 128, activation=C.relu) >>> x = C.input_variable(**Sequence[Tensor[3,640]]) # 3channel color image, the real shape would be [width] x [channel, height], where now width is at sequence axis and can take arbitrary length. >>> h = f(x) >>> h.shape (128, 637) >>> f.W.shape # will have the form (num_filters, input_depth, *filter_shape) (128, 3, 5, 4)
>>> # 2D sequential convolution over a onechannel blackandwhite image, padding, and stride 2 along width dimension >>> f = Convolution((3,3), 128, reduction_rank=0, pad=True, strides=(1,2), activation=C.relu) >>> x = C.input_variable(**Sequence[Tensor[640]]) # similar to the above example, the image height is 640. Image width is arbitrary and takes the sequence axis. >>> h = f(x) >>> h.shape (128, 320) >>> f.W.shape (128, 1, 3, 3)
Parameters:  filter_shape (int or tuple of ints) – shape (spatial extent) of the receptive field, not including the input featuremap depth. E.g. (3,3) for a 2D convolution.
 num_filters (int, defaults to None) – number of filters (output featuremap depth), or
()
to denote scalar output items (output shape will have no depth axis).  activation (
Function
, defaults to identity) – optional function to apply at the end, e.g. relu  init (scalar or NumPy array or
cntk.initializer
, defaults toglorot_uniform()
) – initial value of weights W  pad (bool or tuple of bools, defaults to False) – if False, then the filter will be shifted over the “valid”
area of input, that is, no value outside the area is used. If
pad=True
on the other hand, the filter will be applied to all input positions, and positions outside the valid region will be considered containing zero. Use a tuple to specify a peraxis value.  strides (int or tuple of ints, defaults to 1) – stride of the convolution (increment when sliding the filter over the input). Use a tuple to specify a peraxis value.
 sharing (bool, defaults to True) – When True, every position uses the same Convolution kernel. When False, you can have a different Convolution kernel per position, but False is not supported.
 bias (bool, optional, defaults to True) – the layer will have no bias if False is passed here
 init_bias (scalar or NumPy array or
cntk.initializer
, defaults to 0) – initial value of weights b  reduction_rank (int, defaults to 1) – set to 0 if input items are scalars (input has no depth axis), e.g. an audio signal or a blackandwhite image that is stored with tensor shape (H,W) instead of (1,H,W)
 transpose_weight (bool, defaults to False) – When this is True this is convolution, otherwise this is correlation (which is common for most toolkits)
 dilation (tuple, optional) – the dilation value along each axis, default 1 mean no dilation.
 groups (int, default 1) – number of groups during convolution, that controls the connections between input and output channels. Deafult value is 1, which means that all input channels are convolved to produce all output channels. A value of N would mean that the input (and output) channels are divided into N groups with the input channels in one group (say ith input group) contributing to output channels in only one group (ith output group). Number of input and output channels must be divisble by value of groups argument. Also, value of this argument must be strictly positive, i.e. groups > 0.
 max_temp_mem_size_in_samples (int, defaults to 0) – Limits the amount of memory for intermediate convolution results. A value of 0 means, memory is automatically managed.
 name (str, defaults to '') – the name of the function instance in the network
Returns: A function that accepts one argument and applies the sequential convolution operation to it
Return type: