cntk.ops.sequence package
CNTK operators that are specialized for sequences. Calling these operators creates nodes in the CNTK computational graph. The examples on this page assume that cntk has been imported as C and numpy as np.

broadcast_as(operand, broadcast_as_operand, name='')
Creates a sequence out of a non-sequence by endowing the operand with dynamic axes of the same type as the broadcast_as_operand and broadcasting the value of the operand along those dynamic axes.
Example
>>> x = C.sequence.input_variable(shape=(3,2))
>>> t = C.sequence.last(x)
>>> b = C.sequence.is_first(x)
>>> y = C.sequence.broadcast_as(t, b)
>>> # create one sequence of 4 tensors each with shape (3,2)
>>> x0 = np.reshape(np.arange(24.0,dtype=np.float32),(1,4,3,2))
>>> y.eval({x:x0})
[array([[[ 18., 19.],
         [ 20., 21.],
         [ 22., 23.]],
        [[ 18., 19.],
         [ 20., 21.],
         [ 22., 23.]],
        [[ 18., 19.],
         [ 20., 21.],
         [ 22., 23.]],
        [[ 18., 19.],
         [ 20., 21.],
         [ 22., 23.]]], dtype=float32)]
Parameters:  operand – the symbolic tensor whose value will be broadcast
 broadcast_as_operand – the symbolic tensor whose dynamic axes will be used to broadcast the operand
 name (str) – the name of the node in the network
Returns:

delay(x, initial_state=None, time_step=1, name='')
This function combines past_value() and future_value() into a single function. This is useful when the time_step is computed and can be positive, negative, or 0.
Parameters:  x – the tensor (or its name) from which the past value is obtained
 initial_state – tensor or scalar representing the initial value to be used when the input tensor is shifted in time.
 time_step (int) – the number of time steps to look into the past, where negative values mean to look into the future, and 0 means a noop (default 1).
 name (str, optional) – the name of the Function instance in the network
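To make the sign convention of time_step concrete, the following is a minimal NumPy sketch that emulates the shift on a single sequence stored as an array whose first axis is the sequence axis. The helper name delay_np is hypothetical, and this is not how CNTK implements delay; it only mirrors the behaviour described above.

```python
import numpy as np

def delay_np(seq, initial_state=0.0, time_step=1):
    """Emulate the documented shift on one sequence (axis 0 = sequence axis).

    time_step > 0 looks into the past, time_step < 0 looks into the future,
    and time_step == 0 returns the input unchanged.
    """
    seq = np.asarray(seq, dtype=np.float32)
    if time_step == 0:
        return seq.copy()
    # Positions without a past/future value keep the initial state.
    out = np.full_like(seq, initial_state)
    n = len(seq)
    k = min(abs(time_step), n)
    if time_step > 0:
        out[k:] = seq[:n - k]   # shift towards later steps (past value)
    else:
        out[:n - k] = seq[k:]   # shift towards earlier steps (future value)
    return out

x0 = np.arange(4, dtype=np.float32)   # one sequence: 0, 1, 2, 3
past = delay_np(x0, time_step=1)      # 0, 0, 1, 2
future = delay_np(x0, time_step=-1)   # 1, 2, 3, 0
same = delay_np(x0, time_step=0)      # 0, 1, 2, 3
```

A positive time_step therefore reproduces the past_value() pattern and a negative one the future_value() pattern, which is why a computed step can be passed without branching in user code.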

first(seq, name='')
Returns the first element of its symbolic input sequence seq.
Example
>>> x = C.sequence.input_variable(shape=(3,2))
>>> y = C.sequence.first(x)
>>> # create one sequence of 4 tensors each with shape (3,2)
>>> x0 = np.reshape(np.arange(24.0,dtype=np.float32),(1,4,3,2))
>>> y.eval({x:x0})
array([[[ 0., 1.],
        [ 2., 3.],
        [ 4., 5.]]], dtype=float32)
Parameters:  seq – the symbolic tensor denoting a sequence
 name (str) – the name of the node in the network
Returns:

future_value(x, initial_state=None, time_step=1, name='')
This function returns the future value w.r.t. x. It is most often used when creating RNNs. The resulting tensor has the same shape as the input but is the next logical sample. The time_step parameter is the number of steps to look into the future and is 1 by default. If there is no future value (i.e. the current sample is the last one in the tensor) then the initial_state value is returned.
The initial state can be a constant (scalar or tensor), a learnable tensor or input data (which has a batch dimension, as needed for sequence-to-sequence models).
Example
>>> x = C.sequence.input_variable(shape=(3,2))
>>> # Create one sequence with 4 tensors of shape (3, 2)
>>> x0 = np.reshape(np.arange(24,dtype=np.float32),(1,4,3,2))
>>> y = C.sequence.future_value(x) # using initial state of 0 by default
>>> y.eval({x:x0})
[array([[[ 6., 7.],
         [ 8., 9.],
         [ 10., 11.]],
        [[ 12., 13.],
         [ 14., 15.],
         [ 16., 17.]],
        [[ 18., 19.],
         [ 20., 21.],
         [ 22., 23.]],
        [[ 0., 0.],
         [ 0., 0.],
         [ 0., 0.]]], dtype=float32)]
Parameters:  x – the tensor (or its name) from which the future value is obtained.
 initial_state – tensor or scalar representing the initial value to be used when the input tensor is shifted in time.
 time_step (int) – the number of time steps to look into the future (default 1)
 name (str, optional) – the name of the Function instance in the network
Returns:

gather(seq, condition, new_sequence_axis_typeinfo=None, name='')
Takes two sequences of the same length and returns a new sequence whose elements are those elements of sequence seq whose corresponding element in condition is True, preserving the ordering of seq.
This operation is also known as stream compaction, or copy_if.
Example
>>> x = C.sequence.input_variable(shape=(3,2))
>>> z = C.greater(C.reduce_sum(x),60)
>>> y = C.sequence.gather(x,z)
>>> # create one sequence of 4 tensors each with shape (3,2)
>>> x0 = np.reshape(np.arange(24.0,dtype=np.float32),(1,4,3,2))
>>> y.eval({x:x0})
[array([[[ 12., 13.],
         [ 14., 15.],
         [ 16., 17.]],
        [[ 18., 19.],
         [ 20., 21.],
         [ 22., 23.]]], dtype=float32)]
Parameters:  seq – the symbolic sequence from which elements will be selected
 condition – the symbolic sequence of booleans which indicate which elements should be selected
 new_sequence_axis_typeinfo – tuple of integers indicating the scaling and additive factors for the length of the new sequence axis w.r.t. the operand sequence. This is used to determine the sequence axis to be used for the output of the gather operation. If this argument is left unspecified, a new independent sequence axis is created.
 name (str) – the name of the node in the network
Returns:

input(shape, dtype=<cntk.default_options.default_override_or object>, needs_gradient=False, is_sparse=False, sequence_axis=Axis('defaultDynamicAxis'), name='')
DEPRECATED.
It creates an input in the network: a place where data, such as features and labels, should be provided.
Parameters:  shape (tuple or int) – the shape of the input tensor
 dtype (np.float32 or np.float64 or np.float16) – data type. Default is np.float32.
 needs_gradient (bool, optional) – whether to backpropagate to it or not. False by default.
 is_sparse (bool, optional) – whether the variable is sparse (False by default)
 sequence_axis (Axis) – a dynamic axis (e.g., default_dynamic_axis())
 name (str, optional) – the name of the Function instance in the network
Returns:

input_variable(shape, dtype=np.float32, needs_gradient=False, is_sparse=False, sequence_axis=Axis.default_dynamic_axis(), name='')
It creates an input in the network: a place where data, such as features and labels, should be provided.
Parameters:  shape (tuple or int) – the shape of the input tensor
 dtype (np.float32 or np.float64 or np.float16) – data type. Default is np.float32.
 needs_gradient (bool, optional) – whether to backpropagate to it or not. False by default.
 is_sparse (bool, optional) – whether the variable is sparse (False by default)
 sequence_axis (Axis) – a dynamic axis (e.g., default_dynamic_axis())
 name (str, optional) – the name of the Function instance in the network
Returns:

is_first(seq, name='')
Returns a symbolic sequence of booleans with the same length as seq. The first element of the sequence is 1 and all others are 0.
Example
>>> x = C.sequence.input_variable(shape=(3,2))
>>> y = C.sequence.is_first(x)
>>> # create one sequence of 4 tensors each with shape (3,2)
>>> x0 = np.reshape(np.arange(24.0,dtype=np.float32),(1,4,3,2))
>>> y.eval({x:x0})
[array([ 1., 0., 0., 0.], dtype=float32)]
Parameters:  seq – the symbolic tensor denoting a sequence
 name (str) – the name of the node in the network
Returns:

is_last(seq, name='')
Returns a symbolic sequence of booleans with the same length as seq. The last element of the sequence is 1 and all others are 0.
Example
>>> x = C.sequence.input_variable(shape=(3,2))
>>> y = C.sequence.is_last(x)
>>> # create one sequence of 4 tensors each with shape (3,2)
>>> x0 = np.reshape(np.arange(24.0,dtype=np.float32),(1,4,3,2))
>>> y.eval({x:x0})
[array([ 0., 0., 0., 1.], dtype=float32)]
Parameters:  seq – the symbolic tensor denoting a sequence
 name (str) – the name of the node in the network
Returns:

last(seq, name='')
Returns the last element of its symbolic input sequence seq.
Example
>>> x = C.sequence.input_variable(shape=(3,2))
>>> y = C.sequence.last(x)
>>> # create one sequence of 4 tensors each with shape (3,2)
>>> x0 = np.reshape(np.arange(24.0,dtype=np.float32),(1,4,3,2))
>>> y.eval({x:x0})
array([[[ 18., 19.],
        [ 20., 21.],
        [ 22., 23.]]], dtype=float32)
Parameters:  seq – the symbolic tensor denoting a sequence
 name (str) – the name of the node in the network
Returns:

past_value(x, initial_state=None, time_step=1, name='')
This function returns the past value w.r.t. x. It is most often used when creating RNNs. The resulting tensor has the same shape as the input but is the previous logical sample. The time_step parameter is the number of steps to look into the past and is 1 by default. If there is no past value (i.e. the current sample is the first one in the tensor) then the initial_state value is returned.
The initial state can be a constant (scalar or tensor), a learnable tensor or input data (which has a batch dimension, as needed for sequence-to-sequence models).
Example
>>> # create example input: one sequence with 4 tensors of shape (3, 2)
>>> from cntk.layers.typing import Tensor, Sequence
>>> x = C.sequence.input_variable((3,2))
>>> x0 = np.reshape(np.arange(24,dtype=np.float32),(1,4,3,2))
>>> x0
array([[[[ 0., 1.],
         [ 2., 3.],
         [ 4., 5.]],
        [[ 6., 7.],
         [ 8., 9.],
         [ 10., 11.]],
        [[ 12., 13.],
         [ 14., 15.],
         [ 16., 17.]],
        [[ 18., 19.],
         [ 20., 21.],
         [ 22., 23.]]]], dtype=float32)
>>> # this demonstrates how past_value shifts the sequence by one, padding with initial_state
>>> y = C.sequence.past_value(x) # initial_state is 0 by default
>>> y.eval({x:x0})
[array([[[ 0., 0.],
         [ 0., 0.],
         [ 0., 0.]],
        [[ 0., 1.],
         [ 2., 3.],
         [ 4., 5.]],
        [[ 6., 7.],
         [ 8., 9.],
         [ 10., 11.]],
        [[ 12., 13.],
         [ 14., 15.],
         [ 16., 17.]]], dtype=float32)]
>>> # here, we pass the initial_state as input data (e.g. sequence-to-sequence)
>>> s = C.input_variable((3,2)) # not a sequence, e.g. a final encoder hidden state
>>> s0 = np.reshape(np.arange(6,dtype=np.float32)/2,(1,3,2))
>>> s0
array([[[ 0. , 0.5],
        [ 1. , 1.5],
        [ 2. , 2.5]]], dtype=float32)
>>> y = C.sequence.past_value(x, initial_state=s)
>>> y.eval({x:x0, s:s0}) # same as the previous example except for the first time step
[array([[[ 0. , 0.5],
         [ 1. , 1.5],
         [ 2. , 2.5]],
        [[ 0. , 1. ],
         [ 2. , 3. ],
         [ 4. , 5. ]],
        [[ 6. , 7. ],
         [ 8. , 9. ],
         [ 10. , 11. ]],
        [[ 12. , 13. ],
         [ 14. , 15. ],
         [ 16. , 17. ]]], dtype=float32)]
Parameters:  x – the tensor (or its name) from which the past value is obtained
 initial_state – tensor or scalar representing the initial value to be used when the input tensor is shifted in time.
 time_step (int) – the number of time steps to look into the past (default 1)
 name (str, optional) – the name of the Function instance in the network
Returns:

reduce_max(seq, name='')
Computes the max of the input sequence’s elements across the sequence axis.
Parameters:  seq – sequence input tensor
 name (str, optional) – the name of the Function instance in the network
Returns:
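This entry has no example in the source, so the following NumPy sketch illustrates what "max across the sequence axis" means for a single sequence. It is an emulation under the assumption that the sequence is stored as an array whose first axis is the sequence axis; it does not call CNTK and does not reproduce the batch axis that a CNTK evaluation would return.

```python
import numpy as np

# One sequence of 4 tensors, each with shape (3, 2); axis 0 is the sequence axis.
seq = np.arange(24.0, dtype=np.float32).reshape(4, 3, 2)

# Element-wise maximum over the sequence steps; the result has the static shape (3, 2).
seq_max = seq.max(axis=0)
```

Because the values here increase along the sequence, the result equals the last element of the sequence, [[18, 19], [20, 21], [22, 23]].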

reduce_sum(seq, name='')
Computes the sum of the input sequence’s elements across the sequence axis.
Examples
>>> x = C.sequence.input_variable(shape=(3,2))
>>> # create one sequence of 4 tensors each with shape (3,2)
>>> x0 = np.reshape(np.arange(24.0,dtype=np.float32),(1,4,3,2))
>>> y = C.sequence.reduce_sum(x)
>>> y.eval({x:x0})
array([[[ 36., 40.],
        [ 44., 48.],
        [ 52., 56.]]], dtype=float32)
Parameters:  seq – sequence input tensor
 name (str, optional) – the name of the Function instance in the network
Returns:

scatter(seq, condition, new_sequence_axis_typeinfo=None, name='')
Performs the inverse of gather. The sequence seq must have as many elements as the number of True values in the sequence condition. It will return a sequence whose length is the same as the condition sequence with zeroes everywhere except for the locations where condition evaluates to True, in which case it will copy the elements from seq preserving their order.
Example
>>> x = C.sequence.input_variable(shape=(3,2))
>>> t = C.sequence.last(x)
>>> b = C.sequence.is_first(x)
>>> y = C.sequence.scatter(t, b)
>>> # create one sequence of 4 tensors each with shape (3,2)
>>> x0 = np.reshape(np.arange(24.0,dtype=np.float32),(1,4,3,2))
>>> y.eval({x:x0})
[array([[[ 18., 19.],
         [ 20., 21.],
         [ 22., 23.]],
        [[ 0., 0.],
         [ 0., 0.],
         [ 0., 0.]],
        [[ 0., 0.],
         [ 0., 0.],
         [ 0., 0.]],
        [[ 0., 0.],
         [ 0., 0.],
         [ 0., 0.]]], dtype=float32)]
Parameters:  seq – the symbolic sequence from which elements will be copied in the output
 condition – the symbolic sequence which denotes the locations where elements should be copied
 new_sequence_axis_typeinfo – tuple of integers indicating the scaling and additive factors for the length of the new sequence axis w.r.t. the condition sequence. This is used to determine the sequence axis to be used for the output of the scatter operation. If this argument is left unspecified, a new independent sequence axis is created.
 name (str) – the name of the node in the network
Returns:
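The inverse relationship between gather and scatter can be sketched in NumPy for a single sequence. The helpers gather_np and scatter_np are hypothetical names introduced only for this illustration; they emulate the documented behaviour with boolean masks and are not CNTK calls.

```python
import numpy as np

def gather_np(seq, condition):
    """Keep the elements of seq whose condition entry is true, preserving order."""
    mask = np.asarray(condition, dtype=bool)
    return np.asarray(seq)[mask]

def scatter_np(seq, condition):
    """Place the elements of seq at the true positions of condition, zeros elsewhere."""
    mask = np.asarray(condition, dtype=bool)
    seq = np.asarray(seq)
    if len(seq) != int(mask.sum()):
        raise ValueError("seq must have one element per true value in condition")
    out = np.zeros((len(mask),) + seq.shape[1:], dtype=seq.dtype)
    out[mask] = seq
    return out

# One sequence of 4 tensors with shape (3, 2); select the last two steps.
x0 = np.arange(24.0, dtype=np.float32).reshape(4, 3, 2)
cond = np.array([0, 0, 1, 1])

picked = gather_np(x0, cond)         # steps 2 and 3 only, shape (2, 3, 2)
restored = scatter_np(picked, cond)  # shape (4, 3, 2), zeros at steps 0 and 1
```

Gathering from the scattered result with the same condition returns the picked elements again, which is the sense in which scatter inverts gather.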

slice(seq, begin_index, end_index, name='')
Slice the input sequence.
Parameters:  seq – sequence input tensor
 begin_index (int) – the index along sequence axis where the slicing starts
 end_index (int) – the index along sequence axis where the slicing ends
 name (str, optional) – the name of the Function instance in the network
See also
Indexing in NumPy: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
Returns: Function
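Since the entry points to NumPy indexing, the intended semantics can be sketched with a plain NumPy slice on one sequence. This assumes the NumPy convention carries over unchanged (begin index inclusive, end index exclusive, negative indices counted from the end); the source does not spell this out, so treat it as an illustration rather than a statement about CNTK itself.

```python
import numpy as np

# One sequence of 4 tensors, each with shape (3, 2); axis 0 is the sequence axis.
seq = np.arange(24.0, dtype=np.float32).reshape(4, 3, 2)

begin_index, end_index = 1, 3
middle = seq[begin_index:end_index]   # steps 1 and 2 under the NumPy convention

tail = seq[-2:]                       # the last two steps, using a negative begin index
```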

softmax(seq, name='')
Computes the softmax of the input across the sequence axis.
Parameters:  seq – sequence input tensor
 name (str, optional) – the name of the Function instance in the network
Returns:
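As an illustration of normalizing across the sequence axis rather than across a static axis, here is a NumPy sketch for one sequence of scalar scores (the kind of input that attention weights are typically computed from). The helper sequence_softmax_np is a hypothetical name, and the max-subtraction step is a standard numerical-stability measure assumed here, not something the source describes.

```python
import numpy as np

def sequence_softmax_np(seq):
    """Softmax over axis 0 (the sequence axis) of a single sequence."""
    seq = np.asarray(seq, dtype=np.float64)
    shifted = seq - seq.max(axis=0, keepdims=True)  # avoid overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)

scores = np.array([[1.0], [2.0], [3.0]])   # one sequence of 3 scalar scores
weights = sequence_softmax_np(scores)      # sums to 1 over the 3 steps
```

The weights sum to 1 over the steps of the sequence, so each step receives a share that grows with its score.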

unpack(x, padding_value, no_mask_output=False, name='')
This function unpacks the specified sequence operand ‘x’ along the most significant static axis [1] and pads any gaps with the specified ‘padding_value’. If the ‘no_mask_output’ argument is False, the returned Function has 2 outputs: the unpacked non-sequence data and a mask denoting the gaps in the unpacked output due to differences across lengths of the sequences in the operand.
Parameters:  x – the sequence tensor (or its name) which is unpacked
 padding_value (np.float32 or np.float64 or np.float16) – The value to pad gaps in the unpacked tensor with.
 no_mask_output (bool, optional) – if True, the returned Function omits the mask tensor output that denotes the gaps in the unpacked output due to differences across lengths of the sequences in the operand. False by default.
 name (str, optional) – the name of the Function instance in the network
Returns:
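The two outputs described above can be pictured with a NumPy sketch that pads a batch of variable-length sequences into one dense array and builds a matching mask. The helper unpack_np is a hypothetical name, and the mask convention used here (1 for real data, 0 for padding) is an assumption made for the illustration, not something stated in the source.

```python
import numpy as np

def unpack_np(sequences, padding_value):
    """Pad variable-length sequences to a dense array and build a mask.

    Returns (data, mask): data has shape (batch, max_len, *static_shape),
    mask has shape (batch, max_len) with 1.0 for real steps and 0.0 for gaps.
    """
    max_len = max(len(s) for s in sequences)
    static_shape = np.asarray(sequences[0]).shape[1:]
    data = np.full((len(sequences), max_len) + static_shape,
                   padding_value, dtype=np.float32)
    mask = np.zeros((len(sequences), max_len), dtype=np.float32)
    for i, s in enumerate(sequences):
        s = np.asarray(s, dtype=np.float32)
        data[i, :len(s)] = s      # copy the real steps
        mask[i, :len(s)] = 1.0    # mark them as valid
    return data, mask

# Two sequences of lengths 3 and 1, each step a tensor of shape (1,).
batch = [np.array([[1.0], [2.0], [3.0]]), np.array([[4.0]])]
data, mask = unpack_np(batch, padding_value=-1.0)
```

The shorter sequence is padded with -1 in its last two positions, and the mask marks exactly those positions as gaps.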

where(condition, name='')
Given a symbolic sequence condition of boolean-like (1/0) values, it will return a new sequence containing the indices for which the values were true.
If condition has a value other than 0 or 1, it will denote a repeat factor. If a repeat factor is fractional, it will round up but deduct the overshoot from the next repeat factor.
Example
>>> x = C.sequence.input_variable(shape=(3,2))
>>> z = C.greater(C.reduce_sum(x), 60)
>>> # create one sequence of 4 tensors each with shape (3,2)
>>> x0 = np.reshape(np.arange(24.0, dtype=np.float32), (1,4,3,2))
>>> z.eval({x:x0})
[array([ 0., 0., 1., 1.], dtype=float32)]
>>> y = C.sequence.where(z)
>>> y.eval({x:x0})
[array([ 2., 3.], dtype=float32)]
>>> # repeat frame[1] twice, frame[3] three times, and frame[4] twice
>>> C.sequence.where(C.sequence.input_variable(1)).eval([[[1], [2], [1], [3], [2]]])
[array([ 0., 1., 1., 2., 3., 3., 3., 4., 4.], dtype=float32)]
>>> # note that the above are the indices that are passed to
>>> # repeat frames with a fractional factor
>>> C.sequence.where(C.sequence.input_variable(1)).eval([[[1.2]]*10])
[array([ 0., 0., 1., 2., 3., 4., 5., 5., 6., 7., 8., 9.], dtype=float32)]
>>> # as a result, a 1.2 times stretch is realized by duplicating frame[0] and frame[5]
Parameters:  condition – sequence of 0 or 1 values for filtering, or other positive values for repetition (also fractional)
 name (str) – the name of the node in the network
Returns:
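The repeat-factor rule ("round up but deduct the overshoot from the next repeat factor") is easier to follow as a small loop. The sketch below is a plain-Python emulation of that rule on a list of factors; where_indices is a hypothetical helper, and exact fractions are used here only to avoid floating-point drift in the illustration. It is not CNTK's implementation.

```python
import math
from fractions import Fraction

def where_indices(condition):
    """Emulate the documented index/repeat rule on a plain list of factors."""
    indices = []
    overshoot = Fraction(0)
    for i, factor in enumerate(condition):
        # Fraction(str(...)) keeps decimal inputs such as 1.2 exact.
        effective = Fraction(str(factor)) - overshoot
        count = max(0, math.ceil(effective))   # round up to whole repeats
        overshoot = count - effective          # carry the excess to the next factor
        indices.extend([i] * count)
    return indices

as_filter = where_indices([0, 0, 1, 1])       # [2, 3]
repeated = where_indices([1, 2, 1, 3, 2])     # [0, 1, 1, 2, 3, 3, 3, 4, 4]
stretched = where_indices([1.2] * 10)         # frames 0 and 5 appear twice
```

With ten factors of 1.2, the overshoot returns to zero after every fifth frame, which is why exactly frame 0 and frame 5 are duplicated, matching the outputs shown in the examples above.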