paddlespeech.s2t.modules.encoder module

Encoder definition.

class paddlespeech.s2t.modules.encoder.BaseEncoder(input_size: int, output_size: int = 256, attention_heads: int = 4, linear_units: int = 2048, num_blocks: int = 6, dropout_rate: float = 0.1, positional_dropout_rate: float = 0.1, attention_dropout_rate: float = 0.0, input_layer: str = 'conv2d', pos_enc_layer_type: str = 'abs_pos', normalize_before: bool = True, concat_after: bool = False, static_chunk_size: int = 0, use_dynamic_chunk: bool = False, global_cmvn: Optional[Layer] = None, use_dynamic_left_chunk: bool = False, max_len: int = 5000)[source]

Bases: Layer

Methods

__call__(*inputs, **kwargs)

Call self as a function.

add_parameter(name, parameter)

Adds a Parameter instance.

add_sublayer(name, sublayer)

Adds a sub Layer instance.

apply(fn)

Applies fn recursively to every sublayer (as returned by .sublayers()) as well as self.

buffers([include_sublayers])

Returns a list of all buffers from current layer and its sub-layers.

children()

Returns an iterator over immediate children layers.

clear_gradients()

Clear the gradients of all parameters for this layer.

create_parameter(shape[, attr, dtype, ...])

Create parameters for this layer.

create_tensor([name, persistable, dtype])

Create Tensor for this layer.

create_variable([name, persistable, dtype])

Create Tensor for this layer.

eval()

Sets this Layer and all its sublayers to evaluation mode.

extra_repr()

Extra representation of this layer; you can provide a custom implementation for your own layer.

forward(xs, xs_lens[, decoding_chunk_size, ...])

Embed positions in tensor.

forward_chunk(xs, offset, required_cache_size)

Forward just one chunk.

forward_chunk_by_chunk(xs, decoding_chunk_size)

Forward input chunk by chunk, with chunk_size, in a streaming fashion.

full_name()

Full name for this layer, composed of name_scope + "/" + MyLayer.__class__.__name__

load_dict(state_dict[, use_structured_name])

Set parameters and persistable buffers from state_dict.

named_buffers([prefix, include_sublayers])

Returns an iterator over all buffers in the Layer, yielding tuple of name and Tensor.

named_children()

Returns an iterator over immediate children layers, yielding both the name of the layer as well as the layer itself.

named_parameters([prefix, include_sublayers])

Returns an iterator over all parameters in the Layer, yielding tuple of name and parameter.

named_sublayers([prefix, include_self, ...])

Returns an iterator over all sublayers in the Layer, yielding tuple of name and sublayer.

parameters([include_sublayers])

Returns a list of all Parameters from current layer and its sub-layers.

register_buffer(name, tensor[, persistable])

Registers a tensor as buffer into the layer.

register_forward_post_hook(hook)

Register a forward post-hook for Layer.

register_forward_pre_hook(hook)

Register a forward pre-hook for Layer.

set_dict(state_dict[, use_structured_name])

Set parameters and persistable buffers from state_dict.

set_state_dict(state_dict[, use_structured_name])

Set parameters and persistable buffers from state_dict.

state_dict([destination, include_sublayers, ...])

Get all parameters and persistable buffers of current layer and its sub-layers.

sublayers([include_self])

Returns a list of sub layers.

to([device, dtype, blocking])

Cast the parameters and buffers of the Layer to the given device, dtype and blocking.

to_static_state_dict([destination, ...])

Get all parameters and buffers of current layer and its sub-layers.

train()

Sets this Layer and all its sublayers to training mode.

backward

output_size

register_state_dict_hook

forward(xs: Tensor, xs_lens: Tensor, decoding_chunk_size: int = 0, num_decoding_left_chunks: int = -1) → Tuple[Tensor, Tensor][source]

Embed positions in tensor.

Args:

    xs: padded input tensor (B, L, D)
    xs_lens: input length (B)
    decoding_chunk_size: decoding chunk size for dynamic chunk.
        0: default for training, use random dynamic chunk.
        <0: for decoding, use full chunk.
        >0: for decoding, use the fixed chunk size as set.
    num_decoding_left_chunks: number of left chunks; this is for decoding,
        where the chunk size is decoding_chunk_size.
        >=0: use num_decoding_left_chunks.
        <0: use all left chunks.

Returns:

    encoder output tensor, lens and mask
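
A minimal usage sketch (not part of the upstream docstring): BaseEncoder is a base class, so the concrete ConformerEncoder defined below is used; the 80-dim fbank input and the batch shapes are assumptions for illustration.

    import paddle
    from paddlespeech.s2t.modules.encoder import ConformerEncoder

    encoder = ConformerEncoder(input_size=80)   # assumed 80-dim fbank features
    encoder.eval()

    xs = paddle.randn([2, 100, 80])                       # padded input (B, L, D)
    xs_lens = paddle.to_tensor([100, 80], dtype='int64')  # input lengths (B,)

    # Offline (non-streaming) forward; the chunk arguments keep their defaults.
    out, masks = encoder(xs, xs_lens)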

forward_chunk(xs: Tensor, offset: int, required_cache_size: int, att_cache: Tensor = <empty float32 Tensor, shape=[0, 0, 0, 0]>, cnn_cache: Tensor = <empty float32 Tensor, shape=[0, 0, 0, 0]>, att_mask: Tensor = <empty bool Tensor, shape=[0, 0, 0]>) → Tuple[Tensor, Tensor, Tensor][source]

Forward just one chunk.

Args:

    xs (paddle.Tensor): chunk audio feat input, [B=1, T, D], where
        T == (chunk_size - 1) * subsampling_rate + subsample.right_context + 1
    offset (int): current offset in encoder output time stamp
    required_cache_size (int): cache size required for next chunk computation.
        >=0: actual cache size.
        <0: all history cache is required.
    att_cache (paddle.Tensor): cache tensor for key & value in
        transformer/conformer attention. Shape is (elayers, head, cache_t1, d_k * 2),
        where head * d_k == hidden-dim and cache_t1 == chunk_size * num_decoding_left_chunks.
    cnn_cache (paddle.Tensor): cache tensor for cnn_module in conformer,
        (elayers, B=1, hidden-dim, cache_t2), where cache_t2 == cnn.lorder - 1

Returns:

    paddle.Tensor: output of current input xs, (B=1, chunk_size, hidden-dim)
    paddle.Tensor: new attention cache required for next chunk, with dynamic
        shape (elayers, head, T, d_k * 2) depending on required_cache_size
    paddle.Tensor: new conformer cnn cache required for next chunk, with the
        same shape as the original cnn_cache
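
A hedged streaming sketch (not from the source): it threads the caches returned by one forward_chunk call into the next. The chunk length 67 and required_cache_size are assumptions, taking 'conv2d' subsampling (rate 4, right context 6), a decoding chunk size of 16, and 4 left chunks, i.e. T = (16 - 1) * 4 + 6 + 1 = 67 and cache_t1 = 16 * 4.

    import paddle
    from paddlespeech.s2t.modules.encoder import ConformerEncoder

    encoder = ConformerEncoder(input_size=80, use_dynamic_chunk=True, causal=True)
    encoder.eval()

    chunk = paddle.randn([1, 67, 80])        # (B=1, T, D), T per the lead-in above
    att_cache = paddle.zeros([0, 0, 0, 0])   # empty caches for the first chunk
    cnn_cache = paddle.zeros([0, 0, 0, 0])

    ys, att_cache, cnn_cache = encoder.forward_chunk(
        chunk, offset=0, required_cache_size=16 * 4,
        att_cache=att_cache, cnn_cache=cnn_cache)
    # ys: (1, 16, 256); pass the updated caches to the next call and
    # advance offset by ys.shape[1].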

forward_chunk_by_chunk(xs: Tensor, decoding_chunk_size: int, num_decoding_left_chunks: int = -1) → Tuple[Tensor, Tensor][source]

Forward input chunk by chunk, with chunk_size, in a streaming fashion.

Here we should pay special attention to the computation cache in the streaming-style forward chunk by chunk. Three things should be taken into account for computation in the current network:

  1. transformer/conformer encoder layer output cache

  2. convolution in conformer

  3. convolution in subsampling

However, we don't implement a subsampling cache, because:

  1. We can make the subsampling module output the right result by overlapping the input instead of caching left context. Even though this wastes some computation, subsampling accounts for only a very small fraction of the computation in the whole model.

  2. Typically, the subsampling module contains several convolution layers, each with its own subsampling rate; caching across convolution layers with different subsampling rates is tricky and complicated.

  3. Currently, nn.Sequential is used to stack all the convolution layers in subsampling; we would need to rewrite it to make it work with a cache, which is not preferred.

Args:

    xs (paddle.Tensor): (1, max_len, dim)
    decoding_chunk_size (int): decoding chunk size
    num_decoding_left_chunks (int): decoding with num left chunks
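
A hedged sketch of simulated streaming over a whole utterance (the model configuration and shapes are assumptions; the encoder should have been built with chunk support, e.g. use_dynamic_chunk=True):

    import paddle
    from paddlespeech.s2t.modules.encoder import ConformerEncoder

    encoder = ConformerEncoder(input_size=80, use_dynamic_chunk=True, causal=True)
    encoder.eval()

    xs = paddle.randn([1, 1000, 80])   # (1, max_len, dim)
    ys, masks = encoder.forward_chunk_by_chunk(
        xs, decoding_chunk_size=16, num_decoding_left_chunks=4)
    # ys concatenates the per-chunk encoder outputs along the time axis.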

output_size() → int[source]
class paddlespeech.s2t.modules.encoder.ConformerEncoder(input_size: int, output_size: int = 256, attention_heads: int = 4, linear_units: int = 2048, num_blocks: int = 6, dropout_rate: float = 0.1, positional_dropout_rate: float = 0.1, attention_dropout_rate: float = 0.0, input_layer: str = 'conv2d', pos_enc_layer_type: str = 'rel_pos', normalize_before: bool = True, concat_after: bool = False, static_chunk_size: int = 0, use_dynamic_chunk: bool = False, global_cmvn: Optional[Layer] = None, use_dynamic_left_chunk: bool = False, positionwise_conv_kernel_size: int = 1, macaron_style: bool = True, selfattention_layer_type: str = 'rel_selfattn', activation_type: str = 'swish', use_cnn_module: bool = True, cnn_module_kernel: int = 15, causal: bool = False, cnn_module_norm: str = 'batch_norm', max_len: int = 5000)[source]

Bases: BaseEncoder

Conformer encoder module.
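
A hedged configuration sketch of the main constructor knobs (values mirror the defaults above except input_size, static_chunk_size and causal, which are illustrative choices for a streaming-capable model):

    from paddlespeech.s2t.modules.encoder import ConformerEncoder

    encoder = ConformerEncoder(
        input_size=80,                 # feature dim (assumed 80-dim fbank)
        output_size=256,               # encoder hidden dim
        attention_heads=4,
        num_blocks=6,
        pos_enc_layer_type='rel_pos',  # relative positional encoding (default)
        use_cnn_module=True,
        cnn_module_kernel=15,
        static_chunk_size=16,          # non-default: fixed 16-frame chunk
        causal=True,                   # non-default: causal convolution
    )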

Methods

__call__(*inputs, **kwargs)

Call self as a function.

add_parameter(name, parameter)

Adds a Parameter instance.

add_sublayer(name, sublayer)

Adds a sub Layer instance.

apply(fn)

Applies fn recursively to every sublayer (as returned by .sublayers()) as well as self.

buffers([include_sublayers])

Returns a list of all buffers from current layer and its sub-layers.

children()

Returns an iterator over immediate children layers.

clear_gradients()

Clear the gradients of all parameters for this layer.

create_parameter(shape[, attr, dtype, ...])

Create parameters for this layer.

create_tensor([name, persistable, dtype])

Create Tensor for this layer.

create_variable([name, persistable, dtype])

Create Tensor for this layer.

eval()

Sets this Layer and all its sublayers to evaluation mode.

extra_repr()

Extra representation of this layer; you can provide a custom implementation for your own layer.

forward(xs, xs_lens[, decoding_chunk_size, ...])

Embed positions in tensor.

forward_chunk(xs, offset, required_cache_size)

Forward just one chunk.

forward_chunk_by_chunk(xs, decoding_chunk_size)

Forward input chunk by chunk, with chunk_size, in a streaming fashion.

full_name()

Full name for this layer, composed of name_scope + "/" + MyLayer.__class__.__name__

load_dict(state_dict[, use_structured_name])

Set parameters and persistable buffers from state_dict.

named_buffers([prefix, include_sublayers])

Returns an iterator over all buffers in the Layer, yielding tuple of name and Tensor.

named_children()

Returns an iterator over immediate children layers, yielding both the name of the layer as well as the layer itself.

named_parameters([prefix, include_sublayers])

Returns an iterator over all parameters in the Layer, yielding tuple of name and parameter.

named_sublayers([prefix, include_self, ...])

Returns an iterator over all sublayers in the Layer, yielding tuple of name and sublayer.

parameters([include_sublayers])

Returns a list of all Parameters from current layer and its sub-layers.

register_buffer(name, tensor[, persistable])

Registers a tensor as buffer into the layer.

register_forward_post_hook(hook)

Register a forward post-hook for Layer.

register_forward_pre_hook(hook)

Register a forward pre-hook for Layer.

set_dict(state_dict[, use_structured_name])

Set parameters and persistable buffers from state_dict.

set_state_dict(state_dict[, use_structured_name])

Set parameters and persistable buffers from state_dict.

state_dict([destination, include_sublayers, ...])

Get all parameters and persistable buffers of current layer and its sub-layers.

sublayers([include_self])

Returns a list of sub layers.

to([device, dtype, blocking])

Cast the parameters and buffers of the Layer to the given device, dtype and blocking.

to_static_state_dict([destination, ...])

Get all parameters and buffers of current layer and its sub-layers.

train()

Sets this Layer and all its sublayers to training mode.

backward

output_size

register_state_dict_hook

class paddlespeech.s2t.modules.encoder.SqueezeformerEncoder(input_size: int, encoder_dim: int = 256, output_size: int = 256, attention_heads: int = 4, num_blocks: int = 12, reduce_idx: Optional[Union[int, List[int]]] = 5, recover_idx: Optional[Union[int, List[int]]] = 11, feed_forward_expansion_factor: int = 4, dw_stride: bool = False, input_dropout_rate: float = 0.1, pos_enc_layer_type: str = 'rel_pos', time_reduction_layer_type: str = 'conv1d', feed_forward_dropout_rate: float = 0.1, attention_dropout_rate: float = 0.1, cnn_module_kernel: int = 31, cnn_norm_type: str = 'layer_norm', dropout: float = 0.1, causal: bool = False, adaptive_scale: bool = True, activation_type: str = 'swish', init_weights: bool = True, global_cmvn: Optional[Layer] = None, normalize_before: bool = False, use_dynamic_chunk: bool = False, concat_after: bool = False, static_chunk_size: int = 0, use_dynamic_left_chunk: bool = False)[source]

Bases: Layer
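
A hedged sketch of the Squeezeformer-specific knobs (values are the defaults above except input_size, which is assumed): reduce_idx marks where the time resolution is reduced and recover_idx where it is restored.

    from paddlespeech.s2t.modules.encoder import SqueezeformerEncoder

    encoder = SqueezeformerEncoder(
        input_size=80,                      # feature dim (assumed)
        encoder_dim=256,
        num_blocks=12,
        reduce_idx=5,                       # reduce the time axis after block 5
        recover_idx=11,                     # recover time resolution at block 11
        time_reduction_layer_type='conv1d',
    )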

Methods

__call__(*inputs, **kwargs)

Call self as a function.

add_parameter(name, parameter)

Adds a Parameter instance.

add_sublayer(name, sublayer)

Adds a sub Layer instance.

apply(fn)

Applies fn recursively to every sublayer (as returned by .sublayers()) as well as self.

buffers([include_sublayers])

Returns a list of all buffers from current layer and its sub-layers.

children()

Returns an iterator over immediate children layers.

clear_gradients()

Clear the gradients of all parameters for this layer.

create_parameter(shape[, attr, dtype, ...])

Create parameters for this layer.

create_tensor([name, persistable, dtype])

Create Tensor for this layer.

create_variable([name, persistable, dtype])

Create Tensor for this layer.

eval()

Sets this Layer and all its sublayers to evaluation mode.

extra_repr()

Extra representation of this layer; you can provide a custom implementation for your own layer.

forward(xs, xs_lens[, decoding_chunk_size, ...])

Embed positions in tensor.

forward_chunk(xs, offset, required_cache_size)

Forward just one chunk

full_name()

Full name for this layer, composed of name_scope + "/" + MyLayer.__class__.__name__

load_dict(state_dict[, use_structured_name])

Set parameters and persistable buffers from state_dict.

named_buffers([prefix, include_sublayers])

Returns an iterator over all buffers in the Layer, yielding tuple of name and Tensor.

named_children()

Returns an iterator over immediate children layers, yielding both the name of the layer as well as the layer itself.

named_parameters([prefix, include_sublayers])

Returns an iterator over all parameters in the Layer, yielding tuple of name and parameter.

named_sublayers([prefix, include_self, ...])

Returns an iterator over all sublayers in the Layer, yielding tuple of name and sublayer.

parameters([include_sublayers])

Returns a list of all Parameters from current layer and its sub-layers.

register_buffer(name, tensor[, persistable])

Registers a tensor as buffer into the layer.

register_forward_post_hook(hook)

Register a forward post-hook for Layer.

register_forward_pre_hook(hook)

Register a forward pre-hook for Layer.

set_dict(state_dict[, use_structured_name])

Set parameters and persistable buffers from state_dict.

set_state_dict(state_dict[, use_structured_name])

Set parameters and persistable buffers from state_dict.

state_dict([destination, include_sublayers, ...])

Get all parameters and persistable buffers of current layer and its sub-layers.

sublayers([include_self])

Returns a list of sub layers.

to([device, dtype, blocking])

Cast the parameters and buffers of the Layer to the given device, dtype and blocking.

to_static_state_dict([destination, ...])

Get all parameters and buffers of current layer and its sub-layers.

train()

Sets this Layer and all its sublayers to training mode.

backward

calculate_downsampling_factor

check_ascending_list

output_size

register_state_dict_hook

calculate_downsampling_factor(i: int) → int[source]
check_ascending_list()[source]
forward(xs: Tensor, xs_lens: Tensor, decoding_chunk_size: int = 0, num_decoding_left_chunks: int = -1) → Tuple[Tensor, Tensor][source]

Embed positions in tensor.

Args:

    xs: padded input tensor (B, L, D)
    xs_lens: input length (B)
    decoding_chunk_size: decoding chunk size for dynamic chunk.
        0: default for training, use random dynamic chunk.
        <0: for decoding, use full chunk.
        >0: for decoding, use the fixed chunk size as set.
    num_decoding_left_chunks: number of left chunks; this is for decoding,
        where the chunk size is decoding_chunk_size.
        >=0: use num_decoding_left_chunks.
        <0: use all left chunks.

Returns:

    encoder output tensor, lens and mask

forward_chunk(xs: Tensor, offset: int, required_cache_size: int, att_cache: Tensor = <empty float32 Tensor, shape=[0, 0, 0, 0]>, cnn_cache: Tensor = <empty float32 Tensor, shape=[0, 0, 0, 0]>, att_mask: Tensor = <empty bool Tensor, shape=[0, 0, 0]>) → Tuple[Tensor, Tensor, Tensor][source]

Forward just one chunk.

Args:

    xs (paddle.Tensor): chunk input, with shape (b=1, time, mel-dim), where
        time == (chunk_size - 1) * subsample_rate + subsample.right_context + 1
    offset (int): current offset in encoder output time stamp
    required_cache_size (int): cache size required for next chunk computation.
        >=0: actual cache size.
        <0: all history cache is required.
    att_cache (paddle.Tensor): cache tensor for KEY & VALUE in
        transformer/conformer attention, with shape (elayers, head, cache_t1, d_k * 2),
        where head * d_k == hidden-dim and cache_t1 == chunk_size * num_decoding_left_chunks.
    cnn_cache (paddle.Tensor): cache tensor for cnn_module in conformer,
        (elayers, b=1, hidden-dim, cache_t2), where cache_t2 == cnn.lorder - 1

Returns:

    paddle.Tensor: output of current input xs, with shape
        (b=1, chunk_size, hidden-dim).
    paddle.Tensor: new attention cache required for next chunk, with dynamic
        shape (elayers, head, ?, d_k * 2) depending on required_cache_size.
    paddle.Tensor: new conformer cnn cache required for next chunk, with the
        same shape as the original cnn_cache.

output_size() → int[source]
class paddlespeech.s2t.modules.encoder.TransformerEncoder(input_size: int, output_size: int = 256, attention_heads: int = 4, linear_units: int = 2048, num_blocks: int = 6, dropout_rate: float = 0.1, positional_dropout_rate: float = 0.1, attention_dropout_rate: float = 0.0, input_layer: str = 'conv2d', pos_enc_layer_type: str = 'abs_pos', normalize_before: bool = True, concat_after: bool = False, static_chunk_size: int = 0, use_dynamic_chunk: bool = False, global_cmvn: Optional[Layer] = None, use_dynamic_left_chunk: bool = False)[source]

Bases: BaseEncoder

Transformer encoder module.
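
A minimal usage sketch (shapes and the 80-dim input are assumptions); the defaults above give absolute positional encoding and a 'conv2d' subsampling front end:

    import paddle
    from paddlespeech.s2t.modules.encoder import TransformerEncoder

    encoder = TransformerEncoder(input_size=80)
    encoder.eval()

    out, masks = encoder(paddle.randn([2, 100, 80]),
                         paddle.to_tensor([100, 80], dtype='int64'))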

Methods

__call__(*inputs, **kwargs)

Call self as a function.

add_parameter(name, parameter)

Adds a Parameter instance.

add_sublayer(name, sublayer)

Adds a sub Layer instance.

apply(fn)

Applies fn recursively to every sublayer (as returned by .sublayers()) as well as self.

buffers([include_sublayers])

Returns a list of all buffers from current layer and its sub-layers.

children()

Returns an iterator over immediate children layers.

clear_gradients()

Clear the gradients of all parameters for this layer.

create_parameter(shape[, attr, dtype, ...])

Create parameters for this layer.

create_tensor([name, persistable, dtype])

Create Tensor for this layer.

create_variable([name, persistable, dtype])

Create Tensor for this layer.

eval()

Sets this Layer and all its sublayers to evaluation mode.

extra_repr()

Extra representation of this layer; you can provide a custom implementation for your own layer.

forward(xs, xs_lens[, decoding_chunk_size, ...])

Embed positions in tensor.

forward_chunk(xs, offset, required_cache_size)

Forward just one chunk.

forward_chunk_by_chunk(xs, decoding_chunk_size)

Forward input chunk by chunk, with chunk_size, in a streaming fashion.

forward_one_step(xs, masks[, cache])

Encode input frame.

full_name()

Full name for this layer, composed of name_scope + "/" + MyLayer.__class__.__name__

load_dict(state_dict[, use_structured_name])

Set parameters and persistable buffers from state_dict.

named_buffers([prefix, include_sublayers])

Returns an iterator over all buffers in the Layer, yielding tuple of name and Tensor.

named_children()

Returns an iterator over immediate children layers, yielding both the name of the layer as well as the layer itself.

named_parameters([prefix, include_sublayers])

Returns an iterator over all parameters in the Layer, yielding tuple of name and parameter.

named_sublayers([prefix, include_self, ...])

Returns an iterator over all sublayers in the Layer, yielding tuple of name and sublayer.

parameters([include_sublayers])

Returns a list of all Parameters from current layer and its sub-layers.

register_buffer(name, tensor[, persistable])

Registers a tensor as buffer into the layer.

register_forward_post_hook(hook)

Register a forward post-hook for Layer.

register_forward_pre_hook(hook)

Register a forward pre-hook for Layer.

set_dict(state_dict[, use_structured_name])

Set parameters and persistable buffers from state_dict.

set_state_dict(state_dict[, use_structured_name])

Set parameters and persistable buffers from state_dict.

state_dict([destination, include_sublayers, ...])

Get all parameters and persistable buffers of current layer and its sub-layers.

sublayers([include_self])

Returns a list of sub layers.

to([device, dtype, blocking])

Cast the parameters and buffers of the Layer to the given device, dtype and blocking.

to_static_state_dict([destination, ...])

Get all parameters and buffers of current layer and its sub-layers.

train()

Sets this Layer and all its sublayers to training mode.

backward

output_size

register_state_dict_hook

forward_one_step(xs: Tensor, masks: Tensor, cache=None) → Tuple[Tensor, Tensor][source]

Encode input frame.

Args:

    xs (paddle.Tensor): (prefix) input tensor (B, T, D)
    masks (paddle.Tensor): mask tensor (B, T, T)
    cache (List[paddle.Tensor]): list of cache tensors

Returns:

    paddle.Tensor: output tensor
    paddle.Tensor: mask tensor
    List[paddle.Tensor]: list of new cache tensors
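
A hedged sketch of forward_one_step (shapes are assumptions). The Returns section above lists three values while the annotated return type is Tuple[Tensor, Tensor], so the result is kept packed here rather than unpacked:

    import paddle
    from paddlespeech.s2t.modules.encoder import TransformerEncoder

    encoder = TransformerEncoder(input_size=80)
    encoder.eval()

    xs = paddle.randn([1, 10, 80])                  # prefix features (B, T, D)
    masks = paddle.ones([1, 10, 10], dtype='bool')  # mask tensor (B, T, T)
    result = encoder.forward_one_step(xs, masks, cache=None)
    # result packs the encoded prefix plus mask/cache; see Returns above.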