paddlespeech.s2t.modules.encoder module
Encoder definition.
- class paddlespeech.s2t.modules.encoder.BaseEncoder(input_size: int, output_size: int = 256, attention_heads: int = 4, linear_units: int = 2048, num_blocks: int = 6, dropout_rate: float = 0.1, positional_dropout_rate: float = 0.1, attention_dropout_rate: float = 0.0, input_layer: str = 'conv2d', pos_enc_layer_type: str = 'abs_pos', normalize_before: bool = True, concat_after: bool = False, static_chunk_size: int = 0, use_dynamic_chunk: bool = False, global_cmvn: Optional[Layer] = None, use_dynamic_left_chunk: bool = False, max_len: int = 5000)[source]
Bases:
Layer
Methods
__call__
(*inputs, **kwargs)Call self as a function.
add_parameter
(name, parameter)Adds a Parameter instance.
add_sublayer
(name, sublayer)Adds a sub Layer instance.
apply
(fn)Applies fn recursively to every sublayer (as returned by .sublayers()) as well as self.
buffers
([include_sublayers])Returns a list of all buffers from current layer and its sub-layers.
children
()Returns an iterator over immediate children layers.
clear_gradients
()Clear the gradients of all parameters for this layer.
create_parameter
(shape[, attr, dtype, ...])Create parameters for this layer.
create_tensor
([name, persistable, dtype])Create Tensor for this layer.
create_variable
([name, persistable, dtype])Create Tensor for this layer.
eval
()Sets this Layer and all its sublayers to evaluation mode.
extra_repr
()Extra representation of this layer, you can have custom implementation of your own layer.
forward
(xs, xs_lens[, decoding_chunk_size, ...])Embed positions in tensor. Args: xs: padded input tensor (B, L, D) xs_lens: input length (B) decoding_chunk_size: decoding chunk size for dynamic chunk 0: default for training, use random dynamic chunk. <0: for decoding, use full chunk. >0: for decoding, use fixed chunk size as set. num_decoding_left_chunks: number of left chunks, this is for decoding, the chunk size is decoding_chunk_size. >=0: use num_decoding_left_chunks <0: use all left chunks Returns: encoder output tensor, lens and mask.
forward_chunk
(xs, offset, required_cache_size)Forward just one chunk. Args: xs (paddle.Tensor): chunk audio feat input, [B=1, T, D], where T == (chunk_size - 1) * subsampling_rate + subsample.right_context + 1. offset (int): current offset in encoder output time stamp. required_cache_size (int): cache size required for next chunk computation; >=0: actual cache size, <0: all history cache is required. att_cache (paddle.Tensor): cache tensor for key & val in transformer/conformer attention. Shape is (elayers, head, cache_t1, d_k * 2), where `head * d_k == hidden-dim` and cache_t1 == chunk_size * num_decoding_left_chunks. cnn_cache (paddle.Tensor): cache tensor for cnn_module in conformer, (elayers, B=1, hidden-dim, cache_t2), where cache_t2 == cnn.lorder - 1. Returns: paddle.Tensor: output of current input xs, (B=1, chunk_size, hidden-dim). paddle.Tensor: new attention cache required for next chunk, dynamic shape (elayers, head, T, d_k * 2) depending on required_cache_size. paddle.Tensor: new conformer cnn cache required for next chunk, with same shape as the original cnn_cache.
forward_chunk_by_chunk
(xs, decoding_chunk_size)Forward input chunk by chunk with chunk_size, in a streaming fashion.
full_name
()Full name for this layer, composed by name_scope + "/" + MyLayer.__class__.__name__
load_dict
(state_dict[, use_structured_name])Set parameters and persistable buffers from state_dict.
named_buffers
([prefix, include_sublayers])Returns an iterator over all buffers in the Layer, yielding tuple of name and Tensor.
named_children
()Returns an iterator over immediate children layers, yielding both the name of the layer as well as the layer itself.
named_parameters
([prefix, include_sublayers])Returns an iterator over all parameters in the Layer, yielding tuple of name and parameter.
named_sublayers
([prefix, include_self, ...])Returns an iterator over all sublayers in the Layer, yielding tuple of name and sublayer.
parameters
([include_sublayers])Returns a list of all Parameters from current layer and its sub-layers.
register_buffer
(name, tensor[, persistable])Registers a tensor as buffer into the layer.
register_forward_post_hook
(hook)Register a forward post-hook for Layer.
register_forward_pre_hook
(hook)Register a forward pre-hook for Layer.
set_dict
(state_dict[, use_structured_name])Set parameters and persistable buffers from state_dict.
set_state_dict
(state_dict[, use_structured_name])Set parameters and persistable buffers from state_dict.
state_dict
([destination, include_sublayers, ...])Get all parameters and persistable buffers of current layer and its sub-layers.
sublayers
([include_self])Returns a list of sub layers.
to
([device, dtype, blocking])Cast the parameters and buffers of Layer by the given device, dtype and blocking.
to_static_state_dict
([destination, ...])Get all parameters and buffers of current layer and its sub-layers.
train
()Sets this Layer and all its sublayers to training mode.
backward
output_size
register_state_dict_hook
- forward(xs: Tensor, xs_lens: Tensor, decoding_chunk_size: int = 0, num_decoding_left_chunks: int = -1) -> Tuple[Tensor, Tensor] [source]
Embed positions in tensor.
- Args:
  - xs: padded input tensor (B, L, D)
  - xs_lens: input length (B)
  - decoding_chunk_size: decoding chunk size for dynamic chunk.
    0: default for training, use random dynamic chunk.
    <0: for decoding, use full chunk.
    >0: for decoding, use fixed chunk size as set.
  - num_decoding_left_chunks: number of left chunks, this is for decoding, the chunk size is decoding_chunk_size.
    >=0: use num_decoding_left_chunks
    <0: use all left chunks
- Returns:
  encoder output tensor, lens and mask
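The sign conventions of the two decoding arguments are easy to mix up. The following is a minimal sketch that only restates the rules listed above in code; `describe_chunk_mode` is a hypothetical helper written for illustration and is not part of PaddleSpeech.

```python
def describe_chunk_mode(decoding_chunk_size: int = 0,
                        num_decoding_left_chunks: int = -1) -> str:
    """Summarize how forward() interprets its two decoding arguments.

    Hypothetical helper: it mirrors the documented rules only and does
    not call into the encoder.
    """
    if decoding_chunk_size == 0:
        chunk = "training default: random dynamic chunk"
    elif decoding_chunk_size < 0:
        chunk = "decoding: full chunk"
    else:
        chunk = f"decoding: fixed chunk of {decoding_chunk_size}"

    if num_decoding_left_chunks < 0:
        left = "all left chunks"
    else:
        left = f"{num_decoding_left_chunks} left chunk(s)"

    return f"{chunk}; {left}"


print(describe_chunk_mode())         # defaults of forward()
print(describe_chunk_mode(16, 4))    # fixed-size streaming decode
print(describe_chunk_mode(-1))       # full-utterance decode
```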
- forward_chunk(xs: paddle.Tensor, offset: int, required_cache_size: int, att_cache: paddle.Tensor = Tensor(shape=[0, 0, 0, 0], dtype=float32, place=Place(cpu), stop_gradient=True, []), cnn_cache: paddle.Tensor = Tensor(shape=[0, 0, 0, 0], dtype=float32, place=Place(cpu), stop_gradient=True, []), att_mask: paddle.Tensor = Tensor(shape=[0, 0, 0], dtype=bool, place=Place(cpu), stop_gradient=True, [])) -> Tuple[Tensor, Tensor, Tensor] [source]
Forward just one chunk.
- Args:
  - xs (paddle.Tensor): chunk audio feat input, [B=1, T, D], where T == (chunk_size - 1) * subsampling_rate + subsample.right_context + 1
  - offset (int): current offset in encoder output time stamp
  - required_cache_size (int): cache size required for next chunk computation.
    >=0: actual cache size
    <0: all history cache is required
  - att_cache (paddle.Tensor): cache tensor for key & val in transformer/conformer attention. Shape is (elayers, head, cache_t1, d_k * 2), where `head * d_k == hidden-dim` and cache_t1 == chunk_size * num_decoding_left_chunks.
  - cnn_cache (paddle.Tensor): cache tensor for cnn_module in conformer, (elayers, B=1, hidden-dim, cache_t2), where cache_t2 == cnn.lorder - 1
- Returns:
  - paddle.Tensor: output of current input xs, (B=1, chunk_size, hidden-dim)
  - paddle.Tensor: new attention cache required for next chunk, dynamic shape (elayers, head, T, d_k * 2) depending on required_cache_size
  - paddle.Tensor: new conformer cnn cache required for next chunk, with same shape as the original cnn_cache
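The shape relations in the argument list can be checked with plain arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the example values (subsampling rate 4, right context 6, 12 layers, 4 heads, hidden size 256) are assumptions chosen for the demonstration, not values read from a PaddleSpeech configuration.

```python
from typing import Tuple


def chunk_input_frames(chunk_size: int, subsampling_rate: int,
                       right_context: int) -> int:
    """Input frames T needed to produce `chunk_size` encoder output frames."""
    return (chunk_size - 1) * subsampling_rate + right_context + 1


def att_cache_shape(elayers: int, head: int, hidden_dim: int,
                    chunk_size: int,
                    num_decoding_left_chunks: int) -> Tuple[int, int, int, int]:
    """Shape (elayers, head, cache_t1, d_k * 2) of the attention cache."""
    if hidden_dim % head != 0:
        raise ValueError("hidden_dim must be divisible by head")
    d_k = hidden_dim // head                      # head * d_k == hidden-dim
    cache_t1 = chunk_size * num_decoding_left_chunks
    return (elayers, head, cache_t1, d_k * 2)     # key and value concatenated


# Assumed example values, for illustration only.
print(chunk_input_frames(chunk_size=16, subsampling_rate=4, right_context=6))
print(att_cache_shape(elayers=12, head=4, hidden_dim=256,
                      chunk_size=16, num_decoding_left_chunks=4))
```

With these assumed values, a 16-frame output chunk consumes 67 input frames, and the attention cache has shape (12, 4, 64, 128).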
- forward_chunk_by_chunk(xs: Tensor, decoding_chunk_size: int, num_decoding_left_chunks: int = -1) -> Tuple[Tensor, Tensor] [source]
Forward input chunk by chunk with chunk_size, in a streaming fashion.
Forwarding chunk by chunk in streaming style requires special attention to the computation cache. Three things in the current network have to be taken into account:
  - transformer/conformer encoder layers output cache
  - convolution in conformer
  - convolution in subsampling
However, we don't implement a subsampling cache, for the following reasons:
  - The subsampling module can be made to output the right result by overlapping the input instead of caching the left context. This wastes some computation, but subsampling accounts for only a very small fraction of the computation in the whole model.
  - The subsampling module typically contains several convolution layers with subsampling, and caching across convolution layers with different subsampling rates is tricky and complicated.
  - Currently, nn.Sequential is used to stack all the convolution layers in subsampling; making it work with a cache would require rewriting it, which is not preferred.
- Args:
  - xs (paddle.Tensor): (1, max_len, dim)
  - decoding_chunk_size (int): decoding chunk size.
  - num_decoding_left_chunks (int): decoding with num left chunks.
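The idea of overlapping the input instead of caching the subsampling context can be sketched as a window computation over frame indices. This is a minimal sketch, not the library's implementation: `chunk_windows` is a hypothetical helper, the window length follows the formula documented for forward_chunk, and the stride of `subsampling_rate * decoding_chunk_size` input frames is an assumption about how consecutive chunks are spaced.

```python
from typing import List, Tuple


def chunk_windows(num_frames: int, decoding_chunk_size: int,
                  subsampling_rate: int,
                  right_context: int) -> List[Tuple[int, int]]:
    """Return (start, end) input-frame ranges, one per streaming chunk.

    Consecutive windows overlap, so the subsampling convolutions see
    their full context without a dedicated cache.
    """
    context = right_context + 1
    # Assumed stride: each chunk advances by the input frames that map to
    # `decoding_chunk_size` encoder output frames.
    stride = subsampling_rate * decoding_chunk_size
    # Window length from the documented formula for T.
    window = (decoding_chunk_size - 1) * subsampling_rate + context

    windows = []
    for start in range(0, num_frames - context + 1, stride):
        end = min(start + window, num_frames)
        windows.append((start, end))
    return windows


# Assumed example values: 200 input frames, chunk size 16,
# subsampling rate 4, right context 6.
for start, end in chunk_windows(200, 16, 4, 6):
    print(start, end)
```

With these assumed values each window is 67 frames long and starts 64 frames after the previous one, so neighbouring windows share 3 input frames.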
- class paddlespeech.s2t.modules.encoder.ConformerEncoder(input_size: int, output_size: int = 256, attention_heads: int = 4, linear_units: int = 2048, num_blocks: int = 6, dropout_rate: float = 0.1, positional_dropout_rate: float = 0.1, attention_dropout_rate: float = 0.0, input_layer: str = 'conv2d', pos_enc_layer_type: str = 'rel_pos', normalize_before: bool = True, concat_after: bool = False, static_chunk_size: int = 0, use_dynamic_chunk: bool = False, global_cmvn: Optional[Layer] = None, use_dynamic_left_chunk: bool = False, positionwise_conv_kernel_size: int = 1, macaron_style: bool = True, selfattention_layer_type: str = 'rel_selfattn', activation_type: str = 'swish', use_cnn_module: bool = True, cnn_module_kernel: int = 15, causal: bool = False, cnn_module_norm: str = 'batch_norm', max_len: int = 5000)[source]
Bases:
BaseEncoder
Conformer encoder module.
Methods
__call__
(*inputs, **kwargs)Call self as a function.
add_parameter
(name, parameter)Adds a Parameter instance.
add_sublayer
(name, sublayer)Adds a sub Layer instance.
apply
(fn)Applies fn recursively to every sublayer (as returned by .sublayers()) as well as self.
buffers
([include_sublayers])Returns a list of all buffers from current layer and its sub-layers.
children
()Returns an iterator over immediate children layers.
clear_gradients
()Clear the gradients of all parameters for this layer.
create_parameter
(shape[, attr, dtype, ...])Create parameters for this layer.
create_tensor
([name, persistable, dtype])Create Tensor for this layer.
create_variable
([name, persistable, dtype])Create Tensor for this layer.
eval
()Sets this Layer and all its sublayers to evaluation mode.
extra_repr
()Extra representation of this layer, you can have custom implementation of your own layer.
forward
(xs, xs_lens[, decoding_chunk_size, ...])Embed positions in tensor. Args: xs: padded input tensor (B, L, D) xs_lens: input length (B) decoding_chunk_size: decoding chunk size for dynamic chunk 0: default for training, use random dynamic chunk. <0: for decoding, use full chunk. >0: for decoding, use fixed chunk size as set. num_decoding_left_chunks: number of left chunks, this is for decoding, the chunk size is decoding_chunk_size. >=0: use num_decoding_left_chunks <0: use all left chunks Returns: encoder output tensor, lens and mask.
forward_chunk
(xs, offset, required_cache_size)Forward just one chunk. Args: xs (paddle.Tensor): chunk audio feat input, [B=1, T, D], where T == (chunk_size - 1) * subsampling_rate + subsample.right_context + 1. offset (int): current offset in encoder output time stamp. required_cache_size (int): cache size required for next chunk computation; >=0: actual cache size, <0: all history cache is required. att_cache (paddle.Tensor): cache tensor for key & val in transformer/conformer attention. Shape is (elayers, head, cache_t1, d_k * 2), where `head * d_k == hidden-dim` and cache_t1 == chunk_size * num_decoding_left_chunks. cnn_cache (paddle.Tensor): cache tensor for cnn_module in conformer, (elayers, B=1, hidden-dim, cache_t2), where cache_t2 == cnn.lorder - 1. Returns: paddle.Tensor: output of current input xs, (B=1, chunk_size, hidden-dim). paddle.Tensor: new attention cache required for next chunk, dynamic shape (elayers, head, T, d_k * 2) depending on required_cache_size. paddle.Tensor: new conformer cnn cache required for next chunk, with same shape as the original cnn_cache.
forward_chunk_by_chunk
(xs, decoding_chunk_size)Forward input chunk by chunk with chunk_size, in a streaming fashion.
full_name
()Full name for this layer, composed by name_scope + "/" + MyLayer.__class__.__name__
load_dict
(state_dict[, use_structured_name])Set parameters and persistable buffers from state_dict.
named_buffers
([prefix, include_sublayers])Returns an iterator over all buffers in the Layer, yielding tuple of name and Tensor.
named_children
()Returns an iterator over immediate children layers, yielding both the name of the layer as well as the layer itself.
named_parameters
([prefix, include_sublayers])Returns an iterator over all parameters in the Layer, yielding tuple of name and parameter.
named_sublayers
([prefix, include_self, ...])Returns an iterator over all sublayers in the Layer, yielding tuple of name and sublayer.
parameters
([include_sublayers])Returns a list of all Parameters from current layer and its sub-layers.
register_buffer
(name, tensor[, persistable])Registers a tensor as buffer into the layer.
register_forward_post_hook
(hook)Register a forward post-hook for Layer.
register_forward_pre_hook
(hook)Register a forward pre-hook for Layer.
set_dict
(state_dict[, use_structured_name])Set parameters and persistable buffers from state_dict.
set_state_dict
(state_dict[, use_structured_name])Set parameters and persistable buffers from state_dict.
state_dict
([destination, include_sublayers, ...])Get all parameters and persistable buffers of current layer and its sub-layers.
sublayers
([include_self])Returns a list of sub layers.
to
([device, dtype, blocking])Cast the parameters and buffers of Layer by the given device, dtype and blocking.
to_static_state_dict
([destination, ...])Get all parameters and buffers of current layer and its sub-layers.
train
()Sets this Layer and all its sublayers to training mode.
backward
output_size
register_state_dict_hook
- class paddlespeech.s2t.modules.encoder.SqueezeformerEncoder(input_size: int, encoder_dim: int = 256, output_size: int = 256, attention_heads: int = 4, num_blocks: int = 12, reduce_idx: Optional[Union[int, List[int]]] = 5, recover_idx: Optional[Union[int, List[int]]] = 11, feed_forward_expansion_factor: int = 4, dw_stride: bool = False, input_dropout_rate: float = 0.1, pos_enc_layer_type: str = 'rel_pos', time_reduction_layer_type: str = 'conv1d', feed_forward_dropout_rate: float = 0.1, attention_dropout_rate: float = 0.1, cnn_module_kernel: int = 31, cnn_norm_type: str = 'layer_norm', dropout: float = 0.1, causal: bool = False, adaptive_scale: bool = True, activation_type: str = 'swish', init_weights: bool = True, global_cmvn: Optional[Layer] = None, normalize_before: bool = False, use_dynamic_chunk: bool = False, concat_after: bool = False, static_chunk_size: int = 0, use_dynamic_left_chunk: bool = False)[source]
Bases:
Layer
Methods
__call__
(*inputs, **kwargs)Call self as a function.
add_parameter
(name, parameter)Adds a Parameter instance.
add_sublayer
(name, sublayer)Adds a sub Layer instance.
apply
(fn)Applies fn recursively to every sublayer (as returned by .sublayers()) as well as self.
buffers
([include_sublayers])Returns a list of all buffers from current layer and its sub-layers.
children
()Returns an iterator over immediate children layers.
clear_gradients
()Clear the gradients of all parameters for this layer.
create_parameter
(shape[, attr, dtype, ...])Create parameters for this layer.
create_tensor
([name, persistable, dtype])Create Tensor for this layer.
create_variable
([name, persistable, dtype])Create Tensor for this layer.
eval
()Sets this Layer and all its sublayers to evaluation mode.
extra_repr
()Extra representation of this layer, you can have custom implementation of your own layer.
forward
(xs, xs_lens[, decoding_chunk_size, ...])Embed positions in tensor. Args: xs: padded input tensor (B, L, D) xs_lens: input length (B) decoding_chunk_size: decoding chunk size for dynamic chunk 0: default for training, use random dynamic chunk. <0: for decoding, use full chunk. >0: for decoding, use fixed chunk size as set. num_decoding_left_chunks: number of left chunks, this is for decoding, the chunk size is decoding_chunk_size. >=0: use num_decoding_left_chunks <0: use all left chunks Returns: encoder output tensor, lens and mask.
forward_chunk
(xs, offset, required_cache_size)Forward just one chunk
full_name
()Full name for this layer, composed by name_scope + "/" + MyLayer.__class__.__name__
load_dict
(state_dict[, use_structured_name])Set parameters and persistable buffers from state_dict.
named_buffers
([prefix, include_sublayers])Returns an iterator over all buffers in the Layer, yielding tuple of name and Tensor.
named_children
()Returns an iterator over immediate children layers, yielding both the name of the layer as well as the layer itself.
named_parameters
([prefix, include_sublayers])Returns an iterator over all parameters in the Layer, yielding tuple of name and parameter.
named_sublayers
([prefix, include_self, ...])Returns an iterator over all sublayers in the Layer, yielding tuple of name and sublayer.
parameters
([include_sublayers])Returns a list of all Parameters from current layer and its sub-layers.
register_buffer
(name, tensor[, persistable])Registers a tensor as buffer into the layer.
register_forward_post_hook
(hook)Register a forward post-hook for Layer.
register_forward_pre_hook
(hook)Register a forward pre-hook for Layer.
set_dict
(state_dict[, use_structured_name])Set parameters and persistable buffers from state_dict.
set_state_dict
(state_dict[, use_structured_name])Set parameters and persistable buffers from state_dict.
state_dict
([destination, include_sublayers, ...])Get all parameters and persistable buffers of current layer and its sub-layers.
sublayers
([include_self])Returns a list of sub layers.
to
([device, dtype, blocking])Cast the parameters and buffers of Layer by the given device, dtype and blocking.
to_static_state_dict
([destination, ...])Get all parameters and buffers of current layer and its sub-layers.
train
()Sets this Layer and all its sublayers to training mode.
backward
calculate_downsampling_factor
check_ascending_list
output_size
register_state_dict_hook
- forward(xs: Tensor, xs_lens: Tensor, decoding_chunk_size: int = 0, num_decoding_left_chunks: int = -1) -> Tuple[Tensor, Tensor] [source]
Embed positions in tensor.
- Args:
  - xs: padded input tensor (B, L, D)
  - xs_lens: input length (B)
  - decoding_chunk_size: decoding chunk size for dynamic chunk.
    0: default for training, use random dynamic chunk.
    <0: for decoding, use full chunk.
    >0: for decoding, use fixed chunk size as set.
  - num_decoding_left_chunks: number of left chunks, this is for decoding, the chunk size is decoding_chunk_size.
    >=0: use num_decoding_left_chunks
    <0: use all left chunks
- Returns:
  encoder output tensor, lens and mask
- forward_chunk(xs: paddle.Tensor, offset: int, required_cache_size: int, att_cache: paddle.Tensor = Tensor(shape=[0, 0, 0, 0], dtype=float32, place=Place(cpu), stop_gradient=True, []), cnn_cache: paddle.Tensor = Tensor(shape=[0, 0, 0, 0], dtype=float32, place=Place(cpu), stop_gradient=True, []), att_mask: paddle.Tensor = Tensor(shape=[0, 0, 0], dtype=bool, place=Place(cpu), stop_gradient=True, [])) -> Tuple[Tensor, Tensor, Tensor] [source]
Forward just one chunk.
- Args:
  - xs (paddle.Tensor): chunk input, with shape (b=1, time, mel-dim), where time == (chunk_size - 1) * subsample_rate + subsample.right_context + 1
  - offset (int): current offset in encoder output time stamp
  - required_cache_size (int): cache size required for next chunk computation.
    >=0: actual cache size
    <0: all history cache is required
  - att_cache (paddle.Tensor): cache tensor for KEY & VALUE in transformer/conformer attention, with shape (elayers, head, cache_t1, d_k * 2), where head * d_k == hidden-dim and cache_t1 == chunk_size * num_decoding_left_chunks.
  - cnn_cache (paddle.Tensor): cache tensor for cnn_module in conformer, (elayers, b=1, hidden-dim, cache_t2), where cache_t2 == cnn.lorder - 1
- Returns:
  - paddle.Tensor: output of current input xs, with shape (b=1, chunk_size, hidden-dim).
  - paddle.Tensor: new attention cache required for next chunk, with dynamic shape (elayers, head, ?, d_k * 2) depending on required_cache_size.
  - paddle.Tensor: new conformer cnn cache required for next chunk, with same shape as the original cnn_cache.
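The effect of required_cache_size on the time axis of the returned attention cache can be illustrated with ordinary lists standing in for cached frames. This is a sketch under assumptions: `trim_cache` is a hypothetical helper, and treating 0 as "keep nothing" and a positive value as "keep at most that many of the most recent frames" is one reading of the ">=0: actual cache size" rule rather than a statement about the library's internals.

```python
from typing import List


def trim_cache(cached: List[int], new: List[int],
               required_cache_size: int) -> List[int]:
    """Combine old and new frames, then keep what the next chunk needs.

    Lists of frame indices stand in for the time axis of att_cache.
    """
    combined = cached + new
    if required_cache_size < 0:
        # Negative: the whole history is required.
        return combined
    if required_cache_size == 0:
        # Zero: nothing is carried over to the next chunk.
        return []
    # Positive: keep at most the most recent `required_cache_size` frames.
    return combined[-required_cache_size:]


history = [0, 1, 2]
chunk = [3, 4]
print(trim_cache(history, chunk, -1))
print(trim_cache(history, chunk, 0))
print(trim_cache(history, chunk, 3))
```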
- class paddlespeech.s2t.modules.encoder.TransformerEncoder(input_size: int, output_size: int = 256, attention_heads: int = 4, linear_units: int = 2048, num_blocks: int = 6, dropout_rate: float = 0.1, positional_dropout_rate: float = 0.1, attention_dropout_rate: float = 0.0, input_layer: str = 'conv2d', pos_enc_layer_type: str = 'abs_pos', normalize_before: bool = True, concat_after: bool = False, static_chunk_size: int = 0, use_dynamic_chunk: bool = False, global_cmvn: Optional[Layer] = None, use_dynamic_left_chunk: bool = False)[source]
Bases:
BaseEncoder
Transformer encoder module.
Methods
__call__
(*inputs, **kwargs)Call self as a function.
add_parameter
(name, parameter)Adds a Parameter instance.
add_sublayer
(name, sublayer)Adds a sub Layer instance.
apply
(fn)Applies fn recursively to every sublayer (as returned by .sublayers()) as well as self.
buffers
([include_sublayers])Returns a list of all buffers from current layer and its sub-layers.
children
()Returns an iterator over immediate children layers.
clear_gradients
()Clear the gradients of all parameters for this layer.
create_parameter
(shape[, attr, dtype, ...])Create parameters for this layer.
create_tensor
([name, persistable, dtype])Create Tensor for this layer.
create_variable
([name, persistable, dtype])Create Tensor for this layer.
eval
()Sets this Layer and all its sublayers to evaluation mode.
extra_repr
()Extra representation of this layer, you can have custom implementation of your own layer.
forward
(xs, xs_lens[, decoding_chunk_size, ...])Embed positions in tensor. Args: xs: padded input tensor (B, L, D) xs_lens: input length (B) decoding_chunk_size: decoding chunk size for dynamic chunk 0: default for training, use random dynamic chunk. <0: for decoding, use full chunk. >0: for decoding, use fixed chunk size as set. num_decoding_left_chunks: number of left chunks, this is for decoding, the chunk size is decoding_chunk_size. >=0: use num_decoding_left_chunks <0: use all left chunks Returns: encoder output tensor, lens and mask.
forward_chunk
(xs, offset, required_cache_size)Forward just one chunk. Args: xs (paddle.Tensor): chunk audio feat input, [B=1, T, D], where T == (chunk_size - 1) * subsampling_rate + subsample.right_context + 1. offset (int): current offset in encoder output time stamp. required_cache_size (int): cache size required for next chunk computation; >=0: actual cache size, <0: all history cache is required. att_cache (paddle.Tensor): cache tensor for key & val in transformer/conformer attention. Shape is (elayers, head, cache_t1, d_k * 2), where `head * d_k == hidden-dim` and cache_t1 == chunk_size * num_decoding_left_chunks. cnn_cache (paddle.Tensor): cache tensor for cnn_module in conformer, (elayers, B=1, hidden-dim, cache_t2), where cache_t2 == cnn.lorder - 1. Returns: paddle.Tensor: output of current input xs, (B=1, chunk_size, hidden-dim). paddle.Tensor: new attention cache required for next chunk, dynamic shape (elayers, head, T, d_k * 2) depending on required_cache_size. paddle.Tensor: new conformer cnn cache required for next chunk, with same shape as the original cnn_cache.
forward_chunk_by_chunk
(xs, decoding_chunk_size)Forward input chunk by chunk with chunk_size, in a streaming fashion.
forward_one_step
(xs, masks[, cache])Encode input frame.
full_name
()Full name for this layer, composed by name_scope + "/" + MyLayer.__class__.__name__
load_dict
(state_dict[, use_structured_name])Set parameters and persistable buffers from state_dict.
named_buffers
([prefix, include_sublayers])Returns an iterator over all buffers in the Layer, yielding tuple of name and Tensor.
named_children
()Returns an iterator over immediate children layers, yielding both the name of the layer as well as the layer itself.
named_parameters
([prefix, include_sublayers])Returns an iterator over all parameters in the Layer, yielding tuple of name and parameter.
named_sublayers
([prefix, include_self, ...])Returns an iterator over all sublayers in the Layer, yielding tuple of name and sublayer.
parameters
([include_sublayers])Returns a list of all Parameters from current layer and its sub-layers.
register_buffer
(name, tensor[, persistable])Registers a tensor as buffer into the layer.
register_forward_post_hook
(hook)Register a forward post-hook for Layer.
register_forward_pre_hook
(hook)Register a forward pre-hook for Layer.
set_dict
(state_dict[, use_structured_name])Set parameters and persistable buffers from state_dict.
set_state_dict
(state_dict[, use_structured_name])Set parameters and persistable buffers from state_dict.
state_dict
([destination, include_sublayers, ...])Get all parameters and persistable buffers of current layer and its sub-layers.
sublayers
([include_self])Returns a list of sub layers.
to
([device, dtype, blocking])Cast the parameters and buffers of Layer by the given device, dtype and blocking.
to_static_state_dict
([destination, ...])Get all parameters and buffers of current layer and its sub-layers.
train
()Sets this Layer and all its sublayers to training mode.
backward
output_size
register_state_dict_hook
- forward_one_step(xs: Tensor, masks: Tensor, cache=None) -> Tuple[Tensor, Tensor] [source]
Encode input frame.
- Args:
  - xs (paddle.Tensor): (Prefix) Input tensor. (B, T, D)
  - masks (paddle.Tensor): Mask tensor. (B, T, T)
  - cache (List[paddle.Tensor]): List of cache tensors.
- Returns:
  - paddle.Tensor: Output tensor.
  - paddle.Tensor: Mask tensor.
  - List[paddle.Tensor]: List of new cache tensors.