paddlespeech.t2s.models.speedyspeech.speedyspeech module

class paddlespeech.t2s.models.speedyspeech.speedyspeech.DurationPredictor(hidden_size: int = 128)[source]

Bases: Layer


__call__(*inputs, **kwargs)

Call self as a function.

add_parameter(name, parameter)

Adds a Parameter instance.

add_sublayer(name, sublayer)

Adds a sub Layer instance.

apply(fn)

Applies fn recursively to every sublayer (as returned by .sublayers()) as well as self.

buffers([include_sublayers])

Returns a list of all buffers from the current layer and its sub-layers.

children()

Returns an iterator over immediate children layers.

clear_gradients()

Clears the gradients of all parameters for this layer.

create_parameter(shape[, attr, dtype, ...])

Creates parameters for this layer.

create_tensor([name, persistable, dtype])

Creates a Tensor for this layer.

create_variable([name, persistable, dtype])

Creates a Tensor for this layer.

eval()

Sets this Layer and all its sublayers to evaluation mode.

extra_repr()

Extra representation of this layer; you can provide a custom implementation for your own layer.

forward(x)

Calculate forward propagation.

full_name()

Full name for this layer, composed by name_scope + "/" + MyLayer.__class__.__name__

load_dict(state_dict[, use_structured_name])

Sets parameters and persistable buffers from state_dict.

named_buffers([prefix, include_sublayers])

Returns an iterator over all buffers in the Layer, yielding tuples of name and Tensor.

named_children()

Returns an iterator over immediate children layers, yielding both the name of the layer and the layer itself.

named_parameters([prefix, include_sublayers])

Returns an iterator over all parameters in the Layer, yielding tuples of name and parameter.

named_sublayers([prefix, include_self, ...])

Returns an iterator over all sublayers in the Layer, yielding tuples of name and sublayer.

parameters([include_sublayers])

Returns a list of all Parameters from the current layer and its sub-layers.

register_buffer(name, tensor[, persistable])

Registers a tensor as a buffer in the layer.

register_forward_post_hook(hook)

Registers a forward post-hook for the Layer.

register_forward_pre_hook(hook)

Registers a forward pre-hook for the Layer.

set_dict(state_dict[, use_structured_name])

Sets parameters and persistable buffers from state_dict.

set_state_dict(state_dict[, use_structured_name])

Sets parameters and persistable buffers from state_dict.

state_dict([destination, include_sublayers, ...])

Gets all parameters and persistable buffers of the current layer and its sub-layers.

sublayers([include_self])

Returns a list of sub layers.

to([device, dtype, blocking])

Casts the parameters and buffers of the Layer to the given device and dtype, with the given blocking setting.

to_static_state_dict([destination, ...])

Gets all parameters and buffers of the current layer and its sub-layers.

train()

Sets this Layer and all its sublayers to training mode.



forward(x: Tensor)[source]

Calculate forward propagation.

Args:

x(Tensor):

Batch of input sequences (B, Tmax, hidden_size).

Returns:

Tensor: Batch of predicted durations in log domain (B, Tmax).
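
Because the predictor works in the log domain, its output has to be mapped back to frame counts before it can be used to stretch the encoder output. The following is a minimal sketch of that conversion in plain Python. The helper name log_durations_to_frames and the exp-then-round rule are assumptions made for illustration; the rounding used by the library itself may differ.

```python
import math
from typing import List


def log_durations_to_frames(log_durations: List[float]) -> List[int]:
    """Convert log-domain durations to non-negative integer frame counts.

    Assumption: durations are recovered with exp() followed by rounding.
    """
    return [max(0, round(math.exp(d))) for d in log_durations]


# One utterance with three tokens, i.e. a single row of the (B, Tmax) output.
frames = log_durations_to_frames([0.0, math.log(3.0), math.log(5.0)])
print(frames)  # [1, 3, 5]
```

Strongly negative predictions round to zero frames, which effectively drops the token from the expanded sequence.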

class paddlespeech.t2s.models.speedyspeech.speedyspeech.ResidualBlock(channels: int = 128, kernel_size: int = 3, dilation: int = 3, n: int = 2)[source]

Bases: Layer





forward(x: Tensor)[source]

Calculate forward propagation.

Args:

x(Tensor):

Batch of input sequences (B, hidden_size, Tmax).

Returns:

Tensor: The residual output (B, hidden_size, Tmax).
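
The kernel_size, dilation and n arguments together determine how many time steps each output position can see. The sketch below works through that arithmetic, assuming a block is n stacked stride-1 convolutions that share the same kernel size and dilation (an assumption about the block's internals, not something stated on this page). The function names are hypothetical; the dilation list is the default from the SpeedySpeechEncoder signature in this module.

```python
from typing import Sequence


def block_receptive_field(kernel_size: int = 3, dilation: int = 3, n: int = 2) -> int:
    """Receptive field (in time steps) of n stacked stride-1 dilated convolutions."""
    return 1 + n * (kernel_size - 1) * dilation


def stack_receptive_field(dilations: Sequence[int], kernel_size: int = 3, n: int = 2) -> int:
    """Receptive field of consecutive blocks, one block per entry of dilations."""
    return 1 + sum(n * (kernel_size - 1) * d for d in dilations)


# Defaults of ResidualBlock: kernel_size=3, dilation=3, n=2.
print(block_receptive_field())  # 13

# Default encoder dilations, assuming one block with n=2 per entry.
encoder_dilations = [1, 3, 9, 27, 1, 3, 9, 27, 1, 1]
print(stack_receptive_field(encoder_dilations))  # 329
```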

class paddlespeech.t2s.models.speedyspeech.speedyspeech.SpeedySpeech(vocab_size, encoder_hidden_size: int = 128, encoder_kernel_size: int = 3, encoder_dilations: List[int] = [1, 3, 9, 27, 1, 3, 9, 27, 1, 1], duration_predictor_hidden_size: int = 128, decoder_hidden_size: int = 128, decoder_output_size: int = 80, decoder_kernel_size: int = 3, decoder_dilations: List[int] = [1, 3, 9, 27, 1, 3, 9, 27, 1, 3, 9, 27, 1, 3, 9, 27, 1, 1], tone_size: Optional[int] = None, spk_num: Optional[int] = None, init_type: str = 'xavier_uniform', positional_dropout_rate: int = 0.1)[source]

Bases: Layer





forward(text: Tensor, tones: Tensor, durations: Tensor, spk_id: Optional[Tensor] = None)[source]

Calculate forward propagation.

Args:

text(Tensor(int64)):

Batch of padded token ids (B, Tmax).

durations(Tensor(int64)):

Batch of padded durations (B, Tmax).

tones(Tensor, optional(int64)):

Batch of padded tone ids (B, Tmax).

spk_id(Tensor, optional(int64)):

Batch of speaker ids (B,).

Returns:

Tensor: Output tensor (B, T_frames, decoder_output_size).

Tensor: Predicted durations (B, Tmax).
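
The output time axis T_frames is longer than the token axis Tmax because each token is stretched according to its duration. A minimal sketch of that length regulation for a single utterance follows, assuming the usual duration-based approach of repeating each encoder vector durations[i] times. The helper expand_by_durations is hypothetical and only illustrates the shape relationship.

```python
from typing import List, Sequence


def expand_by_durations(encodings: Sequence[Sequence[float]],
                        durations: Sequence[int]) -> List[List[float]]:
    """Repeat the i-th encoding vector durations[i] times along the time axis."""
    if len(encodings) != len(durations):
        raise ValueError("encodings and durations must have the same length")
    frames: List[List[float]] = []
    for vector, count in zip(encodings, durations):
        frames.extend(list(vector) for _ in range(count))
    return frames


# Three tokens with 2-dimensional encodings; the middle token gets zero frames.
encodings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
durations = [2, 0, 3]
frames = expand_by_durations(encodings, durations)
print(len(frames))  # 5, i.e. T_frames == sum(durations)
```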

inference(text: Tensor, tones: Optional[Tensor] = None, durations: Optional[Tensor] = None, spk_id: Optional[Tensor] = None)[source]

Generate the sequence of features given the sequence of characters.

Args:

text(Tensor(int64)):

Input sequence of characters (T,).

tones(Tensor, optional(int64)):

Sequence of tone ids (T,).

durations(Tensor, optional(int64)):

Ground-truth durations (T,).

spk_id(Tensor, optional(int64)):

Speaker ids (1,). (Default value = None)

Returns:

Tensor: logmel (T, decoder_output_size).

class paddlespeech.t2s.models.speedyspeech.speedyspeech.SpeedySpeechDecoder(hidden_size: int = 128, output_size: int = 80, kernel_size: int = 3, dilations: List[int] = [1, 3, 9, 27, 1, 3, 9, 27, 1, 3, 9, 27, 1, 3, 9, 27, 1, 1])[source]

Bases: Layer






forward(x)[source]

Decode the input sequence.

Args:

x(Tensor):

Input tensor (B, time, hidden_size).

Returns:

Tensor: Output tensor (B, time, output_size).

class paddlespeech.t2s.models.speedyspeech.speedyspeech.SpeedySpeechEncoder(vocab_size: int, tone_size: int, hidden_size: int = 128, kernel_size: int = 3, dilations: List[int] = [1, 3, 9, 27, 1, 3, 9, 27, 1, 1], spk_num=None)[source]

Bases: Layer

SpeedySpeech encoder module.

Args:

vocab_size (int):

Dimension of the inputs.

tone_size (Optional[int]):

Number of tones.

hidden_size (int):

Number of encoder hidden units.

kernel_size (int):

Kernel size of encoder.

dilations (List[int]):

Dilations of encoder.

spk_num (Optional[int]):

Number of speakers.





forward(text: Tensor, tones: Tensor, spk_id: Optional[Tensor] = None)[source]

Encode the input sequence.

Args:

text(Tensor(int64)):

Batch of padded token ids (B, Tmax).

tones(Tensor, optional(int64)):

Batch of padded tone ids (B, Tmax).

spk_id(Tensor, optional(int64)):

Batch of speaker ids (B,).

Returns:

Tensor: Output tensor (B, Tmax, hidden_size).
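
Both text and tones are expected as padded (B, Tmax) batches, so variable-length sequences have to be brought to a common length first. The sketch below shows one way to do that in plain Python. The helper pad_batch and the padding id 0 are assumptions for illustration; use whatever padding index the model was configured with.

```python
from typing import List, Sequence


def pad_batch(sequences: Sequence[Sequence[int]], pad_id: int = 0) -> List[List[int]]:
    """Right-pad every sequence with pad_id to the longest length (Tmax)."""
    t_max = max((len(seq) for seq in sequences), default=0)
    return [list(seq) + [pad_id] * (t_max - len(seq)) for seq in sequences]


# Two utterances of length 3 and 2 become a (B=2, Tmax=3) batch.
text = pad_batch([[5, 8, 2], [7, 1]])
tones = pad_batch([[1, 3, 0], [2, 4]])
print(text)   # [[5, 8, 2], [7, 1, 0]]
print(tones)  # [[1, 3, 0], [2, 4, 0]]
```

The nested lists can then be converted to int64 tensors before being passed to forward.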

class paddlespeech.t2s.models.speedyspeech.speedyspeech.SpeedySpeechInference(normalizer, speedyspeech_model)[source]

Bases: Layer





forward(phones, tones, spk_id=None, durations=None)[source]

Defines the computation performed at every call. Should be overridden by all subclasses.

Args:

*inputs(tuple): unpacked tuple arguments

**kwargs(dict): unpacked dict arguments
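
The constructor takes a normalizer alongside the model, which suggests that this wrapper runs the acoustic model and then maps the normalized features back to their original scale. The sketch below illustrates that second step under a loud assumption: that the normalizer is a per-dimension z-score, so its inverse is z * std + mean. The function zscore_inverse and the statistics used here are hypothetical and not taken from the library.

```python
from typing import List, Sequence


def zscore_inverse(normalized: Sequence[Sequence[float]],
                   mean: Sequence[float],
                   std: Sequence[float]) -> List[List[float]]:
    """Undo per-dimension z-score normalization: x = z * std + mean."""
    return [[z * s + m for z, m, s in zip(frame, mean, std)] for frame in normalized]


# Two frames of a 2-dimensional normalized feature (made-up numbers).
normalized_mel = [[0.0, 1.0], [-1.0, 2.0]]
mean = [-4.0, -5.0]
std = [2.0, 0.5]
print(zscore_inverse(normalized_mel, mean, std))
# [[-4.0, -4.5], [-6.0, -4.0]]
```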

class paddlespeech.t2s.models.speedyspeech.speedyspeech.TextEmbedding(vocab_size: int, embedding_size: int, tone_vocab_size: Optional[int] = None, tone_embedding_size: Optional[int] = None, padding_idx: Optional[int] = None, tone_padding_idx: Optional[int] = None, concat: bool = False)[source]

Bases: Layer





forward(text: Tensor, tone: Optional[Tensor] = None)[source]

Calculate forward propagation.

Args:

text(Tensor(int64)):

Batch of padded token ids (B, Tmax).

tone(Tensor, optional(int64)):

Batch of padded tone ids (B, Tmax).

Returns:

Tensor: The embedded output (B, Tmax, embedding_size).
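
The concat flag in the constructor implies two ways of merging text and tone embeddings. The plain-Python sketch below illustrates both for a single sequence, assuming that concat=False means an element-wise sum (both tables share one feature size) and concat=True means joining the two vectors along the feature axis. The function embed and the lookup tables are hypothetical stand-ins for the layer's embedding weights.

```python
from typing import List, Optional, Sequence

Vector = List[float]


def embed(text: Sequence[int],
          text_table: Sequence[Vector],
          tone: Optional[Sequence[int]] = None,
          tone_table: Optional[Sequence[Vector]] = None,
          concat: bool = False) -> List[Vector]:
    """Look up text (and optionally tone) embeddings for one sequence."""
    text_vecs = [list(text_table[i]) for i in text]
    if tone is None or tone_table is None:
        return text_vecs
    tone_vecs = [list(tone_table[i]) for i in tone]
    if concat:
        # Feature sizes add up: text size + tone size.
        return [t + n for t, n in zip(text_vecs, tone_vecs)]
    # Element-wise sum; both tables must share the same feature size.
    return [[a + b for a, b in zip(t, n)] for t, n in zip(text_vecs, tone_vecs)]


text_table = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]
tone_table = [[0.0, 0.0], [10.0, 20.0]]
print(embed([1, 2], text_table, [1, 0], tone_table))
# [[11.0, 22.0], [3.0, 4.0]]
print(embed([1, 2], text_table, [1, 0], tone_table, concat=True))
# [[1.0, 2.0, 10.0, 20.0], [3.0, 4.0, 0.0, 0.0]]
```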