paddlespeech.t2s.models.fastspeech2.fastspeech2 module

FastSpeech2-related modules for paddle.

class paddlespeech.t2s.models.fastspeech2.fastspeech2.FastSpeech2(idim: int, odim: int, adim: int = 384, aheads: int = 4, elayers: int = 6, eunits: int = 1536, dlayers: int = 6, dunits: int = 1536, postnet_layers: int = 5, postnet_chans: int = 512, postnet_filts: int = 5, postnet_dropout_rate: float = 0.5, positionwise_layer_type: str = 'conv1d', positionwise_conv_kernel_size: int = 1, use_scaled_pos_enc: bool = True, use_batch_norm: bool = True, encoder_normalize_before: bool = True, decoder_normalize_before: bool = True, encoder_concat_after: bool = False, decoder_concat_after: bool = False, reduction_factor: int = 1, encoder_type: str = 'transformer', decoder_type: str = 'transformer', transformer_enc_dropout_rate: float = 0.1, transformer_enc_positional_dropout_rate: float = 0.1, transformer_enc_attn_dropout_rate: float = 0.1, transformer_dec_dropout_rate: float = 0.1, transformer_dec_positional_dropout_rate: float = 0.1, transformer_dec_attn_dropout_rate: float = 0.1, transformer_activation_type: str = 'relu', conformer_pos_enc_layer_type: str = 'rel_pos', conformer_self_attn_layer_type: str = 'rel_selfattn', conformer_activation_type: str = 'swish', use_macaron_style_in_conformer: bool = True, use_cnn_in_conformer: bool = True, zero_triu: bool = False, conformer_enc_kernel_size: int = 7, conformer_dec_kernel_size: int = 31, cnn_dec_dropout_rate: float = 0.2, cnn_postnet_dropout_rate: float = 0.2, cnn_postnet_resblock_kernel_sizes: List[int] = [256, 256], cnn_postnet_kernel_size: int = 5, cnn_decoder_embedding_dim: int = 256, duration_predictor_layers: int = 2, duration_predictor_chans: int = 384, duration_predictor_kernel_size: int = 3, duration_predictor_dropout_rate: float = 0.1, energy_predictor_layers: int = 2, energy_predictor_chans: int = 384, energy_predictor_kernel_size: int = 3, energy_predictor_dropout: float = 0.5, energy_embed_kernel_size: int = 9, energy_embed_dropout: float = 0.5, stop_gradient_from_energy_predictor: bool = False, pitch_predictor_layers: int = 2, pitch_predictor_chans: int = 384, pitch_predictor_kernel_size: int = 3, pitch_predictor_dropout: float = 0.5, pitch_embed_kernel_size: int = 9, pitch_embed_dropout: float = 0.5, stop_gradient_from_pitch_predictor: bool = False, spk_num: Optional[int] = None, spk_embed_dim: Optional[int] = None, spk_embed_integration_type: str = 'add', tone_num: Optional[int] = None, tone_embed_dim: Optional[int] = None, tone_embed_integration_type: str = 'add', init_type: str = 'xavier_uniform', init_enc_alpha: float = 1.0, init_dec_alpha: float = 1.0, enable_speaker_classifier: bool = False, hidden_sc_dim: int = 256)[source]

Bases: Layer

FastSpeech2 module.

This is a module of FastSpeech2 described in FastSpeech 2: Fast and High-Quality End-to-End Text to Speech. Instead of quantized pitch and energy, we use the token-averaged values introduced in FastPitch: Parallel Text-to-speech with Pitch Prediction.


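The only required constructor arguments are idim and odim; everything else falls back to the defaults shown in the signature above. A minimal construction sketch (the toy sizes are illustrative, not taken from this page):

    from paddlespeech.t2s.models.fastspeech2.fastspeech2 import FastSpeech2

    # toy sizes (hypothetical): 100 input token ids, 80-dim output features
    vocab_size, n_mels = 100, 80
    model = FastSpeech2(
        idim=vocab_size,   # size of the input token vocabulary
        odim=n_mels,       # dimension of the output acoustic features (e.g. mel bins)
        adim=64,           # reduced from the 384 default to keep the toy model small
        aheads=2,
        elayers=2,
        dlayers=2,
        eunits=256,
        dunits=256,
    )
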
Methods

__call__(*inputs, **kwargs)

Call self as a function.

add_parameter(name, parameter)

Adds a Parameter instance.

add_sublayer(name, sublayer)

Adds a sub Layer instance.

apply(fn)

Applies fn recursively to every sublayer (as returned by .sublayers()) as well as self.

buffers([include_sublayers])

Returns a list of all buffers from current layer and its sub-layers.

children()

Returns an iterator over immediate children layers.

clear_gradients()

Clear the gradients of all parameters for this layer.

create_parameter(shape[, attr, dtype, ...])

Create parameters for this layer.

create_tensor([name, persistable, dtype])

Create Tensor for this layer.

create_variable([name, persistable, dtype])

Create Tensor for this layer.

eval()

Sets this Layer and all its sublayers to evaluation mode.

extra_repr()

Extra representation of this layer; you can provide a custom implementation for your own layer.

forward(text, text_lengths, speech, ...[, ...])

Calculate forward propagation.

full_name()

Full name of this layer, composed of name_scope + "/" + MyLayer.__class__.__name__.

inference(text[, durations, pitch, energy, ...])

Generate the sequence of features given the sequences of characters.

load_dict(state_dict[, use_structured_name])

Set parameters and persistable buffers from state_dict.

named_buffers([prefix, include_sublayers])

Returns an iterator over all buffers in the Layer, yielding tuple of name and Tensor.

named_children()

Returns an iterator over immediate children layers, yielding both the name of the layer as well as the layer itself.

named_parameters([prefix, include_sublayers])

Returns an iterator over all parameters in the Layer, yielding tuple of name and parameter.

named_sublayers([prefix, include_self, ...])

Returns an iterator over all sublayers in the Layer, yielding tuple of name and sublayer.

parameters([include_sublayers])

Returns a list of all Parameters from current layer and its sub-layers.

register_buffer(name, tensor[, persistable])

Registers a tensor as buffer into the layer.

register_forward_post_hook(hook)

Register a forward post-hook for Layer.

register_forward_pre_hook(hook)

Register a forward pre-hook for Layer.

set_dict(state_dict[, use_structured_name])

Set parameters and persistable buffers from state_dict.

set_state_dict(state_dict[, use_structured_name])

Set parameters and persistable buffers from state_dict.

state_dict([destination, include_sublayers, ...])

Get all parameters and persistable buffers of current layer and its sub-layers.

sublayers([include_self])

Returns a list of sub layers.

to([device, dtype, blocking])

Cast the parameters and buffers of this Layer to the given device, dtype and blocking.

to_static_state_dict([destination, ...])

Get all parameters and buffers of current layer and its sub-layers.

train()

Sets this Layer and all its sublayers to training mode.

backward

encoder_infer

register_state_dict_hook

encoder_infer(text: Tensor, spk_id=None, alpha: float = 1.0, spk_emb=None, tone_id=None) → Tuple[Tensor, Tensor, Tensor][source]
forward(text: Tensor, text_lengths: Tensor, speech: Tensor, speech_lengths: Tensor, durations: Tensor, pitch: Tensor, energy: Tensor, tone_id: Optional[Tensor] = None, spk_emb: Optional[Tensor] = None, spk_id: Optional[Tensor] = None) → Tuple[Tensor, Dict[str, Tensor], Tensor][source]

Calculate forward propagation.

Args:
text(Tensor(int64)): Batch of padded token ids (B, Tmax).
text_lengths(Tensor(int64)): Batch of lengths of each input (B,).
speech(Tensor): Batch of padded target features (B, Lmax, odim).
speech_lengths(Tensor(int64)): Batch of the lengths of each target (B,).
durations(Tensor(int64)): Batch of padded durations (B, Tmax).
pitch(Tensor): Batch of padded token-averaged pitch (B, Tmax, 1).
energy(Tensor): Batch of padded token-averaged energy (B, Tmax, 1).
tone_id(Tensor(int64), optional): Batch of padded tone ids (B, Tmax).
spk_emb(Tensor, optional): Batch of speaker embeddings (B, spk_embed_dim).
spk_id(Tensor(int64), optional): Batch of speaker ids (B,).

Returns:

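A training-style smoke test for forward (a sketch, not from the library docs), continuing the toy model above: the random tensors follow the documented shapes, and the durations are chosen so that each utterance's durations sum exactly to its target feature length.

    import paddle

    B, Tmax = 2, 10                      # toy batch size and padded token length
    frames_per_token = 5
    Lmax = Tmax * frames_per_token       # feature length implied by the durations below

    text = paddle.randint(1, vocab_size, shape=[B, Tmax])                 # (B, Tmax), int64
    text_lengths = paddle.full([B], Tmax, dtype='int64')                  # (B,)
    speech = paddle.rand([B, Lmax, n_mels])                               # (B, Lmax, odim)
    speech_lengths = paddle.full([B], Lmax, dtype='int64')                # (B,)
    durations = paddle.full([B, Tmax], frames_per_token, dtype='int64')   # (B, Tmax)
    pitch = paddle.rand([B, Tmax, 1])                                     # token-averaged pitch
    energy = paddle.rand([B, Tmax, 1])                                    # token-averaged energy

    # the returned tuple carries the decoder/predictor outputs that
    # FastSpeech2Loss.forward consumes during training
    outputs = model(text, text_lengths, speech, speech_lengths,
                    durations, pitch, energy)
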
inference(text: Tensor, durations: Optional[Tensor] = None, pitch: Optional[Tensor] = None, energy: Optional[Tensor] = None, alpha: float = 1.0, use_teacher_forcing: bool = False, spk_emb=None, spk_id=None, tone_id=None) → Tuple[Tensor, Tensor, Tensor][source]

Generate the sequence of features given the sequences of characters.

Args:
text(Tensor(int64)): Input sequence of characters (T,).
durations(Tensor(int64), optional): Ground truth of duration (T,).
pitch(Tensor, optional): Ground truth of token-averaged pitch (T, 1).
energy(Tensor, optional): Ground truth of token-averaged energy (T, 1).
alpha(float, optional): Alpha to control the speed.
use_teacher_forcing(bool, optional): Whether to use teacher forcing. If true, the ground truth of duration, pitch and energy will be used.
spk_emb(Tensor, optional): Speaker embedding vector (spk_embed_dim,). (Default value = None)
spk_id(Tensor(int64), optional): Speaker ids (1,). (Default value = None)
tone_id(Tensor(int64), optional): Tone ids (T,). (Default value = None)

Returns:

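A single-utterance sketch for inference, again reusing the toy model and vocab_size from above; phone_ids is an illustrative unbatched int64 sequence, and alpha controls the speed of the generated features.

    import paddle

    model.eval()
    with paddle.no_grad():
        phone_ids = paddle.randint(1, vocab_size, shape=[12])   # (T,), int64
        outs = model.inference(phone_ids, alpha=1.0)
    # per the signature above, the return is a Tuple[Tensor, Tensor, Tensor]
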
class paddlespeech.t2s.models.fastspeech2.fastspeech2.FastSpeech2Inference(normalizer, model)[source]

Bases: Layer

Methods

__call__(*inputs, **kwargs)

Call self as a function.

add_parameter(name, parameter)

Adds a Parameter instance.

add_sublayer(name, sublayer)

Adds a sub Layer instance.

apply(fn)

Applies fn recursively to every sublayer (as returned by .sublayers()) as well as self.

buffers([include_sublayers])

Returns a list of all buffers from current layer and its sub-layers.

children()

Returns an iterator over immediate children layers.

clear_gradients()

Clear the gradients of all parameters for this layer.

create_parameter(shape[, attr, dtype, ...])

Create parameters for this layer.

create_tensor([name, persistable, dtype])

Create Tensor for this layer.

create_variable([name, persistable, dtype])

Create Tensor for this layer.

eval()

Sets this Layer and all its sublayers to evaluation mode.

extra_repr()

Extra representation of this layer; you can provide a custom implementation for your own layer.

forward(text[, spk_id, spk_emb])

Defines the computation performed at every call.

full_name()

Full name of this layer, composed of name_scope + "/" + MyLayer.__class__.__name__.

load_dict(state_dict[, use_structured_name])

Set parameters and persistable buffers from state_dict.

named_buffers([prefix, include_sublayers])

Returns an iterator over all buffers in the Layer, yielding tuple of name and Tensor.

named_children()

Returns an iterator over immediate children layers, yielding both the name of the layer as well as the layer itself.

named_parameters([prefix, include_sublayers])

Returns an iterator over all parameters in the Layer, yielding tuple of name and parameter.

named_sublayers([prefix, include_self, ...])

Returns an iterator over all sublayers in the Layer, yielding tuple of name and sublayer.

parameters([include_sublayers])

Returns a list of all Parameters from current layer and its sub-layers.

register_buffer(name, tensor[, persistable])

Registers a tensor as buffer into the layer.

register_forward_post_hook(hook)

Register a forward post-hook for Layer.

register_forward_pre_hook(hook)

Register a forward pre-hook for Layer.

set_dict(state_dict[, use_structured_name])

Set parameters and persistable buffers from state_dict.

set_state_dict(state_dict[, use_structured_name])

Set parameters and persistable buffers from state_dict.

state_dict([destination, include_sublayers, ...])

Get all parameters and persistable buffers of current layer and its sub-layers.

sublayers([include_self])

Returns a list of sub layers.

to([device, dtype, blocking])

Cast the parameters and buffers of this Layer to the given device, dtype and blocking.

to_static_state_dict([destination, ...])

Get all parameters and buffers of current layer and its sub-layers.

train()

Sets this Layer and all its sublayers to training mode.

backward

register_state_dict_hook

forward(text, spk_id=None, spk_emb=None)[source]

Defines the computation performed at every call. Should be overridden by all subclasses.

Parameters:

*inputs(tuple): unpacked tuple arguments
**kwargs(dict): unpacked dict arguments

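A wrapper sketch: FastSpeech2Inference pairs a trained FastSpeech2 with a normalizer that maps the network output back to the feature domain. The ZScore layer and the statistics path below are assumptions drawn from common PaddleSpeech recipes, not from this page; model and phone_ids come from the earlier sketches.

    import numpy as np
    import paddle
    from paddlespeech.t2s.models.fastspeech2.fastspeech2 import FastSpeech2Inference
    from paddlespeech.t2s.modules.normalizer import ZScore   # assumed z-score (de)normalizer

    # hypothetical stats file holding the mean and std of the training features
    mu, sigma = np.load("dump/train/speech_stats.npy")
    normalizer = ZScore(paddle.to_tensor(mu), paddle.to_tensor(sigma))

    fs2_inference = FastSpeech2Inference(normalizer, model)   # model: a trained FastSpeech2
    fs2_inference.eval()
    with paddle.no_grad():
        mel = fs2_inference(phone_ids)   # phone_ids: Tensor(int64) of shape (T,)
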
class paddlespeech.t2s.models.fastspeech2.fastspeech2.FastSpeech2Loss(use_masking: bool = True, use_weighted_masking: bool = False)[source]

Bases: Layer

Loss function module for FastSpeech2.

Methods

__call__(*inputs, **kwargs)

Call self as a function.

add_parameter(name, parameter)

Adds a Parameter instance.

add_sublayer(name, sublayer)

Adds a sub Layer instance.

apply(fn)

Applies fn recursively to every sublayer (as returned by .sublayers()) as well as self.

buffers([include_sublayers])

Returns a list of all buffers from current layer and its sub-layers.

children()

Returns an iterator over immediate children layers.

clear_gradients()

Clear the gradients of all parameters for this layer.

create_parameter(shape[, attr, dtype, ...])

Create parameters for this layer.

create_tensor([name, persistable, dtype])

Create Tensor for this layer.

create_variable([name, persistable, dtype])

Create Tensor for this layer.

eval()

Sets this Layer and all its sublayers to evaluation mode.

extra_repr()

Extra representation of this layer; you can provide a custom implementation for your own layer.

forward(after_outs, before_outs, d_outs, ...)

Calculate forward propagation.

full_name()

Full name of this layer, composed of name_scope + "/" + MyLayer.__class__.__name__.

load_dict(state_dict[, use_structured_name])

Set parameters and persistable buffers from state_dict.

named_buffers([prefix, include_sublayers])

Returns an iterator over all buffers in the Layer, yielding tuple of name and Tensor.

named_children()

Returns an iterator over immediate children layers, yielding both the name of the layer as well as the layer itself.

named_parameters([prefix, include_sublayers])

Returns an iterator over all parameters in the Layer, yielding tuple of name and parameter.

named_sublayers([prefix, include_self, ...])

Returns an iterator over all sublayers in the Layer, yielding tuple of name and sublayer.

parameters([include_sublayers])

Returns a list of all Parameters from current layer and its sub-layers.

register_buffer(name, tensor[, persistable])

Registers a tensor as buffer into the layer.

register_forward_post_hook(hook)

Register a forward post-hook for Layer.

register_forward_pre_hook(hook)

Register a forward pre-hook for Layer.

set_dict(state_dict[, use_structured_name])

Set parameters and persistable buffers from state_dict.

set_state_dict(state_dict[, use_structured_name])

Set parameters and persistable buffers from state_dict.

state_dict([destination, include_sublayers, ...])

Get all parameters and persistable buffers of current layer and its sub-layers.

sublayers([include_self])

Returns a list of sub layers.

to([device, dtype, blocking])

Cast the parameters and buffers of this Layer to the given device, dtype and blocking.

to_static_state_dict([destination, ...])

Get all parameters and buffers of current layer and its sub-layers.

train()

Sets this Layer and all its sublayers to training mode.

backward

register_state_dict_hook

forward(after_outs: Tensor, before_outs: Tensor, d_outs: Tensor, p_outs: Tensor, e_outs: Tensor, ys: Tensor, ds: Tensor, ps: Tensor, es: Tensor, ilens: Tensor, olens: Tensor, spk_logits: Optional[Tensor] = None, spk_ids: Optional[Tensor] = None) → Tuple[Tensor, Tensor, Tensor, Tensor, Tensor][source]

Calculate forward propagation.

Args:
after_outs(Tensor): Batch of outputs after postnets (B, Lmax, odim).
before_outs(Tensor): Batch of outputs before postnets (B, Lmax, odim).
d_outs(Tensor): Batch of outputs of duration predictor (B, Tmax).
p_outs(Tensor): Batch of outputs of pitch predictor (B, Tmax, 1).
e_outs(Tensor): Batch of outputs of energy predictor (B, Tmax, 1).
ys(Tensor): Batch of target features (B, Lmax, odim).
ds(Tensor): Batch of durations (B, Tmax).
ps(Tensor): Batch of target token-averaged pitch (B, Tmax, 1).
es(Tensor): Batch of target token-averaged energy (B, Tmax, 1).
ilens(Tensor): Batch of the lengths of each input (B,).
olens(Tensor): Batch of the lengths of each target (B,).
spk_logits(Optional[Tensor]): Batch of outputs after the speaker classifier (B, Lmax, num_spk).
spk_ids(Optional[Tensor]): Batch of target speaker ids (B,).

Returns:

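A self-contained sketch of the loss call (not from the library docs) using random tensors with the documented shapes; in real training the predictor outputs come from FastSpeech2.forward and the targets come from the data loader.

    import paddle
    from paddlespeech.t2s.models.fastspeech2.fastspeech2 import FastSpeech2Loss

    B, Tmax, Lmax, odim = 2, 10, 50, 80   # toy sizes
    criterion = FastSpeech2Loss(use_masking=True)

    losses = criterion(
        after_outs=paddle.rand([B, Lmax, odim]),
        before_outs=paddle.rand([B, Lmax, odim]),
        d_outs=paddle.rand([B, Tmax]),
        p_outs=paddle.rand([B, Tmax, 1]),
        e_outs=paddle.rand([B, Tmax, 1]),
        ys=paddle.rand([B, Lmax, odim]),
        ds=paddle.full([B, Tmax], 5, dtype='int64'),
        ps=paddle.rand([B, Tmax, 1]),
        es=paddle.rand([B, Tmax, 1]),
        ilens=paddle.full([B], Tmax, dtype='int64'),
        olens=paddle.full([B], Lmax, dtype='int64'),
    )
    # per the signature above, the return is a tuple of five loss Tensors
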
class paddlespeech.t2s.models.fastspeech2.fastspeech2.StyleFastSpeech2Inference(normalizer, model, pitch_stats_path=None, energy_stats_path=None)[source]

Bases: FastSpeech2Inference

Methods

__call__(*inputs, **kwargs)

Call self as a function.

add_parameter(name, parameter)

Adds a Parameter instance.

add_sublayer(name, sublayer)

Adds a sub Layer instance.

apply(fn)

Applies fn recursively to every sublayer (as returned by .sublayers()) as well as self.

buffers([include_sublayers])

Returns a list of all buffers from current layer and its sub-layers.

children()

Returns an iterator over immediate children layers.

clear_gradients()

Clear the gradients of all parameters for this layer.

create_parameter(shape[, attr, dtype, ...])

Create parameters for this layer.

create_tensor([name, persistable, dtype])

Create Tensor for this layer.

create_variable([name, persistable, dtype])

Create Tensor for this layer.

eval()

Sets this Layer and all its sublayers to evaluation mode.

extra_repr()

Extra representation of this layer; you can provide a custom implementation for your own layer.

forward(text[, durations, durations_scale, ...])

Args:

full_name()

Full name of this layer, composed of name_scope + "/" + MyLayer.__class__.__name__.

load_dict(state_dict[, use_structured_name])

Set parameters and persistable buffers from state_dict.

named_buffers([prefix, include_sublayers])

Returns an iterator over all buffers in the Layer, yielding tuple of name and Tensor.

named_children()

Returns an iterator over immediate children layers, yielding both the name of the layer as well as the layer itself.

named_parameters([prefix, include_sublayers])

Returns an iterator over all parameters in the Layer, yielding tuple of name and parameter.

named_sublayers([prefix, include_self, ...])

Returns an iterator over all sublayers in the Layer, yielding tuple of name and sublayer.

parameters([include_sublayers])

Returns a list of all Parameters from current layer and its sub-layers.

register_buffer(name, tensor[, persistable])

Registers a tensor as buffer into the layer.

register_forward_post_hook(hook)

Register a forward post-hook for Layer.

register_forward_pre_hook(hook)

Register a forward pre-hook for Layer.

set_dict(state_dict[, use_structured_name])

Set parameters and persistable buffers from state_dict.

set_state_dict(state_dict[, use_structured_name])

Set parameters and persistable buffers from state_dict.

state_dict([destination, include_sublayers, ...])

Get all parameters and persistable buffers of current layer and its sub-layers.

sublayers([include_self])

Returns a list of sub layers.

to([device, dtype, blocking])

Cast the parameters and buffers of this Layer to the given device, dtype and blocking.

to_static_state_dict([destination, ...])

Get all parameters and buffers of current layer and its sub-layers.

train()

Sets this Layer and all its sublayers to training mode.

backward

denorm

norm

register_state_dict_hook

denorm(data, mean, std)[source]
forward(text: Tensor, durations: Optional[Union[Tensor, ndarray]] = None, durations_scale: Optional[Union[int, float]] = None, durations_bias: Optional[Union[int, float]] = None, pitch: Optional[Union[Tensor, ndarray]] = None, pitch_scale: Optional[Union[int, float]] = None, pitch_bias: Optional[Union[int, float]] = None, energy: Optional[Union[Tensor, ndarray]] = None, energy_scale: Optional[Union[int, float]] = None, energy_bias: Optional[Union[int, float]] = None, robot: bool = False, spk_emb=None, spk_id=None)[source]
Args:
text(Tensor(int64)): Input sequence of characters (T,).
durations(paddle.Tensor/np.ndarray, optional (int64)): Ground truth of duration (T,); this overrides durations_scale and durations_bias.
durations_scale(int/float, optional): Scale applied to the predicted durations.
durations_bias(int/float, optional): Bias added to the predicted durations.
pitch(paddle.Tensor/np.ndarray, optional): Ground truth of token-averaged pitch (T, 1); this overrides pitch_scale and pitch_bias.
pitch_scale(int/float, optional): Pitch scale, applied in the denormalized Hz domain.
pitch_bias(int/float, optional): Pitch bias, applied in the denormalized Hz domain.
energy(paddle.Tensor/np.ndarray, optional): Ground truth of token-averaged energy (T, 1); this overrides energy_scale and energy_bias.
energy_scale(int/float, optional): Energy scale, applied in the denormalized domain.
energy_bias(int/float, optional): Energy bias, applied in the denormalized domain.
robot(bool, optional): (Default value = False)
spk_emb(optional): (Default value = None)
spk_id(optional): (Default value = None)

Returns:

Tensor: log-mel spectrogram.

norm(data, mean, std)[source]
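A style-control sketch for StyleFastSpeech2Inference.forward; normalizer, model and phone_ids are reused from the earlier sketches, and the statistics paths are hypothetical. Per the argument descriptions above, the scale/bias arguments act on the predicted durations and on denormalized pitch/energy.

    import paddle
    from paddlespeech.t2s.models.fastspeech2.fastspeech2 import StyleFastSpeech2Inference

    style_inference = StyleFastSpeech2Inference(
        normalizer,
        model,
        pitch_stats_path="dump/train/pitch_stats.npy",      # hypothetical path
        energy_stats_path="dump/train/energy_stats.npy")    # hypothetical path
    style_inference.eval()

    with paddle.no_grad():
        # stretch durations by 1.25x and raise pitch by a factor of 1.2 in the Hz domain
        mel = style_inference(phone_ids, durations_scale=1.25, pitch_scale=1.2)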