paddlespeech.t2s.modules.nets_utils module
- paddlespeech.t2s.modules.nets_utils.get_random_segments(x: Tensor, x_lengths: Tensor, segment_size: int) Tuple[Tensor, Tensor] [source]
Get random segments.
- Args:
- x (Tensor):
Input tensor (B, C, T).
- x_lengths (Tensor):
Length tensor (B,).
- segment_size (int):
Segment size.
- Returns:
- Tensor:
Segmented tensor (B, C, segment_size).
- Tensor:
Start index tensor (B,).
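The slicing logic can be illustrated with a minimal NumPy sketch (the name `get_random_segments_np` and the uniform sampling of start indices are assumptions for this sketch, not the PaddlePaddle implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def get_random_segments_np(x, x_lengths, segment_size):
    """Pick one random segment of `segment_size` frames per batch item."""
    b, c, t = x.shape
    # Latest valid start index per item, clipped to be non-negative.
    max_start = np.maximum(x_lengths - segment_size, 0)
    start_idxs = (rng.random(b) * (max_start + 1)).astype(np.int64)
    segments = np.stack(
        [x[i, :, s:s + segment_size] for i, s in enumerate(start_idxs)])
    return segments, start_idxs

x = np.arange(2 * 1 * 10, dtype=np.float64).reshape(2, 1, 10)
segs, starts = get_random_segments_np(x, np.array([10, 6]), 4)
# segs has shape (2, 1, 4); each start index respects its item's length.
```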
- paddlespeech.t2s.modules.nets_utils.get_seg_pos(speech_pad: Tensor, text_pad: Tensor, align_start: Tensor, align_end: Tensor, align_start_lens: Tensor, seg_emb: bool = False)[source]
- Args:
- speech_pad (paddle.Tensor):
input speech (B, Tmax, D).
- text_pad (paddle.Tensor):
input text (B, Tmax2).
- align_start (paddle.Tensor):
frame level phone alignment start (B, Tmax2).
- align_end (paddle.Tensor):
frame level phone alignment end (B, Tmax2).
- align_start_lens (paddle.Tensor):
length of align_start (B, ).
- seg_emb (bool):
whether to use segment embedding.
- Returns:
- paddle.Tensor[int]: n-th phone of each mel, 0<=n<=Tmax2 (B, Tmax).
eg: Tensor(shape=[1, 328], dtype=int64, place=Place(gpu:0), stop_gradient=True, [[0 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 2 , 2 , 2 , 3 , 3 , 3 , 4 , 4 , 4 , 5 , 5 , 5 , 6 , 6 , 6 , 6 , 6 , 6 , 6 , 6 , 7 , 7 , 7 , 7 , 7 , 7 , 7 , 7 , 8 , 8 , 8 , 8 , 9 , 9 , 9 , 9 , 9 , 9 , 9 , 9 , 10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 17, 17, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 24, 24, 24, 24, 25, 25, 25, 25, 26, 26, 26, 27, 27, 27, 27, 27, 28, 28, 28, 28, 28, 28, 28, 28, 29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 35, 35, 36, 36, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 0 , 0 ]])
- paddle.Tensor[int]: n-th phone of each phone, 0<=n<=Tmax2 (B, Tmax2).
eg: Tensor(shape=[1, 38], dtype=int64, place=Place(gpu:0), stop_gradient=True,
[[1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38]])
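The frame-level output above can be reproduced with a small NumPy sketch: each frame inside a phone's alignment span gets that phone's 1-based index, and frames outside any span stay 0 (the helper name `get_seg_pos_np` is an assumption for illustration):

```python
import numpy as np

def get_seg_pos_np(n_frames, align_start, align_end):
    """Assign each frame the 1-based index of the phone it belongs to;
    frames outside every phone span remain 0."""
    seg_pos = np.zeros(n_frames, dtype=np.int64)
    for n, (s, e) in enumerate(zip(align_start, align_end), start=1):
        seg_pos[s:e] = n
    return seg_pos

# Three phones covering frames [0, 3), [3, 5), [5, 8) of a 10-frame mel.
seg_pos = get_seg_pos_np(10, [0, 3, 5], [3, 5, 8])
# seg_pos -> [1, 1, 1, 2, 2, 3, 3, 3, 0, 0]
```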
- paddlespeech.t2s.modules.nets_utils.get_segments(x: Tensor, start_idxs: Tensor, segment_size: int) Tensor [source]
Get segments.
- Args:
- x (Tensor):
Input tensor (B, C, T).
- start_idxs (Tensor):
Start index tensor (B,).
- segment_size (int):
Segment size.
- Returns:
Tensor: Segmented tensor (B, C, segment_size).
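Unlike get_random_segments, this variant takes the start indices as input. A minimal NumPy sketch of the deterministic slicing (the name `get_segments_np` is an assumption for illustration):

```python
import numpy as np

def get_segments_np(x, start_idxs, segment_size):
    """Slice a fixed-size window per batch item at the given start index."""
    return np.stack(
        [x[i, :, s:s + segment_size] for i, s in enumerate(start_idxs)])

x = np.arange(12, dtype=np.float64).reshape(2, 1, 6)
out = get_segments_np(x, np.array([1, 2]), 3)
# out -> [[[1., 2., 3.]], [[8., 9., 10.]]]
```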
- paddlespeech.t2s.modules.nets_utils.initialize(model: Layer, init: str)[source]
Initialize the weights of a neural network module.
Parameters are initialized using the given method or distribution.
Custom initialization routines can be implemented in submodules.
- Args:
- model (nn.Layer):
Target.
- init (str):
Method of initialization.
- paddlespeech.t2s.modules.nets_utils.make_non_pad_mask(lengths, xs=None, length_dim=-1)[source]
Make mask tensor containing indices of non-padded part.
- Args:
- lengths (Tensor(int64) or List):
Batch of lengths (B,).
- xs (Tensor, optional):
The reference tensor. If set, masks will be the same shape as this tensor.
- length_dim (int, optional):
Dimension indicator of the above tensor. See the example.
- Returns:
- Tensor(bool):
Mask tensor containing indices of the non-padded part.
- Examples:
With only lengths.
>>> lengths = [5, 3, 2]
>>> make_non_pad_mask(lengths)
masks = [[1, 1, 1, 1, 1],
         [1, 1, 1, 0, 0],
         [1, 1, 0, 0, 0]]
With the reference tensor.
>>> xs = paddle.zeros((3, 2, 4))
>>> make_non_pad_mask(lengths, xs)
tensor([[[1, 1, 1, 1],
         [1, 1, 1, 1]],
        [[1, 1, 1, 0],
         [1, 1, 1, 0]],
        [[1, 1, 0, 0],
         [1, 1, 0, 0]]])
>>> xs = paddle.zeros((3, 2, 6))
>>> make_non_pad_mask(lengths, xs)
tensor([[[1, 1, 1, 1, 1, 0],
         [1, 1, 1, 1, 1, 0]],
        [[1, 1, 1, 0, 0, 0],
         [1, 1, 1, 0, 0, 0]],
        [[1, 1, 0, 0, 0, 0],
         [1, 1, 0, 0, 0, 0]]])
With the reference tensor and dimension indicator.
>>> xs = paddle.zeros((3, 6, 6))
>>> make_non_pad_mask(lengths, xs, 1)
tensor([[[1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1],
         [0, 0, 0, 0, 0, 0]],
        [[1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0]],
        [[1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0]]])
>>> make_non_pad_mask(lengths, xs, 2)
tensor([[[1, 1, 1, 1, 1, 0],
         [1, 1, 1, 1, 1, 0],
         [1, 1, 1, 1, 1, 0],
         [1, 1, 1, 1, 1, 0],
         [1, 1, 1, 1, 1, 0],
         [1, 1, 1, 1, 1, 0]],
        [[1, 1, 1, 0, 0, 0],
         [1, 1, 1, 0, 0, 0],
         [1, 1, 1, 0, 0, 0],
         [1, 1, 1, 0, 0, 0],
         [1, 1, 1, 0, 0, 0],
         [1, 1, 1, 0, 0, 0]],
        [[1, 1, 0, 0, 0, 0],
         [1, 1, 0, 0, 0, 0],
         [1, 1, 0, 0, 0, 0],
         [1, 1, 0, 0, 0, 0],
         [1, 1, 0, 0, 0, 0],
         [1, 1, 0, 0, 0, 0]]])
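The basic lengths-only case boils down to a broadcast comparison between a frame-index range and the per-item lengths. A minimal NumPy sketch of that idea (the name `make_non_pad_mask_np` is an assumption for illustration):

```python
import numpy as np

def make_non_pad_mask_np(lengths, max_len=None):
    """True where the frame index is inside the sequence, False in the pad."""
    lengths = np.asarray(lengths)
    if max_len is None:
        max_len = int(lengths.max())
    # Broadcast (max_len,) against (B, 1) -> (B, max_len).
    return np.arange(max_len) < lengths[:, None]

mask = make_non_pad_mask_np([5, 3, 2])
# mask (as ints) -> [[1, 1, 1, 1, 1], [1, 1, 1, 0, 0], [1, 1, 0, 0, 0]]
```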
- paddlespeech.t2s.modules.nets_utils.make_pad_mask(lengths, xs=None, length_dim=-1)[source]
Make mask tensor containing indices of padded part.
- Args:
- lengths (Tensor(int64)):
Batch of lengths (B,).
- xs (Tensor, optional):
The reference tensor. If set, masks will be the same shape as this tensor.
- length_dim (int, optional):
Dimension indicator of the above tensor. See the example.
- Returns:
Tensor(bool): Mask tensor containing indices of the padded part.
- Examples:
With only lengths.
>>> lengths = [5, 3, 2]
>>> make_pad_mask(lengths)
masks = [[0, 0, 0, 0, 0],
         [0, 0, 0, 1, 1],
         [0, 0, 1, 1, 1]]
With the reference tensor.
>>> xs = paddle.zeros((3, 2, 4))
>>> make_pad_mask(lengths, xs)
tensor([[[0, 0, 0, 0],
         [0, 0, 0, 0]],
        [[0, 0, 0, 1],
         [0, 0, 0, 1]],
        [[0, 0, 1, 1],
         [0, 0, 1, 1]]])
>>> xs = paddle.zeros((3, 2, 6))
>>> make_pad_mask(lengths, xs)
tensor([[[0, 0, 0, 0, 0, 1],
         [0, 0, 0, 0, 0, 1]],
        [[0, 0, 0, 1, 1, 1],
         [0, 0, 0, 1, 1, 1]],
        [[0, 0, 1, 1, 1, 1],
         [0, 0, 1, 1, 1, 1]]])
With the reference tensor and dimension indicator.
>>> xs = paddle.zeros((3, 6, 6))
>>> make_pad_mask(lengths, xs, 1)
tensor([[[0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [1, 1, 1, 1, 1, 1]],
        [[0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1]],
        [[0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1]]])
>>> make_pad_mask(lengths, xs, 2)
tensor([[[0, 0, 0, 0, 0, 1],
         [0, 0, 0, 0, 0, 1],
         [0, 0, 0, 0, 0, 1],
         [0, 0, 0, 0, 0, 1],
         [0, 0, 0, 0, 0, 1],
         [0, 0, 0, 0, 0, 1]],
        [[0, 0, 0, 1, 1, 1],
         [0, 0, 0, 1, 1, 1],
         [0, 0, 0, 1, 1, 1],
         [0, 0, 0, 1, 1, 1],
         [0, 0, 0, 1, 1, 1],
         [0, 0, 0, 1, 1, 1]],
        [[0, 0, 1, 1, 1, 1],
         [0, 0, 1, 1, 1, 1],
         [0, 0, 1, 1, 1, 1],
         [0, 0, 1, 1, 1, 1],
         [0, 0, 1, 1, 1, 1],
         [0, 0, 1, 1, 1, 1]]])
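As the examples suggest, the pad mask is the logical complement of the non-pad mask. A two-line NumPy sketch of that relationship:

```python
import numpy as np

lengths = np.array([5, 3, 2])
# Non-pad mask: True inside each sequence (cf. make_non_pad_mask).
non_pad = np.arange(lengths.max()) < lengths[:, None]
# Pad mask: True in the padding (cf. make_pad_mask).
pad = ~non_pad
```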
- paddlespeech.t2s.modules.nets_utils.pad_list(xs, pad_value)[source]
Perform padding for the list of tensors.
- Args:
- xs (List[Tensor]):
List of Tensors [(T_1, *), (T_2, *), ..., (T_B, *)].
- pad_value (float):
Value for padding.
- Returns:
Tensor: Padded tensor (B, Tmax, *).
- Examples:
>>> x = [paddle.ones([4]), paddle.ones([2]), paddle.ones([1])]
>>> x
[tensor([1., 1., 1., 1.]), tensor([1., 1.]), tensor([1.])]
>>> pad_list(x, 0)
tensor([[1., 1., 1., 1.],
        [1., 1., 0., 0.],
        [1., 0., 0., 0.]])
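The padding logic amounts to allocating a (B, Tmax, *) buffer filled with the pad value and copying each tensor into its row. A NumPy sketch (the name `pad_list_np` is an assumption for illustration):

```python
import numpy as np

def pad_list_np(xs, pad_value):
    """Stack variable-length arrays into (B, Tmax, *) with padding."""
    n_batch = len(xs)
    t_max = max(x.shape[0] for x in xs)
    out = np.full((n_batch, t_max) + xs[0].shape[1:], pad_value,
                  dtype=xs[0].dtype)
    for i, x in enumerate(xs):
        out[i, :x.shape[0]] = x  # copy each item; the tail stays pad_value
    return out

padded = pad_list_np([np.ones(4), np.ones(2), np.ones(1)], 0.0)
# padded -> [[1, 1, 1, 1], [1, 1, 0, 0], [1, 0, 0, 0]]
```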
- paddlespeech.t2s.modules.nets_utils.phones_masking(xs_pad: Tensor, src_mask: Tensor, align_start: Tensor, align_end: Tensor, align_start_lens: Tensor, mlm_prob: float = 0.8, mean_phn_span: int = 8, span_bdy: Optional[Tensor] = None)[source]
- Args:
- xs_pad (paddle.Tensor):
input speech (B, Tmax, D).
- src_mask (paddle.Tensor):
mask of speech (B, 1, Tmax).
- align_start (paddle.Tensor):
frame level phone alignment start (B, Tmax2).
- align_end (paddle.Tensor):
frame level phone alignment end (B, Tmax2).
- align_start_lens (paddle.Tensor):
length of align_start (B, ).
- mlm_prob (float):
approximate fraction of the input to mask.
- mean_phn_span (int):
mean length (in phones) of a masked span.
- span_bdy (paddle.Tensor):
masked mel boundary of input speech (B, 2).
- Returns:
paddle.Tensor[bool]: masked position of input speech (B, Tmax).
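The overall idea (mask whole phone spans until roughly `mlm_prob` of the input is covered, then expand the phone-level mask to frames via the alignments) can be sketched as follows. This is a hypothetical NumPy simplification of the sampling scheme, not the actual implementation; `phones_masking_np` and the Poisson span-length draw are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def phones_masking_np(n_frames, align_start, align_end,
                      mlm_prob=0.8, mean_phn_span=8):
    """Mask random phone spans until ~mlm_prob of phones are covered,
    then expand the phone mask to frames via the alignments."""
    n_phones = len(align_start)
    masked_phn = np.zeros(n_phones, dtype=bool)
    while masked_phn.mean() < mlm_prob:
        # Draw a span length around mean_phn_span and a random start phone.
        span = max(1, int(rng.poisson(mean_phn_span)))
        start = int(rng.integers(0, n_phones))
        masked_phn[start:start + span] = True
    # Expand the phone-level mask to frame level.
    masked_pos = np.zeros(n_frames, dtype=bool)
    for n in np.flatnonzero(masked_phn):
        masked_pos[align_start[n]:align_end[n]] = True
    return masked_pos

mask = phones_masking_np(10, [0, 3, 5], [3, 5, 8],
                         mlm_prob=0.5, mean_phn_span=1)
```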
- paddlespeech.t2s.modules.nets_utils.phones_text_masking(xs_pad: Tensor, src_mask: Tensor, text_pad: Tensor, text_mask: Tensor, align_start: Tensor, align_end: Tensor, align_start_lens: Tensor, mlm_prob: float = 0.8, mean_phn_span: int = 8, span_bdy: Optional[Tensor] = None)[source]
- Args:
- xs_pad (paddle.Tensor):
input speech (B, Tmax, D).
- src_mask (paddle.Tensor):
mask of speech (B, 1, Tmax).
- text_pad (paddle.Tensor):
input text (B, Tmax2).
- text_mask (paddle.Tensor):
mask of text (B, 1, Tmax2).
- align_start (paddle.Tensor):
frame level phone alignment start (B, Tmax2).
- align_end (paddle.Tensor):
frame level phone alignment end (B, Tmax2).
- align_start_lens (paddle.Tensor):
length of align_start (B, ).
- mlm_prob (float):
approximate fraction of the input to mask.
- mean_phn_span (int):
mean length (in phones) of a masked span.
- span_bdy (paddle.Tensor):
masked mel boundary of input speech (B, 2).
- Returns:
- paddle.Tensor[bool]:
masked position of input speech (B, Tmax).
- paddle.Tensor[bool]:
masked position of input text (B, Tmax2).
- paddlespeech.t2s.modules.nets_utils.random_spans_noise_mask(length: int, mlm_prob: float = 0.8, mean_phn_span: float = 8)[source]
This function is a copy of random_spans_helper. It builds a noise mask consisting of random spans of noise tokens. The number of noise tokens and the number of noise spans and non-noise spans are determined deterministically as follows:
num_noise_tokens = round(length * noise_density)
num_nonnoise_spans = num_noise_spans = round(num_noise_tokens / mean_noise_span_length)
Spans alternate between non-noise and noise, beginning with non-noise. Subject to the above restrictions, all masks are equally likely.
- Args:
- length (int):
Length of the incoming token sequence (an int32 scalar).
- mlm_prob (float):
Approximate density of the output mask (the helper's noise_density).
- mean_phn_span (float):
Mean length of a noise span (the helper's mean_noise_span_length).
- Returns:
np.ndarray: a boolean tensor with shape [length]
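The deterministic counts and the alternating span layout described above can be sketched in NumPy (the names `random_spans_noise_mask_np` and `random_partition`, and the uniform span partitioning, are assumptions for this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spans_noise_mask_np(length, noise_density=0.8,
                               mean_noise_span_length=8):
    """Boolean mask of alternating non-noise/noise spans,
    beginning with non-noise, using the deterministic counts above."""
    num_noise = int(round(length * noise_density))
    num_noise = min(max(num_noise, 1), length - 1)
    num_spans = max(int(round(num_noise / mean_noise_span_length)), 1)

    def random_partition(total, n_parts):
        # Split `total` items into `n_parts` non-empty runs, uniformly.
        cuts = np.sort(rng.choice(total - 1, n_parts - 1, replace=False)) + 1
        return np.diff(np.concatenate(([0], cuts, [total])))

    noise_lens = random_partition(num_noise, num_spans)
    nonnoise_lens = random_partition(length - num_noise, num_spans)
    mask = np.zeros(length, dtype=bool)
    pos = 0
    for nn, ns in zip(nonnoise_lens, noise_lens):
        pos += nn                 # non-noise span comes first
        mask[pos:pos + ns] = True  # then the noise span
        pos += ns
    return mask

m = random_spans_noise_mask_np(20, noise_density=0.5,
                               mean_noise_span_length=3)
# m has exactly round(20 * 0.5) = 10 True entries and starts with False.
```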