paddlespeech.t2s.modules.nets_utils module

paddlespeech.t2s.modules.nets_utils.get_random_segments(x: Tensor, x_lengths: Tensor, segment_size: int) -> Tuple[Tensor, Tensor]

Get random segments.

Args:

x (Tensor):

Input tensor (B, C, T).

x_lengths (Tensor):

Length tensor (B,).

segment_size (int):

Segment size.

Returns:
Tensor:

Segmented tensor (B, C, segment_size).

Tensor:

Start index tensor (B,).
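
The following is a minimal pure-Python sketch of the behaviour described above, not the library's implementation. It uses nested lists in place of tensors, and the helper name `random_segments_sketch`, the `rng` argument, and the assumption that every length is at least `segment_size` are illustrative only.

```python
import random


def random_segments_sketch(x, x_lengths, segment_size, rng):
    """Cut one random window of `segment_size` frames from each batch item.

    x is a nested list shaped (B, C, T); x_lengths holds the valid length of
    each item. Assumes every length is at least `segment_size`.
    """
    start_idxs = [
        rng.randint(0, length - segment_size)  # inclusive on both ends
        for length in x_lengths
    ]
    segments = [
        [channel[start:start + segment_size] for channel in item]
        for item, start in zip(x, start_idxs)
    ]
    return segments, start_idxs


# B=2, C=1, T=8; the second item has only 5 valid frames (the rest is padding).
x = [[[0, 1, 2, 3, 4, 5, 6, 7]], [[10, 11, 12, 13, 14, 0, 0, 0]]]
x_lengths = [8, 5]
segments, start_idxs = random_segments_sketch(x, x_lengths, 3, random.Random(0))
```

Because each start index is drawn from the valid length rather than from T, the window for a padded item never reaches into its padding.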

paddlespeech.t2s.modules.nets_utils.get_seg_pos(speech_pad: Tensor, text_pad: Tensor, align_start: Tensor, align_end: Tensor, align_start_lens: Tensor, seg_emb: bool = False)
Args:
speech_pad (paddle.Tensor):

input speech (B, Tmax, D).

text_pad (paddle.Tensor):

input text (B, Tmax2).

align_start (paddle.Tensor):

frame level phone alignment start (B, Tmax2).

align_end (paddle.Tensor):

frame level phone alignment end (B, Tmax2).

align_start_lens (paddle.Tensor):

length of align_start (B, ).

seg_emb (bool):

whether to use segment embedding.

Returns:
paddle.Tensor[int]: n-th phone of each mel, 0<=n<=Tmax2 (B, Tmax).

eg: Tensor(shape=[1, 328], dtype=int64, place=Place(gpu:0), stop_gradient=True, [[0 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 2 , 2 , 2 , 3 , 3 , 3 , 4 , 4 , 4 , 5 , 5 , 5 , 6 , 6 , 6 , 6 , 6 , 6 , 6 , 6 , 7 , 7 , 7 , 7 , 7 , 7 , 7 , 7 , 8 , 8 , 8 , 8 , 9 , 9 , 9 , 9 , 9 , 9 , 9 , 9 , 10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 17, 17, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 24, 24, 24, 24, 25, 25, 25, 25, 26, 26, 26, 27, 27, 27, 27, 27, 28, 28, 28, 28, 28, 28, 28, 28, 29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 35, 35, 36, 36, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 0 , 0 ]])

paddle.Tensor[int]: n-th phone of each phone, 0<=n<=Tmax2 (B, Tmax2).

eg: Tensor(shape=[1, 38], dtype=int64, place=Place(gpu:0), stop_gradient=True,

[[1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38]])
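
To make the two return values concrete, here is a small single-utterance sketch in plain Python. It is an illustration of the pattern visible in the example outputs above, not the library code: the helper name `seg_pos_sketch` is hypothetical, and it assumes each phone covers the half-open frame range [align_start, align_end) and that frames not covered by any phone stay 0.

```python
def seg_pos_sketch(speech_len, text_len, align_start, align_end, num_phones):
    """Return (speech_seg_pos, text_seg_pos) for one utterance.

    Each mel frame gets the 1-based index of the phone it is aligned to;
    each phone gets its own 1-based index. Unaligned or padded slots stay 0.
    """
    speech_seg_pos = [0] * speech_len
    text_seg_pos = [0] * text_len
    for j in range(num_phones):
        # Assumption: alignment intervals are half-open, [start, end).
        for t in range(align_start[j], align_end[j]):
            speech_seg_pos[t] = j + 1
        text_seg_pos[j] = j + 1
    return speech_seg_pos, text_seg_pos


# Three phones aligned to an 8-frame utterance; frames 0 and 7 are unaligned.
speech_seg_pos, text_seg_pos = seg_pos_sketch(
    speech_len=8,
    text_len=3,
    align_start=[1, 4, 6],
    align_end=[4, 6, 7],
    num_phones=3,
)
print(speech_seg_pos)  # [0, 1, 1, 1, 2, 2, 3, 0]
print(text_seg_pos)    # [1, 2, 3]
```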

paddlespeech.t2s.modules.nets_utils.get_segments(x: Tensor, start_idxs: Tensor, segment_size: int) -> Tensor

Get segments.

Args:

x (Tensor):

Input tensor (B, C, T).

start_idxs (Tensor):

Start index tensor (B,).

segment_size (int):

Segment size.

Returns:

Tensor: Segmented tensor (B, C, segment_size).
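
As a rough illustration of the shapes involved, the sketch below slices fixed-size windows from nested lists. The name `get_segments_sketch` is hypothetical and the code only mirrors the documented input and output shapes; it is not the Paddle implementation.

```python
def get_segments_sketch(x, start_idxs, segment_size):
    """Slice x (B, C, T) into (B, C, segment_size) using one start index per item."""
    return [
        [channel[start:start + segment_size] for channel in item]
        for item, start in zip(x, start_idxs)
    ]


# B=2, C=1, T=6
x = [[[0, 1, 2, 3, 4, 5]], [[10, 11, 12, 13, 14, 15]]]
segments = get_segments_sketch(x, start_idxs=[1, 3], segment_size=3)
print(segments)  # [[[1, 2, 3]], [[13, 14, 15]]]
```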

paddlespeech.t2s.modules.nets_utils.initialize(model: Layer, init: str)

Initialize weights of a neural network module.

Parameters are initialized using the given method or distribution.

Custom initialization routines can be implemented in submodules.

Args:
model (nn.Layer):

Target.

init (str):

Method of initialization.

paddlespeech.t2s.modules.nets_utils.make_non_pad_mask(lengths, xs=None, length_dim=-1)

Make mask tensor containing indices of non-padded part.

Args:
lengths (Tensor(int64) or List):

Batch of lengths (B,).

xs (Tensor, optional):

The reference tensor. If set, masks will be the same shape as this tensor.

length_dim (int, optional):

Dimension indicator of the above tensor. See the example.

Returns:
Tensor(bool):

Mask tensor containing indices of the non-padded part.

Examples:

With only lengths.

>>> lengths = [5, 3, 2]
>>> make_non_pad_mask(lengths)
masks = [[1, 1, 1, 1, 1],
         [1, 1, 1, 0, 0],
         [1, 1, 0, 0, 0]]

With the reference tensor.

>>> xs = paddle.zeros((3, 2, 4))
>>> make_non_pad_mask(lengths, xs)
tensor([[[1, 1, 1, 1],
         [1, 1, 1, 1]],
        [[1, 1, 1, 0],
         [1, 1, 1, 0]],
        [[1, 1, 0, 0],
         [1, 1, 0, 0]]])
>>> xs = paddle.zeros((3, 2, 6))
>>> make_non_pad_mask(lengths, xs)
tensor([[[1, 1, 1, 1, 1, 0],
         [1, 1, 1, 1, 1, 0]],
        [[1, 1, 1, 0, 0, 0],
         [1, 1, 1, 0, 0, 0]],
        [[1, 1, 0, 0, 0, 0],
         [1, 1, 0, 0, 0, 0]]])

With the reference tensor and dimension indicator.

>>> xs = paddle.zeros((3, 6, 6))
>>> make_non_pad_mask(lengths, xs, 1)
tensor([[[1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1],
         [0, 0, 0, 0, 0, 0]],
        [[1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0]],
        [[1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0]]])
>>> make_non_pad_mask(lengths, xs, 2)
tensor([[[1, 1, 1, 1, 1, 0],
         [1, 1, 1, 1, 1, 0],
         [1, 1, 1, 1, 1, 0],
         [1, 1, 1, 1, 1, 0],
         [1, 1, 1, 1, 1, 0],
         [1, 1, 1, 1, 1, 0]],
        [[1, 1, 1, 0, 0, 0],
         [1, 1, 1, 0, 0, 0],
         [1, 1, 1, 0, 0, 0],
         [1, 1, 1, 0, 0, 0],
         [1, 1, 1, 0, 0, 0],
         [1, 1, 1, 0, 0, 0]],
        [[1, 1, 0, 0, 0, 0],
         [1, 1, 0, 0, 0, 0],
         [1, 1, 0, 0, 0, 0],
         [1, 1, 0, 0, 0, 0],
         [1, 1, 0, 0, 0, 0],
         [1, 1, 0, 0, 0, 0]]])
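
For the lengths-only case, the mask can be reproduced with a few lines of plain Python. This is a sketch for intuition only: `non_pad_mask_2d` is a hypothetical name, and it returns 0/1 integers to match the printed examples rather than a boolean tensor.

```python
def non_pad_mask_2d(lengths):
    """Return a (B, max(lengths)) mask with 1 at valid positions and 0 at padding."""
    max_len = max(lengths)
    return [
        [1 if t < length else 0 for t in range(max_len)]
        for length in lengths
    ]


mask = non_pad_mask_2d([5, 3, 2])
for row in mask:
    print(row)
# [1, 1, 1, 1, 1]
# [1, 1, 1, 0, 0]
# [1, 1, 0, 0, 0]
```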

paddlespeech.t2s.modules.nets_utils.make_pad_mask(lengths, xs=None, length_dim=-1)

Make mask tensor containing indices of padded part.

Args:
lengths (Tensor(int64)):

Batch of lengths (B,).

xs (Tensor, optional):

The reference tensor. If set, masks will be the same shape as this tensor.

length_dim (int, optional):

Dimension indicator of the above tensor. See the example.

Returns:

Tensor(bool): Mask tensor containing indices of the padded part.

Examples:

With only lengths.

>>> lengths = [5, 3, 2]
>>> make_pad_mask(lengths)
masks = [[0, 0, 0, 0, 0],
         [0, 0, 0, 1, 1],
         [0, 0, 1, 1, 1]]

With the reference tensor.

>>> xs = paddle.zeros((3, 2, 4))
>>> make_pad_mask(lengths, xs)
tensor([[[0, 0, 0, 0],
         [0, 0, 0, 0]],
        [[0, 0, 0, 1],
         [0, 0, 0, 1]],
        [[0, 0, 1, 1],
         [0, 0, 1, 1]]])
>>> xs = paddle.zeros((3, 2, 6))
>>> make_pad_mask(lengths, xs)
tensor([[[0, 0, 0, 0, 0, 1],
         [0, 0, 0, 0, 0, 1]],
        [[0, 0, 0, 1, 1, 1],
         [0, 0, 0, 1, 1, 1]],
        [[0, 0, 1, 1, 1, 1],
         [0, 0, 1, 1, 1, 1]]])

With the reference tensor and dimension indicator.

>>> xs = paddle.zeros((3, 6, 6))
>>> make_pad_mask(lengths, xs, 1)
tensor([[[0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [1, 1, 1, 1, 1, 1]],
        [[0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1]],
        [[0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1]]])
>>> make_pad_mask(lengths, xs, 2)
tensor([[[0, 0, 0, 0, 0, 1],
         [0, 0, 0, 0, 0, 1],
         [0, 0, 0, 0, 0, 1],
         [0, 0, 0, 0, 0, 1],
         [0, 0, 0, 0, 0, 1],
         [0, 0, 0, 0, 0, 1]],
        [[0, 0, 0, 1, 1, 1],
         [0, 0, 0, 1, 1, 1],
         [0, 0, 0, 1, 1, 1],
         [0, 0, 0, 1, 1, 1],
         [0, 0, 0, 1, 1, 1],
         [0, 0, 0, 1, 1, 1]],
        [[0, 0, 1, 1, 1, 1],
         [0, 0, 1, 1, 1, 1],
         [0, 0, 1, 1, 1, 1],
         [0, 0, 1, 1, 1, 1],
         [0, 0, 1, 1, 1, 1],
         [0, 0, 1, 1, 1, 1]]])
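
The role of the reference tensor and length_dim can be seen in the following plain-Python sketch for a 3-D reference shape. It is an illustration consistent with the examples above, not the library's code: `pad_mask_3d` is a hypothetical helper that takes the reference shape directly instead of a tensor and returns 0/1 integers.

```python
def pad_mask_3d(lengths, shape, length_dim=-1):
    """Build a pad mask of the given 3-D shape (B, D1, D2).

    `length_dim` selects the axis along which `lengths` are measured;
    positions at or beyond the item's length on that axis are marked 1.
    """
    batch, d1, d2 = shape
    if length_dim < 0:
        length_dim += len(shape)
    if length_dim not in (1, 2):
        # Sketch restriction: the batch axis cannot be the length axis.
        raise ValueError("length_dim must refer to axis 1 or 2")
    mask = []
    for b in range(batch):
        rows = []
        for i in range(d1):
            row = []
            for j in range(d2):
                pos = i if length_dim == 1 else j
                row.append(1 if pos >= lengths[b] else 0)
            rows.append(row)
        mask.append(rows)
    return mask


lengths = [5, 3, 2]
mask_last = pad_mask_3d(lengths, (3, 2, 4))      # lengths along the last axis
mask_dim1 = pad_mask_3d(lengths, (3, 6, 6), 1)   # lengths along axis 1
```

With length_dim=1 whole rows are masked, while with the default (last axis) the trailing columns of every row are masked.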

paddlespeech.t2s.modules.nets_utils.pad_list(xs, pad_value)

Perform padding for the list of tensors.

Args:
xs (List[Tensor]):

List of Tensors [(T_1, *), (T_2, *), ..., (T_B, *)].

pad_value (float):

Value for padding.

Returns:

Tensor: Padded tensor (B, Tmax, *).

Examples:
>>> x = [paddle.ones([4]), paddle.ones([2]), paddle.ones([1])]
>>> x
[tensor([1., 1., 1., 1.]), tensor([1., 1.]), tensor([1.])]
>>> pad_list(x, 0)
tensor([[1., 1., 1., 1.],
        [1., 1., 0., 0.],
        [1., 0., 0., 0.]])
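
The padding rule itself is simple enough to sketch in plain Python for 1-D sequences. `pad_list_1d` is a hypothetical name used only for this illustration; the real function operates on Paddle tensors with arbitrary trailing dimensions.

```python
def pad_list_1d(xs, pad_value):
    """Right-pad every sequence in xs with pad_value up to the longest length."""
    max_len = max(len(x) for x in xs)
    return [list(x) + [pad_value] * (max_len - len(x)) for x in xs]


padded = pad_list_1d([[1.0] * 4, [1.0] * 2, [1.0]], 0.0)
for row in padded:
    print(row)
# [1.0, 1.0, 1.0, 1.0]
# [1.0, 1.0, 0.0, 0.0]
# [1.0, 0.0, 0.0, 0.0]
```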

paddlespeech.t2s.modules.nets_utils.paddle_gather(x, dim, index)

paddlespeech.t2s.modules.nets_utils.phones_masking(xs_pad: Tensor, src_mask: Tensor, align_start: Tensor, align_end: Tensor, align_start_lens: Tensor, mlm_prob: float = 0.8, mean_phn_span: int = 8, span_bdy: Optional[Tensor] = None)
Args:
xs_pad (paddle.Tensor):

input speech (B, Tmax, D).

src_mask (paddle.Tensor):

mask of speech (B, 1, Tmax).

align_start (paddle.Tensor):

frame level phone alignment start (B, Tmax2).

align_end (paddle.Tensor):

frame level phone alignment end (B, Tmax2).

align_start_lens (paddle.Tensor):

length of align_start (B, ).

mlm_prob (float):

mean_phn_span (int):

span_bdy (paddle.Tensor):

masked mel boundary of input speech (B, 2).

Returns:

paddle.Tensor[bool]: masked position of input speech (B, Tmax).
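
One way to picture how phone-level choices turn into a frame-level mask is sketched below for a single utterance. This is an assumption-laden illustration, not the library's algorithm: the helper `masked_frames_from_phones`, the explicit `masked_phones` argument, and the half-open [start, end) alignment intervals are all hypothetical.

```python
def masked_frames_from_phones(speech_len, align_start, align_end, masked_phones):
    """Mark every mel frame that belongs to one of the selected phones.

    align_start[j] and align_end[j] give the frame range of phone j
    (assumed half-open). Returns a list of booleans of length speech_len.
    """
    masked_pos = [False] * speech_len
    for j in masked_phones:
        for t in range(align_start[j], align_end[j]):
            masked_pos[t] = True
    return masked_pos


# Three phones over a 10-frame utterance; only the middle phone is masked.
align_start = [0, 3, 5]
align_end = [3, 5, 8]
masked_pos = masked_frames_from_phones(10, align_start, align_end, masked_phones=[1])
print(masked_pos)
# [False, False, False, True, True, False, False, False, False, False]
```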

paddlespeech.t2s.modules.nets_utils.phones_text_masking(xs_pad: Tensor, src_mask: Tensor, text_pad: Tensor, text_mask: Tensor, align_start: Tensor, align_end: Tensor, align_start_lens: Tensor, mlm_prob: float = 0.8, mean_phn_span: int = 8, span_bdy: Optional[Tensor] = None)
Args:
xs_pad (paddle.Tensor):

input speech (B, Tmax, D).

src_mask (paddle.Tensor):

mask of speech (B, 1, Tmax).

text_pad (paddle.Tensor):

input text (B, Tmax2).

text_mask (paddle.Tensor):

mask of text (B, 1, Tmax2).

align_start (paddle.Tensor):

frame level phone alignment start (B, Tmax2).

align_end (paddle.Tensor):

frame level phone alignment end (B, Tmax2).

align_start_lens (paddle.Tensor):

length of align_start (B, ).

mlm_prob (float):

mean_phn_span (int):

span_bdy (paddle.Tensor):

masked mel boundary of input speech (B, 2).

Returns:
paddle.Tensor[bool]:

masked position of input speech (B, Tmax).

paddle.Tensor[bool]:

masked position of input text (B, Tmax2).

paddlespeech.t2s.modules.nets_utils.random_spans_noise_mask(length: int, mlm_prob: float = 0.8, mean_phn_span: float = 8)

This function is a copy of random_spans_helper. It returns a noise mask consisting of random spans of noise tokens. The number of noise tokens and the number of noise spans and non-noise spans are determined deterministically as follows:

num_noise_tokens = round(length * noise_density)
num_nonnoise_spans = num_noise_spans = round(num_noise_tokens / mean_noise_span_length)

Spans alternate between non-noise and noise, beginning with non-noise. Subject to the above restrictions, all masks are equally likely.

Args:
length (int):

An int32 scalar (length of the incoming token sequence).

mlm_prob (float):

Approximate density of the output mask; corresponds to noise_density in the formulas above.

mean_phn_span (float):

A number; corresponds to mean_noise_span_length in the formulas above.

Returns:

np.ndarray: a boolean tensor with shape [length]
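
The span arithmetic described above can be sketched with the standard library alone. This is an illustrative re-implementation of the idea, not the code of this function or of random_spans_helper: the names `random_segmentation` and `spans_noise_mask_sketch`, the `seed` argument, and the clamping steps are assumptions added so the sketch stays well-defined for small inputs (it expects length >= 2).

```python
import random


def random_segmentation(num_items, num_segments, rng):
    """Split num_items into num_segments positive lengths, chosen at random."""
    cuts = sorted(rng.sample(range(1, num_items), num_segments - 1))
    bounds = [0] + cuts + [num_items]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]


def spans_noise_mask_sketch(length, noise_density=0.8, mean_noise_span_length=8, seed=0):
    """Return a list of booleans of size `length`; True marks a noise token."""
    rng = random.Random(seed)
    num_noise_tokens = round(length * noise_density)
    # Assumption: keep at least one noise and one non-noise token.
    num_noise_tokens = min(max(num_noise_tokens, 1), length - 1)
    num_nonnoise_tokens = length - num_noise_tokens
    num_noise_spans = max(round(num_noise_tokens / mean_noise_span_length), 1)
    # Assumption: every span needs at least one token of its kind.
    num_noise_spans = min(num_noise_spans, num_noise_tokens, num_nonnoise_tokens)

    noise_lens = random_segmentation(num_noise_tokens, num_noise_spans, rng)
    nonnoise_lens = random_segmentation(num_nonnoise_tokens, num_noise_spans, rng)

    mask = []
    # Spans alternate, beginning with a non-noise span.
    for nonnoise, noise in zip(nonnoise_lens, noise_lens):
        mask.extend([False] * nonnoise)
        mask.extend([True] * noise)
    return mask


mask = spans_noise_mask_sketch(100, noise_density=0.8, mean_noise_span_length=8)
print(len(mask), sum(mask))  # 100 80
```

For length=100, a density of 0.8 gives 80 noise tokens, and a mean span length of 8 gives 10 noise spans interleaved with 10 non-noise spans.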