paddlespeech.t2s.datasets.batch module

Utility functions that pad variable-length arrays to a common length and stack them into a batch. Batch functions are provided for text ID sequences, audio, and spectrograms.

class paddlespeech.t2s.datasets.batch.SpecBatcher(pad_value=0.0, time_major=False, dtype=<class 'numpy.float32'>)

Bases: object

A wrapper class for batch_spec.

Methods

__call__(minibatch)

Call self as a function.

class paddlespeech.t2s.datasets.batch.TextIDBatcher(pad_id=0, dtype=<class 'numpy.int64'>)

Bases: object

A wrapper class for batch_text_id.

Methods

__call__(minibatch)

Call self as a function.

class paddlespeech.t2s.datasets.batch.WavBatcher(pad_value=0.0, dtype=<class 'numpy.float32'>)

Bases: object

A wrapper class for batch_wav.

Methods

__call__(minibatch)

Call self as a function.
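
The wrapper classes above follow a common pattern: padding options are stored at construction time, and the instance is then called on each minibatch. The following is a minimal sketch of that pattern only, not the library's implementation. The names `pad_1d` and `SketchBatcher` are hypothetical stand-ins introduced for illustration.

```python
import numpy as np


def pad_1d(minibatch, pad_value=0.0, dtype=np.float32):
    """Hypothetical stand-in batch function: right-pad rank-1 arrays."""
    max_len = max(len(x) for x in minibatch)
    batch = np.full((len(minibatch), max_len), pad_value, dtype=dtype)
    for i, x in enumerate(minibatch):
        batch[i, :len(x)] = x
    return batch


class SketchBatcher:
    """Hypothetical callable that stores padding options for later use."""

    def __init__(self, pad_value=0.0, dtype=np.float32):
        self.pad_value = pad_value
        self.dtype = dtype

    def __call__(self, minibatch):
        # Delegate to the plain function with the stored options.
        return pad_1d(minibatch, pad_value=self.pad_value, dtype=self.dtype)


batcher = SketchBatcher(pad_value=-1.0)
out = batcher([np.array([0.1, 0.2, 0.3]), np.array([0.5])])
print(out.shape)  # (2, 3)
```

Because such an object takes a single `minibatch` argument, it can be passed wherever a one-argument collate callable is expected.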

paddlespeech.t2s.datasets.batch.batch_spec(minibatch, pad_value=0.0, time_major=False, dtype=<class 'numpy.float32'>)

Pad spectra to the largest length and batch them.

Args:
    minibatch (List[np.ndarray]): list of rank-2 arrays of shape (F, T) for mono-channel spectrograms, or list of rank-3 arrays of shape (C, F, T) for multi-channel spectrograms (F stands for frequency bands), dtype float.
    pad_value (float, optional): the pad value. Defaults to 0.0.
    dtype (np.dtype, optional): data type of the output. Defaults to np.float32.

Returns:
    np.ndarray: a rank-3 array of shape (B, F, T) or (B, T, F).
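
As an illustration of the padding described above, here is a minimal NumPy sketch for mono-channel spectrograms. It is not the library's code: `pad_spec_sketch` is a hypothetical name, and the sketch assumes that inputs are always laid out as (F, T) and that `time_major` only selects the output layout, which the documentation above does not spell out.

```python
import numpy as np


def pad_spec_sketch(minibatch, pad_value=0.0, time_major=False, dtype=np.float32):
    """Right-pad (F, T) spectrograms along the time axis and stack them."""
    max_len = max(spec.shape[-1] for spec in minibatch)
    padded = [
        np.pad(spec, [(0, 0), (0, max_len - spec.shape[-1])],
               mode="constant", constant_values=pad_value)
        for spec in minibatch
    ]
    batch = np.stack(padded).astype(dtype)  # (B, F, T)
    if time_major:
        # Assumption: time_major only changes the output layout.
        batch = batch.transpose(0, 2, 1)  # (B, T, F)
    return batch


specs = [np.ones((80, 5)), np.ones((80, 3))]
batch = pad_spec_sketch(specs)
batch_tm = pad_spec_sketch(specs, time_major=True)
print(batch.shape, batch_tm.shape)  # (2, 80, 5) (2, 5, 80)
```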

paddlespeech.t2s.datasets.batch.batch_text_id(minibatch, pad_id=0, dtype=<class 'numpy.int64'>)

Pad sequences of text_ids to the largest length and batch them.

Args:
    minibatch (List[np.ndarray]): list of rank-1 arrays of text_ids, shape (T,), dtype np.int64.
    pad_id (int, optional): the id which corresponds to the special pad token. Defaults to 0.
    dtype (np.dtype, optional): the data type of the output. Defaults to np.int64.

Returns:
    np.ndarray: the output batch, a rank-2 array of text_ids of shape (B, T), where B stands for batch size and T for length.
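
The effect of padding with `pad_id` can be sketched in a few lines of NumPy. This is an illustrative stand-in under the documented shapes, not the library's implementation, and `pad_text_ids_sketch` is a hypothetical name.

```python
import numpy as np


def pad_text_ids_sketch(minibatch, pad_id=0, dtype=np.int64):
    """Right-pad rank-1 ID sequences with pad_id into a (B, T) array."""
    max_len = max(len(ids) for ids in minibatch)
    # Start from an array filled with pad_id, then copy each sequence in.
    batch = np.full((len(minibatch), max_len), pad_id, dtype=dtype)
    for i, ids in enumerate(minibatch):
        batch[i, :len(ids)] = ids
    return batch


ids = [np.array([5, 7, 9], dtype=np.int64), np.array([4], dtype=np.int64)]
id_batch = pad_text_ids_sketch(ids)
print(id_batch.tolist())  # [[5, 7, 9], [4, 0, 0]]
```

Choosing a `pad_id` that is never used by a real token lets downstream code recover a padding mask with a comparison such as `id_batch != pad_id`.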

paddlespeech.t2s.datasets.batch.batch_wav(minibatch, pad_value=0.0, dtype=<class 'numpy.float32'>)

Pad audio waveforms to the largest length and batch them.

Args:
    minibatch (List[np.ndarray]): list of rank-1 float arrays (mono-channel audio, shape (T,)), dtype float.
    pad_value (float, optional): the pad value. Defaults to 0.0.
    dtype (np.dtype, optional): the data type of the output. Defaults to np.float32.

Returns:
    np.ndarray: shape (B, T), the output batch.
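
For mono-channel audio, the same idea can be sketched with `np.pad`. The function name `pad_wav_sketch` is hypothetical, and the snippet only illustrates the documented behavior (pad to the longest waveform, then cast to the output dtype). It makes no claim about how the library implements it.

```python
import numpy as np


def pad_wav_sketch(minibatch, pad_value=0.0, dtype=np.float32):
    """Right-pad rank-1 waveforms to the longest one and stack to (B, T)."""
    max_len = max(wav.shape[0] for wav in minibatch)
    padded = [
        np.pad(wav, (0, max_len - wav.shape[0]),
               mode="constant", constant_values=pad_value)
        for wav in minibatch
    ]
    # Inputs may be float64; the output is cast to the requested dtype.
    return np.array(padded, dtype=dtype)


wavs = [np.zeros(16000), np.ones(8000)]
wav_batch = pad_wav_sketch(wavs)
print(wav_batch.shape, wav_batch.dtype)  # (2, 16000) float32
```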