paddlespeech.t2s.datasets.batch module
Utility functions that pad variable-length arrays to a common length and stack them into a batch. Batch functions are provided for text sequences, audio, and spectrograms.
- class paddlespeech.t2s.datasets.batch.SpecBatcher(pad_value=0.0, time_major=False, dtype=numpy.float32)
Bases: object
A wrapper class for batch_spec.
Methods
__call__(minibatch): Batch minibatch with batch_spec, using the options passed to the constructor.
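The wrapper pattern can be sketched as follows: the constructor stores the batching options once and __call__ forwards them, so the configured object can be handed to a data loader as a one-argument collate function. SpecBatcherSketch and pad_spec are hypothetical names, and pad_spec is only an illustrative stand-in for batch_spec (it assumes time_major=True means each input is laid out as (T, F)), not the PaddleSpeech code.

```python
import numpy as np


def pad_spec(minibatch, pad_value=0.0, time_major=False, dtype=np.float32):
    # Hypothetical stand-in for batch_spec. Inputs are assumed to be (T, F)
    # when time_major is True, else (F, T); padding goes at the end of time.
    if time_major:
        minibatch = [x.T for x in minibatch]  # work in (F, T) layout
    n_freq = minibatch[0].shape[0]
    max_len = max(x.shape[1] for x in minibatch)
    # Preallocate a (B, F, T_max) array filled with pad_value, then copy.
    out = np.full((len(minibatch), n_freq, max_len), pad_value, dtype=dtype)
    for i, x in enumerate(minibatch):
        out[i, :, : x.shape[1]] = x
    # Return (B, T, F) for time-major inputs, (B, F, T) otherwise.
    return out.transpose(0, 2, 1) if time_major else out


class SpecBatcherSketch:
    """Hypothetical wrapper: stores the options once, applies them on each call."""

    def __init__(self, pad_value=0.0, time_major=False, dtype=np.float32):
        self.pad_value = pad_value
        self.time_major = time_major
        self.dtype = dtype

    def __call__(self, minibatch):
        return pad_spec(minibatch, self.pad_value, self.time_major, self.dtype)


# Two batchers with different layouts, each configured once.
fm_batcher = SpecBatcherSketch()                 # inputs (F, T) -> (B, F, T)
tm_batcher = SpecBatcherSketch(time_major=True)  # inputs (T, F) -> (B, T, F)

specs = [np.ones((80, 5)), np.ones((80, 7))]     # 80 bands, 5 and 7 frames
print(fm_batcher(specs).shape)                   # (2, 80, 7)
print(tm_batcher([s.T for s in specs]).shape)    # (2, 7, 80)
```

Configuring each batcher up front keeps layout decisions out of the training loop.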
- class paddlespeech.t2s.datasets.batch.TextIDBatcher(pad_id=0, dtype=numpy.int64)
Bases: object
A wrapper class for batch_text_id.
Methods
__call__(minibatch): Batch minibatch with batch_text_id, using the options passed to the constructor.
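Because __call__ takes only the minibatch, a configured TextIDBatcher plays the same role as functools.partial applied to batch_text_id. The sketch below checks that equivalence with a hypothetical pad_text_ids helper standing in for batch_text_id; TextIDBatcherSketch and pad_text_ids are illustrative names, not part of PaddleSpeech.

```python
import functools

import numpy as np


def pad_text_ids(minibatch, pad_id=0, dtype=np.int64):
    # Hypothetical stand-in for batch_text_id: right-pad each 1-D id array
    # with pad_id up to the longest sequence, then stack into (B, T).
    max_len = max(len(ids) for ids in minibatch)
    return np.stack(
        [np.pad(ids, (0, max_len - len(ids)), constant_values=pad_id) for ids in minibatch]
    ).astype(dtype)


class TextIDBatcherSketch:
    """Hypothetical wrapper that keeps pad_id and dtype as named attributes."""

    def __init__(self, pad_id=0, dtype=np.int64):
        self.pad_id = pad_id
        self.dtype = dtype

    def __call__(self, minibatch):
        return pad_text_ids(minibatch, pad_id=self.pad_id, dtype=self.dtype)


sentences = [np.array([5, 6, 7]), np.array([8])]
by_class = TextIDBatcherSketch(pad_id=1)(sentences)
by_partial = functools.partial(pad_text_ids, pad_id=1)(sentences)
print(by_class.tolist())                     # [[5, 6, 7], [8, 1, 1]]
print(np.array_equal(by_class, by_partial))  # True
```

The class form keeps pad_id and dtype visible as attributes, which can be easier to read in a config or debugger than a partial object.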
- class paddlespeech.t2s.datasets.batch.WavBatcher(pad_value=0.0, dtype=numpy.float32)
Bases: object
A wrapper class for batch_wav.
Methods
__call__(minibatch): Batch minibatch with batch_wav, using the options passed to the constructor.
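A batcher object is typically passed to a loader that groups examples and calls it once per group. The sketch below uses a hypothetical WavBatcherSketch, written to follow the documented behavior of batch_wav (right-padding with pad_value to the longest clip), together with a hypothetical iterate_batches loop; neither is PaddleSpeech API.

```python
import numpy as np


class WavBatcherSketch:
    """Hypothetical stand-in for WavBatcher: right-pads mono clips to equal length."""

    def __init__(self, pad_value=0.0, dtype=np.float32):
        self.pad_value = pad_value
        self.dtype = dtype

    def __call__(self, minibatch):
        max_len = max(len(wav) for wav in minibatch)
        # (B, T_max) array of pad_value; each clip is copied into its row.
        out = np.full((len(minibatch), max_len), self.pad_value, dtype=self.dtype)
        for i, wav in enumerate(minibatch):
            out[i, : len(wav)] = wav
        return out


def iterate_batches(dataset, batch_size, collate_fn):
    # Hypothetical loader loop: group examples and hand each group to collate_fn.
    for start in range(0, len(dataset), batch_size):
        yield collate_fn(dataset[start : start + batch_size])


# Five clips of different lengths (in samples).
rng = np.random.default_rng(0)
clips = [rng.standard_normal(n).astype(np.float32) for n in (16000, 12000, 8000, 20000, 4000)]

shapes = [batch.shape for batch in iterate_batches(clips, 2, WavBatcherSketch())]
print(shapes)  # [(2, 16000), (2, 20000), (1, 4000)]
```

Each batch is padded only to its own longest clip, so batch shapes vary from one minibatch to the next.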
- paddlespeech.t2s.datasets.batch.batch_spec(minibatch, pad_value=0.0, time_major=False, dtype=numpy.float32)
Pad spectrograms along the time axis to the length of the longest one and batch them.
- Args:
minibatch (List[np.ndarray]): list of rank-2 arrays of shape (F, T) for mono-channel spectrograms, or list of rank-3 arrays of shape (C, F, T) for multi-channel spectrograms, where F stands for frequency bands; dtype float.
pad_value (float, optional): the pad value. Defaults to 0.0.
time_major (bool, optional): if True, each input is laid out as (T, F) and the output has shape (B, T, F); otherwise the output has shape (B, F, T). Defaults to False.
dtype (np.dtype, optional): data type of the output. Defaults to np.float32.
- Returns:
np.ndarray: a rank-3 array of shape (B, F, T), or (B, T, F) when time_major is True, where B is the batch size.
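The padding described for batch_spec can be sketched for the mono-channel case as below. batch_spec_sketch is a hypothetical reimplementation based on the documentation above, not the PaddleSpeech source; it assumes time_major=True means each input is (T, F) and that padding is appended at the end of the time axis.

```python
import numpy as np


def batch_spec_sketch(minibatch, pad_value=0.0, time_major=False, dtype=np.float32):
    """Hypothetical sketch of the documented behavior of batch_spec."""
    time_axis = 0 if time_major else -1
    max_len = max(spec.shape[time_axis] for spec in minibatch)
    batch = []
    for spec in minibatch:
        pad_len = max_len - spec.shape[time_axis]
        # Append pad_len frames of pad_value at the end of the time axis only.
        widths = [(0, pad_len), (0, 0)] if time_major else [(0, 0), (0, pad_len)]
        batch.append(np.pad(spec, widths, mode="constant", constant_values=pad_value))
    return np.array(batch, dtype=dtype)


# Two tiny (F=2, T) spectrograms with 2 and 3 frames.
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[5.0, 6.0, 7.0], [8.0, 9.0, 10.0]])

# The output carries no length information, so record it before batching.
lengths = np.array([spec.shape[-1] for spec in (a, b)])
batch = batch_spec_sketch([a, b], pad_value=-1.0)
print(batch.shape)        # (2, 2, 3)
print(batch[0].tolist())  # [[1.0, 2.0, -1.0], [3.0, 4.0, -1.0]]
print(lengths)            # [2 3]
```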
- paddlespeech.t2s.datasets.batch.batch_text_id(minibatch, pad_id=0, dtype=numpy.int64)
Pad sequences of text IDs to the length of the longest one and batch them.
- Args:
minibatch (List[np.ndarray]): list of rank-1 arrays of text IDs, shape (T,), dtype np.int64.
pad_id (int, optional): the ID that corresponds to the special pad token. Defaults to 0.
dtype (np.dtype, optional): the data type of the output. Defaults to np.int64.
- Returns:
np.ndarray: the output batch, a rank-2 array of text IDs with shape (B, T), where B is the batch size and T is the length of the longest sequence.
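A sketch of the batching described for batch_text_id, assuming right-padding (real IDs first, pad_id afterwards). batch_text_id_sketch is a hypothetical name, and the mask at the end is an illustrative follow-up step, not something batch_text_id returns.

```python
import numpy as np


def batch_text_id_sketch(minibatch, pad_id=0, dtype=np.int64):
    """Hypothetical sketch of the documented behavior of batch_text_id."""
    lengths = [len(ids) for ids in minibatch]
    # Start from a (B, T) array filled with pad_id, T being the longest sequence.
    batch = np.full((len(minibatch), max(lengths)), pad_id, dtype=dtype)
    for i, ids in enumerate(minibatch):
        batch[i, : len(ids)] = ids  # copy the real IDs into the left part of row i
    return batch


text_ids = [np.array([3, 14, 15, 9]), np.array([2, 6]), np.array([5, 3, 5])]
batch = batch_text_id_sketch(text_ids)
print(batch.tolist())  # [[3, 14, 15, 9], [2, 6, 0, 0], [5, 3, 5, 0]]

# Hypothetical follow-up: a boolean mask of real (non-pad) positions, built
# from the original lengths rather than by comparing against pad_id.
lengths = np.array([len(ids) for ids in text_ids])
mask = np.arange(batch.shape[1])[None, :] < lengths[:, None]
print(mask.sum(axis=1).tolist())  # [4, 2, 3]
```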
- paddlespeech.t2s.datasets.batch.batch_wav(minibatch, pad_value=0.0, dtype=numpy.float32)
Pad audio clips to the length of the longest one and batch them.
- Args:
minibatch (List[np.ndarray]): list of rank-1 float arrays (mono-channel audio, shape (T,)).
pad_value (float, optional): the pad value. Defaults to 0.0.
dtype (np.dtype, optional): the data type of the output. Defaults to np.float32.
- Returns:
np.ndarray: the output batch of shape (B, T), where B is the batch size and T is the length of the longest clip.
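For rank-1 audio, the batching described for batch_wav can be sketched with np.pad and np.stack. batch_wav_sketch is a hypothetical name and this is not the PaddleSpeech source; the example also shows the dtype argument casting float64 input to float32 output.

```python
import numpy as np


def batch_wav_sketch(minibatch, pad_value=0.0, dtype=np.float32):
    """Hypothetical sketch of the documented behavior of batch_wav."""
    max_len = max(wav.shape[0] for wav in minibatch)
    # Right-pad each rank-1 clip with pad_value, then stack into shape (B, T).
    padded = [
        np.pad(wav, (0, max_len - wav.shape[0]), mode="constant", constant_values=pad_value)
        for wav in minibatch
    ]
    return np.stack(padded).astype(dtype)


clips = [np.array([0.25, -0.25, 0.75]), np.array([0.5])]  # float64 input
batch = batch_wav_sketch(clips)
print(batch.shape, batch.dtype)  # (2, 3) float32
print(batch[1].tolist())         # [0.5, 0.0, 0.0]
```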