paddlespeech.s2t.frontend.augmentor.spec_augment module
Contains the SpecAugment augmentation model.
- class paddlespeech.s2t.frontend.augmentor.spec_augment.SpecAugmentor(rng, F, T, n_freq_masks, n_time_masks, p=1.0, W=40, adaptive_number_ratio=0, adaptive_size_ratio=0, max_n_time_masks=20, replace_with_zero=True, warp_mode='PIL')[source]
Bases: AugmentorBase
Augmentation model for Time warping, Frequency masking, Time masking.
- SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
- SpecAugment on Large Scale Datasets
- Attributes:
- freq_mask
- time_mask
Methods
__call__(x[, train]): Call self as a function.
mask_freq(x[, replace_with_zero]): Apply frequency masking.
mask_time(x[, replace_with_zero]): Apply time masking.
time_warp(x[, mode]): Time warp for SpecAugment: move a random center frame by a random width ~ uniform(-window, window).
transform_audio(audio_segment): Adds various effects to the input audio segment.
Args:
librispeech_basic
librispeech_double
switchboard_mild
switchboard_strong
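The four names above match the augmentation policies defined in the SpecAugment paper. As a hedged sketch, the table below records those policies as keyword arguments for the constructor. The values are recalled from Table 1 of the paper and should be checked against it before use. The mapping of the paper's `mF` and `mT` onto `n_freq_masks` and `n_time_masks` is an assumption based on the constructor signature, and `SPEC_AUGMENT_POLICIES` and `policy_kwargs` are hypothetical names, not part of PaddleSpeech.

```python
# Hypothetical preset table; values recalled from the SpecAugment paper (Table 1).
# Keys reuse the constructor's parameter names; that mapping is an assumption.
SPEC_AUGMENT_POLICIES = {
    "librispeech_basic": dict(W=80, F=27, n_freq_masks=1, T=100, p=1.0, n_time_masks=1),
    "librispeech_double": dict(W=80, F=27, n_freq_masks=2, T=100, p=1.0, n_time_masks=2),
    "switchboard_mild": dict(W=40, F=15, n_freq_masks=2, T=70, p=0.2, n_time_masks=2),
    "switchboard_strong": dict(W=40, F=27, n_freq_masks=2, T=70, p=0.2, n_time_masks=2),
}


def policy_kwargs(name):
    """Return a copy of the keyword arguments for a named policy."""
    try:
        return dict(SPEC_AUGMENT_POLICIES[name])
    except KeyError:
        raise ValueError(f"unknown SpecAugment policy: {name!r}") from None


kwargs = policy_kwargs("switchboard_mild")
```

The resulting dictionary could then be unpacked into the constructor together with `rng`, for example `SpecAugmentor(rng, **kwargs)`.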
- property freq_mask
- mask_freq(x, replace_with_zero=False)[source]
Apply frequency masking to a spectrogram.
- Args:
x (np.ndarray): spectrogram (time, freq)
replace_with_zero (bool, optional): Defaults to False.
- Returns:
np.ndarray: frequency-masked spectrogram (time, freq)
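To illustrate what frequency masking does, here is a minimal NumPy sketch. It is not the library's implementation: `mask_freq_sketch` is a hypothetical helper, the mask width is assumed to be drawn uniformly from [0, F], and filling with the spectrogram mean when `replace_with_zero` is False is also an assumption.

```python
import numpy as np


def mask_freq_sketch(x, F, n_freq_masks, rng, replace_with_zero=False):
    """Return a copy of x (time, freq) with random frequency bands masked."""
    out = x.copy()
    n_freq = out.shape[1]
    # Assumption: masked bins are filled with 0 or with the global mean.
    fill = 0.0 if replace_with_zero else out.mean()
    for _ in range(n_freq_masks):
        f = min(int(rng.integers(0, F + 1)), n_freq)  # mask width in [0, F]
        f0 = int(rng.integers(0, n_freq - f + 1))     # first masked bin
        out[:, f0:f0 + f] = fill
    return out


rng = np.random.default_rng(0)
spec = rng.random((100, 80)).astype(np.float32)  # (time, freq)
masked = mask_freq_sketch(spec, F=10, n_freq_masks=2, rng=rng,
                          replace_with_zero=True)
```

Each mask blanks a contiguous band of frequency bins across all frames, so at most `n_freq_masks * F` columns are affected.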
- mask_time(x, replace_with_zero=False)[source]
Apply time masking to a spectrogram.
- Args:
x (np.ndarray): spectrogram (time, freq)
replace_with_zero (bool, optional): Defaults to False.
- Returns:
np.ndarray: time-masked spectrogram (time, freq)
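Time masking is the same idea applied along the time axis. The sketch below is a hypothetical NumPy illustration, not the library's code. It assumes the mask width is drawn uniformly from [0, min(T, p * n_frames)], following the role of `p` in the SpecAugment paper, and that masked frames are filled with zero or the spectrogram mean.

```python
import numpy as np


def mask_time_sketch(x, T, n_time_masks, rng, p=1.0, replace_with_zero=False):
    """Return a copy of x (time, freq) with random runs of frames masked."""
    out = x.copy()
    n_frames = out.shape[0]
    # Assumption: p caps the mask width as a fraction of the utterance length.
    max_width = min(T, int(n_frames * p), n_frames)
    fill = 0.0 if replace_with_zero else out.mean()
    for _ in range(n_time_masks):
        t = int(rng.integers(0, max_width + 1))     # mask width in [0, max_width]
        t0 = int(rng.integers(0, n_frames - t + 1))  # first masked frame
        out[t0:t0 + t, :] = fill
    return out


rng = np.random.default_rng(0)
spec = rng.random((100, 80)).astype(np.float32)  # (time, freq)
masked = mask_time_sketch(spec, T=20, n_time_masks=2, rng=rng,
                          replace_with_zero=True)
```

With `p` below 1.0, short utterances receive proportionally shorter masks, which keeps a single mask from erasing most of the input.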
- property time_mask
- time_warp(x, mode='PIL')[source]
Time warp for SpecAugment: move a random center frame by a random width ~ uniform(-window, window).
- Args:
x (np.ndarray): spectrogram (time, freq)
mode (str): PIL or sparse_image_warp
- Raises:
NotImplementedError: if mode is not implemented or not recognized.
- Returns:
np.ndarray: time-warped spectrogram (time, freq)
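The warp described above can be sketched as follows: pick a center frame, shift it by a random offset, then stretch the left part and squeeze the right part (or the reverse) so that the total length stays the same. This is a hypothetical NumPy-only illustration. It substitutes plain linear interpolation (`np.interp`) for whatever resizing the `PIL` mode performs, and `time_warp_sketch` and `_resize_time` are not library functions.

```python
import numpy as np


def _resize_time(seg, new_len):
    """Linearly resample seg (time, freq) to new_len frames."""
    old_len = seg.shape[0]
    src = np.linspace(0, old_len - 1, num=new_len)  # sample positions
    idx = np.arange(old_len)
    cols = [np.interp(src, idx, seg[:, k]) for k in range(seg.shape[1])]
    return np.stack(cols, axis=1)


def time_warp_sketch(x, window, rng):
    """Warp x (time, freq) along time by moving a random center frame."""
    t = x.shape[0]
    if window == 0 or t - window <= window:
        return x  # too short to warp; returned unchanged
    center = int(rng.integers(window, t - window))   # frame to move
    shift = int(rng.integers(-window, window + 1))   # ~ uniform(-window, window)
    warped = int(np.clip(center + shift, 1, t - 1))  # new position of center
    left = _resize_time(x[:center], warped)          # frames before the center
    right = _resize_time(x[center:], t - warped)     # frames from the center on
    return np.concatenate((left, right), axis=0)


rng = np.random.default_rng(0)
spec = rng.random((100, 80)).astype(np.float32)  # (time, freq)
warped = time_warp_sketch(spec, window=5, rng=rng)
```

The output keeps the input's shape, and the first and last frames are unchanged because each segment is resampled between its own endpoints.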