paddlespeech.s2t.frontend.augmentor.spec_augment module

Contains the SpecAugment augmentation model.

class paddlespeech.s2t.frontend.augmentor.spec_augment.SpecAugmentor(rng, F, T, n_freq_masks, n_time_masks, p=1.0, W=40, adaptive_number_ratio=0, adaptive_size_ratio=0, max_n_time_masks=20, replace_with_zero=True, warp_mode='PIL')[source]

Bases: AugmentorBase

Augmentation model for time warping, frequency masking, and time masking.

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

https://arxiv.org/abs/1904.08779

SpecAugment on Large Scale Datasets

https://arxiv.org/abs/1912.05533

Attributes:
freq_mask
time_mask

Methods

__call__(x[, train])

Call self as a function.

mask_freq(x[, replace_with_zero])

Apply a frequency mask to the spectrogram.

mask_time(x[, replace_with_zero])

Apply a time mask to the spectrogram.

time_warp(x[, mode])

Time warp for SpecAugment: move a random center frame by a random width drawn from uniform(-window, window).

transform_audio(audio_segment)

Adds various effects to the input audio segment.

transform_feature(x)

Transform a feature array x of shape [T, F] and return an array of the same shape.

librispeech_basic

librispeech_double

switchboard_mild

switchboard_strong

property freq_mask
librispeech_basic()[source]
librispeech_double()[source]
mask_freq(x, replace_with_zero=False)[source]

Apply a frequency mask to the spectrogram.

Args:

x (np.ndarray): spectrogram (time, freq)
replace_with_zero (bool, optional): Defaults to False.

Returns:

np.ndarray: frequency-masked spectrogram (time, freq)
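
To make the frequency-masking step concrete, here is a minimal NumPy sketch of the idea described in the SpecAugment paper. It is not the PaddleSpeech implementation: the function name `mask_freq_sketch`, its parameters, and the choice to fill with the array mean when `replace_with_zero` is False are assumptions made for illustration.

```python
import numpy as np


def mask_freq_sketch(x, F, n_masks, rng, replace_with_zero=False):
    """Mask up to `n_masks` random frequency bands of a (time, freq) array.

    Hypothetical helper: `F` is the maximum band width in bins.
    """
    out = x.copy()
    n_bins = out.shape[1]
    # Assumption: masked bins are set to zero or to the mean of the input.
    fill = 0.0 if replace_with_zero else out.mean()
    for _ in range(n_masks):
        width = min(int(rng.integers(0, F + 1)), n_bins)   # width in [0, F]
        start = int(rng.integers(0, n_bins - width + 1))    # band fits in array
        out[:, start:start + width] = fill
    return out


rng = np.random.default_rng(0)
spec = rng.standard_normal((100, 80)).astype(np.float32)
masked = mask_freq_sketch(spec, F=10, n_masks=2, rng=rng, replace_with_zero=True)
print(masked.shape)  # (100, 80)
```

The sketch copies its input, so the original spectrogram is left untouched; a mask of width 0 is a valid draw and leaves the array unchanged.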

mask_time(x, replace_with_zero=False)[source]

Apply a time mask to the spectrogram.

Args:

x (np.ndarray): spectrogram (time, freq)
replace_with_zero (bool, optional): Defaults to False.

Returns:

np.ndarray: time-masked spectrogram (time, freq)
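
The constructor parameters `p`, `adaptive_number_ratio`, `adaptive_size_ratio`, and `max_n_time_masks` suggest that time masking can scale with utterance length, as in the "SpecAugment on Large Scale Datasets" paper. The sketch below shows one plausible reading of those parameters. The function `mask_time_sketch` is hypothetical, and the way each parameter is interpreted here is an assumption based on the papers, not a statement about PaddleSpeech's code.

```python
import numpy as np


def mask_time_sketch(x, T, n_masks, rng, p=1.0, adaptive_number_ratio=0.0,
                     adaptive_size_ratio=0.0, max_n_masks=20,
                     replace_with_zero=False):
    """Mask random runs of frames in a (time, freq) array (illustrative only)."""
    out = x.copy()
    n_frames = out.shape[0]
    # Assumed adaptive policy: mask count and size grow with utterance length.
    if adaptive_number_ratio > 0:
        n_masks = min(int(n_frames * adaptive_number_ratio), max_n_masks)
    if adaptive_size_ratio > 0:
        T = int(n_frames * adaptive_size_ratio)
    fill = 0.0 if replace_with_zero else out.mean()
    for _ in range(n_masks):
        width = int(rng.integers(0, T + 1))                # width in [0, T]
        width = min(width, int(n_frames * p), n_frames)    # cap at p * n_frames
        start = int(rng.integers(0, n_frames - width + 1))
        out[start:start + width, :] = fill
    return out


rng = np.random.default_rng(0)
spec = rng.standard_normal((300, 80)).astype(np.float32)
masked = mask_time_sketch(spec, T=40, n_masks=2, rng=rng, p=0.2)
print(masked.shape)  # (300, 80)
```

With `p=0.2` and 300 frames, no single mask in this sketch can cover more than 60 frames, which keeps short utterances from being masked out entirely.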

switchboard_mild()[source]
switchboard_strong()[source]
property time_mask
time_warp(x, mode='PIL')[source]

Time warp for SpecAugment: move a random center frame by a random width drawn from uniform(-window, window).

Args:

x (np.ndarray): spectrogram (time, freq)
mode (str): PIL or sparse_image_warp

Raises:

NotImplementedError: [description]
NotImplementedError: [description]

Returns:

np.ndarray: time warped spectrogram (time, freq)
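
The warp described above can be illustrated as follows: pick a center frame, pick a new position for it within the window, then stretch the frames before the center and squeeze the frames after it (or the reverse) so the total length is unchanged. This sketch is an assumption-laden stand-in, not the library's code: it uses linear interpolation via `np.interp` rather than a PIL image resize, and `time_warp_sketch` and `_resize_time` are hypothetical names.

```python
import numpy as np


def _resize_time(seg, new_len):
    """Linearly resample a (time, freq) segment to `new_len` frames."""
    old_len = seg.shape[0]
    if old_len == 1:
        return np.repeat(seg, new_len, axis=0)
    src = np.linspace(0.0, old_len - 1, num=new_len)   # positions in old frames
    cols = [np.interp(src, np.arange(old_len), seg[:, j])
            for j in range(seg.shape[1])]
    return np.stack(cols, axis=1).astype(seg.dtype)


def time_warp_sketch(x, window, rng):
    """Move a random center frame by up to `window` frames; length is preserved."""
    t = x.shape[0]
    if window == 0 or t - window <= window:
        return x  # input too short to warp; returned as-is
    center = int(rng.integers(window, t - window))             # [window, t - window)
    # New position of the center frame, kept inside [1, t - 1].
    warped = int(rng.integers(center - window + 1, center + window + 1))
    left = _resize_time(x[:center], warped)
    right = _resize_time(x[center:], t - warped)
    return np.concatenate((left, right), axis=0)


rng = np.random.default_rng(0)
spec = rng.standard_normal((120, 80)).astype(np.float32)
print(time_warp_sketch(spec, window=40, rng=rng).shape)  # (120, 80)
```

Because the two resized halves always sum to the original number of frames, the output shape matches the input, consistent with the (time, freq) return type documented above.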

transform_feature(x: ndarray)[source]
Args:

x (np.ndarray): [T, F]

Returns:

x (np.ndarray): [T, F]