paddlespeech.s2t.frontend.augmentor.spec_augment module

Contains the SpecAugment augmentation model.

class paddlespeech.s2t.frontend.augmentor.spec_augment.SpecAugmentor(rng, F, T, n_freq_masks, n_time_masks, p=1.0, W=40, adaptive_number_ratio=0, adaptive_size_ratio=0, max_n_time_masks=20, replace_with_zero=True, warp_mode='PIL')

Bases: AugmentorBase

Augmentation model for time warping, frequency masking, and time masking.

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition: https://arxiv.org/abs/1904.08779
SpecAugment on Large Scale Datasets: https://arxiv.org/abs/1912.05533

Attributes:

freq_mask
time_mask

Methods

`__call__`(x[, train])	Call self as a function.
`mask_freq`(x[, replace_with_zero])	Apply frequency masking to a spectrogram.
`mask_time`(x[, replace_with_zero])	Apply time masking to a spectrogram.
`time_warp`(x[, mode])	Time warp for SpecAugment: move a random center frame by a random width ~ uniform(-window, window).
`transform_audio`(audio_segment)	Adds various effects to the input audio segment.
`transform_feature`(x)	Transform a feature array of shape [T, F]; returns an array of shape [T, F].

librispeech_basic
librispeech_double
switchboard_mild
switchboard_strong

property freq_mask

librispeech_basic()

librispeech_double()

mask_freq(x, replace_with_zero=False)

Apply frequency masking to a spectrogram.

Args:
    x (np.ndarray): spectrogram (time, freq)
    replace_with_zero (bool, optional): Defaults to False.

Returns:
    np.ndarray: frequency-masked spectrogram (time, freq)
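To illustrate what frequency masking does to a (time, freq) array, here is a minimal NumPy sketch. It is not PaddleSpeech's implementation: the function name `freq_mask_sketch`, the parameters `max_width` and `n_masks`, and the choice to fill with the spectrogram mean when `replace_with_zero` is False are all assumptions made for this example.

```python
import numpy as np


def freq_mask_sketch(x, max_width, n_masks, rng, replace_with_zero=False):
    """Mask up to n_masks random frequency bands of a (time, freq) array.

    Hypothetical helper for illustration only.
    """
    out = x.copy()  # leave the caller's array untouched
    n_freq = out.shape[1]
    # Assumption: masked bins take the global mean unless zeros are requested.
    fill = 0.0 if replace_with_zero else out.mean()
    for _ in range(n_masks):
        # Band width drawn from [0, max_width], clipped to the number of bins.
        width = min(int(rng.integers(0, max_width + 1)), n_freq)
        # Start index chosen so that the whole band fits inside the array.
        start = int(rng.integers(0, n_freq - width + 1))
        out[:, start:start + width] = fill
    return out


rng = np.random.default_rng(0)
spec = rng.standard_normal((100, 80))  # (time, freq)
masked = freq_mask_sketch(spec, max_width=10, n_masks=2, rng=rng,
                          replace_with_zero=True)
```

Each mask spans all time frames for a contiguous range of frequency bins, so the output keeps the input shape.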

mask_time(x, replace_with_zero=False)

Apply time masking to a spectrogram.

Args:
    x (np.ndarray): spectrogram (time, freq)
    replace_with_zero (bool, optional): Defaults to False.

Returns:
    np.ndarray: time-masked spectrogram (time, freq)
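The sketch below illustrates time masking together with an adaptive policy in the spirit of "SpecAugment on Large Scale Datasets", where the number and the maximum size of time masks scale with the utterance length. It is only an assumption that the constructor arguments `adaptive_number_ratio`, `adaptive_size_ratio`, and `max_n_time_masks` behave this way in PaddleSpeech; the function `time_mask_sketch` and the mean fill value are likewise hypothetical.

```python
import numpy as np


def time_mask_sketch(x, max_width, n_masks, rng,
                     adaptive_number_ratio=0.0, adaptive_size_ratio=0.0,
                     max_n_masks=20, replace_with_zero=False):
    """Mask random spans of frames in a (time, freq) array.

    Hypothetical helper for illustration only.
    """
    out = x.copy()
    n_frames = out.shape[0]
    # Assumed adaptive rule: mask count grows with length, up to a cap.
    if adaptive_number_ratio > 0:
        n_masks = min(int(adaptive_number_ratio * n_frames), max_n_masks)
    # Assumed adaptive rule: maximum mask width grows with length.
    if adaptive_size_ratio > 0:
        max_width = int(adaptive_size_ratio * n_frames)
    max_width = min(max_width, n_frames)
    fill = 0.0 if replace_with_zero else out.mean()
    for _ in range(n_masks):
        width = int(rng.integers(0, max_width + 1))        # in [0, max_width]
        start = int(rng.integers(0, n_frames - width + 1))  # span must fit
        out[start:start + width, :] = fill
    return out


rng = np.random.default_rng(0)
spec = rng.standard_normal((200, 80))  # (time, freq)
masked = time_mask_sketch(spec, max_width=0, n_masks=0, rng=rng,
                          adaptive_number_ratio=0.04,
                          adaptive_size_ratio=0.04,
                          replace_with_zero=True)
```

With 200 frames and both ratios set to 0.04, this sketch applies 8 masks of at most 8 frames each.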

switchboard_mild()

switchboard_strong()

property time_mask

time_warp(x, mode='PIL')

Time warp for SpecAugment: move a random center frame by a random width ~ uniform(-window, window).

Args:
    x (np.ndarray): spectrogram (time, freq)
    mode (str): PIL or sparse_image_warp

Raises:
    NotImplementedError: [description]
    NotImplementedError: [description]

Returns:
    np.ndarray: time-warped spectrogram (time, freq)
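The idea of moving a random center frame can be sketched with plain NumPy. This example swaps in per-bin linear interpolation (`np.interp`) for the image resize that a PIL-based mode would use, so it is a stand-in under stated assumptions rather than the library's method. The names `time_warp_sketch` and `_resize_time` and the exact sampling ranges are hypothetical.

```python
import numpy as np


def _resize_time(segment, new_len):
    """Linearly resample a (time, freq) segment to new_len frames."""
    old_len = segment.shape[0]
    src = np.linspace(0.0, old_len - 1, num=new_len)  # positions to sample
    old_idx = np.arange(old_len)
    cols = [np.interp(src, old_idx, segment[:, f])
            for f in range(segment.shape[1])]
    return np.stack(cols, axis=1)


def time_warp_sketch(x, window, rng):
    """Shift a random center frame by a random offset within the window.

    Hypothetical helper for illustration only.
    """
    t = x.shape[0]
    if window == 0 or t - window <= window:
        return x  # too short to warp, or warping disabled
    center = int(rng.integers(window, t - window))  # [window, t - window)
    # New position of the center frame, always within [1, t - 1].
    warped = int(rng.integers(center - window, center + window)) + 1
    left = _resize_time(x[:center], warped)        # stretch or squeeze left
    right = _resize_time(x[center:], t - warped)   # compensate on the right
    return np.concatenate([left, right], axis=0)


rng = np.random.default_rng(0)
spec = rng.standard_normal((120, 80))  # (time, freq)
warped_spec = time_warp_sketch(spec, window=5, rng=rng)
```

The frames left of the center are resampled to the new center position and the remaining frames fill the rest, so the total number of frames is unchanged.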

transform_feature(x: ndarray)

Args:
    x (np.ndarray): [T, F]

Returns:
    x (np.ndarray): [T, F]