paddlespeech.s2t.frontend.augmentor.augmentation module

Contains the data augmentation pipeline.

class paddlespeech.s2t.frontend.augmentor.augmentation.AugmentationPipeline(preprocess_conf: str, random_seed: int = 0)

Bases: object

Build a pre-processing pipeline with various augmentation models. Such a data augmentation pipeline is often used to augment the training samples, making the model invariant to certain types of real-world perturbations and thereby improving its generalization ability.

The pipeline is built according to an augmentation configuration given as a JSON string, e.g.:

[ {
        "type": "noise",
        "params": {"min_snr_dB": 10,
                   "max_snr_dB": 20,
                   "noise_manifest_path": "datasets/manifest.noise"},
        "prob": 0.0
    },
    {
        "type": "speed",
        "params": {"min_speed_rate": 0.9,
                   "max_speed_rate": 1.1},
        "prob": 1.0
    },
    {
        "type": "shift",
        "params": {"min_shift_ms": -5,
                   "max_shift_ms": 5},
        "prob": 1.0
    },
    {
        "type": "volume",
        "params": {"min_gain_dBFS": -10,
                   "max_gain_dBFS": 10},
        "prob": 0.0
    },
    {
        "type": "bayesian_normal",
        "params": {"target_db": -20,
                   "prior_db": -20,
                   "prior_samples": 100},
        "prob": 0.0
    }
]

This configuration lists five augmentors: noise, speed, shift, volume and bayesian_normal. "prob" indicates the probability that the corresponding augmentor takes effect each time the pipeline runs; if "prob" is zero, the augmentor never takes effect. Here only the speed and shift augmentors have a nonzero "prob" (1.0), so they are applied every time, while the noise, volume and bayesian_normal augmentors are disabled.
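The "prob" semantics can be illustrated with a small, self-contained sketch. This is not PaddleSpeech's implementation: `ToyPipeline`, the toy `volume` and `shift` functions, and the `min_shift`/`max_shift` parameters (measured in samples rather than milliseconds) are all hypothetical. The sketch assumes each entry fires independently with probability "prob", in list order, using a `random.Random` seeded from `random_seed` so runs are reproducible.

```python
import json
import random
from typing import Callable, Dict, List, Tuple

Samples = List[float]
# Hypothetical augmentor signature: (samples, params, rng) -> augmented samples.
AugmentFn = Callable[[Samples, Dict, random.Random], Samples]


def volume(samples: Samples, params: Dict, rng: random.Random) -> Samples:
    # Draw a gain in dB uniformly from the configured range and apply it.
    gain_db = rng.uniform(params["min_gain_dBFS"], params["max_gain_dBFS"])
    return [s * 10.0 ** (gain_db / 20.0) for s in samples]


def shift(samples: Samples, params: Dict, rng: random.Random) -> Samples:
    # Toy shift measured in samples (hypothetical; the real "shift" type uses ms):
    # positive k delays, negative k advances; zero-padding keeps the length fixed.
    n = len(samples)
    k = max(-n, min(n, rng.randint(params["min_shift"], params["max_shift"])))
    if k > 0:
        return [0.0] * k + samples[:n - k]
    if k < 0:
        return samples[-k:] + [0.0] * (-k)
    return list(samples)


class ToyPipeline:
    """Sequential pipeline: each entry fires independently with probability `prob`."""

    def __init__(self, conf: str, registry: Dict[str, AugmentFn], seed: int = 0):
        self._rng = random.Random(seed)  # seeded so augmentation is reproducible
        self._stages: List[Tuple[str, AugmentFn, Dict, float]] = []
        for entry in json.loads(conf):
            fn = registry[entry["type"]]
            self._stages.append((entry["type"], fn, entry["params"], entry["prob"]))
        self.last_fired: List[str] = []

    def transform(self, samples: Samples) -> Samples:
        self.last_fired = []
        for name, fn, params, prob in self._stages:
            # uniform(0, 1) lies in [0, 1): never below 0.0, always below 1.0.
            if self._rng.uniform(0.0, 1.0) < prob:
                samples = fn(samples, params, self._rng)
                self.last_fired.append(name)
        return samples


CONF = json.dumps([
    {"type": "shift", "params": {"min_shift": -2, "max_shift": 2}, "prob": 1.0},
    {"type": "volume", "params": {"min_gain_dBFS": -10, "max_gain_dBFS": 10}, "prob": 0.0},
])
pipe = ToyPipeline(CONF, {"volume": volume, "shift": shift}, seed=0)
out = pipe.transform([0.1, 0.2, 0.3, 0.4, 0.5])
print(pipe.last_fired)  # ['shift']: prob 1.0 always fires, prob 0.0 never does
```

Sharing one seeded generator for both the gating decision and the augmentors' own random draws keeps the whole run reproducible from a single seed, which is one plausible reading of the `random_seed` parameter.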

Params:

preprocess_conf (str): Augmentation configuration as a JSON file or JSON string.
random_seed (int): Random seed.
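Since `preprocess_conf` may be either a JSON file or a JSON string, a caller-side helper can accept both forms. The sketch below is hypothetical (`load_augmentation_conf` is not a PaddleSpeech function) and assumes that any argument naming an existing file is read as a file, while everything else is parsed as JSON text.

```python
import json
import os
import tempfile  # only used in the usage example below
from typing import Any


def load_augmentation_conf(preprocess_conf: str) -> Any:
    """Hypothetical helper: accept a path to a JSON file or a JSON string."""
    if os.path.isfile(preprocess_conf):
        with open(preprocess_conf, encoding="utf-8") as f:
            return json.load(f)
    # Not an existing file, so treat the argument itself as JSON text.
    return json.loads(preprocess_conf)


conf_str = '[{"type": "speed", "params": {"min_speed_rate": 0.9, "max_speed_rate": 1.1}, "prob": 1.0}]'
from_string = load_augmentation_conf(conf_str)

# Write the same configuration to a temporary file and load it by path.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False, encoding="utf-8") as tmp:
    tmp.write(conf_str)
from_file = load_augmentation_conf(tmp.name)
os.remove(tmp.name)
print(from_string == from_file)  # True
```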

Raises:

ValueError: If the augmentation JSON config is in an incorrect format.
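The kind of format check that would lead to this `ValueError` can be sketched as follows. The exact rules PaddleSpeech applies are not documented here, so `validate_conf`, the required keys and the accepted type set are assumptions: the keys mirror the example configuration above, and the type set is an illustrative subset taken from the types mentioned on this page.

```python
from typing import Any

REQUIRED_KEYS = {"type", "params", "prob"}
# Illustrative subset: only the types that appear on this page.
KNOWN_TYPES = {"noise", "speed", "shift", "volume", "bayesian_normal", "specaug"}


def validate_conf(entries: Any) -> None:
    """Hypothetical check: raise ValueError unless `entries` is a list of augmentor entries."""
    if not isinstance(entries, list):
        raise ValueError("augmentation config must be a JSON list")
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {i} must be a JSON object")
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            raise ValueError(f"entry {i} is missing keys: {sorted(missing)}")
        if entry["type"] not in KNOWN_TYPES:
            raise ValueError(f"entry {i} has unknown type {entry['type']!r}")
        prob = entry["prob"]
        if not isinstance(prob, (int, float)) or not 0.0 <= prob <= 1.0:
            raise ValueError(f"entry {i} has prob outside [0, 1]: {prob!r}")


good = [{"type": "speed", "params": {"min_speed_rate": 0.9, "max_speed_rate": 1.1}, "prob": 1.0}]
validate_conf(good)  # passes silently

bad = [{"type": "speed", "params": {}}]  # no "prob" key
try:
    validate_conf(bad)
except ValueError as err:
    print(err)  # entry 0 is missing keys: ['prob']
```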

Methods

__call__(xs[, uttid_list])

Call self as a function.

transform_audio(audio_segment)

Run the pre-processing pipeline for data augmentation.

transform_feature(spec_segment)

Apply spectrogram augmentation.

SPEC_TYPES = {'specaug'}
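`SPEC_TYPES` suggests that configuration entries are divided between waveform-level augmentors (`transform_audio`) and spectrogram-level augmentors (`transform_feature`). The sketch below assumes exactly that split: entries whose "type" is in `SPEC_TYPES` go to the feature side, all others to the audio side. `split_by_domain` is a hypothetical name, and the empty "params" for "specaug" is a placeholder because its real parameters are not shown on this page.

```python
from typing import Dict, List, Tuple

SPEC_TYPES = {"specaug"}  # value shown on this page


def split_by_domain(entries: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Hypothetical split into (audio-level entries, spectrogram-level entries)."""
    audio, spec = [], []
    for entry in entries:
        # Assumption: membership in SPEC_TYPES marks a feature-domain augmentor.
        (spec if entry["type"] in SPEC_TYPES else audio).append(entry)
    return audio, spec


entries = [
    {"type": "speed", "params": {"min_speed_rate": 0.9, "max_speed_rate": 1.1}, "prob": 1.0},
    {"type": "specaug", "params": {}, "prob": 1.0},  # placeholder params
    {"type": "volume", "params": {"min_gain_dBFS": -10, "max_gain_dBFS": 10}, "prob": 0.0},
]
audio, spec = split_by_domain(entries)
print([e["type"] for e in audio])  # ['speed', 'volume']
print([e["type"] for e in spec])   # ['specaug']
```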
transform_audio(audio_segment)

Run the pre-processing pipeline for data augmentation.

Note that this is an in-place transformation.

Parameters:

audio_segment (AudioSegment|SpeechSegment) -- Audio segment to process.
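"In-place" means the segment object passed in is modified directly rather than a new one being returned. The sketch below shows that contract with a hypothetical `ToySegment` class (a mono float waveform) and a volume-style gain; it is not PaddleSpeech's `AudioSegment` or `VolumePerturbAugmentor`, only an illustration of in-place semantics.

```python
import random

import numpy as np


class ToySegment:
    """Hypothetical stand-in for an audio segment: a mono float waveform."""

    def __init__(self, samples: np.ndarray, sample_rate: int):
        self.samples = samples.astype(np.float32)  # astype makes a private copy
        self.sample_rate = sample_rate


def volume_in_place(segment: ToySegment, min_gain_db: float, max_gain_db: float,
                    rng: random.Random) -> None:
    # Draw a gain uniformly in dB, convert it to an amplitude factor, scale in place.
    gain_db = rng.uniform(min_gain_db, max_gain_db)
    segment.samples *= np.float32(10.0 ** (gain_db / 20.0))
    # Nothing is returned: the caller sees the change through `segment` itself.


raw = np.array([0.1, -0.2, 0.3])
seg = ToySegment(raw, sample_rate=16000)
before = seg.samples.copy()
result = volume_in_place(seg, 6.0, 6.0, random.Random(0))  # fixed +6 dB gain
print(result)  # None
```

Because the input is mutated, a caller that still needs the clean signal (for example, to compare augmented and original audio) has to copy the segment before running the pipeline.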

transform_feature(spec_segment)

Apply spectrogram augmentation.

Args:

spec_segment (np.ndarray): Audio feature of shape (D, T), i.e. D feature dimensions by T frames.
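For the "specaug" type, a common form of spectrogram augmentation is SpecAugment-style frequency and time masking. The sketch below applies that technique to a (D, T) array; it is a generic illustration, not PaddleSpeech's `specaug` augmentor. The function name `spec_mask`, its parameters (`max_f`, `max_t`, mask counts), the choice of zero as the mask value, and returning a copy instead of editing in place are all assumptions.

```python
from typing import Optional

import numpy as np


def spec_mask(spec: np.ndarray, max_f: int, max_t: int, n_freq_masks: int = 1,
              n_time_masks: int = 1,
              rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """SpecAugment-style masking of a (D, T) feature matrix; returns a masked copy."""
    rng = rng if rng is not None else np.random.default_rng()
    out = spec.copy()
    d, t = out.shape
    for _ in range(n_freq_masks):
        f = int(rng.integers(0, max_f + 1))            # band height in [0, max_f]
        f0 = int(rng.integers(0, max(d - f, 0) + 1))   # start row so the band fits
        out[f0:f0 + f, :] = 0.0                        # zero a band along the D axis
    for _ in range(n_time_masks):
        w = int(rng.integers(0, max_t + 1))            # band width in [0, max_t]
        t0 = int(rng.integers(0, max(t - w, 0) + 1))   # start frame so the band fits
        out[:, t0:t0 + w] = 0.0                        # zero a band along the T axis
    return out


feats = np.ones((80, 100), dtype=np.float32)  # D=80 feature bins, T=100 frames
masked = spec_mask(feats, max_f=10, max_t=20, rng=np.random.default_rng(0))
print(masked.shape)  # (80, 100)
```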