paddlespeech.vector.exps.ge2e.audio_processor module

class paddlespeech.vector.exps.ge2e.audio_processor.SpeakerVerificationPreprocessor(sampling_rate: int, audio_norm_target_dBFS: float, vad_window_length, vad_moving_average_width, vad_max_silence_length, mel_window_length, mel_window_step, n_mels, partial_n_frames: int, min_pad_coverage: float = 0.75, partial_overlap_ratio: float = 0.5)[source]

Bases: object

Methods

extract_mel_partials

melspectrogram

preprocess_wav

extract_mel_partials(wav)[source]
melspectrogram(wav)[source]
preprocess_wav(fpath_or_wav, source_sr=None)[source]
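A minimal end-to-end sketch of the class, assuming typical GE2E speaker-encoder settings; every constructor value below is an illustrative assumption, not a default prescribed by this module.

    from paddlespeech.vector.exps.ge2e.audio_processor import (
        SpeakerVerificationPreprocessor,
    )

    # All constructor values are assumed, typical GE2E settings.
    processor = SpeakerVerificationPreprocessor(
        sampling_rate=16000,
        audio_norm_target_dBFS=-30,
        vad_window_length=30,          # VAD window, in ms (assumed)
        vad_moving_average_width=8,
        vad_max_silence_length=6,
        mel_window_length=25,          # mel window, in ms (assumed)
        mel_window_step=10,            # mel hop, in ms (assumed)
        n_mels=40,
        partial_n_frames=160,
    )

    # Load (and resample if needed), normalize volume, trim long silences.
    wav = processor.preprocess_wav("speaker_0001.wav")

    # Mel spectrogram of the whole cleaned waveform.
    mel = processor.melspectrogram(wav)

    # Fixed-size partial utterances for the GE2E speaker encoder.
    partials = processor.extract_mel_partials(wav)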
paddlespeech.vector.exps.ge2e.audio_processor.compute_partial_slices(n_samples: int, partial_utterance_n_frames: int, hop_length: int, min_pad_coverage: float = 0.75, overlap: float = 0.5)[source]

Computes where to split an utterance waveform and its corresponding mel spectrogram to obtain partial utterances of <partial_utterance_n_frames> each. Both the waveform and the mel spectrogram slices are returned, so that each partial utterance waveform corresponds to its spectrogram. This function assumes that the mel spectrogram parameters used are those defined in params_data.py.

The returned ranges may index past the end of the waveform. It is recommended that you pad the waveform with zeros up to wave_slices[-1].stop.

Parameters

n_samples : int

the number of samples in the waveform.

partial_utterance_n_frames : int

the number of mel spectrogram frames in each partial utterance.

hop_length : int

the number of waveform samples between two consecutive mel spectrogram frames.

min_pad_coverage : float

when reaching the last partial utterance, it may or may not have enough frames. If at least <min_pad_coverage> of <partial_utterance_n_frames> are present, the last partial utterance is kept, as if the audio had been padded. Otherwise, it is discarded, as if the audio had been trimmed. If there aren't enough frames for even one partial utterance, this parameter is ignored so that the function always returns at least one slice.

overlap : float

by how much consecutive partial utterances should overlap. If set to 0, the partial utterances are entirely disjoint.

Returns

the waveform slices and mel spectrogram slices as lists of array slices. Index the waveform and the mel spectrogram with these slices, respectively, to obtain the partial utterances.
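A usage sketch that follows the recommendation above: pad the waveform with zeros up to wave_slices[-1].stop before indexing. The frame counts and hop_length are assumed example values.

    import numpy as np
    from paddlespeech.vector.exps.ge2e.audio_processor import compute_partial_slices

    wav = np.random.randn(30000).astype(np.float32)  # stand-in waveform
    wave_slices, mel_slices = compute_partial_slices(
        n_samples=len(wav),
        partial_utterance_n_frames=160,  # assumed
        hop_length=160,                  # samples per mel frame (assumed)
        min_pad_coverage=0.75,
        overlap=0.5,
    )

    # The last range may index past the end of the waveform; pad with zeros.
    max_wave_length = wave_slices[-1].stop
    if max_wave_length > len(wav):
        wav = np.pad(wav, (0, max_wave_length - len(wav)), mode="constant")

    partial_wavs = [wav[s] for s in wave_slices]
    # Likewise, index the mel spectrogram's frame axis with mel_slices.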

paddlespeech.vector.exps.ge2e.audio_processor.normalize_volume(wav, target_dBFS, increase_only=False, decrease_only=False)[source]
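normalize_volume rescales the waveform so that its level matches target_dBFS. A minimal sketch of the standard dBFS normalization technique, not necessarily this module's exact implementation:

    import numpy as np

    def normalize_volume_sketch(wav, target_dBFS, increase_only=False, decrease_only=False):
        # Level of the signal in dBFS, computed from its mean power.
        dBFS_change = target_dBFS - 10 * np.log10(np.mean(wav ** 2))
        # Respect the one-directional flags.
        if (dBFS_change < 0 and increase_only) or (dBFS_change > 0 and decrease_only):
            return wav
        # A change of X dB corresponds to a linear gain of 10 ** (X / 20).
        return wav * (10 ** (dBFS_change / 20))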
paddlespeech.vector.exps.ge2e.audio_processor.trim_long_silences(wav, vad_window_length: int, vad_moving_average_width: int, vad_max_silence_length: int, sampling_rate: int)[source]

Ensures that segments without voice in the waveform remain no longer than a threshold determined by the VAD parameters passed to this function.

Parameters

wav : np.array

the raw waveform as a numpy array of floats

Returns

np.array

the same waveform with silences trimmed away (length <= original wav length)
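A hedged usage sketch; the VAD parameter values are assumptions chosen to illustrate the call, not values prescribed by this module.

    import numpy as np
    from paddlespeech.vector.exps.ge2e.audio_processor import trim_long_silences

    wav = np.random.uniform(-1, 1, 48000).astype(np.float32)  # stand-in 3 s at 16 kHz
    trimmed = trim_long_silences(
        wav,
        vad_window_length=30,         # VAD window, in ms (assumed)
        vad_moving_average_width=8,   # windows averaged to smooth voice flags (assumed)
        vad_max_silence_length=6,     # max consecutive silent windows kept (assumed)
        sampling_rate=16000,
    )
    assert len(trimmed) <= len(wav)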