paddlespeech.vector.exps.ge2e.audio_processor module

class paddlespeech.vector.exps.ge2e.audio_processor.SpeakerVerificationPreprocessor(sampling_rate: int, audio_norm_target_dBFS: float, vad_window_length, vad_moving_average_width, vad_max_silence_length, mel_window_length, mel_window_step, n_mels, partial_n_frames: int, min_pad_coverage: float = 0.75, partial_overlap_ratio: float = 0.5)[source]

Bases: object

Methods

extract_mel_partials

melspectrogram

preprocess_wav

extract_mel_partials(wav)[source]
melspectrogram(wav)[source]
preprocess_wav(fpath_or_wav, source_sr=None)[source]
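A minimal end-to-end sketch of the class, assuming typical GE2E speaker-encoder settings; every constructor value below is an illustrative assumption, not a default prescribed by this module.

    from paddlespeech.vector.exps.ge2e.audio_processor import (
        SpeakerVerificationPreprocessor,
    )

    # All constructor values are assumed, typical GE2E settings.
    processor = SpeakerVerificationPreprocessor(
        sampling_rate=16000,
        audio_norm_target_dBFS=-30,
        vad_window_length=30,          # VAD window, in ms (assumed)
        vad_moving_average_width=8,
        vad_max_silence_length=6,
        mel_window_length=25,          # mel window, in ms (assumed)
        mel_window_step=10,            # mel hop, in ms (assumed)
        n_mels=40,
        partial_n_frames=160,
    )

    # Load (and resample if needed), normalize volume, trim long silences.
    wav = processor.preprocess_wav("speaker_0001.wav")

    # Mel spectrogram of the whole cleaned waveform.
    mel = processor.melspectrogram(wav)

    # Fixed-size partial utterances for the GE2E speaker encoder.
    partials = processor.extract_mel_partials(wav)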
paddlespeech.vector.exps.ge2e.audio_processor.compute_partial_slices(n_samples: int, partial_utterance_n_frames: int, hop_length: int, min_pad_coverage: float = 0.75, overlap: float = 0.5)[source]

Computes where to split an utterance waveform and its corresponding mel spectrogram to obtain partial utterances of <partial_utterance_n_frames> each. Both the waveform and the mel spectrogram slices are returned, so that each partial utterance waveform corresponds to its spectrogram. This function assumes that the mel spectrogram parameters used are those defined in params_data.py.

The returned ranges may index past the end of the waveform. It is recommended that you pad the waveform with zeros up to wave_slices[-1].stop.

Parameters

n_samples : int

the number of samples in the waveform.

partial_utterance_n_frames : int

the number of mel spectrogram frames in each partial utterance.

hop_length : int

the number of waveform samples between two consecutive mel spectrogram frames.

min_pad_coverage : float

when reaching the last partial utterance, it may or may not have enough frames. If at least <min_pad_coverage> of <partial_utterance_n_frames> are present, the last partial utterance is kept, as if the audio had been padded. Otherwise, it is discarded, as if the audio had been trimmed. If there aren't enough frames for even one partial utterance, this parameter is ignored so that the function always returns at least one slice.

overlap : float

by how much consecutive partial utterances should overlap. If set to 0, the partial utterances are entirely disjoint.

Returns

the waveform slices and mel spectrogram slices as lists of array slices. Index the waveform and the mel spectrogram with these slices, respectively, to obtain the partial utterances.
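A usage sketch that follows the recommendation above: pad the waveform with zeros up to wave_slices[-1].stop before indexing. The frame counts and hop_length are assumed example values.

    import numpy as np
    from paddlespeech.vector.exps.ge2e.audio_processor import compute_partial_slices

    wav = np.random.randn(30000).astype(np.float32)  # stand-in waveform
    wave_slices, mel_slices = compute_partial_slices(
        n_samples=len(wav),
        partial_utterance_n_frames=160,  # assumed
        hop_length=160,                  # samples per mel frame (assumed)
        min_pad_coverage=0.75,
        overlap=0.5,
    )

    # The last range may index past the end of the waveform; pad with zeros.
    max_wave_length = wave_slices[-1].stop
    if max_wave_length > len(wav):
        wav = np.pad(wav, (0, max_wave_length - len(wav)), mode="constant")

    partial_wavs = [wav[s] for s in wave_slices]
    # Likewise, index the mel spectrogram's frame axis with mel_slices.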

paddlespeech.vector.exps.ge2e.audio_processor.normalize_volume(wav, target_dBFS, increase_only=False, decrease_only=False)[source]
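normalize_volume rescales the waveform so that its level matches target_dBFS. A minimal sketch of the standard dBFS normalization technique, not necessarily this module's exact implementation:

    import numpy as np

    def normalize_volume_sketch(wav, target_dBFS, increase_only=False, decrease_only=False):
        # Level of the signal in dBFS, computed from its mean power.
        dBFS_change = target_dBFS - 10 * np.log10(np.mean(wav ** 2))
        # Respect the one-directional flags.
        if (dBFS_change < 0 and increase_only) or (dBFS_change > 0 and decrease_only):
            return wav
        # A change of X dB corresponds to a linear gain of 10 ** (X / 20).
        return wav * (10 ** (dBFS_change / 20))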
paddlespeech.vector.exps.ge2e.audio_processor.trim_long_silences(wav, vad_window_length: int, vad_moving_average_width: int, vad_max_silence_length: int, sampling_rate: int)[source]

Ensures that segments without voice in the waveform remain no longer than a threshold determined by the VAD parameters passed to this function.

Parameters

wav : np.array

the raw waveform as a numpy array of floats

Returns

np.array

the same waveform with silences trimmed away (length <= original wav length)
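A hedged usage sketch; the VAD parameter values are assumptions chosen to illustrate the call, not values prescribed by this module.

    import numpy as np
    from paddlespeech.vector.exps.ge2e.audio_processor import trim_long_silences

    wav = np.random.uniform(-1, 1, 48000).astype(np.float32)  # stand-in 3 s at 16 kHz
    trimmed = trim_long_silences(
        wav,
        vad_window_length=30,         # VAD window, in ms (assumed)
        vad_moving_average_width=8,   # windows averaged to smooth voice flags (assumed)
        vad_max_silence_length=6,     # max consecutive silent windows kept (assumed)
        sampling_rate=16000,
    )
    assert len(trimmed) <= len(wav)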