paddlespeech.vector.exps.ge2e.audio_processor module
- class paddlespeech.vector.exps.ge2e.audio_processor.SpeakerVerificationPreprocessor(sampling_rate: int, audio_norm_target_dBFS: float, vad_window_length, vad_moving_average_width, vad_max_silence_length, mel_window_length, mel_window_step, n_mels, partial_n_frames: int, min_pad_coverage: float = 0.75, partial_overlap_ratio: float = 0.5)[source]
Bases: object
Methods
extract_mel_partials
melspectrogram
preprocess_wav
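A minimal usage sketch, assuming preprocess_wav accepts an audio file path (or a raw float waveform) and that melspectrogram and extract_mel_partials operate on the returned arrays; the hyperparameter values and file path below are illustrative, not prescribed defaults:

    from paddlespeech.vector.exps.ge2e.audio_processor import SpeakerVerificationPreprocessor

    # Illustrative GE2E-style hyperparameters (assumed values, not defaults).
    processor = SpeakerVerificationPreprocessor(
        sampling_rate=16000,
        audio_norm_target_dBFS=-30.0,
        vad_window_length=30,          # ms (assumed unit)
        vad_moving_average_width=8,
        vad_max_silence_length=6,
        mel_window_length=25,          # ms (assumed unit)
        mel_window_step=10,            # ms (assumed unit)
        n_mels=40,
        partial_n_frames=160,
        min_pad_coverage=0.75,
        partial_overlap_ratio=0.5)

    # Assumed workflow: clean the waveform, compute mels, cut fixed-length partials.
    wav = processor.preprocess_wav("corpus/speaker_0001/utt_0001.wav")  # hypothetical path
    mel = processor.melspectrogram(wav)
    partials = processor.extract_mel_partials(mel)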
- paddlespeech.vector.exps.ge2e.audio_processor.compute_partial_slices(n_samples: int, partial_utterance_n_frames: int, hop_length: int, min_pad_coverage: float = 0.75, overlap: float = 0.5)[source]
Computes where to split an utterance waveform and its corresponding mel spectrogram to obtain partial utterances of <partial_utterance_n_frames> each. Both the waveform and the mel spectrogram slices are returned, so that each partial utterance waveform corresponds to its spectrogram. This function assumes that the mel spectrogram parameters used are those defined in params_data.py.
The returned ranges may index past the end of the waveform. It is recommended that you pad the waveform with zeros up to wave_slices[-1].stop.
Parameters
- n_samples : int
the number of samples in the waveform.
- partial_utterance_n_frames : int
the number of mel spectrogram frames in each partial utterance.
- min_pad_coverage : float
when reaching the last partial utterance, it may or may not have enough frames. If at least <min_pad_coverage> of <partial_utterance_n_frames> are present, then the last partial utterance will be considered, as if we padded the audio. Otherwise, it will be discarded, as if we trimmed the audio. If there aren't enough frames for 1 partial utterance, this parameter is ignored so that the function always returns at least 1 slice.
- overlap : float
by how much consecutive partial utterances should overlap. If set to 0, the partial utterances are entirely disjoint.
Returns
the waveform slices and mel spectrogram slices as lists of array slices. Index the waveform and the mel spectrogram with these slices, respectively, to obtain the partial utterances, as in the sketch below.
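A short usage sketch of the padding recommendation above; the waveform and parameter values are placeholders (16 kHz audio with a 10 ms hop):

    import numpy as np
    from paddlespeech.vector.exps.ge2e.audio_processor import compute_partial_slices

    wav = np.zeros(48000, dtype=np.float32)  # placeholder: 3 s of 16 kHz audio
    wave_slices, mel_slices = compute_partial_slices(
        n_samples=len(wav),
        partial_utterance_n_frames=160,
        hop_length=160,                      # 10 ms hop at 16 kHz (assumed)
        min_pad_coverage=0.75,
        overlap=0.5)

    # The last slice may extend past the waveform, so pad with zeros up to
    # wave_slices[-1].stop before indexing.
    max_wave_length = wave_slices[-1].stop
    if max_wave_length > len(wav):
        wav = np.pad(wav, (0, max_wave_length - len(wav)), mode="constant")
    partial_wavs = [wav[s] for s in wave_slices]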
- paddlespeech.vector.exps.ge2e.audio_processor.normalize_volume(wav, target_dBFS, increase_only=False, decrease_only=False)[source]
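No description is given for this function. As a rough illustration of its intent, a minimal RMS-based dBFS normalization under the same signature might look like the sketch below; this is an assumption about the behaviour, not the module's actual implementation:

    import numpy as np

    def normalize_volume_sketch(wav, target_dBFS, increase_only=False, decrease_only=False):
        # Gain (in dB) needed to move the waveform's RMS level to target_dBFS.
        rms = np.sqrt(np.mean(wav ** 2))
        dBFS_change = target_dBFS - 20 * np.log10(rms)
        # Respect the one-directional flags: skip the adjustment if it would
        # go in the disallowed direction.
        if (dBFS_change < 0 and increase_only) or (dBFS_change > 0 and decrease_only):
            return wav
        return wav * (10 ** (dBFS_change / 20))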
- paddlespeech.vector.exps.ge2e.audio_processor.trim_long_silences(wav, vad_window_length: int, vad_moving_average_width: int, vad_max_silence_length: int, sampling_rate: int)[source]
Ensures that segments without voice in the waveform remain no longer than a threshold determined by the VAD parameters in params.py.
Parameters
- wav : np.array
the raw waveform as a numpy array of floats
Returns
- np.array
the same waveform with silences trimmed away (length <= original wav length)
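An illustrative call, assuming the VAD window length is given in milliseconds and the moving-average width and maximum silence length are frame counts; all values and the input file are hypothetical:

    import soundfile as sf
    from paddlespeech.vector.exps.ge2e.audio_processor import trim_long_silences

    wav, sr = sf.read("utt_0001.wav", dtype="float32")  # hypothetical input file

    trimmed = trim_long_silences(
        wav,
        vad_window_length=30,          # ms (assumed unit)
        vad_moving_average_width=8,    # VAD frames (assumed)
        vad_max_silence_length=6,      # VAD frames (assumed)
        sampling_rate=sr)
    assert len(trimmed) <= len(wav)    # trimming never lengthens the waveform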