paddleaudio.compliance.kaldi module

paddleaudio.compliance.kaldi.fbank(waveform: Tensor, blackman_coeff: float = 0.42, channel: int = -1, dither: float = 0.0, energy_floor: float = 1.0, frame_length: float = 25.0, frame_shift: float = 10.0, high_freq: float = 0.0, htk_compat: bool = False, low_freq: float = 20.0, n_mels: int = 23, preemphasis_coefficient: float = 0.97, raw_energy: bool = True, remove_dc_offset: bool = True, round_to_power_of_two: bool = True, sr: int = 16000, snip_edges: bool = True, subtract_mean: bool = False, use_energy: bool = False, use_log_fbank: bool = True, use_power: bool = True, vtln_high: float = -500.0, vtln_low: float = 100.0, vtln_warp: float = 1.0, window_type: str = 'povey') → Tensor

Compute and return filter banks from a waveform. The output is identical to Kaldi's.

Args:

waveform (Tensor): A waveform tensor with shape (C, T). C is in the range [0,1].
blackman_coeff (float, optional): Coefficient for the Blackman window. Defaults to 0.42.
channel (int, optional): Channel of the waveform to select. Defaults to -1.
dither (float, optional): Dithering constant. Defaults to 0.0.
energy_floor (float, optional): Floor on the energy of the output spectrogram. Defaults to 1.0.
frame_length (float, optional): Frame length in milliseconds. Defaults to 25.0.
frame_shift (float, optional): Shift between adjacent frames in milliseconds. Defaults to 10.0.
high_freq (float, optional): The upper cut-off frequency. Defaults to 0.0.
htk_compat (bool, optional): If True, put the energy last. Defaults to False.
low_freq (float, optional): The lower cut-off frequency. Defaults to 20.0.
n_mels (int, optional): Number of output mel bins. Defaults to 23.
preemphasis_coefficient (float, optional): Preemphasis coefficient for the input waveform. Defaults to 0.97.
raw_energy (bool, optional): Whether to compute the energy before preemphasis and windowing. Defaults to True.
remove_dc_offset (bool, optional): Whether to subtract the mean from the waveform on each frame. Defaults to True.
round_to_power_of_two (bool, optional): If True, round the window size up to a power of two by zero-padding the input to the FFT. Defaults to True.
sr (int, optional): Sample rate of the input waveform. Defaults to 16000.
snip_edges (bool, optional): If True, drop samples at the end of the waveform that cannot fit a single frame. Otherwise, apply reflect padding to the end of the waveform. Defaults to True.
subtract_mean (bool, optional): Whether to subtract mean of feature files. Defaults to False.
use_energy (bool, optional): Add a dimension with the energy of the spectrogram to the output. Defaults to False.
use_log_fbank (bool, optional): If True, return the log filter bank. Defaults to True.
use_power (bool, optional): Whether to use power instead of magnitude. Defaults to True.
vtln_high (float, optional): High inflection point in the piecewise linear VTLN warping function. Defaults to -500.0.
vtln_low (float, optional): Low inflection point in the piecewise linear VTLN warping function. Defaults to 100.0.
vtln_warp (float, optional): VTLN warp factor. Defaults to 1.0.
window_type (str, optional): Type of window for the FFT computation. Defaults to "povey".

Returns:

Tensor: A filter bank tensor with shape (m, n_mels), where m is the number of frames.
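The number of frames m follows from sr, frame_length, frame_shift and snip_edges. The sketch below reproduces the framing rule used by Kaldi in plain Python. It assumes this module follows the same rule, as the "identical to Kaldi's" claim suggests; num_frames is a hypothetical helper for illustration, not part of paddleaudio.

```python
def num_frames(num_samples, sr=16000, frame_length=25.0, frame_shift=10.0,
               snip_edges=True):
    """Kaldi-style frame count (illustrative helper, not a paddleaudio API)."""
    # Convert the millisecond settings to whole samples.
    window_size = int(sr * frame_length / 1000)
    window_shift = int(sr * frame_shift / 1000)
    if snip_edges:
        # Only frames that fit entirely inside the waveform are kept.
        if num_samples < window_size:
            return 0
        return 1 + (num_samples - window_size) // window_shift
    # With padding at the edges, the count depends only on the shift.
    return (num_samples + window_shift // 2) // window_shift


one_second = 16000  # samples in one second at sr=16000
print(num_frames(one_second))                    # snip_edges=True
print(num_frames(one_second, snip_edges=False))  # snip_edges=False
```

Under these assumptions, one second of 16 kHz audio with the default 25 ms window and 10 ms shift yields 98 frames with snip_edges=True and 100 frames with snip_edges=False.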

paddleaudio.compliance.kaldi.mfcc(waveform: Tensor, blackman_coeff: float = 0.42, cepstral_lifter: float = 22.0, channel: int = -1, dither: float = 0.0, energy_floor: float = 1.0, frame_length: float = 25.0, frame_shift: float = 10.0, high_freq: float = 0.0, htk_compat: bool = False, low_freq: float = 20.0, n_mfcc: int = 13, n_mels: int = 23, preemphasis_coefficient: float = 0.97, raw_energy: bool = True, remove_dc_offset: bool = True, round_to_power_of_two: bool = True, sr: int = 16000, snip_edges: bool = True, subtract_mean: bool = False, use_energy: bool = False, vtln_high: float = -500.0, vtln_low: float = 100.0, vtln_warp: float = 1.0, window_type: str = 'povey') → Tensor

Compute and return mel frequency cepstral coefficients from a waveform. The output is identical to Kaldi's.

Args:

waveform (Tensor): A waveform tensor with shape (C, T).
blackman_coeff (float, optional): Coefficient for the Blackman window. Defaults to 0.42.
cepstral_lifter (float, optional): Scaling of the output MFCCs. Defaults to 22.0.
channel (int, optional): Channel of the waveform to select. Defaults to -1.
dither (float, optional): Dithering constant. Defaults to 0.0.
energy_floor (float, optional): Floor on the energy of the output spectrogram. Defaults to 1.0.
frame_length (float, optional): Frame length in milliseconds. Defaults to 25.0.
frame_shift (float, optional): Shift between adjacent frames in milliseconds. Defaults to 10.0.
high_freq (float, optional): The upper cut-off frequency. Defaults to 0.0.
htk_compat (bool, optional): If True, put the energy last. Defaults to False.
low_freq (float, optional): The lower cut-off frequency. Defaults to 20.0.
n_mfcc (int, optional): Number of cepstra in the MFCC. Defaults to 13.
n_mels (int, optional): Number of output mel bins. Defaults to 23.
preemphasis_coefficient (float, optional): Preemphasis coefficient for the input waveform. Defaults to 0.97.
raw_energy (bool, optional): Whether to compute the energy before preemphasis and windowing. Defaults to True.
remove_dc_offset (bool, optional): Whether to subtract the mean from the waveform on each frame. Defaults to True.
round_to_power_of_two (bool, optional): If True, round the window size up to a power of two by zero-padding the input to the FFT. Defaults to True.
sr (int, optional): Sample rate of the input waveform. Defaults to 16000.
snip_edges (bool, optional): If True, drop samples at the end of the waveform that cannot fit a single frame. Otherwise, apply reflect padding to the end of the waveform. Defaults to True.
subtract_mean (bool, optional): Whether to subtract mean of feature files. Defaults to False.
use_energy (bool, optional): Add a dimension with the energy of the spectrogram to the output. Defaults to False.
vtln_high (float, optional): High inflection point in the piecewise linear VTLN warping function. Defaults to -500.0.
vtln_low (float, optional): Low inflection point in the piecewise linear VTLN warping function. Defaults to 100.0.
vtln_warp (float, optional): VTLN warp factor. Defaults to 1.0.
window_type (str, optional): Type of window for the FFT computation. Defaults to "povey".

Returns:

Tensor: A mel frequency cepstral coefficients tensor with shape (m, n_mfcc), where m is the number of frames.
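The cepstral_lifter argument controls how strongly the higher-order cepstra are scaled. In Kaldi, coefficient i is multiplied by 1 + 0.5 * Q * sin(pi * i / Q), where Q is the lifter value, and liftering is skipped when Q is 0. The sketch below computes those weights in plain Python. It assumes this module applies the same formula, given the "identical to Kaldi's" claim; lifter_coeffs is a hypothetical helper, not a paddleaudio function.

```python
import math


def lifter_coeffs(n_mfcc=13, cepstral_lifter=22.0):
    """Kaldi-style liftering weights (illustrative helper, not a paddleaudio API)."""
    if cepstral_lifter == 0.0:
        # A lifter value of 0 means "no liftering": every weight is 1.
        return [1.0] * n_mfcc
    q = cepstral_lifter
    return [1.0 + 0.5 * q * math.sin(math.pi * i / q) for i in range(n_mfcc)]


weights = lifter_coeffs()
for i, w in enumerate(weights):
    print(f"c{i}: x{w:.3f}")
```

With the defaults, the zeroth coefficient is left unscaled (weight 1.0) and the weights grow with the index across the 13 returned cepstra.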

paddleaudio.compliance.kaldi.spectrogram(waveform: Tensor, blackman_coeff: float = 0.42, channel: int = -1, dither: float = 0.0, energy_floor: float = 1.0, frame_length: float = 25.0, frame_shift: float = 10.0, preemphasis_coefficient: float = 0.97, raw_energy: bool = True, remove_dc_offset: bool = True, round_to_power_of_two: bool = True, sr: int = 16000, snip_edges: bool = True, subtract_mean: bool = False, window_type: str = 'povey') → Tensor

Compute and return a spectrogram from a waveform. The output is identical to Kaldi's.

Args:

waveform (Tensor): A waveform tensor with shape (C, T).
blackman_coeff (float, optional): Coefficient for the Blackman window. Defaults to 0.42.
channel (int, optional): Channel of the waveform to select. Defaults to -1.
dither (float, optional): Dithering constant. Defaults to 0.0.
energy_floor (float, optional): Floor on the energy of the output spectrogram. Defaults to 1.0.
frame_length (float, optional): Frame length in milliseconds. Defaults to 25.0.
frame_shift (float, optional): Shift between adjacent frames in milliseconds. Defaults to 10.0.
preemphasis_coefficient (float, optional): Preemphasis coefficient for the input waveform. Defaults to 0.97.
raw_energy (bool, optional): Whether to compute the energy before preemphasis and windowing. Defaults to True.
remove_dc_offset (bool, optional): Whether to subtract the mean from the waveform on each frame. Defaults to True.
round_to_power_of_two (bool, optional): If True, round the window size up to a power of two by zero-padding the input to the FFT. Defaults to True.
sr (int, optional): Sample rate of the input waveform. Defaults to 16000.
snip_edges (bool, optional): If True, drop samples at the end of the waveform that cannot fit a single frame. Otherwise, apply reflect padding to the end of the waveform. Defaults to True.
subtract_mean (bool, optional): Whether to subtract mean of feature files. Defaults to False.
window_type (str, optional): Type of window for the FFT computation. Defaults to "povey".

Returns:

Tensor: A spectrogram tensor with shape (m, padded_window_size // 2 + 1), where m is the number of frames, which depends on frame_length and frame_shift.
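The second dimension of the output depends on sr, frame_length and round_to_power_of_two. The sketch below works through that arithmetic in plain Python, assuming the window size is the frame length converted to whole samples and that rounding picks the smallest power of two not below it. padded_window_size here is a hypothetical helper named after the term in the shape above, not a paddleaudio function.

```python
def padded_window_size(sr=16000, frame_length=25.0, round_to_power_of_two=True):
    """FFT size for one frame (illustrative helper, not a paddleaudio API)."""
    # Frame length in whole samples.
    window_size = int(sr * frame_length / 1000)
    if not round_to_power_of_two:
        return window_size
    # Smallest power of two that is >= window_size.
    return 1 << (window_size - 1).bit_length()


n_fft = padded_window_size()
print(n_fft, n_fft // 2 + 1)  # FFT size and number of frequency bins
```

With the defaults, a 25 ms frame at 16 kHz is 400 samples, which is zero-padded to 512, giving 257 frequency bins per frame. With round_to_power_of_two=False the FFT size stays at 400 and the output has 201 bins.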