paddleaudio.functional.functional module

paddleaudio.functional.functional.compute_fbank_matrix(sr: int, n_fft: int, n_mels: int = 64, f_min: float = 0.0, f_max: Optional[float] = None, htk: bool = False, norm: Union[str, float] = 'slaney', dtype: str = 'float32') -> Tensor

Compute the mel filter bank (fbank) matrix.

Args:

sr (int): Sample rate.
n_fft (int): Number of FFT bins.
n_mels (int, optional): Number of mel bins. Defaults to 64.
f_min (float, optional): Minimum frequency in Hz. Defaults to 0.0.
f_max (Optional[float], optional): Maximum frequency in Hz. Defaults to None.
htk (bool, optional): Use HTK scaling. Defaults to False.
norm (Union[str, float], optional): Type of normalization. Defaults to 'slaney'.
dtype (str, optional): The data type of the returned matrix. Defaults to 'float32'.

Returns:

Tensor: Mel transform matrix with shape (n_mels, n_fft//2 + 1).
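
To make the returned shape concrete, here is a minimal pure-Python sketch of a triangular mel filter bank. It is not paddleaudio's implementation: the name fbank_matrix_sketch is hypothetical, only the HTK mel formula is used, f_max is assumed to fall back to sr / 2 when it is None, and the area normalization is assumed to mirror the 'slaney' option.

```python
import math

def fbank_matrix_sketch(sr, n_fft, n_mels=64, f_min=0.0, f_max=None):
    """Illustrative mel filter bank as a nested list of shape (n_mels, n_fft//2 + 1)."""
    if f_max is None:
        f_max = sr / 2.0  # assumption: default upper edge is the Nyquist frequency

    # HTK-style conversions (assumption: chosen here only to keep the sketch short).
    to_mel = lambda f: 2595.0 * math.log10(1.0 + f / 700.0)
    to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    n_bins = n_fft // 2 + 1
    fft_freqs = [k * sr / n_fft for k in range(n_bins)]

    # n_mels + 2 band edges, evenly spaced on the mel scale.
    lo, hi = to_mel(f_min), to_mel(f_max)
    edges = [to_hz(lo + i * (hi - lo) / (n_mels + 1)) for i in range(n_mels + 2)]

    weights = []
    for m in range(n_mels):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        scale = 2.0 / (right - left)  # area normalization per band
        row = []
        for f in fft_freqs:
            rising = (f - left) / (center - left)
            falling = (right - f) / (right - center)
            row.append(scale * max(0.0, min(rising, falling)))
        weights.append(row)
    return weights

fbank = fbank_matrix_sketch(sr=16000, n_fft=512, n_mels=8)
print(len(fbank), len(fbank[0]))
```

Each row holds one triangular filter, so multiplying the matrix by a power spectrum with n_fft//2 + 1 bins yields n_mels band energies.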

paddleaudio.functional.functional.create_dct(n_mfcc: int, n_mels: int, norm: Optional[str] = 'ortho', dtype: str = 'float32') -> Tensor

Create a discrete cosine transform (DCT) matrix.

Args:

n_mfcc (int): Number of mel frequency cepstral coefficients.
n_mels (int): Number of mel filterbanks.
norm (Optional[str], optional): Normalization type. Defaults to 'ortho'.
dtype (str, optional): The data type of the returned matrix. Defaults to 'float32'.

Returns:

Tensor: The DCT matrix with shape (n_mels, n_mfcc).
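
As an illustration of the (n_mels, n_mfcc) layout, the following pure-Python sketch builds an orthonormal DCT-II matrix. The name create_dct_sketch is hypothetical, only the 'ortho' case is covered, and the exact scaling used by paddleaudio is an assumption.

```python
import math

def create_dct_sketch(n_mfcc, n_mels):
    """Illustrative orthonormal DCT-II matrix as a nested list of shape (n_mels, n_mfcc)."""
    scale = math.sqrt(2.0 / n_mels)
    dct = []
    for n in range(n_mels):
        row = []
        for k in range(n_mfcc):
            value = math.cos(math.pi / n_mels * (n + 0.5) * k) * scale
            if k == 0:
                # Extra factor so that the first basis vector also has unit norm.
                value *= 1.0 / math.sqrt(2.0)
            row.append(value)
        dct.append(row)
    return dct

dct = create_dct_sketch(n_mfcc=4, n_mels=8)
print(len(dct), len(dct[0]))
```

With this layout, a log-mel frame of length n_mels multiplied by the matrix gives n_mfcc cepstral coefficients.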

paddleaudio.functional.functional.fft_frequencies(sr: int, n_fft: int, dtype: str = 'float32') -> Tensor

Compute Fourier frequencies.

Args:

sr (int): Sample rate.
n_fft (int): Number of FFT bins.
dtype (str, optional): The data type of the returned frequencies. Defaults to 'float32'.

Returns:

Tensor: FFT frequencies in Hz with shape (n_fft//2 + 1,).
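
The sketch below shows the usual relationship between bin index and frequency for a real FFT with an even n_fft: bin k sits at k * sr / n_fft, and the last bin is the Nyquist frequency. The name fft_frequencies_sketch is hypothetical and the code does not call paddleaudio.

```python
def fft_frequencies_sketch(sr, n_fft):
    """Illustrative center frequencies (Hz) of the n_fft//2 + 1 non-negative FFT bins."""
    n_bins = n_fft // 2 + 1
    return [k * sr / n_fft for k in range(n_bins)]

print(fft_frequencies_sketch(sr=16000, n_fft=8))
# → [0.0, 2000.0, 4000.0, 6000.0, 8000.0]
```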

paddleaudio.functional.functional.hz_to_mel(freq: Union[Tensor, float], htk: bool = False) -> Union[Tensor, float]

Convert Hz to Mels.

Args:

freq (Union[Tensor, float]): The input tensor with arbitrary shape.
htk (bool, optional): Use HTK scaling. Defaults to False.

Returns:

Union[Tensor, float]: Frequency in mels.
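
For reference, the two mel scales that the htk flag conventionally selects between can be sketched for scalar inputs as follows. This assumes the function follows the widely used HTK and Slaney formulas (as in librosa), which the text above does not state; the name hz_to_mel_sketch is hypothetical.

```python
import math

def hz_to_mel_sketch(freq, htk=False):
    """Illustrative Hz-to-mel conversion for a single float."""
    if htk:
        # HTK formula: logarithmic over the whole range.
        return 2595.0 * math.log10(1.0 + freq / 700.0)

    # Slaney-style scale: linear below 1 kHz, logarithmic above.
    f_sp = 200.0 / 3.0
    min_log_hz = 1000.0
    min_log_mel = min_log_hz / f_sp
    logstep = math.log(6.4) / 27.0
    if freq >= min_log_hz:
        return min_log_mel + math.log(freq / min_log_hz) / logstep
    return freq / f_sp

print(hz_to_mel_sketch(440.0), hz_to_mel_sketch(440.0, htk=True))
```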

paddleaudio.functional.functional.mel_frequencies(n_mels: int = 64, f_min: float = 0.0, f_max: float = 11025.0, htk: bool = False, dtype: str = 'float32') -> Tensor

Compute mel frequencies.

Args:

n_mels (int, optional): Number of mel bins. Defaults to 64.
f_min (float, optional): Minimum frequency in Hz. Defaults to 0.0.
f_max (float, optional): Maximum frequency in Hz. Defaults to 11025.0.
htk (bool, optional): Use HTK scaling. Defaults to False.
dtype (str, optional): The data type of the returned frequencies. Defaults to 'float32'.

Returns:

Tensor: Tensor of n_mels frequencies in Hz with shape (n_mels,).
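
One common way to obtain such frequencies is to space n_mels points evenly on the mel scale between f_min and f_max and convert them back to Hz. The sketch below assumes that approach and uses only the HTK formula for brevity; mel_frequencies_sketch is a hypothetical name, and it requires n_mels >= 2.

```python
import math

def mel_frequencies_sketch(n_mels=64, f_min=0.0, f_max=11025.0):
    """Illustrative list of n_mels frequencies (Hz), evenly spaced in mels (HTK scale)."""
    to_mel = lambda f: 2595.0 * math.log10(1.0 + f / 700.0)
    to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    lo, hi = to_mel(f_min), to_mel(f_max)
    step = (hi - lo) / (n_mels - 1)  # assumes n_mels >= 2
    return [to_hz(lo + i * step) for i in range(n_mels)]

freqs = mel_frequencies_sketch(n_mels=8, f_max=8000.0)
print([round(f, 1) for f in freqs])
```

The spacing in Hz widens toward f_max, which is what gives mel filters finer resolution at low frequencies.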

paddleaudio.functional.functional.mel_to_hz(mel: Union[float, Tensor], htk: bool = False) -> Union[float, Tensor]

Convert mel bin numbers to frequencies.

Args:

mel (Union[float, Tensor]): The mel frequency represented as a tensor with arbitrary shape.
htk (bool, optional): Use HTK scaling. Defaults to False.

Returns:

Union[float, Tensor]: Frequencies in Hz.
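
The inverse mapping can be sketched for scalar inputs as below, again assuming the conventional HTK and Slaney formulas rather than anything stated in the text above; mel_to_hz_sketch is a hypothetical name.

```python
import math

def mel_to_hz_sketch(mel, htk=False):
    """Illustrative mel-to-Hz conversion for a single float."""
    if htk:
        return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

    # Slaney-style scale: linear below 1 kHz, exponential above.
    f_sp = 200.0 / 3.0
    min_log_hz = 1000.0
    min_log_mel = min_log_hz / f_sp
    logstep = math.log(6.4) / 27.0
    if mel >= min_log_mel:
        return min_log_hz * math.exp(logstep * (mel - min_log_mel))
    return f_sp * mel

print(mel_to_hz_sketch(15.0), mel_to_hz_sketch(1000.0, htk=True))
```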

paddleaudio.functional.functional.power_to_db(spect: Tensor, ref_value: float = 1.0, amin: float = 1e-10, top_db: Optional[float] = None) -> Tensor

Convert a power spectrogram (amplitude squared) to decibel (dB) units. The function computes the scaling 10 * log10(x / ref) in a numerically stable way.

Args:

spect (Tensor): STFT power spectrogram.
ref_value (float, optional): The reference value. If smaller than 1.0, the dB level of the signal is raised accordingly; otherwise, it is lowered. Defaults to 1.0.
amin (float, optional): Minimum threshold. Defaults to 1e-10.
top_db (Optional[float], optional): Threshold the output at top_db below the peak. Defaults to None.

Returns:

Tensor: Power spectrogram in dB scale.
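
The scaling described above can be sketched on a flat list of power values as follows. This is an illustration of the formula, not paddleaudio's code: power_to_db_sketch is a hypothetical name, and clamping both the input and ref_value to amin before taking the logarithm is an assumption about how numerical stability is achieved.

```python
import math

def power_to_db_sketch(spect, ref_value=1.0, amin=1e-10, top_db=None):
    """Illustrative power-to-dB conversion for a flat list of non-negative floats."""
    ref_db = 10.0 * math.log10(max(amin, ref_value))
    # Clamp to amin so that zero power does not produce log10(0).
    log_spec = [10.0 * math.log10(max(amin, x)) - ref_db for x in spect]
    if top_db is not None:
        # Limit the dynamic range to top_db below the peak value.
        floor = max(log_spec) - top_db
        log_spec = [max(v, floor) for v in log_spec]
    return log_spec

print(power_to_db_sketch([1.0, 10.0, 100.0]))
print(power_to_db_sketch([1.0, 0.0], top_db=80.0))
```

In the second call, the zero-power entry would sit at about -100 dB after clamping to amin, and top_db=80.0 raises it to 80 dB below the peak.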