paddleaudio.backends.soundfile_backend module

paddleaudio.backends.soundfile_backend.info(filepath: str, format: Optional[str] = None) -> AudioInfo

Get signal information of an audio file.

Note:

The filepath argument is intentionally annotated as str only, even though it also accepts a pathlib.Path object. This keeps the signature consistent with the "sox_io" backend.

Args:
filepath (path-like object or file-like object):

Source of audio data.

format (str or None, optional):

Not used. PySoundFile does not accept a format hint.

Returns:

AudioInfo: Metadata of the given audio.

paddleaudio.backends.soundfile_backend.load(filepath: str, frame_offset: int = 0, num_frames: int = -1, normalize: bool = True, channels_first: bool = True, format: Optional[str] = None) -> Tuple[Tensor, int]

Load audio data from file.

Note:

The formats this function can handle depend on the soundfile installation. This function is tested on the following formats:

  • WAV

    • 32-bit floating-point

    • 32-bit signed integer

    • 16-bit signed integer

    • 8-bit unsigned integer

  • FLAC

  • OGG/VORBIS

  • SPHERE

By default (normalize=True, channels_first=True), this function returns a float32 Tensor of shape [channel, time]. The samples are normalized to fit in the range [-1.0, 1.0].

When the input is a WAV file with integer samples (32-bit signed, 16-bit signed or 8-bit unsigned; 24-bit signed integer is not supported), passing normalize=False makes this function return an integer Tensor whose samples span the whole range of the corresponding dtype: int32 for 32-bit signed PCM, int16 for 16-bit signed PCM and uint8 for 8-bit unsigned PCM.

The normalize parameter has no effect on 32-bit floating-point WAV or on other formats, such as flac and mp3. For these formats, this function always returns a float32 Tensor with values normalized to [-1.0, 1.0].
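The effect of normalize on integer WAV samples can be illustrated with a short sketch in plain Python. It assumes the common convention of scaling by 2 ** (bits - 1) and recentring unsigned 8-bit samples around their midpoint. The helper name pcm_to_float is hypothetical, and this is not the backend's own code path, which leaves decoding to soundfile.

```python
def pcm_to_float(samples, bits, signed=True):
    """Map integer PCM samples to floats in [-1.0, 1.0).

    Hypothetical helper for illustration only. Assumes the common
    convention of scaling by 2 ** (bits - 1).
    """
    scale = float(2 ** (bits - 1))
    if not signed:
        # Unsigned PCM stores silence at mid-range (128 for 8 bits).
        return [(s - scale) / scale for s in samples]
    return [s / scale for s in samples]


int16_samples = [-32768, 0, 16384, 32767]
uint8_samples = [0, 128, 255]

float_from_int16 = pcm_to_float(int16_samples, bits=16)
float_from_uint8 = pcm_to_float(uint8_samples, bits=8, signed=False)
```

With normalize=False the integer values on the left-hand side would be returned unchanged; with normalize=True the caller sees values like those in float_from_int16.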

Note:

The filepath argument is intentionally annotated as str only, even though it also accepts a pathlib.Path object. This keeps the signature consistent with the "sox_io" backend.

Args:
filepath (path-like object or file-like object):

Source of audio data.

frame_offset (int, optional):

Number of frames to skip before reading data.

num_frames (int, optional):

Maximum number of frames to read. -1 reads all the remaining samples, starting from frame_offset. This function may return fewer frames if the given file does not contain enough frames.

normalize (bool, optional):

When True, this function always returns float32, and sample values are normalized to [-1.0, 1.0]. If the input file is an integer WAV, passing False changes the resulting Tensor to an integer dtype. This argument has no effect for formats other than integer WAV.

channels_first (bool, optional):

When True, the returned Tensor has dimension [channel, time]. Otherwise, the returned Tensor's dimension is [time, channel].

format (str or None, optional):

Not used. PySoundFile does not accept a format hint.

Returns:
(paddle.Tensor, int): Resulting Tensor and sample rate.

If the input file is an integer WAV and normalization is off, the Tensor has an integer dtype; otherwise it is float32. If channels_first=True, its shape is [channel, time]; otherwise it is [time, channel].
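The interplay of frame_offset, num_frames and channels_first can be sketched with nested lists standing in for a Tensor. The helper select_frames is hypothetical and only mirrors the documented argument semantics; it is not how the backend reads files.

```python
def select_frames(frames, frame_offset=0, num_frames=-1, channels_first=True):
    """Slice and lay out audio given as a list of per-frame channel tuples.

    `frames` is indexed as [time][channel]. Hypothetical helper that
    mirrors the documented argument semantics for illustration.
    """
    if num_frames == -1:
        # -1 means "everything from frame_offset to the end".
        selected = frames[frame_offset:]
    else:
        # Slicing past the end simply yields fewer frames.
        selected = frames[frame_offset:frame_offset + num_frames]
    if not channels_first:
        return [list(frame) for frame in selected]   # [time, channel]
    return [list(channel) for channel in zip(*selected)]  # [channel, time]


stereo = [(0, 10), (1, 11), (2, 12), (3, 13), (4, 14)]  # 5 frames, 2 channels

all_from_1 = select_frames(stereo, frame_offset=1)
short_read = select_frames(stereo, frame_offset=3, num_frames=10)
time_major = select_frames(stereo, frame_offset=1, num_frames=2,
                           channels_first=False)
```

Here short_read asks for 10 frames but only 2 remain after the offset, so only 2 are returned, which matches the note on num_frames.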

paddleaudio.backends.soundfile_backend.normalize(y: ndarray, norm_type: str = 'linear', mul_factor: float = 1.0) -> ndarray

Normalize an input audio with additional multiplier.

Args:

y (np.ndarray): Input waveform array in 1D or 2D.

norm_type (str, optional): Type of normalization. Defaults to 'linear'.

mul_factor (float, optional): Scaling factor. Defaults to 1.0.

Returns:

np.ndarray: y after normalization.
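As a rough illustration of normalization with an additional multiplier, the sketch below assumes that 'linear' means peak normalization: divide by the largest absolute sample, then scale by mul_factor. That reading, the eps guard and the name normalize_linear are assumptions made for this example, and plain lists stand in for np.ndarray.

```python
def normalize_linear(y, mul_factor=1.0, eps=1e-8):
    """Peak-normalize a non-empty 1D waveform, then scale by mul_factor.

    Assumed interpretation of norm_type='linear'; `eps` (hypothetical)
    avoids division by zero for an all-zero signal.
    """
    peak = max(abs(v) for v in y)
    return [v / (peak + eps) * mul_factor for v in y]


waveform = [0.1, -0.5, 0.25]
full_scale = normalize_linear(waveform)                  # peak close to 1.0
half_scale = normalize_linear(waveform, mul_factor=0.5)  # peak close to 0.5
```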

paddleaudio.backends.soundfile_backend.resample(y: ndarray, src_sr: int, target_sr: int, mode: str = 'kaiser_fast') -> ndarray

Audio resampling.

Args:

y (np.ndarray): Input waveform array in 1D or 2D.

src_sr (int): Source sample rate.

target_sr (int): Target sample rate.

mode (str, optional): The resampling filter to use. Defaults to 'kaiser_fast'.

Returns:

np.ndarray: y resampled to target_sr.
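To show how src_sr and target_sr relate to the output length, here is a deliberately simple resampler that uses linear interpolation. It is a stand-in technique, not the 'kaiser_fast' filter, and the output-length rule (truncating len(y) * target_sr / src_sr) is an assumption of this sketch rather than a documented guarantee.

```python
def resample_linear(y, src_sr, target_sr):
    """Resample a 1D waveform by linear interpolation.

    Illustrative substitute for a windowed-sinc filter such as
    'kaiser_fast'; quality is much lower, but the rate/length
    relationship is the same idea.
    """
    if src_sr <= 0 or target_sr <= 0:
        raise ValueError("sample rates must be positive")
    if src_sr == target_sr or len(y) < 2:
        return list(y)
    n_out = int(len(y) * target_sr / src_sr)  # assumed length rule
    out = []
    for i in range(n_out):
        pos = i * src_sr / target_sr          # position in source samples
        left = int(pos)
        right = min(left + 1, len(y) - 1)     # clamp at the last sample
        frac = pos - left
        out.append(y[left] * (1.0 - frac) + y[right] * frac)
    return out


ramp = [0.0, 1.0, 2.0, 3.0]
upsampled = resample_linear(ramp, src_sr=8000, target_sr=16000)
downsampled = resample_linear(ramp, src_sr=16000, target_sr=8000)
```

Doubling the rate doubles the number of samples, and halving it keeps every second sample.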

paddleaudio.backends.soundfile_backend.save(filepath: str, src: Tensor, sample_rate: int, channels_first: bool = True, compression: Optional[float] = None, format: Optional[str] = None, encoding: Optional[str] = None, bits_per_sample: Optional[int] = None)

Save audio data to file.

Note:

The formats this function can handle depend on the soundfile installation. This function is tested on the following formats:

  • WAV

    • 32-bit floating-point

    • 32-bit signed integer

    • 16-bit signed integer

    • 8-bit unsigned integer

  • FLAC

  • OGG/VORBIS

  • SPHERE

Note:

The filepath argument is intentionally annotated as str only, even though it also accepts a pathlib.Path object. This keeps the signature consistent with the "sox_io" backend.

Args:

filepath (str or pathlib.Path): Path to audio file.

src (paddle.Tensor): Audio data to save. Must be a 2D tensor.

sample_rate (int): Sampling rate.

channels_first (bool, optional): If True, the given tensor is interpreted as [channel, time], otherwise [time, channel].

compression (float or None, optional): Not used.

It is here only for interface compatibility with the "sox_io" backend.

format (str or None, optional): Override the audio format.

When the filepath argument is a path-like object, the audio format is inferred from the file extension. If the file extension is missing or different, you can specify the correct format with this argument.

When the filepath argument is a file-like object, this argument is required.

Valid values are "wav", "ogg", "vorbis", "flac" and "sph".

encoding (str or None, optional): Changes the encoding for supported formats.

This argument is effective only for supported formats, such as "wav", "flac" and "sph". Valid values are:

  • "PCM_S" (signed integer Linear PCM)

  • "PCM_U" (unsigned integer Linear PCM)

  • "PCM_F" (floating point PCM)

  • "ULAW" (mu-law)

  • "ALAW" (a-law)

bits_per_sample (int or None, optional): Changes the bit depth for the supported formats.

When format is one of "wav", "flac" or "sph", you can change the bit depth. Valid values are 8, 16, 24, 32 and 64.

Supported formats/encodings/bit depth/compression are:

"wav"
  • 32-bit floating-point PCM

  • 32-bit signed integer PCM

  • 24-bit signed integer PCM

  • 16-bit signed integer PCM

  • 8-bit unsigned integer PCM

  • 8-bit mu-law

  • 8-bit a-law

Note:

Default encoding/bit depth is determined by the dtype of the input Tensor.

"flac"
  • 8-bit

  • 16-bit (default)

  • 24-bit

"ogg", "vorbis"
  • Doesn't accept changing configuration.

"sph"
  • 8-bit signed integer PCM

  • 16-bit signed integer PCM

  • 24-bit signed integer PCM

  • 32-bit signed integer PCM (default)

  • 8-bit mu-law

  • 8-bit a-law

  • 16-bit a-law

  • 24-bit a-law

  • 32-bit a-law
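The combinations listed above can be captured in a small lookup table, which is handy for validating arguments before calling save. The table is transcribed from this list; the helper is_listed is hypothetical and not part of paddleaudio. Because the list gives only bit depths for "flac", the sketch does not check encoding for that format.

```python
# (encoding, bits_per_sample) pairs transcribed from the documented list.
SUPPORTED = {
    "wav": {
        ("PCM_F", 32), ("PCM_S", 32), ("PCM_S", 24), ("PCM_S", 16),
        ("PCM_U", 8), ("ULAW", 8), ("ALAW", 8),
    },
    "sph": {
        ("PCM_S", 8), ("PCM_S", 16), ("PCM_S", 24), ("PCM_S", 32),
        ("ULAW", 8), ("ALAW", 8), ("ALAW", 16), ("ALAW", 24), ("ALAW", 32),
    },
}
FLAC_BIT_DEPTHS = {8, 16, 24}


def is_listed(fmt, encoding=None, bits_per_sample=None):
    """Return True if the combination appears in the documented list.

    Hypothetical helper; None means "use the format's default".
    """
    fmt = fmt.lower()
    if fmt in ("ogg", "vorbis"):
        # These formats do not accept any configuration change.
        return encoding is None and bits_per_sample is None
    if fmt == "flac":
        # Only bit depths are listed for FLAC, so encoding is not checked.
        return bits_per_sample is None or bits_per_sample in FLAC_BIT_DEPTHS
    if fmt not in SUPPORTED:
        return False
    return any(
        (encoding is None or encoding == enc)
        and (bits_per_sample is None or bits_per_sample == bits)
        for enc, bits in SUPPORTED[fmt]
    )


wav_16_ok = is_listed("wav", "PCM_S", 16)
wav_64_ok = is_listed("wav", "PCM_S", 64)
flac_32_ok = is_listed("flac", bits_per_sample=32)
```

Note that 64 is named as a valid bits_per_sample value but does not appear in any of the listed combinations, so the lookup rejects it.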

paddleaudio.backends.soundfile_backend.soundfile_load(file: PathLike, sr: Optional[int] = None, mono: bool = True, merge_type: str = 'average', normal: bool = True, norm_type: str = 'linear', norm_mul_factor: float = 1.0, offset: float = 0.0, duration: Optional[int] = None, dtype: str = 'float32', resample_mode: str = 'kaiser_fast') -> Tuple[ndarray, int]

Load audio file from disk. This function loads audio from disk using the audio backend.

Args:

file (os.PathLike): Path of audio file to load.

sr (Optional[int], optional): Sample rate of loaded waveform. Defaults to None.

mono (bool, optional): Return waveform with mono channel. Defaults to True.

merge_type (str, optional): Merge type of multi-channels waveform. Defaults to 'average'.

normal (bool, optional): Waveform normalization. Defaults to True.

norm_type (str, optional): Type of normalization. Defaults to 'linear'.

norm_mul_factor (float, optional): Scaling factor. Defaults to 1.0.

offset (float, optional): Offset to the start of waveform. Defaults to 0.0.

duration (Optional[int], optional): Duration of waveform to read. Defaults to None.

dtype (str, optional): Data type of waveform. Defaults to 'float32'.

resample_mode (str, optional): The resampling filter to use. Defaults to 'kaiser_fast'.

Returns:

Tuple[np.ndarray, int]: Waveform as an ndarray and its sample rate.
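The units of offset and duration are not stated here. The sketch below assumes both are given in seconds and converted to frame counts with the file's native sample rate. Both that assumption and the names frame_window and sr_native are hypothetical.

```python
def frame_window(sr_native, offset=0.0, duration=None):
    """Convert an (offset, duration) pair in seconds to frame indices.

    Assumes seconds as the unit; returns (start_frame, frame_count),
    where a count of -1 means "read to the end of the file".
    """
    start_frame = int(offset * sr_native)
    frame_count = -1 if duration is None else int(duration * sr_native)
    return start_frame, frame_count


window = frame_window(16000, offset=0.5, duration=2)  # half a second in, 2 s long
whole_file = frame_window(44100)                      # defaults: read everything
```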

paddleaudio.backends.soundfile_backend.soundfile_save(y: ndarray, sr: int, file: PathLike) -> None

Save audio file to disk. This function saves audio to disk using scipy.io.wavfile, with an additional step to convert the input waveform to int16.

Args:

y (np.ndarray): Input waveform array in 1D or 2D.

sr (int): Sample rate.

file (os.PathLike): Path of audio file to save.
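The int16 conversion step can be sketched without third-party dependencies. To stay self-contained, this example writes the file with the standard-library wave module instead of scipy.io.wavfile, and it assumes float input in [-1.0, 1.0] scaled by 32767 with clipping. The scaling constant and both helper names are assumptions of this sketch.

```python
import io
import struct
import wave


def float_to_int16(y):
    """Scale floats in [-1.0, 1.0] to int16, clipping out-of-range values."""
    out = []
    for v in y:
        scaled = int(round(v * 32767))  # assumed scaling constant
        out.append(max(-32768, min(32767, scaled)))
    return out


def write_wav_int16(target, samples, sr):
    """Write mono 16-bit PCM to a path or a seekable binary file object."""
    with wave.open(target, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16 bits
        wf.setframerate(sr)
        # Little-endian signed 16-bit samples, as WAV expects.
        wf.writeframes(struct.pack("<%dh" % len(samples), *samples))


pcm = float_to_int16([0.0, 0.5, -1.0, 1.5])  # 1.5 is clipped to full scale
buffer = io.BytesIO()
write_wav_int16(buffer, pcm, 16000)

# Read the header back to confirm what was written.
buffer.seek(0)
with wave.open(buffer, "rb") as rf:
    header = (rf.getnchannels(), rf.getsampwidth(),
              rf.getframerate(), rf.getnframes())
```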

paddleaudio.backends.soundfile_backend.to_mono(y: ndarray, merge_type: str = 'average') -> ndarray

Convert stereo audio to mono.

Args:

y (np.ndarray): Input waveform array in 1D or 2D.

merge_type (str, optional): Merge type to generate mono waveform. Defaults to 'average'.

Returns:

np.ndarray: y with mono channel.
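A minimal sketch of the 'average' merge type, using nested lists in place of np.ndarray. It assumes a 2D input laid out as [channel, time] and passes 1D input through unchanged. The layout assumption and the name to_mono_average are hypothetical.

```python
def to_mono_average(y):
    """Average channels of a [channel][time] waveform into one channel.

    1D input (a flat list of samples) is returned as a copy.
    """
    if not y or not isinstance(y[0], (list, tuple)):
        return list(y)
    num_channels = len(y)
    # zip(*y) walks the channels in lockstep, one frame at a time.
    return [sum(frame) / num_channels for frame in zip(*y)]


stereo = [[1.0, 0.0, -1.0],   # left channel
          [0.0, 0.0, 1.0]]    # right channel
mono = to_mono_average(stereo)
already_mono = to_mono_average([0.1, 0.2])
```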