paddleaudio.backends.soundfile_backend module
- paddleaudio.backends.soundfile_backend.info(filepath: str, format: Optional[str] = None) AudioInfo [source]
Get signal information of an audio file.
- Note:
The filepath argument is intentionally annotated as str only, even though it also accepts a pathlib.Path object. This is for consistency with the "sox_io" backend.
- Args:
- filepath (path-like object or file-like object):
Source of audio data.
- format (str or None, optional):
Not used. PySoundFile does not accept a format hint.
- Returns:
AudioInfo: Metadata of the given audio.
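To make "signal information" concrete, here is a minimal sketch that writes a short WAV file with Python's standard wave module and reads its header back. It does not call paddleaudio. The helper name describe_wav and the dictionary keys are illustrative assumptions, because this section does not list the fields of AudioInfo.

```python
import os
import struct
import tempfile
import wave


def describe_wav(path):
    """Read basic header metadata from a PCM WAV file (stdlib only).

    Hypothetical helper; the keys below are illustrative, not AudioInfo fields.
    """
    with wave.open(path, "rb") as wf:
        return {
            "sample_rate": wf.getframerate(),
            "num_frames": wf.getnframes(),
            "num_channels": wf.getnchannels(),
            "bits_per_sample": wf.getsampwidth() * 8,
        }


# Write 0.5 s of 16-bit stereo silence at 16 kHz, then inspect it.
tmp_dir = tempfile.mkdtemp()
wav_path = os.path.join(tmp_dir, "example.wav")
with wave.open(wav_path, "wb") as wf:
    wf.setnchannels(2)
    wf.setsampwidth(2)  # 2 bytes = 16 bits per sample
    wf.setframerate(16000)
    # 8000 frames x 2 channels, each sample a little-endian int16 zero
    wf.writeframes(struct.pack("<h", 0) * 2 * 8000)

meta = describe_wav(wav_path)
print(meta)
```

With the backend installed, info(wav_path) would be the corresponding call according to the signature above.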
- paddleaudio.backends.soundfile_backend.load(filepath: str, frame_offset: int = 0, num_frames: int = -1, normalize: bool = True, channels_first: bool = True, format: Optional[str] = None) Tuple[Tensor, int] [source]
Load audio data from file.
- Note:
The formats this function can handle depend on the soundfile installation. This function is tested on the following formats:
  - WAV
    - 32-bit floating-point
    - 32-bit signed integer
    - 16-bit signed integer
    - 8-bit unsigned integer
  - FLAC
  - OGG/VORBIS
  - SPHERE
By default (normalize=True, channels_first=True), this function returns a Tensor with float32 dtype and shape [channel, time]. The samples are normalized to fit in the range [-1.0, 1.0].
When the input format is WAV with an integer type, such as 32-bit signed integer, 16-bit signed integer or 8-bit unsigned integer (24-bit signed integer is not supported), providing normalize=False makes this function return an integer Tensor, where the samples are expressed within the whole range of the corresponding dtype: an int32 tensor for 32-bit signed PCM, int16 for 16-bit signed PCM and uint8 for 8-bit unsigned PCM.
The normalize parameter has no effect on 32-bit floating-point WAV and other formats, such as flac and mp3. For these formats, this function always returns a float32 Tensor with values normalized to [-1.0, 1.0].
- Note:
The filepath argument is intentionally annotated as str only, even though it also accepts a pathlib.Path object. This is for consistency with the "sox_io" backend.
- Args:
- filepath (path-like object or file-like object):
Source of audio data.
- frame_offset (int, optional):
Number of frames to skip before reading data starts.
- num_frames (int, optional):
Maximum number of frames to read. -1 reads all the remaining samples, starting from frame_offset. This function may return fewer frames if the given file does not contain enough frames.
- normalize (bool, optional):
When True, this function always returns float32, and sample values are normalized to [-1.0, 1.0]. If the input file is integer WAV, giving False changes the resulting Tensor type to an integer type. This argument has no effect for formats other than integer WAV.
- channels_first (bool, optional):
When True, the returned Tensor has dimension [channel, time]. Otherwise, the returned Tensor's dimension is [time, channel].
- format (str or None, optional):
Not used. PySoundFile does not accept a format hint.
- Returns:
- (paddle.Tensor, int): Resulting Tensor and sample rate.
If the input file has integer WAV format and normalization is off, the Tensor has integer type; otherwise it has float32 type. If channels_first=True, it has shape [channel, time]; otherwise [time, channel].
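The effect of frame_offset, num_frames, normalize and channels_first can be sketched on an in-memory array with NumPy. This is not the backend's implementation: sketch_load is a hypothetical helper, and dividing int16 samples by 32768 is a common convention assumed here for illustration.

```python
import numpy as np


def sketch_load(samples, frame_offset=0, num_frames=-1,
                normalize=True, channels_first=True):
    """Mimic the documented load() options on an in-memory int16 array.

    `samples` is shaped [time, channel]. Hypothetical helper for illustration.
    """
    # Skip frame_offset frames, then keep at most num_frames (-1 = all).
    end = None if num_frames == -1 else frame_offset + num_frames
    out = samples[frame_offset:end]
    if normalize:
        # Assumed convention: divide by 2**15 so int16 maps into [-1.0, 1.0].
        out = out.astype(np.float32) / 32768.0
    if channels_first:
        out = out.T  # [time, channel] -> [channel, time]
    return out


pcm = np.array([[0, 0], [16384, -16384], [32767, -32768], [1, 2]],
               dtype=np.int16)

a = sketch_load(pcm)                                         # defaults
b = sketch_load(pcm, normalize=False, channels_first=False)  # raw integers
c = sketch_load(pcm, frame_offset=1, num_frames=10)          # file too short
print(a.dtype, a.shape)
print(b.dtype, b.shape)
print(c.shape)
```

The last call requests 10 frames but only 3 remain after the offset, so fewer frames come back, matching the behavior described for num_frames.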
- paddleaudio.backends.soundfile_backend.normalize(y: ndarray, norm_type: str = 'linear', mul_factor: float = 1.0) ndarray [source]
Normalize an input audio with additional multiplier.
- Args:
- y (np.ndarray): Input waveform array in 1D or 2D.
- norm_type (str, optional): Type of normalization. Defaults to 'linear'.
- mul_factor (float, optional): Scaling factor. Defaults to 1.0.
- Returns:
np.ndarray: y after normalization.
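This section does not define what 'linear' normalization computes. One plausible reading is peak normalization followed by the multiplier, sketched below with NumPy. The function name peak_normalize and the eps guard against division by zero are assumptions, not the library's code.

```python
import numpy as np


def peak_normalize(y, mul_factor=1.0, eps=1e-8):
    """Scale y so its largest absolute value is about mul_factor.

    Hypothetical helper; an assumed reading of 'linear' normalization.
    """
    peak = float(np.max(np.abs(y)))
    return y * (mul_factor / (peak + eps))


y = np.array([0.1, -0.25, 0.2], dtype=np.float32)
out = peak_normalize(y, mul_factor=0.5)
print(out)
```

Here the loudest sample (-0.25) is scaled to roughly -0.5, and the other samples keep their relative levels.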
- paddleaudio.backends.soundfile_backend.resample(y: ndarray, src_sr: int, target_sr: int, mode: str = 'kaiser_fast') ndarray [source]
Audio resampling.
- Args:
- y (np.ndarray): Input waveform array in 1D or 2D.
- src_sr (int): Source sample rate.
- target_sr (int): Target sample rate.
- mode (str, optional): The resampling filter to use. Defaults to 'kaiser_fast'.
- Returns:
np.ndarray: y resampled to target_sr
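To show how the sample rates relate to the output length, the sketch below resamples a 1D signal with plain linear interpolation. This is a stand-in only: it is not the 'kaiser_fast' filter, it applies no anti-aliasing, and linear_resample is a hypothetical name.

```python
import numpy as np


def linear_resample(y, src_sr, target_sr):
    """Resample a 1D signal by linear interpolation (illustrative stand-in)."""
    n_out = int(round(len(y) * target_sr / src_sr))
    # Positions of the output samples, measured in input-sample units.
    positions = np.arange(n_out) * (src_sr / target_sr)
    return np.interp(positions, np.arange(len(y)), y).astype(y.dtype)


src_sr, target_sr = 16000, 8000
t = np.arange(src_sr) / src_sr  # one second of audio
y = np.sin(2 * np.pi * 440.0 * t).astype(np.float32)
y_8k = linear_resample(y, src_sr, target_sr)
print(len(y), "->", len(y_8k))
```

Halving the sample rate halves the number of samples for the same duration. A windowed filter, such as the one selected by mode, additionally suppresses content above the new Nyquist frequency.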
- paddleaudio.backends.soundfile_backend.save(filepath: str, src: Tensor, sample_rate: int, channels_first: bool = True, compression: Optional[float] = None, format: Optional[str] = None, encoding: Optional[str] = None, bits_per_sample: Optional[int] = None)[source]
Save audio data to file.
- Note:
The formats this function can handle depend on the soundfile installation. This function is tested on the following formats:
  - WAV
    - 32-bit floating-point
    - 32-bit signed integer
    - 16-bit signed integer
    - 8-bit unsigned integer
  - FLAC
  - OGG/VORBIS
  - SPHERE
- Note:
The filepath argument is intentionally annotated as str only, even though it also accepts a pathlib.Path object. This is for consistency with the "sox_io" backend.
- Args:
- filepath (str or pathlib.Path): Path to audio file.
- src (paddle.Tensor): Audio data to save. Must be a 2D tensor.
- sample_rate (int): Sampling rate.
- channels_first (bool, optional): If True, the given tensor is interpreted as [channel, time], otherwise [time, channel].
- compression (float or None, optional): Not used.
It is here only for interface compatibility with the "sox_io" backend.
- format (str or None, optional): Override the audio format.
When the filepath argument is a path-like object, the audio format is inferred from the file extension. If the file extension is missing or different, you can specify the correct format with this argument.
When the filepath argument is a file-like object, this argument is required.
Valid values are "wav", "ogg", "vorbis", "flac" and "sph".
- encoding (str or None, optional): Changes the encoding for supported formats.
This argument is effective only for supported formats, such as "wav", "flac" and "sph". Valid values are:
  - "PCM_S" (signed integer Linear PCM)
  - "PCM_U" (unsigned integer Linear PCM)
  - "PCM_F" (floating point PCM)
  - "ULAW" (mu-law)
  - "ALAW" (a-law)
- bits_per_sample (int or None, optional): Changes the bit depth for the supported formats.
When format is one of "wav", "flac" or "sph", you can change the bit depth. Valid values are 8, 16, 24, 32 and 64.
Supported formats/encodings/bit depth/compression are:
  - "wav"
    - 32-bit floating-point PCM
    - 32-bit signed integer PCM
    - 24-bit signed integer PCM
    - 16-bit signed integer PCM
    - 8-bit unsigned integer PCM
    - 8-bit mu-law
    - 8-bit a-law
    - Note: Default encoding/bit depth is determined by the dtype of the input Tensor.
  - "flac"
    - 8-bit
    - 16-bit (default)
    - 24-bit
  - "ogg", "vorbis"
    - Doesn't accept changing configuration.
  - "sph"
    - 8-bit signed integer PCM
    - 16-bit signed integer PCM
    - 24-bit signed integer PCM
    - 32-bit signed integer PCM (default)
    - 8-bit mu-law
    - 8-bit a-law
    - 16-bit a-law
    - 24-bit a-law
    - 32-bit a-law
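The format inference rule and the supported combinations listed for save can be written down as a small lookup. The sketch below is illustrative only: resolve_format, is_listed and the SUPPORTED table are hypothetical names that encode the list in this section, not functions of the backend.

```python
import os

# Combinations listed in this section, as (encoding, bits_per_sample) pairs.
# FLAC lists only bit depths, so its encoding slot is None.
SUPPORTED = {
    "wav": {("PCM_F", 32), ("PCM_S", 32), ("PCM_S", 24), ("PCM_S", 16),
            ("PCM_U", 8), ("ULAW", 8), ("ALAW", 8)},
    "flac": {(None, 8), (None, 16), (None, 24)},
    "sph": {("PCM_S", 8), ("PCM_S", 16), ("PCM_S", 24), ("PCM_S", 32),
            ("ULAW", 8), ("ALAW", 8), ("ALAW", 16), ("ALAW", 24),
            ("ALAW", 32)},
}


def resolve_format(filepath, format=None):
    """Return the explicit format if given, else the lowercased extension."""
    if format is not None:
        return format.lower()
    ext = os.path.splitext(str(filepath))[1]
    return ext[1:].lower()


def is_listed(filepath, encoding, bits_per_sample, format=None):
    """Check a requested configuration against the SUPPORTED table."""
    fmt = resolve_format(filepath, format)
    if fmt in ("ogg", "vorbis"):
        # These formats do not accept configuration changes.
        return encoding is None and bits_per_sample is None
    return (encoding, bits_per_sample) in SUPPORTED.get(fmt, set())


print(is_listed("out.wav", "PCM_S", 16))
print(is_listed("out.bin", "ULAW", 8, format="sph"))
print(is_listed("out.wav", "PCM_S", 64))
```

A check like this could run before calling save to catch a combination that is not in the list, for example 64-bit signed PCM in a WAV file.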
- paddleaudio.backends.soundfile_backend.soundfile_load(file: PathLike, sr: Optional[int] = None, mono: bool = True, merge_type: str = 'average', normal: bool = True, norm_type: str = 'linear', norm_mul_factor: float = 1.0, offset: float = 0.0, duration: Optional[int] = None, dtype: str = 'float32', resample_mode: str = 'kaiser_fast') Tuple[ndarray, int] [source]
Load audio file from disk. This function loads audio from disk using the audio backend.
- Args:
- file (os.PathLike): Path of audio file to load.
- sr (Optional[int], optional): Sample rate of loaded waveform. Defaults to None.
- mono (bool, optional): Return waveform with mono channel. Defaults to True.
- merge_type (str, optional): Merge type of multi-channels waveform. Defaults to 'average'.
- normal (bool, optional): Waveform normalization. Defaults to True.
- norm_type (str, optional): Type of normalization. Defaults to 'linear'.
- norm_mul_factor (float, optional): Scaling factor. Defaults to 1.0.
- offset (float, optional): Offset to the start of waveform. Defaults to 0.0.
- duration (Optional[int], optional): Duration of waveform to read. Defaults to None.
- dtype (str, optional): Data type of waveform. Defaults to 'float32'.
- resample_mode (str, optional): The resampling filter to use. Defaults to 'kaiser_fast'.
- Returns:
Tuple[np.ndarray, int]: Waveform in ndarray and its sample rate.
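The units of offset and duration are not stated in this section. Assuming both are in seconds, cropping a waveform would look roughly like the NumPy sketch below. The helper crop_by_time is hypothetical and works on an in-memory array rather than a file.

```python
import numpy as np


def crop_by_time(y, sr, offset=0.0, duration=None):
    """Crop a 1D or [channel, time] array using offset/duration in seconds.

    Hypothetical helper; seconds as the unit is an assumption.
    """
    start = int(offset * sr)
    end = None if duration is None else start + int(duration * sr)
    return y[..., start:end]


sr = 16000
y = np.zeros((2, 3 * sr), dtype=np.float32)  # 3 s of stereo audio
clip = crop_by_time(y, sr, offset=0.5, duration=1)
rest = crop_by_time(y, sr, offset=2.5)       # duration=None reads to the end
print(clip.shape, rest.shape)
```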
- paddleaudio.backends.soundfile_backend.soundfile_save(y: ndarray, sr: int, file: PathLike) None [source]
Save audio file to disk. This function saves audio to disk using scipy.io.wavfile, with additional step to convert input waveform to int16.
- Args:
- y (np.ndarray): Input waveform array in 1D or 2D.
- sr (int): Sample rate.
- file (os.PathLike): Path of audio file to save.
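The int16 conversion step mentioned above can be sketched with NumPy. The exact scaling and rounding used by the library are not specified in this section, so clipping to [-1.0, 1.0] and multiplying by 32767 is an assumed convention, and float_to_int16 is a hypothetical name.

```python
import numpy as np


def float_to_int16(y):
    """Convert a float waveform in [-1.0, 1.0] to int16 PCM.

    Hypothetical helper; scale factor and rounding are assumptions.
    """
    clipped = np.clip(y, -1.0, 1.0)  # guard against overflow on cast
    return np.round(clipped * 32767.0).astype(np.int16)


y = np.array([0.0, 0.25, -1.0, 1.2], dtype=np.float32)
pcm = float_to_int16(y)
print(pcm.dtype, pcm)
```

The out-of-range value 1.2 is clipped to full scale instead of wrapping around, which is the main reason to clip before casting.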
- paddleaudio.backends.soundfile_backend.to_mono(y: ndarray, merge_type: str = 'average') ndarray [source]
Convert stereo audio to mono.
- Args:
- y (np.ndarray): Input waveform array in 1D or 2D.
- merge_type (str, optional): Merge type to generate mono waveform. Defaults to 'average'.
- Returns:
np.ndarray: y with mono channel.
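For the default merge_type of 'average', the operation amounts to taking the mean across channels. The sketch below assumes the channel axis comes first ([channel, time]), which this section does not state, and average_to_mono is a hypothetical name.

```python
import numpy as np


def average_to_mono(y):
    """Average the channels of a [channel, time] array; pass 1D through.

    Hypothetical helper; channel-first layout is an assumption.
    """
    if y.ndim == 1:
        return y
    return y.mean(axis=0)


stereo = np.array([[0.2, 0.4, -0.6],
                   [0.0, 0.2, 0.6]], dtype=np.float32)
mono = average_to_mono(stereo)
print(mono.shape, mono)
```

Opposite-phase samples cancel under averaging, as the last column shows.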