paddlespeech.s2t.frontend.audio module
Contains the audio segment class.
- class paddlespeech.s2t.frontend.audio.AudioSegment(samples, sample_rate)[source]
Bases:
object
Monaural audio segment abstraction.
- Parameters:
samples (ndarray.float32) -- Audio samples [num_samples x num_channels].
sample_rate (int) -- Audio sample rate.
- Raises:
TypeError -- If the sample data type is not float or int.
- Attributes:
duration
Return audio duration.
num_samples
Return number of samples.
rms_db
Return root mean square energy of the audio in decibels.
sample_rate
Return audio sample rate.
samples
Return audio samples.
Methods
add_noise(noise, snr_dB[, ...])
Add the given noise segment at a specific signal-to-noise ratio.
change_speed(speed_rate)
Change the audio speed by linear interpolation.
concatenate(*segments)
Concatenate an arbitrary number of audio segments together.
convolve(impulse_segment[, allow_resample])
Convolve this audio segment with the given impulse segment.
convolve_and_normalize(impulse_segment[, ...])
Convolve and normalize the resulting audio segment so that it has the same average power as the input signal.
from_bytes(bytes)
Create audio segment from a byte string containing audio samples.
from_file(file[, infos])
Create audio segment from audio file.
from_pcm(samples, sample_rate)
Create audio segment from an array of PCM audio samples.
from_sequence_file(filepath)
Create audio segment from sequence file.
gain_db(gain)
Apply gain in decibels to samples.
make_silence(duration, sample_rate)
Create a silent audio segment of the given duration and sample rate.
normalize([target_db, max_gain_db])
Normalize audio to be of the desired RMS value in decibels.
normalize_online_bayesian(target_db, ...[, ...])
Normalize audio using a production-compatible online/causal algorithm.
pad_silence(duration[, sides])
Pad this audio sample with a period of silence.
random_subsegment(subsegment_length[, rng])
Cut a random subsegment of the specified length from the audio segment.
resample(target_sample_rate[, filter])
Resample the audio to a target sample rate.
shift(shift_ms)
Shift the audio in time.
slice_from_file(file[, start, end])
Load a small section of an audio file without loading the entire file into memory, which can be very wasteful.
subsegment([start_sec, end_sec])
Cut the AudioSegment between given boundaries.
superimpose(other)
Add samples from another segment to those of this segment (sample-wise addition, not segment concatenation).
to([dtype])
Return the audio samples converted to the given dtype.
to_bytes([dtype])
Create a byte string containing the audio content.
to_wav_file(filepath[, dtype])
Save audio segment to disk as wav file.
- add_noise(noise, snr_dB, allow_downsampling=False, max_gain_db=300.0, rng=None)[source]
Add the given noise segment at a specific signal-to-noise ratio. If the noise segment is longer than this segment, a random subsegment of matching length is sampled from it and used instead.
Note that this is an in-place transformation.
- Parameters:
noise (AudioSegment) -- Noise signal to add.
snr_dB (float) -- Signal-to-Noise Ratio, in decibels.
allow_downsampling (bool) -- Whether to allow the noise signal to be downsampled to match the base signal sample rate.
max_gain_db (float) -- Maximum amount of gain to apply to noise signal before adding it in. This is to prevent attempting to apply infinite gain to a zero signal.
rng (None|random.Random) -- Random number generator state.
- Raises:
ValueError -- If the sample rates of the two audio segments do not match when downsampling is not allowed, or if the noise segment is shorter than the original audio segment.
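The SNR-based mixing described above can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes the noise gain is the difference between the signal and noise RMS levels minus the requested SNR, capped at max_gain_db. The helper names `rms_db_of` and `mix_at_snr` are hypothetical, and plain Python lists stand in for the sample arrays.

```python
import math


def rms_db_of(samples):
    # Hypothetical helper: 10 * log10 of the mean squared sample.
    return 10.0 * math.log10(sum(x * x for x in samples) / len(samples))


def mix_at_snr(signal, noise, snr_db, max_gain_db=300.0):
    """Scale `noise` so the signal sits `snr_db` above it, then add it."""
    if len(noise) != len(signal):
        raise ValueError("this sketch expects equal-length signal and noise")
    # Gain that places the noise snr_db below the signal, capped so a
    # near-silent noise segment is not amplified without bound.
    gain_db = min(rms_db_of(signal) - rms_db_of(noise) - snr_db, max_gain_db)
    factor = 10.0 ** (gain_db / 20.0)
    scaled_noise = [n * factor for n in noise]
    mixed = [s + n for s, n in zip(signal, scaled_noise)]
    return mixed, scaled_noise


signal = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
mixed, scaled_noise = mix_at_snr(signal, noise, snr_db=10.0)
print(rms_db_of(signal) - rms_db_of(scaled_noise))  # approximately 10 dB
```

The cap matters when the noise is almost silent: without it the required gain grows without limit.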
- change_speed(speed_rate)[source]
Change the audio speed by linear interpolation.
Note that this is an in-place transformation.
- Parameters:
speed_rate (float) -- Rate of speed change: speed_rate > 1.0, speed up the audio; speed_rate = 1.0, unchanged; speed_rate < 1.0, slow down the audio; speed_rate <= 0.0, not allowed, raise ValueError.
- Raises:
ValueError -- If speed_rate <= 0.0.
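A speed change by linear interpolation can be sketched as below. This is an illustrative assumption about the technique, not the library's code: the output length is taken to be the input length divided by speed_rate, and each output sample is interpolated between its two nearest input samples. The function name `change_speed_linear` is hypothetical.

```python
def change_speed_linear(samples, speed_rate):
    """Return a new list resampled by linear interpolation."""
    if speed_rate <= 0:
        raise ValueError("speed_rate should be greater than zero.")
    old_length = len(samples)
    new_length = int(old_length / speed_rate)
    if new_length <= 1:
        return list(samples[:new_length])
    out = []
    for i in range(new_length):
        # Position of output sample i on the input index axis.
        pos = i * (old_length - 1) / (new_length - 1)
        lo = int(pos)
        hi = min(lo + 1, old_length - 1)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


ramp = [0.0, 1.0, 2.0, 3.0]
slower = change_speed_linear(ramp, 0.5)   # twice as many samples
faster = change_speed_linear(ramp, 2.0)   # half as many samples
print(len(slower), len(faster))
```

Note that this kind of speed change also shifts the pitch, because the samples are simply replayed on a stretched or compressed time axis.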
- classmethod concatenate(*segments)[source]
Concatenate an arbitrary number of audio segments together.
- Parameters:
*segments --
Input audio segments to be concatenated.
- Returns:
Audio segment instance holding the concatenated result.
- Return type:
AudioSegment
- Raises:
ValueError -- If the number of segments is zero, or if the sample rates of the segments do not match.
TypeError -- If any segment is not an AudioSegment instance.
- convolve(impulse_segment, allow_resample=False)[source]
Convolve this audio segment with the given impulse segment.
Note that this is an in-place transformation.
- Parameters:
impulse_segment (AudioSegment) -- Impulse response segments.
allow_resample (bool) -- Indicates whether resampling is allowed when the impulse_segment has a different sample rate from this signal.
- Raises:
ValueError -- If the sample rates of the two audio segments do not match when resampling is not allowed.
- convolve_and_normalize(impulse_segment, allow_resample=False)[source]
Convolve and normalize the resulting audio segment so that it has the same average power as the input signal.
Note that this is an in-place transformation.
- Parameters:
impulse_segment (AudioSegment) -- Impulse response segments.
allow_resample (bool) -- Indicates whether resampling is allowed when the impulse_segment has a different sample rate from this signal.
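Convolving and then restoring the original average power can be sketched as below. This is a minimal sketch under stated assumptions: a direct full-length convolution is used (the library may use a faster FFT-based routine and may treat the output length differently), and power is matched by comparing RMS levels in decibels. The names `rms_db_of`, `convolve_full` and `convolve_and_match_power` are hypothetical.

```python
import math


def rms_db_of(samples):
    # Hypothetical helper: 10 * log10 of the mean squared sample.
    return 10.0 * math.log10(sum(x * x for x in samples) / len(samples))


def convolve_full(signal, impulse):
    """Direct 'full' convolution; output has len(signal)+len(impulse)-1 samples."""
    out = [0.0] * (len(signal) + len(impulse) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse):
            out[i + j] += s * h
    return out


def convolve_and_match_power(signal, impulse):
    """Convolve, then rescale so the result has the input's RMS level."""
    target_db = rms_db_of(signal)
    convolved = convolve_full(signal, impulse)
    factor = 10.0 ** ((target_db - rms_db_of(convolved)) / 20.0)
    return [x * factor for x in convolved]


dry = [1.0, -1.0, 1.0, -1.0]
wet = convolve_and_match_power(dry, [0.5, 0.25])
print(rms_db_of(dry), rms_db_of(wet))  # the two levels agree
```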
- property duration
Return audio duration.
- Returns:
Audio duration in seconds.
- Return type:
float
- classmethod from_bytes(bytes)[source]
Create audio segment from a byte string containing audio samples.
- Parameters:
bytes (str) -- Byte string containing audio samples.
- Returns:
Audio segment instance.
- Return type:
AudioSegment
- classmethod from_file(file, infos=None)[source]
Create audio segment from audio file.
- Parameters:
file (str|file) -- Filepath or file object of the audio file.
infos (TarLocalData, optional) -- tar2obj and tar2infos. Defaults to None.
- Returns:
Audio segment instance.
- Return type:
AudioSegment
- classmethod from_pcm(samples, sample_rate)[source]
Create audio segment from an array of PCM audio samples.
- Parameters:
samples (numpy.ndarray) -- Audio samples [num_samples x num_channels].
sample_rate (int) -- Audio sample rate.
- Returns:
Audio segment instance.
- Return type:
AudioSegment
- classmethod from_sequence_file(filepath)[source]
Create audio segment from sequence file. Sequence file is a binary file containing a collection of multiple audio files, with several header bytes in the head indicating the offsets of each audio byte data chunk.
The format is:
4 bytes (int, version), 4 bytes (int, num of utterance), 4 bytes (int, bytes per header), [bytes_per_header*(num_utterance+1)] bytes (offsets for each audio), audio_bytes_data_of_1st_utterance, audio_bytes_data_of_2nd_utterance, ......
Sequence file name must end with ".seqbin". And the filename of the 5th utterance's audio file in sequence file "xxx.seqbin" must be "xxx.seqbin_5", with "5" indicating the utterance index within this sequence file (starting from 1).
- Parameters:
filepath (str) -- Filepath of sequence file.
- Returns:
Audio segment instance.
- Return type:
AudioSegment
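The sequence file layout described above can be illustrated with a small round-trip sketch. Several details are assumptions that the text does not state: integers are taken to be little-endian, each offset is 4 bytes wide, and offsets are absolute positions from the start of the file. The helpers `build_seqbin` and `read_utterance` are hypothetical and operate on in-memory bytes rather than real ".seqbin" files.

```python
import struct


def build_seqbin(utterances, version=1):
    """Pack a list of byte strings into the header + data layout."""
    bytes_per_header = 4  # assumed width of each offset entry
    num = len(utterances)
    # Data starts after the three 4-byte ints and the (num + 1) offsets.
    data_start = 12 + bytes_per_header * (num + 1)
    offsets = [data_start]
    for chunk in utterances:
        offsets.append(offsets[-1] + len(chunk))
    header = struct.pack("<iii", version, num, bytes_per_header)
    header += b"".join(struct.pack("<i", off) for off in offsets)
    return header + b"".join(utterances)


def read_utterance(blob, index):
    """Return the raw bytes of the utterance with 1-based `index`."""
    version, num, bytes_per_header = struct.unpack_from("<iii", blob, 0)
    if bytes_per_header != 4:
        raise ValueError("this sketch only handles 4-byte offsets")
    if not 1 <= index <= num:
        raise ValueError("utterance index out of range")
    # Consecutive offsets bound the chunk: [offset[index-1], offset[index]).
    start, end = struct.unpack_from("<ii", blob, 12 + 4 * (index - 1))
    return blob[start:end]


blob = build_seqbin([b"aaa", b"bb", b"c"])
print(read_utterance(blob, 2))  # b'bb'
```

Storing num_utterance + 1 offsets means every chunk, including the last, is delimited by two consecutive entries, so no separate length field is needed.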
- gain_db(gain)[source]
Apply gain in decibels to samples.
Note that this is an in-place transformation.
- Parameters:
gain (float|1darray) -- Gain in decibels to apply to samples.
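For reference, a gain in decibels corresponds to an amplitude factor of 10^(gain/20). The sketch below applies a scalar gain to a plain list; the function name `apply_gain_db` is hypothetical, and the per-sample (1darray) form of the parameter is not covered.

```python
def apply_gain_db(samples, gain):
    """Scale amplitudes by the linear factor equivalent to `gain` dB."""
    factor = 10.0 ** (gain / 20.0)
    return [x * factor for x in samples]


# +20 dB multiplies amplitudes by 10; about -6.02 dB halves them.
louder = apply_gain_db([0.1, -0.2], 20.0)
quieter = apply_gain_db([1.0, -1.0], -6.0206)
print(louder, quieter)
```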
- classmethod make_silence(duration, sample_rate)[source]
Creates a silent audio segment of the given duration and sample rate.
- Parameters:
duration (float) -- Length of silence in seconds.
sample_rate (float) -- Sample rate.
- Returns:
Silent AudioSegment instance of the given duration.
- Return type:
AudioSegment
- normalize(target_db=-20, max_gain_db=300.0)[source]
Normalize audio to be of the desired RMS value in decibels.
Note that this is an in-place transformation.
- Parameters:
target_db (float) -- Target RMS value in decibels. This value should be less than 0.0 as 0.0 is full-scale audio.
max_gain_db (float) -- Maximum amount of gain in dB that can be applied for normalization. This is to prevent NaNs when attempting to normalize a signal consisting of all zeros.
- Raises:
ValueError -- If the required gain to normalize the segment to the target_db value exceeds max_gain_db.
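The normalization rule can be sketched as below, assuming the required gain is simply target_db minus the current RMS level and that exceeding max_gain_db is an error, as the Raises entry states. The names `rms_db_of` and `normalize_rms` are hypothetical, and an all-zero signal is given a level of negative infinity so that it trips the gain limit instead of producing NaNs.

```python
import math


def rms_db_of(samples):
    # Hypothetical helper; silence is reported as -inf dB.
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(mean_square) if mean_square > 0.0 else float("-inf")


def normalize_rms(samples, target_db=-20.0, max_gain_db=300.0):
    """Return samples rescaled so that their RMS level equals target_db."""
    gain = target_db - rms_db_of(samples)
    if gain > max_gain_db:
        raise ValueError(
            "required gain %.2f dB exceeds max_gain_db %.2f dB" % (gain, max_gain_db))
    factor = 10.0 ** (gain / 20.0)
    return [x * factor for x in samples]


quiet = [0.5, -0.5, 0.5, -0.5]
print(rms_db_of(normalize_rms(quiet)))  # approximately -20 dB
```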
- normalize_online_bayesian(target_db, prior_db, prior_samples, startup_delay=0.0)[source]
Normalize audio using a production-compatible online/causal algorithm. This uses an exponential likelihood and gamma prior to make online estimates of the RMS even when there are very few samples.
Note that this is an in-place transformation.
- Parameters:
target_db (float) -- Target RMS value in decibels.
prior_db (float) -- Prior RMS estimate in decibels.
prior_samples (float) -- Prior strength in number of samples.
startup_delay (float) -- Default 0.0s. If provided, this function will accrue statistics for the first startup_delay seconds before applying online normalization.
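One way to picture the online estimate is a running mean square in which the prior acts as prior_samples pseudo-observations at the prior level. The sketch below follows that reading and is an assumption about the method, not a copy of the library's code; it ignores startup_delay entirely. The function name `online_gains_db` is hypothetical.

```python
import math


def online_gains_db(samples, target_db, prior_db, prior_samples):
    """Return the per-sample gain (dB) from a causal, prior-smoothed estimate."""
    # The prior contributes prior_samples pseudo-observations of its power.
    prior_sum_sq = (10.0 ** (prior_db / 10.0)) * prior_samples
    gains = []
    running_sum_sq = 0.0
    for count, x in enumerate(samples, start=1):
        running_sum_sq += x * x
        mean_sq = (running_sum_sq + prior_sum_sq) / (count + prior_samples)
        # Only samples seen so far are used, so the estimate is causal.
        gains.append(target_db - 10.0 * math.log10(mean_sq))
    return gains


# A -20 dB tone with a deliberately wrong 0 dB prior: the gain starts far
# from zero and moves toward zero as real samples outweigh the prior.
gains = online_gains_db([0.1] * 1000, target_db=-20.0, prior_db=0.0, prior_samples=1.0)
print(gains[0], gains[-1])
```

A larger prior_samples makes the early estimate more stable but slower to adapt to the actual signal level.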
- property num_samples
Return number of samples.
- Returns:
Number of samples.
- Return type:
int
- pad_silence(duration, sides='both')[source]
Pad this audio sample with a period of silence.
Note that this is an in-place transformation.
- Parameters:
duration (float) -- Length of silence in seconds to pad.
sides (str) -- Position for padding: 'beginning' - adds silence in the beginning; 'end' - adds silence in the end; 'both' - adds silence in both the beginning and the end.
- Raises:
ValueError -- If sides is not supported.
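The three padding modes can be sketched with plain lists as below. The function name `pad_with_silence` is hypothetical, and the sample rate is passed explicitly because the sketch has no segment object to read it from.

```python
def pad_with_silence(samples, sample_rate, duration, sides="both"):
    """Return samples with `duration` seconds of zeros added on the chosen side(s)."""
    silence = [0.0] * int(duration * sample_rate)
    if sides == "beginning":
        return silence + samples
    if sides == "end":
        return samples + silence
    if sides == "both":
        return silence + samples + silence
    raise ValueError("Unknown value for the sides %s" % sides)


# 0.5 s at 4 samples per second is two zero samples on each side.
print(pad_with_silence([1.0], 4, 0.5, "both"))  # [0.0, 0.0, 1.0, 0.0, 0.0]
```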
- random_subsegment(subsegment_length, rng=None)[source]
Cut a random subsegment of the specified length from the audio segment.
Note that this is an in-place transformation.
- Parameters:
subsegment_length (float) -- Subsegment length in seconds.
rng (random.Random) -- Random number generator state.
- Raises:
ValueError -- If the length of the subsegment is greater than that of the original segment.
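Random cropping with a caller-supplied generator can be sketched as below. This is an illustrative assumption: the start position is drawn uniformly over all positions that leave room for the subsegment, and integer sample indices are used so the result always has the same length. The function name `random_crop` is hypothetical.

```python
import random


def random_crop(samples, sample_rate, subsegment_length, rng=None):
    """Return a randomly positioned slice lasting `subsegment_length` seconds."""
    rng = rng if rng is not None else random.Random()
    length = int(round(subsegment_length * sample_rate))
    if length > len(samples):
        raise ValueError("subsegment is longer than the original segment")
    start = rng.randint(0, len(samples) - length)  # inclusive bounds
    return samples[start:start + length]


audio = list(range(100))  # 10 s at 10 samples per second
# Passing a seeded generator makes the crop reproducible.
crop_a = random_crop(audio, 10, 2.0, rng=random.Random(0))
crop_b = random_crop(audio, 10, 2.0, rng=random.Random(0))
print(len(crop_a), crop_a == crop_b)  # 20 True
```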
- resample(target_sample_rate, filter='kaiser_best')[source]
Resample the audio to a target sample rate.
Note that this is an in-place transformation.
- Parameters:
target_sample_rate (int) -- Target sample rate.
filter (str) -- Resampling filter to use, one of {'kaiser_best', 'kaiser_fast'}.
- property rms_db
Return root mean square energy of the audio in decibels.
- Returns:
Root mean square energy in decibels.
- Return type:
float
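As a reference point, an RMS energy in decibels is commonly computed as 10·log10 of the mean squared sample. The sketch below assumes that definition; the helper name `rms_db_of` is hypothetical and a plain list stands in for the sample array.

```python
import math


def rms_db_of(samples):
    """Return 10 * log10(mean(x^2)) for a non-silent list of samples."""
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(mean_square)


# A full-scale square wave has a mean square of 1.0, which is 0 dB.
print(rms_db_of([1.0, -1.0, 1.0, -1.0]))
# Halving the amplitude lowers the level by roughly 6.02 dB.
print(rms_db_of([0.5, -0.5, 0.5, -0.5]))
```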
- property sample_rate
Return audio sample rate.
- Returns:
Audio sample rate.
- Return type:
int
- property samples
Return audio samples.
- Returns:
Audio samples.
- Return type:
ndarray
- shift(shift_ms)[source]
Shift the audio in time. If shift_ms is positive, shift with time advance; if negative, shift with time delay. Silence is padded to keep the duration unchanged.
Note that this is an in-place transformation.
- Parameters:
shift_ms (float) -- Shift time in milliseconds. If positive, shift with time advance; if negative, shift with time delay.
- Raises:
ValueError -- If shift_ms is longer than audio duration.
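A duration-preserving shift can be sketched as below. It assumes that "time advance" means the content moves earlier (leading samples are dropped and zeros fill the end) and that "time delay" means the opposite. The function name `shift_samples` is hypothetical.

```python
def shift_samples(samples, sample_rate, shift_ms):
    """Return a shifted copy of `samples` with the same length."""
    if abs(shift_ms) / 1000.0 > len(samples) / sample_rate:
        raise ValueError("Absolute value of shift_ms should be smaller "
                         "than audio duration.")
    k = int(shift_ms * sample_rate / 1000)
    if k > 0:
        # Time advance: content moves earlier, silence fills the tail.
        return samples[k:] + [0.0] * k
    if k < 0:
        # Time delay: silence fills the head, the tail is dropped.
        return [0.0] * (-k) + samples[:k]
    return list(samples)


clip = [1.0, 2.0, 3.0, 4.0]           # 4 ms at 1000 samples per second
print(shift_samples(clip, 1000, 1))   # [2.0, 3.0, 4.0, 0.0]
print(shift_samples(clip, 1000, -2))  # [0.0, 0.0, 1.0, 2.0]
```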
- classmethod slice_from_file(file, start=None, end=None)[source]
Load a small section of an audio file without loading the entire file into memory, which can be very wasteful.
- Parameters:
file (str|file) -- Input audio filepath or file object.
start (float) -- Start time in seconds. If start is negative, it wraps around from the end. If not provided, this function reads from the very beginning.
end (float) -- End time in seconds. If end is negative, it wraps around from the end. If not provided, the default behavior is to read to the end of the file.
- Returns:
AudioSegment instance of the specified slice of the input audio file.
- Return type:
AudioSegment
- Raises:
ValueError -- If start or end is incorrectly set, e.g. out of bounds in time.
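The handling of missing and negative boundaries can be sketched as a small helper that turns times into sample indices. The exact checks the library performs are not listed here, so the validation below is an assumption; the function name `resolve_slice` is hypothetical, and the file duration is passed in rather than read from a file.

```python
def resolve_slice(duration, sample_rate, start=None, end=None):
    """Map start/end times in seconds to a (start_index, end_index) pair."""
    start = 0.0 if start is None else start
    end = duration if end is None else end
    # Negative times count back from the end of the file.
    if start < 0.0:
        start += duration
    if end < 0.0:
        end += duration
    if start < 0.0 or end < 0.0:
        raise ValueError("slice boundary lies before the start of the file")
    if start > end:
        raise ValueError("slice start is later than slice end")
    if end > duration:
        raise ValueError("slice end is beyond the end of the file")
    return int(start * sample_rate), int(end * sample_rate)


# Last three seconds of a 10 s file sampled at 16 kHz.
print(resolve_slice(10.0, 16000, start=-3.0))  # (112000, 160000)
```

Only the samples between the two indices then need to be read from disk.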
- subsegment(start_sec=None, end_sec=None)[source]
Cut the AudioSegment between given boundaries.
Note that this is an in-place transformation.
- Parameters:
start_sec (float) -- Beginning of subsegment in seconds.
end_sec (float) -- End of subsegment in seconds.
- Raises:
ValueError -- If start_sec or end_sec is incorrectly set, e.g. out of bounds in time.
- superimpose(other)[source]
Add samples from another segment to those of this segment (sample-wise addition, not segment concatenation).
Note that this is an in-place transformation.
- Parameters:
other (AudioSegment) -- Segment containing samples to be added in.
- Raises:
TypeError -- If the types of the two segments do not match.
ValueError -- If the sample rates of the two segments are not equal, or if the lengths of segments don't match.
- to(dtype='int16')[source]
Return the audio samples converted to the given dtype.
- Parameters:
dtype (str) -- Data type for export samples. Options: 'int16', 'int32', 'float32', 'float64'. Default is 'int16'.
- Returns:
np.ndarray containing dtype audio content.
- Return type:
ndarray
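Converting float samples to 16-bit integers usually means scaling by 2^15 and clipping to the representable range. The sketch below assumes that convention and truncation toward zero; the library's exact scaling and rounding may differ. The function name `float_to_int16` is hypothetical.

```python
def float_to_int16(samples):
    """Map floats in [-1.0, 1.0] to integers in [-32768, 32767]."""
    out = []
    for x in samples:
        value = int(x * 32768)  # truncates toward zero
        # Clip so that +1.0 does not overflow the int16 range.
        out.append(max(-32768, min(32767, value)))
    return out


print(float_to_int16([0.0, 0.5, 1.0, -1.0]))  # [0, 16384, 32767, -32768]
```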
- to_bytes(dtype='float32')[source]
Create a byte string containing the audio content.
- Parameters:
dtype (str) -- Data type for export samples. Options: 'int16', 'int32', 'float32', 'float64'. Default is 'float32'.
- Returns:
Byte string containing audio content.
- Return type:
str
- to_wav_file(filepath, dtype='float32')[source]
Save audio segment to disk as wav file.
- Parameters:
filepath (str|file) -- WAV filepath or file object to save the audio segment.
dtype (str) -- Subtype for audio file. Options: 'int16', 'int32', 'float32', 'float64'. Default is 'float32'.
- Raises:
TypeError -- If dtype is not supported.