paddlespeech.s2t.frontend.utility module

Contains data helper functions.

paddlespeech.s2t.frontend.utility.convert_samples_from_float32(samples, dtype)[source]

Convert sample type from float32 to dtype.

Audio sample type is usually integer or floating-point. For integer types, float32 samples are rescaled from [-1, 1] to the maximum range supported by the integer type.

PCM32 -> PCM16
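
The rescaling described above can be illustrated with a minimal sketch. This is not the library's implementation; the helper name `float32_to_int_sketch`, the rounding step, and the clipping behavior are assumptions made for illustration.

```python
import numpy as np


def float32_to_int_sketch(samples: np.ndarray, dtype: str = "int16") -> np.ndarray:
    """Rescale float samples in [-1, 1] to the range of a signed integer dtype.

    Hypothetical helper, not part of paddlespeech.
    """
    info = np.iinfo(np.dtype(dtype))
    # Scale by 2**(bits - 1), e.g. 32768 for int16.
    scaled = samples.astype(np.float64) * (2.0 ** (info.bits - 1))
    # Clip so that +1.0 maps to the largest representable value instead of overflowing.
    return np.clip(np.round(scaled), info.min, info.max).astype(dtype)


pcm16 = float32_to_int_sketch(np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32))
print(pcm16.tolist())  # [-32768, 0, 16384, 32767]
```

Clipping is one way to handle the asymmetry of signed integer ranges; the actual function may treat the upper bound differently.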

paddlespeech.s2t.frontend.utility.convert_samples_to_float32(samples)[source]

Convert sample type to float32.

Audio sample type is usually integer or floating-point. Integers are scaled to [-1, 1] in float32.

PCM16 -> PCM32
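
As a rough illustration of the reverse direction, the sketch below scales signed integer samples into [-1, 1]. The helper name `int_to_float32_sketch` is hypothetical, and the sketch assumes signed integer input; the library's handling of other dtypes is not shown here.

```python
import numpy as np


def int_to_float32_sketch(samples: np.ndarray) -> np.ndarray:
    """Convert samples to float32, scaling signed integers into [-1, 1].

    Hypothetical helper, not part of paddlespeech.
    """
    out = samples.astype(np.float32)
    if np.issubdtype(samples.dtype, np.integer):
        bits = np.iinfo(samples.dtype).bits
        # Divide by 2**(bits - 1), e.g. 32768 for int16.
        out *= 1.0 / 2 ** (bits - 1)
    return out


wave = int_to_float32_sketch(np.array([-32768, 0, 16384], dtype=np.int16))
print(wave.tolist())  # [-1.0, 0.0, 0.5]
```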

paddlespeech.s2t.frontend.utility.gain_db_to_ratio(gain_db: float)[source]

Convert a gain in dB to an amplitude ratio.

Args:

gain_db (float): gain in dB

Returns:

float: amplitude scale ratio
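
The standard relation between a gain in dB and an amplitude ratio is ratio = 10^(gain_db / 20). The sketch below applies that formula; the function name `db_to_amplitude_ratio` is a stand-in, and the library's own code may differ in detail.

```python
import math


def db_to_amplitude_ratio(gain_db: float) -> float:
    """Amplitude ratio for a gain in dB: 10 ** (gain_db / 20)."""
    return math.pow(10.0, gain_db / 20.0)


print(db_to_amplitude_ratio(0.0))                # 1.0
print(round(db_to_amplitude_ratio(20.0), 6))     # 10.0
print(round(db_to_amplitude_ratio(-6.0206), 4))  # 0.5
```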

paddlespeech.s2t.frontend.utility.load_cmvn(cmvn_file: str, filetype: str)[source]

Load CMVN from file.

Args:

cmvn_file (str): CMVN file path.

filetype (str): file type, one of npz, json, or kaldi.

Raises:

ValueError: file type not supported.

Returns:

Tuple[np.ndarray, np.ndarray]: mean, istd
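
A sketch of how the returned pair might be used, assuming `istd` denotes the inverse standard deviation so that normalization is a subtraction followed by a multiplication. The helper `apply_cmvn_sketch` and the toy feature matrix are hypothetical; the statistics here are computed inline rather than loaded from a file.

```python
import numpy as np


def apply_cmvn_sketch(feats: np.ndarray, mean: np.ndarray, istd: np.ndarray) -> np.ndarray:
    """Normalize features per dimension: (feats - mean) * istd.

    Hypothetical helper; assumes istd is 1 / standard deviation.
    """
    return (feats - mean) * istd


# Toy features: 2 frames x 2 dimensions.
feats = np.array([[1.0, 10.0], [3.0, 30.0]])
mean = feats.mean(axis=0)        # stands in for the loaded mean
istd = 1.0 / feats.std(axis=0)   # stands in for the loaded istd
normed = apply_cmvn_sketch(feats, mean, istd)
print(normed)  # approximately [[-1, -1], [1, 1]]
```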

paddlespeech.s2t.frontend.utility.load_dict(dict_path: Optional[str], maskctc=False) -> Optional[List[str]][source]

paddlespeech.s2t.frontend.utility.max_dbfs(sample_data: ndarray)[source]

Peak dBFS based on the maximum energy sample.

Args:

sample_data (np.ndarray): float array in [-1, 1].

Returns:

float: dBFS
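
One plausible reading of the peak measurement is sketched below. It assumes the peak amplitude is converted with the same dB - 3.0103 offset that rms_to_dbfs documents, which would explain why -3.0103 is the default target in normalize_audio; that link is an assumption, as are the helper name and the 1e-16 floor that guards against log10(0).

```python
import math

import numpy as np

DBFS_OFFSET = 3.0103  # offset documented for rms_to_dbfs


def peak_dbfs_sketch(sample_data: np.ndarray) -> float:
    """Peak level from the largest absolute sample (hypothetical helper)."""
    peak = float(np.max(np.abs(sample_data)))
    # Floor the peak so an all-zero signal does not hit log10(0).
    return 20.0 * math.log10(max(peak, 1e-16)) - DBFS_OFFSET


print(round(peak_dbfs_sketch(np.array([0.0, -1.0, 0.25])), 4))  # -3.0103
```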

paddlespeech.s2t.frontend.utility.mean_dbfs(sample_data)[source]

Mean dBFS based on the RMS energy.

Args:

sample_data (np.ndarray): float array in [-1, 1].

Returns:

float: dBFS
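
The RMS-based level can be sketched as follows, again assuming the dB - 3.0103 convention documented for rms_to_dbfs. The helper name and the 1e-16 floor are hypothetical choices for this illustration.

```python
import math

import numpy as np


def rms_dbfs_sketch(sample_data: np.ndarray) -> float:
    """Level of a signal from its root mean square (hypothetical helper)."""
    # Accumulate in float64 to limit rounding error on long signals.
    rms = math.sqrt(float(np.mean(np.square(sample_data, dtype=np.float64))))
    return 20.0 * math.log10(max(rms, 1e-16)) - 3.0103


print(round(rms_dbfs_sketch(np.array([0.5, -0.5, 0.5, -0.5])), 4))  # -9.0309
```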

paddlespeech.s2t.frontend.utility.normalize_audio(sample_data: ndarray, dbfs: float = -3.0103)[source]

Normalize audio to dBFS.

Args:

sample_data (np.ndarray): input wave samples, [-1, 1].

dbfs (float, optional): target dBFS. Defaults to -3.0103.

Returns:

np.ndarray: normalized wave
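
A minimal sketch of peak normalization under stated assumptions: the current level is taken from the largest absolute sample using the dB - 3.0103 convention, the gain is 10^((target - current) / 20), and the result is clipped to [-1, 1]. The helper name is hypothetical and the library's actual procedure may differ.

```python
import math

import numpy as np


def normalize_peak_sketch(sample_data: np.ndarray, dbfs: float = -3.0103) -> np.ndarray:
    """Scale a waveform so its peak level reaches the target (hypothetical helper)."""
    peak = float(np.max(np.abs(sample_data)))
    current = 20.0 * math.log10(max(peak, 1e-16)) - 3.0103
    # Gain in dB is the distance from the current level to the target.
    gain = math.pow(10.0, (dbfs - current) / 20.0)
    return np.clip(sample_data * gain, -1.0, 1.0)


wave = np.array([0.1, -0.5, 0.25])
out = normalize_peak_sketch(wave)
print(np.round(out, 4).tolist())  # [0.2, -1.0, 0.5]
```

With the default target, a signal whose peak is 0.5 is scaled by about 2, so its peak lands at full scale.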

paddlespeech.s2t.frontend.utility.read_manifest(manifest_path, max_input_len=inf, min_input_len=0.0, max_output_len=inf, min_output_len=0.0, max_output_input_ratio=inf, min_output_input_ratio=0.0)[source]

Load and parse manifest file.

Args:

manifest_path (str): Manifest file to load and parse.

max_input_len (float, optional): maximum input seq length, in seconds for raw wav, in frame numbers for feature data. Defaults to float('inf').

min_input_len (float, optional): minimum input seq length, in seconds for raw wav, in frame numbers for feature data. Defaults to 0.0.

max_output_len (float, optional): maximum output seq length, in modeling units. Defaults to float('inf').

min_output_len (float, optional): minimum output seq length, in modeling units. Defaults to 0.0.

max_output_input_ratio (float, optional): maximum output seq length/input seq length ratio. Defaults to float('inf').

min_output_input_ratio (float, optional): minimum output seq length/input seq length ratio. Defaults to 0.0.

Raises:

IOError: If the manifest fails to parse.

Returns:

List[dict]: Manifest parsing results.
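
The length and ratio filters can be pictured with the sketch below. It is not the library's parser: the JSON keys `utt`, `input_len`, and `output_len`, the helper `keep_utterance_sketch`, and the inclusive bounds are all assumptions chosen for illustration.

```python
import json
from math import inf


def keep_utterance_sketch(input_len, output_len,
                          max_input_len=inf, min_input_len=0.0,
                          max_output_len=inf, min_output_len=0.0,
                          max_output_input_ratio=inf,
                          min_output_input_ratio=0.0):
    """Return True if an utterance passes all length and ratio bounds."""
    ratio = output_len / input_len if input_len > 0 else inf
    return (min_input_len <= input_len <= max_input_len
            and min_output_len <= output_len <= max_output_len
            and min_output_input_ratio <= ratio <= max_output_input_ratio)


# Hypothetical manifest lines, one JSON object per line.
lines = [
    '{"utt": "a", "input_len": 5.0, "output_len": 10}',
    '{"utt": "b", "input_len": 40.0, "output_len": 12}',
    '{"utt": "c", "input_len": 2.0, "output_len": 60}',
]
kept = [rec["utt"] for rec in map(json.loads, lines)
        if keep_utterance_sketch(rec["input_len"], rec["output_len"],
                                 max_input_len=30.0,
                                 max_output_input_ratio=10.0)]
print(kept)  # ['a']
```

Here "b" is dropped for exceeding the input length bound and "c" for exceeding the output/input ratio bound.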

paddlespeech.s2t.frontend.utility.rms_to_db(rms: float)[source]

Root Mean Square to dB.

Args:

rms (float): root mean square

Returns:

float: dB
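
The usual amplitude-to-decibel conversion is dB = 20 * log10(rms). The sketch below applies it; the `floor` parameter is a hypothetical guard against log10(0), not a documented argument of the library function.

```python
import math


def rms_to_db_sketch(rms: float, floor: float = 1e-16) -> float:
    """Convert an RMS amplitude to dB: 20 * log10(rms)."""
    return 20.0 * math.log10(max(floor, rms))


print(rms_to_db_sketch(1.0))               # 0.0
print(round(rms_to_db_sketch(0.7071), 2))  # -3.01
```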

paddlespeech.s2t.frontend.utility.rms_to_dbfs(rms: float)[source]

Root Mean Square to dBFS.

Reference: https://fireattack.wordpress.com/2017/02/06/replaygain-loudness-normalization-and-applications/

Audio is a mix of sine waves, and a sine wave of amplitude 1 has a full-scale RMS of 0.7071, equal to -3.0103 dB.

dB = dBFS + 3.0103

dBFS = dB - 3.0103

e.g. 0 dB = -3.0103 dBFS

Args:

rms (float): root mean square

Returns:

float: dBFS
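
The documented relation dBFS = dB - 3.0103 can be written out as a short sketch. The helper name and the 1e-16 floor are assumptions; only the offset comes from the description above.

```python
import math


def rms_to_dbfs_sketch(rms: float) -> float:
    """Convert an RMS amplitude to dBFS using dBFS = dB - 3.0103."""
    db = 20.0 * math.log10(max(1e-16, rms))
    return db - 3.0103


print(rms_to_dbfs_sketch(1.0))  # -3.0103
```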