paddlespeech.s2t.frontend.utility module
Contains data helper functions.
- paddlespeech.s2t.frontend.utility.convert_samples_from_float32(samples, dtype)[source]
Convert sample type from float32 to dtype.
Audio samples are usually stored as integer or floating-point values. For an integer target type, float32 samples are rescaled from [-1, 1] to the full range supported by that integer type.
PCM32 -> PCM16
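The rescaling described above can be sketched with NumPy. This is a minimal illustration, not the library's implementation: the helper name `float32_to_int` is hypothetical, and the clipping step is an assumption added so that +1.0 does not overflow the integer range.

```python
import numpy as np


def float32_to_int(samples: np.ndarray, dtype: str = "int16") -> np.ndarray:
    """Hypothetical helper: rescale float samples in [-1, 1] to an integer type."""
    info = np.iinfo(dtype)
    # Scale by 2**(bits - 1), e.g. 32768 for int16.
    scaled = samples.astype(np.float64) * (2.0 ** (info.bits - 1))
    # Assumption: clip so that +1.0 maps to the largest representable value.
    clipped = np.clip(scaled, info.min, info.max)
    return clipped.astype(dtype)


pcm16 = float32_to_int(np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32))
print(pcm16)  # [-32768      0  16384  32767]
```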
- paddlespeech.s2t.frontend.utility.convert_samples_to_float32(samples)[source]
Convert sample type to float32.
Audio samples are usually stored as integer or floating-point values. Integer samples are scaled to [-1, 1] in float32.
PCM16 -> PCM32
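As a rough sketch of the integer-to-float direction, the scale factor can be derived from the bit width of the input type. The helper name `int_to_float32` is hypothetical, and the error handling is an assumption rather than the documented behavior.

```python
import numpy as np


def int_to_float32(samples: np.ndarray) -> np.ndarray:
    """Hypothetical helper: scale integer PCM samples to float32 in [-1, 1]."""
    if not np.issubdtype(samples.dtype, np.integer):
        raise TypeError(f"Expected an integer array, got {samples.dtype}")
    bits = np.iinfo(samples.dtype).bits
    # Divide by 2**(bits - 1), e.g. 32768 for int16.
    return samples.astype(np.float32) / np.float32(2.0 ** (bits - 1))


wave = int_to_float32(np.array([-32768, 0, 16384], dtype=np.int16))
print(wave)  # [-1.   0.   0.5]
```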
- paddlespeech.s2t.frontend.utility.gain_db_to_ratio(gain_db: float)[source]
Convert a gain in dB to an amplitude ratio.
- Args:
gain_db (float): gain in dB
- Returns:
float: amplitude scale factor
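For amplitude quantities, the standard conversion is ratio = 10^(dB / 20). The sketch below applies that formula; the function name `db_to_amplitude_ratio` is illustrative and is not part of the module.

```python
import math


def db_to_amplitude_ratio(gain_db: float) -> float:
    """Illustrative helper: standard dB-to-amplitude conversion."""
    return math.pow(10.0, gain_db / 20.0)


print(db_to_amplitude_ratio(0.0))   # 1.0
print(db_to_amplitude_ratio(20.0))  # 10.0
# A gain of about -6.02 dB roughly halves the amplitude.
half = db_to_amplitude_ratio(-6.0206)
```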
- paddlespeech.s2t.frontend.utility.load_cmvn(cmvn_file: str, filetype: str)[source]
Load CMVN (cepstral mean and variance normalization) statistics from a file.
- Args:
cmvn_file (str): Path to the CMVN file.
filetype (str): File type; one of npz, json, kaldi.
- Raises:
ValueError: If the file type is not supported.
- Returns:
Tuple[np.ndarray, np.ndarray]: mean, istd
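A returned (mean, istd) pair would typically be applied to a feature matrix as shown below. This sketch assumes that `istd` denotes the inverse standard deviation, which the name suggests but the entry does not state. The helper `apply_cmvn` and the toy statistics are hypothetical, and nothing is loaded from a file.

```python
import numpy as np


def apply_cmvn(features: np.ndarray, mean: np.ndarray, istd: np.ndarray) -> np.ndarray:
    """Hypothetical helper: normalize each feature dimension with mean and inverse std."""
    return (features - mean) * istd


# Toy statistics computed from the data itself, standing in for load_cmvn output.
feats = np.array([[1.0, 10.0], [3.0, 30.0]])
mean = feats.mean(axis=0)
istd = 1.0 / feats.std(axis=0)  # assumption: istd = 1 / standard deviation
normalized = apply_cmvn(feats, mean, istd)
```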
- paddlespeech.s2t.frontend.utility.load_dict(dict_path: Optional[str], maskctc=False) -> Optional[List[str]] [source]
- paddlespeech.s2t.frontend.utility.max_dbfs(sample_data: ndarray)[source]
Peak dBFS based on the maximum energy sample.
- Args:
sample_data ([np.ndarray]): float array, [-1, 1].
- Returns:
float: dBFS
- paddlespeech.s2t.frontend.utility.mean_dbfs(sample_data)[source]
Mean dBFS based on the RMS energy.
- Args:
sample_data ([np.ndarray]): float array, [-1, 1].
- Returns:
float: dBFS
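The difference between the peak and RMS measurements can be sketched as follows. This is an illustration under stated assumptions: both levels are converted with 20·log10 and shifted by the 3.0103 dB offset documented for `rms_to_dbfs`, and the `1e-16` floor that guards against log(0) is an assumption. The names `peak_dbfs`, `rms_dbfs`, and `level_to_dbfs` are hypothetical.

```python
import math

import numpy as np

SINE_OFFSET_DB = 3.0103  # offset quoted in the rms_to_dbfs entry


def level_to_dbfs(level: float) -> float:
    # Assumption: floor the level to avoid log10(0) on silent input.
    return 20.0 * math.log10(max(level, 1e-16)) - SINE_OFFSET_DB


def peak_dbfs(sample_data: np.ndarray) -> float:
    """Level of the largest-magnitude sample."""
    return level_to_dbfs(float(np.max(np.abs(sample_data))))


def rms_dbfs(sample_data: np.ndarray) -> float:
    """Level of the root-mean-square over all samples."""
    x = sample_data.astype(np.float64)
    return level_to_dbfs(math.sqrt(float(np.mean(np.square(x)))))


signal = np.array([0.5, -0.1, 0.1, -0.1], dtype=np.float32)
peak, mean_level = peak_dbfs(signal), rms_dbfs(signal)
```

For any signal that is not constant in magnitude, the RMS-based value is lower than the peak-based one.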
- paddlespeech.s2t.frontend.utility.normalize_audio(sample_data: ndarray, dbfs: float = -3.0103)[source]
Normalize audio to a target dBFS.
- Args:
sample_data (np.ndarray): input wave samples, [-1, 1].
dbfs (float, optional): target dBFS. Defaults to -3.0103.
- Returns:
np.ndarray: normalized wave
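One way such a normalization could work is to measure the current level, compute the gain needed to reach the target, and clip the result back into [-1, 1]. The sketch below assumes peak-based measurement and the 3.0103 dB offset documented for `rms_to_dbfs`; the entry above does not say which measurement the library uses. The name `normalize_peak` is hypothetical.

```python
import math

import numpy as np


def normalize_peak(sample_data: np.ndarray, target_dbfs: float = -3.0103) -> np.ndarray:
    """Hypothetical helper: scale samples so the peak level reaches target_dbfs."""
    peak = float(np.max(np.abs(sample_data)))
    # Assumption: peak level in dBFS uses the same 3.0103 dB offset as rms_to_dbfs.
    current_dbfs = 20.0 * math.log10(max(peak, 1e-16)) - 3.0103
    gain = math.pow(10.0, (target_dbfs - current_dbfs) / 20.0)
    # Clip to keep the output inside the valid sample range.
    return np.clip(sample_data * gain, -1.0, 1.0)


normalized_wave = normalize_peak(np.array([0.25, -0.5]))
```

Under these assumptions the default target of -3.0103 dBFS corresponds to a peak amplitude of 1.0, so the example input is scaled by roughly a factor of two.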
- paddlespeech.s2t.frontend.utility.read_manifest(manifest_path, max_input_len=inf, min_input_len=0.0, max_output_len=inf, min_output_len=0.0, max_output_input_ratio=inf, min_output_input_ratio=0.0)[source]
Load and parse manifest file.
- Args:
manifest_path (str): Manifest file to load and parse.
max_input_len (float, optional): maximum input seq length, in seconds for raw wav, in frame numbers for feature data. Defaults to float('inf').
min_input_len (float, optional): minimum input seq length, in seconds for raw wav, in frame numbers for feature data. Defaults to 0.0.
max_output_len (float, optional): maximum output seq length, in modeling units. Defaults to float('inf').
min_output_len (float, optional): minimum output seq length, in modeling units. Defaults to 0.0.
max_output_input_ratio (float, optional): maximum output seq length/input seq length ratio. Defaults to float('inf').
min_output_input_ratio (float, optional): minimum output seq length/input seq length ratio. Defaults to 0.0.
- Raises:
IOError: If the manifest cannot be parsed.
- Returns:
List[dict]: Manifest parsing results.
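The length and ratio filters can be illustrated with a small sketch. It assumes a manifest with one JSON object per line and uses the hypothetical keys `input_len` and `output_len`; the actual manifest schema is not described here, so both the keys and the helper `filter_manifest_lines` are illustrative only.

```python
import json
from typing import Iterable, List

INF = float("inf")


def filter_manifest_lines(lines: Iterable[str],
                          max_input_len: float = INF,
                          min_input_len: float = 0.0,
                          max_output_len: float = INF,
                          min_output_len: float = 0.0,
                          max_output_input_ratio: float = INF,
                          min_output_input_ratio: float = 0.0) -> List[dict]:
    """Hypothetical helper: keep entries whose lengths and ratio fall in range."""
    kept = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError as err:
            raise IOError(f"Error parsing manifest: {err}")
        in_len = float(entry["input_len"])    # hypothetical key
        out_len = float(entry["output_len"])  # hypothetical key
        ratio = out_len / in_len if in_len > 0 else INF
        if (min_input_len <= in_len <= max_input_len
                and min_output_len <= out_len <= max_output_len
                and min_output_input_ratio <= ratio <= max_output_input_ratio):
            kept.append(entry)
    return kept


lines = [
    '{"utt": "a", "input_len": 2.0, "output_len": 10}',
    '{"utt": "b", "input_len": 30.0, "output_len": 5}',
]
short_only = filter_manifest_lines(lines, max_input_len=20.0)
```

With the defaults, every bound is either 0.0 or infinity, so no entry is filtered out.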
- paddlespeech.s2t.frontend.utility.rms_to_db(rms: float)[source]
Root Mean Square to dB.
- Args:
rms ([float]): root mean square
- Returns:
float: dB
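For an amplitude quantity such as RMS, the usual conversion is dB = 20·log10(rms). The sketch below applies it with a small floor to avoid taking the logarithm of zero; the floor value and the name `rms_to_decibels` are assumptions, not taken from the module.

```python
import math


def rms_to_decibels(rms: float, floor: float = 1e-16) -> float:
    """Illustrative helper: amplitude-style dB conversion with a log(0) guard."""
    return 20.0 * math.log10(max(rms, floor))


unity_db = rms_to_decibels(1.0)   # an RMS of 1.0 is 0 dB
tenth_db = rms_to_decibels(0.1)   # about -20 dB
silent_db = rms_to_decibels(0.0)  # limited by the floor instead of failing
```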
- paddlespeech.s2t.frontend.utility.rms_to_dbfs(rms: float)[source]
Root Mean Square to dBFS. See https://fireattack.wordpress.com/2017/02/06/replaygain-loudness-normalization-and-applications/
Audio is a mix of sine waves, and a full-scale sine wave (amplitude 1) has an RMS of 0.7071, which equals -3.0103 dB.
dB = dBFS + 3.0103
dBFS = dB - 3.0103
e.g. 0 dB = -3.0103 dBFS
- Args:
rms ([float]): root mean square
- Returns:
float: dBFS
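The constant 3.0103 follows from the RMS of a unit-amplitude sine wave, 1/√2 ≈ 0.7071, and the arithmetic can be checked directly. The function `db_to_dbfs` below is a hypothetical name that only restates the relation dBFS = dB - 3.0103 given above.

```python
import math

# RMS of a sine wave with amplitude 1.
sine_rms = 1.0 / math.sqrt(2.0)
# Its level in dB, which is where the 3.0103 constant comes from.
sine_rms_db = 20.0 * math.log10(sine_rms)


def db_to_dbfs(db: float) -> float:
    """Hypothetical helper restating dBFS = dB - 3.0103."""
    return db - 3.0103


print(round(sine_rms, 4))     # 0.7071
print(round(sine_rms_db, 4))  # -3.0103
print(db_to_dbfs(0.0))        # -3.0103
```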