paddlespeech.t2s.datasets.preprocess_utils module
- paddlespeech.t2s.datasets.preprocess_utils.compare_duration_and_mel_length(sentences, utt, mel)[source]
Check for duration errors: correct sentences[utt] if possible, otherwise pop sentences[utt].
Args:
sentences (Dict): sentences[utt] = [phones_list, durations_list]
utt (str): utt_id
mel (np.ndarray): features (num_frames, n_mels)
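A minimal sketch of what such a check can look like. The tolerance value and the rule of absorbing small mismatches into the last phoneme are assumptions for illustration, not the library's exact logic:

```python
import numpy as np

def compare_duration_and_mel_length(sentences, utt, mel, tol=3):
    """Reconcile summed phoneme durations with the mel frame count.

    tol is an assumed tolerance (in frames): small mismatches are
    absorbed into the last phoneme's duration, large ones drop the utt.
    """
    phones, durations = sentences[utt]
    num_frames = mel.shape[0]
    diff = num_frames - sum(durations)
    if abs(diff) <= tol:
        # absorb a small mismatch into the last phoneme's duration
        durations[-1] += diff
        sentences[utt] = [phones, durations]
    else:
        # mismatch too large to repair: discard the utterance
        sentences.pop(utt)
```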
- paddlespeech.t2s.datasets.preprocess_utils.get_input_token(sentence, output_path, dataset='baker')[source]
Get the phone set from the training data and save it.
Args:
sentence (Dict): sentence: {'utt': ([char], [int])}
output_path (str or path): path to save phone_id_map
dataset (str): dataset name. Defaults to 'baker'.
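A hedged sketch of building such a phone-id map: collect the unique phones and write one "phone id" pair per line. The special tokens ("&lt;pad&gt;", "&lt;unk&gt;") and the exact file layout are assumptions, not a confirmed description of the saved format:

```python
def get_input_token(sentence, output_path):
    """Collect the phone set from sentence and save a phone_id_map file.

    Assumed format: one 'phone id' pair per line, with <pad> and <unk>
    reserved at ids 0 and 1.
    """
    phones = set()
    for phns, _durs in sentence.values():
        phones.update(phns)
    phone_ids = ["<pad>", "<unk>"] + sorted(phones)
    with open(output_path, "w", encoding="utf-8") as f:
        for idx, phn in enumerate(phone_ids):
            f.write(f"{phn} {idx}\n")
```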
- paddlespeech.t2s.datasets.preprocess_utils.get_phn_dur(file_name)[source]
Read the MFA duration.txt.
Args:
file_name (str or Path): path of gen_duration_from_textgrid.py's result
- Returns:
Dict: sentence: {'utt': ([char], [int])}
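A sketch of one plausible parser for such a duration file. The line layout assumed here, `utt phn1 dur1 phn2 dur2 …` with alternating phones and frame counts, is an assumption about gen_duration_from_textgrid.py's output, not a confirmed format:

```python
def get_phn_dur(file_name):
    """Parse a duration file into {'utt': ([phone], [frames])}.

    Assumes each line is: utt_id phn1 dur1 phn2 dur2 ...
    """
    sentence = {}
    with open(file_name, encoding="utf-8") as f:
        for line in f:
            fields = line.strip().split()
            utt, phn_dur = fields[0], fields[1:]
            # even positions are phones, odd positions their frame counts
            phones = phn_dur[::2]
            durations = [int(d) for d in phn_dur[1::2]]
            sentence[utt] = (phones, durations)
    return sentence
```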
- paddlespeech.t2s.datasets.preprocess_utils.get_phones_tones(sentence, phones_output_path, tones_output_path, dataset='baker')[source]
Get the phone set and tone set from the training data and save them.
Args:
sentence (Dict): sentence: {'utt': ([char], [int])}
phones_output_path (str or path): path to save phone_id_map
tones_output_path (str or path): path to save tone_id_map
dataset (str): dataset name. Defaults to 'baker'.
- paddlespeech.t2s.datasets.preprocess_utils.get_sentences_svs(file_name, dataset: str = 'opencpop', sample_rate: int = 24000, n_shift: int = 128)[source]
Read the label file.
Args:
file_name (str or Path): path of gen_duration_from_textgrid.py's result
dataset (str): dataset name. Defaults to 'opencpop'.
sample_rate (int, optional): sample rate. Defaults to 24000.
n_shift (int, optional): frame shift. Defaults to 128.
- Returns:
Dict: the sentence information, including [phone id (int)], [the frames of each phone (int)], [note id (int)], [note duration (float)], [is slur (int)], text (str), speaker name (str)
tuple: speaker name
- paddlespeech.t2s.datasets.preprocess_utils.merge_silence(sentence)[source]
Merge silences.
Args:
sentence (Dict): sentence: {'utt': (([char], [int]), str)}
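A sketch of the merging step: collapse runs of consecutive silence phones into one, summing their durations. The set of silence labels and the rule of keeping the first label of a run are assumptions for illustration:

```python
def merge_silence(sentence, silences=("sil", "sp", "spn")):
    """Collapse consecutive silence phones in-place.

    sentence maps utt -> ((phones, durations), speaker); the silence
    label set is an assumed default.
    """
    for utt, ((phones, durations), speaker) in sentence.items():
        new_phones, new_durations = [], []
        for phn, dur in zip(phones, durations):
            if new_phones and phn in silences and new_phones[-1] in silences:
                # extend the previous silence instead of adding a new one
                new_durations[-1] += dur
            else:
                new_phones.append(phn)
                new_durations.append(dur)
        sentence[utt] = ((new_phones, new_durations), speaker)
```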
- paddlespeech.t2s.datasets.preprocess_utils.note2midi(notes: List[str]) List[str] [source]
Convert note strings to note ids, for example: ["C1"] -> [24]
- Args:
notes (List[str]): the list of note strings
- Returns:
List[str]: the list of note ids
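A hedged sketch of the conversion, using the standard MIDI convention (C4 = 60), which matches the example ["C1"] -> [24]. The rest-marker handling and accidental spelling ("#"/"b") are assumptions about the label format:

```python
# semitone offset of each natural note name within an octave
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def note2midi(notes):
    """Convert note names like 'C1' or 'A#4' to MIDI numbers (C4 = 60)."""
    midis = []
    for note in notes:
        if note in ("rest", "0"):  # assumed rest marker
            midis.append(0)
            continue
        name, tail = note[0], note[1:]
        offset = NOTE_OFFSETS[name]
        if tail.startswith("#"):   # sharp raises by a semitone
            offset += 1
            tail = tail[1:]
        elif tail.startswith("b"):  # flat lowers by a semitone
            offset -= 1
            tail = tail[1:]
        octave = int(tail)
        midis.append(12 * (octave + 1) + offset)
    return midis
```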
- paddlespeech.t2s.datasets.preprocess_utils.time2frame(times: List[float], sample_rate: int = 24000, n_shift: int = 128) List[int] [source]
Convert phoneme durations in seconds into frame counts.
- Args:
times (List[float]): phoneme durations in seconds
sample_rate (int, optional): sample rate. Defaults to 24000.
n_shift (int, optional): frame shift. Defaults to 128.
- Returns:
List[int]: phoneme durations in frames
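A sketch of the conversion. Rounding cumulative boundaries rather than each duration independently, so rounding error does not accumulate across phonemes, is an assumed strategy, not a confirmed detail of the library:

```python
def time2frame(times, sample_rate=24000, n_shift=128):
    """Convert per-phoneme durations in seconds to frame counts.

    Frames are derived by rounding cumulative end-times to frame
    indices and differencing, keeping the total length consistent.
    """
    ends, total = [], 0.0
    for t in times:
        total += t
        ends.append(int(round(total * sample_rate / n_shift)))
    starts = [0] + ends[:-1]
    return [e - s for e, s in zip(ends, starts)]
```

For example, with the defaults one phoneme of 0.512 s spans 0.512 * 24000 / 128 = 96 frames.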