paddlespeech.t2s.datasets.preprocess_utils module

paddlespeech.t2s.datasets.preprocess_utils.compare_duration_and_mel_length(sentences, utt, mel)[source]

Check for duration error; correct sentences[utt] if possible, else pop sentences[utt].

Args:

sentences (Dict): sentences[utt] = [phones_list, durations_list]
utt (str): utt_id
mel (np.ndarray): features (num_frames, n_mels)
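The correction logic can be sketched as follows. This is an illustrative re-implementation, not the library code; in particular, the fallback order (absorb the mismatch into the last duration, then the first, else drop the utterance) is an assumption:

```python
import numpy as np

def compare_duration_and_mel_length(sentences, utt, mel):
    # Sketch only: reconcile the summed phone durations with the
    # actual number of mel frames (fallback order is an assumption).
    phones, durations = sentences[utt]
    len_diff = mel.shape[0] - sum(durations)
    if len_diff == 0:
        return
    if durations[-1] + len_diff > 0:
        # absorb a small mismatch into the last phone's duration
        durations[-1] += len_diff
    elif durations[0] + len_diff > 0:
        # otherwise into the first phone's duration
        durations[0] += len_diff
    else:
        # mismatch too large to correct: drop the utterance
        sentences.pop(utt)
```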

paddlespeech.t2s.datasets.preprocess_utils.get_input_token(sentence, output_path, dataset='baker')[source]

Get the phone set from training data and save it.

Args:

sentence (Dict): sentence: {'utt': ([char], [int])}
output_path (str or Path): path to save phone_id_map
dataset (str): dataset name
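A minimal sketch of building such a phone_id_map: collect the unique phones, sort them, and write one `phone id` pair per line. The special tokens (`<pad>`, `<unk>`, `<eos>`) and the exact file layout are assumptions, not necessarily what the library writes:

```python
def get_input_token(sentence, output_path):
    # collect the unique phones across all utterances and sort them
    phones = sorted({phn for utt in sentence for phn in sentence[utt][0]})
    # special tokens are an assumption; the real map may differ
    tokens = ["<pad>", "<unk>"] + phones + ["<eos>"]
    with open(output_path, "w") as f:
        for idx, phn in enumerate(tokens):
            f.write(f"{phn} {idx}\n")
```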

paddlespeech.t2s.datasets.preprocess_utils.get_phn_dur(file_name)[source]

Read the MFA duration.txt.

Args:

file_name (str or Path): path of gen_duration_from_textgrid.py's result

Returns:

Dict: sentence: {'utt': ([char], [int])}
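A parsing sketch, assuming each line of the duration file has the form `utt_id|phn1 phn2 ...|dur1 dur2 ...` (the `|`-separated layout is an assumption about gen_duration_from_textgrid.py's output; verify against an actual file):

```python
def get_phn_dur(file_name):
    # assumed line format: "utt_id|phn1 phn2 ...|dur1 dur2 ..."
    sentence = {}
    with open(file_name) as f:
        for line in f:
            utt, phn_str, dur_str = line.strip().split("|")[:3]
            sentence[utt] = (phn_str.split(),
                             [int(d) for d in dur_str.split()])
    return sentence
```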

paddlespeech.t2s.datasets.preprocess_utils.get_phones_tones(sentence, phones_output_path, tones_output_path, dataset='baker')[source]

Get the phone set and tone set from training data and save them.

Args:

sentence (Dict): sentence: {'utt': ([char], [int])}
phones_output_path (str or Path): path to save phone_id_map
tones_output_path (str or Path): path to save tone_id_map
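For tonal datasets such as baker, each pinyin phone typically carries a trailing tone digit (e.g. `i3` is phone `i` with tone `3`). A helper like the following can split them; mapping tone-less phones (e.g. `sil`) to tone `"0"` is an assumption:

```python
import re

def split_phone_tone(phn):
    # baker-style pinyin phones end in a tone digit, e.g. "i3" -> ("i", "3");
    # phones without a digit (e.g. "sil") get tone "0" (an assumption)
    m = re.match(r"^(.*?)(\d)$", phn)
    return (m.group(1), m.group(2)) if m else (phn, "0")
```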

paddlespeech.t2s.datasets.preprocess_utils.get_sentences_svs(file_name, dataset: str = 'opencpop', sample_rate: int = 24000, n_shift: int = 128)[source]

Read the label file.

Args:

file_name (str or Path): path of the label file
dataset (str): dataset name
sample_rate (int, optional): sample rate. Defaults to 24000.
n_shift (int, optional): frame shift. Defaults to 128.

Returns:

Dict: the information of each sentence, including [phone id (int)], [frame count of each phone (int)], [note id (int)], [note duration (float)], [is slur (int)], text (str), speaker name (str)
tuple: speaker name
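A sketch of parsing one such label line. The field layout `utt|text|phones|notes|note_durations|phone_durations|is_slur` is an assumption about the opencpop transcription format, and the helper name is hypothetical; verify both against the actual dataset:

```python
def parse_opencpop_line(line, sample_rate=24000, n_shift=128):
    # assumed field layout (verify against the dataset's label file):
    # utt|text|phones|notes|note_durations|phone_durations|is_slur
    utt, text, phones, notes, note_durs, phn_durs, slurs = \
        line.strip().split("|")
    # convert per-phone durations in seconds to frame counts
    frames = [int(round(float(d) * sample_rate / n_shift))
              for d in phn_durs.split()]
    return {
        "utt_id": utt,
        "text": text,
        "phones": phones.split(),
        "notes": notes.split(),
        "note_durs": [float(d) for d in note_durs.split()],
        "frames": frames,
        "is_slur": [int(s) for s in slurs.split()],
    }
```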

paddlespeech.t2s.datasets.preprocess_utils.get_spk_id_map(speaker_set, output_path)[source]
paddlespeech.t2s.datasets.preprocess_utils.merge_silence(sentence)[source]

Merge adjacent silences.

Args:

sentence (Dict): sentence: {'utt': (([char], [int]), str)}
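The merging idea can be sketched on a single (phones, durations) pair: collapse each run of adjacent silence phones into one, summing their durations. The set of silence labels (`sil`, `sp`, `spn`) is an assumption:

```python
def merge_silence(phones, durations, silences=("sil", "sp", "spn")):
    # collapse runs of adjacent silence phones into one, summing
    # durations (the silence label set is an assumption)
    out_phn, out_dur = [], []
    for phn, dur in zip(phones, durations):
        if out_phn and phn in silences and out_phn[-1] in silences:
            out_dur[-1] += dur
        else:
            out_phn.append(phn)
            out_dur.append(dur)
    return out_phn, out_dur
```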

paddlespeech.t2s.datasets.preprocess_utils.note2midi(notes: List[str]) -> List[str][source]

Convert a note string to a note id, for example: ["C1"] -> [24]

Args:

notes (List[str]): the list of note strings

Returns:

List[str]: the list of note ids
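The note-name to MIDI-number mapping follows the standard convention `midi = 12 * (octave + 1) + semitone`, which gives `C1 -> 24` as in the example above. A self-contained sketch (mapping rests to 0 is an assumption; the library itself may rely on `librosa.note_to_midi`):

```python
# semitone offset of each natural note name within an octave
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def note_to_midi(note):
    # standard convention: midi = 12 * (octave + 1) + semitone,
    # so "C1" -> 24; mapping rests to 0 is an assumption
    if note in ("rest", "sil"):
        return 0
    name, rest = note[0], note[1:]
    semitone = NOTE_OFFSETS[name]
    # apply accidentals: '#' raises, 'b' lowers by one semitone
    while rest and rest[0] in "#b":
        semitone += 1 if rest[0] == "#" else -1
        rest = rest[1:]
    octave = int(rest)
    return 12 * (octave + 1) + semitone
```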

paddlespeech.t2s.datasets.preprocess_utils.time2frame(times: List[float], sample_rate: int = 24000, n_shift: int = 128) -> List[int][source]

Convert phoneme durations in seconds into frame counts

Args:

times (List[float]): phoneme durations in seconds
sample_rate (int, optional): sample rate. Defaults to 24000.
n_shift (int, optional): frame shift. Defaults to 128.

Returns:

List[int]: phoneme durations in frames
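The conversion is `frames = seconds * sample_rate / n_shift`. A sketch that rounds the cumulative phone boundaries rather than each duration independently, so per-phone rounding errors do not accumulate (the cumulative scheme is an assumption about the implementation):

```python
def time2frame(times, sample_rate=24000, n_shift=128):
    # round cumulative boundaries, then take differences, so the
    # total frame count matches the rounded total duration
    ends, total = [], 0.0
    for t in times:
        total += t
        ends.append(int(round(total * sample_rate / n_shift)))
    frames, prev = [], 0
    for e in ends:
        frames.append(e - prev)
        prev = e
    return frames
```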