paddlespeech.t2s.datasets.am_batch_fn module
- class paddlespeech.t2s.datasets.am_batch_fn.ErnieSATCollateFn(mlm_prob: float = 0.8, mean_phn_span: int = 8, seg_emb: bool = False, text_masking: bool = False)[source]
Bases:
object
Functor class of common_collate_fn()
Methods
__call__
(exmaples)Call self as a function.
- class paddlespeech.t2s.datasets.am_batch_fn.StarGANv2VCCollateFn(latent_dim: int = 16, max_mel_length: int = 192)[source]
Bases:
object
Functor class of common_collate_fn()
Methods
__call__
(exmaples)Call self as a function.
random_clip
starganv2_vc_batch_fn
- paddlespeech.t2s.datasets.am_batch_fn.build_erniesat_collate_fn(mlm_prob: float = 0.8, mean_phn_span: int = 8, seg_emb: bool = False, text_masking: bool = False)[source]
- paddlespeech.t2s.datasets.am_batch_fn.build_starganv2_vc_collate_fn(latent_dim: int = 16, max_mel_length: int = 192)[source]
- paddlespeech.t2s.datasets.am_batch_fn.erniesat_batch_fn(examples, mlm_prob: float = 0.8, mean_phn_span: int = 8, seg_emb: bool = False, text_masking: bool = False)[source]
- paddlespeech.t2s.datasets.am_batch_fn.jets_multi_spk_batch_fn(examples)[source]
- Returns:
- Dict[str, Any]:
text (Tensor): Text index tensor (B, T_text).
text_lengths (Tensor): Text length tensor (B,).
feats (Tensor): Feature tensor (B, T_feats, aux_channels).
feats_lengths (Tensor): Feature length tensor (B,).
durations (Tensor): Feature tensor (B, T_text,).
durations_lengths (Tensor): Durations length tensor (B,).
pitch (Tensor): Feature tensor (B, pitch_length,).
energy (Tensor): Feature tensor (B, energy_length,).
speech (Tensor): Speech waveform tensor (B, T_wav).
spk_id (Optional[Tensor]): Speaker index tensor (B,) or (B, 1).
spk_emb (Optional[Tensor]): Speaker embedding tensor (B, spk_embed_dim).
- paddlespeech.t2s.datasets.am_batch_fn.jets_single_spk_batch_fn(examples)[source]
- Returns:
- Dict[str, Any]:
text (Tensor): Text index tensor (B, T_text).
text_lengths (Tensor): Text length tensor (B,).
feats (Tensor): Feature tensor (B, T_feats, aux_channels).
feats_lengths (Tensor): Feature length tensor (B,).
durations (Tensor): Feature tensor (B, T_text,).
durations_lengths (Tensor): Durations length tensor (B,).
pitch (Tensor): Feature tensor (B, pitch_length,).
energy (Tensor): Feature tensor (B, energy_length,).
speech (Tensor): Speech waveform tensor (B, T_wav).
- paddlespeech.t2s.datasets.am_batch_fn.vits_multi_spk_batch_fn(examples)[source]
- Returns:
- Dict[str, Any]:
text (Tensor): Text index tensor (B, T_text).
text_lengths (Tensor): Text length tensor (B,).
feats (Tensor): Feature tensor (B, T_feats, aux_channels).
feats_lengths (Tensor): Feature length tensor (B,).
speech (Tensor): Speech waveform tensor (B, T_wav).
spk_id (Optional[Tensor]): Speaker index tensor (B,) or (B, 1).
spk_emb (Optional[Tensor]): Speaker embedding tensor (B, spk_embed_dim).
- paddlespeech.t2s.datasets.am_batch_fn.vits_single_spk_batch_fn(examples)[source]
- Returns:
- Dict[str, Any]:
text (Tensor): Text index tensor (B, T_text).
text_lengths (Tensor): Text length tensor (B,).
feats (Tensor): Feature tensor (B, T_feats, aux_channels).
feats_lengths (Tensor): Feature length tensor (B,).
speech (Tensor): Speech waveform tensor (B, T_wav).