PaddleSpeech
What is PaddleSpeech?
PaddleSpeech is an open-source toolkit on the PaddlePaddle platform for two critical speech tasks: Speech-to-Text (Automatic Speech Recognition, ASR) and Text-to-Speech Synthesis (TTS). Its modules implement state-of-the-art and influential models.
What can PaddleSpeech do?
Speech-to-Text
PaddleSpeech ASR mainly consists of the following components:
Implementation of models and commonly used neural network layers.
Dataset abstraction and common data preprocessing pipelines.
Ready-to-run experiments.
PaddleSpeech ASR provides you with a complete ASR pipeline, including:
Data Preparation
Build vocabulary
Compute Cepstral mean and variance normalization (CMVN)
Feature extraction
linear
fbank (Kaldi features are also supported)
mfcc
Acoustic Models
DeepSpeech2 (Streaming and Non-Streaming)
Transformer (Streaming and Non-Streaming)
Conformer (Streaming and Non-Streaming)
Decoder
CTC greedy search (used in DeepSpeech2, Transformer and Conformer)
CTC beam search (used in DeepSpeech2, Transformer and Conformer)
attention decoding (used in Transformer and Conformer)
attention rescoring (used in Transformer and Conformer)
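To illustrate the simplest decoder in the list above, here is a minimal sketch of CTC greedy search in plain Python. It is not PaddleSpeech's implementation: the function name `ctc_greedy_decode`, the `blank_id` default of 0, and the toy scores are assumptions made for this example only. The idea is to pick the best token per frame, collapse consecutive repeats, and drop the blank symbol.

```python
from typing import List, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]], blank_id: int = 0
) -> List[int]:
    """Greedy CTC decoding over per-frame token scores.

    `frame_scores[t][k]` is the score of token k at frame t.
    `blank_id` is assumed to be 0 here; real models may use another index.
    """
    # Step 1: take the highest-scoring token at each frame.
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]

    # Step 2: collapse consecutive repeats, then remove blanks.
    decoded: List[int] = []
    previous = None
    for token in best_path:
        if token != previous and token != blank_id:
            decoded.append(token)
        previous = token
    return decoded


# Toy example with a 3-token vocabulary (0 is the blank).
# The per-frame argmax path is [1, 1, 0, 1, 2, 2, 0].
toy_scores = [
    [0.10, 0.80, 0.10],
    [0.20, 0.70, 0.10],
    [0.90, 0.05, 0.05],
    [0.10, 0.60, 0.30],
    [0.10, 0.20, 0.70],
    [0.20, 0.10, 0.70],
    [0.80, 0.10, 0.10],
]
print(ctc_greedy_decode(toy_scores))  # [1, 1, 2]
```

The blank between the two runs of token 1 is what lets the decoder emit the same token twice in a row. Beam search keeps several candidate paths per frame instead of only the best one.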
Speech-to-Text makes it simple to train an ASR model.
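As a rough illustration of the CMVN step in the data preparation stage, the sketch below computes per-dimension mean and standard deviation over a set of feature frames and uses them to normalize those frames. It is a sketch under stated assumptions, not PaddleSpeech's code: the helper names `compute_cmvn_stats` and `apply_cmvn`, the `eps` variance floor, and the toy features are hypothetical. Whether the statistics are pooled over one utterance or a whole training set is a pipeline choice not shown here.

```python
import math
from typing import List, Sequence, Tuple


def compute_cmvn_stats(
    features: Sequence[Sequence[float]], eps: float = 1e-20
) -> Tuple[List[float], List[float]]:
    """Return per-dimension (mean, std) over all frames.

    `features[t][d]` is dimension d of frame t. `eps` floors the variance
    so that constant dimensions do not cause a division by zero.
    """
    num_frames = len(features)
    dim = len(features[0])
    means = [sum(frame[d] for frame in features) / num_frames for d in range(dim)]
    stds = []
    for d in range(dim):
        variance = sum((frame[d] - means[d]) ** 2 for frame in features) / num_frames
        stds.append(math.sqrt(max(variance, eps)))
    return means, stds


def apply_cmvn(
    features: Sequence[Sequence[float]],
    means: Sequence[float],
    stds: Sequence[float],
) -> List[List[float]]:
    """Shift each dimension to zero mean and scale it to unit variance."""
    return [
        [(value - means[d]) / stds[d] for d, value in enumerate(frame)]
        for frame in features
    ]


# Toy features: 3 frames, 2 dimensions (the second dimension is constant).
toy_features = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
means, stds = compute_cmvn_stats(toy_features)
normalized = apply_cmvn(toy_features, means, stds)
```

After normalization, the first dimension has zero mean and unit variance across frames, and the constant second dimension maps to zero.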
Text-to-Speech
PaddleSpeech TTS mainly consists of the following components:
Implementation of models and commonly used neural network layers.
Dataset abstraction and common data preprocessing pipelines.
Ready-to-run experiments.
PaddleSpeech TTS provides you with a complete TTS pipeline, including:
Text FrontEnd
Rule-based Chinese frontend.
Acoustic Models
FastSpeech2
SpeedySpeech
TransformerTTS
Tacotron2
Vocoders
Multi-Band MelGAN
Parallel WaveGAN
WaveFlow
Voice Cloning
Transfer Learning from Speaker Verification to Multispeaker Text-to-Speech Synthesis
GE2E
Text-to-Speech helps you train TTS models with simple commands.