PaddleSpeech

What is PaddleSpeech?

PaddleSpeech is an open-source toolkit on the PaddlePaddle platform for two critical speech tasks: Speech-to-Text (Automatic Speech Recognition, ASR) and Text-to-Speech synthesis (TTS). Its modules implement state-of-the-art and influential models.

What can PaddleSpeech do?

Speech-to-Text

PaddleSpeech ASR mainly consists of the following components:

Implementation of models and commonly used neural network layers.
Dataset abstraction and common data preprocessing pipelines.
Ready-to-run experiments.

PaddleSpeech ASR provides you with a complete ASR pipeline, including:

Data Preparation
- Build vocabulary
- Compute Cepstral mean and variance normalization (CMVN)
- Feature extraction
  - linear
  - fbank (Kaldi features are also supported)
  - mfcc
Acoustic Models
- DeepSpeech2 (Streaming and Non-Streaming)
- Transformer (Streaming and Non-Streaming)
- Conformer (Streaming and Non-Streaming)
Decoder
- CTC greedy search (used in DeepSpeech2, Transformer and Conformer)
- CTC beam search (used in DeepSpeech2, Transformer and Conformer)
- attention decoding (used in Transformer and Conformer)
- attention rescoring (used in Transformer and Conformer)
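
To make the CMVN step listed under Data Preparation more concrete, here is a minimal sketch in plain Python. It is not PaddleSpeech's implementation: the function names `compute_cmvn_stats` and `apply_cmvn` are made up for this illustration, and it assumes global (dataset-level) statistics computed over features stored as nested lists.

```python
import math
from typing import List, Tuple

Frames = List[List[float]]  # one utterance: frames x feature dimensions


def compute_cmvn_stats(utterances: List[Frames]) -> Tuple[List[float], List[float]]:
    """Accumulate per-dimension mean and standard deviation over all frames."""
    dim = len(utterances[0][0])
    count = 0
    total = [0.0] * dim
    total_sq = [0.0] * dim
    for frames in utterances:
        for frame in frames:
            count += 1
            for d, value in enumerate(frame):
                total[d] += value
                total_sq[d] += value * value
    mean = [s / count for s in total]
    # Variance as E[x^2] - E[x]^2, floored so the division below is safe.
    std = [
        math.sqrt(max(sq / count - m * m, 1e-20))
        for sq, m in zip(total_sq, mean)
    ]
    return mean, std


def apply_cmvn(frames: Frames, mean: List[float], std: List[float]) -> Frames:
    """Shift and scale each feature dimension to zero mean and unit variance."""
    return [[(v - m) / s for v, m, s in zip(frame, mean, std)] for frame in frames]


# Illustrative data: two utterances with 2-dimensional features.
utterances = [[[1.0, 10.0], [3.0, 30.0]], [[5.0, 50.0]]]
mean, std = compute_cmvn_stats(utterances)
normalized = [apply_cmvn(frames, mean, std) for frames in utterances]
```

The statistics are computed once over the training set and then reused unchanged at training and inference time, so every utterance is normalized the same way.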

Speech-to-Text makes it simple to train an ASR model.
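
Of the decoders listed above, CTC greedy search is the simplest, and a short sketch may help show what it does: take the best-scoring token for each frame, merge consecutive repeats, then drop the blank token. The code below is an illustration in plain Python, not PaddleSpeech's decoder. The function name `ctc_greedy_decode`, the choice of index 0 for the blank token, and the toy vocabulary are all assumptions.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]], blank_id: int = 0
) -> List[int]:
    """Pick the best token per frame, merge repeats, then drop blanks."""
    decoded: List[int] = []
    previous: Optional[int] = None
    for scores in frame_scores:
        best = max(range(len(scores)), key=lambda i: scores[i])
        # Emit only when the token changes and is not the blank symbol.
        if best != previous and best != blank_id:
            decoded.append(best)
        previous = best
    return decoded


# Toy example: per-frame scores over the vocabulary ["<blank>", "a", "b"].
vocab = ["<blank>", "a", "b"]
scores = [
    [0.1, 0.8, 0.1],    # a
    [0.2, 0.7, 0.1],    # a again, merged with the previous frame
    [0.9, 0.05, 0.05],  # blank, separates the two "a" tokens
    [0.1, 0.8, 0.1],    # a, a new token because a blank came before it
    [0.1, 0.2, 0.7],    # b
]
token_ids = ctc_greedy_decode(scores)
print("".join(vocab[i] for i in token_ids))  # prints "aab"
```

CTC beam search follows the same merge-and-drop-blank rule but keeps several candidate prefixes per frame instead of only the single best token.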

Text-to-Speech

PaddleSpeech TTS mainly consists of the following components:

Implementation of models and commonly used neural network layers.
Dataset abstraction and common data preprocessing pipelines.
Ready-to-run experiments.

PaddleSpeech TTS provides you with a complete TTS pipeline, including:

Text FrontEnd
- Rule-based Chinese frontend
Acoustic Models
- FastSpeech2
- SpeedySpeech
- TransformerTTS
- Tacotron2
Vocoders
- Multi-band MelGAN
- Parallel WaveGAN
- WaveFlow
Voice Cloning
- Transfer Learning from Speaker Verification to Multispeaker Text-to-Speech Synthesis
- GE2E
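
The first three stages listed above usually run in sequence: the text frontend turns text into token ids, the acoustic model turns ids into mel-spectrogram frames, and the vocoder turns frames into waveform samples. The sketch below shows only that data flow. Every name in it (`synthesize`, the `toy_*` functions, the constants) is hypothetical, and the stand-in stages produce dummy values rather than real speech; none of this is a PaddleSpeech API.

```python
from typing import Callable, List

# Hypothetical stage signatures, used only to illustrate the data flow.
Frontend = Callable[[str], List[int]]                      # text -> token ids
AcousticModel = Callable[[List[int]], List[List[float]]]   # ids -> mel frames
Vocoder = Callable[[List[List[float]]], List[float]]       # mel frames -> samples

N_MELS = 3             # toy number of mel bins per frame
FRAMES_PER_TOKEN = 2   # toy fixed duration for every token
HOP_SIZE = 4           # toy number of waveform samples per frame


def toy_frontend(text: str) -> List[int]:
    """Stand-in frontend: one id per character."""
    return [ord(ch) for ch in text]


def toy_acoustic_model(ids: List[int]) -> List[List[float]]:
    """Stand-in acoustic model: a fixed number of constant frames per token."""
    return [[float(i)] * N_MELS for i in ids for _ in range(FRAMES_PER_TOKEN)]


def toy_vocoder(mel: List[List[float]]) -> List[float]:
    """Stand-in vocoder: HOP_SIZE identical samples per frame."""
    return [sum(frame) / len(frame) for frame in mel for _ in range(HOP_SIZE)]


def synthesize(
    text: str, frontend: Frontend, acoustic_model: AcousticModel, vocoder: Vocoder
) -> List[float]:
    """Chain the three stages: text -> ids -> mel frames -> waveform samples."""
    ids = frontend(text)
    mel = acoustic_model(ids)
    return vocoder(mel)


waveform = synthesize("hi", toy_frontend, toy_acoustic_model, toy_vocoder)
```

Because each stage only depends on the output type of the one before it, an acoustic model such as FastSpeech2 can in principle be paired with different vocoders, provided both agree on the mel-spectrogram configuration.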

Text-to-Speech helps you train TTS models with simple commands.