Released Models

!!! PaddlePaddle has supported 0-D tensors since version 2.5.0. Existing PaddleSpeech static models will not work with it; please re-export the static model.
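The following is a minimal sketch of how a script might decide whether a static model needs to be re-exported, based only on the 2.5.0 threshold stated above. The helper names `parse_version` and `needs_reexport` are hypothetical and not part of PaddleSpeech; the version string is assumed to be a dotted release string such as the one reported by `paddle.__version__`.

```python
def parse_version(version: str) -> tuple:
    """Convert a dotted version string such as '2.5.0' or '2.5.0rc1'
    into a 3-tuple of integers, ignoring any non-numeric suffix."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    # Pad short versions such as '2.5' so they compare as '2.5.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def needs_reexport(paddle_version: str) -> bool:
    """Return True when the installed PaddlePaddle version is 2.5.0 or later,
    i.e. when previously exported static models should be re-exported
    (hypothetical helper based on the note above)."""
    return parse_version(paddle_version) >= (2, 5, 0)


if __name__ == "__main__":
    # In a real environment the string would come from paddle.__version__.
    for v in ("2.4.2", "2.5.0", "2.6.1"):
        print(v, needs_reexport(v))
```

Non-release version strings (for example, development builds) are not handled by this sketch and would need separate treatment.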

Speech-to-Text Models

Speech Recognition Model

| Acoustic Model | Training Data | Token-based | Size | Descriptions | CER | WER | Hours of speech | Example Link | Inference Type | static_model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ds2 Online Wenetspeech ASR0 Model | WenetSpeech Dataset | Char-based | 1.2 GB | 2 Conv + 5 LSTM layers | 0.152 (test_net, w/o LM); 0.2417 (test_meeting, w/o LM); 0.053 (aishell, w/ LM) | - | 10000 h | - | onnx/inference/python | - |
| Ds2 Online Aishell ASR0 Model | Aishell Dataset | Char-based | 491 MB | 2 Conv + 5 LSTM layers | 0.0666 | - | 151 h | Ds2 Online Aishell ASR0 | onnx/inference/python | - |
| Ds2 Offline Aishell ASR0 Model | Aishell Dataset | Char-based | 1.4 GB | 2 Conv + 5 bidirectional LSTM layers | 0.0554 | - | 151 h | Ds2 Offline Aishell ASR0 | inference/python | - |
| Conformer Online Wenetspeech ASR1 Model | WenetSpeech Dataset | Char-based | 457 MB | Encoder: Conformer, Decoder: Transformer, Decoding method: Attention rescoring | 0.11 (test_net); 0.1879 (test_meeting) | - | 10000 h | - | python | - |
| Conformer U2PP Online Wenetspeech ASR1 Model | WenetSpeech Dataset | Char-based | 540 MB | Encoder: Conformer, Decoder: BiTransformer, Decoding method: Attention rescoring | 0.047198 (aishell test_-1); 0.059212 (aishell test_16) | - | 10000 h | - | python | FP32, INT8 |
| Conformer Online Aishell ASR1 Model | Aishell Dataset | Char-based | 189 MB | Encoder: Conformer, Decoder: Transformer, Decoding method: Attention rescoring | 0.051968 | - | 151 h | Conformer Online Aishell ASR1 | python | - |
| Conformer Offline Aishell ASR1 Model | Aishell Dataset | Char-based | 189 MB | Encoder: Conformer, Decoder: Transformer, Decoding method: Attention rescoring | 0.0460 | - | 151 h | Conformer Offline Aishell ASR1 | python | - |
| Transformer Aishell ASR1 Model | Aishell Dataset | Char-based | 128 MB | Encoder: Transformer, Decoder: Transformer, Decoding method: Attention rescoring | 0.0523 | - | 151 h | Transformer Aishell ASR1 | python | - |
| Ds2 Offline Librispeech ASR0 Model | Librispeech Dataset | Char-based | 1.3 GB | 2 Conv + 5 bidirectional LSTM layers | - | 0.0467 | 960 h | Ds2 Offline Librispeech ASR0 | inference/python | - |
| Conformer Librispeech ASR1 Model | Librispeech Dataset | Subword-based | 191 MB | Encoder: Conformer, Decoder: Transformer, Decoding method: Attention rescoring | - | 0.0338 | 960 h | Conformer Librispeech ASR1 | python | - |
| Transformer Librispeech ASR1 Model | Librispeech Dataset | Subword-based | 131 MB | Encoder: Transformer, Decoder: Transformer, Decoding method: Attention rescoring | - | 0.0381 | 960 h | Transformer Librispeech ASR1 | python | - |
| Transformer Librispeech ASR2 Model | Librispeech Dataset | Subword-based | 131 MB | Encoder: Transformer, Decoder: Transformer, Decoding method: Joint CTC w/ LM | - | 0.0240 | 960 h | Transformer Librispeech ASR2 | python | - |
| Conformer TALCS ASR1 Model | TALCS Dataset | Subword-based | 470 MB | Encoder: Conformer, Decoder: Transformer, Decoding method: Attention rescoring | - | 0.0844 | 587 h | Conformer TALCS ASR1 | python | - |
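For readers unfamiliar with the CER and WER columns, the sketch below shows one common way to compute them: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, over characters for CER and over whitespace-separated words for WER. This is an illustrative assumption about the metric, not the scoring script used to produce the numbers above; the actual evaluation may normalize text differently, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences
    (substitutions, insertions, and deletions all cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the number of reference tokens."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; spaces are removed before comparison
    (an assumption made for this sketch)."""
    return error_rate(list(reference.replace(" ", "")),
                      list(hypothesis.replace(" ", "")))


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens."""
    return error_rate(reference.split(), hypothesis.split())


if __name__ == "__main__":
    print(wer("the cat sat", "the cat sit"))
    print(cer("abcd", "abed"))
```

Under this definition, a CER of 0.0554 corresponds to roughly 5.5 character errors per 100 reference characters.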

Self-Supervised Pre-trained Model

| Model | Pre-Train Method | Pre-Train Data | Finetune Data | Size | Descriptions | CER | WER | Example Link |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Wav2vec2-large-960h-lv60-self Model | wav2vec2 | Librispeech and LV-60k Dataset (53000 h) | - | 1.18 GB | Pre-trained Wav2vec2.0 Model | - | - | - |
| Wav2vec2ASR-large-960h-librispeech Model | wav2vec2 | Librispeech and LV-60k Dataset (53000 h) | Librispeech (960 h) | 718 MB | Encoder: Wav2vec2.0, Decoder: CTC, Decoding method: Greedy search | - | 0.0189 | Wav2vecASR Librispeech ASR3 |
| Wav2vec2-large-wenetspeech-self Model | wav2vec2 | WenetSpeech Dataset (10000 h) | - | 714 MB | Pre-trained Wav2vec2.0 Model | - | - | - |
| Wav2vec2ASR-large-aishell1 Model | wav2vec2 | WenetSpeech Dataset (10000 h) | aishell1 (train set) | 1.18 GB | Encoder: Wav2vec2.0, Decoder: CTC, Decoding method: Greedy search | 0.0510 | - | - |
| Hubert-large-lv60 Model | hubert | LV-60k Dataset | - | 1.18 GB | Pre-trained hubert Model | - | - | - |
| Hubert-large-100h-librispeech Model | hubert | LV-60k Dataset | librispeech train-clean-100 | 1.27 GB | Encoder: Hubert, Decoder: Linear + CTC, Decoding method: Greedy search | - | 0.0587 | HubertASR Librispeech ASR4 |

Whisper Model

| Demo Link | Training Data | Size | Descriptions | CER | Model |
| --- | --- | --- | --- | --- | --- |
| Whisper | 680k h from the internet | large: 5.8 GB, medium: 2.9 GB, small: 923 MB, base: 277 MB, tiny: 145 MB | Encoder: Transformer, Decoder: Transformer, Decoding method: Greedy search | 0.027 (large, Librispeech) | whisper-large, whisper-medium, whisper-medium-English-only, whisper-small, whisper-small-English-only, whisper-base, whisper-base-English-only, whisper-tiny, whisper-tiny-English-only |

Language Model based on NGram

| Language Model | Training Data | Token-based | Size | Descriptions |
| --- | --- | --- | --- | --- |
| English LM | CommonCrawl(en.00) | Word-based | 8.3 GB | Pruned with 0 1 1 1 1; About 1.85 billion n-grams; 'trie' binary with '-a 22 -q 8 -b 8' |
| Mandarin LM Small | Baidu Internal Corpus | Char-based | 2.8 GB | Pruned with 0 1 2 4 4; About 0.13 billion n-grams; 'probing' binary with default settings |
| Mandarin LM Large | Baidu Internal Corpus | Char-based | 70.4 GB | No Pruning; About 3.7 billion n-grams; 'probing' binary with default settings |

Speech Translation Models

| Model | Training Data | Token-based | Size | Descriptions | BLEU | Example Link |
| --- | --- | --- | --- | --- | --- | --- |
| (only for CLI) Transformer FAT-ST MTL En-Zh | Ted-En-Zh | Spm | - | Encoder: Transformer, Decoder: Transformer, Decoding method: Attention | 20.80 | Transformer Ted-En-Zh ST1 |

Text-to-Speech Models

Acoustic Models

| Model Type | Dataset | Example Link | Pretrained Models | Static / ONNX / Paddle-Lite Models | Size (static) |
| --- | --- | --- | --- | --- | --- |
| Tacotron2 | LJSpeech | tacotron2-ljspeech | tacotron2_ljspeech_ckpt_0.2.0.zip | - | - |
| Tacotron2 | CSMSC | tacotron2-csmsc | tacotron2_csmsc_ckpt_0.2.0.zip | tacotron2_csmsc_static_0.2.0.zip | 103MB |
| TransformerTTS | LJSpeech | transformer-ljspeech | transformer_tts_ljspeech_ckpt_0.4.zip | - | - |
| SpeedySpeech | CSMSC | speedyspeech-csmsc | speedyspeech_csmsc_ckpt_0.2.0.zip | speedyspeech_csmsc_static_0.2.0.zip, speedyspeech_csmsc_onnx_0.2.0.zip, speedyspeech_csmsc_pdlite_1.3.0.zip | 13MB |
| FastSpeech2 | CSMSC | fastspeech2-csmsc | fastspeech2_nosil_baker_ckpt_0.4.zip | fastspeech2_csmsc_static_0.2.0.zip, fastspeech2_csmsc_onnx_0.2.0.zip, fastspeech2_csmsc_pdlite_1.3.0.zip | 157MB |
| FastSpeech2-Conformer | CSMSC | fastspeech2-csmsc | fastspeech2_conformer_baker_ckpt_0.5.zip | - | - |
| FastSpeech2-CNNDecoder | CSMSC | fastspeech2-csmsc | fastspeech2_cnndecoder_csmsc_ckpt_1.0.0.zip | fastspeech2_cnndecoder_csmsc_static_1.0.0.zip, fastspeech2_cnndecoder_csmsc_streaming_static_1.0.0.zip, fastspeech2_cnndecoder_csmsc_onnx_1.0.0.zip, fastspeech2_cnndecoder_csmsc_streaming_onnx_1.0.0.zip, fastspeech2_cnndecoder_csmsc_pdlite_1.3.0.zip, fastspeech2_cnndecoder_csmsc_streaming_pdlite_1.3.0.zip | 84MB |
| FastSpeech2 | AISHELL-3 | fastspeech2-aishell3 | fastspeech2_aishell3_ckpt_1.1.0.zip | fastspeech2_aishell3_static_1.1.0.zip, fastspeech2_aishell3_onnx_1.1.0.zip, fastspeech2_aishell3_pdlite_1.3.0.zip | 147MB |
| FastSpeech2 | LJSpeech | fastspeech2-ljspeech | fastspeech2_nosil_ljspeech_ckpt_0.5.zip | fastspeech2_ljspeech_static_1.1.0.zip, fastspeech2_ljspeech_onnx_1.1.0.zip, fastspeech2_ljspeech_pdlite_1.3.0.zip | 145MB |
| FastSpeech2 | VCTK | fastspeech2-vctk | fastspeech2_vctk_ckpt_1.2.0.zip | fastspeech2_vctk_static_1.1.0.zip, fastspeech2_vctk_onnx_1.1.0.zip, fastspeech2_vctk_pdlite_1.3.0.zip | 145MB |
| FastSpeech2 | ZH_EN | fastspeech2-zh_en | fastspeech2_mix_ckpt_1.2.0.zip | fastspeech2_mix_static_0.2.0.zip, fastspeech2_mix_onnx_0.2.0.zip | 145MB |
| FastSpeech2 | male-zh | - | fastspeech2_male_zh_ckpt_1.4.0.zip | fastspeech2_male_zh_static_1.4.0.zip, fastspeech2_male_zh_onnx_1.4.0.zip | 146MB |
| FastSpeech2 | male-en | - | fastspeech2_male_en_ckpt_1.4.0.zip | fastspeech2_male_en_static_1.4.0.zip, fastspeech2_male_en_onnx_1.4.0.zip | 145MB |
| FastSpeech2 | male-mix | - | fastspeech2_male_mix_ckpt_1.4.0.zip | fastspeech2_male_mix_static_1.4.0.zip, fastspeech2_male_mix_onnx_1.4.0.zip | 146MB |
| FastSpeech2 | Cantonese | fastspeech2-canton | fastspeech2_canton_ckpt_1.4.0.zip | fastspeech2_canton_static_1.4.0.zip, fastspeech2_canton_onnx_1.4.0.zip | 146MB |

Vocoders

| Model Type | Dataset | Example Link | Pretrained Models | Static / ONNX / Paddle-Lite Models | Size (static) |
| --- | --- | --- | --- | --- | --- |
| WaveFlow | LJSpeech | waveflow-ljspeech | waveflow_ljspeech_ckpt_0.3.zip | - | - |
| Parallel WaveGAN | CSMSC | PWGAN-csmsc | pwg_baker_ckpt_0.4.zip | pwg_baker_static_0.4.zip, pwgan_csmsc_onnx_0.2.0.zip, pwgan_csmsc_pdlite_1.3.0.zip | 4.8MB |
| Parallel WaveGAN | LJSpeech | PWGAN-ljspeech | pwg_ljspeech_ckpt_0.5.zip | pwgan_ljspeech_static_1.1.0.zip, pwgan_ljspeech_onnx_1.1.0.zip, pwgan_ljspeech_pdlite_1.3.0.zip | 4.8MB |
| Parallel WaveGAN | AISHELL-3 | PWGAN-aishell3 | pwg_aishell3_ckpt_0.5.zip | pwgan_aishell3_static_1.1.0.zip, pwgan_aishell3_onnx_1.1.0.zip, pwgan_aishell3_pdlite_1.3.0.zip | 4.8MB |
| Parallel WaveGAN | VCTK | PWGAN-vctk | pwg_vctk_ckpt_0.5.zip | pwgan_vctk_static_1.1.0.zip, pwgan_vctk_onnx_1.1.0.zip, pwgan_vctk_pdlite_1.3.0.zip | 4.8MB |
| Multi Band MelGAN | CSMSC | MB MelGAN-csmsc | mb_melgan_csmsc_ckpt_0.1.1.zip, mb_melgan_baker_finetune_ckpt_0.5.zip | mb_melgan_csmsc_static_0.1.1.zip, mb_melgan_csmsc_onnx_0.2.0.zip, mb_melgan_csmsc_pdlite_1.3.0.zip | 7.6MB |
| Style MelGAN | CSMSC | Style MelGAN-csmsc | style_melgan_csmsc_ckpt_0.1.1.zip | - | - |
| HiFiGAN | CSMSC | HiFiGAN-csmsc | hifigan_csmsc_ckpt_0.1.1.zip | hifigan_csmsc_static_0.1.1.zip, hifigan_csmsc_onnx_0.2.0.zip, hifigan_csmsc_pdlite_1.3.0.zip | 46MB |
| HiFiGAN | LJSpeech | HiFiGAN-ljspeech | hifigan_ljspeech_ckpt_0.2.0.zip | hifigan_ljspeech_static_1.1.0.zip, hifigan_ljspeech_onnx_1.1.0.zip, hifigan_ljspeech_pdlite_1.3.0.zip | 49MB |
| HiFiGAN | AISHELL-3 | HiFiGAN-aishell3 | hifigan_aishell3_ckpt_0.2.0.zip | hifigan_aishell3_static_1.1.0.zip, hifigan_aishell3_onnx_1.1.0.zip, hifigan_aishell3_pdlite_1.3.0.zip | 46MB |
| HiFiGAN | VCTK | HiFiGAN-vctk | hifigan_vctk_ckpt_0.2.0.zip | hifigan_vctk_static_1.1.0.zip, hifigan_vctk_onnx_1.1.0.zip, hifigan_vctk_pdlite_1.3.0.zip | 46MB |
| WaveRNN | CSMSC | WaveRNN-csmsc | wavernn_csmsc_ckpt_0.2.0.zip | wavernn_csmsc_static_0.2.0.zip | 18MB |
| Parallel WaveGAN | Male | - | pwg_male_ckpt_1.4.0.zip | pwgan_male_static_1.4.0.zip, pwgan_male_onnx_1.4.0.zip | 4.8MB |
| HiFiGAN | Male | - | hifigan_male_ckpt_1.4.0.zip | hifigan_male_static_1.4.0.zip, hifigan_male_onnx_1.4.0.zip | 46MB |

Voice Cloning

| Model Type | Dataset | Example Link | Pretrained Models |
| --- | --- | --- | --- |
| GE2E | AISHELL-3, etc. | ge2e | ge2e_ckpt_0.3.zip |
| GE2E + Tacotron2 | AISHELL-3 | ge2e-Tacotron2-aishell3 | tacotron2_aishell3_ckpt_vc0_0.2.0.zip |
| GE2E + FastSpeech2 | AISHELL-3 | ge2e-fastspeech2-aishell3 | fastspeech2_nosil_aishell3_vc1_ckpt_0.5.zip |

Audio Classification Models

| Model Type | Dataset | Example Link | Pretrained Models | Static Models |
| --- | --- | --- | --- | --- |
| PANN | Audioset | audioset_tagging_cnn | panns_cnn6.pdparams, panns_cnn10.pdparams, panns_cnn14.pdparams | panns_cnn6_static.tar.gz (18MB), panns_cnn10_static.tar.gz (19MB), panns_cnn14_static.tar.gz (289MB) |
| PANN | ESC-50 | pann-esc50 | esc50_cnn6.tar.gz, esc50_cnn10.tar.gz, esc50_cnn14.tar.gz | - |

Speaker Verification Models

| Model Type | Dataset | Example Link | Pretrained Models | Static Models |
| --- | --- | --- | --- | --- |
| ECAPA-TDNN | VoxCeleb | voxceleb_ecapatdnn | ecapatdnn.tar.gz | - |

Punctuation Restoration Models

| Model Type | Dataset | Example Link | Pretrained Models |
| --- | --- | --- | --- |
| Ernie Linear | IWSLT2012_zh | iwslt2012_punc0 | ernie_linear_p3_iwslt2012_zh_ckpt_0.1.1.zip |