Audio Sample
The main processes of TTS include:
Convert the original text into characters/phonemes through the text frontend module.
Convert characters/phonemes into acoustic features, such as linear spectrograms, mel spectrograms, LPC features, etc., through acoustic models.
Convert acoustic features into waveforms through vocoders.
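The three stages above can be sketched as a chain of functions. This is only a toy illustration of the data flow, not PaddleSpeech code: `text_frontend`, `acoustic_model`, `vocoder`, and all their parameters are hypothetical stand-ins, and the "features" and "waveform" are placeholder numbers rather than real audio.

```python
from typing import List


def text_frontend(text: str) -> List[int]:
    """Toy frontend: map each non-space character to an integer ID."""
    return [ord(ch) for ch in text.lower() if not ch.isspace()]


def acoustic_model(token_ids: List[int], n_mels: int = 4,
                   frames_per_token: int = 2) -> List[List[float]]:
    """Toy acoustic model: emit a fixed number of feature frames per token."""
    frames = []
    for tid in token_ids:
        value = (tid % 97) / 97.0  # placeholder feature value in [0, 1)
        for _ in range(frames_per_token):
            frames.append([value] * n_mels)
    return frames


def vocoder(frames: List[List[float]], hop_length: int = 3) -> List[float]:
    """Toy vocoder: expand every frame into hop_length waveform samples."""
    wav = []
    for frame in frames:
        level = sum(frame) / len(frame)
        wav.extend([level] * hop_length)
    return wav


def synthesize(text: str) -> List[float]:
    """Run the three stages in order: text -> IDs -> frames -> samples."""
    token_ids = text_frontend(text)
    features = acoustic_model(token_ids)
    return vocoder(features)


if __name__ == "__main__":
    samples = synthesize("hi there")
    print(f"{len(samples)} waveform samples")
```

A real system replaces each stand-in with a trained component, but the interfaces keep the same shape: the frontend outputs token IDs, the acoustic model outputs a frame sequence, and the vocoder outputs samples.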
When training Tacotron2, TransformerTTS and WaveFlow, we use the English single-speaker TTS dataset LJSpeech by default. However, when training SpeedySpeech, FastSpeech2 and ParallelWaveGAN, we use the Chinese single-speaker dataset CSMSC by default.
In the future, PaddleSpeech TTS will mainly use Chinese TTS datasets for default examples.
Here, we will display three types of audio samples:
Analysis/synthesis (ground-truth spectrograms + Vocoder)
TTS (Acoustic model + Vocoder)
Chinese TTS with/without text frontend (mainly tone sandhi)
Analysis/synthesis
Audio samples generated from ground-truth spectrograms with a vocoder.
LJSpeech (English)
Text | GT | WaveFlow |
---|---|---|
Printing, in the only sense with which we are at present concerned, differs from most if not from all the arts and crafts represented in the Exhibition in being comparatively modern. | [audio] | [audio] |
For although the Chinese took impressions from wood blocks engraved in relief for centuries before the woodcutters of the Netherlands, by a similar process | [audio] | [audio] |
produced the block books, which were the immediate predecessors of the true printed book | [audio] | [audio] |
the invention of movable metal letters in the middle of the fifteenth century may justly be considered as the invention of the art of printing. | [audio] | [audio] |
CSMSC (Chinese)
Text | GT (convert to 24k) | ParallelWaveGAN |
---|---|---|
昨日,这名“伤者”与医生全部被警方依法刑事拘留 | [audio] | [audio] |
钱伟长想到上海来办学校是经过深思熟虑的。 | [audio] | [audio] |
她见我一进门就骂,吃饭时也骂,骂得我抬不起头。 | [audio] | [audio] |
李述德在离开之前,只说了一句“柱驼杀父亲了” | [audio] | [audio] |
这种车票和保险单捆绑出售属于重复性购买。 | [audio] | [audio] |
TTS
Audio samples generated by a TTS system. Text is first transformed into a spectrogram by a text-to-spectrogram model; the spectrogram is then converted into raw audio by a vocoder.
LJSpeech (English)
Text | TransformerTTS + WaveFlow | Tacotron2 + WaveFlow |
---|---|---|
Life was like a box of chocolates, you never know what you're gonna get. | [audio] | [audio] |
With great power there must come great responsibility. | [audio] | [audio] |
To be or not to be, that’s a question. | [audio] | [audio] |
A man can be destroyed but not defeated. | [audio] | [audio] |
Do not, for one repulse, give up the purpose that you resolved to effort. | [audio] | [audio] |
Death is just a part of life, something we're all destined to do. | [audio] | [audio] |
I think it's hard winning a war with words. | [audio] | [audio] |
Don’t argue with the people of strong determination, because they may change the fact! | [audio] | [audio] |
Love you three thousand times. | [audio] | [audio] |
CSMSC (Chinese)
Text | SpeedySpeech + ParallelWaveGAN | FastSpeech2 + ParallelWaveGAN |
---|---|---|
凯莫瑞安联合体的经济崩溃,迫在眉睫。 | [audio] | [audio] |
对于所有想要离开那片废土,去寻找更美好生活的人来说。 | [audio] | [audio] |
克哈,是你们所有人安全的港湾。 | [audio] | [audio] |
为了保护尤摩扬人民不受异虫的残害,我所做的,比他们自己的领导委员会都多。 | [audio] | [audio] |
无论他们如何诽谤我,我将继续为所有泰伦人的最大利益,而努力奋斗。 | [audio] | [audio] |
身为你们的元首,我带领泰伦人实现了人类统治领地和经济的扩张。 | [audio] | [audio] |
我们将继续成长,用行动回击那些只会说风凉话,不愿意和我们相向而行的害群之马。 | [audio] | [audio] |
帝国武装力量,无数的优秀儿女,正时刻守卫着我们的家园大门,但是他们孤木难支。 | [audio] | [audio] |
凡是今天应征入伍者,所获的所有刑罚罪责,减半。 | [audio] | [audio] |
FastSpeech2-Conformer + ParallelWaveGAN |
---|
[audio] |
[audio] |
[audio] |
[audio] |
[audio] |
[audio] |
[audio] |
[audio] |
[audio] |
Multi-Speaker TTS
PaddleSpeech also supports multi-speaker TTS. The audio demos below were generated by FastSpeech2 + ParallelWaveGAN using the AISHELL-3 multi-speaker TTS dataset. Each row is a different speaker.
Target Timbre | Generated | |||
---|---|---|---|---|
audio element.
|
audio element.
|
|||
audio element.
|
audio element.
|
|||
audio element.
|
audio element.
|
|||
audio element.
|
audio element.
|
|||
audio element.
|
audio element.
|
|||
audio element.
|
audio element.
|
|||
audio element.
|
audio element.
|
|||
audio element.
|
audio element.
|
|||
audio element.
|
audio element.
|
|||
audio element.
|
audio element.
|
|||
audio element.
|
audio element.
|
|||
audio element.
|
audio element.
|
|||
audio element.
|
audio element.
|
|||
audio element.
|
audio element.
|
|||
audio element.
|
audio element.
|
|||
audio element.
|
audio element.
|
|||
audio element.
|
audio element.
|
|||
audio element.
|
audio element.
|
|||
audio element.
|
audio element.
|
|||
audio element.
|
audio element.
|
Speed (0.8x) | Speed (1x) | Speed (1.2x) |
---|---|---|
[audio] | [audio] | [audio] |
[audio] | [audio] | [audio] |
[audio] | [audio] | [audio] |
[audio] | [audio] | [audio] |
[audio] | [audio] | [audio] |
[audio] | [audio] | [audio] |
[audio] | [audio] | [audio] |
[audio] | [audio] | [audio] |
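The page does not say how the speed variants were produced. For duration-based models such as FastSpeech2, a common approach is to scale the predicted per-phoneme durations before they are expanded into frames. The sketch below assumes that approach; the function names, the phoneme labels, and the sample durations are all hypothetical.

```python
from typing import List


def scale_durations(durations: List[int], speed: float) -> List[int]:
    """Divide each duration (in frames) by the speed factor.

    speed > 1 shortens every phoneme (faster speech); speed < 1 lengthens it.
    Each phoneme keeps at least one frame so none disappears entirely.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return [max(1, round(d / speed)) for d in durations]


def length_regulate(phonemes: List[str], durations: List[int]) -> List[str]:
    """Repeat each phoneme by its duration to get a frame-level sequence."""
    expanded = []
    for phoneme, duration in zip(phonemes, durations):
        expanded.extend([phoneme] * duration)
    return expanded


if __name__ == "__main__":
    phonemes = ["n", "i", "h", "ao"]   # hypothetical phoneme sequence
    predicted = [4, 10, 3, 12]         # hypothetical durations in frames
    for speed in (0.8, 1.0, 1.2):
        scaled = scale_durations(predicted, speed)
        frames = length_regulate(phonemes, scaled)
        print(f"speed={speed}: {len(frames)} frames")
```

Because only the durations change, pitch and timbre are left to the rest of the model, which is why this kind of control alters speaking rate without resampling the audio.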
Robot | Child |
---|---|
[audio] | [audio] |
[audio] | [audio] |
[audio] | [audio] |
[audio] | [audio] |
[audio] | [audio] |
[audio] | [audio] |
[audio] | [audio] |
[audio] | [audio] |
Chinese TTS with/without text frontend
Text | With Text Frontend | Without Text Frontend |
---|---|---|
他只是一个纸老虎。 | [audio] | [audio] |
手表厂有五种好产品。 | [audio] | [audio] |
老板的轿车需要保养。 | [audio] | [audio] |
我们所有人都好喜欢你呀。 | [audio] | [audio] |
岂有此理。 | [audio] | [audio] |
虎骨酒多少钱一瓶。 | [audio] | [audio] |
这件事情需要冷处理。 | [audio] | [audio] |
这个老奶奶是个大喇叭。 | [audio] | [audio] |
我喜欢说相声。 | [audio] | [audio] |
有一天,我路过了一栋楼。 | [audio] | [audio] |
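Many of the sentences above contain words made of two consecutive third-tone syllables (for example 老虎, 手表, 老板, 保养, 处理), where Mandarin tone sandhi turns the first syllable into a second tone. The sketch below shows only that one rule on a single two-syllable word written as tone-numbered pinyin. It is a deliberately simplified illustration, not the PaddleSpeech frontend: the function name is hypothetical, and a real frontend also needs word segmentation, longer third-tone runs, and the rules for 一 and 不.

```python
from typing import List


def third_tone_sandhi(word: List[str]) -> List[str]:
    """Apply the 3-3 -> 2-3 rule to one two-syllable word.

    Each syllable is tone-numbered pinyin such as "lao3". Words of any
    other length, or without two third tones, are returned unchanged.
    """
    if len(word) == 2 and word[0].endswith("3") and word[1].endswith("3"):
        return [word[0][:-1] + "2", word[1]]
    return list(word)


if __name__ == "__main__":
    examples = {
        "老虎": ["lao3", "hu3"],
        "手表": ["shou3", "biao3"],
        "喜欢": ["xi3", "huan1"],  # no sandhi: second syllable is not tone 3
    }
    for hanzi, pinyin in examples.items():
        print(hanzi, pinyin, "->", third_tone_sandhi(pinyin))
```

Without a step like this, the acoustic model receives the dictionary tones, so words such as 老虎 are synthesized with two full third tones.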
Frozen Method | train_num=10, bs=10, epoch=100, lr=1e-4 | train_num=18, bs=18, epoch=100, lr=1e-4 | train_num=97, bs=64, epoch=100, lr=1e-4 | train_num=196, bs=64, epoch=100, lr=1e-4 |
---|---|---|---|---|
Non Frozen | [audio] | [audio] | [audio] | [audio] |
Freeze encoder | [audio] | [audio] | [audio] | [audio] |
Freeze encoder && duration_predictor | [audio] | [audio] | [audio] | [audio] |
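The three rows differ in which submodules stay trainable during finetuning. A minimal, framework-free sketch of that idea follows: parameters are selected by name prefix and marked as not trainable. The `Param` class, the `freeze_by_prefix` helper, and the parameter names are hypothetical; in a real framework the same effect comes from that framework's own mechanism for excluding parameters from gradient updates.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Param:
    """Stand-in for a model parameter; only tracks whether it is trained."""
    trainable: bool = True


def freeze_by_prefix(params: Dict[str, Param],
                     prefixes: Iterable[str]) -> List[str]:
    """Mark every parameter whose name starts with a prefix as frozen.

    Returns the names that were frozen, in dictionary order.
    """
    prefix_tuple = tuple(prefixes)
    frozen = []
    for name, param in params.items():
        if name.startswith(prefix_tuple):
            param.trainable = False
            frozen.append(name)
    return frozen


if __name__ == "__main__":
    params = {
        "encoder.layer0.weight": Param(),
        "duration_predictor.fc.weight": Param(),
        "decoder.layer0.weight": Param(),
    }
    # Corresponds to the "Freeze encoder && duration_predictor" row.
    frozen = freeze_by_prefix(params, ["encoder", "duration_predictor"])
    print("frozen:", frozen)
    print("still trainable:",
          [n for n, p in params.items() if p.trainable])
```

Passing an empty prefix list leaves everything trainable, which matches the "Non Frozen" row.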