Controllable Emphatic Speech Synthesis based on Forward Attention for Expressive Speech Synthesis

Liangqi Liu, Jiankun Hu, Zhiyong Wu, Song Yang, Songfan Yang, Jia Jia, Helen Meng

Synthesized speech samples

Idx Chinese Text with {Emphasis Label} Base Model Proposed Model Duration control Intonation and energy control
01

我想去{西单}买手机。

02

我想去西单买{手机}。

03

在{上海},有个小伙子,他每天准时坐着地铁上班。

04

在上海,有个{小伙子},他每天准时坐着地铁上班。

05

在上海,有个小伙子,他{每天}准时坐着地铁上班。

06

在上海,有个小伙子,他每天{准时}坐着地铁上班。

07

在上海,有个小伙子,他每天准时坐着{地铁}上班。

08

{孔子}是中国古代伟大的哲学家。

09

孔子是中国古代{伟大的}哲学家。

10

孔子是中国古代伟大的{哲学家}。

11

我会的,不过{游泳}对我来说是个挑战。

12

我会的,不过游泳对我来说是个{挑战}。
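
In the samples above, braces mark the emphasized word. As an illustration only, the following minimal sketch shows how such a brace-marked sentence could be split into plain text and per-character emphasis flags. The function name `parse_emphasis` and the 0/1 flag encoding are assumptions made for this example and are not taken from the paper.

```python
def parse_emphasis(marked: str) -> tuple[str, list[int]]:
    """Split brace-marked text into plain text and per-character 0/1 emphasis flags."""
    chars: list[str] = []
    flags: list[int] = []
    inside = False  # True while we are between '{' and '}'
    for ch in marked:
        if ch == "{":
            if inside:
                raise ValueError("nested '{' is not supported")
            inside = True
        elif ch == "}":
            if not inside:
                raise ValueError("unmatched '}'")
            inside = False
        else:
            # Ordinary character: keep it and record whether it is emphasized.
            chars.append(ch)
            flags.append(1 if inside else 0)
    if inside:
        raise ValueError("unclosed '{'")
    return "".join(chars), flags


# Sample 01 from the table above.
text, flags = parse_emphasis("我想去{西单}买手机。")
print(text)   # 我想去西单买手机。
print(flags)  # [0, 0, 0, 1, 1, 0, 0, 0, 0]
```

Flags are per character here purely for simplicity; a real front end might instead label words, syllables, or phonemes.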

Control the strength of emphasis on duration (with yd) and on intonation and energy (with ya) separately

index-01 / Chinese Text: "我想去{西单}买手机。" / English Translation: "I want to go to Xidan to buy a mobile phone." / Emphasis Label: "Xidan".

index-02 / Chinese Text: "在上海,有个小伙子,他{每天}准时坐着地铁上班。" / English Translation: "In Shanghai, there is a young man who takes the subway to work on time every day." / Emphasis Label: "every day".

index ya=0,yd=0 ya=0,yd=0.5 ya=0,yd=1
01
02
index ya=0.5,yd=0 ya=0.5,yd=0.5 ya=0.5,yd=1
01
02
index ya=1,yd=0 ya=1,yd=0.5 ya=1,yd=1
01
02
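
The nine conditions listed above are every combination of three ya values with three yd values. The short sketch below simply enumerates that grid in the same order as the headers. The name `LEVELS` is chosen for this example, and how the two values are actually passed to the model is not specified on this page.

```python
from itertools import product

# The three strength levels that appear in the table headers above.
LEVELS = (0.0, 0.5, 1.0)

# Outer loop varies ya (intonation and energy); inner loop varies yd (duration).
grid = [(ya, yd) for ya, yd in product(LEVELS, repeat=2)]

for ya, yd in grid:
    print(f"ya={ya:g},yd={yd:g}")
```

Each printed label corresponds to one column of the tables above, for example `ya=0.5,yd=1`.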