Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing

Submitted to TASLP.

To demonstrate that the proposed model transfers cross-lingual speaking style at both the global and local scales from the source speech to the synthesized speech, we provide samples for comparison. Source Speech is the source speech in the original language, reconstructed by a vocoder. FastSpeech 2 is an open-source implementation of FastSpeech 2, with no speaking style transfer. Duration Transfer is a duration transfer model, which predicts the duration of every word in the target speech. Joint Style Transfer is the proposed model, which predicts the joint multi-scale cross-lingual speaking style of the target speech. In all systems, a well-trained HiFi-GAN is used as the vocoder to generate the waveform.
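
As a rough illustration of what word-level duration control can mean in a FastSpeech 2 style pipeline, the sketch below rescales the phoneme durations inside one word so that they sum to a predicted word duration. This page does not describe how the Duration Transfer baseline maps word durations onto phonemes, so the function name `fit_phonemes_to_word`, the frame-based units, and the proportional rescaling are all assumptions made for illustration only.

```python
from typing import List


def fit_phonemes_to_word(phoneme_frames: List[float], word_frames: int) -> List[int]:
    """Rescale one word's phoneme durations so they sum to word_frames.

    Hypothetical helper, not taken from the paper: phoneme_frames are
    baseline per-phoneme durations (in mel frames) and word_frames is the
    word-level duration predicted for the target speech.
    """
    if not phoneme_frames:
        raise ValueError("a word needs at least one phoneme")
    if word_frames < 0 or any(d < 0 for d in phoneme_frames):
        raise ValueError("durations must be non-negative")

    total = sum(phoneme_frames)
    if total > 0:
        # Keep the relative proportions of the baseline durations.
        scaled = [d * word_frames / total for d in phoneme_frames]
    else:
        # No usable baseline: split the word evenly.
        scaled = [word_frames / len(phoneme_frames)] * len(phoneme_frames)

    # Largest-remainder rounding: floor everything, then hand the leftover
    # frames to the phonemes with the largest fractional parts.
    frames = [int(s) for s in scaled]
    leftover = word_frames - sum(frames)
    by_remainder = sorted(
        range(len(scaled)), key=lambda i: scaled[i] - frames[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        frames[i] += 1
    return frames


if __name__ == "__main__":
    # A two-phoneme word stretched from 8 baseline frames to 12 frames.
    print(fit_phonemes_to_word([2.0, 6.0], 12))
```

Rounding with the largest-remainder rule keeps the integer frame counts summing exactly to the word duration, which matters when the resulting durations drive a length regulator.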

Transfer direction from English to Chinese

Target Chinese Text	Source English Text	Source Speech	FastSpeech 2	Duration Transfer	Joint Style Transfer (Proposed)
这就是当初将毁灭者封印起来的那台机器。重要的是，它还能再封印一次！但我们需要四把秘藏钥匙才能进去。你已经有三把了，对吧？	This was the machine that locked away the Destroyer the first time. And it can do it again, too! But we need four Vault Keys to get in. You already got three of them, right?
瀑布密道！太聪明了！我喜欢这招。	Secret waterfall tunnel! Brilliant! Love this gig.
有了这艘飞船，我们就能满世界追杀强盗啦！	With this ship, we can kill bandits all over the worlds!
空间可是越多越好，所以赶快买买买吧！	You’re gonna need more space, so buy it already!
现在就只缺宇航芯片啦。飞船必须依靠它来进行太空导航。你可以去达尔公司的遗迹里找找看，应该能找到一片。	Now we just need an astronav chip. It’s essential for space navigation and such. There’s one in some old Dahl wreckage you can plunder.
北面和南面都有敌人来袭，大量机甲已经越过高墙！	Contacts to the north and south! We’ve got mechs coming over the walls!

Transfer direction from Chinese to English

Target English Text	Source Chinese Text	Source Speech	FastSpeech 2	Duration Transfer	Joint Style Transfer (Proposed)
Save me, Vault Hunter!	快救我，秘藏猎人！
Show me what you’re made of and take out that Big Foot monster!	让我见识一下你的厉害，去干掉大脚怪！
There’s a Vault underneath the city? And no one ever found it?	这城下面居然有秘藏？而且从来没人发现过？
One step closer to your true potential!	离你的真正潜力又近一步啦！
I’ll tell you what I DON’T miss: angry saurians. At least we’re safe from them big ol’ bastards on this here ship.	告诉你什么是我不怀念的吧：愤怒的蜥蜴。在飞船上至少能躲开那些该死的臭虫。
Much appreciated, Beau! Keep them coming! We got a lot of hungry people back here.	太感谢你了，阿波！全都送过来吧！我这里还有很多人在挨饿呢。

Case Study

Transfer direction from English to Chinese

Model	Target Chinese Text	Audio	Mel-Spectrogram
Source Speech	With this ship, we can kill bandits all over the worlds!
vanilla	有了这艘飞船，我们就能满世界追杀强盗啦！
Duration Transfer	有了这艘飞船，我们就能满世界追杀强盗啦！
Joint Style Transfer (Proposed)	有了这艘飞船，我们就能满世界追杀强盗啦！

Transfer direction from Chinese to English

Model	Target English Text	Audio	Mel-Spectrogram
Source Speech	有了这艘飞船，我们就能满世界追杀强盗啦！
vanilla	With this ship, we can kill bandits all over the worlds!
Duration Transfer	With this ship, we can kill bandits all over the worlds!
Joint Style Transfer (Proposed)	With this ship, we can kill bandits all over the worlds!