Transformer-S2A: Robust and Efficient Speech-to-Animation
Submitted to ICASSP 2022
Digital Domain created the digital avatar and provided all rendering demos.
The proposed model and the baseline are trained on a Mandarin dataset.
Upper: baseline (frame-level). Lower: proposed.
Transcription: 随着年轻人聚集,县城就可以不断滚动提升,最终实现真正的品质提升. (English: As young people gather, the county town can keep improving step by step, ultimately achieving a genuine improvement in quality.)
The proposed S2A model is trained only on a Mandarin dataset.
The compared baseline (LipSync3D) is trained on an English dataset.
Left: proposed. Right: LipSync3D.
Transcription: I can't promise that I'll be an expert at it and be able to help you get better with it, but at least we can have some fun.
Trained only on a Mandarin talking dataset.
Transcription: 终于做了这个决定,别人怎么说我不理,只要你也一样的肯定. —— 《勇气》 (English: I've finally made this decision; no matter what others say, I won't mind, as long as you are just as certain. — from the song 《勇气》 "Courage")