• Publications

 

Liyang Chen, Tianxiang Ma, Jiawei Liu, Bingchuan Li, Zhuowei Chen, Lijie Liu, Xu He, Gen Li, Qian He, Zhiyong Wu. "Human-Centric Video Generation via Collaborative Multi-Modal Conditioning," [in] AAAI Conference on Artificial Intelligence (AAAI), vol. 40, no. 4, pp. 2939-2947. AAAI, Singapore, January 20-27, 2026.. (EI, CCF-A, THU-A)
Yuanyuan Wang, Dongchao Yang, Yiwen Shao, Hangting Chen, Jiankun Zhao, Zhiyong Wu, Helen Meng, Xixin Wu. "DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models," [in] AAAI Conference on Artificial Intelligence (AAAI), vol. 40, no. 40, pp. 33728-33736. AAAI, Singapore, January 20-27, 2026.. (EI, CCF-A, THU-A)
Haiwei Xue, Xiangyang Luo, Zhanghao Hu, Xin Zhang, Xunzhi Xiang, Yuqin Dai, Jianzhuang Liu, Zhensong Zhang, Minglei Li, Jian Yang, Fei Ma, Zhiyong Wu, Changpeng Yang, Zonghong Dai, Fei Richard Yu. "Human Motion Video Generation: A Survey," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 47, no. 11, pp. 10709-10730. IEEE, July 31, 2025.. (SCI, EI: 0253218942512, CCF-A, THU-A)
Jingbei Li, Weihao Wu, Yi Meng, Luwen Zhang, Qiao Tian, Yuping Wang, Yuxuan Wang, Xixin Wu, Zhiyong Wu, Helen Meng. "Inferring Speaking Styles for Conversational Speech Synthesis by Learning Contextual Dependencies," IEEE Transactions on Audio, Speech, and Language Processing (TASLP), vol. 33, pp. 3160-3173. IEEE, July 16, 2025.. (SCI, EI, CCF-B, THU-A)
Jinjiang Liu, Hao Li, Fei Chen, Zhiyong Wu, Xueliang Zhang. "Inplace Frequency Filtering and Cepstral Speech Modeling in Binaural Speech Enhancement," IEEE Transactions on Audio, Speech, and Language Processing (TASLP), vol. 33, pp. 2775-2787. IEEE, June 16, 2025.. (SCI, EI, CCF-B, THU-A)
Liyang Chen, Weihong Bao, Shun Lei, Boshi Tang, Zhiyong Wu, Shiyin Kang, Haozhi Huang, Helen Meng. "AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation," IEEE Transactions on Multimedia (TMM), vol. 27, pp. 3598-3609. IEEE, February 13, 2025.. (SCI, EI: 20250817931440, CCF-A, THU-A)
Haiwei Xue, Yanbo Fan, Xuan Wang, Zhiyong Wu. "Echo: Enhancing Conversational Behavior Generation via Hierarchical Semantic Comprehension with Large Language Models," [in] SIGGRAPH Asia Conference Papers (SA), pp. 1-9. ACM, Hong Kong, China, December 15-18, 2025.. (EI, CCF-A)
Zhisheng Zhang, Derui Wang, Yifan Mi, Zhiyong Wu, Jie Gao, Yuxin Cao, Kai Ye, Jason Xue, Jie Hao. "E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis," [in] Annual Conference on Neural Information Processing Systems (NeurIPS), pp. XXXX-XXXX. MIT Press, San Diego, USA, December 2-7, 2025.. (EI, CCF-A, THU-A)
Shun Lei, Yaoxun Xu, Zhiwei Lin, Huaicheng Zhang, Wei Tan, Hangting Chen, Yixuan Zhang, Chenyu Yang, Haina Zhu, Shuai Wang, Zhiyong Wu, Dong Yu. "LeVo: High-Quality Song Generation with Multi-Preference Alignment," [in] Annual Conference on Neural Information Processing Systems (NeurIPS), pp. XXXX-XXXX. MIT Press, San Diego, USA, December 2-7, 2025.. (EI, CCF-A, THU-A)
Yaoxun Xu, Hangting Chen, Jianwei Yu, Wei Tan, Shun Lei, Zhiwei Lin, Rongzhi Gu, Zhiyong Wu. "MuCodec: Ultra Low-Bitrate Music Codec for Music Generation," [in] ACM International Conference on Multimedia (ACM MM), pp. 689-698. ACM, Dublin, Ireland, October 27-31, 2025.. (EI: 20255019681816, CCF-A, THU-A)
Songtao Zhou, Xiaoyu Qin, Yixuan Zhou, Qixin Wang, Zeyu Jin, Zixuan Wang, Zhiyong Wu, Jia Jia. "HarmoniVox: Painting Voices to Match the Avatar's Soul," [in] ACM International Conference on Multimedia (ACM MM), pp. 6720-6729. ACM, Dublin, Ireland, October 27-31, 2025.. (EI: 20255019681841, CCF-A, THU-A)
Haiwei Xue, Zhensong Zhang, Minglei Li, Zonghong Dai, Fei Yu, Fei Ma, Zhiyong Wu. "VideoHumanMIB: Unlocking Appearance Decoupling for Video Human Motion In-betweening," [in] International Joint Conference on Artificial Intelligence (IJCAI), pp. 4254-4262. Morgan Kaufmann, Montreal, Canada, August 16-22, 2025.. (EI: 20254719524923, CCF-B, THU-B)
Peng Liu, Dongyang Dai, Zhiyong Wu. "RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction," [in] International Conference on Learning Representations (ICLR), pp. 39921-39953. Singapore, April 24-28, 2025.. (EI: 20252818762417, CCF-A, THU-A)
Xu He, Zhiyong Wu, Xiaoyu Li, Di Kang, Chaopeng Zhang, Jiangnan Ye, Liyang Chen, Xiangjun Gao, Han Zhang, Haolin Zhuang. "MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement," [in] AAAI Conference on Artificial Intelligence (AAAI), pp. 3437-3445. AAAI, Philadelphia, USA, February 25-March 4, 2025.. (EI: 20251818357154, CCF-A, THU-A)
Renjie Yu, Runrui Cai, Yixuan Zhou, Runchuan Ye, Zhiyong Wu. "A Dual-Branch Ensemble Framework for Personality Recognition Based on Multimodal Emotion Features," [in] International Workshop on Multimodal and Responsible Affective Computing (MRAC), pp. 51-57. Dublin, Ireland, October 31, 2025.. (EI, CCF-B)
Yinlong Zhang, Jinjiang Liu, Jiawei Jin, Jiuxin Lin, Zhiyong Wu. "CDSS: Innovating Cross Differential Attention for Robust Monaural Multi-Speaker Audio-Visual Speech Separation," [in] International Conference on Intelligent Computing (ICIC), pp. 1-18. Springer, Ningbo, China, July 26-29, 2025.. (EI, CCF-C)
Zijian Lin, Yang Zhang, Yougen Yuan, Yuming Yan, Jinjiang Liu, Zhiyong Wu, Pengfei Hu, Qun Yu. "Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 5533-5537. ISCA, Rotterdam, The Netherlands, August 17-21, 2025.. (EI: 20254419420178, CCF-B)
Jiawei Jin, Zhihan Yang, Yixuan Zhou, Zhiyong Wu. "In This Environment, As That Speaker: A Text-Driven Framework for Multi-Attribute Speech Conversion," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 1393-1397. ISCA, Rotterdam, The Netherlands, August 17-21, 2025.. (EI: 20254419419838, CCF-B)
Fengjin Li, Jie Wang, Yadong Niu, Yongqing Wang, Meng Meng, Jian Luan, Zhiyong Wu. "StarVC: A Unified Auto-Regressive Framework for Joint Text and Speech Generation in Voice Conversion," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 4593-4597. ISCA, Rotterdam, The Netherlands, August 17-21, 2025.. (EI: 20254419419786, CCF-B)
Wei Chen, Binzhu Sha, Dan Luo, Jing Yang, Zhuo Wang, Fan Fan, Zhiyong Wu. "DAFMSVC: One-Shot Singing Voice Conversion with Dual Attention Mechanism and Flow Matching," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 1263-1267. ISCA, Rotterdam, The Netherlands, August 17-21, 2025.. (EI: 20254419419812, CCF-B)
Xueyuan Chen, Dongchao Yang, Wenxuan Wu, Minglin Wu, Jing Xu, Xixin Wu, Zhiyong Wu, Helen Meng. "DiffDSR: Dysarthric Speech Reconstruction Using Latent Diffusion Model," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 2113-2117. ISCA, Rotterdam, The Netherlands, August 17-21, 2025.. (EI: 20254419419432, CCF-B)
Yaoxun Xu, Jianwei Yu, Hangting Chen, Zhiyong Wu, Xixin Wu, Dong Yu, Rongzhi Gu, Yi Luo. "WAKE: Watermarking Audio with Key Enrichment," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 5093-5097. ISCA, Rotterdam, The Netherlands, August 17-21, 2025.. (EI: 20254419420040, CCF-B)
Haiyun Li, Zhiyong Wu, Xiaofeng Xie, Jingran Xie, Yaoxun Xu, Hanyang Peng. "VoiceMark: Zero-Shot Voice Cloning-Resistant Watermarking Approach Leveraging Speaker-Specific Latents," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 5108-5112. ISCA, Rotterdam, The Netherlands, August 17-21, 2025.. (EI: 20254419420043, CCF-B)
Jingran Xie, Xiang Li, Hui Wang, Yue Yu, Yang Xiang, Xixin Wu, Zhiyong Wu. "Enhancing Generalization of Speech Large Language Models with Multi-Task Behavior Imitation and Speech-Text Interleaving," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 2430-2434. ISCA, Rotterdam, The Netherlands, August 17-21, 2025.. (EI: 20254419420499, CCF-B, Best Student Paper Finalist)
Dan Luo, Chengyuan Ma, Weiqin Li, Jun Wang, Wei Chen, Zhiyong Wu. "AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis," [in] IEEE International Conference on Multimedia and Expo (ICME), pp. 1-6. IEEE, Nantes, France, June 30-July 4, 2025.. (EI: 20254819583122, CCF-B)
Rui Niu, Weihao Wu, Jie Chen, Long Ma, Zhiyong Wu. "A Multi-Stage Framework for Multimodal Controllable Speech Synthesis," [in] IEEE International Conference on Multimedia and Expo (ICME), pp. 1-6. IEEE, Nantes, France, June 30-July 4, 2025.. (EI: 20254819583370, CCF-B)
Yuanyuan Wang, Hangting Chen, Dongchao Yang, Weiqin Li, Dan Luo, Guangzhi Li, Shan Yang, Zhiyong Wu, Helen Meng, Xixin Wu. "UniSep: Universal Target Audio Separation with Language Models at Scale," [in] IEEE International Conference on Multimedia and Expo (ICME), pp. 1-6. IEEE, Nantes, France, June 30-July 4, 2025.. (EI: 20254819583283, CCF-B)
Jingran Xie, Shun Lei, Yue Yu, Yang Xiang, Hui Wang, Xixin Wu, Zhiyong Wu. "Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, Hyderabad, India, April 6-11, 2025.. (EI: 20252718723593, CCF-B, THU-B)
Jie Gao, Haiyun Li, Zhisheng Zhang, Zhiyong Wu. "Black-Box Adversarial Defense Against Voice Conversion Using Latent Space Perturbation," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, Hyderabad, India, April 6-11, 2025.. (EI: 20252718723520, CCF-B, THU-B)
Weihao Wu, Zhiwei Lin, Yixuan Zhou, Jingbei Li, Rui Niu, Qinghua Wu, Songjun Cao, Long Ma, Zhiyong Wu. "DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, Hyderabad, India, April 6-11, 2025.. (EI: 20252718725633, CCF-B, THU-B)
Wei Chen, Binzhu Sha, Jing Yang, Zhuo Wang, Fan Fan, Zhiyong Wu. "Singing Voice Conversion with Accompaniment Using Self-Supervised Representation-Based Melody Features," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, Hyderabad, India, April 6-11, 2025.. (EI: 20251818342489, CCF-B, THU-B)
Haiwei Xue, Zhensong Zhang, Minglei Li, Zonghong Dai, Zhiyong Wu. "Identity-Preserving Audio-Driven Holistic Human Motion Video Generation," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, Hyderabad, India, April 6-11, 2025.. (EI: 20252818737665, CCF-B, THU-B)
Rui Niu, Jie Chen, Long Ma, Changhe Song, Weihao Wu, Zhiyong Wu. "Binary Representation Learning for Discriminative Acoustic Unit Discovery," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, Hyderabad, India, April 6-11, 2025.. (EI: 20252718725458, CCF-B, THU-B)
Zhiqi Huang, Dan Luo, Jun Wang, Huan Liao, Zhiheng Li, Zhiyong Wu. "Rhythmic Foley: A Framework For Seamless Audio-Visual Alignment In Video-to-Audio Synthesis," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, Hyderabad, India, April 6-11, 2025.. (EI: 20251818340869, CCF-B, THU-B)
Yuanyuan Wang, Hangting Chen, Dongchao Yang, Zhiyong Wu, Xixin Wu. "AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, Hyderabad, India, April 6-11, 2025.. (EI: 20251818339733, CCF-B, THU-B)
Shun Lei, Yixuan Zhou, Boshi Tang, Max W. Y. Lam, Feng Liu, Hangyu Liu, Jingcheng Wu, Shiyin Kang, Zhiyong Wu, Helen Meng. "SongCreator: Lyrics-based Universal Song Generation," [in] Annual Conference on Neural Information Processing Systems (NeurIPS), pp. 1-34. MIT Press, Vancouver, Canada, December 10-15, 2024.. (EI:20240405449, CCF-A, THU-A)
Yixuan Zhou, Xiaoyu Qin, Zeyu Jin, Shuoyi Zhou, Shun Lei, Songtao Zhou, Zhiyong Wu, Jia Jia. "VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling," [in] ACM International Conference on Multimedia (ACM MM), pp. 554-563. ACM, Melbourne, Australia, October 28-November 1, 2024.. (EI:20244817417008, CCF-A, THU-A, Top 4% Paper, Travel Grant)
Zeyu Jin, Jia Jia, Qixin Wang, Kehan Li, Shuoyi Zhou, Songtao Zhou, Xiaoyu Qin, Zhiyong Wu. "SpeechCraft: A Fine-Grained Expressive Speech Dataset with Natural Language Description," [in] ACM International Conference on Multimedia (ACM MM), pp. 1255-1264. ACM, Melbourne, Australia, October 28-November 1, 2024.. (SCI:INSPEC:25550569, EI:20244817417002, CCF-A, THU-A)
Xu He, Qiaochu Huang, Zhensong Zhang, Zhiwei Lin, Zhiyong Wu, Sicheng Yang, Minglei Li, Zhiyi Chen, Songcen Xu, Xiaofei Wu. "Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model," [in] IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2263-2273. IEEE/CVF, Seattle, USA, June 16-22, 2024.. (SCI:WOS:001322555902059, EI:20240166196, CCF-A, THU-A)
Yaoxun Xu, Hangting Chen, Jianwei Yu, Qiaochu Huang, Zhiyong Wu, Shixiong Zhang, Guangzhi Li, Yi Luo, Rongzhi Gu. "SECap: Speech Emotion Captioning with Large Language Model," [in] AAAI Conference on Artificial Intelligence (AAAI), pp. 19323-19331. AAAI, Vancouver, Canada, February 20-27, 2024.. (EI:20241515874366, CCF-A, THU-A)
Zilin Wang, Haolin Zhuang, Lu Li, Yinmin Zhang, Junjie Zhong, Jun Chen, Yu Yang, Boshi Tang, Zhiyong Wu. "Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations," [in] AAAI Conference on Artificial Intelligence (AAAI), pp. 301-309. AAAI, Vancouver, Canada, February 20-27, 2024.. (EI:20241515854020, CCF-A, THU-A)
Boshi Tang, Zhiyong Wu, Xixin Wu, Qiaochu Huang, Jun Chen, Shun Lei, Helen Meng. "SimCalib: Graph Neural Network Calibration Based on Similarity between Nodes," [in] AAAI Conference on Artificial Intelligence (AAAI), pp. 15267-15275. AAAI, Vancouver, Canada, February 20-27, 2024.. (EI:20241515875846, CCF-A, THU-A)
Yunrui Cai, Runchuan Ye, Jingran Xie, Yixuan Zhou, Yaoxun Xu, Zhiyong Wu. "Robust Representation Learning for Multimodal Emotion Recognition with Contrastive Learning and Mixup," [in] International Workshop on Multimodal and Responsible Affective Computing (MRAC), pp. 93-97. Melbourne, Australia, November 1, 2024.. (CCF-B)
Yaoxun Xu, Yixuan Zhou, Yunrui Cai, Jingran Xie, Runchuan Ye, Zhiyong Wu. "Multimodal Emotion Captioning Using Large Language Model with Prompt Engineering," [in] International Workshop on Multimodal and Responsible Affective Computing (MRAC), pp. 104-109. Melbourne, Australia, November 1, 2024.. (CCF-B)
Jingran Xie, Yang Xiang, Hui Wang, Xixin Wu, Zhiyong Wu, Helen Meng. "ERVQ: Leverage Residual Vector Quantization for Speech Emotion Recognition," [in] International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 456-460. Beijing, China, November 7-10, 2024..
Jingran Xie, Changhe Song, Yang Xiang, Hui Wang, Xixin Wu, Zhiyong Wu, Helen Meng. "CMAST: Efficient Speech-Text Joint Training Method to Enhance Linguistic Features Learning of Speech Representations," [in] International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 656-660. Beijing, China, November 7-10, 2024..
Shuoyi Zhou, Yixuan Zhou, Weiqing Li, Jun Chen, Runchuan Ye, Weihao Wu, Zijian Lin, Shun Lei, Zhiyong Wu. "The Codec Language Model-Based Zero-Shot Spontaneous Style TTS System for CoVoC Challenge 2024," [in] International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 496-500. Beijing, China, November 7-10, 2024..
Rui Niu, Changhe Song, Zhiyong Wu. "NLPP: A Natural Language Prosodic Prominence Dataset Assisted by ChatGPT," [in] International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 441-445. Beijing, China, November 7-10, 2024..
Wei Chen, Xintao Zhao, Jun Chen, Binzhu Sha, Zhiwei Lin, Zhiyong Wu. "RobustSVC: HuBERT-Based Melody Extractor and Adversarial Learning for Robust Singing Voice Conversion," [in] International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 164-168. Beijing, China, November 7-10, 2024..
Zhihan Yang, Chunfeng Wang, Zhiyong Wu, Jia Jia. "Inferring Agent Speaking Styles for Auditory-Visual User-Agent Conversation," [in] International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 421-425. Beijing, China, November 7-10, 2024..
Yaoxun Xu, Shixiong Zhang, Jianwei Yu, Zhiyong Wu, Dong Yu. "Comparing Discrete and Continuous Space LLMs for Speech Recognition," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 2509-2513. ISCA, Kos, Greece, September 1-5, 2024.. (EI:20240390229, CCF-B)
Weiqin Li, Peiji Yang, Yicheng Zhong, Yixuan Zhou, Zhisheng Wang, Zhiyong Wu, Xixin Wu, Helen Meng. "Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 1785-1789. ISCA, Kos, Greece, September 1-5, 2024.. (EI:20240315609, CCF-B)
Shuochen Gao, Shun Lei, Fan Zhuo, Hangyu Liu, Feng Liu, Boshi Tang, Qiaochu Huang, Shiyin Kang, Zhiyong Wu. "An End-to-end Approach for Chord-Conditioned Song Generation," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 1890-1894. ISCA, Kos, Greece, September 1-5, 2024.. (EI:20240417880, CCF-B)
Yunrui Cai, Zhiyong Wu, Jia Jia, Helen Meng. "LoRA-MER: Low-Rank Adaptation of Pre-Trained Speech Models for Multimodal Emotion Recognition Using Mutual Information," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 4658-4662. ISCA, Kos, Greece, September 1-5, 2024.. (CCF-B)
Xueyuan Chen, Dongchao Yang, Dingdong Wang, Xixin Wu, Zhiyong Wu, Helen Meng. "CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 4129-4133. ISCA, Kos, Greece, September 1-5, 2024.. (EI:20240265774, CCF-B)
Hang Su, Yuxiang Kong, Lichun Fan, Peng Gao, Yujun Wang, Zhiyong Wu. "Speaker Change Detection with Weighted-sum Knowledge Distillation based on Self-supervised Pre-trained Models," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 1655-1659. ISCA, Kos, Greece, September 1-5, 2024.. (CCF-B)
Yaoxun Xu, Xingchen Song, Zhiyong Wu, Di Wu, Zhendong Peng, Binbin Zhang. "Hydraformer: One Encoder for All Subsampling Rates," [in] IEEE International Conference on Multimedia and Expo (ICME), pp. 1-6. IEEE, Niagara Falls, Canada, July 15-19, 2024.. (CCF-B)
Tianjiao Du, Jun Chen, Jiasheng Lu, Qinmei Xu, Huan Liao, Yupeng Chen, Zhiyong Wu. "Controllable Text-to-Audio Generation with Training-Free Temporal Guidance Diffusion," [in] IEEE International Conference on Multimedia and Expo (ICME), pp. 1-6. IEEE, Niagara Falls, Canada, July 15-19, 2024.. (CCF-B)
Rui Niu, Zhiyong Wu, Changhe Song. "Representation Space Maintenance: Against Forgetting in Continual Learning," [in] IEEE International Joint Conference on Neural Networks (IJCNN), pp. 1-7. IEEE, Yokohama, Japan, June 30-July 5, 2024.. (CCF-C, THU-B)
Ming Cheng, Shun Lei, Dongyang Dai, Zhiyong Wu, Dading Chong. "NRAdapt: Noise-Robust Adaptive Text to Speech Using Untranscribed Data," [in] IEEE International Joint Conference on Neural Networks (IJCNN), pp. 1-8. IEEE, Yokohama, Japan, June 30-July 5, 2024.. (CCF-C, THU-B)
Yixuan Zhou, Shuoyi Zhou, Shun Lei, Zhiyong Wu, Menglin Wu. "The THU-HCSI Multi-Speaker Multi-Lingual Few-Shot Voice Cloning System for LIMMITS'24 Challenge," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 71-72. IEEE, Seoul, Korea, April 14-19, 2024.. (CCF-B, THU-B, 1st Place in Speaker Similarity)
Shun Lei, Yixuan Zhou, Liyang Chen, Dan Luo, Zhiyong Wu, Xixin Wu, Shiyin Kang, Tao Jiang, Yahui Zhou, Yuxing Han, Helen Meng. "Improving Language Model-based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12662-12666. IEEE, Seoul, Korea, April 14-19, 2024.. (EI:20242416240666, CCF-B, THU-B)
Xingda Li, Fan Zhuo, Dan Luo, Jun Chen, Shiyin Kang, Zhiyong Wu, Tao Jiang, Yang Li, Han Fang, Yahui Zhou. "Generating Stereophonic Music with Single-Stage Language Models," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1471-1475. IEEE, Seoul, Korea, April 14-19, 2024.. (CCF-B, THU-B)
Zhiwei Lin, Jun Chen, Boshi Tang, Binzhu Sha, Jing Yang, Yaolong Ju, Fan Fan, Shiyin Kang, Zhiyong Wu, Helen Meng. "Multi-View MidiVAE: Fusing Track- and Bar-View Representations for Long Multi-Track Symbolic Music Generation," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 941-945. IEEE, Seoul, Korea, April 14-19, 2024.. (EI:20240038542, CCF-B, THU-B)
Weinan Tong, Jiaxu Zhu, Jun Chen, Shiyin Kang, Tao Jiang, Yang Li, Zhiyong Wu, Helen Meng. "SCNet: Sparse Compression Network for Music Source Separation," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1276-1280. IEEE, Seoul, Korea, April 14-19, 2024.. (CCF-B, THU-B)
Yuanyuan Wang, Hangting Chen, Dongchao Yang, Jianwei Yu, Chao Weng, Zhiyong Wu, Helen Meng. "Consistent and Relevant: Rethink the Query Embedding in General Sound Separation," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 961-965. IEEE, Seoul, Korea, April 14-19, 2024.. (CCF-B, THU-B)
Binzhu Sha, Xu Li, Zhiyong Wu, Ying Shan, Helen Meng. "Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-Based Approach for One-Shot Singing Voice Conversion," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12577-12581. IEEE, Seoul, Korea, April 14-19, 2024.. (EI:20230450875, CCF-B, THU-B)
Hui Lu, Xixin Wu, Haohan Guo, Songxiang Liu, Zhiyong Wu, Helen Meng. "Unifying One-Shot Voice Conversion and Cloning with Disentangled Speech Representations," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 11141-11145. IEEE, Seoul, Korea, April 14-19, 2024.. (CCF-B, THU-B)
Xueyuan Chen, Xi Wang, Shaofei Zhang, Lei He, Zhiyong Wu, Xixin Wu, Helen Meng. "StyleSpeech: Self-Supervised Style Enhancing with VQ-VAE-Based Pre-Training for Expressive Audiobook Speech Synthesis," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12316-12320. IEEE, Seoul, Korea, April 14-19, 2024.. (EI:20240002562, CCF-B, THU-B)
Xueyuan Chen, Yuejiao Wang, Xixin Wu, Disong Wang, Zhiyong Wu, Xunying Liu, Helen Meng. "Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12341-12345. IEEE, Seoul, Korea, April 14-19, 2024.. (CCF-B, THU-B)
Qiaochu Huang, Xu He, Boshi Tang, Haolin Zhuang, Liyang Chen, Shuochen Gao, Zhiyong Wu, Haozhi Huang, Helen Meng. "Enhancing Expressiveness in Dance Generation Via Integrating Frequency and Music Style Information," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8185-8189. IEEE, Seoul, Korea, April 14-19, 2024.. (EI:20242416239330, CCF-B, THU-B)
Sicheng Yang, Zunnan Xu, Haiwei Xue, Yongkang Cheng, Shaoli Huang, Mingming Gong, Zhiyong Wu. "FreeTalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7945-7949. IEEE, Seoul, Korea, April 14-19, 2024.. (EI:20242416241075, CCF-B, THU-B)
Haiwei Xue, Sicheng Yang, Zhensong Zhang, Zhiyong Wu, Minglei Li, Zonghong Dai, Helen Meng. "Conversational Co-Speech Gesture Generation via Modeling Dialog Intention, Emotion, and Context with Diffusion Models," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8296-8300. IEEE, Seoul, Korea, April 14-19, 2024.. (EI:20240010380, CCF-B, THU-B)
Jingbei Li, Sipan Li, Ping Chen, Luwen Zhang, Yi Meng, Zhiyong Wu, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang. "Joint Multiscale Cross-Lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing," IEEE Transactions on Audio, Speech, and Language Processing (TASLP), vol. 32, pp. 517-528. IEEE, November 10, 2023.. (SCI, EI:20230175143, CCF-B, THU-A)
Xixin Wu, Hui Lu, Kun Li, Zhiyong Wu, Xunying Liu, Helen Meng. "Hiformer: Sequence Modeling Networks with Hierarchical Attention Mechanisms," IEEE Transactions on Audio, Speech, and Language Processing (TASLP), vol. 31, pp. 3993-4003. IEEE, September 8, 2023.. (SCI:INSPEC:23688081, EI:20233814764513, CCF-B, THU-A)
Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Xixin Wu, Shiyin Kang, Helen Meng. "MSStyleTTS: Multi-scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis," IEEE Transactions on Audio, Speech, and Language Processing (TASLP), vol. 31, pp. 3290-3303. IEEE, August 2, 2023.. (SCI, EI:20230281293, CCF-B, THU-A)
Hui Lu, Xixin Wu, Zhiyong Wu, Helen Meng. "SpeechTripleNet: End-to-end Disentangled Speech Representation Learning for Content, Timbre and Prosody," [in] ACM International Conference on Multimedia (ACM MM), pp. 2829-2837. ACM, Ottawa, Canada, October 29-November 3, 2023.. (EI:20235015224410, CCF-A, THU-A)
Sicheng Yang, Zilin Wang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Qiaochu Huang, Lei Hao, Songcen Xu, Xiaofei Wu, Changpeng Yang, Zonghong Dai. "UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons," [in] ACM International Conference on Multimedia (ACM MM), pp. 1033-1044. ACM, Ottawa, Canada, October 29-November 3, 2023.. (EI:20230332184, CCF-A, THU-A, Top 2.5%)
Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, Ming Cheng, Long Xiao. "DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models," [in] International Joint Conference on Artificial Intelligence (IJCAI), pp. 5860-5868. Morgan Kaufmann, Macao, China, August 19-25, 2023.. (EI:20233714713734, CCF-B, THU-B)
Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, Haolin Zhuang. "QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation," [in] IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2321-2330. IEEE/CVF, Vancouver, Canada, June 18-22, 2023.. (EI:20230186667, CCF-A, THU-A, Highlight, Top 2.5%)
Zhihan Yang, Zhiyong Wu, Ying Shan, Jia Jia. "What Does Your Face Sound Like? 3D Face Shape Towards Voice," [in] AAAI Conference on Artificial Intelligence (AAAI), pp. 13905-13913. AAAI, Washington DC, USA, February 7-14, 2023.. (EI:20233414581264, CCF-A, THU-A)
Sicheng Yang, Haiwei Xue, Zhensong Zhang, Minglei Li, Zhiyong Wu, Xiaofei Wu, Songcen Xu, Zonghong Dai. "The DiffuseStyleGesture+ entry to the GENEA Challenge 2023," [in] ACM International Conference on Multimodal Interaction (ICMI), pp. 779-785. ACM, Paris, France, October 9-13, 2023.. (EI:20230317714, CCF-C, THU-B, Reproducibility Award)
Liyang Chen, Zhiyong Wu, Runnan Li, Weihong Bao, Jun Ling, Xu Tan, Sheng Zhao. "VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer," [in] IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, pp. 2977-2987. Paris, France, October 2-6, 2023.. (EI:20230292957, CCF-B)
Yunrui Cai, Jingran Xie, Boshi Tang, Yuanyuan Wang, Jun Chen, Haiwei Xue, Zhiyong Wu. "First-order Multi-label Learning with Cross-modal Interactions for Multimodal Emotion Recognition," [in] International Workshop on Multimodal and Responsible Affective Computing (MRAC), pp. 13-20. Ottawa, Canada, October 29, 2023.. (CCF-B)
Yunrui Cai, Changhe Song, Boshi Tang, Dongyang Dai, Zhiyong Wu, Helen Meng. "Robust Representation Learning for Speech Emotion Recognition with Moment Exchange," [in] Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1002-1007. APSIPA, Taipei, China, October 31-November 3, 2023.. (EI:20235115257009)
Xianhao Wei, Jia Jia, Xiang Li, Zhiyong Wu, Ziyi Wang. "A Discourse-level Multi-scale Prosodic Model for Fine-grained Emotion Analysis," [in] China Multimedia (ChinaMM), pp. 1-9. Kunming, China, August 2-4, 2023.. (EI:20230345194, Best Paper)
Xiang Li, Songxiang Liu, Max W. Y. Lam, Zhiyong Wu, Chao Weng, Helen Meng. "Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 4858-4862. ISCA, Dublin, Ireland, August 20-24, 2023.. (EI:20230201740, CCF-B, Best Student Paper)
Weiqin Li, Shun Lei, Qiaochu Huang, Yixuan Zhou, Zhiyong Wu, Shiyin Kang, Helen Meng. "Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 3377-3381. ISCA, Dublin, Ireland, August 20-24, 2023.. (EI:20230331605, CCF-B)
Zhihan Yang, Shansong Liu, Xu Li, Haozhe Wu, Zhiyong Wu, Ying Shan, Jia Jia. "Prosody Modeling with 3D Visual Information for Expressive Video Dubbing," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 4863-4867. ISCA, Dublin, Ireland, August 20-24, 2023.. (EI:20233814760588, CCF-B)
Jiuxin Lin, Peng Wang, Heinrich Dinkel, Jun Chen, Zhiyong Wu, Yongqing Wang, Zhiyong Yan, Junbo Zhang, Yujun Wang. "Focus on the Sound around You: Monaural Target Speaker Extraction via Distance and Speaker Information," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 2488-2492. ISCA, Dublin, Ireland, August 20-24, 2023.. (EI:20230232439, CCF-B)
Jun Chen, Wei Rao, Zilin Wang, Jiuxin Lin, Yukai Ju, Shulin He, Yannan Wang, Zhiyong Wu. "MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 4034-4038. ISCA, Dublin, Ireland, August 20-24, 2023.. (EI:20230233757, CCF-B)
Xingchen Song, Di Wu, Binbin Zhang, Zhendong Peng, Bo Dang, Fuping Pan, Zhiyong Wu. "ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 1648-1652. ISCA, Dublin, Ireland, August 20-24, 2023.. (EI:20230191878, CCF-B)
Jiaxu Zhu, Changhe Song, Zhiyong Wu, Helen Meng. "SememeASR: Boosting Performance of End-to-end Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 3272-3276. ISCA, Dublin, Ireland, August 20-24, 2023.. (EI:20230330956, CCF-B)
Jiaxu Zhu, Weinan Tong, Yaoxun Xu, Changhe Song, Zhiyong Wu, Zhao You, Dan Su, Dong Yu, Helen Meng. "Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 1334-1338. ISCA, Dublin, Ireland, August 20-24, 2023.. (EI:20230333401, CCF-B)
Wenzhe Liu, Yupeng Shi, Jun Chen, Wei Rao, Shulin He, Andong Li, Yannan Wang, Zhiyong Wu. "Gesper: A Restoration-Enhancement Framework for General Speech Reconstruction," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 4044-4048. ISCA, Dublin, Ireland, August 20-24, 2023.. (EI:20230216014, CCF-B)
Sipan Li, Songxiang Liu, Luwen Zhang, Xiang Li, Yanyao Bian, Chao Weng, Zhiyong Wu, Helen Meng. "SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge and Periodic Inductive Bias," [in] IEEE International Conference on Multimedia and Expo (ICME), pp. 1703-1708. IEEE, Brisbane, Australia, July 10-14, 2023.. (EI:20230340577, CCF-B)
Xintao Zhao, Shuai Wang, Yang Chao, Zhiyong Wu, Helen Meng. "Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation-based Voice Conversion," [in] IEEE International Conference on Multimedia and Expo (ICME), pp. 1691-1696. IEEE, Brisbane, Australia, July 10-14, 2023.. (EI:20230198155, CCF-B)
Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Shiyin Kang, Helen Meng. "Context-Aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, Rhodes Island, Greece, June 4-10, 2023.. (EI:20230134346, CCF-B, THU-B, Top 3% Paper)
Jie Chen, Xingchen Song, Zhendong Peng, Binbin Zhang, Fuping Pan, Zhiyong Wu. "LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, Rhodes Island, Greece, June 4-10, 2023.. (EI:20230340208, CCF-B, THU-B)
Zilin Wang, Peng Liu, Jun Chen, Sipan Li, Jinfeng Bai, Gang He, Zhiyong Wu, Helen Meng. "A Synthetic Corpus Generation Method for Neural Vocoder Training," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, Rhodes Island, Greece, June 4-10, 2023.. (EI:20234715106132, CCF-B, THU-B)
Shaohuan Zhou, Xu Li, Zhiyong Wu, Ying Shan, Helen Meng. "Enhancing the Vocal Range of Single-Speaker Singing Voice Synthesis with Melody-Unsupervised Pre-Training," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, Rhodes Island, Greece, June 4-10, 2023.. (EI:20230330979, CCF-B, THU-B)
Weihong Bao, Liyang Chen, Chaoyong Zhou, Sicheng Yang, Zhiyong Wu. "WavSyncSwap: End-to-End Portrait-Customized Audio-Driven Talking Face Generation," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, Rhodes Island, Greece, June 4-10, 2023.. (EI:20234715105848, CCF-B, THU-B)
Haolin Zhuang, Shun Lei, Long Xiao, Weiqin Li, Liyang Chen, Sicheng Yang, Zhiyong Wu, Shiyin Kang, Helen Meng. "GTN-Bailando: Genre Consistent Long-Term 3D Dance Generation based on Pre-Trained Genre Token Network," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, Rhodes Island, Greece, June 4-10, 2023.. (EI:20230155848, CCF-B, THU-B)
Xingchen Song, Di Wu, Zhiyong Wu, Binbin Zhang, Yuekai Zhang, Zhendong Peng, Wenpeng Li, Fuping Pan, Changbao Zhu. "TrimTail: Low-Latency Streaming ASR with Simple but Effective Spectrogram-Level Length Penalty," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, Rhodes Island, Greece, June 4-10, 2023.. (EI:20220411197, CCF-B, THU-B)
Yaoxun Xu, Baiji Liu, Qiaochu Huang, Zhiyong Wu, Shiyin Kang, Helen Meng. "CB-Conformer: Contextual Biasing Conformer for Biased Word Recognition," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, Rhodes Island, Greece, June 4-10, 2023.. (EI:20230144602, CCF-B, THU-B)
Yujie Yang, Kun Zhang, Zhiyong Wu, Helen Meng. "Keyword-Specific Acoustic Model Pruning for Open Vocabulary Keyword Spotting," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, Rhodes Island, Greece, June 4-10, 2023.. (CCF-B, THU-B)
Yuanyuan Wang, Yang Zhang, Zhiyong Wu, Zhihan Yang, Tao Wei, Kun Zou, Helen Meng. "DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, Rhodes Island, Greece, June 4-10, 2023.. (CCF-B, THU-B)
Jun Chen, Wei Rao, Zilin Wang, Jiuxin Lin, Zhiyong Wu, Yannan Wang, Shidong Shang, Helen Meng. "Inter-SubNet: Speech Enhancement with Subband Interaction," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, Rhodes Island, Greece, June 4-10, 2023.. (EI:20230179621, CCF-B, THU-B)
Jun Chen, Yupeng Shi, Wenzhe Liu, Wei Rao, Shulin He, Andong Li, Yannan Wang, Zhiyong Wu, Shidong Shang, Chengshi Zheng. "Gesper: A Unified Framework for General Speech Restoration," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, Rhodes Island, Greece, June 4-10, 2023.. (CCF-B, THU-B)
Jiuxin Lin, Xinyu Cai, Heinrich Dinkel, Jun Chen, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Yujun Wang, Zhiyong Wu, Helen Meng. "AV-SepFormer: Cross-Attention SepFormer for Audio-Visual Target Speaker Extraction," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, Rhodes Island, Greece, June 4-10, 2023.. (EI:20230235771, CCF-B, THU-B)
Weinan Tong, Jiaxu Zhu, Jun Chen, Zhiyong Wu, Shiyin Kang, Helen Meng. "TFCNet: Time-Frequency Domain Corrector for Speech Separation," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, Rhodes Island, Greece, June 4-10, 2023.. (CCF-B, THU-B)
Xiaojun Meng, Wenlin Dai, Yasheng Wang, Baojun Wang, Zhiyong Wu, Xin Jiang, Qun Liu. "Lexicon-Injected Semantic Parsing for Task-Oriented Dialog," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, Rhodes Island, Greece, June 4-10, 2023.. (EI:20220445378, CCF-B, THU-B)
Hui Lu, Disong Wang, Xixin Wu, Zhiyong Wu, Xunying Liu, Helen Meng. "Disentangled Speech Representation Learning for One-Shot Cross-Lingual Voice Conversion Using β-VAE," [in] IEEE Spoken Language Technology Workshop (SLT), pp. 814-821. IEEE, Doha, Qatar, January 9-12, 2023.. (CCF-C)
Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu, Helen Meng, Hung-Yi Lee. "Improving the Adversarial Robustness for Speaker Verification by Self-supervised Learning," IEEE Transactions on Audio, Speech, and Language Processing (TASLP), vol. 30, pp. 202-217. IEEE, January 8, 2022.. (SCI: WOS:000742179300004, EI: 20215111368713, CCF-B, THU-A)
Jingbei Li, Yi Meng, Xixin Wu, Zhiyong Wu, Jia Jia, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang. "Inferring Speaking Styles from Multi-modal Conversational Context by Multi-scale Relational Graph Convolutional Networks," [in] ACM International Conference on Multimedia (ACM MM), pp. 5811-5820. ACM, Lisboa, Portugal, October 10-14, 2022.. (CCF-A, THU-A)
Xueyuan Chen, Shun Lei, Zhiyong Wu, Dong Xu, Weifeng Zhao, Helen Meng. "Unsupervised Multi-scale Expressive Speaking Style Modeling with Hierarchical Context Information for Audiobook Speech Synthesis," [in] International Conference on Computational Linguistics (COLING), pp. 7193-7202. Gyeongju, Korea, October 12-17, 2022.. (EI:20233014452441, CCF-B, THU-B)
Sipan Li, Luwen Zhang, Chenyu Dong, Haiwei Xue, Zhiyong Wu, Lifa Sun, Kun Li, Helen Meng. "FastFoley: Non-Autoregressive Foley Sound Generation based on Visual Semantics," [in] National Conference on Man-Machine Speech Communication (NCMMSC), pp. 252-263. Hefei, China, December 15-18, 2022.. (EI:20232414230161, CCF-C)
Xueyuan Chen, Qiaochu Huang, Xixin Wu, Zhiyong Wu, Helen Meng. "HILvoice: Human-in-the-Loop Style Selection for Elder-Facing Speech Synthesis," [in] International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 86-90. Singapore, December 11-14, 2022.. (EI:20230913638722)
Chenyi Li, Zhiyong Wu, Wei Rao, Yannan Wang, Helen Meng. "Boosting the Performance of SpEx+ by Attention and Contextual Mechanism," [in] International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 135-139. Singapore, December 11-14, 2022.. (EI:20230913638759)
Chenyi Li, Yi Li, Xuhao Du, Yaolong Ju, Shichao Hu, Zhiyong Wu. "VocEmb4SVS: Improving Singing Voice Separation with Vocal Embeddings," [in] Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 234-239. APSIPA, Chiang Mai, Thailand, November 7-10, 2022.. (EI: 20230313400673)
Sicheng Yang, Zhiyong Wu, Minglei Li, Mengchen Zhao, Jiuxin Lin, Liyang Chen, Weihong Bao. "The ReprGesture entry to the GENEA Challenge 2022," [in] ACM International Conference on Multimodal Interaction (ICMI), pp. 758-763. ACM, Bengaluru, India, November 7-11, 2022.. (EI: 20224813192683, CCF-C, THU-B)
Yulan Chen, Zhiyong Wu, Zheyan Shen, Jia Jia. "Learning from Designers: Fashion Compatibility Analysis Via Dataset Distillation," [in] IEEE International Conference on Image Processing (ICIP), pp. 856-860. IEEE, Bordeaux, France, October 16-19, 2022.. (EI: 20230413450788, CCF-C, THU-B)
Jie Chen, Changhe Song, Deyi Tuo, Xixin Wu, Shiyin Kang, Zhiyong Wu, Helen Meng. "Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 426-430. ISCA, Incheon, Korea, September 18-22, 2022.. (EI: 20224312992984, CCF-B)
Jun Chen, Wei Rao, Zilin Wang, Zhiyong Wu, Yannan Wang, Tao Yu, Shidong Shang, Helen Meng. "Speech Enhancement with Fullband-Subband Cross-Attention Network," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 976-980. ISCA, Incheon, Korea, September 18-22, 2022.. (EI: 20224312992993, CCF-B)
Shaohuan Zhou, Shun Lei, Weiya You, Deyi Tuo, Yuren You, Zhiyong Wu, Shiyin Kang, Helen Meng. "Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 4292-4296. ISCA, Incheon, Korea, September 18-22, 2022.. (EI: 20224312992850, CCF-B)
Shun Lei, Yixuan Zhou, Liyang Chen, Jiankun Hu, Zhiyong Wu, Shiyin Kang, Helen Meng. "Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 5523-5527. ISCA, Incheon, Korea, September 18-22, 2022.. (EI: 20224312993229, CCF-B)
Sicheng Yang, Methawee Tantrawenith, Haolin Zhuang, Zhiyong Wu, Aolan Sun, Jianzong Wang, Ning Cheng, Huaizhen Tang, Xintao Zhao, Jie Wang, Helen Meng. "Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 2553-2557. ISCA, Incheon, Korea, September 18-22, 2022.. (EI: 20224312993920, CCF-B)
Xiang Li, Changhe Song, Xianhao Wei, Zhiyong Wu, Jia Jia, Helen Meng. "Towards Cross-speaker Reading Style Transfer on Audiobook Dataset," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 5528-5532. ISCA, Incheon, Korea, September 18-22, 2022.. (EI: 20224312993241, CCF-B)
Yang Zhang, Zhiqiang Lv, Haibin Wu, Shanshan Zhang, Pengfei Hu, Zhiyong Wu, Hung-Yi Lee, Helen Meng. "MFA-Conformer: Multi-scale Feature Aggregation Conformer for Automatic Speaker Verification," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 306-310. ISCA, Incheon, Korea, September 18-22, 2022.. (EI: 20224312993833, CCF-B)
Yi Meng, Xiang Li, Zhiyong Wu, Tingtian Li, Zixun Sun, Xinyu Xiao, Chi Sun, Hui Zhan, Helen Meng. "CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 5533-5537. ISCA, Incheon, Korea, September 18-22, 2022.. (EI: 20224312993452, CCF-B)
Yixuan Zhou, Changhe Song, Xiang Li, Luwen Zhang, Zhiyong Wu, Yanyao Bian, Dan Su, Helen Meng. "Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 2573-2577. ISCA, Incheon, Korea, September 18-22, 2022.. (EI: 20224312992848, CCF-B)
Yixuan Zhou, Changhe Song, Jingbei Li, Zhiyong Wu, Yanyao Bian, Dan Su, Helen Meng. "Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 5518-5522. ISCA, Incheon, Korea, September 18-22, 2022.. (EI: 20224312992849, CCF-B)
Zhihan Yang, Zhiyong Wu, Jia Jia. "Speaker Characteristics Guided Speech Synthesis," [in] IEEE International Joint Conference on Neural Networks (IJCNN), pp. 1-8. IEEE, Padua, Italy, July 18-23, 2022.. (EI: 20224413031083, CCF-C, THU-B)
Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Shiyin Kang, Helen Meng. "Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7922-7926. IEEE, Singapore, May 22-27, 2022.. (EI: 20222312199470, CCF-B, THU-B)
Jingbei Li, Yi Meng, Chenyi Li, Zhiyong Wu, Helen Meng, Chao Weng, Dan Su. "Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-Based Multi-Modal Context Modeling," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7917-7921. IEEE, Singapore, May 22-27, 2022.. (EI: 20222912369189, CCF-B, THU-B)
Liyang Chen, Zhiyong Wu, Jun Ling, Runnan Li, Xu Tan, Sheng Zhao. "Transformer-S2A: Robust and Efficient Speech-to-Animation," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7247-7251. IEEE, Singapore, May 22-27, 2022.. (EI: 20222312198574, CCF-B, THU-B)
Xintao Zhao, Feng Liu, Changhe Song, Zhiyong Wu, Shiyin Kang, Deyi Tuo, Helen Meng. "Disentangling Content and Fine-Grained Prosody Information Via Hybrid ASR Bottleneck Features for Voice Conversion," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7022-7026. IEEE, Singapore, May 22-27, 2022.. (EI: 20222312198907, CCF-B, THU-B)
Xueyuan Chen, Changhe Song, Yixuan Zhou, Zhiyong Wu, Changbin Chen, Zhongqin Wu, Helen Meng. "A Character-Level Span-Based Model for Mandarin Prosodic Structure Prediction," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7602-7606. IEEE, Singapore, May 22-27, 2022.. (EI: 20222312198495, CCF-B, THU-B)
Wenlin Dai, Changhe Song, Xiang Li, Zhiyong Wu, Huashan Pan, Xiulin Li, Helen Meng. "An End-to-End Chinese Text Normalization Model Based on Rule-Guided Flat-Lattice Transformer," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7122-7126. IEEE, Singapore, May 22-27, 2022.. (EI: 20222312198496, CCF-B, THU-B)
Jingbei Li, Yi Meng, Zhiyong Wu, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang. "NeuFA: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8007-8011. IEEE, Singapore, May 22-27, 2022.. (EI: 20222312198218, CCF-B, THU-B)
Wenxuan Ye, Shaoguang Mao, Frank Soong, Wenshan Wu, Yan Xia, Jonathan Tien, Zhiyong Wu. "An Approach to Mispronunciation Detection and Diagnosis with Acoustic, Phonetic and Linguistic (APL) Embeddings," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6827-6831. IEEE, Singapore, May 22-27, 2022.. (EI: 20222312199246, CCF-B, THU-B)
Jun Chen, Zilin Wang, Deyi Tuo, Zhiyong Wu, Shiyin Kang, Helen Meng. "FullSubNet+: Channel Attention Fullsubnet with Complex Spectrograms for Speech Enhancement," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7857-7861. IEEE, Singapore, May 22-27, 2022.. (EI: 20222912369039, CCF-B, THU-B)
Xixin Wu, Shoukang Hu, Zhiyong Wu, Xunying Liu, Helen Meng. "Neural Architecture Search for Speech Emotion Recognition," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6902-6906. IEEE, Singapore, May 22-27, 2022.. (EI: 20222312198129, CCF-B, THU-B)
Haibin Wu, Po-Chun Hsu, Ji Gao, Shanshan Zhang, Shen Huang, Jian Kang, Zhiyong Wu, Helen Meng, Hung-Yi Lee. "Adversarial Sample Detection for Speaker Verification by Neural Vocoders," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 236-240. IEEE, Singapore, May 22-27, 2022.. (EI: 20222312198990, CCF-B, THU-B)
Xixin Wu, Yuewen Cao, Hui Lu, Songxiang Liu, Disong Wang, Zhiyong Wu, Xunying Liu, Helen Meng. "Speech Emotion Recognition Using Sequential Capsule Networks," IEEE Transactions on Audio, Speech, and Language Processing (TASLP), vol. 29, pp. 3280-3291. IEEE, October 15, 2021.. (SCI: WOS:000714713700004, EI: 20214311082562, CCF-B, THU-A)
Xixin Wu, Yuewen Cao, Hui Lu, Songxiang Liu, Shiyin Kang, Zhiyong Wu, Xunying Liu, Helen Meng. "Exemplar-Based Emotive Speech Synthesis," IEEE Transactions on Audio, Speech, and Language Processing (TASLP), vol. 29, pp. 874-886. IEEE, January 18, 2021.. (SCI: WOS:000619310400001, EI: 20210409830187, CCF-B, THU-A)
Suping Zhou, Jia Jia, Zhiyong Wu, Zhihan Yang, Yanfeng Wang, Wei Chen, Fanbo Meng, Shuo Huang, Jialie Shen, Xiaochuan Wang. "Inferring Emotion from Large-Scale Internet Voice Data: A Semi-supervised Curriculum Augmentation based Deep Learning Approach," [in] AAAI Conference on Artificial Intelligence (AAAI), pp. 6039-6047. AAAI, Virtual, Online, February 2-9, 2021.. (EI: 20222012114882, CCF-A, THU-A)
Yingmei Guo, Linjun Shou, Jian Pei, Ming Gong, Mingxing Xu, Zhiyong Wu, Daxin Jiang. "Learning from Multiple Noisy Augmented Data Sets for Better Cross-Lingual Spoken Language Understanding," [in] Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 3226-3237. ACL, Punta Cana, Dominican Republic, November 7-11, 2021.. (EI: 20221411909706, CCF-B, THU-A)
Yaohua Bu, Tianyi Ma, Weijun Li, Hang Zhou, Jia Jia, Shengqi Chen, Kaiyuan Xu, Dachuan Shi, Haozhe Wu, Zhihan Yang, Kun Li, Zhiyong Wu, Yuanchun Shi, Xiaobo Lu, Ziwei Liu. "PTeacher: A Computer-Aided Personalized Pronunciation Training System with Exaggerated Audio-Visual Corrective Feedback," [in] ACM Conference on Human Factors in Computing Systems (CHI), pp. 1-14. ACM, Yokohama, Japan, May 8-13, 2021.. (EI: 20212210439123, CCF-A, THU-A)
Liangqi Liu, Jiankun Hu, Zhiyong Wu, Song Yang, Songfan Yang, Jia Jia, Helen Meng. "Controllable Emphatic Speech Synthesis based on Forward Attention for Expressive Speech Synthesis," [in] IEEE Spoken Language Technology Workshop (SLT), pp. 410-414. IEEE, Shenzhen, China, January 19-22, 2021.. (EI: 20211510210781, CCF-C, Best Paper Finalist)
Huirong Huang, Zhiyong Wu, Shiyin Kang, Dongyang Dai, Jia Jia, Tianxiao Fu, Deyi Tuo, Guangzhi Lei, Peng Liu, Dan Su, Dong Yu, Helen Meng. "Speaker Independent and Multilingual/Mixlingual Speech-driven Talking Head Generation Using Phonetic Posteriorgrams," [in] Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1433-1437. APSIPA, Tokyo, Japan, December 14-17, 2021.. (EI: 20221211827369)
Aolan Sun, Jianzong Wang, Ning Cheng, Methawee Tantrawenith, Zhiyong Wu, Helen Meng, Edward Xiao, Jing Xiao. "Reconstructing Dual Learning for Neural Voice Conversion Using Relatively Few Samples," [in] IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 946-953. IEEE, Cartagena, Colombia, December 13-17, 2021.. (EI: 20221211830976, CCF-C)
Xinyu Cai, Heinrich Dinkel, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Zhiyong Wu, Yujun Wang. "A Contrastive Semi-Supervised Learning Framework For Anomaly Sound Detection," [in] Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), pp. 31-34. November 15-19, 2021..
Hui Lu, Zhiyong Wu, Xixin Wu, Xu Li, Shiyin Kang, Xunying Liu, Helen Meng. "VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 3775-3779. ISCA, Brno, Czech Republic, August 30-September 3, 2021.. (EI: 20214711186915, CCF-B)
Xiang Li, Changhe Song, Jingbei Li, Zhiyong Wu, Jia Jia, Helen Meng. "Towards Multi-Scale Style Control for Expressive Speech Synthesis," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 4673-4677. ISCA, Brno, Czech Republic, August 30-September 3, 2021.. (EI: 20214711190435, CCF-B)
Jie Wang, Jingbei Li, Xintao Zhao, Zhiyong Wu, Shiyin Kang, Helen Meng. "Adversarially Learning Disentangled Speech Representations for Robust Multi-factor Voice Conversion," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 846-850. ISCA, Brno, Czech Republic, August 30-September 3, 2021.. (EI: 20214711194412, CCF-B)
Haibin Wu, Yang Zhang, Zhiyong Wu, Dong Wang, Hung-Yi Lee. "Voting for the Right Answer: Adversarial Defense for Speaker Verification," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 4294-4298. ISCA, Brno, Czech Republic, August 30-September 3, 2021.. (EI: 20214711194533, CCF-B)
Xingchen Song, Zhiyong Wu, Yiheng Huang, Chao Weng, Dan Su, Helen Meng. "Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5894-5898. IEEE, Toronto, Canada, June 6-11, 2021.. (EI: 20213810913803, CCF-B, THU-B)
Changhe Song, Jingbei Li, Yixuan Zhou, Zhiyong Wu, Helen Meng. "Syntactic Representation Learning for Neural Network based TTS with Syntactic Parse Tree Traversal," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6064-6068. IEEE, Toronto, Canada, June 6-11, 2021.. (EI: 20213810921439, CCF-B, THU-B)
Xiong Cai, Dongyang Dai, Zhiyong Wu, Xiang Li, Jingbei Li, Helen Meng. "Emotion Controllable Speech Synthesis using Emotion-Unlabeled Dataset with the Assistance of Cross-Domain Speech Emotion Recognition," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5734-5738. IEEE, Toronto, Canada, June 6-11, 2021.. (EI: 20213810922222, CCF-B, THU-B)
Jie Wang, Yuren You, Feng Liu, Deyi Tuo, Shiyin Kang, Zhiyong Wu, Helen Meng. "The Huya Multi-speaker and Multi-style Speech Synthesis System for M2VOC Challenge 2020," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8608-8612. IEEE, Toronto, Canada, June 6-11, 2021.. (EI: 20213810913901, CCF-B, THU-B)
Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu, Helen Meng, Hung-Yi Lee. "Adversarial Defense for Automatic Speaker Verification by Cascaded Self-supervised Learning Models," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6718-6722. IEEE, Toronto, Canada, June 6-11, 2021.. (EI: 20213810914628, CCF-B, THU-B)
Bin Su, Shaoguang Mao, Frank Soong, Yan Xia, Jonathan Tien, Zhiyong Wu. "Improving Pronunciation Assessment via Ordinal Regression with Anchored Reference Samples," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7748-7752. IEEE, Toronto, Canada, June 6-11, 2021.. (EI: 20213810908107, CCF-B, THU-B)
Qicong Xie, Xiaohai Tian, Guanghou Liu, Kun Song, Lei Xie, Zhiyong Wu, Hai Li, Song Shi, Haizhou Li, Fen Hong, Hui Bu, Xin Xu. "The Multi-speaker Multi-style Voice Cloning Challenge 2021," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8613-8617. IEEE, Toronto, Canada, June 6-11, 2021.. (EI: 20213810922367, CCF-B, THU-B)
Xiong Cai, Zhiyong Wu, Kuo Zhong, Bin Su, Dongyang Dai, Helen Meng. "Unsupervised Cross-Lingual Speech Emotion Recognition Using Domain Adversarial Neural Network," [in] International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 1-5. Hong Kong, China, January 24-26, 2021.. (EI: 20211210098767)
Michael Lao BanTeng, Zhiyong Wu. "Channel-Wise Dense Connection Graph Convolutional Network for Skeleton-Based Action Recognition," [in] International Conference on Pattern Recognition (ICPR), pp. 3799-3806. IAPR, Milan, Italy, January 10-15, 2021.. (EI: 20212910658234, CCF-C, THU-B)
Xingchen Song, Zhiyong Wu, Yiheng Huang, Dan Su, Helen Meng. "SpecSwap: A Simple Data Augmentation Method for End-to-End Speech Recognition," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 581-585. ISCA, Shanghai, China, October 25-29, 2020.. (EI: 20205209692178, CCF-B)
Xingchen Song, Guangsen Wang, Yiheng Huang, Zhiyong Wu, Dan Su, Helen Meng. "Speech-XLNet: Unsupervised Acoustic Model Pretraining for Self-Attention Networks," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 3765-3769. ISCA, Shanghai, China, October 25-29, 2020.. (EI: 20205209692164, CCF-B)
Kun Zhang, Zhiyong Wu, Daode Yuan, Jian Luan, Jia Jia, Helen Meng, Binheng Song. "Re-weighted Interval Loss for Handling Data Imbalance Problem of End-to-End Keyword Spotting," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 2567-2571. ISCA, Shanghai, China, October 25-29, 2020.. (EI: 20205209692622, CCF-B)
Xiangyu Liang, Zhiyong Wu, Runnan Li, Yanqing Liu, Sheng Zhao, Helen Meng. "Enhancing Monotonicity for Robust Autoregressive Transformer TTS," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 3181-3185. ISCA, Shanghai, China, October 25-29, 2020.. (EI: 20205209692668, CCF-B)
Yuewen Cao, Songxiang Liu, Xixin Wu, Shiyin Kang, Peng Liu, Zhiyong Wu, Xunying Liu, Dan Su, Dong Yu, Helen Meng. "Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7619-7623. IEEE, Barcelona, Spain, May 4-8, 2020.. (EI: 20203309041046, CCF-B, THU-B)
Songxiang Liu, Disong Wang, Yuewen Cao, Lifa Sun, Xixin Wu, Shiyin Kang, Zhiyong Wu, Xunying Liu, Dan Su, Dong Yu, Helen Meng. "End-To-End Accent Conversion Without Using Native Utterances," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6289-6293. IEEE, Barcelona, Spain, May 4-8, 2020.. (EI: 20203309040748, CCF-B, THU-B)
Yingmei Guo, Zhiyong Wu, Mingxing Xu. "FERNet: Fine-grained Extraction and Reasoning Network for Emotion Recognition in Dialogues," [in] Asia-Pacific Chapter of the Association for Computational Linguistics and International Joint Conference on Natural Language Processing (AACL-IJCNLP), pp. 37-43. AACL, Suzhou, China, December 4-7, 2020..
Runnan Li, Zhiyong Wu, Yaohua Bu, Jia Jia, Sheng Zhao, Helen Meng. "Towards Discriminative Representation Learning for Speech Emotion Recognition," [in] International Joint Conference on Artificial Intelligence (IJCAI), pp. 5060-5066. Morgan Kaufmann, Macao, China, August 10-16, 2019.. (EI: 20194607696464, CCF-B, THU-B)
Yishuang Ning, Sheng He, Zhiyong Wu, Chunxiao Xing, Liangjie Zhang. "A Review of Deep Learning Based Speech Synthesis," Applied Sciences-Basel, vol. 9, no. 19, pp. 4050. MDPI, September, 2019.. (SCI: WOS:000496258100108)
Liangqi Liu, Zhiyong Wu, Runnan Li, Jia Jia, Helen Meng. "Learning Contextual Representation with Convolution Bank and Multi-head Self-attention for Speech Emphasis Detection," [in] Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 922-926. APSIPA, Lanzhou, China, November 18-21, 2019.. (EI: 20201308362271)
Kun Zhang, Zhiyong Wu, Jia Jia, Helen Meng, Binheng Song. "Query-by-Example Spoken Term Detection using Attentive Pooling Networks," [in] Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1267-1272. APSIPA, Lanzhou, China, November 18-21, 2019.. (EI: 20201308362101)
Yao Du, Zhiyong Wu, Shiyin Kang, Dan Su, Dong Yu, Helen Meng. "Automatic Prosodic Structure Labeling using DNN-BGRU-CRF Hybrid Neural Network," [in] Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1234-1238. APSIPA, Lanzhou, China, November 18-21, 2019.. (EI: 20201308362428)
Yao Du, Zhiyong Wu, Shiyin Kang, Dan Su, Dong Yu, Helen Meng. "Prosodic Structure Prediction using Deep Self-attention Neural Network," [in] Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 320-324. APSIPA, Lanzhou, China, November 18-21, 2019.. (EI: 20201308362388)
Yulan Chen, Zhiyong Wu, Jia Jia. "Modeling Emotion Influence Using Attention-based Graph Convolutional Recurrent Network," [in] ACM International Conference on Multimodal Interaction (ICMI), pp. 302-309. ACM, Suzhou, China, October 14-18, 2019.. (EI: 20194607696646, CCF-C, THU-B)
Hui Lu, Zhiyong Wu, Dongyang Dai, Runnan Li, Shiyin Kang, Jia Jia, Helen Meng. "One-shot Voice Conversion with Global Speaker Embeddings," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 669-673. ISCA, Graz, Austria, September 15-19, 2019.. (EI: 20194607674295, CCF-B)
Dongyang Dai, Zhiyong Wu, Shiyin Kang, Xixin Wu, Jia Jia, Dan Su, Dong Yu, Helen Meng. "Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 2090-2094. ISCA, Graz, Austria, September 15-19, 2019.. (EI: 20194607674520, CCF-B)
Jingbei Li, Zhiyong Wu, Runnan Li, Pengpeng Zhi, Song Yang, Helen Meng. "Knowledge-based Linguistic Encoding for End-to-End Mandarin Text-to-Speech Synthesis," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 4494-4498. ISCA, Graz, Austria, September 15-19, 2019.. (EI: 20194607674398, CCF-B)
Yingmei Guo, Mingxing Xu, Zhiyong Wu, Jianming Wu, Bin Su. "Multi-Scale Convolutional Recurrent Neural Network with Ensemble Method for Weakly Labeled Sound Event Detection," [in] International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW), pp. 110-114. Cambridge, UK, September 3-6, 2019.. (EI: 20200308046817)
Dongyang Dai, Zhiyong Wu, Runnan Li, Xixin Wu, Jia Jia, Helen Meng. "Learning Discriminative Features from Spectrograms Using Center Loss for Speech Emotion Recognition," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7405-7409. IEEE, Brighton, UK, May 12-17, 2019.. (EI: 20193007228731, CCF-B, THU-B)
Hui Lu, Zhiyong Wu, Runnan Li, Shiyin Kang, Jia Jia, Helen Meng. "A Compact Framework for Voice Conversion Using WaveNet Conditioned on Phonetic Posteriorgrams," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6810-6814. IEEE, Brighton, UK, May 12-17, 2019.. (EI: 20192907201683, CCF-B, THU-B)
Mu Wang, Xixin Wu, Zhiyong Wu, Shiyin Kang, Deyi Tuo, Guangzhi Li, Dan Su, Dong Yu, Helen Meng. "Quasi-fully Convolutional Neural Network with Variational Inference for Speech Synthesis," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7060-7064. IEEE, Brighton, UK, May 12-17, 2019.. (EI: 20192907202523, CCF-B, THU-B)
Runnan Li, Zhiyong Wu, Jia Jia, Sheng Zhao, Helen Meng. "Dilated Residual Network with Multi-head Self-attention for Speech Emotion Recognition," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6675-6679. IEEE, Brighton, UK, May 12-17, 2019.. (EI: 20192907202018, CCF-B, THU-B)
Shaoguang Mao, Zhiyong Wu, Jingshuai Jiang, Peiyun Liu, Frank K. Soong. "NN-based Ordinal Regression for Assessing Fluency of ESL Speech," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7420-7424. IEEE, Brighton, UK, May 12-17, 2019.. (EI: 20192907202051, CCF-B, THU-B)
Xixin Wu, Songxiang Liu, Yuewen Cao, Xu Li, Jianwei Yu, Dongyang Dai, Xi Ma, Shoukang Hu, Zhiyong Wu, Xunying Liu, Helen Meng. "Speech Emotion Recognition Using Capsule Networks," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6695-6699. IEEE, Brighton, UK, May 12-17, 2019.. (EI: 20192907201454, CCF-B, THU-B)
Yuewen Cao, Xixin Wu, Songxiang Liu, Jianwei Yu, Xu Li, Zhiyong Wu, Xunying Liu, Helen Meng. "End-to-End Code-switched TTS with Mix of Monolingual Recordings," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6935-6939. IEEE, Brighton, UK, May 12-17, 2019.. (EI: 20192907201672, CCF-B, THU-B)
Runnan Li, Zhiyong Wu, Jia Jia, Jingbei Li, Wei Chen, Helen Meng. "Inferring User Emotive State Changes in Realistic Human-Computer Conversational Dialogs," [in] ACM International Conference on Multimedia (ACM MM), pp. 136-144. ACM, Seoul, Korea, October 22-26, 2018.. (EI: 20185006246269, CCF-A, THU-A)
Kun Li, Shaoguang Mao, Xu Li, Zhiyong Wu, Helen Meng. "Automatic Lexical Stress and Pitch Accent Detection for L2 English Speech using Multi-Distribution Deep Neural Networks," Speech Communication (Speech Com), vol. 96, pp. 28-36. Elsevier, February, 2018.. (SCI: WOS:000424723700003, EI: 20174704448303, THU-B)
Jingbei Li, Zhiyong Wu, Runnan Li, Mingxing Xu, Kehua Lei, Lianhong Cai. "Multi-modal Multi-scale Speech Expression Evaluation in Computer-Assisted Language Learning," Lecture Notes in Computer Science, [in] Proc. Artificial Intelligence and Mobile Services (AIMS), vol. 10970, pp. 16-28. Seattle, USA, June 25-30, 2018.. (SCI: WOS:000443112000002, EI: 20182705519834)
Ziwei Zhu, Zhiyong Wu, Runnan Li, Yishuang Ning, Helen Meng. "Learning Frame-Level Recurrent Neural Networks Representations for Query-by-Example Spoken Term Detection on Mobile Devices," Lecture Notes in Computer Science, [in] Proc. Artificial Intelligence and Mobile Services (AIMS), vol. 10970, pp. 55-66. Seattle, USA, June 25-30, 2018.. (SCI: WOS:000443112000005, EI: 20182705519838)
Mu Wang, Zhiyong Wu, Shiyin Kang, Xixin Wu, Jia Jia, Dan Su, Dong Yu, Helen Meng. "Speech Super Resolution Using Parallel WaveNet," [in] International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 260-264. Taipei, China, November 26-29, 2018.. (EI: 20192106959272)
Ziwei Zhu, Zhiyong Wu, Runnan Li, Helen Meng, Lianhong Cai. "Siamese Recurrent Auto-Encoder Representation for Query-by-Example Spoken Term Detection," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 102-106. ISCA, Hyderabad, India, September 2-6, 2018.. (EI: 20184305969082, CCF-B)
Xixin Wu, Yuewen Cao, Mu Wang, Songxiang Liu, Shiyin Kang, Zhiyong Wu, Xunying Liu, Dan Su, Dong Yu, Helen Meng. "Rapid Style Adaptation using Residual Error Embedding for Expressive Speech Synthesis," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 3072-3076. ISCA, Hyderabad, India, September 2-6, 2018.. (EI: 20184305968770, CCF-B)
Shuai Yang, Zhiyong Wu, Binbin Shen, Helen Meng. "Detection of Glottal Closure Instants from Speech Signals: A Convolutional Neural Network based Method," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 317-321. ISCA, Hyderabad, India, September 2-6, 2018.. (EI: 20184305968631, CCF-B)
Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai. "Emotion Recognition from Variable-Length Speech Segments using Deep Learning on Spectrograms," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 3683-3687. ISCA, Hyderabad, India, September 2-6, 2018.. (EI: 20184305969207, CCF-B)
Shaoguang Mao, Zhiyong Wu, Xu Li, Runnan Li, Xixin Wu, Helen Meng. "Integrating Articulatory Features into Acoustic-Phonemic Model for Mispronunciation Detection and Diagnosis in L2 English Speech," [in] IEEE International Conference on Multimedia and Expo (ICME), pp. 1-6. IEEE, San Diego, USA, July 23-27, 2018.. (EI: 20190706509298, CCF-B)
Runnan Li, Zhiyong Wu, Yuchen Huang, Jia Jia, Helen Meng, Lianhong Cai. "Emphatic Speech Generation with Conditional Input Layer and Bidirectional LSTMs for Expressive Speech Synthesis," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5129-5133. IEEE, Calgary, Canada, April 15-20, 2018.. (EI: 20184005908536, CCF-B, THU-B)
Shaoguang Mao, Zhiyong Wu, Runnan Li, Xu Li, Helen Meng, Lianhong Cai. "Applying Multitask Learning to Acoustic-Phonemic Model for Mispronunciation Detection and Diagnosis in L2 English Speech," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6254-6258. IEEE, Calgary, Canada, April 15-20, 2018.. (EI: 20184005907878, CCF-B, THU-B)
Shaoguang Mao, Xu Li, Kun Li, Zhiyong Wu, Xunying Liu, Helen Meng. "Unsupervised Discovery of An Extended Phoneme Set in L2 English Speech for Mispronunciation Detection and Diagnosis," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6244-6248. IEEE, Calgary, Canada, April 15-20, 2018.. (EI: 20184005908409, CCF-B, THU-B)
Xixin Wu, Lifa Sun, Shiyin Kang, Songxiang Liu, Zhiyong Wu, Xunying Liu, Helen Meng. "Feature based Adaptation for Speaking Style Synthesis," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5304-5308. IEEE, Calgary, Canada, April 15-20, 2018.. (EI: 20184005907958, CCF-B, THU-B)
Mu Wang, Zhiyong Wu, Xixin Wu, Helen Meng, Shiyin Kang, Jia Jia, Lianhong Cai. "Emphatic Speech Synthesis and Control based on Characteristic Transferring in End-to-End Speech Synthesis," [in] Asian Conference on Affective Computing and Intelligent Interaction (ACII Asia), pp. 1-6. Beijing, China, May 20-22, 2018.. (EI: 20184406009875)
Yishuang Ning, Jia Jia, Zhiyong Wu, Runnan Li, Yongsheng An, Yanfeng Wang, Helen Meng. "Multi-task Deep Learning for User Intention Understanding in Speech Interaction Systems," [in] AAAI Conference on Artificial Intelligence (AAAI), pp. 161-167. AAAI, San Francisco, USA, February 4-9, 2017.. (EI: 20174104242835, CCF-A, THU-A)
Runnan Li, Zhiyong Wu, Yishuang Ning, Lifa Sun, Helen Meng, Lianhong Cai. "Spectro-Temporal Modelling with Time-Frequency LSTM and Structured Output Layer for Voice Conversion," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 3409-3413. ISCA, Stockholm, Sweden, August 20-24, 2017.. (EI: 20175204590811, CCF-B)
Yuchen Huang, Zhiyong Wu, Runnan Li, Helen Meng, Lianhong Cai. "Multi-Task Learning for Prosodic Structure Generation using BLSTM RNN with Structured Output Layer," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 779-783. ISCA, Stockholm, Sweden, August 20-24, 2017.. (EI: 20175204591488, CCF-B)
Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai. "Speech Emotion Recognition with Emotion-Pair based Framework Considering Emotion Distribution Information in Dimensional Emotion Space," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 1238-1242. ISCA, Stockholm, Sweden, August 20-24, 2017.. (EI: 20175204591394, CCF-B)
Yishuang Ning, Zhiyong Wu, Runnan Li, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai. "Learning Cross-Lingual Knowledge with Multilingual BLSTM for Emphasis Detection with Limited Training Data," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5615-5619. IEEE, New Orleans, USA, March 5-9, 2017.. (EI: 20172903955037, CCF-B, THU-B)
Runnan Li, Zhiyong Wu, Xunying Liu, Helen Meng, Lianhong Cai. "Multi-Task Learning of Structured Output Layer Bidirectional LSTMs for Speech Synthesis," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5510-5514. IEEE, New Orleans, USA, March 5-9, 2017.. (EI: 20172903955266, CCF-B, THU-B)
Xixin Wu, Shiyin Kang, Lifa Sun, Yishuang Ning, Zhiyong Wu, Helen Meng. "Attention-based Recurrent Generator with Gaussian Tolerance for Statistical Parametric Speech Synthesis," [in] Affective Social Multimedia Computing (ASMMC), pp. 1-5. Stockholm, Sweden, August 20-24, 2017..
Runnan Li, Zhiyong Wu, Helen Meng, Lianhong Cai. "DBLSTM-based Multi-Task Learning for Pitch Transformation in Voice Conversion," [in] International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 1-5. Tianjin, China, October 17-20, 2016.. (EI: 20172303743441)
Xu Li, Zhiyong Wu, Helen Meng, Jia Jia, Xiaoyan Lou, Lianhong Cai. "Phoneme Embedding and its Application to Speech Driven Talking Avatar Synthesis," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 1472-1476. ISCA, San Francisco, USA, September 8-12, 2016.. (EI: 20164603004231, CCF-B)
Xu Li, Zhiyong Wu, Helen Meng, Jia Jia, Xiaoyan Lou, Lianhong Cai. "Expressive Speech Driven Talking Avatar Synthesis with DBLSTM using Limited Amount of Emotional Bimodal Data," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 1477-1481. ISCA, San Francisco, USA, September 8-12, 2016.. (EI: 20164603004232, CCF-B)
Yaodong Tang, Zhiyong Wu, Helen Meng, Mingxing Xu, Lianhong Cai. "Analysis on Gated Recurrent Unit based Question Detection Approach," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 735-739. ISCA, San Francisco, USA, September 8-12, 2016.. (EI: 20164603003979, CCF-B)
Linchuan Li, Zhiyong Wu, Mingxing Xu, Helen Meng, Lianhong Cai. "Combining CNN and BLSTM to Extract Textual and Acoustic Features for Recognizing Stances in Mandarin Ideological Debate Competition," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 1392-1396. ISCA, San Francisco, USA, September 8-12, 2016.. (EI: 20164603003717, CCF-B)
Linchuan Li, Zhiyong Wu, Mingxing Xu, Helen Meng, Lianhong Cai. "Recognizing Stances in Mandarin Social Ideological Debates with Text and Acoustic Features," [in] IEEE International Conference on Multimedia and Expo (ICME), pp. 1-6. IEEE, Seattle, USA, July 11-15, 2016.. (EI: 20164302952120, CCF-B)
Haishu Xianyu, Mingxing Xu, Zhiyong Wu, Lianhong Cai. "Heterogeneity-Entropy based Unsupervised Feature Learning for Personality Prediction with Cross-media Data," [in] IEEE International Conference on Multimedia and Expo (ICME), pp. 1-6. IEEE, Seattle, USA, July 11-15, 2016.. (EI: 20163802815545, CCF-B)
Yaodong Tang, Yuchen Huang, Zhiyong Wu, Helen Meng, Mingxing Xu, Lianhong Cai. "Question Detection from Acoustic Features using Recurrent Neural Network with Gated Recurrent Unit," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6125-6129. IEEE, Shanghai, China, March 20-25, 2016.. (EI: 20162402488463, CCF-B, THU-B)
Quanjie Yu, Peng Liu, Zhiyong Wu, Shiyin Kang, Helen Meng, Lianhong Cai. "Learning Cross-lingual Information with Multilingual BLSTM for Speech Synthesis of Low-resource Languages," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5545-5549. IEEE, Shanghai, China, March 20-25, 2016.. (EI: 20162402488723, CCF-B, THU-B)
Xinyu Lan, Xu Li, Yishuang Ning, Zhiyong Wu, Helen Meng, Jia Jia, Lianhong Cai. "Low Level Descriptors based DBLSTM Bottleneck Feature for Speech Driven Talking Avatar," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5550-5554. IEEE, Shanghai, China, March 20-25, 2016.. (EI: 20162402488482, CCF-B, THU-B)
Zhiyong Wu, Yishuang Ning, Xiao Zang, Jia Jia, Fanbo Meng, Helen Meng, Lianhong Cai. "Generating Emphatic Speech with Hidden Markov Model for Expressive Speech Synthesis," Multimedia Tools and Applications (MTA), vol. 74, no. 22, pp. 9909-9925. Springer, July, 2015.. (SCI: WOS:000364019400005, EI: 20143600027913, CCF-C)
Zhiyong Wu, Kai Zhao, Xixin Wu, Xinyu Lan, Helen Meng. "Acoustic to Articulatory Mapping with Deep Neural Network," Multimedia Tools and Applications (MTA), vol. 74, no. 22, pp. 9889-9907. Springer, August, 2015.. (SCI: WOS:000364019400004, EI: 20143600014973, CCF-C)
Qi Lyu, Zhiyong Wu, Jun Zhu. "Polyphonic Music Modelling with LSTM-RTRBM," [in] ACM International Conference on Multimedia (ACM MM), pp. 991-994. ACM, Brisbane, Australia, October 26-30, 2015.. (EI: 20161602252616, CCF-A, THU-A)
Qi Lyu, Zhiyong Wu, Jun Zhu, Helen Meng. "Modelling High-dimensional Sequences with LSTM-RTRBM: Application to Polyphonic Music Generation," [in] International Joint Conference on Artificial Intelligence (IJCAI), pp. 4138-4139. Morgan Kaufmann, Buenos Aires, Argentina, July 25-31, 2015.. (EI: 20155101693661, CCF-B, THU-B)
Peng Liu, Quanjie Yu, Zhiyong Wu, Shiyin Kang, Helen Meng, Lianhong Cai. "A Deep Recurrent Approach for Acoustic-to-Articulatory Inversion," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4450-4454. IEEE, Brisbane, Australia, April 19-24, 2015.. (EI: 20154501510018, CCF-B, THU-B)
Yishuang Ning, Zhiyong Wu, Jia Jia, Fanbo Meng, Helen Meng, Lianhong Cai. "HMM-based Emphatic Speech Synthesis for Corrective Feedback in Computer-Aided Pronunciation Training," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4934-4938. IEEE, Brisbane, Australia, April 19-24, 2015.. (EI: 20154501509415, CCF-B, THU-B)
Yishuang Ning, Zhiyong Wu, Xiaoyan Lou, Helen Meng, Jia Jia, Lianhong Cai. "Using Tilt for Automatic Emphasis Detection with Bayesian Networks," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 578-582. ISCA, Dresden, Germany, September 6-10, 2015.. (EI: 20160902029674, CCF-B)
Xixin Wu, Zhiyong Wu, Yishuang Ning, Jia Jia, Lianhong Cai, Helen Meng. "Understanding Speaking Styles of Internet Speech Data with LSTM and Low-resource Training," [in] International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 815-820. Xi'an, China, September 21-24, 2015.. (EI: 20161502238729)
孟凡博, 吴志勇, 贾珈, 蔡莲红. "汉语重音的凸显度分析与合成," 声学学报, vol. 40, no. 1, pp. 1-11. 2015年1月.. (EI: 20151000618075)
Yuchen Huang, Mingxing Xu, Zhiyong Wu, Lianhong Cai. "Study on the Distribution of Acoustic Features Characterizing Sentence Intonation," [in] National Conference on Man-Machine Speech Communication (NCMMSC), pp. 1-4. Tianjin, China, October 25-27, 2015.. (CCF-C)
Fanbo Meng, Zhiyong Wu, Jia Jia, Helen Meng, Lianhong Cai. "Synthesizing English Emphatic Speech for Multimodal Corrective Feedback in Computer-Aided Pronunciation Training," Multimedia Tools and Applications (MTA), vol. 73, no. 1, pp. 463-489. Springer, September, 2014.. (SCI: WOS:000342418700022, EI: 20143600046713, CCF-C)
Jia Jia, Zhiyong Wu, Shen Zhang, Helen Meng, Lianhong Cai. "Head and Facial Gestures Synthesis using PAD Model for an Expressive Talking Avatar," Multimedia Tools and Applications (MTA), vol. 73, no. 1, pp. 439-461. Springer, September, 2014.. (SCI: WOS:000342418700023, EI: 20143600046670, CCF-C)
Xin Zheng, Zhiyong Wu, Helen Meng, Lianhong Cai. "Contrastive Auto-encoder for Phoneme Recognition," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2548-2552. IEEE, Florence, Italy, May 4-9, 2014.. (EI: 20143218037687, CCF-B, THU-B)
Xin Zheng, Zhiyong Wu, Helen Meng, Lianhong Cai. "Learning Dynamic Features with Neural Networks for Phoneme Recognition," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2543-2547. IEEE, Florence, Italy, May 4-9, 2014.. (EI: 20143218037686, CCF-B, THU-B)
Xiao Zang, Zhiyong Wu, Helen Meng, Jia Jia, Lianhong Cai. "Using Conditional Random Fields to Predict Focus Word Pair in Spontaneous Spoken English," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 756-760. ISCA, Singapore, September 14-18, 2014.. (EI: 20144600199537, CCF-B)
Xixin Wu, Zhiyong Wu, Jia Jia, Helen Meng, Lianhong Cai. "Automatic Speech Data Clustering with Human Perception based Weighted Distance," [in] International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 216-220. Singapore, September 14-18, 2014.. (EI: 20144900274075)
Xiao Zang, Zhiyong Wu, Yishuang Ning, Helen Meng, Lianhong Cai. "Automatic Detection of Contrastive Word Pairs using Textual and Acoustic Features," [in] IEEE International Conference on Signal Processing (ICSP), pp. 594-598. IEEE, Hangzhou, China, October 19-23, 2014.. (EI: 20153101078079)
Yuchao Fan, Mingxing Xu, Zhiyong Wu, Lianhong Cai. "Automatic Emotion Variation Detection using Multi-Scaled Sliding Window," [in] IEEE International Conference on Orange Technologies (ICOT), pp. 229-233. IEEE, Xi'an, China, September 20-23, 2014.. (EI: 20145000323155)
Xin Wang, Zhiyong Wu, Lianhong Cai. "Stable Boundary-based Non-uniform Unit Selection in Speech Synthesis," Journal of Software, vol. 25, no. Supplement 2, pp. 63-69. December, 2014.. (EI: 20152100877399)
Fanbo Meng, Zhiyong Wu, Helen Meng, Jia Jia, Lianhong Cai. "English Emphatic Speech Conversion based on a Decision Tree," Tsinghua Science and Technology, vol. 53, no. 7, pp. 1046-1051. July, 2013.. (EI: 20135217144112)
Xin Zheng, Zhiyong Wu, Binbin Shen, Helen Meng, Lianhong Cai. "Investigation of Tandem Deep Belief Network Approach for Phoneme Recognition," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7586-7590. IEEE, Vancouver, Canada, May 26-31, 2013.. (EI: 20135217121577, CCF-B, THU-B)
Jianbo Jiang, Zhiyong Wu, Mingxing Xu, Jia Jia, Lianhong Cai. "Comparing Feature Dimension Reduction Algorithms for GMM-SVM based Speech Emotion Recognition," [in] Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1-4. APSIPA, Kaohsiung, China, October 29-November 1, 2013.. (EI: 20140717305313)
Kai Zhao, Zhiyong Wu, Lianhong Cai. "A Real-time Speech Driven Talking Avatar based on Deep Neural Network," [in] Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1-4. APSIPA, Kaohsiung, China, October 29-November 1, 2013.. (EI: 20140717305312)
Jia Jia, Xiaohui Wang, Zhiyong Wu, Lianhong Cai, Helen Meng. "Modeling the Correlation between Modality Semantics and Facial Expressions," [in] Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1-4. APSIPA, Hollywood, USA, December 3-6, 2012.. (EI: 20131016079234)
Jianbo Jiang, Zhiyong Wu, Mingxing Xu, Jia Jia, Lianhong Cai. "Comparison of Adaptation Methods for GMM-SVM based Speech Emotion Recognition," [in] IEEE Spoken Language Technology Workshop (SLT), pp. 269-273. IEEE, Miami, USA, December 2-5, 2012.. (EI: 20130916065166, CCF-C)
Tao Jiang, Zhiyong Wu, Jia Jia, Lianhong Cai. "Perceptual Clustering based Unit Selection Optimization for Concatenative Text-to-Speech Synthesis," [in] International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 64-68. Hong Kong, China, December 5-8, 2012.. (EI: 20131016084519)
Chunrong Li, Zhiyong Wu, Fanbo Meng, Helen Meng, Lianhong Cai. "Detection and Emphatic Realization of Contrastive Word Pairs for Expressive Text-to-Speech Synthesis," [in] International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 93-97. Hong Kong, China, December 5-8, 2012.. (EI: 20131016084523)
Xixin Wu, Zhiyong Wu, Jia Jia, Lianhong Cai. "Adaptive Named Entity Recognition based on Conditional Random Fields with Automatic Updated Dynamic Gazetteers," [in] International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 363-367. Hong Kong, China, December 5-8, 2012.. (EI: 20131016084525)
Fanbo Meng, Zhiyong Wu, Helen Meng, Jia Jia, Lianhong Cai. "Hierarchical English Emphatic Speech Synthesis based on HMM with Limited Training Data," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 466-469. ISCA, Portland, USA, September 8-13, 2012.. (EI: 20132316399086, CCF-B)
Fanbo Meng, Zhiyong Wu, Helen Meng, Jia Jia, Lianhong Cai. "Generating Emphasis from Neutral Speech using Hierarchical Perturbation Model by Decision Tree and Support Vector Machine," [in] International Conference on Audio, Language and Image Processing (ICALIP), pp. 442-448. IEEE, Shanghai, China, July 16-18, 2012.. (EI: 20130315907216)
Zhang Zhang, Zhiyong Wu, Jia Jia, Lianhong Cai. "Modeling Prosody Pattern of Chinese Expressive Speech and Its Application in Personalized Speech Conversion," [in] International Symposium on Tonal Aspects of Languages (TAL), pp. 1-5. Nanjing, China, May 26-29, 2012..
Kai Zhao, Zhiyong Wu, Jia Jia, Lianhong Cai. "An Online Speech Driven Talking Head System," [in] IEEE Global High Tech Congress on Electronics (GHTCE), pp. 186-187. IEEE, Shenzhen, China, November 18-20, 2012..
Xin Wang, Zhiyong Wu. "An HMM-based Cantonese Speech Synthesis System," [in] IEEE Global High Tech Congress on Electronics (GHTCE), pp. 141-142. IEEE, Shenzhen, China, November 18-20, 2012..
姜涛, 吴志勇, 蔡莲红. "语音合成自然度的客观度量实验研究," [in] 中国语音学学术会议 (PCC), pp. 1-6. 上海, 2012年5月18日-20日..
Hui Pang, Zhiyong Wu, Lianhong Cai. "Modeling Pitch Contour of Chinese Mandarin Sentences with the PENTA Model," [in] National Conference on Man-Machine Speech Communication (NCMMSC), pp. 1-7. Xi'an, China, October 16-18, 2011.. (EI: 20123215322698, CCF-C, Best Student Paper)
Binbin Shen, Zhiyong Wu, Yongxin Wang, Lianhong Cai. "Combining Active and Semi-supervised Learning for Homograph Disambiguation in Mandarin Text-to-Speech Synthesis," [in] Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 2165-2168. ISCA, Florence, Italy, August 27-31, 2011.. (EI: 20123715411045, CCF-B)
陈龙, 吴志勇, 袁春, 蒙美玲, 蔡莲红. "面向数字版权管理的声纹辅助认证系统," [in] 全国人机语音通讯学术会议 (NCMMSC), pp. 1-7. 陕西西安, 2011年10月16日-18日.. (CCF-C)
Zhiyong Wu, Lianhong Cai, Helen Meng. "Modeling Prosody Patterns for Chinese Expressive Text-to-Speech Synthesis," [in] International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 148-152. Tainan, China, November 29-December 3, 2010.. (EI: 20110713663203)
Fanbo Meng, Helen Meng, Zhiyong Wu, Lianhong Cai. "Synthesizing Expressive Speech to Convey Focus using a Perturbation Model for Computer-Aided Pronunciation Training," [in] Second Language Studies: Acquisition, Learning, Education and Technology (L2WS), pp. 1-4. Tokyo, Japan, September 22-27, 2010..
Quansheng Duan, Shiyin Kang, Zhiyong Wu, Lianhong Cai, Zhiwei Shuang, Yong Qin. "Comparison of Syllable/Phone HMM Based Mandarin TTS," [in] International Conference on Pattern Recognition (ICPR), pp. 4496-4499. IAPR, Istanbul, Turkey, August 23-26, 2010.. (EI: 20104613390878, CCF-C, THU-B)
Shen Zhang, Zhiyong Wu, Helen Meng, Lianhong Cai. "Facial Expression Synthesis based on Emotion Dimensions for Affective Talking Avatar," Smart Innovation, Systems and Technologies (SIST), Modeling Machine Emotions for Realizing Intelligence, vol. 2010, no. 1, pp. 109-132. 2010.. (EI: 20123715421851)
张章, 贾珈, 蔡莲红, 吴志勇. "汉语音高模式及参数化描述的研究," [in] 中国语音学学术会议 (PCC), pp. 1-6. 天津, 2010年5月28日-30日..
Zhiyong Wu, Helen Meng, Hongwu Yang, Lianhong Cai. "Modeling the Expressivity of Input Text Semantics for Chinese Text-to-Speech Synthesis in a Spoken Dialog System," IEEE Transactions on Audio, Speech, and Language Processing (TASLP), vol. 17, no. 8, pp. 1567-1577. IEEE, November, 2009.. (SCI: WOS:000268903600010, EI: 20093612281690, CCF-B, THU-A)
Zhiyong Wu, Guangqi Cao, Helen Meng, Lianhong Cai. "A Unified Framework for Multilingual Text-to-Speech Synthesis with SSML Specification as Interface," [in] National Conference on Man-Machine Speech Communication (NCMMSC), pp. 623-630. Lanzhou, China, August 14-16, 2009.. (EI: 20094012358727, CCF-C)
Zhiyong Wu, Guangqi Cao, Helen Meng, Lianhong Cai. "A Unified Framework for Multilingual Text-to-Speech Synthesis with SSML Specification as Interface," Tsinghua Science and Technology, vol. 14, no. 5, pp. 623-630. Lanzhou, China, August 14-16, 2009.. (EI: 20094012358727)
段全盛, 康世胤, 双志伟, 吴志勇, 蔡莲红, 秦勇. "一种适合HMM汉语语音合成的建模单元挑选算法," [in] 全国人机语音通讯学术会议 (NCMMSC), pp. 434-439. 甘肃兰州, 2009年8月14日-16日.. (CCF-C)
Honglei Cong, Zhiyong Wu, Lianhong Cai, Helen Meng. "A New Prosodic Strength Calculation Method for Prosody Reduction Modeling," [in] International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 53-56. Kunming, China, December 16-19, 2008.. (EI: 20091011939031, Best Paper Finalist)
Zhiyong Wu, Jiying Wu, Helen Meng. "The Use of Dynamic Deformable Templates for Lip Tracking in an Audio-Visual Corpus with Large Variations in Head Pose, Face Illumination and Lip Shapes," [in] International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 370-373. Kunming, China, December 16-19, 2008.. (EI: 20091011939107)
Xinxin Zhou, Zhiyong Wu, Chun Yuan, Yuzhuo Zhong. "Document Structure Analysis and Text Normalization for Chinese Putonghua and Cantonese Text-to-Speech Synthesis," [in] International Symposium on Intelligent Information Technology Application (IITA), pp. 477-481. Shanghai, China, December 20-22, 2008.. (EI: 20091411996990)
Yu Wang, Zhiyong Wu, Lianhong Cai, Helen Meng. "Modeling the Synchrony between Audio and Visual Modalities for Speaker Identification," [in] Phonetic Conference of China and the International Symposium on Phonetic Frontiers (PCC), pp. 1-5. Beijing, China, April 18-20, 2008..
Shen Zhang, Zhiyong Wu, Helen Meng, Lianhong Cai. "Facial Expression Synthesis Using PAD Emotional Parameters for a Chinese Expressive Avatar," [in] International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 24-35. Lisbon, Portugal, September 12-14, 2007.. (EI: 20080311024879)
Shen Zhang, Zhiyong Wu, Helen Meng, Lianhong Cai. "Head Movement Synthesis based on Semantic and Prosodic Features for a Chinese Expressive Avatar," [in] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 837-840. IEEE, Hawaii, USA, April 15-20, 2007.. (EI: 20073210745929, CCF-B, THU-B)
Zhiyong Wu, Helen Meng, Hui Ning, Sam Tse. "A Corpus-based Approach for Cooperative Response Generation in a Dialog System," [in] International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 614-626. Singapore, December 13-16, 2006.. (SCI: WOS:000244824800058, EI: 20100912736122)
Hongwu Yang, Helen Meng, Zhiyong Wu, Lianhong Cai. "Modelling the Global Acoustic Correlates of Expressivity for Chinese Text-to-speech Synthesis," [in] IEEE Spoken Language Technology Workshop (SLT), pp. 138-141. IEEE, Palm Beach, Aruba, December 10-13, 2006.. (EI: 20083311451167, CCF-C)
Zhiyong Wu, Shen Zhang, Lianhong Cai, Helen Meng. "Real-time Synthesis of Chinese Visual Speech and Facial Expressions using MPEG-4 FAP Features in a Three-dimensional Avatar," [in] International Conference on Spoken Language Processing (INTERSPEECH - ICSLP), pp. 1802-1805. Pittsburgh, USA, September 17-21, 2006.. (EI: 20082511324456, THU-B)
Zhiyong Wu, Lianhong Cai, Helen Meng. "Weight Estimation for Audio-Visual Multi-level Fusion in Bimodal Speaker Identification," [in] International Conference on Intelligent Computing (ICIC), pp. 1107-1112. Springer, Kunming, China, August 16-19, 2006.. (SCI: WOS:000240385300144, CCF-C)
Zhiyong Wu, Lianhong Cai. "Audio-Visual Bimodal Speaker Identification Using Dynamic Bayesian Networks," Journal of Computer Research and Development, vol. 43, no. 3, pp. 470-475. March, 2006.. (EI: 2006239925198, THU-B)
Zhiyong Wu, Lianhong Cai, Lei Ma, Jia Jia. "Design and Implementation of a Multi-Biometric Platform," Mini-Micro Systems, vol. 27, no. 2, pp. 375-379. February, 2006..
Zhiyong Wu, Lianhong Cai, Helen Meng. "Multi-level Fusion of Audio and Visual Features for Speaker Identification," [in] International Conference on Biometrics (ICB), pp. 493-499. Hong Kong, China, January 5-7, 2006.. (SCI: WOS:000235768300066, EI: 2006249940530, CCF-C, THU-B)
吴志勇, 蔡莲红, 蒙美玲. "可视语音合成中基于音视频关联模型的视位参数优化," [in] 全国人机语音通讯学术会议 (NCMMSC), pp. 334-337. 北京, 2005年10月22日-24日.. (CCF-C, Best Paper)
Zhiyong Wu, Lianhong Cai, Rui Cai. "Perceptual Evaluation Weight Training for Text-to-Speech Synthesis," Tsinghua Science and Technology, vol. 45, no. 1, pp. 52-56. January, 2005.. (EI: 2005139014229)
Zhiyong Wu, Lianhong Cai. "Prosodic Correlation Model in Text-to-Speech Synthesis," Journal of Chinese Information Processing, vol. 18, no. 2, pp. 44-50. February, 2004..
Zhiming Wang, Lianhong Cai, Zhiyong Wu, Jianhua Tao. "Study of Text to Visual Speech in Chinese," Mini-Micro Systems, vol. 23, no. 4, pp. 474-477. April, 2002..
Jianhua Tao, Lianhong Cai, Shixia Zhao, Zhiyong Wu. "The Study of the Trainable Prosodic Model for the Chinese Text to Speech System," ACTA ACUSTICA, vol. 26, no. 1, pp. 67-72. January, 2001..
Zhiyong Wu, Lianhong Cai, Jianhua Tao. "基于汉语韵律参数的语音基元选取," [in] 全国人机语音通讯学术会议 (NCMMSC), pp. 199-202. 广东深圳, 2001年11月20日-22日.. (CCF-C)