• Publications

 

Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu, Helen Meng, Hung-Yi Lee, "Improving the Adversarial Robustness for Speaker Verification by Self-supervised Learning," IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), vol. 30, pp. 202-217. IEEE, January, 2022. (SCI: WOS:000742179300004, EI: 20215111368713, THU-A) Paper
Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Shiyin Kang, Helen Meng, "Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7922-7926. Singapore, May 22-27, 2022. (EI: 20222312199470, CCF-B) Paper Demo
Jingbei Li, Yi Meng, Chenyi Li, Zhiyong Wu, Helen Meng, Chao Weng, Dan Su, "Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-Based Multi-Modal Context Modeling," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7917-7921. Singapore, May 22-27, 2022. (EI, CCF-B) Paper Demo
Liyang Chen, Zhiyong Wu, Jun Ling, Runnan Li, Xu Tan, Sheng Zhao, "Transformer-S2A: Robust and Efficient Speech-to-Animation," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7247-7251. Singapore, May 22-27, 2022. (EI: 20222312198574, CCF-B) Paper Demo
Xintao Zhao, Feng Liu, Changhe Song, Zhiyong Wu, Shiyin Kang, Deyi Tuo, Helen Meng, "Disentangling Content and Fine-Grained Prosody Information Via Hybrid ASR Bottleneck Features for Voice Conversion," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7022-7026. Singapore, May 22-27, 2022. (EI: 20222312198907, CCF-B) Paper Demo
Xueyuan Chen, Changhe Song, Yixuan Zhou, Zhiyong Wu, Changbin Chen, Zhongqin Wu, Helen Meng, "A Character-Level Span-Based Model for Mandarin Prosodic Structure Prediction," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7602-7606. Singapore, May 22-27, 2022. (EI: 20222312198495, CCF-B) Paper Code Demo
Wenlin Dai, Changhe Song, Xiang Li, Zhiyong Wu, Huashan Pan, Xiulin Li, Helen Meng, "An End-to-End Chinese Text Normalization Model Based on Rule-Guided Flat-Lattice Transformer," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7122-7126. Singapore, May 22-27, 2022. (EI: 20222312198496, CCF-B) Paper Code
Jingbei Li, Yi Meng, Zhiyong Wu, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang, "NeuFA: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8007-8011. Singapore, May 22-27, 2022. (EI: 20222312198218, CCF-B) Paper Code
Wenxuan Ye, Shaoguang Mao, Frank Soong, Wenshan Wu, Yan Xia, Jonathan Tien, Zhiyong Wu, "An Approach to Mispronunciation Detection and Diagnosis with Acoustic, Phonetic and Linguistic (APL) Embeddings," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6827-6831. Singapore, May 22-27, 2022. (EI: 20222312199246, CCF-B) Paper Demo
Jun Chen, Zilin Wang, Deyi Tuo, Zhiyong Wu, Shiyin Kang, Helen Meng, "FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7857-7861. Singapore, May 22-27, 2022. (EI, CCF-B) Paper Code Demo
Xixin Wu, Shoukang Hu, Zhiyong Wu, Xunying Liu, Helen Meng, "Neural Architecture Search for Speech Emotion Recognition," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6902-6906. Singapore, May 22-27, 2022. (EI: 20222312198129, CCF-B) Paper
Haibin Wu, Po-Chun Hsu, Ji Gao, Shanshan Zhang, Shen Huang, Jian Kang, Zhiyong Wu, Helen Meng, Hung-Yi Lee, "Adversarial Sample Detection for Speaker Verification by Neural Vocoders," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 236-240. Singapore, May 22-27, 2022. (EI: 20222312198990, CCF-B) Paper Code
Xixin Wu, Yuewen Cao, Hui Lu, Songxiang Liu, Disong Wang, Zhiyong Wu, Xunying Liu, Helen Meng, "Speech Emotion Recognition Using Sequential Capsule Networks," IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), vol. 29, pp. 3280-3291. IEEE, October, 2021. (SCI: WOS:000714713700004, EI: 20214311082562, THU-A) Paper
Xixin Wu, Yuewen Cao, Hui Lu, Songxiang Liu, Shiyin Kang, Zhiyong Wu, Xunying Liu, Helen Meng, "Exemplar-Based Emotive Speech Synthesis," IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), vol. 29, pp. 874-886. IEEE, January, 2021. (SCI: WOS:000619310400001, EI: 20210409830187, THU-A) Paper Demo
Yingmei Guo, Linjun Shou, Jian Pei, Ming Gong, Mingxing Xu, Zhiyong Wu, Daxin Jiang, "Learning from Multiple Noisy Augmented Data Sets for Better Cross-Lingual Spoken Language Understanding," [in] Proc. 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1-12. Punta Cana, Dominican Republic, November 7-11, 2021. (EI: 20221411909706, THU-A) Paper
Yaohua Bu, Tianyi Ma, Weijun Li, Hang Zhou, Jia Jia, Shengqi Chen, Kaiyuan Xu, Dachuan Shi, Haozhe Wu, Zhihan Yang, Kun Li, Zhiyong Wu, Yuanchun Shi, Xiaobo Lu, Ziwei Liu, "PTeacher: A Computer-Aided Personalized Pronunciation Training System with Exaggerated Audio-Visual Corrective Feedback," [in] Proc. 2021 CHI Conference on Human Factors in Computing Systems (CHI), pp. 1-14. Yokohama, Japan, May 8-13, 2021. (EI: 20212210439123, CCF-A) Paper Demo
Suping Zhou, Jia Jia, Zhiyong Wu, Zhihan Yang, Yanfeng Wang, Wei Chen, Fanbo Meng, Shuo Huang, Jialie Shen, Xiaochuan Wang, "Inferring Emotion from Large-Scale Internet Voice Data: A Semi-supervised Curriculum Augmentation based Deep Learning Approach," [in] Proc. the 35th AAAI Conference on Artificial Intelligence (AAAI), pp. 6039-6047. Virtual, Online, February 2-9, 2021. (EI: 20222012114882, CCF-A) Paper
Liangqi Liu, Jiankun Hu, Zhiyong Wu, Song Yang, Songfan Yang, Jia Jia, Helen Meng, "Controllable Emphatic Speech Synthesis based on Forward Attention for Expressive Speech Synthesis," [in] Proc. IEEE Spoken Language Technology Workshop (SLT), pp. 410-414. Shenzhen, China, January 19-22, 2021. (EI: 20211510210781, Best Paper Finalist) Paper Demo
Huirong Huang, Zhiyong Wu, Shiyin Kang, Dongyang Dai, Jia Jia, Tianxiao Fu, Deyi Tuo, Guangzhi Lei, Peng Liu, Dan Su, Dong Yu, Helen Meng, "Speaker Independent and Multilingual/Mixlingual Speech-driven Talking Head Generation Using Phonetic Posteriorgrams," [in] Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1433-1437. Tokyo, Japan, December 14-17, 2021. (EI: 20221211827369) Paper Demo
Aolan Sun, Jianzong Wang, Ning Cheng, Methawee Tantrawenith, Zhiyong Wu, Helen Meng, Edward Xiao, Jing Xiao, "Reconstructing Dual Learning for Neural Voice Conversion Using Relatively Few Samples," [in] Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 946-953. December 13-17, 2021. (EI: 20221211830976) Paper
Xinyu Cai, Heinrich Dinkel, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Zhiyong Wu, Yujun Wang, "A Contrastive Semi-Supervised Learning Framework For Anomaly Sound Detection," [in] Proc. Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), pp. 31-34. November 15-19, 2021. Paper Code Demo
Hui Lu, Zhiyong Wu, Xixin Wu, Xu Li, Shiyin Kang, Xunying Liu, Helen Meng, "VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis," [in] Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 3775-3779. Brno, Czech Republic, August 30-September 3, 2021. (EI: 20214711186915, CCF-C) Paper Code Demo
Xiang Li, Changhe Song, Jingbei Li, Zhiyong Wu, Jia Jia, Helen Meng, "Towards Multi-Scale Style Control for Expressive Speech Synthesis," [in] Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 4673-4677. Brno, Czech Republic, August 30-September 3, 2021. (EI: 20214711190435, CCF-C) Paper Demo
Jie Wang, Jingbei Li, Xintao Zhao, Zhiyong Wu, Shiyin Kang, Helen Meng, "Adversarially Learning Disentangled Speech Representations for Robust Multi-factor Voice Conversion," [in] Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 846-850. Brno, Czech Republic, August 30-September 3, 2021. (EI: 20214711194412, CCF-C) Paper Demo
Haibin Wu, Yang Zhang, Zhiyong Wu, Dong Wang, Hung-Yi Lee, "Voting for the Right Answer: Adversarial Defense for Speaker Verification," [in] Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 4294-4298. Brno, Czech Republic, August 30-September 3, 2021. (EI: 20214711194533, CCF-C) Paper Code Demo
Xingchen Song, Zhiyong Wu, Yiheng Huang, Chao Weng, Dan Su, Helen Meng, "Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5894-5898. Toronto, Canada, June 6-11, 2021. (EI: 20213810913803, CCF-B) Paper
Changhe Song, Jingbei Li, Yixuan Zhou, Zhiyong Wu, Helen Meng, "Syntactic Representation Learning for Neural Network based TTS with Syntactic Parse Tree Traversal," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6064-6068. Toronto, Canada, June 6-11, 2021. (EI: 20213810921439, CCF-B) Paper Demo
Xiong Cai, Dongyang Dai, Zhiyong Wu, Xiang Li, Jingbei Li, Helen Meng, "Emotion Controllable Speech Synthesis using Emotion-Unlabeled Dataset with the Assistance of Cross-Domain Speech Emotion Recognition," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5734-5738. Toronto, Canada, June 6-11, 2021. (EI: 20213810922222, CCF-B) Paper Code Demo
Jie Wang, Yuren You, Feng Liu, Deyi Tuo, Shiyin Kang, Zhiyong Wu, Helen Meng, "The Huya Multi-speaker and Multi-style Speech Synthesis System for M2VOC Challenge 2020," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8608-8612. Toronto, Canada, June 6-11, 2021. (EI: 20213810913901, CCF-B) Paper
Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu, Helen Meng, Hung-Yi Lee, "Adversarial Defense for Automatic Speaker Verification by Cascaded Self-supervised Learning Models," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6718-6722. Toronto, Canada, June 6-11, 2021. (EI: 20213810914628, CCF-B) Paper
Bin Su, Shaoguang Mao, Frank Soong, Yan Xia, Jonathan Tien, Zhiyong Wu, "Improving Pronunciation Assessment via Ordinal Regression with Anchored Reference Samples," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7748-7752. Toronto, Canada, June 6-11, 2021. (EI: 20213810908107, CCF-B) Paper
Qicong Xie, Xiaohai Tian, Guanghou Liu, Kun Song, Lei Xie, Zhiyong Wu, Hai Li, Song Shi, Haizhou Li, Fen Hong, Hui Bu, Xin Xu, "The Multi-speaker Multi-style Voice Cloning Challenge 2021," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8613-8617. Toronto, Canada, June 6-11, 2021. (EI: 20213810922367, CCF-B) Paper
Xiong Cai, Zhiyong Wu, Kuo Zhong, Bin Su, Dongyang Dai, Helen Meng, "Unsupervised Cross-Lingual Speech Emotion Recognition Using Domain Adversarial Neural Network," [in] Proc. International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 1-5. Hong Kong, China, January 24-26, 2021. (EI: 20211210098767) Paper
Michael Lao BanTeng, Zhiyong Wu, "Channel-Wise Dense Connection Graph Convolutional Network for Skeleton-Based Action Recognition," [in] Proc. International Conference on Pattern Recognition (ICPR), pp. 3799-3806. Milan, Italy, January 10-15, 2021. (EI: 20212910658234, THU-B) Paper
Xingchen Song, Zhiyong Wu, Yiheng Huang, Dan Su, Helen Meng, "SpecSwap: A Simple Data Augmentation Method for End-to-End Speech Recognition," [in] Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 581-585. Shanghai, China, October 25-29, 2020. (EI: 20205209692178, CCF-C) Paper
Xingchen Song, Guangsen Wang, Yiheng Huang, Zhiyong Wu, Dan Su, Helen Meng, "Speech-XLNet: Unsupervised Acoustic Model Pretraining For Self-Attention Networks," [in] Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 3765-3769. Shanghai, China, October 25-29, 2020. (EI: 20205209692164, CCF-C) Paper
Kun Zhang, Zhiyong Wu, Daode Yuan, Jian Luan, Jia Jia, Helen Meng, Binheng Song, "Re-weighted Interval Loss for Handling Data Imbalance Problem of End-to-End Keyword Spotting," [in] Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 2567-2571. Shanghai, China, October 25-29, 2020. (EI: 20205209692622, CCF-C) Paper
Xiangyu Liang, Zhiyong Wu, Runnan Li, Yanqing Liu, Sheng Zhao, Helen Meng, "Enhancing Monotonicity for Robust Autoregressive Transformer TTS," [in] Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 3181-3185. Shanghai, China, October 25-29, 2020. (EI: 20205209692668, CCF-C) Paper Demo
Yuewen Cao, Songxiang Liu, Xixin Wu, Shiyin Kang, Peng Liu, Zhiyong Wu, Xunying Liu, Dan Su, Dong Yu, Helen Meng, "Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7619-7623. Barcelona, Spain, May 4-8, 2020. (EI: 20203309041046, CCF-B) Paper Demo
Songxiang Liu, Disong Wang, Yuewen Cao, Lifa Sun, Xixin Wu, Shiyin Kang, Zhiyong Wu, Xunying Liu, Dan Su, Dong Yu, Helen Meng, "End-To-End Accent Conversion Without Using Native Utterances," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6289-6293. Barcelona, Spain, May 4-8, 2020. (EI: 20203309040748, CCF-B) Paper Demo
Yingmei Guo, Zhiyong Wu, Mingxing Xu, "FERNet: Fine-grained Extraction and Reasoning Network for Emotion Recognition in Dialogues," [in] Proc. Asia-Pacific Chapter of the Association for Computational Linguistics and International Joint Conference on Natural Language Processing (AACL-IJCNLP), pp. 37-43. Suzhou, China, December 4-7, 2020. Paper
Runnan Li, Zhiyong Wu, Jia Jia, Yaohua Bu, Sheng Zhao, Helen Meng, "Towards Discriminative Representation Learning for Speech Emotion Recognition," [in] Proc. International Joint Conference on Artificial Intelligence (IJCAI), pp. 5060-5066. Macao, China, August 10-16, 2019. (EI: 20194607696464, CCF-A) Paper Code
Yishuang Ning, Sheng He, Zhiyong Wu, Chunxiao Xing, Liangjie Zhang, "A Review of Deep Learning Based Speech Synthesis," Applied Sciences-Basel, vol. 9, no. 19, pp. 4050. MDPI, September, 2019. (SCI: WOS:000496258100108) Paper
Liangqi Liu, Zhiyong Wu, Runnan Li, Jia Jia, Helen Meng, "Learning Contextual Representation with Convolution Bank and Multi-head Self-attention for Speech Emphasis Detection," [in] Proc. APSIPA Annual Summit and Conference (APSIPA ASC), pp. 922-926. Lanzhou, China, November 18-21, 2019. (EI: 20201308362271) Paper
Kun Zhang, Zhiyong Wu, Jia Jia, Helen Meng, Binheng Song, "Query-by-Example Spoken Term Detection using Attentive Pooling Networks," [in] Proc. APSIPA Annual Summit and Conference (APSIPA ASC), pp. 1267-1272. Lanzhou, China, November 18-21, 2019. (EI: 20201308362101) Paper
Yao Du, Zhiyong Wu, Shiyin Kang, Dan Su, Dong Yu, Helen Meng, "Automatic Prosodic Structure Labeling using DNN-BGRU-CRF Hybrid Neural Network," [in] Proc. APSIPA Annual Summit and Conference (APSIPA ASC), pp. 1234-1238. Lanzhou, China, November 18-21, 2019. (EI: 20201308362428) Paper
Yao Du, Zhiyong Wu, Shiyin Kang, Dan Su, Dong Yu, Helen Meng, "Prosodic Structure Prediction using Deep Self-attention Neural Network," [in] Proc. APSIPA Annual Summit and Conference (APSIPA ASC), pp. 320-324. Lanzhou, China, November 18-21, 2019. (EI: 20201308362388) Paper
Yulan Chen, Zhiyong Wu, Jia Jia, "Modeling Emotion Influence Using Attention-based Graph Convolutional Recurrent Network," [in] Proc. International Conference on Multimodal Interaction (ICMI), pp. 302-309. Suzhou, China, October 14-18, 2019. (EI: 20194607696646, CCF-C) Paper
Hui Lu, Zhiyong Wu, Dongyang Dai, Runnan Li, Shiyin Kang, Jia Jia, Helen Meng, "One-shot Voice Conversion with Global Speaker Embeddings," [in] Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 669-673. Graz, Austria, September 15-19, 2019. (EI: 20194607674295, CCF-C) Paper Demo
Dongyang Dai, Zhiyong Wu, Shiyin Kang, Xixin Wu, Jia Jia, Dan Su, Dong Yu, Helen Meng, "Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT," [in] Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 2090-2094. Graz, Austria, September 15-19, 2019. (EI: 20194607674520, CCF-C) Paper
Jingbei Li, Zhiyong Wu, Runnan Li, Pengpeng Zhi, Song Yang, Helen Meng, "Knowledge-based Linguistic Encoding for End-to-End Mandarin Text-to-Speech Synthesis," [in] Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 4494-4498. Graz, Austria, September 15-19, 2019. (EI: 20194607674398, CCF-C) Paper
Yingmei Guo, Mingxing Xu, Zhiyong Wu, Jianming Wu, Bin Su, "Multi-Scale Convolutional Recurrent Neural Network with Ensemble Method for Weakly Labeled Sound Event Detection," [in] Proc. International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW), pp. 110-114. Cambridge, UK, September 3-6, 2019. (EI: 20200308046817) Paper
Dongyang Dai, Zhiyong Wu, Runnan Li, Xixin Wu, Jia Jia, Helen Meng, "Learning Discriminative Features from Spectrograms Using Center Loss for Speech Emotion Recognition," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7405-7409. Brighton, UK, May 12-17, 2019. (EI: 20193007228731, CCF-B) Paper
Hui Lu, Zhiyong Wu, Runnan Li, Shiyin Kang, Jia Jia, Helen Meng, "A Compact Framework for Voice Conversion Using Wavenet Conditioned on Phonetic Posteriorgrams," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6810-6814. Brighton, UK, May 12-17, 2019. (EI: 20192907201683, CCF-B) Paper Demo
Mu Wang, Xixin Wu, Zhiyong Wu, Shiyin Kang, Deyi Tuo, Guangzhi Li, Dan Su, Dong Yu, Helen Meng, "Quasi-fully Convolutional Neural Network with Variational Inference for Speech Synthesis," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7060-7064. Brighton, UK, May 12-17, 2019. (EI: 20192907202523, CCF-B) Paper Demo
Runnan Li, Zhiyong Wu, Jia Jia, Sheng Zhao, Helen Meng, "Dilated Residual Network with Multi-head Self-attention for Speech Emotion Recognition," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6675-6679. Brighton, UK, May 12-17, 2019. (EI: 20192907202018, CCF-B) Paper
Shaoguang Mao, Zhiyong Wu, Jingshuai Jiang, Peiyun Liu, Frank K. Soong, "NN-based Ordinal Regression for Assessing Fluency of ESL Speech," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7420-7424. Brighton, UK, May 12-17, 2019. (EI: 20192907202051, CCF-B) Paper
Xixin Wu, Songxiang Liu, Yuewen Cao, Xu Li, Jianwei Yu, Dongyang Dai, Xi Ma, Shoukang Hu, Zhiyong Wu, Xunying Liu, Helen Meng, "Speech Emotion Recognition Using Capsule Networks," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6695-6699. Brighton, UK, May 12-17, 2019. (EI: 20192907201454, CCF-B) Paper
Yuewen Cao, Xixin Wu, Songxiang Liu, Jianwei Yu, Xu Li, Zhiyong Wu, Xunying Liu, Helen Meng, "End-to-End Code-switched TTS with Mix of Monolingual Recordings," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6935-6939. Brighton, UK, May 12-17, 2019. (EI: 20192907201672, CCF-B) Paper Demo
Runnan Li, Zhiyong Wu, Jia Jia, Jingbei Li, Wei Chen, Helen Meng, "Inferring User Emotive State Changes in Realistic Human-Computer Conversational Dialogs," [in] Proc. ACM Multimedia Conference (ACM MM), pp. 136-144. Seoul, Korea, October 22-26, 2018. (EI: 20185006246269, CCF-A) Paper
Kun Li, Shaoguang Mao, Xu Li, Zhiyong Wu, Helen Meng, "Automatic Lexical Stress and Pitch Accent Detection for L2 English Speech using Multi-Distribution Deep Neural Networks," Speech Communication (Speech Com), vol. 96, pp. 28-36. Elsevier, February, 2018. (SCI: WOS:000424723700003, EI: 20174704448303, CCF-B) Paper
Jingbei Li, Zhiyong Wu, Runnan Li, Mingxing Xu, Kehua Lei, Lianhong Cai, "Multi-modal Multi-scale Speech Expression Evaluation in Computer-Assisted Language Learning," Lecture Notes in Computer Science, [in] Proc. Artificial Intelligence and Mobile Services (AIMS), vol. 10970, pp. 16-28. Seattle, USA, June 25-30, 2018. (SCI: WOS:000443112000002, EI: 20182705519834) Paper
Ziwei Zhu, Zhiyong Wu, Runnan Li, Yishuang Ning, Helen Meng, "Learning Frame-Level Recurrent Neural Networks Representations for Query-by-Example Spoken Term Detection on Mobile Devices," Lecture Notes in Computer Science, [in] Proc. Artificial Intelligence and Mobile Services (AIMS), vol. 10970, pp. 55-66. Seattle, USA, June 25-30, 2018. (SCI: WOS:000443112000005, EI: 20182705519838) Paper
Mu Wang, Zhiyong Wu, Shiyin Kang, Xixin Wu, Jia Jia, Dan Su, Dong Yu, Helen Meng, "Speech Super Resolution Using Parallel WaveNet," [in] Proc. International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 260-264. Taipei, China, November 26-29, 2018. (EI: 20192106959272) Paper
Ziwei Zhu, Zhiyong Wu, Runnan Li, Helen Meng, Lianhong Cai, "Siamese Recurrent Auto-Encoder Representation for Query-by-Example Spoken Term Detection," [in] Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 102-106. Hyderabad, India, September 2-6, 2018. (EI: 20184305969082, CCF-C) Paper
Xixin Wu, Yuewen Cao, Mu Wang, Songxiang Liu, Shiyin Kang, Zhiyong Wu, Xunying Liu, Dan Su, Dong Yu, Helen Meng, "Rapid Style Adaptation using Residual Error Embedding for Expressive Speech Synthesis," [in] Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 3072-3076. Hyderabad, India, September 2-6, 2018. (EI: 20184305968770, CCF-C) Paper Demo
Shuai Yang, Zhiyong Wu, Binbin Shen, Helen Meng, "Detection of Glottal Closure Instants from Speech Signals: A Convolutional Neural Network based Method," [in] Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 317-321. Hyderabad, India, September 2-6, 2018. (EI: 20184305968631, CCF-C) Paper
Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai, "Emotion Recognition from Variable-Length Speech Segments using Deep Learning on Spectrograms," [in] Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 3683-3687. Hyderabad, India, September 2-6, 2018. (EI: 20184305969207, CCF-C) Paper
Shaoguang Mao, Zhiyong Wu, Xu Li, Runnan Li, Xixin Wu, Helen Meng, "Integrating Articulatory Features into Acoustic-Phonemic Model for Mispronunciation Detection and Diagnosis in L2 English Speech," [in] Proc. IEEE International Conference on Multimedia and Expo (ICME), pp. 1-6. San Diego, USA, July 23-27, 2018. (EI: 20190706509298, CCF-B) Paper
Runnan Li, Zhiyong Wu, Yuchen Huang, Jia Jia, Helen Meng, Lianhong Cai, "Emphatic Speech Generation with Conditional Input Layer and Bidirectional LSTMs for Expressive Speech Synthesis," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5129-5133. Calgary, Canada, April 15-20, 2018. (EI: 20184005908536, CCF-B) Paper
Shaoguang Mao, Zhiyong Wu, Runnan Li, Xu Li, Helen Meng, Lianhong Cai, "Applying Multitask Learning to Acoustic-Phonemic Model for Mispronunciation Detection and Diagnosis in L2 English Speech," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6254-6258. Calgary, Canada, April 15-20, 2018. (EI: 20184005907878, CCF-B) Paper
Shaoguang Mao, Xu Li, Kun Li, Zhiyong Wu, Xunying Liu, Helen Meng, "Unsupervised Discovery of An Extended Phoneme Set in L2 English Speech for Mispronunciation Detection and Diagnosis," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6244-6248. Calgary, Canada, April 15-20, 2018. (EI: 20184005908409, CCF-B) Paper
Xixin Wu, Lifa Sun, Shiyin Kang, Songxiang Liu, Zhiyong Wu, Xunying Liu, Helen Meng, "Feature based Adaptation for Speaking Style Synthesis," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5304-5308. Calgary, Canada, April 15-20, 2018. (EI: 20184005907958, CCF-B) Paper
Mu Wang, Zhiyong Wu, Xixin Wu, Helen Meng, Shiyin Kang, Jia Jia, Lianhong Cai, "Emphatic Speech Synthesis and Control based on Characteristic Transferring in End-to-End Speech Synthesis," [in] Proc. Asian Conference on Affective Computing and Intelligent Interaction (ACII Asia), pp. 1-6. Beijing, China, May 20-22, 2018. (EI: 20184406009875) Paper
Yishuang Ning, Jia Jia, Zhiyong Wu, Runnan Li, Yongsheng An, Yanfeng Wang, Helen Meng, "Multi-task Deep Learning for User Intention Understanding in Speech Interaction Systems," [in] Proc. AAAI Conference on Artificial Intelligence (AAAI), pp. 161-167. San Francisco, USA, February 4-9, 2017. (EI: 20174104242835, CCF-A) Paper
Runnan Li, Zhiyong Wu, Yishuang Ning, Lifa Sun, Helen Meng, Lianhong Cai, "Spectro-Temporal Modelling with Time-Frequency LSTM and Structured Output Layer for Voice Conversion," [in] Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 3409-3413. Stockholm, Sweden, August 20-24, 2017. (EI: 20175204590811, CCF-C) Paper
Yuchen Huang, Zhiyong Wu, Runnan Li, Helen Meng, Lianhong Cai, "Multi-Task Learning for Prosodic Structure Generation using BLSTM RNN with Structured Output Layer," [in] Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 779-783. Stockholm, Sweden, August 20-24, 2017. (EI: 20175204591488, CCF-C) Paper
Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai, "Speech Emotion Recognition with Emotion-Pair based Framework Considering Emotion Distribution Information in Dimensional Emotion Space," [in] Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 1238-1242. Stockholm, Sweden, August 20-24, 2017. (EI: 20175204591394, CCF-C) Paper
Yishuang Ning, Zhiyong Wu, Runnan Li, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai, "Learning Cross-Lingual Knowledge with Multilingual BLSTM for Emphasis Detection with Limited Training Data," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5615-5619. New Orleans, USA, March 5-9, 2017. (EI: 20172903955037, CCF-B) Paper
Runnan Li, Zhiyong Wu, Xunying Liu, Helen Meng, Lianhong Cai, "Multi-Task Learning of Structured Output Layer Bidirectional LSTMs for Speech Synthesis," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5510-5514. New Orleans, USA, March 5-9, 2017. (EI: 20172903955266, CCF-B) Paper
Xixin Wu, Shiyin Kang, Lifa Sun, Yishuang Ning, Zhiyong Wu, Helen Meng, "Attention-based Recurrent Generator with Gaussian Tolerance for Statistical Parametric Speech Synthesis," [in] Proc. Affective Social Multimedia Computing (ASMMC), pp. 1-5. Stockholm, Sweden, August 20-24, 2017. Paper
Runnan Li, Zhiyong Wu, Helen Meng, Lianhong Cai, "DBLSTM-based Multi-Task Learning for Pitch Transformation in Voice Conversion," [in] Proc. International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 1-5. Tianjin, China, October 17-20, 2016. (EI: 20172303743441) Paper
Xu Li, Zhiyong Wu, Helen Meng, Jia Jia, Xiaoyan Lou, Lianhong Cai, "Phoneme Embedding and its Application to Speech Driven Talking Avatar Synthesis," [in] Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 1472-1476. San Francisco, USA, September 8-12, 2016. (EI: 20164603004231, CCF-C) Paper
Xu Li, Zhiyong Wu, Helen Meng, Jia Jia, Xiaoyan Lou, Lianhong Cai, "Expressive Speech Driven Talking Avatar Synthesis with DBLSTM using Limited Amount of Emotional Bimodal Data," [in] Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 1477-1481. San Francisco, USA, September 8-12, 2016. (EI: 20164603004232, CCF-C) Paper
Yaodong Tang, Zhiyong Wu, Helen Meng, Mingxing Xu, Lianhong Cai, "Analysis on Gated Recurrent Unit based Question Detection Approach," [in] Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 735-739. San Francisco, USA, September 8-12, 2016. (EI: 20164603003979, CCF-C) Paper
Linchuan Li, Zhiyong Wu, Mingxing Xu, Helen Meng, Lianhong Cai, "Combining CNN and BLSTM to Extract Textual and Acoustic Features for Recognizing Stances in Mandarin Ideological Debate Competition," [in] Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 1392-1396. San Francisco, USA, September 8-12, 2016. (EI: 20164603003717, CCF-C) Paper
Linchuan Li, Zhiyong Wu, Mingxing Xu, Helen Meng, Lianhong Cai, "Recognizing Stances in Mandarin Social Ideological Debates with Text and Acoustic Features," [in] Proc. IEEE International Conference on Multimedia and Expo (ICME), pp. 1-6. Seattle, USA, July 11-15, 2016. (EI: 20164302952120, CCF-B) Paper
Haishu Xianyu, Mingxing Xu, Zhiyong Wu, Lianhong Cai, "Heterogeneity-Entropy based Unsupervised Feature Learning for Personality Prediction with Cross-media Data," [in] Proc. IEEE International Conference on Multimedia and Expo (ICME), pp. 1-6. Seattle, USA, July 11-15, 2016. (EI: 20163802815545, CCF-B) Paper
Yaodong Tang, Yuchen Huang, Zhiyong Wu, Helen Meng, Mingxing Xu, Lianhong Cai, "Question Detection from Acoustic Features using Recurrent Neural Network with Gated Recurrent Unit," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6125-6129. Shanghai, China, March 20-25, 2016. (EI: 20162402488463, CCF-B) Paper
Quanjie Yu, Peng Liu, Zhiyong Wu, Shiyin Kang, Helen Meng, Lianhong Cai, "Learning Cross-lingual Information with Multilingual BLSTM for Speech Synthesis of Low-resource Languages," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5545-5549. Shanghai, China, March 20-25, 2016. (EI: 20162402488723, CCF-B) Paper
Xinyu Lan, Xu Li, Yishuang Ning, Zhiyong Wu, Helen Meng, Jia Jia, Lianhong Cai, "Low Level Descriptors based DBLSTM Bottleneck Feature for Speech Driven Talking Avatar," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5550-5554. Shanghai, China, March 20-25, 2016. (EI: 20162402488482, CCF-B) Paper
Zhiyong Wu, Yishuang Ning, Xiao Zang, Jia Jia, Fanbo Meng, Helen Meng, Lianhong Cai, "Generating Emphatic Speech with Hidden Markov Model for Expressive Speech Synthesis," Multimedia Tools and Applications (MTA), vol. 74, no. 22, pp. 9909-9925. Springer, July, 2015. (SCI: WOS:000364019400005, EI: 20143600027913, CCF-C) Paper
Zhiyong Wu, Kai Zhao, Xixin Wu, Xinyu Lan, Helen Meng, "Acoustic to Articulatory Mapping with Deep Neural Network," Multimedia Tools and Applications (MTA), vol. 74, no. 22, pp. 9889-9907. Springer, August, 2015. (SCI: WOS:000364019400004, EI: 20143600014973, CCF-C) Paper
Qi Lyu, Zhiyong Wu, Jun Zhu, "Polyphonic Music Modelling with LSTM-RTRBM," [in] Proc. ACM Multimedia Conference (ACM MM), pp. 991-994. Brisbane, Australia, October 26-30, 2015. (EI: 20161602252616, CCF-A) Paper
Qi Lyu, Zhiyong Wu, Jun Zhu, Helen Meng, "Modelling High-dimensional Sequences with LSTM-RTRBM: Application to Polyphonic Music Generation," [in] Proc. International Joint Conference on Artificial Intelligence (IJCAI), pp. 4138-4139. Buenos Aires, Argentina, July 25-31, 2015. (EI: 20155101693661, CCF-A) Paper
Peng Liu, Quanjie Yu, Zhiyong Wu, Shiyin Kang, Helen Meng, Lianhong Cai, "A Deep Recurrent Approach for Acoustic-to-Articulatory Inversion," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4450-4454. Brisbane, Australia, April 19-24, 2015. (EI: 20154501510018, CCF-B) Paper
Yishuang Ning, Zhiyong Wu, Jia Jia, Fanbo Meng, Helen Meng, Lianhong Cai, "HMM-based Emphatic Speech Synthesis for Corrective Feedback in Computer-Aided Pronunciation Training," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4934-4938. Brisbane, Australia, April 19-24, 2015. (EI: 20154501509415, CCF-B) Paper
Yishuang Ning, Zhiyong Wu, Xiaoyan Lou, Helen Meng, Jia Jia, Lianhong Cai, "Using Tilt for Automatic Emphasis Detection with Bayesian Networks," [in] Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 578-582. Dresden, Germany, September 6-10, 2015. (EI: 20160902029674, CCF-C) Paper
Xixin Wu, Zhiyong Wu, Yishuang Ning, Jia Jia, Lianhong Cai, Helen Meng, "Understanding Speaking Styles of Internet Speech Data with LSTM and Low-resource Training," [in] Proc. International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 815-820. Xi'an, China, September 21-24, 2015. (EI: 20161502238729) Paper
孟凡博, 吴志勇, 贾珈, 蔡莲红, "汉语重音的凸显度分析与合成," 声学学报, 2015. 40(1): 1-11. January, 2015. 【Fanbo Meng, Zhiyong Wu, Jia Jia, Lianhong Cai, "The Prominence Analysis and Synthesis of Emphasis in Putonghua," ACTA Acustica, vol. 40, no. 1, pp. 1-11. January, 2015.】 (EI: 20151000618075) Paper
黄雨晨, 徐明星, 吴志勇, 蔡莲红, "表征句式语气的声学信息分布," [in] 全国人机语音通讯学术会议 (NCMMSC). 天津, 2015.10.25-27. 【Yuchen Huang, Mingxing Xu, Zhiyong Wu, Lianhong Cai, "Study on the Distribution of Acoustic Features Characterizing Sentence Intonation," [in] Proc. National Conference on Man-Machine Speech Communication (NCMMSC). Tianjin, China, October 25-27, 2015.】 Paper
Fanbo Meng, Zhiyong Wu, Jia Jia, Helen Meng, Lianhong Cai, "Synthesizing English Emphatic Speech for Multimodal Corrective Feedback in Computer-Aided Pronunciation Training," Multimedia Tools and Applications (MTA), vol. 73, no. 1, pp. 463-489. Springer, September, 2014. (SCI: WOS:000342418700022, EI: 20143600046713, CCF-C) Paper
Jia Jia, Zhiyong Wu, Shen Zhang, Helen Meng, Lianhong Cai, "Head and Facial Gestures Synthesis using PAD Model for an Expressive Talking Avatar," Multimedia Tools and Applications (MTA), vol. 73, no. 1, pp. 439-461. Springer, September, 2014. (SCI: WOS:000342418700023, EI: 20143600046670, CCF-C) Paper
Xin Zheng, Zhiyong Wu, Helen Meng, Lianhong Cai, "Contrastive Auto-encoder for Phoneme Recognition," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2548-2552. Florence, Italy, May 4-9, 2014. (EI: 20143218037687, CCF-B) Paper
Xin Zheng, Zhiyong Wu, Helen Meng, Lianhong Cai, "Learning Dynamic Features with Neural Networks for Phoneme Recognition," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2543-2547. Florence, Italy, May 4-9, 2014. (EI: 20143218037686, CCF-B) Paper
Xiao Zang, Zhiyong Wu, Helen Meng, Jia Jia, Lianhong Cai, "Using Conditional Random Fields to Predict Focus Word Pair in Spontaneous Spoken English," [in] Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 756-760. Singapore, September 14-18, 2014. (EI: 20144600199537, CCF-C) Paper
Xixin Wu, Zhiyong Wu, Jia Jia, Helen Meng, Lianhong Cai, "Automatic Speech Data Clustering with Human Perception based Weighted Distance," [in] Proc. International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 216-220. Singapore, September 14-18, 2014. (EI: 20144900274075) Paper
Xiao Zang, Zhiyong Wu, Yishuang Ning, Helen Meng, Lianhong Cai, "Automatic Detection of Contrastive Word Pairs using Textual and Acoustic Features," [in] Proc. International Conference on Signal Processing (ICSP), pp. 594-598. Hangzhou, China, October 19-23, 2014. (EI: 20153101078079) Paper
Yuchao Fan, Mingxing Xu, Zhiyong Wu, Lianhong Cai, "Automatic Emotion Variation Detection using Multi-Scaled Sliding Window," [in] Proc. IEEE International Conference on Orange Technologies (ICOT), pp. 229-233. Xi'an, China, September 20-23, 2014. (EI: 20145000323155) Paper
王欣, 吴志勇, 蔡莲红, "语音合成中基于稳定段边界的不定长基元选取," 软件学报, 2014, 25(S2): 63-69. 【Xin Wang, Zhiyong Wu, Lianhong Cai, "Stable Boundary-based Non-uniform Unit Selection in Speech Synthesis," Journal of Software, vol. 25, Supplement (2), pp. 63-69, December, 2014. Also [in] 第九届和谐人机环境联合学术会议 (HHME). 南昌, 2013.9.27-28.】 (EI: 20152100877399)
孟凡博, 吴志勇, 蒙美玲, 贾珈, 蔡莲红, "基于决策树的英语焦点语音转换," 清华大学学报(自然科学版), 2013, 53(7): 1046-1051. 【Fanbo Meng, Zhiyong Wu, Helen Meng, Jia Jia, Lianhong Cai, "English Emphatic Speech Conversion based on a Decision Tree," Journal of Tsinghua University, vol. 53, no. 7, pp. 1046-1051. July, 2013.】 (EI: 20135217144112) Paper
Xin Zheng, Zhiyong Wu, Binbin Shen, Helen Meng, Lianhong Cai, "Investigation of Tandem Deep Belief Network Approach for Phoneme Recognition," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7586-7590. Vancouver, Canada, May 26-31, 2013. (EI: 20135217121577, CCF-B) Paper
Jianbo Jiang, Zhiyong Wu, Mingxing Xu, Jia Jia, Lianhong Cai, "Comparing Feature Dimension Reduction Algorithms for GMM-SVM based Speech Emotion Recognition," [in] Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). Kaohsiung, China, October 29 - November 1, 2013. (EI: 20140717305313) Paper
Kai Zhao, Zhiyong Wu, Lianhong Cai, "A Real-time Speech Driven Talking Avatar based on Deep Neural Network," [in] Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). Kaohsiung, China, October 29 - November 1, 2013. (EI: 20140717305312) Paper
Jia Jia, Xiaohui Wang, Zhiyong Wu, Lianhong Cai, Helen Meng, "Modeling the Correlation between Modality Semantics and Facial Expressions," [in] Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). Hollywood, USA, December 3-6, 2012. (EI: 20131016079234) Paper
Jianbo Jiang, Zhiyong Wu, Mingxing Xu, Jia Jia, Lianhong Cai, "Comparison of Adaptation Methods for GMM-SVM based Speech Emotion Recognition," [in] Proc. IEEE Workshop on Spoken Language Technology (SLT), pp. 269-273. Miami, USA, December 2-5, 2012. (EI: 20130916065166) Paper
Tao Jiang, Zhiyong Wu, Jia Jia, Lianhong Cai, "Perceptual Clustering based Unit Selection Optimization for Concatenative Text-to-Speech Synthesis," [in] Proc. International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 64-68. Hong Kong, China, December 5-8, 2012. (EI: 20131016084519) Paper
Chunrong Li, Zhiyong Wu, Fanbo Meng, Helen Meng, Lianhong Cai, "Detection and Emphatic Realization of Contrastive Word Pairs for Expressive Text-to-Speech Synthesis," [in] Proc. International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 93-97. Hong Kong, China, December 5-8, 2012. (EI: 20131016084523) Paper
Xixin Wu, Zhiyong Wu, Jia Jia, Lianhong Cai, "Adaptive Named Entity Recognition based on Conditional Random Fields with Automatic Updated Dynamic Gazetteers," [in] Proc. International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 363-367. Hong Kong, China, December 5-8, 2012. (EI: 20131016084525) Paper
Fanbo Meng, Zhiyong Wu, Helen Meng, Jia Jia, Lianhong Cai, "Hierarchical English Emphatic Speech Synthesis based on HMM with Limited Training Data," [in] Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 466-469. Portland, USA, September 8-13, 2012. (EI: 20132316399086, CCF-C) Paper
Fanbo Meng, Zhiyong Wu, Helen Meng, Jia Jia, Lianhong Cai, "Generating Emphasis from Neutral Speech using Hierarchical Perturbation Model by Decision Tree and Support Vector Machine," [in] Proc. International Conference on Audio, Language and Image Processing (ICALIP), pp. 442-448. Shanghai, China, July 16-18, 2012. (EI: 20130315907216) Paper
Zhang Zhang, Zhiyong Wu, Jia Jia, Lianhong Cai, "Modeling Prosody Pattern of Chinese Expressive Speech and Its Application in Personalized Speech Conversion," [in] Proc. International Symposium on Tonal Aspects of Languages (TAL). Nanjing, China, May 26-29, 2012. Paper
Kai Zhao, Zhiyong Wu, Jia Jia, Lianhong Cai, "An Online Speech Driven Talking Head System," [in] Proc. IEEE Global High Tech Congress on Electronics (GHTCE), pp. 186-187. Shenzhen, China, November 18-20, 2012. (EI: 20131716244276) Paper
Xin Wang, Zhiyong Wu, "An HMM-based Cantonese Speech Synthesis System," [in] Proc. IEEE Global High Tech Congress on Electronics (GHTCE), pp. 141-142. Shenzhen, China, November 18-20, 2012. (EI: 20131716244264) Paper
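姜涛, 吴志勇, 蔡莲红, "语音合成自然度的客观度量实验研究," [in] 第十届中国语音学学术会议 (PCC). 上海, 2012.5.18-20. 【Tao Jiang, Zhiyong Wu, Lianhong Cai, "An Experimental Study on Objective Measures of Naturalness in Speech Synthesis," [in] Proc. Phonetic Conference of China (PCC). Shanghai, China, May 18-20, 2012.】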
Hui Pang, Zhiyong Wu, Lianhong Cai, "Modeling Pitch Contour of Chinese Mandarin Sentences with the PENTA Model," [in] Proc. National Conference on Man-Machine Speech Communication (NCMMSC). Xi'an, China, October 16-18, 2011. Also published in Tsinghua Science and Technology (清华大学学报英文版), vol. 17, no. 2, pp. 218-224. February, 2012. (EI: 20123215322698, Best Student Paper) Paper
Binbin Shen, Zhiyong Wu, Yongxin Wang, Lianhong Cai, "Combining Active and Semi-supervised Learning for Homograph Disambiguation in Mandarin Text-to-Speech Synthesis," [in] Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 2165-2168. Florence, Italy, August 27-31, 2011. (EI: 20123715411045, CCF-C) Paper
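陈龙, 吴志勇, 袁春, 蒙美玲, 蔡莲红, "面向数字版权管理的声纹辅助认证系统," [in] 第十一届全国人机语音通讯学术会议 (NCMMSC). 陕西, 西安, 2011.10.16-18. 【Long Chen, Zhiyong Wu, Chun Yuan, Helen Meng, Lianhong Cai, "A Voiceprint-Assisted Authentication System for Digital Rights Management," [in] Proc. National Conference on Man-Machine Speech Communication (NCMMSC). Xi'an, China, October 16-18, 2011.】 Paper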
Zhiyong Wu, Lianhong Cai, Helen Meng, "Modeling Prosody Patterns for Chinese Expressive Text-to-Speech Synthesis," [in] Proc. International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 148-152. Tainan, China, November 29 - December 3, 2010. (EI: 20110713663203) Paper
Fanbo Meng, Helen Meng, Zhiyong Wu, Lianhong Cai, "Synthesizing Expressive Speech to Convey Focus using a Perturbation Model for Computer-Aided Pronunciation Training," [in] Proc. Second Language Studies: Acquisition, Learning, Education and Technology (L2WS), pp. 1-4. Tokyo, Japan, September 22-27, 2010. Paper
Quansheng Duan, Shiyin Kang, Zhiyong Wu, Lianhong Cai, Zhiwei Shuang, Yong Qin, "Comparison of Syllable/Phone HMM Based Mandarin TTS," [in] Proc. International Conference on Pattern Recognition (ICPR), pp. 4496-4499. Istanbul, Turkey, August 23-26, 2010. (EI: 20104613390878, THU-B) Paper
Shen Zhang, Zhiyong Wu, Helen Meng, Lianhong Cai, "Facial Expression Synthesis based on Emotion Dimensions for Affective Talking Avatar," Smart Innovation, Systems and Technologies (SIST), Modeling Machine Emotions for Realizing Intelligence, vol. 2010, no. 1, pp. 109-132. Springer, 2010. (EI: 20123715421851) Paper
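张章, 贾珈, 蔡莲红, 吴志勇, "汉语音高模式及参数化描述的研究," [in] 第九届中国语音学学术会议 (PCC). 天津, 2010.5.28-30. 【Zhang Zhang, Jia Jia, Lianhong Cai, Zhiyong Wu, "Study on Chinese Pitch Patterns and Their Parametric Description," [in] Proc. Phonetic Conference of China (PCC). Tianjin, China, May 28-30, 2010.】 Paper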
Zhiyong Wu, Helen Meng, Hongwu Yang, Lianhong Cai, "Modeling the Expressivity of Input Text Semantics for Chinese Text-to-Speech Synthesis in a Spoken Dialog System," IEEE Transactions on Audio, Speech, and Language Processing (TASLP), vol. 17, no. 8, pp. 1567-1577. IEEE, November, 2009. (SCI: WOS:000268903600010, EI: 20093612281690, THU-A) Paper
Zhiyong Wu, Guangqi Cao, Helen Meng, Lianhong Cai, "A Unified Framework for Multilingual Text-to-Speech Synthesis with SSML Specification as Interface," [in] Proc. National Conference on Man-Machine Speech Communication (NCMMSC). Lanzhou, Gansu, August 14-16, 2009. Also published in Tsinghua Science and Technology (清华大学学报英文版), vol. 14, no. 5, pp. 623-630, October 2009. (EI: 20094012358727) Paper
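段全盛, 康世胤, 双志伟, 吴志勇, 蔡莲红, 秦勇, "一种适合HMM汉语语音合成的建模单元挑选算法," [in] 第十届全国人机语音通讯学术会议 (NCMMSC), pp. 434-439. 甘肃, 兰州, August 14-16, 2009. 【Quansheng Duan, Shiyin Kang, Zhiwei Shuang, Zhiyong Wu, Lianhong Cai, Yong Qin, "A Modeling Unit Selection Algorithm Suited to HMM-based Mandarin Speech Synthesis," [in] Proc. National Conference on Man-Machine Speech Communication (NCMMSC), pp. 434-439. Lanzhou, China, August 14-16, 2009.】 Paper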
Honglei Cong, Zhiyong Wu, Lianhong Cai, Helen Meng, "A New Prosodic Strength Calculation Method for Prosody Reduction Modeling," [in] Proc. International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 53-56. Kunming, China, December 16-19, 2008. (EI: 20091011939031, Best Paper Finalist) Paper
Zhiyong Wu, Jiying Wu, Helen Meng, "The Use of Dynamic Deformable Templates for Lip Tracking in an Audio-Visual Corpus with Large Variations in Head Pose, Face Illumination and Lip Shapes," [in] Proc. International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 370-373. Kunming, China, December 16-19, 2008. (EI: 20091011939107) Paper
Xinxin Zhou, Zhiyong Wu, Chun Yuan, Yuzhuo Zhong, "Document Structure Analysis and Text Normalization for Chinese Putonghua and Cantonese Text-to-Speech Synthesis," [in] Proc. International Symposium on Intelligent Information Technology Application (IITA), pp. 477-481. Shanghai, China, December 20-22, 2008. (EI: 20091411996990) Paper
Yu Wang, Zhiyong Wu, Lianhong Cai, Helen Meng, "Modeling the Synchrony between Audio and Visual Modalities for Speaker Identification," [in] Proc. Phonetic Conference of China and the International Symposium on Phonetic Frontiers (PCC), pp. 1-5. Beijing, China, April 18-20, 2008. Paper
Shen Zhang, Zhiyong Wu, Helen Meng, Lianhong Cai, "Facial Expression Synthesis Using PAD Emotional Parameters for a Chinese Expressive Avatar," [in] Proc. International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 24-35. Lisbon, Portugal, September 12-14, 2007. (EI: 20080311024879) Paper
Shen Zhang, Zhiyong Wu, Helen Meng, Lianhong Cai, "Head Movement Synthesis based on Semantic and Prosodic Features for a Chinese Expressive Avatar," [in] Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 837-840. Hawaii, USA, April 15-20, 2007. (EI: 20073210745929) Paper
Zhiyong Wu, Helen Meng, Hui Ning, Sam Tse, "A Corpus-based Approach for Cooperative Response Generation in a Dialog System," Lecture Notes in Computer Science, [in] Proc. International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 614-626. Singapore, December 13-16, 2006. (SCI: WOS:000244824800058, EI: 20100912736122) Paper
Hongwu Yang, Helen Meng, Zhiyong Wu, Lianhong Cai, "Modelling the Global Acoustic Correlates of Expressivity for Chinese Text-to-speech Synthesis," [in] Proc. IEEE/ACL Workshop on Spoken Language Technology (SLT), pp. 138-141. Palm Beach, Aruba, December 10-13, 2006. (EI: 20083311451167) Paper
Zhiyong Wu, Shen Zhang, Lianhong Cai, Helen Meng, "Real-time Synthesis of Chinese Visual Speech and Facial Expressions using MPEG-4 FAP Features in a Three-dimensional Avatar," [in] Proc. International Conference on Spoken Language Processing (INTERSPEECH - ICSLP), pp. 1802-1805. Pittsburgh, USA, September 17-21, 2006. (EI: 20082511324456) Paper
Zhiyong Wu, Lianhong Cai, Helen Meng, "Weight Estimation for Audio-Visual Multi-level Fusion in Bimodal Speaker Identification," Lecture Notes in Control and Information Sciences, [in] Proc. International Conference on Intelligent Computing (ICIC), pp. 1107-1112. Kunming, China, August 16-19, 2006. (SCI: WOS:000240385300144) Paper
吴志勇, 蔡莲红, "基于动态贝叶斯网络的音视频双模态说话人识别," 计算机研究与发展, 2006: 43(3), 470-475. 【Zhiyong Wu, Lianhong Cai, "Audio-Visual Bimodal Speaker Identification Using Dynamic Bayesian Networks," Journal of Computer Research and Development, vol. 43, no. 3, pp. 470-475. March, 2006.】 (EI: 2006239925198) Paper
吴志勇, 蔡莲红, 马磊, 贾珈, "多生物特征识别平台的设计与实现," 小型微型计算机系统, 2006: 27(2), 375-379. 【Zhiyong Wu, Lianhong Cai, Lei Ma, Jia Jia, "Design and Implementation of a Multi-Biometric Platform," Mini-Micro Systems, vol. 27, no. 2, pp. 375-379. February, 2006.】 Paper
Zhiyong Wu, Lianhong Cai, Helen Meng, "Multi-level Fusion of Audio and Visual Features for Speaker Identification," Lecture Notes in Computer Science, [in] Proc. International Conference on Biometrics (ICB), pp. 493-499. Hong Kong, China, January 5-7, 2006. (SCI: WOS:000235768300066, EI: 2006249940530) Paper
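吴志勇, 蔡莲红, 蒙美玲, "可视语音合成中基于音视频关联模型的视位参数优化," [in] 第六届全国人机语音通讯学术会议 (NCMMSC), pp. 334-337. 北京, October 22-24, 2005. 【Zhiyong Wu, Lianhong Cai, Helen Meng, "Viseme Parameter Optimization based on an Audio-Visual Correlation Model for Visual Speech Synthesis," [in] Proc. National Conference on Man-Machine Speech Communication (NCMMSC), pp. 334-337. Beijing, China, October 22-24, 2005.】 (Best Paper) Paper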
吴志勇, 蔡莲红, 蔡锐, "语音合成中基于听辨指导的权重训练算法," 清华大学学报(自然科学版), 2005: 45(1), 52-56. 【Zhiyong Wu, Lianhong Cai, Rui Cai, "Perceptual Evaluation Weight Training for Text-to-Speech Synthesis," Journal of Tsinghua University, vol. 45, no. 1, pp. 52-56. January, 2005.】 (EI: 2005139014229) Paper
吴志勇, 蔡莲红, "语音合成中的韵律关联模型," 中文信息学报, 2004: 18(2), 44-50. 【Zhiyong Wu, Lianhong Cai, "Prosodic Correlation Model in Text-to-Speech Synthesis," Journal of Chinese Information Processing, vol. 18, no. 2, pp. 44-50. February, 2004.】 Paper
王志明, 蔡莲红, 吴志勇, 陶建华, "汉语文本-可视语音转换的研究," 小型微型计算机系统, 2002: 23(4), 474-477. 【Zhiming Wang, Lianhong Cai, Zhiyong Wu, Jianhua Tao, "Study of Text to Visual Speech in Chinese," Mini-Micro Systems, vol. 23, no. 4, pp. 474-477. April, 2002.】 Paper