The Human-Computer Speech Interaction Lab at Tsinghua University (THUHCSI) pursues cutting-edge research in intelligent speech interaction technologies, including audio foundation models (speech, singing voice, sound effects), expressive and controllable speech generation (style, emotion, prosody, personalization), digital human generation (lip-sync, facial expressions, co-speech gestures, dance), natural language processing (understanding and generation), affective computing, and machine learning.
In partnership with the Faculty of Engineering at The Chinese University of Hong Kong, the lab jointly established the Tsinghua–CUHK Joint Research Center for Media Sciences, Technologies and Systems in 2006, serving as a long-term platform for scientific collaboration and academic exchange between Shenzhen, Hong Kong, and the Greater Bay Area. Over the years, THUHCSI has undertaken and contributed to major national and international research programs, including NSFC Key, General, and Young Scholar Projects, NSFC–RGC Joint Projects, the National High-Tech R&D Program (863 Program), the National Basic Research Program (973 Program), and major projects of the National Social Science Fund of China, yielding internationally recognized research achievements.
In recent years, the lab has made breakthrough contributions in expressive, controllable speech generation, multimodal interaction, and large-scale intelligent speech models. Our research outcomes have been recognized with prestigious awards such as the Ministry of Education (MoE) Science and Technology Progress Award, the Beijing Science and Technology Progress Award, and the Shenzhen Science and Technology Progress Award. We have published more than 200 papers in leading international conferences, including AAAI, NeurIPS, IJCAI, ICLR, CVPR, ACM Multimedia, ICASSP, and INTERSPEECH, and in journals such as IEEE/ACM TASLP, IEEE TMM, and IEEE TPAMI. The lab has filed over 30 Chinese invention patents, with more than 10 granted, and holds multiple software copyrights.
THUHCSI maintains strong collaborations with both academia and industry, including ModelBest, Tencent, Microsoft, ByteDance, Alibaba, Xiaomi, Huya, AISpeech, and DataBaker. Many of our research outputs have been transferred and deployed in education, intelligent customer service, smart hardware, and digital human applications.
Talent cultivation is a central mission of the lab. Dozens of Ph.D. and Master's students have graduated, many receiving honors such as the National Scholarship, the Excellent Dissertation Award of Tsinghua University, and the Outstanding Graduate Award of both Tsinghua University and Beijing. Our students have excelled in global competitions, winning top honors such as the championship of the Voice Spoofing & Anti-Spoofing Competition at GeekPwn, first place in both tracks of the ICASSP Speech Enhancement Challenge, the INTERSPEECH Best Student Paper Award, a CVPR Highlight paper, first prize in the AAAI Digital Human Challenge, and first place in the ICASSP Voice Cloning Challenge. The lab director has received Tsinghua University's "Mentor and Friend" Award and multiple Teaching Excellence Awards.
The Human-Computer Speech Interaction Lab at Tsinghua University (THUHCSI) has long focused on cutting-edge research in intelligent speech interaction technologies, covering general-purpose audio foundation models (speech, singing, sound effects), expressive speech generation (style, emotion, prosody, personalization), digital human generation (lip-sync, facial expressions, gestures, dance), natural language processing (understanding and generation), affective computing, and machine learning.
Together with the Faculty of Engineering at The Chinese University of Hong Kong, the lab established the Tsinghua–CUHK Joint Research Center for Media Sciences, Technologies and Systems, which since 2006 has continuously promoted Shenzhen–Hong Kong and Guangdong–Hong Kong scientific collaboration and academic exchange. The lab has undertaken and participated in NSFC Key, General, and Young Scholar Projects, NSFC–RGC Joint Projects, the National High-Tech R&D Program (863 Program), the National Basic Research Program (973 Program), and major projects of the National Social Science Fund of China, producing a series of internationally advanced research results.
In recent years, the lab has made breakthrough progress in expressive, controllable generation, multimodal interaction, and large-scale intelligent speech models. Its research results have repeatedly received ministerial and provincial awards, including the MoE Science and Technology Progress Award, the Beijing Science and Technology Progress Award, and the Shenzhen Science and Technology Progress Award, and more than 200 papers have been published in top international conferences such as AAAI, NeurIPS, IJCAI, ICLR, CVPR, ACM Multimedia, ICASSP, and INTERSPEECH, and in leading journals such as IEEE/ACM TASLP, IEEE TMM, and IEEE TPAMI. The lab has filed over 30 Chinese invention patents, more than 10 of which have been granted, and holds multiple software copyrights.
The lab maintains close collaborations with internet and intelligent-speech companies including ModelBest, Tencent, Microsoft, ByteDance, Alibaba, Xiaomi, Huya, AISpeech, and DataBaker, and its research results have been widely transferred and deployed in education, intelligent customer service, smart hardware, and digital human applications.
The lab places strong emphasis on talent cultivation: dozens of Ph.D. and Master's students have graduated, with many receiving the National Scholarship, the Excellent Dissertation Award of Tsinghua University, and Outstanding Graduate honors of Tsinghua University and Beijing. Students have excelled in domestic and international competitions, winning the championship of the AI Voice Spoofing & Anti-Spoofing Competition at GeekPwn, first place in both tracks of the ICASSP Speech Enhancement Challenge, the INTERSPEECH Best Student Paper Award, a CVPR Highlight paper, first prize in the AAAI Digital Human Generation Challenge, and first place in the ICASSP Voice Cloning Challenge. The lab director has received Tsinghua University's "Mentor and Friend" Award and multiple Teaching Excellence Awards.