Authors and Affiliations
Baiqin Wang
MAIS, Institute of Automation, Chinese Academy of Sciences
School of Artificial Intelligence, University of Chinese Academy of Sciences
Sen Chen
MAIS, Institute of Automation, Chinese Academy of Sciences
School of Artificial Intelligence, University of Chinese Academy of Sciences
Jiankuo Zhao
MAIS, Institute of Automation, Chinese Academy of Sciences
School of Artificial Intelligence, University of Chinese Academy of Sciences
Xiangyu Liu
MAIS, Institute of Automation, Chinese Academy of Sciences
School of Artificial Intelligence, University of Chinese Academy of Sciences
Zhen Lei
MAIS, Institute of Automation, Chinese Academy of Sciences
School of Artificial Intelligence, University of Chinese Academy of Sciences
CAIR, HKISI, Chinese Academy of Sciences
School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology
Xiangyu Zhu
MAIS, Institute of Automation, Chinese Academy of Sciences
School of Artificial Intelligence, University of Chinese Academy of Sciences
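Problem Addressed
The paper addresses the gap between "talking-only" video generation and real conversation, which requires support for an arbitrary number of participants, long sessions, non-verbal feedback, and low latency, all at the same time.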
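Key Results
The authors report improved interaction quality while sustaining real-time generation at 30 FPS, a key threshold for online conversational use.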
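To make the 30 FPS threshold concrete, the following minimal sketch converts a frame rate into a per-frame time budget. The function name `frame_budget_ms` and the assumption that participants are generated one after another within each frame are illustrative only; the summary does not say how the authors schedule generation across participants.

```python
def frame_budget_ms(fps: float, participants: int = 1) -> float:
    """Return the time budget per participant per frame, in milliseconds.

    Assumption (not from the paper): participants are rendered sequentially,
    so the frame interval is split evenly among them.
    """
    if fps <= 0 or participants < 1:
        raise ValueError("fps must be positive and participants at least 1")
    frame_interval_ms = 1000.0 / fps  # time available for one output frame
    return frame_interval_ms / participants


single = frame_budget_ms(30)       # one participant
four_way = frame_budget_ms(30, 4)  # four participants, sequential assumption
```

At 30 FPS each frame has roughly 33.3 ms available; under the sequential assumption, a four-person conversation leaves roughly 8.3 ms per participant.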
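Abstract
InterTalk targets conversational talking face generation, in which multiple participants speak, listen, and respond to one another over multiple turns. It combines a motion-driven architecture, participant feedback, iterative generation, facial component disentanglement, a new multi-person dataset, and 3D face augmentation to achieve real-time generation at 30 FPS.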
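Motivation
Talking face systems are moving from single video clips toward persistent agents, tutors, assistants, and meeting avatars, where listening behavior and turn-taking matter as much as lip synchronization.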
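Method Overview
The framework models conversation dynamics per participant, uses the feedback motion of the other speakers and listeners, iteratively refines each participant's behavior, and separates facial components so that lip motion, blinking, and reactive movements can each be improved independently.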
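The loop described above can be illustrated with a toy sketch. This is not the authors' model: the `Motion` class, the `refine` function, the scalar per-component values, and the blending weight `alpha` are all hypothetical, chosen only to show per-participant state, feedback from the other participants, iterative refinement, and separately updated facial components.

```python
from dataclasses import dataclass


@dataclass
class Motion:
    """Hypothetical per-participant motion, split into facial components."""
    lips: float      # lip activity, driven by the participant's own audio
    blink: float     # blink state, left untouched in this sketch
    reaction: float  # listener reaction, driven by the others' activity


def refine(motions, audio, steps=3, alpha=0.5):
    """Iteratively refine each participant using feedback from the others.

    Assumption for illustration: feedback is the mean lip activity of all
    other participants from the previous iteration.
    """
    current = list(motions)
    for _ in range(steps):
        updated = []
        for i, m in enumerate(current):
            others = [o for j, o in enumerate(current) if j != i]
            feedback = sum(o.lips for o in others) / len(others) if others else 0.0
            updated.append(Motion(
                lips=audio[i],   # lips depend only on this participant's audio
                blink=m.blink,   # each component is updated separately
                reaction=(1 - alpha) * m.reaction + alpha * feedback,
            ))
        current = updated
    return current


# Two participants: the first is speaking, the second is listening.
start = [Motion(0.0, 0.0, 0.0), Motion(0.0, 0.0, 0.0)]
result = refine(start, audio=[1.0, 0.0])
```

In this toy setup the listener's `reaction` value grows over the iterations as it picks up the speaker's lip activity, while the speaker's `reaction` stays at zero because the listener is silent. Keeping the components in separate fields is what lets each one follow its own update rule.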
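Summary
InterTalk pushes the face-swapping and talking-head technology stack toward interactive digital humans. The practical question shifts from "can a given video clip be lip-synced?" to "can believable multi-character communication be sustained under real-time constraints?"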