Flexible, natural, and efficient interaction for conversational talking face generation
Authors and affiliations
Baiqin Wang
MAIS, Institute of Automation, Chinese Academy of Sciences
School of Artificial Intelligence, University of Chinese Academy of Sciences
Sen Chen
MAIS, Institute of Automation, Chinese Academy of Sciences
School of Artificial Intelligence, University of Chinese Academy of Sciences
Jiankuo Zhao
MAIS, Institute of Automation, Chinese Academy of Sciences
School of Artificial Intelligence, University of Chinese Academy of Sciences
Xiangyu Liu
MAIS, Institute of Automation, Chinese Academy of Sciences
School of Artificial Intelligence, University of Chinese Academy of Sciences
Zhen Lei
MAIS, Institute of Automation, Chinese Academy of Sciences
School of Artificial Intelligence, University of Chinese Academy of Sciences
CAIR, HKISI, Chinese Academy of Sciences
School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology
Xiangyu Zhu
MAIS, Institute of Automation, Chinese Academy of Sciences
School of Artificial Intelligence, University of Chinese Academy of Sciences
What problem it solves
The work closes the gap between speaking-only generation and real conversation, which requires support for an arbitrary number of participants, long sessions, non-verbal feedback, and low latency.
Key result
The authors report the best interaction quality while generating in real time at 30 FPS, a key threshold for online conversation.
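To make the 30 FPS threshold concrete, the sketch below converts a frame rate into a per-frame time budget and checks a latency against it. The function names and the sample latencies are illustrative assumptions, not measurements from the paper.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to produce one frame at the given frame rate, in ms."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def is_realtime(per_frame_latency_ms: float, fps: float = 30.0) -> bool:
    """True if a frame is produced no slower than it must be displayed."""
    return per_frame_latency_ms <= frame_budget_ms(fps)


# Hypothetical latencies, chosen only to illustrate both outcomes.
budget = frame_budget_ms(30.0)
print(f"{budget:.1f} ms per frame")  # 33.3 ms per frame
print(is_realtime(25.0))             # True
print(is_realtime(40.0))             # False
```

At 30 FPS the generator has roughly 33 ms per frame, and that budget covers every participant it renders in that frame.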
Abstract
InterTalk targets conversational talking-face generation with multiple participants and many rounds. Together, a motion-based architecture, feedback, iterative generation, facial disentanglement, a dataset, and 3D augmentation deliver generation at 30 FPS.
Starting point of the research
Talking-face systems are moving from short clips to persistent agents, tutors, and avatars, where listening behavior and turn-taking matter as much as lip sync.
Method
The framework models dynamics per participant, uses the motion of the other participants as feedback, iteratively refines behavior, and separates facial components: lips, blinking, and gestures.
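The ideas in this section can be illustrated with a toy sketch: each participant holds a motion state split into separate components, receives the averaged motion of the others as feedback, and is refined over a few iterations. Everything here is a hypothetical stand-in (the `Motion` fields as scalars, the `refine` rule, the `alpha` blend weight, the iteration count); it is not the paper's architecture or update rule.

```python
from dataclasses import dataclass

COMPONENTS = ("lips", "blink", "gesture")  # components named in the text


@dataclass
class Motion:
    # Scalar stand-ins for disentangled motion components (assumption).
    lips: float = 0.0
    blink: float = 0.0
    gesture: float = 0.0


def mean_motion(motions):
    """Average each component over a list of Motion objects."""
    n = len(motions)
    if n == 0:
        return Motion()
    return Motion(*(sum(getattr(m, c) for m in motions) / n for c in COMPONENTS))


def refine(own, feedback, audio_energy, alpha=0.5):
    """One hypothetical refinement step for a single participant.

    Lips follow the participant's own audio; blink and gesture are pulled
    toward the other participants' motion. A toy rule, not the paper's update.
    """
    return Motion(
        lips=(1 - alpha) * own.lips + alpha * audio_energy,
        blink=(1 - alpha) * own.blink + alpha * feedback.blink,
        gesture=(1 - alpha) * own.gesture + alpha * feedback.gesture,
    )


def step(states, audio, iterations=3):
    """Update every participant, refining a few times per frame."""
    for _ in range(iterations):
        new_states = {}
        for name, own in states.items():
            others = [m for other, m in states.items() if other != name]
            new_states[name] = refine(own, mean_motion(others), audio.get(name, 0.0))
        states = new_states
    return states


states = {"A": Motion(gesture=1.0), "B": Motion(), "C": Motion()}
audio = {"A": 0.8}  # only A is speaking
states = step(states, audio)
print(states["B"])  # B's lips stay at 0.0 while its gesture reacts to A
```

Because `step` iterates over whatever keys `states` contains, the same loop handles any number of participants. Keeping the components in separate fields lets the lips depend on the speaker's own audio while the listening cues depend on the others.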
Takeaway
InterTalk extends the face-swapping/talking-head stack toward interactive digital humans. The question changes from producing a lip-synced clip to sustaining a believable exchange in real time.