
Research Radar · Face Swapping · arXiv · June 2026

Monthly arXiv Radar

Face-Swapping Papers, June 2026: Conversational Talking-Face Generation, Fast Portrait Animation, and Privacy Protection

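Face-swapping research in June 2026 splits in two directions: more interactive talking faces, and stronger defenses against unauthorized identity transfer. This month, the surrounding system requirements (speed, multi-person behavior, and protection) mattered more than any single swap model.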

Key Signal This Month

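This month shows synthesis moving toward interactive systems while defenses become more specific to a particular threat model. That combination sits at the core of buyer requirements: natural motion, low latency, and misuse prevention.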

Paper 01 · 2026-06-30 · cs.CV

Toward Flexible, Natural, and Efficient Conversational Talking-Face Generation

Authors and Affiliations

Baiqin Wang

MAIS, Institute of Automation, Chinese Academy of Sciences

School of Artificial Intelligence, University of Chinese Academy of Sciences

Sen Chen

MAIS, Institute of Automation, Chinese Academy of Sciences

School of Artificial Intelligence, University of Chinese Academy of Sciences

Jiankuo Zhao

MAIS, Institute of Automation, Chinese Academy of Sciences

School of Artificial Intelligence, University of Chinese Academy of Sciences

Xiangyu Liu

MAIS, Institute of Automation, Chinese Academy of Sciences

School of Artificial Intelligence, University of Chinese Academy of Sciences

Zhen Lei

MAIS, Institute of Automation, Chinese Academy of Sciences

School of Artificial Intelligence, University of Chinese Academy of Sciences

CAIR, HKISI, Chinese Academy of Sciences

School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology

Xiangyu Zhu

MAIS, Institute of Automation, Chinese Academy of Sciences

School of Artificial Intelligence, University of Chinese Academy of Sciences

What It Solves

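This paper addresses the gap between speaking-only generation and real conversation. A usable system must simultaneously handle an arbitrary number of participants, long sessions, non-verbal feedback, and low latency.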

Key Result

The authors report improved interaction quality together with real-time generation at 30 FPS, an important threshold for online conversation use cases.
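
For a sense of what the 30 FPS threshold implies, the arithmetic is simple: each frame gets 1000/30, or about 33.3 ms, end to end. The sketch below checks a pipeline against that budget. The stage names and latencies are hypothetical placeholders, not measurements from InterTalk, and the check assumes stages run one after another for each frame.

```python
# Per-frame time budget implied by a target frame rate.
# Stage names and latencies are hypothetical placeholders, not figures from the paper.


def frame_budget_ms(fps: float) -> float:
    """Return the wall-clock time available per frame, in milliseconds."""
    return 1000.0 / fps


def is_real_time(stage_latencies_ms: dict, fps: float = 30.0) -> bool:
    """Assuming stages run sequentially for each frame, the summed
    latency must fit inside the per-frame budget."""
    return sum(stage_latencies_ms.values()) <= frame_budget_ms(fps)


if __name__ == "__main__":
    print(f"30 FPS budget: {frame_budget_ms(30):.1f} ms/frame")
    # Hypothetical per-frame costs for a streaming avatar pipeline.
    stages = {"audio_features": 4.0, "motion_generation": 12.0, "rendering": 15.0}
    print("fits budget:", is_real_time(stages))
```

If stages are pipelined across frames, throughput can reach 30 FPS even when the summed latency exceeds 33.3 ms, but at the cost of extra end-to-end delay, which matters for turn-taking in a live conversation.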

Abstract

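InterTalk targets conversational talking-face generation in which multiple participants speak, listen, and react over many rounds. It aims for real-time generation at 30 FPS through a motion-based architecture, participant feedback, iterative generation, disentanglement of facial components, a new multi-person dataset, and 3D face augmentation.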

Motivation

Talking-face systems are moving from one-off clips to persistent agents, tutors, assistants, and meeting avatars, so listening behavior and turn-taking now matter alongside lip sync.

Method

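The framework models conversational dynamics for each participant, uses feedback motion from the other speakers and listeners, and refines behavior iteratively. It disentangles facial components so that lip motion, eye blinking, and response gestures can each be improved independently.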
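
Why disentangled components help can be shown with a toy data structure: if lips, blinks, and gestures are separate fields, a feedback round can update a listener's gesture from the speaker's motion while leaving the other components untouched. Everything here (`FaceMotion`, `refine_listener`, the scalar components, and the "nod more when the speaker talks more" rule) is a hypothetical illustration, not InterTalk's representation or update rule.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FaceMotion:
    """Hypothetical per-frame motion state split into independent components."""
    lips: float     # mouth-opening amplitude
    blink: float    # eyelid closure: 0 = open, 1 = closed
    gesture: float  # nod / response-gesture intensity


def refine_listener(listener: FaceMotion, speaker: FaceMotion, rate: float = 0.5) -> FaceMotion:
    """One feedback round: move only the listener's gesture toward the
    speaker's lip activity (a made-up rule). Lips and blink are left
    unchanged because they live in separate fields."""
    new_gesture = listener.gesture + rate * (speaker.lips - listener.gesture)
    return replace(listener, gesture=new_gesture)


def run_rounds(listener: FaceMotion, speaker: FaceMotion, rounds: int = 3) -> FaceMotion:
    """Iteratively refine the listener over several conversation rounds."""
    for _ in range(rounds):
        listener = refine_listener(listener, speaker)
    return listener


if __name__ == "__main__":
    speaker = FaceMotion(lips=0.8, blink=0.1, gesture=0.2)
    listener = FaceMotion(lips=0.0, blink=0.3, gesture=0.0)
    print(run_rounds(listener, speaker))
```

Using a frozen dataclass with `dataclasses.replace` makes each round return a new state, so earlier rounds stay available for comparison.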

Key Takeaways

InterTalk extends the face-swapping/talking-head stack toward interactive digital humans. The practical question shifts from "can it lip-sync a clip?" to "can it sustain a believable exchange among multiple roles under real-time constraints?"

Paper 02 · 2026-06-29 · cs.CV

SyncCache: Fast Audio-Driven Portrait Animation by Exploiting Asymmetric Dynamics

Authors and Affiliations

Juncheng Ma

Shenzhen Graduate School, Peking University, China

Yuxuan Du

Shenzhen Graduate School, Peking University, China

Yanan Sun

Shanghai AI Laboratory, China

Zhening Xing

Shanghai AI Laboratory, China

Changlin Li

Tencent Hunyuan, China

Zhenyu Tang

Shenzhen Graduate School, Peking University, China

Bo Li

vivo, China

Peng-Tao Jiang

vivo, China

Li Yuan

Shenzhen Graduate School, Peking University, China

Daquan Zhou

Shenzhen Graduate School, Peking University, China

Yonghong Tian

Shenzhen Graduate School, Peking University, China

What It Solves

It fixes a mismatch in generic diffusion caching: methods built on text-to-video assumptions do not capture the spatial and modality imbalance of audio-driven faces.

Key Result

The authors report speedups of up to 4.12× on HunyuanVideo-Avatar and 3.75× on Wan-S2V, with little loss in visual quality or audio alignment.
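
To translate the reported speedups into wall-clock terms: an S× speedup removes a fraction 1 - 1/S of generation time, so 4.12× and 3.75× correspond to roughly 76% and 73% less time. This assumes the speedup applies uniformly to the generation time being measured; the function name is illustrative.

```python
def time_saved_fraction(speedup: float) -> float:
    """Fraction of wall-clock time removed by an S-times speedup: 1 - 1/S."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    # Speedups reported for SyncCache.
    for model, s in [("HunyuanVideo-Avatar", 4.12), ("Wan-S2V", 3.75)]:
        print(f"{model}: {s}x -> {time_saved_fraction(s):.1%} less generation time")
```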

Abstract

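SyncCache is a training-free acceleration method for DiT-based audio-driven portrait animation. Because the human region and audio-conditioned motion are more dynamic than the background, it recomputes the lightweight audio blocks and caches the inter-block residuals that stay stable.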
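
The general idea behind residual caching can be sketched in a few lines: on some denoising steps an expensive block is skipped and its cached residual (output minus input) is added to the current input instead, while cheap blocks always run. This is a generic illustration with hypothetical names (`ResidualCache`, `visual_dit`, `audio`) and a fixed refresh interval; SyncCache reportedly decides what to cache through probing and offline selection, which this sketch does not reproduce.

```python
from typing import Callable, Dict, List, Set

Vector = List[float]
Block = Callable[[Vector], Vector]


class ResidualCache:
    """Toy residual cache over a stack of blocks, reused across denoising steps.

    Blocks listed in `recompute` (e.g. lightweight audio blocks) always run.
    Every other block runs only on refresh steps; on the remaining steps its
    cached residual (output - input) is added to the current input instead.
    """

    def __init__(self, blocks: Dict[str, Block], recompute: Set[str], refresh_every: int = 2):
        self.blocks = blocks  # dicts keep insertion order, which defines block order
        self.recompute = recompute
        self.refresh_every = refresh_every
        self.residuals: Dict[str, Vector] = {}
        self.calls = {name: 0 for name in blocks}  # how often each block actually ran

    def forward(self, x: Vector, step: int) -> Vector:
        for name, block in self.blocks.items():
            run_block = (
                name in self.recompute
                or step % self.refresh_every == 0
                or name not in self.residuals
            )
            if run_block:
                y = block(x)
                self.calls[name] += 1
                if name not in self.recompute:
                    # Remember how this block changed its input.
                    self.residuals[name] = [b - a for a, b in zip(x, y)]
            else:
                # Skip the block: approximate its output with the cached residual.
                y = [a + r for a, r in zip(x, self.residuals[name])]
            x = y
        return x


if __name__ == "__main__":
    blocks = {
        "visual_dit": lambda v: [2.0 * t for t in v],  # stand-in for an expensive block
        "audio": lambda v: [t + 0.1 for t in v],       # stand-in for a cheap audio block
    }
    cache = ResidualCache(blocks, recompute={"audio"}, refresh_every=2)
    for step in range(4):
        out = cache.forward([1.0, 2.0], step)
    print(cache.calls)  # {'visual_dit': 2, 'audio': 4}
```

Because the audio block is recomputed on every step, audio conditioning stays fresh even when the expensive visual block is served from cache.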

Motivation

Portrait-animation diffusion models are powerful but slow. Production avatar systems need acceleration that does not break lip sync or facial detail.

Method

SyncCache combines Spatially-Asymmetric Probing, Modality-Decoupled Caching, and memory-adaptive offline cache selection. It recomputes the audio-sensitive parts and bypasses expensive DiT blocks whose residuals are stable.
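
A spatially aware stability check shows why a frame-wide metric can mislead: when the face covers a small share of the frame, a large change in the face region can hide behind a stable background. The sketch measures residual change separately inside and outside a face mask and applies a stricter threshold to the face. The function names, both thresholds, and the rule itself are assumptions for illustration, not SyncCache's actual probing criterion.

```python
import math
from typing import Sequence


def rel_change(prev: Sequence[float], curr: Sequence[float], mask: Sequence[bool]) -> float:
    """Relative L2 change of a block residual between two steps, over masked positions only."""
    num = sum((c - p) ** 2 for p, c, m in zip(prev, curr, mask) if m)
    den = sum(p ** 2 for p, m in zip(prev, mask) if m)
    return math.sqrt(num / den) if den > 0 else 0.0


def cacheable(prev, curr, face_mask, face_thr=0.05, background_thr=0.15) -> bool:
    """Hypothetical asymmetric rule: the face region must be very stable,
    while the background may drift more, before a block is served from cache."""
    background = [not m for m in face_mask]
    return (rel_change(prev, curr, face_mask) <= face_thr
            and rel_change(prev, curr, background) <= background_thr)


if __name__ == "__main__":
    n, face_size = 100, 4
    face_mask = [i < face_size for i in range(n)]
    prev = [1.0] * n
    curr = [1.2 if m else 1.0 for m in face_mask]  # only the face residual changes
    print("frame-wide:", round(rel_change(prev, curr, [True] * n), 3))  # 0.04
    print("face only: ", round(rel_change(prev, curr, face_mask), 3))   # 0.2
    print("cacheable: ", cacheable(prev, curr, face_mask))             # False
```

In the example, 4 of 100 positions are face pixels whose residual changes by 20%: the frame-wide relative change is only 0.04, but the face-only change is 0.2, so under this rule the block would be recomputed rather than cached.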

Key Takeaways

SyncCache's value lies in lowering inference cost without retraining the generator. For avatar products, that means faster previews, lower cloud costs, and a more realistic path to interactive audio-driven portrait generation.

Paper 03 · 2026-06-30 · cs.CV

Phantom: A Unified Face-Swap Deepfake Protection Framework with Latent-Space and Spatial Constraints

Authors and Affiliations

Jungkon Kim

Samsung Electronics, AI Platform Center

Cheolseung Jung

Samsung Electronics, AI Platform Center

Jong-Min Choi

Samsung Electronics, AI Platform Center

Juseong Lee

Samsung Electronics, AI Platform Center

What It Solves

It targets two weaknesses of prior adversarial protection: random targets produce ambiguous latent directions, and unconstrained noise leaks into regions unrelated to identity.

Key Result

It improves dodging protection success on UniFace, INSwapper, and SimSwap by 27.8%, 25.6%, and 16.6%, respectively. It also improves impersonation protection by up to 10.2% while achieving higher perceptual fidelity.

Abstract

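Phantom is a proactive protection framework against face-swap deepfakes. It synthesizes a target that shifts identity while preserving attributes to guide latent optimization, and it restricts perturbations to semantically important facial regions, making the protection both strong and visually natural.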
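
The contrast between a random target and an attribute-preserving target can be illustrated in a toy latent space. Assume, purely for illustration, that the first `id_dims` coordinates encode identity and the rest encode attributes such as pose and expression; real face-swap latents are not partitioned this cleanly, and none of these function names come from the Phantom paper. An attribute-preserving target moves only the identity slice, so the optimization direction has no attribute component, whereas a random target spreads the direction across attributes as well.

```python
import random
from typing import List, Sequence


def attribute_preserving_target(z_source: Sequence[float], id_dims: int,
                                other_identity: Sequence[float]) -> List[float]:
    """Hypothetical: replace the identity slice, keep the attribute slice as-is."""
    if len(other_identity) != id_dims:
        raise ValueError("other_identity must have id_dims entries")
    return list(other_identity) + list(z_source[id_dims:])


def attribute_leakage(z_source: Sequence[float], z_target: Sequence[float], id_dims: int) -> float:
    """Share of the squared optimization direction (target - source) on attribute coordinates."""
    d = [t - s for s, t in zip(z_source, z_target)]
    total = sum(x * x for x in d)
    return sum(x * x for x in d[id_dims:]) / total if total > 0 else 0.0


if __name__ == "__main__":
    z = [0.5, 0.5, 0.2, 0.8]  # toy latent: 2 identity dims + 2 attribute dims
    guided = attribute_preserving_target(z, id_dims=2, other_identity=[-0.5, 0.5])
    rng = random.Random(0)
    random_target = [rng.gauss(0.0, 1.0) for _ in z]
    print("guided leakage:", attribute_leakage(z, guided, 2))  # 0.0
    print("random leakage:", round(attribute_leakage(z, random_target, 2), 3))
```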

Motivation

Deepfake detection is reactive. Individuals and brands also need controls that make unauthorized face swaps fail before a manipulated video is ever created.

Method

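Phantom jointly optimizes a latent constraint and a spatial constraint. It uses the attribute-preserving target to obtain an identity-aware direction and applies a masked perturbation only to the semantic regions that matter for face swapping.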
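
A masked, sign-gradient projected gradient descent (PGD) loop illustrates how a spatial constraint can be enforced: the perturbation is updated only where the mask is true and is clipped to an L-infinity budget. The encoder here is a hypothetical linear map `W`, and the objective is simply the squared distance to a target latent, so the gradient is 2 * W^T (W(x + delta) - z_target). Phantom's actual encoders, losses, and mask construction differ and are not reproduced here.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(W: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def matvec_t(W: Matrix, u: Sequence[float]) -> List[float]:
    """Compute W^T u."""
    return [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(len(W[0]))]


def latent_loss(x, delta, W, z_target) -> float:
    """Squared distance between the encoded perturbed image and the target latent."""
    z = matvec(W, [a + d for a, d in zip(x, delta)])
    return sum((zi - ti) ** 2 for zi, ti in zip(z, z_target))


def masked_pgd(x, W, z_target, mask, eps=0.03, alpha=0.01, steps=20) -> List[float]:
    """Sign-gradient PGD toward z_target, updating only masked pixels."""
    delta = [0.0] * len(x)
    for _ in range(steps):
        z = matvec(W, [a + d for a, d in zip(x, delta)])
        grad = [2.0 * g for g in matvec_t(W, [zi - ti for zi, ti in zip(z, z_target)])]
        for j, g in enumerate(grad):
            if not mask[j]:
                continue  # spatial constraint: never touch pixels outside the mask
            step = (g > 0) - (g < 0)  # sign of the gradient
            d = delta[j] - alpha * step
            d = max(-eps, min(eps, d))           # L-infinity budget
            d = max(-x[j], min(1.0 - x[j], d))   # keep x + delta inside [0, 1]
            delta[j] = d
    return delta


if __name__ == "__main__":
    x = [0.5, 0.5, 0.5, 0.5]    # toy 4-pixel "image"
    W = [[1.0, 1.0, 1.0, 1.0]]  # hypothetical 1-D linear encoder
    z_target = [2.2]
    mask = [True, True, False, False]
    delta = masked_pgd(x, W, z_target, mask)
    print("delta:", delta)
    print("loss before/after:", latent_loss(x, [0.0] * 4, W, z_target),
          latent_loss(x, delta, W, z_target))
```

Unmasked pixels keep a zero perturbation by construction, which is the property that keeps noise out of identity-irrelevant regions.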

Key Takeaways

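Phantom matters because it treats face-swap defense as its own threat model rather than borrowing face-recognition attacks wholesale. For consumer photo services and celebrity or brand protection, designing the spatial constraint so that it does not visibly degrade the source image is especially important.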