Authors & Institutions
Enrui Yang
School of Computer Science and Technology, Ocean University of China, Qingdao, China
Yuezun Li
School of Computer Science and Technology, Ocean University of China, Qingdao, China
What Problem It Solves
The problem is how to turn CLIP into a more generalizable face forgery detector that can separate useful forgery cues from background noise when the attack style changes.
Key Result
The paper reports competitive or better results under both cross-dataset and cross-method evaluation, which is the part buyers care about most when the next forgery style has not appeared in their training data yet.
Abstract
Generalizable Face Forgery Detection via Separable Prompt Learning adapts CLIP into a face forgery detector by focusing on the text side of the model, not just the visual encoder. It separates forgery-specific and forgery-irrelevant cues through prompt learning and cross-modality alignment to improve cross-dataset and cross-method generalization.
Research Starting Point
Much CLIP-style deepfake detection work borrows the visual encoder and then largely ignores the part that makes vision-language models distinctive: the text modality and the alignment space around it. That leaves much of the available supervision unused and often weakens generalization to unseen forgeries. The paper is motivated by the idea that prompt design itself can become a real forensic tool rather than just a wrapper around a pretrained encoder.
Method
SePL (Separable Prompt Learning) introduces two prompt-learning paths, one for forgery-specific information and one for forgery-irrelevant information, then uses cross-modality alignment and dedicated objectives to disentangle them. By shifting more of the reasoning burden toward the text side of the model, the method treats language supervision as a first-class detection signal instead of an afterthought.
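To make the two-path idea concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation: the summary above does not spell out the actual objectives, so the function names, the temperature value, the three-dimensional vectors, and the squared-cosine separation penalty are all assumptions chosen for illustration. It shows only the general pattern of scoring an image feature against text-side embeddings by cosine similarity, and of penalizing overlap between a forgery-specific and a forgery-irrelevant embedding.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Cosine similarity between two non-zero vectors.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def softmax(logits):
    # Numerically stable softmax over a short list of logits.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(image_feat, real_text, fake_text, temperature=0.07):
    # Hypothetical alignment step: compare the image feature with the
    # "real" and "fake" text embeddings, CLIP-style, and return
    # [p_real, p_fake]. The temperature value is an assumption.
    logits = [
        cosine(image_feat, real_text) / temperature,
        cosine(image_feat, fake_text) / temperature,
    ]
    return softmax(logits)

def separation_penalty(specific_text, irrelevant_text):
    # Hypothetical disentanglement term (an assumption, not the paper's
    # loss): squared cosine similarity, which is zero when the two
    # embeddings are orthogonal.
    return cosine(specific_text, irrelevant_text) ** 2

# Toy embeddings standing in for encoder outputs (illustrative values only).
image_feat = [0.9, 0.1, 0.0]
real_text = [0.0, 1.0, 0.0]        # forgery-specific path, "real" prompt
fake_text = [1.0, 0.0, 0.0]        # forgery-specific path, "fake" prompt
irrelevant_text = [0.0, 0.0, 1.0]  # forgery-irrelevant path

probs = classify(image_feat, real_text, fake_text)
penalty = separation_penalty(fake_text, irrelevant_text)
```

In this toy setup the image feature lies closest to the "fake" embedding, so `probs[1]` exceeds `probs[0]`, and the penalty is zero because the two text embeddings are orthogonal. In a real system the embeddings would come from learnable prompts passed through CLIP's text encoder, and the penalty would be one term in a training loss.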
Paper Summary
The bigger takeaway is that prompt learning is starting to matter for deepfake detection in a practical way. This paper makes the case that better use of multimodal priors can improve robustness without requiring a fully bespoke detector built from scratch.