Flow Augmentation and Knowledge Distillation for Lightweight Face Presentation Attack Detection
Authors & Institutions
Muhammad Shahid Jabbar
SDAIA-KFUPM Joint Research Center for Artificial Intelligence, King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia
Muhammad Sohail Ibrahim
Interdisciplinary Research Center for Intelligent Secure Systems (IRC-ISS), King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia
Taha Hasan Masood Siddique
College of Information Science & Electronic Engineering, Zhejiang University, Hangzhou, China
Kejie Huang
College of Information Science & Electronic Engineering, Zhejiang University, Hangzhou, China
Shujaat Khan
SDAIA-KFUPM Joint Research Center for Artificial Intelligence, King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia
Department of Computer Engineering, College of Computing and Mathematics, King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia
What Problem It Solves
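Motion cues are highly discriminative for FacePAD, but extracting them normally requires explicit optical flow estimation at inference, which adds substantial computational overhead and limits real-time deployment. The paper removes that cost by using optical flow only during training.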
Key Result
The distilled model reaches 0.0% HTER on Replay-Attack and Replay-Mobile, 0.94% HTER on ROSE-Youtu, 5.65% HTER on SiW-Mv2, 0.42% ACER on OULU-NPU, and 52 FPS on an NVIDIA Jetson Orin Nano.
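For readers unfamiliar with the two metrics quoted above, the sketch below shows how they are conventionally computed. HTER is the mean of the false acceptance and false rejection rates; ACER is the mean of APCER and BPCER, where APCER is usually taken as the worst case over attack types. The function names and the label convention (1 = bona fide, 0 = attack) are illustrative assumptions, not taken from the paper.

```python
def error_rates(labels, predictions):
    """Return (FAR, FRR) for binary decisions.

    Assumed convention: 1 = bona fide (live), 0 = attack.
    FAR = fraction of attacks accepted as live.
    FRR = fraction of live samples rejected as attacks.
    """
    attack_preds = [p for y, p in zip(labels, predictions) if y == 0]
    live_preds = [p for y, p in zip(labels, predictions) if y == 1]
    far = sum(1 for p in attack_preds if p == 1) / len(attack_preds)
    frr = sum(1 for p in live_preds if p == 0) / len(live_preds)
    return far, frr


def hter(far, frr):
    """Half Total Error Rate: mean of FAR and FRR."""
    return (far + frr) / 2.0


def acer(apcer_per_attack_type, bpcer):
    """Average Classification Error Rate.

    APCER is taken as the maximum over attack types, then averaged
    with BPCER (the bona fide rejection rate).
    """
    return (max(apcer_per_attack_type) + bpcer) / 2.0


# Toy example with made-up decisions (not data from the paper).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
predictions = [0, 0, 0, 1, 1, 1, 0, 0]
far, frr = error_rates(labels, predictions)
toy_hter = hter(far, frr)
```

Both metrics are reported as percentages in the results above, so a value of 0.0094 from these functions corresponds to 0.94%.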
Abstract
Face presentation attack detection (FacePAD) remains challenging under diverse spoofing presentations, including 2D print and replay, 3D mask-based spoofing, makeup-induced appearance manipulation, and physical occlusions, as well as under varying capture conditions. Motion cues are highly discriminative for FacePAD but typically require explicit optical flow estimation, which introduces substantial computational overhead and limits real-time deployment. In this work, we leverage optical flow to enhance motion representation during training while eliminating the need for flow computation at inference. We propose a dual-branch teacher model that fuses appearance cues from RGB frames with motion cues derived from colorwheel-encoded optical flow, enabling effective modeling of micro-motions and temporal consistency. To enable efficient deployment, we introduce a knowledge distillation framework that transfers motion-aware knowledge from the flow-augmented teacher to a lightweight RGB-only student via logit distillation. As a result, the student implicitly learns motion-sensitive representations without requiring explicit flow estimation or additional feature extraction blocks at inference. Extensive experiments demonstrate strong performance across multiple benchmarks, achieving 0.0% HTER on Replay-Attack and Replay-Mobile, 0.94% HTER on ROSE-Youtu, 5.65% HTER on SiW-Mv2, and 0.42% ACER on OULU-NPU. The distilled student achieves performance comparable to or better than the teacher while significantly reducing parameters and FLOPs, achieving 52 FPS on an NVIDIA Jetson Orin Nano, indicating its suitability for real-time and resource-constrained FacePAD deployment.
Research Starting Point
Presentation attack detection has to recognize subtle motion cues while still running on embedded devices and camera-side hardware.
Method
A dual-branch teacher learns from RGB appearance and colorwheel-encoded optical flow. A lightweight RGB-only student then receives the teacher's motion-aware knowledge through logit distillation, so no optical flow has to be computed at inference.
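As a rough illustration of what "colorwheel-encoded" flow means, the sketch below maps each flow vector to a color: direction selects the hue and magnitude selects the saturation, so a still pixel is white and a fast-moving pixel is fully saturated. This is a common HSV-style approximation and an assumption on our part; the paper's exact colorwheel and normalization are not described in this summary, and the function names are hypothetical.

```python
import colorsys
import math


def flow_to_rgb(u, v, max_magnitude):
    """Map one flow vector (u, v) to an 8-bit RGB triple.

    Hue encodes direction, saturation encodes speed relative to
    max_magnitude (which must be > 0). Zero motion maps to white.
    """
    angle = math.atan2(v, u)                       # range [-pi, pi]
    hue = ((angle + math.pi) / (2 * math.pi)) % 1.0
    saturation = min(math.hypot(u, v) / max_magnitude, 1.0)
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, 1.0)
    return (round(r * 255), round(g * 255), round(b * 255))


def encode_flow_field(flow):
    """Encode a 2D grid of (u, v) vectors as a grid of RGB triples.

    Magnitudes are normalized by the largest magnitude in the field;
    an all-zero field is encoded as all white.
    """
    largest = max(math.hypot(u, v) for row in flow for (u, v) in row)
    scale = largest if largest > 0 else 1.0
    return [[flow_to_rgb(u, v, scale) for (u, v) in row] for row in flow]


# Toy 2x2 flow field (made-up values): one still pixel, three moving.
toy_flow = [[(0.0, 0.0), (1.0, 0.0)],
            [(-1.0, 0.0), (0.0, 0.5)]]
toy_image = encode_flow_field(toy_flow)
```

The resulting image can be fed to an ordinary image backbone, which is what lets a flow branch reuse the same kind of network as the RGB branch.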
Paper Summary
The practical contribution is that motion-aware presentation attack detection no longer has to pay the full inference cost of optical flow. A flow-augmented teacher transfers temporal liveness cues into a lightweight RGB student, making the approach more realistic for kiosks, mobile onboarding, and edge cameras that need fast spoof protection without server round trips.
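The teacher-to-student transfer described above can be sketched with a standard temperature-scaled logit distillation loss. This is a minimal, dependency-free illustration under stated assumptions: the temperature, the weighting factor `alpha`, and the combination with a hard-label cross-entropy term are conventional choices, not values or design details confirmed by the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    temperature and alpha are assumed hyperparameters.
    The soft term is scaled by temperature**2, as is conventional,
    so its gradient magnitude stays comparable across temperatures.
    """
    # Hard-label cross-entropy on the student's unscaled prediction.
    ce = -math.log(softmax(student_logits)[label] + 1e-12)
    # Soft-target term: match the teacher's softened distribution.
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    kd = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    return alpha * kd + (1.0 - alpha) * ce


# Toy logits for classes (live, spoof); values are made up.
teacher = [2.0, -1.0]   # from the RGB + flow teacher
student = [0.5, 0.2]    # from the RGB-only student
loss = distillation_loss(student, teacher, label=0)
```

Only the student's forward pass is needed at inference, which is why the flow estimator and the teacher's flow branch add no deployment cost.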