SteerFace: Debiasing Synthetic Face Generation via Adaptive Residue Perturbation
Authors & Institutions
Yuxi Mi
Fudan University, Shanghai, China
Qiuyang Yuan
Fudan University, Shanghai, China
Jianqing Xu
Youtu Lab, Tencent, Shanghai, China
Yichun Zhou
Fudan University, Shanghai, China
Xuan Zhao
Fudan University, Shanghai, China
Jun Wang
WeChat Pay Lab33, Tencent, Shenzhen, China
Rizen Guo
WeChat Pay Lab33, Tencent, Shenzhen, China
Shuigeng Zhou
Fudan University, Shanghai, China
What Problem It Solves
The paper addresses the synthetic-real gap created when identity-conditioned generators absorb non-identity cues into the learned identity representation.
Key Result
The authors report that SteerFace mitigates visual tendency, improves downstream face recognition over prior synthetic-data methods, and generalizes across training datasets and generation pipelines.
Abstract
The shortage of legally compliant data for face recognition training has sparked growing interest in using synthetic data as an alternative. While recent diffusion-based methods enable the generation of photorealistic face images with strong identity adherence and data diversity, their downstream recognition performance still exhibits a significant synthetic-real gap. This paper identifies visual tendency as a previously underexplored limitation, whereby synthetic data exhibit an unrealistic prevalence of visual attributes and thus deviate from the real-data distribution. Visual tendency can be attributed to the generator's conditioning on identity embeddings, through which co-occurring residual visual cues are unintentionally absorbed into learned identity semantics. To discourage the generator from exploiting such visual cues, this paper proposes SteerFace, a simple and efficient training framework that perturbs identity embeddings by steering them toward random orthogonal directions on the embedding hypersphere. The perturbation serves as an identity-preserving regularizer that penalizes the generator's reliance on non-identity components, as supported by theoretical analysis. This paper further introduces an adaptive strategy that learns perturbation strengths with both sample-wise preference and favorable overall statistics. Extensive experiments show that SteerFace effectively mitigates visual tendency, outperforms prior methods in downstream face recognition, and generalizes well across different training datasets and generation pipelines.
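One simple way to make "visual tendency" concrete is to compare how often each visual attribute appears in real versus synthetic images. The sketch below is a minimal illustration and not the paper's evaluation metric. It assumes that binary attribute labels (for example, from an off-the-shelf attribute classifier) already exist for both sets. The function name `attribute_prevalence_gap` is hypothetical.

```python
from typing import List, Sequence


def attribute_prevalence_gap(real_attrs: Sequence[Sequence[int]],
                             synth_attrs: Sequence[Sequence[int]]) -> List[float]:
    """Per-attribute prevalence in synthetic data minus prevalence in real data.

    Each input is a list of rows, one per image; each row holds binary flags
    (1 = attribute present) in the same attribute order for both sets.
    Positive values mean the attribute is over-represented in synthetic data.
    """
    def prevalence(rows: Sequence[Sequence[int]]) -> List[float]:
        n = len(rows)
        # zip(*rows) iterates over attribute columns.
        return [sum(col) / n for col in zip(*rows)]

    return [s - r for s, r in zip(prevalence(synth_attrs), prevalence(real_attrs))]


# Toy example with two hypothetical attributes, e.g. "smiling" and "eyeglasses".
real = [[1, 0], [0, 0], [1, 1], [0, 0]]
synth = [[1, 0], [1, 1], [1, 0], [1, 0]]
print(attribute_prevalence_gap(real, synth))  # [0.5, 0.0]
```

In this toy data the first attribute appears in every synthetic image but in only half of the real ones. That is the kind of unrealistic prevalence the abstract describes.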
Research Starting Point
Teams building face recognition systems increasingly need legally compliant training data, but synthetic faces can carry unrealistic visual tendencies that hurt downstream verification.
Method
SteerFace perturbs identity embeddings toward random orthogonal directions on the embedding hypersphere, then learns adaptive perturbation strengths so the generator is regularized away from residual non-identity cues while preserving identity.
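The steering step can be read as a rotation on the unit hypersphere. First pick a random unit direction orthogonal to the identity embedding, then move along the great circle toward it by an angle θ. The following is a minimal sketch under stated assumptions, not the authors' implementation:

- Embeddings are assumed to be unit-normalized.
- The 512-dimensional size is arbitrary.
- The sigmoid mapping from per-sample scores to angles and the batch mean/std penalty (`adaptive_angle`, `batch_statistics_penalty`) are hypothetical stand-ins for the paper's adaptive strategy.

```python
import math
import random
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> Vector:
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def random_orthogonal_direction(e: Vector, rng: random.Random) -> Vector:
    """Sample a random unit vector orthogonal to the unit vector e."""
    g = [rng.gauss(0.0, 1.0) for _ in e]
    proj = dot(g, e)
    # One Gram-Schmidt step: remove the component of g along e, then renormalize.
    return normalize([gi - proj * ei for gi, ei in zip(g, e)])


def steer(e: Vector, theta: float, rng: random.Random) -> Vector:
    """Rotate e by angle theta along the great circle toward a random orthogonal direction.

    The result stays on the unit hypersphere and has cosine similarity cos(theta) with e.
    """
    e = normalize(e)
    u = random_orthogonal_direction(e, rng)
    c, s = math.cos(theta), math.sin(theta)
    return [c * ei + s * ui for ei, ui in zip(e, u)]


# Hypothetical adaptive strengths (an assumption, not the paper's formulation).
def adaptive_angle(score: float, max_angle: float) -> float:
    """Map a per-sample score (e.g. from a small learned head) to an angle in (0, max_angle)."""
    return max_angle / (1.0 + math.exp(-score))


def batch_statistics_penalty(thetas: List[float], target_mean: float, target_std: float) -> float:
    """Hypothetical regularizer pulling the batch's angle mean and std toward target values."""
    m = sum(thetas) / len(thetas)
    sd = math.sqrt(sum((t - m) ** 2 for t in thetas) / len(thetas))
    return (m - target_mean) ** 2 + (sd - target_std) ** 2


rng = random.Random(0)
embeddings = [[rng.gauss(0.0, 1.0) for _ in range(512)] for _ in range(4)]
scores = [-1.0, 0.0, 0.5, 2.0]  # hypothetical per-sample scores
thetas = [adaptive_angle(s, max_angle=math.pi / 6) for s in scores]
steered = [steer(e, t, rng) for e, t in zip(embeddings, thetas)]
for e, t, p in zip(embeddings, thetas, steered):
    print(f"theta={t:.3f}  cos(e, steered)={dot(normalize(e), p):.3f}  cos(theta)={math.cos(t):.3f}")
print("batch penalty:", batch_statistics_penalty(thetas, target_mean=0.25, target_std=0.1))
```

The sampled direction is orthogonal to the embedding, so the steered vector stays unit-norm and its cosine similarity to the original is exactly cos θ. The angle therefore sets how far identity similarity may drop for each sample.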
Paper Summary
SteerFace is useful because it treats synthetic face generation as a training-data quality problem, not only an image realism problem. It perturbs identity embeddings during generator training so the generator cannot rely on residual visual cues. This lets teams reduce the synthetic-real mismatch before downstream recognition models are trained, which is directly relevant to compliant dataset expansion and bias auditing.