Authors & Institutions
Kartik Narayan
Johns Hopkins University
Vishal M. Patel
Johns Hopkins University
What Problem It Solves
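The paper addresses the weakness of a single shared encoder: when fine-tuned on low-resolution data, it struggles to generalize across both the high-resolution gallery and low-resolution probe domains, may underfit degraded regions, and can catastrophically forget high-resolution discriminative knowledge.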
Key Result
Across eleven benchmarks spanning high-resolution, mixed-quality, and low-resolution settings, the authors report that FaceMoE significantly outperforms state-of-the-art low-resolution face-recognition methods while activating only a sparse subset of experts per token.
Abstract
Low-resolution face recognition (LR-FR) remains a challenging task due to poor feature extraction and aggregation, as probe images often contain limited identity information resulting from extreme degradations such as blur, occlusion, and low contrast. Additionally, the domain gap between high-resolution (HR) gallery images and low-resolution (LR) probe images poses a significant challenge. A single feature encoder struggles to generalize effectively across both domains when fine-tuned on an LR dataset, and this issue is further magnified by catastrophic forgetting. To address these challenges, we propose FaceMoE, an effective adaptation of the Mixture of Experts (MoE) transformer architecture for low-resolution face recognition. Specifically, we introduce multiple specialized feed-forward network (FFN) experts and incorporate a top-k router, which dynamically assigns tokens to appropriate experts. This design emergently promotes specialization across experts for different semantic regions of the face, which enables FaceMoE to perform resolution-aware feature extraction. Moreover, the top-k router facilitates sparse expert activation, enabling the model to preserve pretrained knowledge when fine-tuned on an LR dataset, while increasing model capacity without proportional computational overhead. FaceMoE is trained with a combined face recognition loss, router z-loss, and load balancing loss to ensure expert specialization and stable training. To the best of our knowledge, this is the first work leveraging MoE for LR-FR. Extensive experiments across eleven datasets, spanning HR, mixed-quality, and LR benchmarks, demonstrate that FaceMoE significantly outperforms state-of-the-art methods. Code: https://github.com/Kartik-3004/FaceMoE
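The top-k routing step described in the abstract can be sketched in pure Python. This is a minimal, hypothetical illustration and not the FaceMoE code. Several details are assumptions: the class names (`Expert`, `TopKMoE`), the tiny sizes, the ReLU two-layer experts, and the choice to renormalize the softmax over only the k selected router logits. The released repository may gate tokens differently.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def matvec(W, x):
    # W is a list of rows (out_dim x in_dim); x has length in_dim
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu(xs):
    return [max(0.0, v) for v in xs]

class Expert:
    """Hypothetical two-layer FFN expert: d -> h -> d."""
    def __init__(self, d, h, rng):
        self.W1 = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(h)]
        self.W2 = [[rng.gauss(0, 0.1) for _ in range(h)] for _ in range(d)]

    def __call__(self, x):
        return matvec(self.W2, relu(matvec(self.W1, x)))

class TopKMoE:
    """Sparse MoE FFN: each token is processed by only k of num_experts experts."""
    def __init__(self, d, h, num_experts, k, seed=0):
        rng = random.Random(seed)
        self.k = k
        self.experts = [Expert(d, h, rng) for _ in range(num_experts)]
        # Linear router: one score per expert (bias-free, an assumption)
        self.Wr = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(num_experts)]

    def route(self, token):
        logits = matvec(self.Wr, token)
        # Indices of the k highest-scoring experts
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.k]
        # Gate weights: softmax renormalized over the selected experts only (assumption)
        gates = softmax([logits[i] for i in top])
        return logits, top, gates

    def forward_token(self, token):
        _, top, gates = self.route(token)
        out = [0.0] * len(token)
        for g, i in zip(gates, top):
            y = self.experts[i](token)  # only the selected experts are evaluated
            out = [o + g * v for o, v in zip(out, y)]
        return out, top

moe = TopKMoE(d=8, h=16, num_experts=4, k=2)
token = [0.5, -0.2, 0.1, 0.9, -0.7, 0.3, 0.0, 0.4]
y, chosen = moe.forward_token(token)
print("experts used:", chosen, "output dim:", len(y))
```

Because only the k selected experts run for each token, the number of experts (model capacity) can grow while per-token FFN compute stays roughly proportional to k.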
Research Starting Point
Surveillance, access-control, and border workflows often compare degraded probe images with cleaner enrollment images; the failure mode is not just less detail, but a domain gap that can cause an adapted encoder to forget high-quality recognition behavior.
Method
FaceMoE inserts specialized feed-forward experts into a transformer and uses top-k routing so that each token selects a small set of experts. The training objective adds two terms to the face-recognition loss. The router z-loss keeps router logits from growing unstably, and the load-balancing loss spreads tokens across experts. Together they encourage stable specialization, while top-k routing keeps only a few experts active for each token.
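The three-term objective can be sketched as follows. The z-loss and load-balancing terms use the commonly cited formulations. The z-loss follows ST-MoE: the mean over tokens of the squared log-sum-exp of the router logits. The load-balancing loss follows Switch Transformer: N times the sum of f_i · P_i, with f_i extended here to top-k assignments. Whether FaceMoE uses exactly these variants is an assumption, and so are the weights `lambda_z` and `lambda_bal`. The face-recognition loss is passed in as a precomputed number because its exact form is not specified in this summary.

```python
import math

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def router_z_loss(all_logits):
    """ST-MoE-style z-loss: mean over tokens of logsumexp(router logits)^2."""
    return sum(logsumexp(l) ** 2 for l in all_logits) / len(all_logits)

def load_balancing_loss(all_logits, k):
    """Switch-style balance loss N * sum_i f_i * P_i, extended to top-k (assumption).
    f_i: share of token-to-expert assignments sent to expert i.
    P_i: mean router probability given to expert i."""
    n_tokens, n_experts = len(all_logits), len(all_logits[0])
    counts = [0] * n_experts
    prob_sums = [0.0] * n_experts
    for logits in all_logits:
        for i, p in enumerate(softmax(logits)):
            prob_sums[i] += p
        top = sorted(range(n_experts), key=lambda i: logits[i], reverse=True)[:k]
        for i in top:
            counts[i] += 1
    f = [c / (n_tokens * k) for c in counts]
    P = [s / n_tokens for s in prob_sums]
    return n_experts * sum(fi * pi for fi, pi in zip(f, P))

def total_loss(fr_loss, all_logits, k, lambda_z=1e-3, lambda_bal=1e-2):
    # Hypothetical weights; the paper's actual coefficients may differ.
    return (fr_loss
            + lambda_z * router_z_loss(all_logits)
            + lambda_bal * load_balancing_loss(all_logits, k))

balanced = [[5.0 if j == t else 0.0 for j in range(4)] for t in range(4)]
collapsed = [[5.0, 0.0, 0.0, 0.0] for _ in range(4)]
print(load_balancing_loss(balanced, k=1), load_balancing_loss(collapsed, k=1))
```

When assignments are uniform across experts, the balance term equals 1. It approaches N when tokens and probability mass collapse onto a single expert, and this pressure keeps experts from going unused.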
Paper Summary
FaceMoE is useful for teams that cannot control image quality at capture time. Its main product implication is a routing-based way to add capacity for degraded faces without retraining a completely separate low-resolution system or paying the full cost of a larger dense model.
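A back-of-the-envelope parameter count illustrates the trade-off between capacity and compute. All sizes below are hypothetical and not taken from the paper: a ViT-B-like width of 768, a 3072-wide FFN, 8 experts, top-2 routing, and a bias-free linear router.

```python
def ffn_params(d_model, d_hidden, bias=True):
    # Two linear layers: d_model -> d_hidden and d_hidden -> d_model
    p = 2 * d_model * d_hidden
    if bias:
        p += d_hidden + d_model
    return p

def moe_ffn_params(d_model, d_hidden, num_experts, k, bias=True):
    per_expert = ffn_params(d_model, d_hidden, bias)
    router = d_model * num_experts  # bias-free linear router (assumption)
    total = num_experts * per_expert + router   # parameters stored
    active = k * per_expert + router            # parameters used per token
    return total, active

total, active = moe_ffn_params(768, 3072, num_experts=8, k=2)
print(total, active, round(active / total, 4))
```

With these placeholder numbers, the expert layer stores roughly four times the FFN parameters that any single token actually uses. A dense layer of the same total size would apply all of them to every token.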