June 2026 Face Recognition Papers: Low-Resolution MoE, Efficient ViTs, and 1024-Byte Travel Documents

June 2026 face recognition work was unusually deployment-focused. The strongest papers ask how recognition survives bad capture conditions, tight compute budgets, and extreme storage limits rather than assuming clean enrollment photos and unconstrained servers.

What This Month Signals

Together, the papers push recognition toward a more resilient product stack: adapt capacity for degraded faces, expose latency-quality trade-offs in ViTs, and engineer document images for severe byte budgets.

Paper 01 | 2026-06-30 | cs.CV

FaceMoE: Mixture of Experts for Low-Resolution Face Recognition

Authors & Institutions

Kartik Narayan

Johns Hopkins University

Vishal M. Patel

Johns Hopkins University

What Problem It Solves

The paper addresses the weakness of a single shared encoder: once fine-tuned on low-resolution data, it struggles to cover both domains, extracting weak features from degraded probes while forgetting the discriminative knowledge it learned from high-resolution images.

Key Result

Across eleven high-resolution, mixed-quality, and low-resolution benchmarks, the authors report clear gains over state-of-the-art low-resolution face-recognition methods while keeping sparse expert activation.

Abstract

Low-resolution face recognition (LR-FR) remains a challenging task due to poor feature extraction and aggregation, as probe images often contain limited identity information resulting from extreme degradations such as blur, occlusion, and low contrast. Additionally, the domain gap between high-resolution (HR) gallery images and low-resolution (LR) probe images poses a significant challenge. A single feature encoder struggles to generalize effectively across both domains when fine-tuned on an LR dataset, and this issue is further magnified by catastrophic forgetting. To address these challenges, we propose FaceMoE, an effective adaptation of the Mixture of Experts (MoE) transformer architecture for low-resolution face recognition. Specifically, we introduce multiple specialized feed-forward network (FFN) experts and incorporate a top-k router, which dynamically assigns tokens to appropriate experts. This design emergently promotes specialization across experts for different semantic regions of the face, which enables FaceMoE to perform resolution-aware feature extraction. Moreover, the top-k router facilitates sparse expert activation, enabling the model to preserve pretrained knowledge when fine-tuned on an LR dataset, while increasing model capacity without proportional computational overhead. FaceMoE is trained with a combined face recognition loss, router z-loss, and load balancing loss to ensure expert specialization and stable training. To the best of our knowledge, this is the first work leveraging MoE for LR-FR. Extensive experiments across eleven datasets, spanning HR, mixed-quality, and LR benchmarks, demonstrate that FaceMoE significantly outperforms state-of-the-art methods. Code: https://github.com/Kartik-3004/FaceMoE

Research Starting Point

Surveillance, access-control, and border workflows often compare degraded probe images with cleaner enrollment images; the failure mode is not just less detail, but a domain gap that can cause an adapted encoder to forget high-quality recognition behavior.

Method

FaceMoE inserts specialized feed-forward experts into a transformer and uses top-k routing so that each token is processed by only a small set of experts. The training objective combines a face-recognition loss with a router z-loss and a load-balancing loss, which stabilize training and encourage expert specialization; the top-k routing itself keeps activation sparse, so not every expert runs for every token.
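To make the routing and auxiliary losses concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the z-loss and load-balancing formulas follow formulations commonly used in the sparse-MoE literature (mean squared log-sum-exp of the router logits, and expert count times the dot product of dispatch fractions and mean router probabilities), which the summary above does not confirm FaceMoE uses. All function names and the loss coefficients `z_coef` and `lb_coef` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's expert logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logsumexp(logits):
    """Numerically stable log(sum(exp(logits)))."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))


def route_top_k(router_logits, k):
    """For each token, keep the k most probable experts.

    Returns, per token, a list of (expert_index, gate_weight) pairs whose
    gate weights are renormalized to sum to 1 (an assumed convention).
    """
    assignments = []
    for logits in router_logits:
        probs = softmax(logits)
        top = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:k]
        norm = sum(probs[e] for e in top)
        assignments.append([(e, probs[e] / norm) for e in top])
    return assignments


def router_z_loss(router_logits):
    """Mean squared log-sum-exp of the logits; penalizes large router logits."""
    return sum(logsumexp(l) ** 2 for l in router_logits) / len(router_logits)


def load_balancing_loss(router_logits, assignments):
    """num_experts * sum_e(dispatch_fraction_e * mean_router_prob_e)."""
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    k = len(assignments[0])
    dispatch = [0.0] * num_experts
    for token in assignments:
        for expert, _ in token:
            dispatch[expert] += 1.0 / (num_tokens * k)
    mean_prob = [0.0] * num_experts
    for logits in router_logits:
        for expert, p in enumerate(softmax(logits)):
            mean_prob[expert] += p / num_tokens
    return num_experts * sum(f * p for f, p in zip(dispatch, mean_prob))


def total_loss(recognition_loss, z_loss, lb_loss, z_coef=1e-3, lb_coef=1e-2):
    """Weighted sum of the three terms; the coefficients are placeholders."""
    return recognition_loss + z_coef * z_loss + lb_coef * lb_loss


# Two tokens, four experts, top-2 routing.
demo_logits = [[2.0, 0.0, 1.0, -1.0], [0.0, 3.0, 0.0, 1.0]]
demo_assignments = route_top_k(demo_logits, k=2)
demo_z = router_z_loss(demo_logits)
demo_lb = load_balancing_loss(demo_logits, demo_assignments)
```

In this toy input each token activates two of the four experts, so only half of the expert feed-forward computation would run per token. That is the sense in which capacity can grow without a proportional increase in compute.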

Paper Summary

FaceMoE is useful for teams that cannot control image quality at capture time. Its main product implication is a routing-based way to add capacity for degraded faces without retraining a completely separate low-resolution system or paying the full cost of a larger dense model.

Paper 02 | 2026-06-10 | cs.CV