TransFace++: Rethinking the Face Recognition Paradigm with a Focus on Accuracy, Efficiency, and Security
- URL: http://arxiv.org/abs/2308.10133v2
- Date: Sat, 25 Oct 2025 03:27:01 GMT
- Title: TransFace++: Rethinking the Face Recognition Paradigm with a Focus on Accuracy, Efficiency, and Security
- Authors: Jun Dan, Yang Liu, Baigui Sun, Jiankang Deng, Shan Luo,
- Abstract summary: Face Recognition (FR) technology has made significant strides with the emergence of deep learning. Most existing FR models are built upon Convolutional Neural Networks (CNN) and take RGB face images as the model's input. We propose two novel FR frameworks, i.e., TransFace and TransFace++, which successfully explore the feasibility of applying ViTs and image bytes to FR tasks.
- Score: 56.24794071698785
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Face Recognition (FR) technology has made significant strides with the emergence of deep learning. Typically, existing FR models are built upon Convolutional Neural Networks (CNNs) and take RGB face images as input. In this work, we take a closer look at existing FR paradigms from the efficiency, security, and accuracy perspectives, and identify the following three problems: (i) CNN frameworks struggle to capture global facial features and to model the correlations between local facial features. (ii) Taking RGB face images as the model's input degrades inference efficiency and incurs extra computation costs. (iii) In a real-world FR system that operates on RGB face images, user privacy may be compromised if attackers penetrate the system and gain access to the model's input. To address these three issues, we propose two novel FR frameworks, TransFace and TransFace++, which explore the feasibility of applying ViTs and image bytes, respectively, to FR tasks. Experiments on popular face benchmarks demonstrate the superiority of TransFace and TransFace++. Code is available at https://github.com/DanJun6737/TransFace_pp.
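To make the byte-input idea in the abstract concrete, here is a minimal, hedged sketch of feeding a raw file byte stream (rather than a decoded RGB tensor) into an embedding model. This is not the TransFace++ architecture: the byte "patch" size, embedding table, positional encoding, and mean-pooling below are all invented for illustration; a real model would run transformer blocks between tokenization and pooling.

```python
import numpy as np

PATCH = 16   # bytes per token (assumed, not from the paper)
DIM = 64     # embedding dimension (assumed)

rng = np.random.default_rng(0)
byte_table = rng.normal(0, 0.02, size=(256, DIM))  # stand-in for learned byte embeddings

def embed_bytes(raw: bytes) -> np.ndarray:
    """Map a raw byte stream to an L2-normalized face embedding."""
    buf = np.frombuffer(raw, dtype=np.uint8)
    # Pad so the stream splits evenly into fixed-size byte "patches".
    pad = (-len(buf)) % PATCH
    buf = np.pad(buf, (0, pad))
    patches = buf.reshape(-1, PATCH)           # (num_tokens, PATCH)
    tokens = byte_table[patches].mean(axis=1)  # (num_tokens, DIM)
    # Sinusoidal positional encoding so token order matters.
    pos = np.arange(tokens.shape[0])[:, None]
    freq = 1.0 / (10000 ** (np.arange(0, DIM, 2) / DIM))
    tokens[:, 0::2] += np.sin(pos * freq)
    tokens[:, 1::2] += np.cos(pos * freq)
    # Placeholder for transformer blocks: mean-pool and normalize.
    emb = tokens.mean(axis=0)
    return emb / np.linalg.norm(emb)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two unit-norm embeddings."""
    return float(a @ b)

# Verification compares embeddings of two byte streams directly,
# without ever materializing an RGB image.
e1 = embed_bytes(b"\x89PNG..." * 100)
e2 = embed_bytes(b"\x89PNG..." * 100)
```

The point of the sketch is the interface, not the model: the system never decodes bytes into pixels, which is the efficiency and privacy argument the abstract makes.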
Related papers
- Scalable Face Security Vision Foundation Model for Deepfake, Diffusion, and Spoofing Detection [23.328598687742712]
We make the first attempt and propose FS-VFM to learn fundamental representations of real face images.
We introduce three learning objectives, namely 3C, that synergize masked image modeling (MIM) and instance discrimination (ID).
We present a reliable self-distillation mechanism that seamlessly couples MIM with ID to establish underlying local-to-global correspondence.
Experiments on 11 public benchmarks demonstrate that our FS-VFM consistently generalizes better than diverse VFMs.
arXiv Detail & Related papers (2025-10-12T15:38:03Z) - ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers [7.505873965164197]
We introduce ViTaPEs, a framework to learn task-agnostic representations for visuotactile perception.
Our approach exploits a novel multi-scale positional encoding scheme to capture intra-modal structures.
We show that ViTaPEs surpasses state-of-the-art baselines across various recognition tasks.
arXiv Detail & Related papers (2025-05-26T14:19:29Z) - OSDFace: One-Step Diffusion Model for Face Restoration [72.5045389847792]
Diffusion models have demonstrated impressive performance in face restoration.
We propose OSDFace, a novel one-step diffusion model for face restoration.
Results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics.
arXiv Detail & Related papers (2024-11-26T07:07:48Z) - Transferable Adversarial Facial Images for Privacy Protection [15.211743719312613]
We present a novel face privacy protection scheme with improved transferability while maintaining high visual quality.
We first exploit global adversarial latent search to traverse the latent space of the generative model.
We then introduce a key landmark regularization module to preserve the visual identity information.
arXiv Detail & Related papers (2024-07-18T02:16:11Z) - Text-Guided Face Recognition using Multi-Granularity Cross-Modal
Contrastive Learning [0.0]
We introduce text-guided face recognition (TGFR) to analyze the impact of integrating facial attributes in the form of natural language descriptions.
TGFR demonstrates remarkable improvements, particularly on low-quality images, over existing face recognition models.
arXiv Detail & Related papers (2023-12-14T22:04:22Z) - Generalized Face Forgery Detection via Adaptive Learning for Pre-trained Vision Transformer [54.32283739486781]
We present a Forgery-aware Adaptive Vision Transformer (FA-ViT) under the adaptive learning paradigm.
FA-ViT achieves 93.83% and 78.32% AUC scores on Celeb-DF and DFDC datasets in the cross-dataset evaluation.
arXiv Detail & Related papers (2023-09-20T06:51:11Z) - FactoFormer: Factorized Hyperspectral Transformers with Self-Supervised Pretraining [36.44039681893334]
Hyperspectral images (HSIs) contain rich spectral and spatial information.
Current state-of-the-art hyperspectral transformers only tokenize the input HSI sample along the spectral dimension.
We propose a novel factorized spectral-spatial transformer that incorporates factorized self-supervised pretraining procedures.
arXiv Detail & Related papers (2023-09-18T02:05:52Z) - S-Adapter: Generalizing Vision Transformer for Face Anti-Spoofing with Statistical Tokens [45.06704981913823]
Face Anti-Spoofing (FAS) aims to detect malicious attempts to invade a face recognition system by presenting spoofed faces.
We propose a novel Statistical Adapter (S-Adapter) that gathers local discriminative and statistical information from localized token histograms.
To further improve the generalization of the statistical tokens, we propose a novel Token Style Regularization (TSR).
Our experimental results demonstrate that our proposed S-Adapter and TSR provide significant benefits in both zero-shot and few-shot cross-domain testing, outperforming state-of-the-art methods on several benchmark tests.
arXiv Detail & Related papers (2023-09-07T22:36:22Z) - Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection(VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z) - Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z) - HiMFR: A Hybrid Masked Face Recognition Through Face Inpainting [0.7868449549351486]
We propose an end-to-end hybrid masked face recognition system, namely HiMFR.
The masked face detector module applies a pretrained Vision Transformer to detect whether faces are covered with a mask or not.
The inpainting module uses a fine-tuned image inpainting model based on a Generative Adversarial Network (GAN) to restore faces.
Finally, the hybrid face recognition module based on ViT with an EfficientNetB3 backbone recognizes the faces.
arXiv Detail & Related papers (2022-09-19T11:26:49Z) - Multi-Prior Learning via Neural Architecture Search for Blind Face Restoration [61.27907052910136]
Blind Face Restoration (BFR) aims to recover high-quality face images from low-quality ones.
Current methods still suffer from two major difficulties: 1) how to derive a powerful network architecture without extensive hand tuning; 2) how to capture complementary information from multiple facial priors in one network to improve restoration performance.
We propose a Face Restoration Searching Network (FRSNet) to adaptively search the suitable feature extraction architecture within our specified search space.
arXiv Detail & Related papers (2022-06-28T12:29:53Z) - Towards Data-Efficient Detection Transformers [77.43470797296906]
We show most detection transformers suffer from significant performance drops on small-size datasets.
We empirically analyze the factors that affect data efficiency, through a step-by-step transition from a data-efficient RCNN variant to the representative DETR.
We introduce a simple yet effective label augmentation method to provide richer supervision and improve data efficiency.
arXiv Detail & Related papers (2022-03-17T17:56:34Z) - GMFIM: A Generative Mask-guided Facial Image Manipulation Model for Privacy Preservation [0.7734726150561088]
We propose a Generative Mask-guided Face Image Manipulation model based on GANs to apply imperceptible editing to the input face image.
Our model can achieve better performance against automated face recognition systems in comparison to the state-of-the-art methods.
arXiv Detail & Related papers (2022-01-10T14:09:14Z) - End2End Occluded Face Recognition by Masking Corrupted Features [82.27588990277192]
State-of-the-art general face recognition models do not generalize well to occluded face images.
This paper presents a novel face recognition method that is robust to occlusions based on a single end-to-end deep neural network.
Our approach, named FROM (Face Recognition with Occlusion Masks), learns to discover the corrupted features from the deep convolutional neural networks, and clean them by the dynamically learned masks.
arXiv Detail & Related papers (2021-08-21T09:08:41Z) - Vision Transformers are Robust Learners [65.91359312429147]
We study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples.
We present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners.
arXiv Detail & Related papers (2021-05-17T02:39:22Z) - FedFace: Collaborative Learning of Face Recognition Model [66.84737075622421]
FedFace is a framework for collaborative learning of face recognition models.
It learns an accurate and generalizable face recognition model where the face images stored at each client are neither shared with other clients nor the central host.
Our code and pre-trained models will be publicly available.
arXiv Detail & Related papers (2021-04-07T09:25:32Z) - DotFAN: A Domain-transferred Face Augmentation Network for Pose and Illumination Invariant Face Recognition [94.96686189033869]
We propose a 3D model-assisted domain-transferred face augmentation network (DotFAN)
DotFAN can generate a series of variants of an input face based on the knowledge distilled from existing rich face datasets collected from other domains.
Experiments show that DotFAN is beneficial for augmenting small face datasets to improve their within-class diversity.
arXiv Detail & Related papers (2020-02-23T08:16:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.