TransFace: Calibrating Transformer Training for Face Recognition from a
Data-Centric Perspective
- URL: http://arxiv.org/abs/2308.10133v1
- Date: Sun, 20 Aug 2023 02:02:16 GMT
- Authors: Jun Dan, Yang Liu, Haoyu Xie, Jiankang Deng, Haoran Xie, Xuansong Xie
and Baigui Sun
- Abstract summary: Vision Transformers (ViTs) have demonstrated powerful representation ability in various visual tasks thanks to their intrinsic data-hungry nature.
However, we unexpectedly find that ViTs perform poorly when applied to face recognition (FR) scenarios with extremely large datasets.
This paper proposes a superior FR model called TransFace, which employs a patch-level data augmentation strategy named DPAP and a hard sample mining strategy named EHSM.
- Score: 40.521854111639094
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision Transformers (ViTs) have demonstrated powerful representation ability
in various visual tasks thanks to their intrinsic data-hungry nature. However,
we unexpectedly find that ViTs perform poorly when applied to face
recognition (FR) scenarios with extremely large datasets. We investigate the
reasons for this phenomenon and discover that the existing data augmentation
approach and hard sample mining strategy are incompatible with ViT-based FR
backbones because they are not tailored to preserve facial structural
information or to exploit the information in each local token. To remedy
these problems, this paper proposes a superior FR model called TransFace, which
employs a patch-level data augmentation strategy named DPAP and a hard sample
mining strategy named EHSM. Specifically, DPAP randomly perturbs the amplitude
information of dominant patches to expand sample diversity, which effectively
alleviates the overfitting problem in ViTs. EHSM utilizes the information
entropy in the local tokens to dynamically adjust the importance weight of easy
and hard samples during training, leading to a more stable prediction.
Experiments on several benchmarks demonstrate the superiority of our TransFace.
Code and models are available at https://github.com/DanJun6737/TransFace.
Related papers
- Generalized Face Forgery Detection via Adaptive Learning for Pre-trained Vision Transformer [54.32283739486781]
We present a Forgery-aware Adaptive Vision Transformer (FA-ViT) under the adaptive learning paradigm.
FA-ViT achieves 93.83% and 78.32% AUC scores on Celeb-DF and DFDC datasets in the cross-dataset evaluation.
arXiv Detail & Related papers (2023-09-20T06:51:11Z)
- FactoFormer: Factorized Hyperspectral Transformers with Self-Supervised Pretraining [36.44039681893334]
Hyperspectral images (HSIs) contain rich spectral and spatial information.
Current state-of-the-art hyperspectral transformers only tokenize the input HSI sample along the spectral dimension.
We propose a novel factorized spectral-spatial transformer that incorporates factorized self-supervised pretraining procedures.
arXiv Detail & Related papers (2023-09-18T02:05:52Z)
- S-Adapter: Generalizing Vision Transformer for Face Anti-Spoofing with Statistical Tokens [45.06704981913823]
Face Anti-Spoofing (FAS) aims to detect malicious attempts to invade a face recognition system by presenting spoofed faces.
We propose a novel Statistical Adapter (S-Adapter) that gathers local discriminative and statistical information from localized token histograms.
To further improve the generalization of the statistical tokens, we propose a novel Token Style Regularization (TSR).
Our experimental results demonstrate that our proposed S-Adapter and TSR provide significant benefits in both zero-shot and few-shot cross-domain testing, outperforming state-of-the-art methods on several benchmark tests.
arXiv Detail & Related papers (2023-09-07T22:36:22Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, using digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Towards Data-Efficient Detection Transformers [77.43470797296906]
We show that most detection transformers suffer significant performance drops on small-scale datasets.
We empirically analyze the factors that affect data efficiency, through a step-by-step transition from a data-efficient RCNN variant to the representative DETR.
We introduce a simple yet effective label augmentation method to provide richer supervision and improve data efficiency.
arXiv Detail & Related papers (2022-03-17T17:56:34Z)
- Vision Transformers are Robust Learners [65.91359312429147]
We study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples.
We present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners.
arXiv Detail & Related papers (2021-05-17T02:39:22Z)