S-Adapter: Generalizing Vision Transformer for Face Anti-Spoofing with Statistical Tokens
- URL: http://arxiv.org/abs/2309.04038v2
- Date: Wed, 19 Jun 2024 08:46:23 GMT
- Title: S-Adapter: Generalizing Vision Transformer for Face Anti-Spoofing with Statistical Tokens
- Authors: Rizhao Cai, Zitong Yu, Chenqi Kong, Haoliang Li, Changsheng Chen, Yongjian Hu, Alex Kot
- Abstract summary: Face Anti-Spoofing (FAS) aims to detect malicious attempts to invade a face recognition system by presenting spoofed faces.
We propose a novel Statistical Adapter (S-Adapter) that gathers local discriminative and statistical information from localized token histograms.
To further improve the generalization of the statistical tokens, we propose a novel Token Style Regularization (TSR).
Our experimental results demonstrate that our proposed S-Adapter and TSR provide significant benefits in both zero-shot and few-shot cross-domain testing, outperforming state-of-the-art methods on several benchmark tests.
- Score: 45.06704981913823
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Face Anti-Spoofing (FAS) aims to detect malicious attempts to invade a face recognition system by presenting spoofed faces. State-of-the-art FAS techniques predominantly rely on deep learning models, but their cross-domain generalization capabilities are often hindered by the domain shift problem, which arises due to different distributions between training and testing data. In this study, we develop a generalized FAS method under the Efficient Parameter Transfer Learning (EPTL) paradigm, where we adapt the pre-trained Vision Transformer models for the FAS task. During training, the adapter modules are inserted into the pre-trained ViT model, and the adapters are updated while other pre-trained parameters remain fixed. We find that previous vanilla adapters are limited in that they are based on linear layers, which lack a spoofing-aware inductive bias and thus restrict cross-domain generalization. To address this limitation and achieve cross-domain generalized FAS, we propose a novel Statistical Adapter (S-Adapter) that gathers local discriminative and statistical information from localized token histograms. To further improve the generalization of the statistical tokens, we propose a novel Token Style Regularization (TSR), which aims to reduce domain style variance by regularizing Gram matrices extracted from tokens across different domains. Our experimental results demonstrate that our proposed S-Adapter and TSR provide significant benefits in both zero-shot and few-shot cross-domain testing, outperforming state-of-the-art methods on several benchmark tests. We will release the source code upon acceptance.
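The abstract describes TSR as regularizing Gram matrices extracted from tokens across different domains. The following is only a minimal sketch of that general idea, not the authors' implementation: it assumes the style of a token set is summarized by its channel-by-channel Gram matrix and that the gap between two domains is the mean squared difference of those matrices. The function names `gram_matrix` and `style_gap` and the toy token values are hypothetical; the paper's exact formulation may differ.

```python
from typing import List

Matrix = List[List[float]]


def gram_matrix(tokens: Matrix) -> Matrix:
    """Channel-by-channel Gram matrix of an (N tokens x C channels) array.

    Entry (i, j) is the average over tokens of channel_i * channel_j.
    """
    n = len(tokens)
    c = len(tokens[0])
    return [
        [sum(tokens[k][i] * tokens[k][j] for k in range(n)) / n for j in range(c)]
        for i in range(c)
    ]


def style_gap(tokens_a: Matrix, tokens_b: Matrix) -> float:
    """Mean squared difference between the Gram matrices of two token sets.

    Assumption: this stands in for a style-variance penalty between domains.
    """
    ga, gb = gram_matrix(tokens_a), gram_matrix(tokens_b)
    c = len(ga)
    return sum(
        (ga[i][j] - gb[i][j]) ** 2 for i in range(c) for j in range(c)
    ) / (c * c)


# Hypothetical toy tokens: two tokens with two channels per domain.
domain_a = [[1.0, 0.0], [0.0, 1.0]]
domain_b = [[2.0, 0.0], [0.0, 2.0]]

gap_ab = style_gap(domain_a, domain_b)  # larger when token statistics differ
gap_aa = style_gap(domain_a, domain_a)  # zero for identical token sets
```

In a training loop, a term like `style_gap` would be added to the classification loss so that tokens from different domains are pushed toward similar second-order statistics; how the paper weights or selects the tokens for this term is not stated in the abstract.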
Related papers
- Enhancing Test Time Adaptation with Few-shot Guidance [35.13317598777832]
Deep neural networks often encounter significant performance drops when facing domain shifts between training (source) and test (target) data.
Test Time Adaptation (TTA) methods have been proposed to adapt a pre-trained source model to handle out-of-distribution streaming target data.
We develop Few-Shot Test Time Adaptation (FS-TTA), a novel and practical setting that utilizes a few-shot support set on top of TTA.
arXiv Detail & Related papers (2024-09-02T15:50:48Z)
- Open-Set Deepfake Detection: A Parameter-Efficient Adaptation Method with Forgery Style Mixture [58.60915132222421]
We introduce an approach that is both general and parameter-efficient for face forgery detection.
We design a forgery-style mixture formulation that augments the diversity of forgery source domains.
We show that the designed model achieves state-of-the-art generalizability with significantly reduced trainable parameters.
arXiv Detail & Related papers (2024-08-23T01:53:36Z)
- Test-Time Domain Generalization for Face Anti-Spoofing [60.94384914275116]
Face Anti-Spoofing (FAS) is pivotal in safeguarding facial recognition systems against presentation attacks.
We introduce a novel Test-Time Domain Generalization framework for FAS, which leverages the testing data to boost the model's generalizability.
Our method, consisting of Test-Time Style Projection (TTSP) and Diverse Style Shifts Simulation (DSSS), effectively projects the unseen data to the seen domain space.
arXiv Detail & Related papers (2024-03-28T11:50:23Z)
- FLIP: Cross-domain Face Anti-spoofing with Language Guidance [19.957293190322332]
Face anti-spoofing (FAS) or presentation attack detection is an essential component of face recognition systems.
Recent vision transformer (ViT) models have been shown to be effective for the FAS task.
We propose a novel approach for robust cross-domain FAS by grounding visual representations with the help of natural language.
arXiv Detail & Related papers (2023-09-28T17:53:20Z)
- Generalized Face Forgery Detection via Adaptive Learning for Pre-trained Vision Transformer [54.32283739486781]
We present a Forgery-aware Adaptive Vision Transformer (FA-ViT) under the adaptive learning paradigm.
FA-ViT achieves 93.83% and 78.32% AUC scores on Celeb-DF and DFDC datasets in the cross-dataset evaluation.
arXiv Detail & Related papers (2023-09-20T06:51:11Z)
- Consistency Regularization for Generalizable Source-free Domain Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset.
Existing SFDA methods only assess their adapted models on the target training set, neglecting the data from unseen but identically distributed testing sets.
We propose a consistency regularization framework to develop a more generalizable SFDA method.
arXiv Detail & Related papers (2023-08-03T07:45:53Z)
- Enhancing General Face Forgery Detection via Vision Transformer with Low-Rank Adaptation [31.780516471483985]
Forged faces pose pressing security concerns over fake news, fraud, impersonation, etc.
This paper designs a more general fake face detection model based on the vision transformer (ViT) architecture.
The proposed method achieves state-of-the-art detection performance in both cross-manipulation and cross-dataset evaluations.
arXiv Detail & Related papers (2023-03-02T02:26:04Z)
- One-Class Knowledge Distillation for Face Presentation Attack Detection [53.30584138746973]
This paper introduces a teacher-student framework to improve the cross-domain performance of face PAD with one-class domain adaptation.
Student networks are trained to mimic the teacher network and learn similar representations for genuine face samples of the target domain.
In the test phase, the similarity score between the representations of the teacher and student networks is used to distinguish attacks from genuine ones.
arXiv Detail & Related papers (2022-05-08T06:20:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.