Denoising and Alignment: Rethinking Domain Generalization for Multimodal Face Anti-Spoofing
- URL: http://arxiv.org/abs/2505.09484v1
- Date: Wed, 14 May 2025 15:36:44 GMT
- Title: Denoising and Alignment: Rethinking Domain Generalization for Multimodal Face Anti-Spoofing
- Authors: Yingjie Ma, Xun Lin, Zitong Yu, Xin Liu, Xiaochen Yuan, Weicheng Xie, Linlin Shen
- Abstract summary: Face Anti-Spoofing (FAS) is essential for the security of facial recognition systems in diverse scenarios. We introduce the Multimodal Denoising and Alignment (MMDA) framework. By leveraging the zero-shot generalization capability of CLIP, the MMDA framework effectively suppresses noise in multimodal data.
- Score: 47.24147617685829
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Face Anti-Spoofing (FAS) is essential for the security of facial recognition systems in diverse scenarios such as payment processing and surveillance. Current multimodal FAS methods often struggle with effective generalization, mainly due to modality-specific biases and domain shifts. To address these challenges, we introduce the Multimodal Denoising and Alignment (MMDA) framework. By leveraging the zero-shot generalization capability of CLIP, the MMDA framework effectively suppresses noise in multimodal data through denoising and alignment mechanisms, thereby significantly enhancing the generalization performance of cross-modal alignment. The Modality-Domain Joint Differential Attention (MD2A) module in MMDA concurrently mitigates the impacts of domain and modality noise by refining the attention mechanism based on extracted common noise features. Furthermore, the Representation Space Soft (RS2) Alignment strategy utilizes the pre-trained CLIP model to align multi-domain multimodal data into a generalized representation space in a flexible manner, preserving intricate representations and enhancing the model's adaptability to various unseen conditions. We also design a U-shaped Dual Space Adaptation (U-DSA) module to enhance the adaptability of representations while maintaining generalization performance. These improvements not only enhance the framework's generalization capabilities but also boost its ability to represent complex representations. Our experimental results on four benchmark datasets under different evaluation protocols demonstrate that the MMDA framework outperforms existing state-of-the-art methods in terms of cross-domain generalization and multimodal detection accuracy. The code will be released soon.
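The abstract describes MD2A only at a high level: attention is refined using extracted common noise features. As a rough illustration of that general idea, and not the authors' module, the sketch below computes one attention map from "signal" projections and a second from "noise" projections, then subtracts a scaled copy of the second from the first. The function names, the separate signal/noise inputs, and the fixed weight `lam` are all assumptions made for this example; the paper's actual formulation may differ.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(queries, keys):
    """Scaled dot-product attention weights, one softmax row per query."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    return [
        softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        for q in queries
    ]


def differential_attention(q_sig, k_sig, q_noise, k_noise, values, lam=0.5):
    """Hypothetical sketch: subtract a noise attention map from a signal map.

    `lam` (an assumed, fixed hyperparameter here) controls how strongly the
    noise map is suppressed. Returns (weights, output).
    """
    signal = attention_map(q_sig, k_sig)
    noise = attention_map(q_noise, k_noise)
    weights = [
        [s - lam * n for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(signal, noise)
    ]
    dim = len(values[0])
    output = [
        [sum(w * v[d] for w, v in zip(w_row, values)) for d in range(dim)]
        for w_row in weights
    ]
    return weights, output
```

Because each softmax row sums to 1, every row of the combined weights sums to `1 - lam`, and setting `lam=0` recovers ordinary scaled dot-product attention. In a trained model the subtraction weight would typically be learned, not fixed.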
Related papers
- Domain Generalized Stereo Matching with Uncertainty-guided Data Augmentation [11.938635624781313]
State-of-the-art stereo matching (SM) models often fail to generalize to real data domains due to domain differences. We leverage data augmentation to expand the training domain, encouraging the model to acquire robust cross-domain feature representations. Our approach is simple, architecture-agnostic, and can be integrated into any SM network.
arXiv Detail & Related papers (2025-08-02T10:26:53Z) - FindRec: Stein-Guided Entropic Flow for Multi-Modal Sequential Recommendation [50.438552588818]
We propose FindRec (Flexible unified information disentanglement for multi-modal sequential Recommendation). A Stein kernel-based Integrated Information Coordination Module (IICM) theoretically guarantees distribution consistency between multimodal features and ID streams. A cross-modal expert routing mechanism adaptively filters and combines multimodal features based on their contextual relevance.
arXiv Detail & Related papers (2025-07-07T04:09:45Z) - BiXFormer: A Robust Framework for Maximizing Modality Effectiveness in Multi-Modal Semantic Segmentation [55.486872677160015]
We reformulate multi-modal semantic segmentation as a mask-level classification task. We propose BiXFormer, which integrates Unified Modality Matching (UMM) and Cross Modality Alignment (CMA). Experiments on both synthetic and real-world multi-modal benchmarks demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2025-06-04T08:04:58Z) - Activation-Guided Consensus Merging for Large Language Models [25.68958388022476]
We present Activation-Guided Consensus Merging (ACM), a plug-and-play merging framework that determines layer-specific merging coefficients. Experiments on Long-to-Short (L2S) and general merging tasks demonstrate that ACM consistently outperforms all baseline methods.
arXiv Detail & Related papers (2025-05-20T07:04:01Z) - Noise Optimized Conditional Diffusion for Domain Adaptation [7.414646586981638]
Pseudo-labeling is a cornerstone of Unsupervised Domain Adaptation (UDA). We propose Noise Optimized Conditional Diffusion for Domain Adaptation (NOCDDA).
arXiv Detail & Related papers (2025-05-12T13:28:31Z) - DADM: Dual Alignment of Domain and Modality for Face Anti-spoofing [58.62312400472865]
Multi-modal face anti-spoofing (FAS) has emerged as a prominent research focus. We propose an alignment module between modalities based on mutual information. We employ a dual alignment optimization method that aligns both sub-domain hyperplanes and modality angle margins.
arXiv Detail & Related papers (2025-03-01T10:12:00Z) - Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models [58.936893810674896]
Face Anti-Spoofing (FAS) is essential for ensuring the security and reliability of facial recognition systems. We introduce a multimodal large language model framework for FAS, termed Interpretable Face Anti-Spoofing (I-FAS). We propose a Spoof-aware Captioning and Filtering (SCF) strategy to generate high-quality captions for FAS images.
arXiv Detail & Related papers (2025-01-03T09:25:04Z) - Towards Robust Online Domain Adaptive Semantic Segmentation under Adverse Weather Conditions [43.58583290714884]
We present RODASS, a Robust Online Domain Adaptive Semantic Segmentation framework.
Our approach outperforms state-of-the-art methods on widely used OnDA benchmarks while maintaining approximately 40 frames per second (FPS).
arXiv Detail & Related papers (2024-09-02T08:53:08Z) - Enhancing Multimodal Unified Representations for Cross Modal Generalization [52.16653133604068]
We propose Training-free Optimization of Codebook (TOC) and Fine and Coarse cross-modal Information Disentangling (FCID). These methods refine the unified discrete representations from pretraining and perform fine- and coarse-grained information disentanglement tailored to the specific characteristics of each modality.
arXiv Detail & Related papers (2024-03-08T09:16:47Z) - TeG-DG: Textually Guided Domain Generalization for Face Anti-Spoofing [8.830873674673828]
Existing methods are dedicated to extracting domain-invariant features from various training domains.
The extracted features inevitably contain residual style feature bias, resulting in inferior generalization performance.
We propose the Textually Guided Domain Generalization framework, which can effectively leverage text information for cross-domain alignment.
arXiv Detail & Related papers (2023-11-30T10:13:46Z) - Posterior Differential Regularization with f-divergence for Improving Model Robustness [95.05725916287376]
We focus on methods that regularize the model posterior difference between clean and noisy inputs.
We generalize the posterior differential regularization to the family of $f$-divergences.
Our experiments show that regularizing the posterior differential with $f$-divergence can result in well-improved model robustness.
arXiv Detail & Related papers (2020-10-23T19:58:01Z) - Patch-level Neighborhood Interpolation: A General and Effective Graph-based Regularization Strategy [77.34280933613226]
We propose a general regularizer called Patch-level Neighborhood Interpolation (Pani) that conducts a non-local representation in the computation of networks.
Our proposal explicitly constructs patch-level graphs in different layers and then linearly interpolates neighborhood patch features, serving as a general and effective regularization strategy.
arXiv Detail & Related papers (2019-11-21T06:31:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.