DADM: Dual Alignment of Domain and Modality for Face Anti-spoofing
- URL: http://arxiv.org/abs/2503.00429v2
- Date: Tue, 27 May 2025 16:28:52 GMT
- Title: DADM: Dual Alignment of Domain and Modality for Face Anti-spoofing
- Authors: Jingyi Yang, Xun Lin, Zitong Yu, Liepiao Zhang, Xin Liu, Hui Li, Xiaochen Yuan, Xiaochun Cao,
- Abstract summary: Multi-modal face anti-spoofing (FAS) has emerged as a prominent research focus.<n>We propose a alignment module between modalities based on mutual information.<n>We employ a dual alignment optimization method that aligns both sub-domain hyperplanes and modality angle margins.
- Score: 58.62312400472865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the availability of diverse sensor modalities (i.e., RGB, Depth, Infrared) and the success of multi-modal learning, multi-modal face anti-spoofing (FAS) has emerged as a prominent research focus. The intuition behind it is that leveraging multiple modalities can uncover more intrinsic spoofing traces. However, this approach presents more risk of misalignment. We identify two main types of misalignment: (1) \textbf{Intra-domain modality misalignment}, where the importance of each modality varies across different attacks. For instance, certain modalities (e.g., Depth) may be non-defensive against specific attacks (e.g., 3D mask), indicating that each modality has unique strengths and weaknesses in countering particular attacks. Consequently, simple fusion strategies may fall short. (2) \textbf{Inter-domain modality misalignment}, where the introduction of additional modalities exacerbates domain shifts, potentially overshadowing the benefits of complementary fusion. To tackle (1), we propose a alignment module between modalities based on mutual information, which adaptively enhances favorable modalities while suppressing unfavorable ones. To address (2), we employ a dual alignment optimization method that aligns both sub-domain hyperplanes and modality angle margins, thereby mitigating domain gaps. Our method, dubbed \textbf{D}ual \textbf{A}lignment of \textbf{D}omain and \textbf{M}odality (DADM), achieves state-of-the-art performance in extensive experiments across four challenging protocols demonstrating its robustness in multi-modal domain generalization scenarios. The codes will be released soon.
Related papers
- Toward Effective Multimodal Graph Foundation Model: A Divide-and-Conquer Based Approach [42.970648490410504]
Multimodal Graph Foundation Models (MGFMs) allow for leveraging the rich multimodal information in Multimodal-Attributed Graphs (MAGs)<n>We propose PLANET, a novel framework employing a Divide-and-Conquer strategy to decouple modality interaction and alignment across distinct granularities.<n>We show that PLANET significantly outperforms state-of-the-art baselines across diverse graph-centric and multimodal generative tasks.
arXiv Detail & Related papers (2026-02-04T01:05:12Z) - CAMO: Causality-Guided Adversarial Multimodal Domain Generalization for Crisis Classification [16.165585394745786]
Crisis classification in social media aims to extract actionable disaster-related information from posts.<n>Existing approaches primarily leverage deep learning to fuse textual and visual cues for crisis classification.<n>We introduce a causality-guided multimodal domain generalization framework that combines adversarial disentanglement with unified representation learning.
arXiv Detail & Related papers (2025-12-08T22:12:27Z) - When Alignment Fails: Multimodal Adversarial Attacks on Vision-Language-Action Models [75.16145284285456]
We introduce VLA-Fool, a comprehensive study of multimodal adversarial robustness in embodied VLA models under both white-box and black-box settings.<n>We develop the first automatically crafted and semantically guided prompting framework.<n> Experiments on the LIBERO benchmark reveal that even minor multimodal perturbations can cause significant behavioral deviations.
arXiv Detail & Related papers (2025-11-20T10:14:32Z) - Learning Representation and Synergy Invariances: A Povable Framework for Generalized Multimodal Face Anti-Spoofing [85.00865662325954]
Multimodal Face Anti-Spoofing (FAS) methods, which integrate multiple visual modalities, often suffer even more severe performance degradation when deployed in unseen domains.<n>This is mainly due to two overlooked risks that affect cross-domain multimodal generalization.<n>We propose a provable framework, namely Multimodal Representation and Synergy Invariance Learning (RiSe)
arXiv Detail & Related papers (2025-11-18T05:37:06Z) - Align-for-Fusion: Harmonizing Triple Preferences via Dual-oriented Diffusion for Cross-domain Sequential Recommendation [5.661192070842017]
Cross-domain sequential recommendation (CDSR) methods often follow an align-then-fusion paradigm.<n>We propose an align-for-fusion framework for CDSR to harmonize triple preferences via dual-oriented DMs, HorizonRec.<n>Experiments on four CDSR datasets from two distinct platforms demonstrate the effectiveness and robustness of HorizonRec.
arXiv Detail & Related papers (2025-08-07T07:00:29Z) - Principled Multimodal Representation Learning [70.60542106731813]
Multimodal representation learning seeks to create a unified representation space by integrating diverse data modalities.<n>Recent advances have investigated the simultaneous alignment of multiple modalities, yet several challenges remain.<n>We propose Principled Multimodal Representation Learning (PMRL), a novel framework that achieves simultaneous alignment of multiple modalities.
arXiv Detail & Related papers (2025-07-23T09:12:25Z) - BiXFormer: A Robust Framework for Maximizing Modality Effectiveness in Multi-Modal Semantic Segmentation [55.486872677160015]
We reformulate multi-modal semantic segmentation as a mask-level classification task.<n>We propose BiXFormer, which integrates Unified Modality Matching (UMM) and Cross Modality Alignment (CMA)<n> Experiments on both synthetic and real-world multi-modal benchmarks demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2025-06-04T08:04:58Z) - Denoising and Alignment: Rethinking Domain Generalization for Multimodal Face Anti-Spoofing [47.24147617685829]
Face Anti-Spoofing (FAS) is essential for the security of facial recognition systems in diverse scenarios.<n>We introduce the textbfMultitextbfmodal textbfDenoising and textbfAlignment (textbfMMDA) framework.<n>By leveraging the zero-shot generalization capability of CLIP, the MMDA framework effectively suppresses noise in multimodal data.
arXiv Detail & Related papers (2025-05-14T15:36:44Z) - Aligning First, Then Fusing: A Novel Weakly Supervised Multimodal Violence Detection Method [11.01048485795428]
We propose a new weakly supervised violence detection framework.<n>It consists of unimodal multiple-instance learning for extracting unimodal semantic features, multimodal alignment, multimodal fusion, and final detection.<n> Experimental results on benchmark datasets demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2025-01-13T17:14:25Z) - Boosting Adversarial Transferability with Spatial Adversarial Alignment [56.97809949196889]
Deep neural networks are vulnerable to adversarial examples that exhibit transferability across various models.<n>We propose a technique that employs an alignment loss and leverages a witness model to fine-tune the surrogate model.<n>Experiments on various architectures on ImageNet show that aligned surrogate models based on SAA can provide higher transferable adversarial examples.
arXiv Detail & Related papers (2025-01-02T02:35:47Z) - Multi-Modality Driven LoRA for Adverse Condition Depth Estimation [61.525312117638116]
We propose Multi-Modality Driven LoRA (MMD-LoRA) for Adverse Condition Depth Estimation.<n>It consists of two core components: Prompt Driven Domain Alignment (PDDA) and Visual-Text Consistent Contrastive Learning (VTCCL)<n>It achieves state-of-the-art performance on the nuScenes and Oxford RobotCar datasets.
arXiv Detail & Related papers (2024-12-28T14:23:58Z) - Suppress and Rebalance: Towards Generalized Multi-Modal Face
Anti-Spoofing [26.901402236963374]
Face Anti-Spoofing (FAS) is crucial for securing face recognition systems against presentation attacks.
Many multi-modal FAS approaches have emerged, but they face challenges in generalizing to unseen attacks and deployment conditions.
arXiv Detail & Related papers (2024-02-29T16:06:36Z) - Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z) - Source-free Domain Adaptation Requires Penalized Diversity [60.04618512479438]
Source-free domain adaptation (SFDA) was introduced to address knowledge transfer between different domains in the absence of source data.
In unsupervised SFDA, the diversity is limited to learning a single hypothesis on the source or learning multiple hypotheses with a shared feature extractor.
We propose a novel unsupervised SFDA algorithm that promotes representational diversity through the use of separate feature extractors.
arXiv Detail & Related papers (2023-04-06T00:20:19Z) - Understanding and Constructing Latent Modality Structures in Multi-modal
Representation Learning [53.68371566336254]
We argue that the key to better performance lies in meaningful latent modality structures instead of perfect modality alignment.
Specifically, we design 1) a deep feature separation loss for intra-modality regularization; 2) a Brownian-bridge loss for inter-modality regularization; and 3) a geometric consistency loss for both intra- and inter-modality regularization.
arXiv Detail & Related papers (2023-03-10T14:38:49Z) - CDA: Contrastive-adversarial Domain Adaptation [11.354043674822451]
We propose a two-stage model for domain adaptation called textbfContrastive-adversarial textbfDomain textbfAdaptation textbf(CDA).
While the adversarial component facilitates domain-level alignment, two-stage contrastive learning exploits class information to achieve higher intra-class compactness across domains.
arXiv Detail & Related papers (2023-01-10T07:43:21Z) - Weakly Aligned Feature Fusion for Multimodal Object Detection [52.15436349488198]
multimodal data often suffer from the position shift problem, i.e., the image pair is not strictly aligned.
This problem makes it difficult to fuse multimodal features and puzzles the convolutional neural network (CNN) training.
In this article, we propose a general multimodal detector named aligned region CNN (AR-CNN) to tackle the position shift problem.
arXiv Detail & Related papers (2022-04-21T02:35:23Z) - A New Bidirectional Unsupervised Domain Adaptation Segmentation
Framework [27.13101555533594]
unsupervised domain adaptation (UDA) techniques are proposed to bridge the gap between different domains.
In this paper, we propose a bidirectional UDA framework based on disentangled representation learning for equally competent two-way UDA performances.
arXiv Detail & Related papers (2021-08-18T05:25:11Z) - MISA: Modality-Invariant and -Specific Representations for Multimodal
Sentiment Analysis [48.776247141839875]
We propose a novel framework, MISA, which projects each modality to two distinct subspaces.
The first subspace is modality-invariant, where the representations across modalities learn their commonalities and reduce the modality gap.
Our experiments on popular sentiment analysis benchmarks, MOSI and MOSEI, demonstrate significant gains over state-of-the-art models.
arXiv Detail & Related papers (2020-05-07T15:13:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.