DADM: Dual Alignment of Domain and Modality for Face Anti-spoofing
- URL: http://arxiv.org/abs/2503.00429v2
- Date: Tue, 27 May 2025 16:28:52 GMT
- Title: DADM: Dual Alignment of Domain and Modality for Face Anti-spoofing
- Authors: Jingyi Yang, Xun Lin, Zitong Yu, Liepiao Zhang, Xin Liu, Hui Li, Xiaochen Yuan, Xiaochun Cao
- Abstract summary: Multi-modal face anti-spoofing (FAS) has emerged as a prominent research focus. We propose an alignment module between modalities based on mutual information. We employ a dual alignment optimization method that aligns both sub-domain hyperplanes and modality angle margins.
- Score: 58.62312400472865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the availability of diverse sensor modalities (i.e., RGB, Depth, Infrared) and the success of multi-modal learning, multi-modal face anti-spoofing (FAS) has emerged as a prominent research focus. The intuition behind it is that leveraging multiple modalities can uncover more intrinsic spoofing traces. However, this approach presents a greater risk of misalignment. We identify two main types of misalignment: (1) Intra-domain modality misalignment, where the importance of each modality varies across different attacks. For instance, certain modalities (e.g., Depth) may be non-defensive against specific attacks (e.g., 3D mask), indicating that each modality has unique strengths and weaknesses in countering particular attacks. Consequently, simple fusion strategies may fall short. (2) Inter-domain modality misalignment, where the introduction of additional modalities exacerbates domain shifts, potentially overshadowing the benefits of complementary fusion. To tackle (1), we propose an alignment module between modalities based on mutual information, which adaptively enhances favorable modalities while suppressing unfavorable ones. To address (2), we employ a dual alignment optimization method that aligns both sub-domain hyperplanes and modality angle margins, thereby mitigating domain gaps. Our method, dubbed Dual Alignment of Domain and Modality (DADM), achieves state-of-the-art performance in extensive experiments across four challenging protocols, demonstrating its robustness in multi-modal domain generalization scenarios. The code will be released soon.
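The abstract names two concrete mechanisms: a mutual-information-based module that re-weights modalities, and an optimization that aligns angle margins. Since the code has not been released, the sketch below is purely illustrative (PyTorch; every class and parameter name is hypothetical): it pairs a MINE-style Donsker-Varadhan estimate of the mutual information between each modality's features and the fused features, used to softmax-weight the fusion, with an additive angular-margin (ArcFace-style) head of the kind "modality angle margins" usually implies. It is a minimal sketch under those assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the two mechanisms named in the DADM abstract.
# All names are invented for illustration; the official code is unreleased.
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityWeighting(nn.Module):
    """Re-weights modality features by an estimated mutual information (MI).

    A small critic gives a MINE-style (Donsker-Varadhan) lower bound on the
    MI between each modality's features and the fused features; modalities
    with a higher MI estimate receive larger fusion weights, so favorable
    modalities are enhanced and unfavorable ones suppressed.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.critic = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def mi_estimate(self, feat: torch.Tensor, fused: torch.Tensor) -> torch.Tensor:
        # Donsker-Varadhan bound: E_joint[T] - log E_marginal[exp(T)]
        joint = self.critic(torch.cat([feat, fused], dim=-1)).mean()
        perm = torch.randperm(fused.size(0), device=fused.device)
        marginal = self.critic(torch.cat([feat, fused[perm]], dim=-1))
        return joint - (torch.logsumexp(marginal, dim=0).squeeze()
                        - math.log(feat.size(0)))

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        fused = torch.stack(feats).mean(dim=0)          # naive average fusion
        mi = torch.stack([self.mi_estimate(f, fused) for f in feats])
        weights = F.softmax(mi, dim=0)                  # favor high-MI modalities
        return sum(w * f for w, f in zip(weights, feats))


class AngularMarginHead(nn.Module):
    """Additive angular-margin (ArcFace-style) live/spoof classifier head."""

    def __init__(self, dim: int, num_classes: int = 2,
                 margin: float = 0.3, scale: float = 16.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, dim))
        self.margin, self.scale = margin, scale

    def forward(self, x: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        cos = F.linear(F.normalize(x), F.normalize(self.weight))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        target = F.one_hot(labels, cos.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.margin), cos)
        return self.scale * logits  # feed to F.cross_entropy
```

In a training step, per-modality backbone features (e.g., RGB, Depth, IR) would pass through ModalityWeighting, and the fused vector plus live/spoof labels through AngularMarginHead into a cross-entropy loss; the sub-domain hyperplane alignment cannot be reconstructed from the abstract alone and is omitted here.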
Related papers
- Align-for-Fusion: Harmonizing Triple Preferences via Dual-oriented Diffusion for Cross-domain Sequential Recommendation [5.661192070842017]
Cross-domain sequential recommendation (CDSR) methods often follow an align-then-fusion paradigm. We propose HorizonRec, an align-for-fusion framework for CDSR that harmonizes triple preferences via dual-oriented diffusion models (DMs). Experiments on four CDSR datasets from two distinct platforms demonstrate the effectiveness and robustness of HorizonRec.
arXiv Detail & Related papers (2025-08-07T07:00:29Z) - Principled Multimodal Representation Learning [70.60542106731813]
Multimodal representation learning seeks to create a unified representation space by integrating diverse data modalities. Recent advances have investigated the simultaneous alignment of multiple modalities, yet several challenges remain. We propose Principled Multimodal Representation Learning (PMRL), a novel framework that achieves simultaneous alignment of multiple modalities.
arXiv Detail & Related papers (2025-07-23T09:12:25Z) - BiXFormer: A Robust Framework for Maximizing Modality Effectiveness in Multi-Modal Semantic Segmentation [55.486872677160015]
We reformulate multi-modal semantic segmentation as a mask-level classification task. We propose BiXFormer, which integrates Unified Modality Matching (UMM) and Cross Modality Alignment (CMA). Experiments on both synthetic and real-world multi-modal benchmarks demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2025-06-04T08:04:58Z) - Denoising and Alignment: Rethinking Domain Generalization for Multimodal Face Anti-Spoofing [47.24147617685829]
Face Anti-Spoofing (FAS) is essential for the security of facial recognition systems in diverse scenarios. We introduce the Multimodal Denoising and Alignment (MMDA) framework. By leveraging the zero-shot generalization capability of CLIP, the MMDA framework effectively suppresses noise in multimodal data.
arXiv Detail & Related papers (2025-05-14T15:36:44Z) - Aligning First, Then Fusing: A Novel Weakly Supervised Multimodal Violence Detection Method [11.01048485795428]
We propose a new weakly supervised violence detection framework. It consists of unimodal multiple-instance learning for extracting unimodal semantic features, multimodal alignment, multimodal fusion, and final detection. Experimental results on benchmark datasets demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2025-01-13T17:14:25Z) - Multi-Modality Driven LoRA for Adverse Condition Depth Estimation [61.525312117638116]
We propose Multi-Modality Driven LoRA (MMD-LoRA) for adverse condition depth estimation. It consists of two core components: Prompt Driven Domain Alignment (PDDA) and Visual-Text Consistent Contrastive Learning (VTCCL). It achieves state-of-the-art performance on the nuScenes and Oxford RobotCar datasets.
arXiv Detail & Related papers (2024-12-28T14:23:58Z) - Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing [26.901402236963374]
Face Anti-Spoofing (FAS) is crucial for securing face recognition systems against presentation attacks.
Many multi-modal FAS approaches have emerged, but they face challenges in generalizing to unseen attacks and deployment conditions.
arXiv Detail & Related papers (2024-02-29T16:06:36Z) - Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z) - Source-free Domain Adaptation Requires Penalized Diversity [60.04618512479438]
Source-free domain adaptation (SFDA) was introduced to address knowledge transfer between different domains in the absence of source data.
In unsupervised SFDA, the diversity is limited to learning a single hypothesis on the source or learning multiple hypotheses with a shared feature extractor.
We propose a novel unsupervised SFDA algorithm that promotes representational diversity through the use of separate feature extractors.
arXiv Detail & Related papers (2023-04-06T00:20:19Z) - Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning [53.68371566336254]
We argue that the key to better performance lies in meaningful latent modality structures instead of perfect modality alignment.
Specifically, we design 1) a deep feature separation loss for intra-modality regularization; 2) a Brownian-bridge loss for inter-modality regularization; and 3) a geometric consistency loss for both intra- and inter-modality regularization.
arXiv Detail & Related papers (2023-03-10T14:38:49Z) - CDA: Contrastive-adversarial Domain Adaptation [11.354043674822451]
We propose a two-stage model for domain adaptation called Contrastive-adversarial Domain Adaptation (CDA).
While the adversarial component facilitates domain-level alignment, two-stage contrastive learning exploits class information to achieve higher intra-class compactness across domains.
arXiv Detail & Related papers (2023-01-10T07:43:21Z) - Weakly Aligned Feature Fusion for Multimodal Object Detection [52.15436349488198]
Multimodal data often suffer from the position shift problem, i.e., the image pair is not strictly aligned.
This problem makes it difficult to fuse multimodal features and complicates convolutional neural network (CNN) training.
In this article, we propose a general multimodal detector named aligned region CNN (AR-CNN) to tackle the position shift problem.
arXiv Detail & Related papers (2022-04-21T02:35:23Z) - A New Bidirectional Unsupervised Domain Adaptation Segmentation Framework [27.13101555533594]
Unsupervised domain adaptation (UDA) techniques are proposed to bridge the gap between different domains.
In this paper, we propose a bidirectional UDA framework based on disentangled representation learning for equally competent two-way UDA performances.
arXiv Detail & Related papers (2021-08-18T05:25:11Z) - MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis [48.776247141839875]
We propose a novel framework, MISA, which projects each modality to two distinct subspaces.
The first subspace is modality-invariant, where the representations across modalities learn their commonalities and reduce the modality gap.
Our experiments on popular sentiment analysis benchmarks, MOSI and MOSEI, demonstrate significant gains over state-of-the-art models.
arXiv Detail & Related papers (2020-05-07T15:13:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.