DADM: Dual Alignment of Domain and Modality for Face Anti-spoofing
- URL: http://arxiv.org/abs/2503.00429v1
- Date: Sat, 01 Mar 2025 10:12:00 GMT
- Title: DADM: Dual Alignment of Domain and Modality for Face Anti-spoofing
- Authors: Jingyi Yang, Xun Lin, Zitong Yu, Liepiao Zhang, Xin Liu, Hui Li, Xiaochen Yuan, Xiaochun Cao
- Abstract summary: Multi-modal face anti-spoofing (FAS) has emerged as a prominent research focus. We propose an alignment module between modalities based on mutual information. We employ a dual alignment optimization method that aligns both sub-domain hyperplanes and modality angle margins.
- Score: 58.62312400472865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the availability of diverse sensor modalities (i.e., RGB, Depth, Infrared) and the success of multi-modal learning, multi-modal face anti-spoofing (FAS) has emerged as a prominent research focus. The intuition behind it is that leveraging multiple modalities can uncover more intrinsic spoofing traces. However, this approach introduces a greater risk of misalignment. We identify two main types of misalignment: (1) \textbf{Intra-domain modality misalignment}, where the importance of each modality varies across different attacks. For instance, certain modalities (e.g., Depth) may be non-defensive against specific attacks (e.g., 3D mask), indicating that each modality has unique strengths and weaknesses in countering particular attacks. Consequently, simple fusion strategies may fall short. (2) \textbf{Inter-domain modality misalignment}, where the introduction of additional modalities exacerbates domain shifts, potentially overshadowing the benefits of complementary fusion. To tackle (1), we propose an alignment module between modalities based on mutual information, which adaptively enhances favorable modalities while suppressing unfavorable ones. To address (2), we employ a dual alignment optimization method that aligns both sub-domain hyperplanes and modality angle margins, thereby mitigating domain gaps. Our method, dubbed \textbf{D}ual \textbf{A}lignment of \textbf{D}omain and \textbf{M}odality (DADM), achieves state-of-the-art performance in extensive experiments across four challenging protocols, demonstrating its robustness in multi-modal domain generalization scenarios. The code will be released soon.
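Since the paper's code is not yet released, the following is only a minimal PyTorch sketch of the first idea the abstract describes: re-weighting per-modality features by an estimated mutual-information score so that informative modalities are enhanced and unreliable ones suppressed before fusion. The class names, the MINE-style (Donsker-Varadhan) MI estimator, the mean-pooled reference representation, and the softmax weighting are all illustrative assumptions, not the authors' method.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class MIScorer(nn.Module):
    """Tiny MINE-style critic: estimates how informative one modality's
    feature is about a shared reference representation (assumption)."""

    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, feat, ref):
        # Donsker-Varadhan lower bound on I(feat; ref).
        joint = self.net(torch.cat([feat, ref], dim=-1)).mean()
        shuffled = ref[torch.randperm(ref.size(0))]
        marginal = torch.logsumexp(
            self.net(torch.cat([feat, shuffled], dim=-1)), dim=0
        ) - math.log(ref.size(0))
        return (joint - marginal).squeeze()


class ModalityAlignmentFusion(nn.Module):
    """Re-weights per-modality features by their MI scores before fusion."""

    def __init__(self, dim, num_modalities=3):
        super().__init__()
        self.scorers = nn.ModuleList([MIScorer(dim) for _ in range(num_modalities)])

    def forward(self, feats):
        # feats: list of (batch, dim) tensors, e.g. [rgb, depth, ir].
        ref = torch.stack(feats).mean(dim=0)            # simple fused reference
        mi = torch.stack([s(f, ref) for s, f in zip(self.scorers, feats)])
        weights = F.softmax(mi, dim=0)                  # favorable modalities weigh more
        fused = sum(w * f for w, f in zip(weights, feats))
        return fused, weights


if __name__ == "__main__":
    rgb, depth, ir = (torch.randn(8, 128) for _ in range(3))
    fused, weights = ModalityAlignmentFusion(dim=128)([rgb, depth, ir])
    print(fused.shape, weights.detach())
```

A full implementation would presumably train the critics jointly with the FAS backbone and combine this step with the inter-domain alignment of sub-domain hyperplanes and modality angle margins; the sketch covers only the intra-domain modality re-weighting component.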
Related papers
- Aligning First, Then Fusing: A Novel Weakly Supervised Multimodal Violence Detection Method [11.01048485795428]
We propose a new weakly supervised violence detection framework. It consists of unimodal multiple-instance learning for extracting unimodal semantic features, multimodal alignment, multimodal fusion, and final detection. Experimental results on benchmark datasets demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2025-01-13T17:14:25Z) - Multi-Modality Driven LoRA for Adverse Condition Depth Estimation [61.525312117638116]
We propose Multi-Modality Driven LoRA (MMD-LoRA) for Adverse Condition Depth Estimation. It consists of two core components: Prompt Driven Domain Alignment (PDDA) and Visual-Text Consistent Contrastive Learning (VTCCL). It achieves state-of-the-art performance on the nuScenes and Oxford RobotCar datasets.
arXiv Detail & Related papers (2024-12-28T14:23:58Z) - Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing [26.901402236963374]
Face Anti-Spoofing (FAS) is crucial for securing face recognition systems against presentation attacks.
Many multi-modal FAS approaches have emerged, but they face challenges in generalizing to unseen attacks and deployment conditions.
arXiv Detail & Related papers (2024-02-29T16:06:36Z) - Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z) - Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning [53.68371566336254]
We argue that the key to better performance lies in meaningful latent modality structures instead of perfect modality alignment.
Specifically, we design 1) a deep feature separation loss for intra-modality regularization; 2) a Brownian-bridge loss for inter-modality regularization; and 3) a geometric consistency loss for both intra- and inter-modality regularization.
arXiv Detail & Related papers (2023-03-10T14:38:49Z) - CDA: Contrastive-adversarial Domain Adaptation [11.354043674822451]
We propose a two-stage model for domain adaptation called \textbf{C}ontrastive-adversarial \textbf{D}omain \textbf{A}daptation \textbf{(CDA)}.
While the adversarial component facilitates domain-level alignment, two-stage contrastive learning exploits class information to achieve higher intra-class compactness across domains.
arXiv Detail & Related papers (2023-01-10T07:43:21Z) - Weakly Aligned Feature Fusion for Multimodal Object Detection [52.15436349488198]
Multimodal data often suffer from the position shift problem, i.e., the image pair is not strictly aligned.
This problem makes it difficult to fuse multimodal features and complicates convolutional neural network (CNN) training.
In this article, we propose a general multimodal detector named aligned region CNN (AR-CNN) to tackle the position shift problem.
arXiv Detail & Related papers (2022-04-21T02:35:23Z) - A New Bidirectional Unsupervised Domain Adaptation Segmentation Framework [27.13101555533594]
Unsupervised domain adaptation (UDA) techniques are proposed to bridge the gap between different domains.
In this paper, we propose a bidirectional UDA framework based on disentangled representation learning for equally competent two-way UDA performances.
arXiv Detail & Related papers (2021-08-18T05:25:11Z) - MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis [48.776247141839875]
We propose a novel framework, MISA, which projects each modality to two distinct subspaces.
The first subspace is modality-invariant, where the representations across modalities learn their commonalities and reduce the modality gap.
Our experiments on popular sentiment analysis benchmarks, MOSI and MOSEI, demonstrate significant gains over state-of-the-art models.
arXiv Detail & Related papers (2020-05-07T15:13:23Z)