Learning Representation and Synergy Invariances: A Provable Framework for Generalized Multimodal Face Anti-Spoofing
- URL: http://arxiv.org/abs/2511.14157v1
- Date: Tue, 18 Nov 2025 05:37:06 GMT
- Title: Learning Representation and Synergy Invariances: A Provable Framework for Generalized Multimodal Face Anti-Spoofing
- Authors: Xun Lin, Shuai Wang, Yi Yu, Zitong Yu, Jiale Zhou, Yizhong Liu, Xiaochun Cao, Alex Kot, Yefeng Zheng
- Abstract summary: Multimodal Face Anti-Spoofing (FAS) methods, which integrate multiple visual modalities, often suffer even more severe performance degradation when deployed in unseen domains. This is mainly due to two overlooked risks that affect cross-domain multimodal generalization. We propose a provable framework, namely Multimodal Representation and Synergy Invariance Learning (RiSe).
- Score: 85.00865662325954
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Multimodal Face Anti-Spoofing (FAS) methods, which integrate multiple visual modalities, often suffer even more severe performance degradation than unimodal FAS when deployed in unseen domains. This is mainly due to two overlooked risks that affect cross-domain multimodal generalization. The first is the modal representation invariant risk, i.e., whether representations remain generalizable under domain shift. We theoretically show that the inherent class asymmetry in FAS (diverse spoofs vs. compact reals) enlarges the upper bound of generalization error, and this effect is further amplified in multimodal settings. The second is the modal synergy invariant risk, where models overfit to domain-specific inter-modal correlations. Such spurious synergy cannot generalize to unseen attacks in target domains, leading to performance drops. To solve these issues, we propose a provable framework, namely Multimodal Representation and Synergy Invariance Learning (RiSe). For representation risk, RiSe introduces Asymmetric Invariant Risk Minimization (AsyIRM), which learns an invariant spherical decision boundary in radial space to fit asymmetric distributions, while preserving domain cues in angular space. For synergy risk, RiSe employs Multimodal Synergy Disentanglement (MMSD), a self-supervised task enhancing intrinsic, generalizable modal features via cross-sample mixing and disentanglement. Theoretical analysis and experiments verify RiSe, which achieves state-of-the-art cross-domain performance.
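The abstract describes AsyIRM as learning a spherical decision boundary in radial space: the norm of the embedding decides real vs. spoof, while the angular direction is left free to carry domain cues. A minimal NumPy sketch of that idea follows; the function names, the fixed threshold, and the toy data are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def radial_angular_split(features):
    """Decompose feature vectors into radius (norm) and unit direction."""
    radius = np.linalg.norm(features, axis=1, keepdims=True)
    direction = features / np.clip(radius, 1e-8, None)  # angular component
    return radius.squeeze(1), direction

def spherical_decision(features, threshold=1.0):
    """Classify by radius only: inside the sphere -> real (0), outside -> spoof (1)."""
    radius, _ = radial_angular_split(features)
    return (radius > threshold).astype(int)

# Toy usage mirroring the class asymmetry the paper highlights:
# a compact "real" cluster near the origin vs. diverse, large-norm "spoof" samples.
rng = np.random.default_rng(0)
reals = rng.normal(0.0, 0.1, size=(5, 8))          # small-norm embeddings
spoofs = 3.0 * rng.normal(0.0, 1.0, size=(5, 8))   # large-norm embeddings
preds = spherical_decision(np.vstack([reals, spoofs]), threshold=1.0)
```

The point of the sketch is that the decision depends only on the radius, so any domain-specific information expressed in the angular direction is preserved rather than suppressed.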
Related papers
- Modality-Collaborative Low-Rank Decomposers for Few-Shot Video Domain Adaptation [74.16390314862801]
We study the challenging task of Few-Shot Video Domain Adaptation (FSVDA). We introduce a novel framework of Modality-Collaborative Low-Rank Decomposers (MC-LRD) to decompose modality-unique and modality-shared features. Our model achieves significant improvements over existing methods.
arXiv Detail & Related papers (2025-11-24T03:09:59Z) - PA-FAS: Towards Interpretable and Generalizable Multimodal Face Anti-Spoofing via Path-Augmented Reinforcement Learning [42.24912525813944]
Face anti-spoofing (FAS) has recently advanced in multimodal fusion, cross-domain generalization, and interpretability. We propose PA-FAS, which enhances reasoning paths by constructing high-quality extended reasoning sequences from limited annotations. We also introduce an answer-shuffling mechanism during SFT to force comprehensive multimodal analysis instead of using superficial cues.
arXiv Detail & Related papers (2025-11-22T05:55:08Z) - Filling the Gaps: A Multitask Hybrid Multiscale Generative Framework for Missing Modality in Remote Sensing Semantic Segmentation [28.992992584085787]
Multimodal learning has shown a significant performance boost compared to ordinary unimodal models. In real-world scenarios, multimodal signals are susceptible to missing because of sensor failures and adverse weather conditions. We propose a novel Generative-Enhanced MultiModal learning Network (GEMMNet) to tackle these limitations.
arXiv Detail & Related papers (2025-09-14T05:40:35Z) - Principled Multimodal Representation Learning [99.53621521696051]
Multimodal representation learning seeks to create a unified representation space by integrating diverse data modalities. Recent advances have investigated the simultaneous alignment of multiple modalities, yet several challenges remain. We propose Principled Multimodal Representation Learning (PMRL), a novel framework that achieves simultaneous alignment of multiple modalities.
arXiv Detail & Related papers (2025-07-23T09:12:25Z) - Denoising and Alignment: Rethinking Domain Generalization for Multimodal Face Anti-Spoofing [47.24147617685829]
Face Anti-Spoofing (FAS) is essential for the security of facial recognition systems in diverse scenarios. We introduce the Multimodal Denoising and Alignment (MMDA) framework. By leveraging the zero-shot generalization capability of CLIP, the MMDA framework effectively suppresses noise in multimodal data.
arXiv Detail & Related papers (2025-05-14T15:36:44Z) - DADM: Dual Alignment of Domain and Modality for Face Anti-spoofing [58.62312400472865]
Multi-modal face anti-spoofing (FAS) has emerged as a prominent research focus. We propose an alignment module between modalities based on mutual information. We employ a dual alignment optimization method that aligns both sub-domain hyperplanes and modality angle margins.
arXiv Detail & Related papers (2025-03-01T10:12:00Z) - Invariance Principle Meets Vicinal Risk Minimization [2.026281591452464]
Invariant Risk Minimization (IRM) aims to address OOD generalization by learning domain-invariant features. We propose a domain-shared Semantic Data Augmentation (SDA) module, designed to enhance dataset diversity while maintaining label consistency.
arXiv Detail & Related papers (2024-07-08T09:16:42Z) - Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing [26.901402236963374]
Face Anti-Spoofing (FAS) is crucial for securing face recognition systems against presentation attacks.
Many multi-modal FAS approaches have emerged, but they face challenges in generalizing to unseen attacks and deployment conditions.
arXiv Detail & Related papers (2024-02-29T16:06:36Z) - Domain-Specific Risk Minimization for Out-of-Distribution Generalization [104.17683265084757]
We first establish a generalization bound that explicitly considers the adaptivity gap.
We propose effective gap estimation methods for guiding the selection of a better hypothesis for the target.
The other method is minimizing the gap directly by adapting model parameters using online target samples.
arXiv Detail & Related papers (2022-08-18T06:42:49Z) - Learning Invariant Representations and Risks for Semi-supervised Domain Adaptation [109.73983088432364]
We propose the first method that aims to simultaneously learn invariant representations and risks under the setting of semi-supervised domain adaptation (Semi-DA)
We introduce the LIRR algorithm for jointly Learning Invariant Representations and Risks.
arXiv Detail & Related papers (2020-10-09T15:42:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.