BIG-MoE: Bypass Isolated Gating MoE for Generalized Multimodal Face Anti-Spoofing
- URL: http://arxiv.org/abs/2412.18065v1
- Date: Tue, 24 Dec 2024 00:28:28 GMT
- Title: BIG-MoE: Bypass Isolated Gating MoE for Generalized Multimodal Face Anti-Spoofing
- Authors: Yingjie Ma, Zitong Yu, Xun Lin, Weicheng Xie, Linlin Shen
- Abstract summary: Multimodal Face Anti-Spoofing (FAS) is essential for countering presentation attacks.
Existing technologies encounter challenges due to modality biases and imbalances, as well as domain shifts.
Our research introduces a Mixture of Experts (MoE) model to address these issues effectively.
- Score: 45.59998158610864
- Abstract: In the domain of facial recognition security, multimodal Face Anti-Spoofing (FAS) is essential for countering presentation attacks. However, existing technologies encounter challenges due to modality biases and imbalances, as well as domain shifts. Our research introduces a Mixture of Experts (MoE) model to address these issues effectively. We identified three limitations in traditional MoE approaches to multimodal FAS: (1) Coarse-grained experts' inability to capture nuanced spoofing indicators; (2) Gated networks' susceptibility to input noise affecting decision-making; (3) MoE's sensitivity to prompt tokens leading to overfitting with conventional learning methods. To mitigate these, we propose the Bypass Isolated Gating MoE (BIG-MoE) framework, featuring: (1) Fine-grained experts for enhanced detection of subtle spoofing cues; (2) An isolation gating mechanism to counteract input noise; (3) A novel differential convolutional prompt bypass enriching the gating network with critical local features, thereby improving perceptual capabilities. Extensive experiments on four benchmark datasets demonstrate a significant improvement in generalization performance on the multimodal FAS task. The code is released at https://github.com/murInJ/BIG-MoE.
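The abstract assumes familiarity with the standard MoE mechanism that BIG-MoE builds on: a gating network scores a set of experts per token and routes each token through its top-k experts. Below is a minimal, hedged NumPy sketch of that generic top-k routing step; it is not the paper's BIG-MoE (the fine-grained experts, isolation gating, and convolutional prompt bypass live in the linked repository), and all names and shapes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def moe_layer(tokens, expert_weights, gate_weights, top_k=2):
    """Generic top-k Mixture-of-Experts layer (illustrative only).

    tokens:         (n, d)     input token features
    expert_weights: (E, d, d)  one linear map per expert
    gate_weights:   (d, E)     gating network (a single linear layer here)
    """
    logits = tokens @ gate_weights               # (n, E) routing scores
    probs = softmax(logits)                      # per-token expert distribution
    top = np.argsort(-probs, axis=1)[:, :top_k]  # indices of the top-k experts

    out = np.zeros_like(tokens)
    for i, t in enumerate(tokens):
        w = probs[i, top[i]]
        w = w / w.sum()  # renormalize the selected experts' weights
        for k, e in enumerate(top[i]):
            out[i] += w[k] * (t @ expert_weights[e])  # weighted expert outputs
    return out

n, d, E = 4, 8, 6
tokens = rng.standard_normal((n, d))
experts = rng.standard_normal((E, d, d)) * 0.1
gate = rng.standard_normal((d, E))
y = moe_layer(tokens, experts, gate)
print(y.shape)  # (4, 8)
```

In this vanilla form the gate sees the raw tokens directly, which is exactly the noise-sensitivity the paper's isolation gating and prompt bypass are designed to mitigate.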
Related papers
- Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization [52.87635234206178]
This paper proposes a new framework, namely MoNFAP, specifically tailored for multi-face manipulation detection and localization.
The framework incorporates two novel modules: the Forgery-aware Unified Predictor (FUP) Module and the Mixture-of-Noises Module (MNM).
arXiv Detail & Related papers (2024-08-05T08:35:59Z)
- MoE-FFD: Mixture of Experts for Generalized and Parameter-Efficient Face Forgery Detection [54.545054873239295]
Deepfakes have recently raised significant trust issues and security concerns among the public.
ViT-based methods take advantage of the expressivity of transformers, achieving superior detection performance.
This work introduces Mixture-of-Experts modules for Face Forgery Detection (MoE-FFD), a generalized yet parameter-efficient ViT-based approach.
arXiv Detail & Related papers (2024-04-12T13:02:08Z)
- Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing [26.901402236963374]
Face Anti-Spoofing (FAS) is crucial for securing face recognition systems against presentation attacks.
Many multi-modal FAS approaches have emerged, but they face challenges in generalizing to unseen attacks and deployment conditions.
arXiv Detail & Related papers (2024-02-29T16:06:36Z)
- Generative-based Fusion Mechanism for Multi-Modal Tracking [35.77340348091937]
We introduce Conditional Generative Adversarial Networks (CGANs) and Diffusion Models (DMs)
We condition these multi-modal features with random noise in the GM framework, effectively transforming the original training samples into harder instances.
This design excels at extracting discriminative clues from the features, enhancing the ultimate tracking performance.
arXiv Detail & Related papers (2023-09-04T17:22:10Z)
- Cross-Attention is Not Enough: Incongruity-Aware Dynamic Hierarchical Fusion for Multimodal Affect Recognition [69.32305810128994]
Incongruity between modalities poses a challenge for multimodal fusion, especially in affect recognition.
We propose the Hierarchical Crossmodal Transformer with Dynamic Modality Gating (HCT-DMG), a lightweight incongruity-aware model.
HCT-DMG: 1) outperforms previous multimodal models with a reduced size of approximately 0.8M parameters; 2) recognizes hard samples where incongruity makes affect recognition difficult; 3) mitigates the incongruity at the latent level in crossmodal attention.
arXiv Detail & Related papers (2023-05-23T01:24:15Z)
- M3FAS: An Accurate and Robust MultiModal Mobile Face Anti-Spoofing System [39.37647248710612]
Face presentation attacks (FPA) have brought increasing concerns to the public through various malicious applications.
We devise an accurate and robust MultiModal Mobile Face Anti-Spoofing system named M3FAS.
arXiv Detail & Related papers (2023-01-30T12:37:04Z)
- Dual Spoof Disentanglement Generation for Face Anti-spoofing with Depth Uncertainty Learning [54.15303628138665]
Face anti-spoofing (FAS) plays a vital role in preventing face recognition systems from presentation attacks.
Existing face anti-spoofing datasets lack diversity due to the insufficient identity and insignificant variance.
We propose the Dual Spoof Disentanglement Generation framework to tackle this challenge by "anti-spoofing via generation".
arXiv Detail & Related papers (2021-12-01T15:36:59Z)
- Face Anti-Spoofing with Human Material Perception [76.4844593082362]
Face anti-spoofing (FAS) plays a vital role in securing the face recognition systems from presentation attacks.
We rephrase face anti-spoofing as a material recognition problem and combine it with classical human material perception.
We propose the Bilateral Convolutional Networks (BCN), which is able to capture intrinsic material-based patterns.
arXiv Detail & Related papers (2020-07-04T18:25:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.