Suppress and Rebalance: Towards Generalized Multi-Modal Face
Anti-Spoofing
- URL: http://arxiv.org/abs/2402.19298v2
- Date: Tue, 5 Mar 2024 11:59:29 GMT
- Title: Suppress and Rebalance: Towards Generalized Multi-Modal Face
Anti-Spoofing
- Authors: Xun Lin, Shuai Wang, Rizhao Cai, Yizhong Liu, Ying Fu, Zitong Yu,
Wenzhong Tang, Alex Kot
- Abstract summary: Face Anti-Spoofing (FAS) is crucial for securing face recognition systems against presentation attacks.
Many multi-modal FAS approaches have emerged, but they face challenges in generalizing to unseen attacks and deployment conditions.
- Score: 26.901402236963374
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Face Anti-Spoofing (FAS) is crucial for securing face recognition systems
against presentation attacks. With advancements in sensor manufacture and
multi-modal learning techniques, many multi-modal FAS approaches have emerged.
However, they face challenges in generalizing to unseen attacks and deployment
conditions. These challenges arise from (1) modality unreliability, where some
modality sensors like depth and infrared undergo significant domain shifts in
varying environments, leading to the spread of unreliable information during
cross-modal feature fusion, and (2) modality imbalance, where over-reliance on
a dominant modality during training hinders the convergence of the others,
reducing effectiveness against attack types that cannot be distinguished using
the dominant modality alone. To address modality unreliability, we propose the
Uncertainty-Guided Cross-Adapter (U-Adapter) to recognize unreliably detected
regions within each modality and suppress the impact of unreliable regions on
other modalities. For modality imbalance, we propose a Rebalanced Modality
Gradient Modulation (ReGrad) strategy to rebalance the convergence speed of all
modalities by adaptively adjusting their gradients. Besides, we provide the
first large-scale benchmark for evaluating multi-modal FAS performance under
domain generalization scenarios. Extensive experiments demonstrate that our
method outperforms state-of-the-art methods. Source code and protocols will be
released on https://github.com/OMGGGGG/mmdg.
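
How the U-Adapter suppresses unreliable regions is not spelled out in this summary. Below is a minimal, hypothetical sketch of the general idea, uncertainty-weighted cross-modal attention: per-token uncertainty in one modality is estimated (here via Monte Carlo dropout, an assumption) and used to down-weight that modality's tokens when another modality attends to them. All class and parameter names are illustrative, not the authors' implementation.

```python
# Minimal, hypothetical sketch of uncertainty-guided cross-modal suppression.
# All names and design choices here are illustrative assumptions, not the
# authors' U-Adapter implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class UncertaintyGuidedCrossAttention(nn.Module):
    def __init__(self, dim: int, mc_passes: int = 5):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # Small auxiliary head used only to estimate per-token uncertainty.
        self.head = nn.Sequential(nn.Dropout(0.1), nn.Linear(dim, 2))
        self.mc_passes = mc_passes

    def token_uncertainty(self, feats: torch.Tensor) -> torch.Tensor:
        # Monte Carlo dropout (an assumption): variance of class probabilities
        # over several stochastic passes serves as per-token uncertainty.
        self.head.train()  # keep dropout active
        with torch.no_grad():
            probs = torch.stack(
                [F.softmax(self.head(feats), dim=-1) for _ in range(self.mc_passes)]
            )
        return probs.var(dim=0).mean(dim=-1)  # (B, M): one score per token

    def forward(self, query_feats: torch.Tensor,
                context_feats: torch.Tensor) -> torch.Tensor:
        # Down-weight cross-modal messages that come from unreliable tokens
        # of the other modality, then renormalize the attention weights.
        u = self.token_uncertainty(context_feats)                  # (B, M)
        reliability = 1.0 - u / (u.max(dim=-1, keepdim=True).values + 1e-8)
        q, k, v = self.q(query_feats), self.k(context_feats), self.v(context_feats)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        attn = attn * reliability.unsqueeze(1)                     # (B, N, M)
        attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-8)
        return query_feats + attn @ v                              # residual fusion
```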
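Likewise, here is a minimal sketch of the gradient-rebalancing idea behind ReGrad, under the assumption that convergence speed is measured as the recent rate of loss decrease per modality branch; the paper's actual modulation rule is not reproduced here, and all names are hypothetical.

```python
# Hypothetical sketch of rebalancing gradients across modality branches so
# slow-converging branches catch up. Not the paper's ReGrad rule.
import torch

def rebalance_gradients(branches: dict[str, torch.nn.Module],
                        losses: dict[str, torch.Tensor],
                        history: dict[str, list[float]],
                        window: int = 10) -> None:
    """Scale each branch's gradients inversely to its convergence speed."""
    speeds = {}
    for name, loss in losses.items():
        hist = history.setdefault(name, [])
        hist.append(loss.item())
        # Convergence speed: average loss decrease over a sliding window.
        if len(hist) > window:
            speeds[name] = max(hist[-window - 1] - hist[-1], 1e-8) / window
        else:
            speeds[name] = 1e-8  # not enough history yet
    mean_speed = sum(speeds.values()) / len(speeds)
    for name, branch in branches.items():
        scale = mean_speed / max(speeds[name], 1e-8)   # fast branch -> scale < 1
        scale = min(max(scale, 0.1), 10.0)             # clip for stability
        for p in branch.parameters():
            if p.grad is not None:
                p.grad.mul_(scale)
```

A typical usage would be: compute each modality's loss, call backward() on the total loss, run rebalance_gradients(...), then optimizer.step().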
Related papers
- FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL [19.236153474365747]
Existing MARL approaches often rely on the restrictive assumption that the number of entities remains constant between training and inference.
In this paper, we tackle the challenge of intra-trajectory dynamic entity composition under zero-shot out-of-domain (OOD) generalization.
We propose FlickerFusion, a novel OOD generalization method that acts as a universally applicable augmentation technique for MARL backbone methods.
arXiv Detail & Related papers (2024-10-21T10:57:45Z)
- On-the-fly Modulation for Balanced Multimodal Learning [53.616094855778954]
Multimodal learning is expected to boost model performance by integrating information from different modalities.
The widely-used joint training strategy leads to imbalanced and under-optimized uni-modal representations.
We propose On-the-fly Prediction Modulation (OPM) and On-the-fly Gradient Modulation (OGM) strategies to modulate the optimization of each modality; a hedged sketch of the prediction-modulation idea appears after this list.
arXiv Detail & Related papers (2024-10-15T13:15:50Z)
- Cross-Modality Attack Boosted by Gradient-Evolutionary Multiform Optimization [4.226449585713182]
Cross-modal adversarial attacks pose significant challenges to attack transferability.
We propose a novel cross-modal adversarial attack strategy, termed multiform attack.
We demonstrate the superiority and robustness of Multiform Attack compared to existing techniques.
arXiv Detail & Related papers (2024-09-26T15:52:34Z)
- MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders.
We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z)
- Real-GDSR: Real-World Guided DSM Super-Resolution via Edge-Enhancing Residual Network [2.3020018305241337]
A low-resolution digital surface model (DSM) features distinctive attributes impacted by noise, sensor limitations and data acquisition conditions.
As a result, super-resolution models trained on synthetic data do not perform effectively on real-world data.
We introduce a novel methodology to address the intricacies of real-world DSM super-resolution, named REAL-GDSR.
arXiv Detail & Related papers (2024-04-05T07:24:10Z)
- Cross-Modality Perturbation Synergy Attack for Person Re-identification [66.48494594909123]
The main challenge in cross-modality ReID lies in effectively dealing with visual differences between different modalities.
Existing attack methods have primarily focused on the characteristics of the visible image modality.
This study proposes a universal perturbation attack specifically designed for cross-modality ReID.
arXiv Detail & Related papers (2024-01-18T15:56:23Z)
- Cross-Attention is Not Enough: Incongruity-Aware Dynamic Hierarchical Fusion for Multimodal Affect Recognition [69.32305810128994]
Incongruity between modalities poses a challenge for multimodal fusion, especially in affect recognition.
We propose the Hierarchical Crossmodal Transformer with Dynamic Modality Gating (HCT-DMG), a lightweight incongruity-aware model.
HCT-DMG: 1) outperforms previous multimodal models with a reduced size of approximately 0.8M parameters; 2) recognizes hard samples where incongruity makes affect recognition difficult; 3) mitigates the incongruity at the latent level in crossmodal attention.
arXiv Detail & Related papers (2023-05-23T01:24:15Z)
- Exploiting modality-invariant feature for robust multimodal emotion recognition with missing modalities [76.08541852988536]
We propose to use invariant features for a missing modality imagination network (IF-MMIN).
We show that the proposed model outperforms all baselines and invariantly improves the overall emotion recognition performance under uncertain missing-modality conditions.
arXiv Detail & Related papers (2022-10-27T12:16:25Z)
- Semi-Supervised Learning with Variational Bayesian Inference and Maximum Uncertainty Regularization [62.21716612888669]
We propose two generic methods for improving semi-supervised learning (SSL).
The first integrates weight perturbation (WP) into existing "consistency regularization" (CR) based methods.
The second proposes a novel consistency loss called "maximum uncertainty regularization" (MUR).
arXiv Detail & Related papers (2020-12-03T09:49:35Z)
- Contextual Fusion For Adversarial Robustness [0.0]
Deep neural networks are usually designed to process one particular information stream and are susceptible to various types of adversarial perturbations.
We developed a fusion model using a combination of background and foreground features extracted in parallel from Places-CNN and Imagenet-CNN.
For gradient-based attacks, our results show that fusion allows for significant improvements in classification without decreasing performance on unperturbed data.
arXiv Detail & Related papers (2020-11-18T20:13:23Z)
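
For the On-the-fly Modulation entry above, the following is a hedged sketch of what prediction modulation can look like: the dominant modality's logits are stochastically dropped during training so the weaker branch receives more gradient signal. This is an illustration under stated assumptions, with hypothetical names, not the OPM/OGM algorithm itself.

```python
# Hypothetical sketch of prediction modulation for balanced multimodal
# training, loosely in the spirit of the OPM entry above. All names and
# the dominance heuristic are illustrative assumptions.
import torch

def modulated_fusion(logits_a: torch.Tensor,
                     logits_b: torch.Tensor,
                     labels: torch.Tensor,
                     drop_max: float = 0.5,
                     training: bool = True) -> torch.Tensor:
    """Stochastically drop the dominant modality's prediction during training."""
    if not training:
        return logits_a + logits_b
    # Mean confidence of each modality on the ground-truth class.
    conf_a = torch.softmax(logits_a, dim=-1).gather(1, labels.unsqueeze(1)).mean()
    conf_b = torch.softmax(logits_b, dim=-1).gather(1, labels.unsqueeze(1)).mean()
    # The more one modality dominates, the more often its logits are dropped,
    # letting the weaker branch drive the loss and receive stronger gradients.
    if conf_a >= conf_b:
        p_drop = drop_max * (1.0 - conf_b / (conf_a + 1e-8)).clamp(0.0, 1.0)
        keep_a = (torch.rand(()) > p_drop).float()
        return keep_a * logits_a + logits_b
    p_drop = drop_max * (1.0 - conf_a / (conf_b + 1e-8)).clamp(0.0, 1.0)
    keep_b = (torch.rand(()) > p_drop).float()
    return logits_a + keep_b * logits_b
```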