Attention Bootstrapping for Multi-Modal Test-Time Adaptation
- URL: http://arxiv.org/abs/2503.02221v1
- Date: Tue, 04 Mar 2025 02:53:53 GMT
- Title: Attention Bootstrapping for Multi-Modal Test-Time Adaptation
- Authors: Yusheng Zhao, Junyu Luo, Xiao Luo, Jinsheng Huang, Jingyang Yuan, Zhiping Xiao, Ming Zhang,
- Abstract summary: Test-time adaptation aims to adapt a well-trained model to potential distribution shifts at test time using only unlabeled test data.<n>This paper tackles the problem of multi-modal test-time adaptation by proposing a novel method named Attention Bootstrapping with Principal Entropy Minimization (ABPEM)<n>To mitigate this attention gap and encourage better modality fusion, we propose attention bootstrapping that promotes cross-attention with the guidance of self-attention.
- Score: 10.524759195371828
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Test-time adaptation aims to adapt a well-trained model to potential distribution shifts at test time using only unlabeled test data, without access to the original training data. While previous efforts mainly focus on a single modality, test-time distribution shift in the multi-modal setting is more complex and calls for new solutions. This paper tackles the problem of multi-modal test-time adaptation by proposing a novel method named Attention Bootstrapping with Principal Entropy Minimization (ABPEM). We observe that test-time distribution shift causes misalignment across modalities, leading to a large gap between intra-modality discrepancies (measured by self-attention) and inter-modality discrepancies (measured by cross-attention). We name this the attention gap. This attention gap widens with more severe distribution shifts, hindering effective modality fusion. To mitigate this attention gap and encourage better modality fusion, we propose attention bootstrapping that promotes cross-attention with the guidance of self-attention. Moreover, to reduce the gradient noise in the commonly-used entropy minimization, we adopt principal entropy minimization, a refinement of entropy minimization that reduces gradient noise by focusing on the principal parts of entropy, excluding less reliable gradient information. Extensive experiments on the benchmarks validate the effectiveness of the proposed ABPEM in comparison with competing baselines.
Related papers
- Learning to Match Unpaired Data with Minimum Entropy Coupling [7.399561232927219]
Minimum Entropy Coupling seeks to minimize the joint Entropy, while satisfying constraints on the marginals.
We propose a novel method to solve the continuous MEC problem, using well-known generative diffusion models.
We empirically demonstrate that our method, DDMEC, is general and can be easily used to address challenging tasks.
arXiv Detail & Related papers (2025-03-11T14:54:14Z) - Adding Additional Control to One-Step Diffusion with Joint Distribution Matching [58.37264951734603]
JDM is a novel approach that minimizes the reverse KL divergence between image-condition joint distributions.
By deriving a tractable upper bound, JDM decouples fidelity learning from condition learning.
This asymmetric distillation scheme enables our one-step student to handle controls unknown to the teacher model.
arXiv Detail & Related papers (2025-03-09T15:06:50Z) - Meta-TTT: A Meta-learning Minimax Framework For Test-Time Training [5.9631503543049895]
Test-time domain adaptation is a challenging task that aims to adapt a pre-trained model to limited, unlabeled target data during inference.
This paper introduces a meta-learning minimax framework for test-time training on batch normalization layers.
arXiv Detail & Related papers (2024-10-02T16:16:05Z) - Protected Test-Time Adaptation via Online Entropy Matching: A Betting Approach [14.958884168060097]
We present a novel approach for test-time adaptation via online self-training.<n>Our approach combines concepts in betting martingales and online learning to form a detection tool capable of reacting to distribution shifts.<n> Experimental results demonstrate that our approach improves test-time accuracy under distribution shifts while maintaining accuracy and calibration in their absence.
arXiv Detail & Related papers (2024-08-14T12:40:57Z) - Mitigating Shortcut Learning with Diffusion Counterfactuals and Diverse Ensembles [95.49699178874683]
We propose DiffDiv, an ensemble diversification framework exploiting Diffusion Probabilistic Models (DPMs)
We show that DPMs can generate images with novel feature combinations, even when trained on samples displaying correlated input features.
We show that DPM-guided diversification is sufficient to remove dependence on shortcut cues, without a need for additional supervised signals.
arXiv Detail & Related papers (2023-11-23T15:47:33Z) - PREM: A Simple Yet Effective Approach for Node-Level Graph Anomaly
Detection [65.24854366973794]
Node-level graph anomaly detection (GAD) plays a critical role in identifying anomalous nodes from graph-structured data in domains such as medicine, social networks, and e-commerce.
We introduce a simple method termed PREprocessing and Matching (PREM for short) to improve the efficiency of GAD.
Our approach streamlines GAD, reducing time and memory consumption while maintaining powerful anomaly detection capabilities.
arXiv Detail & Related papers (2023-10-18T02:59:57Z) - Revisiting Modality Imbalance In Multimodal Pedestrian Detection [6.7841188753203046]
We introduce a novel training setup with regularizer in the multimodal architecture to resolve the problem of this disparity between the modalities.
Specifically, our regularizer term helps to make the feature fusion method more robust by considering both the feature extractors equivalently important during the training.
arXiv Detail & Related papers (2023-02-24T11:56:57Z) - MAPS: A Noise-Robust Progressive Learning Approach for Source-Free
Domain Adaptive Keypoint Detection [76.97324120775475]
Cross-domain keypoint detection methods always require accessing the source data during adaptation.
This paper considers source-free domain adaptive keypoint detection, where only the well-trained source model is provided to the target domain.
arXiv Detail & Related papers (2023-02-09T12:06:08Z) - MaxMatch: Semi-Supervised Learning with Worst-Case Consistency [149.03760479533855]
We propose a worst-case consistency regularization technique for semi-supervised learning (SSL)
We present a generalization bound for SSL consisting of the empirical loss terms observed on labeled and unlabeled training data separately.
Motivated by this bound, we derive an SSL objective that minimizes the largest inconsistency between an original unlabeled sample and its multiple augmented variants.
arXiv Detail & Related papers (2022-09-26T12:04:49Z) - Distribution Mismatch Correction for Improved Robustness in Deep Neural
Networks [86.42889611784855]
normalization methods increase the vulnerability with respect to noise and input corruptions.
We propose an unsupervised non-parametric distribution correction method that adapts the activation distribution of each layer.
In our experiments, we empirically show that the proposed method effectively reduces the impact of intense image corruptions.
arXiv Detail & Related papers (2021-10-05T11:36:25Z) - CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information [105.73798100327667]
We propose a novel Contrastive Log-ratio Upper Bound (CLUB) of mutual information.
We provide a theoretical analysis of the properties of CLUB and its variational approximation.
Based on this upper bound, we introduce a MI minimization training scheme and further accelerate it with a negative sampling strategy.
arXiv Detail & Related papers (2020-06-22T05:36:16Z) - Majorization Minimization Methods for Distributed Pose Graph
Optimization with Convergence Guarantees [0.76146285961466]
We show that our proposed methods are guaranteed to converge to first-order critical points under mild conditions.
Since our proposed methods rely a proximal operator of distributed PGO, the convergence rate can be significantly accelerated.
The efficacy of this work is validated through applications on a number of 2D and 3D SLAM datasets.
arXiv Detail & Related papers (2020-03-11T15:18:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.