Smoothing the Shift: Towards Stable Test-Time Adaptation under Complex Multimodal Noises
- URL: http://arxiv.org/abs/2503.02616v1
- Date: Tue, 04 Mar 2025 13:36:16 GMT
- Title: Smoothing the Shift: Towards Stable Test-Time Adaptation under Complex Multimodal Noises
- Authors: Zirun Guo, Tao Jin
- Abstract summary: Test-Time Adaptation (TTA) aims to tackle distribution shifts using unlabeled test data without access to the source data. Existing TTA methods fail in multimodal scenarios because abrupt distribution shifts destroy the prior knowledge from the source model. We propose two novel strategies: sample identification with interquartile range Smoothing and unimodal assistance, and Mutual information sharing.
- Score: 3.7816957214446103
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Test-Time Adaptation (TTA) aims to tackle distribution shifts using unlabeled test data, without access to the source data. In the multimodal setting, noise patterns are more complex than in the unimodal case, including simultaneous corruptions across multiple modalities and missing modalities. Moreover, in real-world applications, corruptions from different distribution shifts are often mixed. Existing TTA methods fail in such multimodal scenarios because abrupt distribution shifts destroy the prior knowledge of the source model, leading to performance degradation. We name this new challenge multimodal wild TTA. To address it, we propose two novel strategies: sample identification with interquartile range Smoothing and unimodal assistance, and Mutual information sharing (SuMi). SuMi smooths the adaptation process with interquartile-range filtering, which avoids abrupt distribution shifts. It then fully exploits unimodal features to select low-entropy samples with rich multimodal information for optimization. Finally, mutual information sharing aligns information across modalities, reduces discrepancies, and improves information utilization. Extensive experiments on two public datasets show the method's effectiveness and superiority over existing approaches under complex multimodal noise patterns. Code is available at https://github.com/zrguo/SuMi.
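As a rough illustration of the abstract's two selection ideas, here is a minimal sketch of interquartile-range (IQR) entropy filtering combined with unimodal-agreement sample selection. This is not the official SuMi implementation (see the linked repository); the entropy criterion, the 1.5×IQR cutoff, and the agreement test are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def entropy(logits: torch.Tensor) -> torch.Tensor:
    """Per-sample prediction entropy of a batch of logits."""
    probs = F.softmax(logits, dim=-1)
    return -(probs * torch.log(probs + 1e-8)).sum(dim=-1)

def iqr_keep_mask(ent: torch.Tensor, k: float = 1.5) -> torch.Tensor:
    """Keep samples whose entropy is not an IQR outlier, smoothing
    adaptation by dropping samples from abrupt distribution shifts."""
    q1, q3 = torch.quantile(ent, 0.25), torch.quantile(ent, 0.75)
    iqr = q3 - q1
    return (ent >= q1 - k * iqr) & (ent <= q3 + k * iqr)

def select_samples(fused_logits, audio_logits, video_logits, ratio=0.5):
    """Unimodal assistance (illustrative): keep IQR-inlier samples whose
    unimodal predictions agree with the fused multimodal prediction,
    then retain the lowest-entropy fraction for optimization."""
    ent = entropy(fused_logits)
    fused_pred = fused_logits.argmax(-1)
    agree = (audio_logits.argmax(-1) == fused_pred) & \
            (video_logits.argmax(-1) == fused_pred)
    idx = torch.nonzero(iqr_keep_mask(ent) & agree).squeeze(-1)
    keep = idx[ent[idx].argsort()[: max(1, int(ratio * len(idx)))]]
    return keep
```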
Related papers
- MODIS: Multi-Omics Data Integration for Small and Unpaired Datasets [1.4999444543328289]
MODIS stands for Multi-Omics Data Integration for Small and unpaired datasets.
We build controlled experiments to explore how much supervision is needed for an accurate alignment of modalities.
arXiv Detail & Related papers (2025-03-24T16:33:11Z)
- MICINet: Multi-Level Inter-Class Confusing Information Removal for Reliable Multimodal Classification [57.08108545219043]
A reliable multimodal classification method dubbed Multi-Level Inter-Class Confusing Information Removal Network (MICINet) is proposed.
MICINet achieves the reliable removal of both types of noise by unifying them into the concept of Inter-class Confusing Information (ICI) and eliminating it at both global and individual levels.
Experiments on four datasets demonstrate that MICINet outperforms other state-of-the-art reliable multimodal classification methods under various noise conditions.
arXiv Detail & Related papers (2025-02-27T01:33:28Z)
- MINIMA: Modality Invariant Image Matching [52.505282811925454]
We present MINIMA, a unified image matching framework for multiple cross-modal cases.
We scale up the modalities from cheap but rich RGB-only matching data by means of generative models.
With the resulting synthetic dataset, MD-syn, we can directly train any advanced matching pipeline on randomly selected modality pairs to obtain cross-modal ability.
arXiv Detail & Related papers (2024-12-27T02:39:50Z)
- TabDiff: a Mixed-type Diffusion Model for Tabular Data Generation [91.50296404732902]
We introduce TabDiff, a joint diffusion framework that models all mixed-type distributions of tabular data in one model. Our key innovation is the development of a joint continuous-time diffusion process for numerical and categorical data. TabDiff achieves superior average performance over existing competitive baselines, with up to 22.5% improvement over the state-of-the-art model on pairwise column correlation estimations.
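For intuition, a hedged sketch of what one forward (noising) step of a joint diffusion over mixed-type rows might look like: Gaussian perturbation for numerical columns, uniform resampling for categorical ones. The linear schedule and the uniform categorical kernel are simplifying assumptions, not TabDiff's actual design.

```python
import torch

def noise_step(x_num, x_cat, num_classes, t, T=1000):
    """Noise numerical columns with Gaussians and categorical columns by
    uniform resampling with probability beta_t, at timestep t of T."""
    beta_t = t / T  # toy linear schedule (assumption)
    # numerical: variance-preserving Gaussian perturbation
    x_num_t = (1 - beta_t) ** 0.5 * x_num + beta_t ** 0.5 * torch.randn_like(x_num)
    # categorical: each entry is replaced by a uniform draw w.p. beta_t
    resample = torch.rand(x_cat.shape) < beta_t
    uniform = torch.randint(0, num_classes, x_cat.shape)
    x_cat_t = torch.where(resample, uniform, x_cat)
    return x_num_t, x_cat_t
```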
arXiv Detail & Related papers (2024-10-27T22:58:47Z)
- FedMAC: Tackling Partial-Modality Missing in Federated Learning with Cross-Modal Aggregation and Contrastive Regularization [11.954904313477176]
Federated Learning (FL) is a method for training machine learning models using distributed data sources.
This study proposes FedMAC, a novel framework designed to address multi-modality missing data under conditions of partial-modality missing in FL.
arXiv Detail & Related papers (2024-10-04T01:24:02Z)
- Multimodal Fusion on Low-quality Data: A Comprehensive Survey [110.22752954128738]
This paper surveys the common challenges and recent advances of multimodal fusion in the wild.
We identify four main challenges that are faced by multimodal fusion on low-quality data.
This new taxonomy will enable researchers to understand the state of the field and identify several potential directions.
arXiv Detail & Related papers (2024-04-27T07:22:28Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
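A toy sketch of the core idea above, synthesizing minority samples by convex mixing with majority-class points; the Beta-distributed mixing weight and random partner choice are assumptions, not the paper's exact iterative procedure.

```python
import numpy as np

def mix_minority(x_min, x_maj, alpha=0.75, seed=0):
    """Create synthetic minority samples by convex mixing; lam is biased
    toward 1 so synthetics stay close to the minority class."""
    rng = np.random.default_rng(seed)
    lam = rng.beta(alpha, alpha, size=(len(x_min), 1))
    lam = np.maximum(lam, 1 - lam)              # stay on the minority side
    j = rng.integers(0, len(x_maj), size=len(x_min))
    return lam * x_min + (1 - lam) * x_maj[j]   # labeled as minority
```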
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- SUMMIT: Source-Free Adaptation of Uni-Modal Models to Multi-Modal Targets [30.262094419776208]
Current approaches assume that the source data is available during adaptation and that the source consists of paired multi-modal data.
We propose a switching framework which automatically chooses between two complementary methods of cross-modal pseudo-label fusion.
Our method achieves an improvement in mIoU of up to 12% over competing baselines.
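The switching idea can be pictured with a small sketch: two complementary pseudo-label fusion rules and a simple per-batch criterion that picks between them. Both rules and the agreement threshold are illustrative stand-ins, not SUMMIT's actual mechanisms.

```python
import torch
import torch.nn.functional as F

def fuse_agreement(p_a, p_b):
    """Hard fusion: keep labels only where the modalities agree (-1 = drop)."""
    la, lb = p_a.argmax(-1), p_b.argmax(-1)
    return torch.where(la == lb, la, torch.full_like(la, -1))

def fuse_soft(p_a, p_b):
    """Soft fusion: average the modality posteriors, then take the argmax."""
    return ((p_a + p_b) / 2).argmax(-1)

def fused_pseudo_labels(logits_a, logits_b, min_agree=0.5):
    p_a, p_b = F.softmax(logits_a, -1), F.softmax(logits_b, -1)
    agree_rate = (p_a.argmax(-1) == p_b.argmax(-1)).float().mean()
    # switch: trust hard agreement when modalities mostly concur,
    # fall back to soft averaging when they diverge
    return fuse_agreement(p_a, p_b) if agree_rate >= min_agree else fuse_soft(p_a, p_b)
```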
arXiv Detail & Related papers (2023-08-23T02:57:58Z)
- Align and Attend: Multimodal Summarization with Dual Contrastive Losses [57.83012574678091]
The goal of multimodal summarization is to extract the most important information from different modalities to form output summaries.
Existing methods fail to leverage the temporal correspondence between different modalities and ignore the intrinsic correlation between different samples.
We introduce Align and Attend Multimodal Summarization (A2Summ), a unified multimodal transformer-based model which can effectively align and attend to the multimodal input.
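For flavor, here is a standard symmetric InfoNCE term between time-aligned video and text embeddings; A2Summ's dual losses (inter-sample and intra-sample) differ in detail, so treat this as a generic sketch.

```python
import torch
import torch.nn.functional as F

def info_nce(z_video, z_text, tau=0.07):
    """Pull time-aligned video/text embeddings together, push the rest apart."""
    z_v = F.normalize(z_video, dim=-1)
    z_t = F.normalize(z_text, dim=-1)
    logits = z_v @ z_t.T / tau                            # pairwise similarities
    targets = torch.arange(len(z_v), device=z_v.device)   # i-th pair matches
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.T, targets)) / 2
```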
arXiv Detail & Related papers (2023-03-13T17:01:42Z)
- Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models [54.1843419649895]
We propose a solution based on denoising diffusion probabilistic models (DDPMs).
Our motivation for choosing diffusion models over other generative models comes from the flexible internal structure of diffusion models.
Our method can unite multiple diffusion models trained on multiple sub-tasks and conquer the combined task.
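The "unite" step can be sketched as a single reverse-diffusion update driven by a weighted mix of the noise predictions of several per-task DDPMs; the weighting and the plain DDPM update below are simplifying assumptions rather than the paper's exact sampling scheme.

```python
import torch

def united_reverse_step(x_t, t, models, weights, alpha_t, alpha_bar_t):
    """One DDPM reverse step using a weighted combination of the noise
    estimates from several sub-task diffusion models."""
    eps = sum(w * m(x_t, t) for m, w in zip(models, weights))
    mean = (x_t - (1 - alpha_t) / (1 - alpha_bar_t) ** 0.5 * eps) / alpha_t ** 0.5
    if t == 0:                        # final step: return the mean directly
        return mean
    return mean + (1 - alpha_t) ** 0.5 * torch.randn_like(x_t)
```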
arXiv Detail & Related papers (2022-12-01T18:59:55Z)