Related papers: SUMMIT: Source-Free Adaptation of Uni-Modal Models to Multi-Modal Targets

URL: http://arxiv.org/abs/2308.11880v1
Date: Wed, 23 Aug 2023 02:57:58 GMT
Title: SUMMIT: Source-Free Adaptation of Uni-Modal Models to Multi-Modal Targets
Authors: Cody Simons, Dripta S. Raychaudhuri, Sk Miraj Ahmed, Suya You, Konstantinos Karydis, Amit K. Roy-Chowdhury
Abstract summary: Current approaches assume that the source data is available during adaptation and that the source consists of paired multi-modal data. We propose a switching framework which automatically chooses between two complementary methods of cross-modal pseudo-label fusion. Our method achieves an improvement in mIoU of up to 12% over competing baselines.
Score: 30.262094419776208
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Scene understanding using multi-modal data is necessary in many applications, e.g., autonomous navigation. To achieve this in a variety of situations, existing models must be able to adapt to shifting data distributions without arduous data annotation. Current approaches assume that the source data is available during adaptation and that the source consists of paired multi-modal data. Both these assumptions may be problematic for many applications. Source data may not be available due to privacy, security, or economic concerns. Assuming the existence of paired multi-modal data for training also entails significant data collection costs and fails to take advantage of widely available freely distributed pre-trained uni-modal models. In this work, we relax both of these assumptions by addressing the problem of adapting a set of models trained independently on uni-modal data to a target domain consisting of unlabeled multi-modal data, without having access to the original source dataset. Our proposed approach solves this problem through a switching framework which automatically chooses between two complementary methods of cross-modal pseudo-label fusion -- agreement filtering and entropy weighting -- based on the estimated domain gap. We demonstrate our work on the semantic segmentation problem. Experiments across seven challenging adaptation scenarios verify the efficacy of our approach, achieving results comparable to, and in some cases outperforming, methods which assume access to source data. Our method achieves an improvement in mIoU of up to 12% over competing baselines. Our code is publicly available at https://github.com/csimo005/SUMMIT.
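
The two fusion modes named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (see their repository for that), and everything not stated in the abstract is an assumption: the function names, the `IGNORE` marker, the `exp(-entropy)` weighting, the scalar `estimated_gap`, the `gap_threshold` default, and the direction of the switch (the stricter agreement filter is used here when the gap is large). Each input is a list of per-point class-probability vectors from one uni-modal model.

```python
import math

IGNORE = -1  # hypothetical marker for points left without a pseudo-label


def entropy(probs):
    """Shannon entropy (in nats) of one class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def argmax(values):
    """Index of the largest value (first index on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def agreement_filter(probs_a, probs_b):
    """Keep a pseudo-label only where both models predict the same class."""
    labels = []
    for pa, pb in zip(probs_a, probs_b):
        ca, cb = argmax(pa), argmax(pb)
        labels.append(ca if ca == cb else IGNORE)
    return labels


def entropy_weighting(probs_a, probs_b):
    """Average the two predictions, weighting the lower-entropy one more."""
    labels = []
    for pa, pb in zip(probs_a, probs_b):
        # Assumed weighting: exp(-entropy), so confident predictions dominate.
        wa, wb = math.exp(-entropy(pa)), math.exp(-entropy(pb))
        fused = [(wa * x + wb * y) / (wa + wb) for x, y in zip(pa, pb)]
        labels.append(argmax(fused))
    return labels


def fuse_pseudo_labels(probs_a, probs_b, estimated_gap, gap_threshold=0.5):
    """Switch between the two fusion modes based on an estimated domain gap.

    Assumption: a large gap selects the more conservative agreement filter;
    the paper may define the gap estimate and the switching rule differently.
    """
    if estimated_gap > gap_threshold:
        return agreement_filter(probs_a, probs_b)
    return entropy_weighting(probs_a, probs_b)
```

In this sketch, agreement filtering trades coverage for reliability (disagreeing points receive no label), while entropy weighting labels every point but trusts the more confident model.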

Related papers

Enhancing Modality Representation and Alignment for Multimodal Cold-start Active Learning [17.954883799795155]
We develop a two-stage method for Multi-Modal Cold-Start Active Learning (MMCSAL). First, we observe the modality gap, a significant distance between the centroids of representations from different modalities, when only using cross-modal pairing information as self-supervision signals. Second, we propose enhancing cross-modal alignment through regularization, thereby improving the quality of selected multimodal data pairs in MMCSAL.
arXiv Detail & Related papers (2024-12-12T10:03:46Z)
FedMAC: Tackling Partial-Modality Missing in Federated Learning with Cross-Modal Aggregation and Contrastive Regularization [11.954904313477176]
Federated Learning (FL) is a method for training machine learning models using distributed data sources. This study proposes a novel framework named FedMAC, designed to handle missing modalities under partial-modality-missing conditions in FL.
arXiv Detail & Related papers (2024-10-04T01:24:02Z)
Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models [6.610033827647869]
In real-world scenarios, consistently acquiring complete multimodal data presents significant challenges. This often leads to the issue of missing modalities, where data for certain modalities are absent. We propose a novel framework integrating parameter-efficient fine-tuning of unimodal pretrained models with a self-supervised joint-embedding learning method.
arXiv Detail & Related papers (2024-07-17T14:44:25Z)
Borrowing Treasures from Neighbors: In-Context Learning for Multimodal Learning with Missing Modalities and Data Scarcity [9.811378971225727]
This paper extends the current research into missing modalities to the low-data regime. It is often expensive to get full-modality data and sufficient annotated training samples. We propose to use retrieval-augmented in-context learning to address these two crucial issues.
arXiv Detail & Related papers (2024-03-14T14:19:48Z)
Debiasing Multimodal Models via Causal Information Minimization [65.23982806840182]
We study bias arising from confounders in a causal graph for multimodal data. Robust predictive features contain diverse information that helps a model generalize to out-of-distribution data. We use these features as confounder representations and use them via methods motivated by causal theory to remove bias from models.
arXiv Detail & Related papers (2023-11-28T16:46:14Z)
Informative Data Mining for One-Shot Cross-Domain Semantic Segmentation [84.82153655786183]
We propose a novel framework called Informative Data Mining (IDM) to enable efficient one-shot domain adaptation for semantic segmentation. IDM provides an uncertainty-based selection criterion to identify the most informative samples, which facilitates quick adaptation and reduces redundant training. Our approach outperforms existing methods and achieves a new state-of-the-art one-shot performance of 56.7%/55.4% on the GTA5/SYNTHIA to Cityscapes adaptation tasks.
arXiv Detail & Related papers (2023-09-25T15:56:01Z)
Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models. This creates a barrier to fusing knowledge across individual models to yield a better single model. We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
Discriminative Multimodal Learning via Conditional Priors in Generative Models [21.166519800652047]
This research studies the realistic scenario in which all modalities and class labels are available for model training. We show, in this scenario, that the variational lower bound limits mutual information between joint representations and missing modalities.
arXiv Detail & Related papers (2021-10-09T17:22:24Z)
Learning Invariant Representation with Consistency and Diversity for Semi-supervised Source Hypothesis Transfer [46.68586555288172]
We propose a novel task named Semi-supervised Source Hypothesis Transfer (SSHT), which performs domain adaptation based on a source-trained model so that it generalizes well in the target domain with only a few labeled samples. We propose Consistency and Diversity Learning (CDL), a simple but effective framework for SSHT that encourages prediction consistency between two randomly augmented views of unlabeled data. Experimental results show that our method outperforms existing SSDA methods and unsupervised model adaptation methods on DomainNet, Office-Home and Office-31 datasets.
arXiv Detail & Related papers (2021-07-07T04:14:24Z)
Unsupervised Multi-source Domain Adaptation Without Access to Source Data [58.551861130011886]
Unsupervised Domain Adaptation (UDA) aims to learn a predictor model for an unlabeled domain by transferring knowledge from a separate labeled source domain. We propose a novel and efficient algorithm which automatically combines the source models with suitable weights in such a way that it performs at least as well as the best source model.
arXiv Detail & Related papers (2021-04-05T10:45:12Z)
Relating by Contrasting: A Data-efficient Framework for Multimodal Generative Models [86.9292779620645]
We develop a contrastive framework for generative model learning, allowing us to train the model not just by the commonality between modalities, but by the distinction between "related" and "unrelated" multimodal data. Under our proposed framework, the generative model can accurately identify related samples from unrelated ones, making it possible to make use of the plentiful unlabeled, unpaired multimodal data.
arXiv Detail & Related papers (2020-07-02T15:08:11Z)
Do We Really Need to Access the Source Data? Source Hypothesis Transfer for Unsupervised Domain Adaptation [102.67010690592011]
Unsupervised domain adaptation (UDA) aims to leverage the knowledge learned from a labeled source dataset to solve similar tasks in a new unlabeled domain. Prior UDA methods typically require access to the source data when learning to adapt the model. This work tackles a practical setting where only a trained source model is available and studies how to effectively utilize such a model, without source data, to solve UDA problems.
arXiv Detail & Related papers (2020-02-20T03:13:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.