Source-Free Domain Adaptation with Frozen Multimodal Foundation Model
- URL: http://arxiv.org/abs/2311.16510v3
- Date: Wed, 13 Mar 2024 05:11:47 GMT
- Title: Source-Free Domain Adaptation with Frozen Multimodal Foundation Model
- Authors: Song Tang, Wenxin Su, Mao Ye, and Xiatian Zhu
- Abstract summary: Source-Free Domain Adaptation (SFDA) aims to adapt a source model for a target domain.
We explore the potential of off-the-shelf vision-language (ViL) multimodal models with rich yet heterogeneous knowledge.
We propose a novel Distilling multimodal Foundation model (DIFO) approach.
- Score: 42.19262809313472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Source-Free Domain Adaptation (SFDA) aims to adapt a source model for a
target domain, with only access to unlabeled target training data and the
source model pre-trained on a supervised source domain. Relying on pseudo
labeling and/or auxiliary supervision, conventional methods are inevitably
error-prone. To mitigate this limitation, in this work we explore, for the first
time, the potential of off-the-shelf vision-language (ViL) multimodal models
(e.g., CLIP) with rich yet heterogeneous knowledge. We find that directly
applying the ViL model to the target domain in a zero-shot fashion is
unsatisfactory, as it is not specialized for this particular task but largely
generic. To make it task-specific, we propose a novel Distilling multimodal
Foundation model (DIFO) approach. Specifically, DIFO alternates between two steps
during adaptation: (i) Customizing the ViL model by maximizing the mutual
information with the target model in a prompt learning manner, (ii) Distilling
the knowledge of this customized ViL model to the target model. For more
fine-grained and reliable distillation, we further introduce two effective
regularization terms, namely most-likely category encouragement and predictive
consistency. Extensive experiments show that DIFO significantly outperforms the
state-of-the-art alternatives. Code is publicly available.
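The abstract describes DIFO only at the level of its two alternating steps. Purely for illustration, a minimal sketch of that loop might look like the following; the mutual-information surrogate (an IIC-style estimate), the `vil_model(x, prompt_params)` interface, the optimizers, and the temperature are all assumptions of this sketch rather than the paper's actual implementation, and the two regularization terms are only noted in comments.

```python
import torch
import torch.nn.functional as F

def mi_surrogate(p, q, eps=1e-8):
    # IIC-style mutual information between two batches of class distributions.
    # p, q: (N, C) softmax outputs; returns a scalar to be maximized.
    joint = p.t() @ q / p.size(0)              # (C, C) empirical joint distribution
    joint = ((joint + joint.t()) / 2).clamp(min=eps)
    pi = joint.sum(dim=1, keepdim=True)        # marginal over p's classes
    pj = joint.sum(dim=0, keepdim=True)        # marginal over q's classes
    return (joint * (joint.log() - pi.log() - pj.log())).sum()

def adapt_one_epoch(target_model, vil_model, prompt_params, loader,
                    opt_prompt, opt_target, tau=1.0):
    for x in loader:                           # unlabeled target-domain batch
        # Step (i): customize the frozen ViL model via prompt learning by
        # maximizing mutual information with the (fixed) target model.
        with torch.no_grad():
            p_tgt = F.softmax(target_model(x), dim=1)
        p_vil = F.softmax(vil_model(x, prompt_params), dim=1)
        loss_prompt = -mi_surrogate(p_vil, p_tgt)
        opt_prompt.zero_grad(); loss_prompt.backward(); opt_prompt.step()

        # Step (ii): distill the customized ViL predictions into the target model.
        with torch.no_grad():
            teacher = F.softmax(vil_model(x, prompt_params) / tau, dim=1)
        student = F.log_softmax(target_model(x) / tau, dim=1)
        loss_kd = F.kl_div(student, teacher, reduction="batchmean")
        # The paper's two extra regularizers (most-likely category encouragement,
        # predictive consistency) are omitted from this sketch.
        opt_target.zero_grad(); loss_kd.backward(); opt_target.step()
```

As the title indicates, the ViL backbone stays frozen throughout; only the prompt parameters (and the target model) are updated, so the sketch confines `opt_prompt` to the prompt context.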
Related papers
- ViLAaD: Enhancing "Attracting and Dispersing" Source-Free Domain Adaptation with Vision-and-Language Model [0.9831489366502302]
Source-Free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to a target dataset from a different domain without access to the source data.
We propose a novel method that incorporates auxiliary information by extending an existing SFDA framework using Vision-and-Language (ViL) models.
Our approach, called ViL-enhanced AaD (ViLAaD), preserves the simplicity and flexibility of the AaD framework, while leveraging ViL models to significantly boost adaptation performance.
arXiv Detail & Related papers (2025-03-30T17:22:55Z)
- Similarity-Based Domain Adaptation with LLMs [13.692329347889212]
Unsupervised domain adaptation leverages abundant labeled data from various source domains to generalize onto unlabeled target data.
This paper introduces a simple framework that utilizes the impressive generalization capabilities of Large Language Models (LLMs) for target data annotation.
Our framework achieves impressive performance, specifically a 2.44% accuracy improvement over the SOTA method.
arXiv Detail & Related papers (2025-03-07T09:51:07Z)
- Vision-Language-Action Model and Diffusion Policy Switching Enables Dexterous Control of an Anthropomorphic Hand [2.7036595757881323]
We propose a hybrid control method that combines the relative advantages of a fine-tuned Vision-Language-Action model and diffusion models.
We demonstrate that this model-switching approach results in an over 80% success rate, compared to under 40% when using only a VLA model.
arXiv Detail & Related papers (2024-10-17T20:49:45Z)
- The Unreasonable Effectiveness of Large Language-Vision Models for Source-free Video Domain Adaptation [56.61543110071199]
The Source-Free Video Unsupervised Domain Adaptation (SFVUDA) task consists of adapting an action recognition model, trained on a labelled source dataset, to an unlabelled target dataset.
Previous approaches have attempted to address SFVUDA by leveraging self-supervision derived from the target data itself.
We instead take an approach that exploits "web-supervision" from Large Language-Vision Models (LLVMs), driven by the rationale that LLVMs contain a rich world prior that is surprisingly robust to domain shift.
arXiv Detail & Related papers (2023-08-17T18:12:05Z)
- Black-box Source-free Domain Adaptation via Two-stage Knowledge Distillation [8.224874938178633]
Source-free domain adaptation aims to adapt deep neural networks using only pre-trained source models and target data.
However, accessing the source model still raises the concern of leaking the source data, which may expose patients' private information.
We study the challenging but practical problem: black-box source-free domain adaptation where only the outputs of the source model and target data are available.
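This black-box setting lends itself to a simple distillation sketch: train the target model on unlabeled data using only the class probabilities returned by the source model's prediction interface. The `source_api` callable and the single-stage KL objective below are illustrative assumptions; the cited paper's actual two-stage procedure is not reproduced here.

```python
import torch
import torch.nn.functional as F

def distill_from_black_box(source_api, target_model, loader, optimizer):
    # source_api: hypothetical callable returning (N, C) class probabilities
    # from the inaccessible source model; no weights or source data are used.
    for x in loader:                               # unlabeled target batch
        with torch.no_grad():
            teacher_probs = source_api(x)          # soft labels from the black box
        student_log_probs = F.log_softmax(target_model(x), dim=1)
        loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```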
arXiv Detail & Related papers (2023-05-13T10:00:24Z)
- Towards Efficient Task-Driven Model Reprogramming with Foundation Models [52.411508216448716]
Vision foundation models exhibit impressive power, benefiting from the extremely large model capacity and broad training data.
However, in practice, downstream scenarios may only support a small model due to the limited computational resources or efficiency considerations.
This brings a critical challenge for the real-world application of foundation models: one has to transfer the knowledge of a foundation model to the downstream task.
arXiv Detail & Related papers (2023-04-05T07:28:33Z)
- Confidence Attention and Generalization Enhanced Distillation for Continuous Video Domain Adaptation [62.458968086881555]
Continuous Video Domain Adaptation (CVDA) is a scenario where a source model is required to adapt to a series of individually available changing target domains.
We propose a Confidence-Attentive network with geneRalization enhanced self-knowledge disTillation (CART) to address the challenge in CVDA.
arXiv Detail & Related papers (2023-03-18T16:40:10Z)
- Distill and Fine-tune: Effective Adaptation from a Black-box Source Model [138.12678159620248]
Unsupervised domain adaptation (UDA) aims to transfer knowledge from previous related labeled datasets (source) to a new unlabeled dataset (target).
We propose a novel two-step adaptation framework called Distill and Fine-tune (Dis-tune).
arXiv Detail & Related papers (2021-04-04T05:29:05Z)
- Do We Really Need to Access the Source Data? Source Hypothesis Transfer for Unsupervised Domain Adaptation [102.67010690592011]
Unsupervised Domain Adaptation (UDA) aims to leverage the knowledge learned from a labeled source dataset to solve similar tasks in a new unlabeled domain.
Prior UDA methods typically require access to the source data when learning to adapt the model.
This work tackles a practical setting where only a trained source model is available, and studies how to effectively utilize such a model without source data to solve UDA problems.
arXiv Detail & Related papers (2020-02-20T03:13:58Z)
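The last entry's summary does not spell out its method, but a common recipe in this source-hypothesis-transfer line of work is to freeze the source classifier head (the "hypothesis") and adapt only the feature extractor with an information-maximization objective. The sketch below illustrates that idea under those assumptions and may differ from the cited paper's exact formulation; the optimizer is assumed to hold only the feature extractor's parameters.

```python
import torch
import torch.nn.functional as F

def info_max_step(feature_extractor, frozen_classifier, x, optimizer, eps=1e-8):
    # One adaptation step: the source classifier head stays frozen, only the
    # feature extractor is updated, and the loss encourages predictions that
    # are confident per sample yet diverse across the batch.
    probs = F.softmax(frozen_classifier(feature_extractor(x)), dim=1)
    cond_ent = -(probs * (probs + eps).log()).sum(dim=1).mean()   # minimize per-sample entropy
    marginal = probs.mean(dim=0)
    neg_marg_ent = (marginal * (marginal + eps).log()).sum()      # minimize negative marginal entropy
    loss = cond_ent + neg_marg_ent
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```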
This list is automatically generated from the titles and abstracts of the papers on this site.