Vision-aware Multimodal Prompt Tuning for Uploadable Multi-source Few-shot Domain Adaptation
- URL: http://arxiv.org/abs/2503.06106v1
- Date: Sat, 08 Mar 2025 07:17:06 GMT
- Title: Vision-aware Multimodal Prompt Tuning for Uploadable Multi-source Few-shot Domain Adaptation
- Authors: Kuanghong Liu, Jin Wang, Kangjian He, Dan Xu, Xuejie Zhang
- Abstract summary: This paper introduces an uploadable multi-source few-shot domain adaptation (UMFDA) schema. It is a form of decentralized edge-collaborative learning in which the edge-side models must maintain a low computational load. A vision-aware multimodal prompt tuning framework (VAMP) under the decentralized schema is proposed.
- Score: 12.380114998101433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventional multi-source few-shot domain adaptation (MFDA) faces the challenge of further reducing the load on edge-side devices in low-resource scenarios. Considering the native language-supervised advantage of CLIP and the plug-and-play nature of prompts for transferring CLIP efficiently, this paper introduces an uploadable multi-source few-shot domain adaptation (UMFDA) schema. It is a form of decentralized edge-collaborative learning in which the edge-side models must maintain a low computational load, and only a limited amount of the source-domain data is annotated, with most of it remaining unlabeled. Further, this paper proposes a vision-aware multimodal prompt tuning framework (VAMP) under the decentralized schema, where the vision-aware prompt guides the text domain-specific prompt to maintain semantic discriminability and to perceive domain information. Cross-modal semantic and domain-distribution alignment losses optimize each edge-side model, while text-classifier consistency and semantic-diversity losses promote collaborative learning among the edge-side models. Extensive experiments on the OfficeHome and DomainNet datasets demonstrate the effectiveness of the proposed VAMP under the UMFDA schema, where it outperforms previous prompt tuning methods.
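The abstract gives no pseudocode, so the following is only a minimal sketch of the core idea as stated: a vision-aware prompt derived from image features conditions a learnable text-side domain prompt, and a cross-modal semantic loss is combined with a crude domain-distribution term. All names (`VisionAwarePrompt`, `prompt_len`, the 0.1 weighting) are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a vision-conditioned text prompt for a CLIP-like model.
# Names and loss forms are illustrative, not from the paper's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisionAwarePrompt(nn.Module):
    def __init__(self, img_dim=512, prompt_len=4, prompt_dim=512):
        super().__init__()
        # Learnable text-side domain-specific prompt tokens.
        self.domain_prompt = nn.Parameter(torch.randn(prompt_len, prompt_dim) * 0.02)
        # Projects pooled image features into a per-token bias, so the
        # vision branch "guides" the text prompt.
        self.vision_proj = nn.Linear(img_dim, prompt_len * prompt_dim)
        self.prompt_len, self.prompt_dim = prompt_len, prompt_dim

    def forward(self, img_feat):                        # img_feat: (B, img_dim)
        guide = self.vision_proj(img_feat)              # (B, L*D)
        guide = guide.view(-1, self.prompt_len, self.prompt_dim)
        return self.domain_prompt.unsqueeze(0) + guide  # (B, L, D)

def clip_style_losses(img_emb, txt_emb, labels, src_feat, tgt_feat, tau=0.07):
    """Cross-modal semantic alignment plus a simple domain-distribution term."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / tau                # (B, num_classes)
    sem_loss = F.cross_entropy(logits, labels)
    # Crude stand-in for the paper's domain-distribution alignment:
    # match first moments of source/target features.
    dom_loss = (src_feat.mean(0) - tgt_feat.mean(0)).pow(2).sum()
    return sem_loss + 0.1 * dom_loss

# Smoke test with random tensors.
prompt = VisionAwarePrompt()
print(prompt(torch.randn(8, 512)).shape)                # torch.Size([8, 4, 512])
img_emb, txt_emb = torch.randn(8, 512), torch.randn(5, 512)  # 5 class prompts
labels = torch.randint(0, 5, (8,))
print(clip_style_losses(img_emb, txt_emb, labels,
                        torch.randn(8, 512), torch.randn(8, 512)).item())
```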
Related papers
- Multi-task Domain Adaptation for Computation Offloading in Edge-intelligence Networks [34.934911340540545]
This paper introduces a new approach, termed Multi-Task Domain Adaptation (MTDA). The proposed MTDA model incorporates a teacher-student architecture that allows continuous adaptation without requiring access to the source-domain data during inference. It is observed that the proposed MTDA model maintains high performance across various scenarios, demonstrating its potential for practical deployment in emerging MEC applications.
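The summary describes the teacher-student design only at a high level; below is a generic EMA-teacher adaptation step of the kind such source-free methods typically build on. The pseudo-labeling loss and all names are assumptions standing in for MTDA's actual objectives.

```python
# Generic EMA teacher-student update for source-free adaptation: the student
# learns from soft teacher targets, and the teacher trails the student.
import copy
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(momentum).add_(ps, alpha=1 - momentum)

def adapt_step(student, teacher, optimizer, x_target):
    with torch.no_grad():
        pseudo = teacher(x_target).softmax(dim=-1)      # soft teacher targets
    loss = F.kl_div(student(x_target).log_softmax(-1), pseudo,
                    reduction="batchmean")
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    ema_update(teacher, student)
    return loss.item()

student = torch.nn.Linear(16, 4)
teacher = copy.deepcopy(student).requires_grad_(False)
opt = torch.optim.SGD(student.parameters(), lr=0.01)
print(adapt_step(student, teacher, opt, torch.randn(8, 16)))
```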
arXiv Detail & Related papers (2025-01-02T13:20:29Z) - Multisource Collaborative Domain Generalization for Cross-Scene Remote Sensing Image Classification [57.945437355714155]
Cross-scene image classification aims to transfer prior knowledge of ground materials to annotate regions with different distributions. Existing approaches focus on single-source domain generalization to unseen target domains. We propose a novel multi-source collaborative domain generalization framework (MS-CDG) based on homogeneity and heterogeneity characteristics of multi-source remote sensing data.
arXiv Detail & Related papers (2024-12-05T06:15:08Z) - Dynamic Domain Discrepancy Adjustment for Active Multi-Domain Adaptation [3.367755441623275]
Multi-source unsupervised domain adaptation (MUDA) aims to transfer knowledge from related source domains to an unlabeled target domain.
We propose a novel approach called Dynamic Domain Discrepancy Adjustment for Active Multi-Domain Adaptation (D3AAMDA).
This mechanism controls the alignment level of features between each source domain and the target domain, effectively leveraging the local advantageous feature information within the source domains.
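As a rough illustration of "controlling the alignment level" between each source domain and the target, the sketch below weights a per-source discrepancy (a linear-kernel MMD, purely as a stand-in) so that sources closer to the target are aligned more strongly. D3AAMDA's actual mechanism is more elaborate.

```python
# Illustrative per-source alignment weighting: sources with a smaller
# discrepancy to the target receive larger alignment weight.
import torch

def linear_mmd(a, b):
    # First-moment (linear-kernel) MMD between two feature batches.
    return (a.mean(0) - b.mean(0)).pow(2).sum()

def weighted_alignment_loss(source_feats, target_feat, temperature=1.0):
    gaps = torch.stack([linear_mmd(s, target_feat) for s in source_feats])
    # Smaller gap -> larger weight (softmax over negative discrepancies).
    weights = torch.softmax(-gaps.detach() / temperature, dim=0)
    return (weights * gaps).sum(), weights

sources = [torch.randn(32, 64) + i for i in range(3)]  # toy domain shifts
target = torch.randn(32, 64)
loss, w = weighted_alignment_loss(sources, target)
print(loss.item(), w.tolist())
```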
arXiv Detail & Related papers (2023-07-26T09:40:19Z) - VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature
Alignment [52.489874804051304]
VoLTA is a new vision-language pre-training paradigm that utilizes only image-caption data yet achieves fine-grained, region-level image understanding.
VoLTA pushes multi-modal fusion deep into the uni-modal backbones during pre-training.
Experiments on a wide range of vision- and vision-language downstream tasks demonstrate the effectiveness of VoLTA.
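VoLTA's local-feature alignment is built on graph optimal transport, which is too involved for a short sketch; as a loose stand-in, the snippet below scores fine-grained agreement by matching each caption token to its best image patch (a FILIP-style late interaction, not VoLTA's actual objective).

```python
# Token-to-patch late-interaction score as a simple proxy for
# weakly-supervised local-feature alignment.
import torch
import torch.nn.functional as F

def local_alignment_score(patch_emb, word_emb):
    # patch_emb: (B, P, D) image patches; word_emb: (B, W, D) caption tokens
    patch_emb = F.normalize(patch_emb, dim=-1)
    word_emb = F.normalize(word_emb, dim=-1)
    sim = torch.einsum("bwd,bpd->bwp", word_emb, patch_emb)  # token-patch sims
    # Each word takes its best-matching patch; average over words.
    return sim.max(dim=-1).values.mean(dim=-1)               # (B,)

print(local_alignment_score(torch.randn(2, 49, 256), torch.randn(2, 12, 256)))
```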
arXiv Detail & Related papers (2022-10-09T01:49:58Z) - Multi-Prompt Alignment for Multi-Source Unsupervised Domain Adaptation [86.02485817444216]
We introduce Multi-Prompt Alignment (MPA), a simple yet efficient framework for multi-source UDA.
MPA denoises the learned prompts through an auto-encoding process and aligns them by maximizing the agreement of all the reconstructed prompts.
Experiments show that MPA achieves state-of-the-art results on three popular datasets with an impressive average accuracy of 54.1% on DomainNet.
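A compact rendering of the two steps named above, with illustrative dimensions: source prompts pass through a shared auto-encoder (denoising), and the reconstructions are pulled into pairwise agreement. MPA's exact losses differ; this only shows the structure.

```python
# Sketch: denoise per-source prompts through a shared auto-encoder, then
# maximize agreement among the reconstructions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptAutoEncoder(nn.Module):
    def __init__(self, prompt_dim=512, bottleneck=64):
        super().__init__()
        self.enc = nn.Linear(prompt_dim, bottleneck)
        self.dec = nn.Linear(bottleneck, prompt_dim)

    def forward(self, p):
        return self.dec(torch.relu(self.enc(p)))

def mpa_style_loss(prompts, ae):
    recon = [ae(p) for p in prompts]                 # denoised prompts
    rec_loss = sum(F.mse_loss(r, p) for r, p in zip(recon, prompts))
    # Agreement: maximize pairwise cosine similarity of reconstructions.
    agree = 0.0
    for i in range(len(recon)):
        for j in range(i + 1, len(recon)):
            agree = agree + F.cosine_similarity(
                recon[i].flatten(), recon[j].flatten(), dim=0)
    return rec_loss - agree

ae = PromptAutoEncoder()
prompts = [torch.randn(4, 512) for _ in range(3)]    # one prompt per source
print(mpa_style_loss(prompts, ae).item())
```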
arXiv Detail & Related papers (2022-09-30T03:40:10Z) - Improving Transferability of Domain Adaptation Networks Through Domain
Alignment Layers [1.3766148734487902]
Multi-source unsupervised domain adaptation (MSDA) aims at learning a predictor for an unlabeled target domain by transferring weak knowledge from a bag of source models.
We propose to embed a Multi-Source version of DomaIn Alignment Layers (MS-DIAL) at different levels of the predictor.
Our approach can improve state-of-the-art MSDA methods, yielding relative gains of up to +30.64% on their classification accuracies.
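Domain alignment layers are, at their core, domain-specific normalization: each domain keeps its own batch statistics while the predictor's weights are shared. A minimal multi-source version follows; the class name and sharing scheme are illustrative.

```python
# One BatchNorm per domain, with affine parameters shared across domains.
import torch
import torch.nn as nn

class MSDomainAlignLayer(nn.Module):
    def __init__(self, num_features, num_domains):
        super().__init__()
        # Per-domain normalization statistics.
        self.bns = nn.ModuleList(
            nn.BatchNorm1d(num_features, affine=False)
            for _ in range(num_domains))
        # Shared affine transform.
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))

    def forward(self, x, domain_idx):
        return self.bns[domain_idx](x) * self.weight + self.bias

layer = MSDomainAlignLayer(64, num_domains=3)
print(layer(torch.randn(8, 64), domain_idx=1).shape)  # torch.Size([8, 64])
```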
arXiv Detail & Related papers (2021-09-06T18:41:19Z) - T-SVDNet: Exploring High-Order Prototypical Correlations for
Multi-Source Domain Adaptation [41.356774580308986]
We propose a novel approach named T-SVDNet to address the task of Multi-source Domain Adaptation.
High-order correlations among multiple domains and categories are fully explored so as to better bridge the domain gap.
To avoid negative transfer brought by noisy source data, we propose a novel uncertainty-aware weighting strategy.
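The tensor-SVD machinery is beyond a short sketch, but the uncertainty-aware weighting idea can be shown simply: samples with high-entropy predictions are assumed noisy and down-weighted. The entropy-based weight below is an assumption, not T-SVDNet's exact formula.

```python
# Down-weight high-entropy (uncertain) source samples in the task loss.
import torch

def uncertainty_weights(logits):
    probs = logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1)
    max_ent = torch.log(torch.tensor(float(logits.size(-1))))
    return 1.0 - entropy / max_ent           # confident samples -> weight near 1

def weighted_ce(logits, labels):
    w = uncertainty_weights(logits).detach()  # weights carry no gradient
    losses = torch.nn.functional.cross_entropy(logits, labels, reduction="none")
    return (w * losses).sum() / w.sum().clamp_min(1e-8)

logits, labels = torch.randn(16, 5), torch.randint(0, 5, (16,))
print(weighted_ce(logits, labels).item())
```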
arXiv Detail & Related papers (2021-07-30T06:33:05Z) - Multi-Source domain adaptation via supervised contrastive learning and
confident consistency regularization [0.0]
Multi-Source Unsupervised Domain Adaptation (multi-source UDA) aims to learn a model from several labeled source domains that performs well on an unlabeled target domain.
We propose Contrastive Multi-Source Domain Adaptation (CMSDA), which combines supervised contrastive learning with confident consistency regularization to address this setting.
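A compact supervised contrastive loss of the kind the title points to, where same-class samples across all source domains act as positives; the temperature and batch handling are illustrative.

```python
# Supervised contrastive loss: positives are all other samples sharing the
# anchor's class label.
import torch
import torch.nn.functional as F

def sup_con_loss(features, labels, tau=0.1):
    f = F.normalize(features, dim=-1)                    # (N, D)
    sim = f @ f.t() / tau
    logits_mask = ~torch.eye(len(f), dtype=torch.bool)   # exclude self-pairs
    pos_mask = (labels[:, None] == labels[None, :]) & logits_mask
    # Log-softmax over all other samples, averaged over positives per anchor.
    log_prob = sim - torch.logsumexp(
        sim.masked_fill(~logits_mask, float("-inf")), dim=1, keepdim=True)
    pos_count = pos_mask.sum(1).clamp_min(1)
    return -((log_prob * pos_mask).sum(1) / pos_count).mean()

feats, labels = torch.randn(16, 128), torch.randint(0, 4, (16,))
print(sup_con_loss(feats, labels).item())
```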
arXiv Detail & Related papers (2021-06-30T14:39:15Z) - Adaptively-Accumulated Knowledge Transfer for Partial Domain Adaptation [66.74638960925854]
Partial domain adaptation (PDA) deals with a realistic and challenging problem in which the source domain label space subsumes that of the target domain.
We propose an Adaptively-Accumulated Knowledge Transfer framework (A$^2$KT) to align the relevant categories across the two domains.
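In PDA, classes private to the source should be suppressed. The accumulation scheme below, written in the spirit of A$^2$KT, keeps a running average of target prediction mass per class and re-weights the source classification loss with it; the momentum constant and normalization are assumptions.

```python
# Accumulate target prediction mass per class; classes the target never
# predicts get small weights in the source loss.
import torch
import torch.nn.functional as F

class ClassWeightAccumulator:
    def __init__(self, num_classes, momentum=0.9):
        self.w = torch.ones(num_classes) / num_classes
        self.momentum = momentum

    @torch.no_grad()
    def update(self, target_logits):
        batch_mass = target_logits.softmax(-1).mean(0)   # per-class mass
        self.w = self.momentum * self.w + (1 - self.momentum) * batch_mass
        return self.w / self.w.max()                     # normalize to [0, 1]

acc = ClassWeightAccumulator(num_classes=10)
weights = acc.update(torch.randn(32, 10))
src_logits, src_labels = torch.randn(32, 10), torch.randint(0, 10, (32,))
loss = (weights[src_labels]
        * F.cross_entropy(src_logits, src_labels, reduction="none")).mean()
print(loss.item())
```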
arXiv Detail & Related papers (2020-08-27T00:53:43Z) - Learning to Combine: Knowledge Aggregation for Multi-Source Domain
Adaptation [56.694330303488435]
We propose a Learning to Combine for Multi-Source Domain Adaptation (LtC-MSDA) framework.
In a nutshell, a knowledge graph is constructed on the prototypes of various domains to realize information propagation among semantically adjacent representations.
Our approach outperforms existing methods by a remarkable margin.
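A minimal version of that idea: gather one prototype per (domain, class) pair, connect prototypes by feature similarity, and run one round of normalized propagation. LtC-MSDA's actual graph construction is richer than this sketch.

```python
# One step of similarity-weighted propagation over class prototypes.
import torch
import torch.nn.functional as F

def propagate_prototypes(protos, tau=0.1):
    # protos: (M, D) prototypes gathered from all domains and classes
    normed = F.normalize(protos, dim=-1)
    adj = torch.softmax(normed @ normed.t() / tau, dim=-1)  # soft adjacency
    return adj @ protos                                     # propagated protos

protos = torch.randn(12, 64)                 # e.g. 3 domains x 4 classes
print(propagate_prototypes(protos).shape)    # torch.Size([12, 64])
```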
arXiv Detail & Related papers (2020-07-17T07:52:44Z) - Mutual Learning Network for Multi-Source Domain Adaptation [73.25974539191553]
We propose a novel multi-source domain adaptation method, Mutual Learning Network for Multiple Source Domain Adaptation (ML-MSDA).
Under the framework of mutual learning, the proposed method pairs the target domain with each single source domain to train a conditional adversarial domain adaptation network as a branch network.
The proposed method outperforms the comparison methods and achieves state-of-the-art performance.
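Mutual learning in miniature, under assumed names: each source-target branch minimizes its own task loss plus a KL term pulling its predictions toward the other branches', so the branch networks teach one another.

```python
# Mutual learning across branch networks: task loss + cross-branch mimicry.
import torch
import torch.nn.functional as F

def mutual_learning_loss(branch_logits, labels):
    total = 0.0
    for i, logits in enumerate(branch_logits):
        task = F.cross_entropy(logits, labels)
        # Each branch mimics the (detached) predictions of its peers.
        mimic = sum(
            F.kl_div(logits.log_softmax(-1), other.softmax(-1).detach(),
                     reduction="batchmean")
            for j, other in enumerate(branch_logits) if j != i)
        total = total + task + mimic / max(len(branch_logits) - 1, 1)
    return total / len(branch_logits)

branches = [torch.randn(8, 5, requires_grad=True) for _ in range(3)]
labels = torch.randint(0, 5, (8,))
print(mutual_learning_loss(branches, labels).item())
```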
arXiv Detail & Related papers (2020-03-29T04:31:43Z)