Exploring Sparse Visual Prompt for Domain Adaptive Dense Prediction
- URL: http://arxiv.org/abs/2303.09792v3
- Date: Mon, 15 Apr 2024 04:58:07 GMT
- Title: Exploring Sparse Visual Prompt for Domain Adaptive Dense Prediction
- Authors: Senqiao Yang, Jiarui Wu, Jiaming Liu, Xiaoqi Li, Qizhe Zhang, Mingjie Pan, Yulu Gan, Zehui Chen, Shanghang Zhang
- Abstract summary: We propose a novel Sparse Visual Domain Prompts (SVDP) approach, which keeps a minimal number of trainable parameters in the image-level prompt and preserves more of the input's spatial information.
Our proposed method achieves state-of-the-art performance on both semantic segmentation and depth estimation tasks.
- Score: 22.868597464136787
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual prompts provide an efficient way to address visual cross-domain problems. In previous work, Visual Domain Prompt (VDP) first introduced domain prompts to tackle the classification Test-Time Adaptation (TTA) problem by warping image-level prompts onto the input and fine-tuning the prompts for each target domain. However, since image-level prompts mask out continuous spatial details in the prompt-allocated region, they suffer from inaccurate contextual information and limited domain knowledge extraction, particularly on dense prediction TTA problems. To overcome these challenges, we propose a novel Sparse Visual Domain Prompts (SVDP) approach, which keeps a minimal number of trainable parameters (e.g., 0.1%) in the image-level prompt and preserves more of the input's spatial information. To better apply SVDP to extracting domain-specific knowledge, we introduce the Domain Prompt Placement (DPP) method, which adaptively allocates SVDP's trainable parameters to the pixels with large distribution shifts. Furthermore, recognizing that each target-domain sample exhibits a unique domain shift, we design a Domain Prompt Updating (DPU) strategy that optimizes the prompt parameters differently for each sample, facilitating efficient adaptation to the target domain. Extensive experiments on widely used TTA and continual TTA benchmarks show that our method achieves state-of-the-art performance in both semantic segmentation and depth estimation tasks.
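As a hedged illustration of the mechanism the abstract describes (not the authors' code), here is a minimal PyTorch sketch of a sparse visual prompt with DPP-style placement and per-sample updating; the class name, the entropy stand-in for "distribution shift", and all shapes are assumptions.

```python
import torch

class SparseVisualPrompt(torch.nn.Module):
    """Illustrative sparse visual domain prompt: a learnable value per
    channel at a small budget of pixel locations (~0.1% of the image)."""

    def __init__(self, channels: int, num_points: int):
        super().__init__()
        self.values = torch.nn.Parameter(torch.zeros(num_points, channels))
        self.register_buffer("indices", torch.zeros(num_points, dtype=torch.long))

    def place(self, shift_map: torch.Tensor):
        # Domain Prompt Placement (DPP), sketched: put the trainable
        # parameters on the pixels with the largest distribution shift.
        # `shift_map` (H, W) is any per-pixel shift proxy, e.g. prediction
        # entropy -- an assumption, not the paper's exact score.
        self.indices = shift_map.flatten().topk(self.values.shape[0]).indices

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        # img: (C, H, W); add sparse prompts instead of masking a region,
        # so spatial detail outside the prompted pixels is preserved.
        c, h, w = img.shape
        flat = img.reshape(c, h * w).clone()
        flat[:, self.indices] += self.values.t()
        return flat.reshape(c, h, w)

# Usage: place prompts with an uncertainty map, then fine-tune only
# `values` per target sample (Domain Prompt Updating, DPU) while the
# source model stays frozen.
img = torch.rand(3, 64, 64)
prompt = SparseVisualPrompt(channels=3, num_points=int(0.001 * 64 * 64))
entropy = torch.rand(64, 64)          # stand-in for a real uncertainty map
prompt.place(entropy)
prompted = prompt(img)                # input to the frozen source model
```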
Related papers
- OT-VP: Optimal Transport-guided Visual Prompting for Test-Time Adaptation [8.425690424016986]
Vision Transformers (ViTs) have demonstrated remarkable capabilities in learning representations, but their performance is compromised when applied to unseen domains.
Our approach, Optimal Transport-guided Test-Time Visual Prompting (OT-VP), handles these problems by leveraging prompt learning at test time to align the target and source domains.
OT-VP, with only four learned prompt tokens, exceeds state-of-the-art performance across three stylistic datasets.
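A hedged sketch of this idea: a few prompt tokens are optimized at test time to minimize an entropic optimal-transport cost between prompted target features and cached source features. The Sinkhorn routine, the toy encoder, and all shapes are illustrative assumptions, not OT-VP's actual implementation.

```python
import torch

def sinkhorn_cost(x, y, eps=0.1, iters=50):
    # Entropic OT cost between point clouds x: (n, d) and y: (m, d).
    cost = torch.cdist(x, y) ** 2
    a = torch.full((x.size(0),), 1.0 / x.size(0))
    b = torch.full((y.size(0),), 1.0 / y.size(0))
    K = torch.exp(-cost / eps)
    u = torch.ones_like(a)
    for _ in range(iters):
        v = b / (K.t() @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]
    return (plan * cost).sum()

d = 32
prompts = torch.nn.Parameter(torch.randn(4, d) * 0.02)  # four prompt tokens
encoder = torch.nn.Linear(d, d)       # stand-in for a frozen ViT
for p in encoder.parameters():
    p.requires_grad_(False)

def encode(patches):                  # patches: (n_tokens, d)
    tokens = torch.cat([prompts, patches], dim=0)  # prepend prompts
    return encoder(tokens)

source_feats = torch.randn(128, d)    # cached from the source domain
target_feats = encode(torch.randn(64, d))

opt = torch.optim.SGD([prompts], lr=1e-2)
loss = sinkhorn_cost(target_feats, source_feats)
loss.backward()                       # gradients flow only to the prompts
opt.step()
```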
arXiv Detail & Related papers (2024-06-12T18:30:03Z)
- Domain-Controlled Prompt Learning [49.45309818782329]
Existing prompt learning methods often lack domain-awareness or domain-transfer mechanisms.
We propose Domain-Controlled Prompt Learning for specific domains.
Our method achieves state-of-the-art performance in specific domain image recognition datasets.
arXiv Detail & Related papers (2023-09-30T02:59:49Z)
- Prompting Diffusion Representations for Cross-Domain Semantic Segmentation [101.04326113360342]
Diffusion pretraining achieves extraordinary domain generalization results for semantic segmentation.
We introduce a scene prompt and a prompt randomization strategy to help further disentangle the domain-invariant information when training the segmentation head.
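A heavily hedged sketch of the stated idea: a learned scene prompt conditions the frozen diffusion features, and the prompt is randomly perturbed during training so the segmentation head cannot latch onto domain-specific cues. The Gaussian-jitter form of randomization and all dimensions are assumptions.

```python
import torch

B, C, H, W, num_classes = 2, 16, 8, 8, 19
scene_prompt = torch.nn.Parameter(torch.zeros(C))   # learned scene prompt
head = torch.nn.Conv2d(C, num_classes, kernel_size=1)

def prompt_randomize(p, sigma=0.5, training=True):
    # Prompt randomization, sketched as Gaussian jitter (an assumption).
    return p + sigma * torch.randn_like(p) if training else p

feats = torch.randn(B, C, H, W)                     # frozen diffusion features
p = prompt_randomize(scene_prompt)
logits = head(feats + p.view(1, C, 1, 1))           # (B, classes, H, W)
```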
arXiv Detail & Related papers (2023-07-05T09:28:25Z)
- PM-DETR: Domain Adaptive Prompt Memory for Object Detection with Transformers [25.812325027602252]
Transformer-based detectors (i.e., DETR) have demonstrated impressive performance on end-to-end object detection.
However, transferring DETR to different data distributions may lead to significant performance degradation.
We propose a hierarchical Prompt Domain Memory (PDM) for adapting detection transformers to different distributions.
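A hedged sketch of a prompt memory at one level: a bank of learnable prompts with matching keys, softly retrieved per sample by feature similarity and injected into the detector's token stream. In PM-DETR this would be hierarchical (several levels); the names and shapes below are assumptions.

```python
import torch
import torch.nn.functional as F

class PromptMemory(torch.nn.Module):
    def __init__(self, slots=8, dim=256):
        super().__init__()
        self.keys = torch.nn.Parameter(torch.randn(slots, dim))
        self.prompts = torch.nn.Parameter(torch.zeros(slots, dim))

    def forward(self, feat):                         # feat: (B, dim)
        attn = F.softmax(feat @ self.keys.t(), dim=-1)  # soft retrieval
        return attn @ self.prompts                   # per-sample prompt

mem = PromptMemory()
tokens = torch.randn(4, 100, 256)                    # DETR-style tokens
prompt = mem(tokens.mean(dim=1))                     # (4, 256)
tokens = tokens + prompt[:, None, :]                 # inject before encoder
```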
arXiv Detail & Related papers (2023-07-01T12:02:24Z)
- Explicit Visual Prompting for Universal Foreground Segmentations [55.51869354956533]
We present a unified framework for a number of foreground segmentation tasks without any task-specific designs.
We take inspiration from the pre-training-then-prompt-tuning protocol widely used in NLP.
Our method freezes a pre-trained model and then learns task-specific knowledge using a few extra parameters.
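The freeze-then-tune protocol is easy to show concretely. In this minimal sketch, all pretrained weights stay frozen and only a small head is learned per task; the backbone choice and the 1x1-conv head are illustrative stand-ins, not the paper's architecture.

```python
import torch
import torchvision

resnet = torchvision.models.resnet18(weights=None)   # pretrained in practice
features = torch.nn.Sequential(*list(resnet.children())[:-2])
for p in features.parameters():
    p.requires_grad_(False)                          # freeze the backbone

head = torch.nn.Conv2d(512, 1, kernel_size=1)        # foreground logits
x = torch.randn(1, 3, 224, 224)
logits = head(features(x))                           # (1, 1, 7, 7)

trainable = sum(p.numel() for p in head.parameters())
frozen = sum(p.numel() for p in features.parameters())
print(f"tuning {trainable / (trainable + frozen):.2%} of parameters")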
arXiv Detail & Related papers (2023-05-29T11:05:01Z)
- Explicit Visual Prompting for Low-Level Structure Segmentations [55.51869354956533]
We propose a new visual prompting model, named Explicit Visual Prompting (EVP).
EVP significantly outperforms other parameter-efficient tuning protocols under the same amount of tunable parameters.
EVP also achieves state-of-the-art performance on diverse low-level structure segmentation tasks.
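EVP derives its prompts from explicit features of the image itself, including high-frequency components. The sketch below shows only an FFT-based high-frequency extraction step of the kind such prompts could be built from; the cutoff ratio and function name are assumptions.

```python
import torch

def high_frequency(img: torch.Tensor, ratio: float = 0.25) -> torch.Tensor:
    # High-frequency component via FFT: zero a central low-frequency
    # square in the shifted spectrum, then invert. img: (C, H, W).
    f = torch.fft.fftshift(torch.fft.fft2(img), dim=(-2, -1))
    _, h, w = img.shape
    ch, cw = int(h * ratio) // 2, int(w * ratio) // 2
    f[:, h // 2 - ch : h // 2 + ch, w // 2 - cw : w // 2 + cw] = 0
    return torch.fft.ifft2(torch.fft.ifftshift(f, dim=(-2, -1))).real

hfc = high_frequency(torch.rand(3, 224, 224))  # input to a small prompt net
```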
arXiv Detail & Related papers (2023-03-20T06:01:53Z)
- Decorate the Newcomers: Visual Domain Prompt for Continual Test Time Adaptation [14.473807945791132]
Continual Test-Time Adaptation (CTTA) aims to adapt the source model to continually changing unlabeled target domains without access to the source data.
Motivated by prompt learning in NLP, in this paper we propose to learn an image-level visual domain prompt for target domains while keeping the source model parameters frozen.
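This is the image-level scheme that SVDP above contrasts with. A hedged sketch: a small learnable patch "decorates" a fixed region of the input while the source model stays frozen, and only the prompt is updated at test time. The top-left placement, the toy model, and the entropy objective are illustrative assumptions.

```python
import torch

prompt = torch.nn.Parameter(torch.zeros(3, 32, 32))  # image-level prompt

def apply_prompt(img):                 # img: (B, 3, H, W)
    img = img.clone()
    img[:, :, :32, :32] = img[:, :, :32, :32] + prompt
    return img

model = torch.nn.Conv2d(3, 10, kernel_size=64)       # toy frozen source net
for p in model.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam([prompt], lr=1e-3)
logits = model(apply_prompt(torch.rand(2, 3, 64, 64))).flatten(1)
entropy = -(logits.softmax(-1) * logits.log_softmax(-1)).sum(-1).mean()
entropy.backward()                     # only the prompt receives gradients
opt.step()
```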
arXiv Detail & Related papers (2022-12-08T08:56:02Z)
- PØDA: Prompt-driven Zero-shot Domain Adaptation [27.524962843495366]
We adapt a model trained on a source domain using only a general natural-language description of the target domain, i.e., a prompt.
We show that these prompt-driven augmentations can be used to perform zero-shot domain adaptation for semantic segmentation.
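A hedged sketch of prompt-driven feature augmentation: per-channel style statistics (AdaIN-style mean/std) of source features are optimized so their embedding moves toward a text embedding of the target-domain description. The `embed` projection and the random text vector stand in for a real vision-language model such as CLIP.

```python
import torch
import torch.nn.functional as F

C = 64
feats = torch.randn(8, C, 16, 16)               # low-level source features
mu = torch.nn.Parameter(feats.mean(dim=(2, 3)))
sigma = torch.nn.Parameter(feats.std(dim=(2, 3)))
embed = torch.nn.Linear(C, 128)                 # stand-in for CLIP's tower
text = F.normalize(torch.randn(128), dim=0)     # e.g. "driving at night"

opt = torch.optim.SGD([mu, sigma], lr=0.1)
norm = (feats - feats.mean((2, 3), keepdim=True)) / feats.std((2, 3), keepdim=True)
styled = norm * sigma[..., None, None] + mu[..., None, None]
img_emb = F.normalize(embed(styled.mean((2, 3))), dim=-1)
loss = (1 - img_emb @ text).mean()              # cosine distance to prompt
loss.backward()                                 # updates only the style stats
opt.step()
```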
arXiv Detail & Related papers (2022-12-06T18:59:58Z)
- Multi-Prompt Alignment for Multi-Source Unsupervised Domain Adaptation [86.02485817444216]
We introduce Multi-Prompt Alignment (MPA), a simple yet efficient framework for multi-source UDA.
MPA denoises the learned prompts through an auto-encoding process and aligns them by maximizing the agreement of all the reconstructed prompts.
Experiments show that MPA achieves state-of-the-art results on three popular datasets with an impressive average accuracy of 54.1% on DomainNet.
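A hedged sketch of the two stated ingredients: per-source-domain prompts pass through a shared auto-encoder (the denoising step), and an agreement term pulls the reconstructions together. The prompt dimensions and the pairwise-cosine form of "agreement" are assumptions.

```python
import torch
import torch.nn.functional as F

K, L, D = 3, 16, 512                  # source domains, tokens, token dim
prompts = torch.nn.Parameter(torch.randn(K, L, D) * 0.02)
enc = torch.nn.Linear(D, 64)          # shared auto-encoder
dec = torch.nn.Linear(64, D)

recon = dec(enc(prompts))                           # (K, L, D)
recon_loss = F.mse_loss(recon, prompts)             # denoising objective

flat = F.normalize(recon.reshape(K, -1), dim=-1)
agree = flat @ flat.t()                             # pairwise cosine sims
align_loss = -(agree.sum() - agree.diagonal().sum()) / (K * (K - 1))
loss = recon_loss + align_loss                      # maximize agreement
```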
arXiv Detail & Related papers (2022-09-30T03:40:10Z)
- AFAN: Augmented Feature Alignment Network for Cross-Domain Object Detection [90.18752912204778]
Unsupervised domain adaptation for object detection is a challenging problem with many real-world applications.
We propose a novel augmented feature alignment network (AFAN) which integrates intermediate domain image generation and domain-adversarial training.
Our approach significantly outperforms the state-of-the-art methods on standard benchmarks for both similar and dissimilar domain adaptations.
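A hedged sketch of the two stated components: intermediate-domain images (approximated here by pixel-space mixing, a stand-in for AFAN's learned generation) and domain-adversarial training via a gradient reversal layer. The toy networks and shapes are assumptions.

```python
import torch

class GradReverse(torch.autograd.Function):
    # Gradient reversal layer for domain-adversarial training.
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -grad

src, tgt = torch.rand(4, 3, 64, 64), torch.rand(4, 3, 64, 64)
lam = torch.rand(4, 1, 1, 1)
mixed = lam * src + (1 - lam) * tgt                # intermediate-domain images

feat_net = torch.nn.Conv2d(3, 8, kernel_size=64)   # toy feature extractor
domain_head = torch.nn.Linear(8, 2)                # source vs. target

feats = feat_net(mixed).flatten(1)                 # (4, 8)
domain_logits = domain_head(GradReverse.apply(feats))
# Minimizing the domain loss now pushes features toward domain confusion.
```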
arXiv Detail & Related papers (2021-06-10T05:01:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.