PØDA: Prompt-driven Zero-shot Domain Adaptation
- URL: http://arxiv.org/abs/2212.03241v3
- Date: Sat, 19 Aug 2023 10:31:32 GMT
- Title: PØDA: Prompt-driven Zero-shot Domain Adaptation
- Authors: Mohammad Fahes, Tuan-Hung Vu, Andrei Bursuc, Patrick Pérez, Raoul de Charette
- Abstract summary: We adapt a model trained on a source domain using only a general description in natural language of the target domain, i.e., a prompt.
We show that these prompt-driven augmentations can be used to perform zero-shot domain adaptation for semantic segmentation.
- Score: 27.524962843495366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Domain adaptation has been vastly investigated in computer vision but still
requires access to target images at train time, which might be intractable in
some uncommon conditions. In this paper, we propose the task of 'Prompt-driven Zero-shot Domain Adaptation', where we adapt a model trained on a source domain
using only a general description in natural language of the target domain,
i.e., a prompt. First, we leverage a pretrained contrastive vision-language
model (CLIP) to optimize affine transformations of source features, steering
them towards the target text embedding while preserving their content and
semantics. To achieve this, we propose Prompt-driven Instance Normalization
(PIN). Second, we show that these prompt-driven augmentations can be used to
perform zero-shot domain adaptation for semantic segmentation. Experiments
demonstrate that our method significantly outperforms CLIP-based style transfer
baselines on several datasets for the downstream task at hand, even surpassing
one-shot unsupervised domain adaptation. A similar boost is observed on object
detection and image classification. The code is available at
https://github.com/astra-vision/PODA .
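The core mechanism described in the abstract, Prompt-driven Instance Normalization (PIN), can be pictured as optimizing the affine parameters of an AdaIN-style transform on low-level source features so that, once re-encoded by CLIP, they move toward the text embedding of the target prompt while the normalized feature content is kept. The following is a minimal PyTorch sketch of that idea, not the paper's exact recipe: encode_from_low_level is a hypothetical helper standing in for the upper part of CLIP's visual encoder, and the optimizer settings are illustrative assumptions.

import torch
import torch.nn.functional as F
import clip  # official OpenAI CLIP package

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50", device=device)

def pin(feat, mu, sigma, eps=1e-5):
    # AdaIN-style affine transform: keep the normalized content of the source
    # features, replace their channel-wise statistics with the optimized style.
    src_mu = feat.mean(dim=(2, 3), keepdim=True)
    src_sigma = feat.std(dim=(2, 3), keepdim=True) + eps
    return sigma * (feat - src_mu) / src_sigma + mu

def optimize_style(low_feat, prompt, encode_from_low_level, steps=100, lr=1.0):
    # Optimize (mu, sigma) so that the stylized features, passed through the
    # rest of CLIP's visual encoder, align with the target prompt embedding.
    with torch.no_grad():
        text_emb = F.normalize(
            model.encode_text(clip.tokenize([prompt]).to(device)), dim=-1)
    low_feat = low_feat.detach()
    mu = low_feat.mean(dim=(2, 3), keepdim=True).clone().requires_grad_(True)
    sigma = low_feat.std(dim=(2, 3), keepdim=True).clone().requires_grad_(True)
    opt = torch.optim.SGD([mu, sigma], lr=lr)
    for _ in range(steps):
        img_emb = F.normalize(encode_from_low_level(pin(low_feat, mu, sigma)), dim=-1)
        loss = (1.0 - (img_emb * text_emb).sum(dim=-1)).mean()  # cosine distance
        opt.zero_grad()
        loss.backward()
        opt.step()
    # The optimized statistics act as a prompt-driven augmentation of source features.
    return mu.detach(), sigma.detach()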
Related papers
- In the Era of Prompt Learning with Vision-Language Models [1.060608983034705]
We introduce StyLIP, a novel domain-agnostic prompt learning strategy for Domain Generalization (DG).
StyLIP disentangles visual style and content in CLIP's vision encoder by using style projectors to learn domain-specific prompt tokens.
We also propose AD-CLIP for unsupervised domain adaptation (DA), leveraging CLIP's frozen vision backbone.
arXiv Detail & Related papers (2024-11-07T17:31:21Z) - Domain Adaptation with a Single Vision-Language Embedding [45.93202559299953]
We present a new framework for domain adaptation relying on a single Vision-Language (VL) latent embedding instead of full target data.
We show that these mined styles can be used for zero-shot (i.e., target-free) and one-shot unsupervised domain adaptation.
arXiv Detail & Related papers (2024-10-28T17:59:53Z) - Phrase Grounding-based Style Transfer for Single-Domain Generalized
Object Detection [109.58348694132091]
Single-domain generalized object detection aims to enhance a model's generalizability to multiple unseen target domains.
This is a practical yet challenging task as it requires the model to address domain shift without incorporating target domain data into training.
We propose a novel phrase grounding-based style transfer approach for the task.
arXiv Detail & Related papers (2024-02-02T10:48:43Z) - Domain-Controlled Prompt Learning [49.45309818782329]
Existing prompt learning methods often lack domain-awareness or domain-transfer mechanisms.
We propose Domain-Controlled Prompt Learning for specific domains.
Our method achieves state-of-the-art performance on specific-domain image recognition datasets.
arXiv Detail & Related papers (2023-09-30T02:59:49Z) - Learning Domain Invariant Prompt for Vision-Language Models [31.581652862478965]
We propose MetaPrompt, a novel prompt learning paradigm that directly generates a domain-invariant prompt generalizable to unseen domains.
Our method consistently and significantly outperforms existing methods.
arXiv Detail & Related papers (2022-12-08T11:23:24Z) - Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features visible to those of the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
arXiv Detail & Related papers (2022-12-05T16:24:29Z) - Domain Adaptation via Prompt Learning [39.97105851723885]
Unsupervised domain adaptation (UDA) aims to adapt models learned from a well-annotated source domain to a target domain.
We introduce a novel prompt learning paradigm for UDA, named Domain Adaptation via Prompt Learning (DAPL).
arXiv Detail & Related papers (2022-02-14T13:25:46Z) - Surprisingly Simple Semi-Supervised Domain Adaptation with Pretraining
and Consistency [93.89773386634717]
Visual domain adaptation involves learning to classify images from a target visual domain using labels available in a different source domain.
We show that in the presence of a few target labels, simple techniques like self-supervision (via rotation prediction) and consistency regularization can be effective without any adversarial alignment to learn a good target classifier (a generic sketch of these two ingredients follows this list).
Our Pretraining and Consistency (PAC) approach achieves state-of-the-art accuracy on this semi-supervised domain adaptation task, surpassing multiple adversarial domain alignment methods across multiple datasets.
arXiv Detail & Related papers (2021-01-29T18:40:17Z) - Pixel-Level Cycle Association: A New Perspective for Domain Adaptive
Semantic Segmentation [169.82760468633236]
We propose to build the pixel-level cycle association between source and target pixel pairs.
Our method can be trained end-to-end in one stage and introduces no additional parameters.
arXiv Detail & Related papers (2020-10-31T00:11:36Z)
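The PAC entry above names two generic ingredients, rotation-prediction self-supervision and consistency regularization. The sketch below illustrates them in PyTorch under illustrative assumptions (a feature backbone plus separate rotation and classification heads, and a FixMatch-style confidence threshold); it is not that paper's implementation.

import torch
import torch.nn.functional as F

def rotation_loss(backbone, rot_head, images):
    # Self-supervision: predict which of the 4 right-angle rotations was applied.
    rotated, labels = [], []
    for k in range(4):
        rotated.append(torch.rot90(images, k, dims=(2, 3)))
        labels.append(torch.full((images.size(0),), k, dtype=torch.long))
    x = torch.cat(rotated)
    y = torch.cat(labels).to(images.device)
    return F.cross_entropy(rot_head(backbone(x)), y)

def consistency_loss(backbone, classifier, weak, strong, threshold=0.9):
    # Consistency: confident predictions on weakly augmented target images act
    # as pseudo-labels for their strongly augmented counterparts.
    with torch.no_grad():
        probs = F.softmax(classifier(backbone(weak)), dim=1)
        conf, pseudo = probs.max(dim=1)
        mask = conf.ge(threshold).float()
    loss = F.cross_entropy(classifier(backbone(strong)), pseudo, reduction="none")
    return (loss * mask).mean()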