Learning Domain Invariant Prompt for Vision-Language Models
- URL: http://arxiv.org/abs/2212.04196v2
- Date: Fri, 31 Mar 2023 04:12:25 GMT
- Title: Learning Domain Invariant Prompt for Vision-Language Models
- Authors: Cairong Zhao, Yubin Wang, Xinyang Jiang, Yifei Shen, Kaitao Song,
Dongsheng Li, and Duoqian Miao
- Abstract summary: We propose a novel prompt learning paradigm, called MetaPrompt, that directly generates a domain-invariant prompt that can be generalized to unseen domains.
Our method consistently and significantly outperforms existing methods.
- Score: 31.581652862478965
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompt learning is one of the most effective and trending ways to adapt
powerful vision-language foundation models like CLIP to downstream datasets by
tuning learnable prompt vectors with very few samples. However, although prompt
learning achieves excellent performance on in-domain data, it still faces the
major challenge of generalizing to unseen classes and domains. Some existing
prompt learning methods tackle this issue by adaptively generating different
prompts for different tokens or domains but neglect the ability of learned
prompts to generalize to unseen domains. In this paper, we propose a novel
prompt learning paradigm, called MetaPrompt, that directly generates a
domain-invariant prompt that can be generalized to unseen domains. Specifically, a
dual-modality prompt tuning network is proposed to generate prompts for input
from both image and text modalities. With a novel asymmetric contrastive loss,
the representation from the original pre-trained vision-language model acts as
supervision to enhance the generalization ability of the learned prompt. More
importantly, we propose a meta-learning-based prompt tuning algorithm that
explicitly constrains the task-specific prompt tuned for one domain or class to
also achieve good performance in another domain or class. Extensive experiments
on 11 datasets for base-to-new generalization and 4 datasets for domain
generalization demonstrate that our method consistently and significantly
outperforms existing methods.
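As a rough illustration of the meta-learning objective described in the abstract, the sketch below shows a MAML-style episode in which a prompt tuned on one domain or class split is also required to perform well on another. The callable `model`, the data splits, and the inner learning rate are hypothetical stand-ins, not the paper's actual implementation.

    import torch
    import torch.nn.functional as F

    def episodic_prompt_loss(prompt, model, support, query, inner_lr=0.01):
        # One episode: adapt the prompt on a support split (domain/class A),
        # then require the adapted prompt to work on a query split (B).
        # `model(images, prompt)` is assumed to return classification logits.
        xs, ys = support
        xq, yq = query

        # Inner step: task-specific prompt tuned for the support split.
        inner_loss = F.cross_entropy(model(xs, prompt), ys)
        (grad,) = torch.autograd.grad(inner_loss, prompt, create_graph=True)
        adapted = prompt - inner_lr * grad

        # Outer step: the adapted prompt must transfer to the query split,
        # which is what pushes the learned prompt toward domain invariance.
        outer_loss = F.cross_entropy(model(xq, adapted), yq)
        return inner_loss + outer_loss

Backpropagating the combined loss into `prompt` with a standard optimizer step realizes the stated constraint that a prompt tuned for one domain or class should also achieve good performance in another.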
Related papers
- MuAP: Multi-step Adaptive Prompt Learning for Vision-Language Model with Missing Modality [11.03329286331929]
We present the first comprehensive investigation into prompt learning behavior when modalities are incomplete.
We propose a novel Multi-step Adaptive Prompt Learning framework, aiming to generate multimodal prompts and perform multi-step prompt tuning.
arXiv Detail & Related papers (2024-09-07T03:33:46Z)
- StylePrompter: Enhancing Domain Generalization with Test-Time Style Priors [39.695604434738186]
In real-world applications, the sample distribution at the inference stage often differs from the one at the training stage.
This paper introduces the style prompt in the language modality to adapt the trained model dynamically.
In particular, we train a style prompter to extract style information of the current image into an embedding in the token embedding space.
Our open-space partition of the style token embedding space and hand-crafted style regularization enable the trained style prompter to handle data from unknown domains effectively.
arXiv Detail & Related papers (2024-08-17T08:35:43Z)
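A minimal sketch of the style-prompter idea as described in the summary: per-channel feature statistics, a common proxy for image style, are projected into a single pseudo-token in the token embedding space. All dimensions and layer choices here are assumptions, not the paper's architecture.

    import torch
    import torch.nn as nn

    class StylePrompter(nn.Module):
        # Maps an image's style statistics into one pseudo-token that can be
        # concatenated with the text prompt. Sizes are illustrative.
        def __init__(self, feat_dim=512, token_dim=512):
            super().__init__()
            self.proj = nn.Sequential(
                nn.Linear(2 * feat_dim, token_dim), nn.ReLU(),
                nn.Linear(token_dim, token_dim),
            )

        def forward(self, feat_map):              # (B, C, H, W) visual features
            mu = feat_map.mean(dim=(2, 3))        # per-channel mean: style cue
            sigma = feat_map.std(dim=(2, 3))      # per-channel std: style cue
            style = torch.cat([mu, sigma], dim=1) # (B, 2C) style descriptor
            return self.proj(style).unsqueeze(1)  # (B, 1, token_dim) style token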
- CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing [66.6712018832575]
Domain generalization (DG) based Face Anti-Spoofing (FAS) aims to improve the model's performance on unseen domains.
We make use of large-scale VLMs like CLIP and leverage textual features to dynamically adjust the classifier's weights, exploring generalizable visual features.
arXiv Detail & Related papers (2024-03-21T11:58:50Z)
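The "textual features adjust the classifier's weights" idea can be read as a CLIP-style zero-shot classifier in which normalized text embeddings serve as the weight matrix, so changing the prompt re-weights the classifier. The sketch below is a generic rendering under that assumption, with the temperature value made up.

    import torch
    import torch.nn.functional as F

    def text_conditioned_logits(image_feats, text_feats, scale=100.0):
        # image_feats: (B, D); text_feats: (K, D), one embedding per class.
        img = F.normalize(image_feats, dim=-1)
        txt = F.normalize(text_feats, dim=-1)
        return scale * img @ txt.t()  # (B, K) cosine-similarity logits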
- Domain-Controlled Prompt Learning [49.45309818782329]
Existing prompt learning methods often lack domain-awareness or domain-transfer mechanisms.
We propose Domain-Controlled Prompt Learning for specific domains.
Our method achieves state-of-the-art performance on specific-domain image recognition datasets.
arXiv Detail & Related papers (2023-09-30T02:59:49Z)
- Prompting Diffusion Representations for Cross-Domain Semantic Segmentation [101.04326113360342]
Diffusion pretraining achieves extraordinary domain generalization results for semantic segmentation.
We introduce a scene prompt and a prompt randomization strategy to help further disentangle the domain-invariant information when training the segmentation head.
arXiv Detail & Related papers (2023-07-05T09:28:25Z)
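A toy reading of the prompt randomization strategy: perturb and partially drop the scene prompt during training so the segmentation head cannot latch onto one fixed prompt and is pushed toward domain-invariant diffusion features. The noise and drop parameters are illustrative, not from the paper.

    import torch

    def randomize_prompt(scene_prompt, noise_std=0.1, drop_p=0.1):
        # scene_prompt: (T, D) learnable prompt tokens.
        noisy = scene_prompt + noise_std * torch.randn_like(scene_prompt)
        keep = (torch.rand(scene_prompt.size(0), 1,
                           device=scene_prompt.device) > drop_p).float()
        return noisy * keep  # jittered prompt with some tokens zeroed out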
- Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models [137.74524357614285]
We introduce a novel Gradient-RegulAted Meta-prompt learning (GRAM) framework.
It helps pre-trained models adapt to downstream tasks in a parameter- and data-efficient way.
GRAM can be easily incorporated into various prompt tuning methods in a model-agnostic way.
arXiv Detail & Related papers (2023-03-12T05:03:37Z)
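How gradient regulation might look in its simplest form: keep the component of the prompt gradient aligned with a meta-learned direction and shrink the rest. This is only a guess at what "gradient-regulated" could mean in practice, not the paper's actual mechanism; `meta_direction` and `beta` are hypothetical.

    import torch

    def regulate_gradient(grad, meta_direction, beta=0.5):
        # grad and meta_direction are same-shape prompt-gradient tensors.
        d = meta_direction / meta_direction.norm()
        aligned = (grad * d).sum() * d            # projection onto meta direction
        return aligned + beta * (grad - aligned)  # dampen the orthogonal part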
- SwitchPrompt: Learning Domain-Specific Gated Soft Prompts for Classification in Low-Resource Domains [14.096170976149521]
SwitchPrompt is a novel and lightweight prompting methodology for adapting language models trained on general-domain datasets to diverse low-resource domains.
Our few-shot experiments on three text classification benchmarks demonstrate the efficacy of general-domain pre-trained language models when used with SwitchPrompt.
They often outperform even their domain-specific counterparts trained with baseline state-of-the-art prompting methods, with accuracy gains of up to 10.7%.
arXiv Detail & Related papers (2023-02-14T07:14:08Z)
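One plausible reading of "gated soft prompts" is a learned per-token gate that mixes a general-domain prompt with a domain-specific one, sketched below; the module layout and sizes are assumptions rather than the paper's design.

    import torch
    import torch.nn as nn

    class GatedSoftPrompt(nn.Module):
        def __init__(self, n_tokens=8, dim=768):
            super().__init__()
            self.general = nn.Parameter(torch.randn(n_tokens, dim) * 0.02)
            self.specific = nn.Parameter(torch.randn(n_tokens, dim) * 0.02)
            self.gate = nn.Parameter(torch.zeros(n_tokens, 1))

        def forward(self):
            g = torch.sigmoid(self.gate)  # per-token mixing weight in (0, 1)
            return g * self.specific + (1 - g) * self.general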
- Unified Vision and Language Prompt Learning [86.1530128487077]
We present a systematic study on two representative prompt tuning methods, namely text prompt tuning and visual prompt tuning.
A major finding is that text prompt tuning fails on data with high intra-class visual variances while visual prompt tuning cannot handle low inter-class variances.
To combine the best of both worlds, we propose a simple approach called Unified Prompt Tuning (UPT), which essentially learns a tiny neural network to jointly optimize prompts across different modalities.
arXiv Detail & Related papers (2022-10-13T17:50:24Z)
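The UPT summary suggests a tiny shared network emitting prompts for both encoders from one set of learnable tokens; the sketch below follows that description with made-up layer shapes and a hypothetical class name.

    import torch
    import torch.nn as nn

    class UnifiedPromptGenerator(nn.Module):
        def __init__(self, n_tokens=4, dim=512, text_dim=512, vis_dim=768):
            super().__init__()
            self.tokens = nn.Parameter(torch.randn(n_tokens, dim) * 0.02)
            self.shared = nn.Linear(dim, dim)      # joint transformation
            self.to_text = nn.Linear(dim, text_dim)
            self.to_vis = nn.Linear(dim, vis_dim)

        def forward(self):
            h = torch.relu(self.shared(self.tokens))
            return self.to_text(h), self.to_vis(h)  # prompts for each encoder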
- LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models [67.19124099815645]
We propose a novel Language-Aware Soft Prompting (LASP) learning method to alleviate base class overfitting.
LASP is inherently amenable to including, during training, virtual classes, i.e. class names for which no visual samples are available.
LASP matches and surpasses, for the first time, the accuracy on novel classes obtained by hand-crafted prompts and CLIP for 8 out of 11 test datasets.
arXiv Detail & Related papers (2022-10-03T17:56:35Z)
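The text-to-text optimization can be sketched as a contrastive loss that pulls class-name embeddings under the learned soft prompt toward embeddings of the same names under hand-crafted prompts; because only class names are needed, virtual classes fit naturally. The function name and temperature are hypothetical.

    import torch
    import torch.nn.functional as F

    def text_to_text_loss(learned_txt, handcrafted_txt, tau=0.07):
        # learned_txt, handcrafted_txt: (K, D), one embedding per class name.
        a = F.normalize(learned_txt, dim=-1)
        b = F.normalize(handcrafted_txt, dim=-1)
        logits = a @ b.t() / tau                  # (K, K) similarity matrix
        target = torch.arange(a.size(0))          # class i should match class i
        return F.cross_entropy(logits, target)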
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences arising from its use.