Visual-Language Prompt Tuning with Knowledge-guided Context Optimization
- URL: http://arxiv.org/abs/2303.13283v1
- Date: Thu, 23 Mar 2023 14:04:23 GMT
- Title: Visual-Language Prompt Tuning with Knowledge-guided Context Optimization
- Authors: Hantao Yao, Rui Zhang, Changsheng Xu
- Abstract summary: Representative CoOp-based work combines the learnable textual tokens with the class tokens to obtain specific textual knowledge.
We introduce a novel Knowledge-guided Context Optimization (KgCoOp) to enhance the generalization ability of the learnable prompt for unseen classes.
- Score: 96.27531485377871
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompt tuning is an effective way to adapt the pre-trained visual-language
model (VLM) to the downstream task using task-related textual tokens.
Representative CoOp-based work combines the learnable textual tokens with the
class tokens to obtain specific textual knowledge. However, this specific
textual knowledge generalizes worse to unseen classes because it
forgets the essential general textual knowledge that has strong generalization
ability. To tackle this issue, we introduce a novel Knowledge-guided Context
Optimization (KgCoOp) to enhance the generalization ability of the learnable
prompt for unseen classes. The key insight of KgCoOp is that forgetting about
essential knowledge can be alleviated by reducing the discrepancy between the
learnable prompt and the hand-crafted prompt. Specifically, KgCoOp minimizes the
discrepancy between the textual embeddings generated by the learned prompts and the
hand-crafted prompts. Finally, adding the KgCoOp objective to the contrastive loss
yields a prompt that is discriminative for both seen and unseen tasks. Extensive
evaluation on several benchmarks demonstrates that the proposed
Knowledge-guided Context Optimization is an efficient method for prompt tuning,
i.e., it achieves better performance with less training time.
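
The objective described in the abstract can be sketched as a contrastive (cross-entropy) term plus a penalty on the gap between learned-prompt and hand-crafted-prompt class embeddings. This is a minimal illustration, not the authors' implementation: the abstract does not state the discrepancy measure, so the squared Euclidean distance used here is an assumption, and the function names, the weight `lam`, and the temperature value are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine_logits(image_emb, class_embs, temperature=0.01):
    """Cosine similarity between one image embedding and each class text embedding,
    divided by a temperature (0.01 is an assumed value, not taken from the source)."""
    img = normalize(image_emb)
    return [
        sum(a * b for a, b in zip(img, normalize(w))) / temperature
        for w in class_embs
    ]


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[label]


def knowledge_gap(learned_embs, handcrafted_embs):
    """Mean squared Euclidean distance between learned-prompt and hand-crafted-prompt
    class embeddings (the choice of distance is an assumption)."""
    total = 0.0
    for w, w_clip in zip(learned_embs, handcrafted_embs):
        total += sum((a - b) ** 2 for a, b in zip(w, w_clip))
    return total / len(learned_embs)


def kgcoop_style_loss(image_emb, label, learned_embs, handcrafted_embs,
                      lam=1.0, temperature=0.01):
    """Contrastive term on the learned prompts plus the weighted knowledge gap.
    `lam` is a hypothetical trade-off weight."""
    ce = cross_entropy(cosine_logits(image_emb, learned_embs, temperature), label)
    return ce + lam * knowledge_gap(learned_embs, handcrafted_embs)


if __name__ == "__main__":
    image = [1.0, 0.0]
    learned = [[1.0, 0.0], [0.0, 1.0]]        # embeddings from learnable prompts
    handcrafted = [[0.8, 0.2], [0.1, 0.9]]    # embeddings from fixed hand-written prompts
    print(kgcoop_style_loss(image, 0, learned, handcrafted, lam=2.0))
```

In a real setup the embeddings would come from the frozen text and image encoders of the pre-trained model, with only the learnable prompt tokens receiving gradients; the toy vectors above only show how the two terms combine.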
Related papers
- IntCoOp: Interpretability-Aware Vision-Language Prompt Tuning [94.52149969720712]
IntCoOp learns to jointly align attribute-level inductive biases and class embeddings during prompt-tuning.
IntCoOp improves CoOp by 7.35% in average performance across 10 diverse datasets.
arXiv Detail & Related papers (2024-06-19T16:37:31Z)
- Instructing Prompt-to-Prompt Generation for Zero-Shot Learning [116.33775552866476]
We propose a Prompt-to-Prompt generation methodology (P2P) to distill instructive visual prompts for transferable knowledge discovery.
The core of P2P is to mine semantic-related instruction from prompt-conditioned visual features and text instruction on modal-sharing semantic concepts.
arXiv Detail & Related papers (2024-06-05T07:59:48Z)
- AAPL: Adding Attributes to Prompt Learning for Vision-Language Models [6.32186874112557]
We propose adversarial token embedding to disentangle low-level visual augmentation features from high-level class information when inducing bias in learnable prompts.
We conducted experiments across 11 datasets; overall, AAPL compares favorably with existing methods in few-shot learning, zero-shot learning, cross-dataset, and domain generalization tasks.
arXiv Detail & Related papers (2024-04-25T17:51:10Z)
- COMMA: Co-Articulated Multi-Modal Learning [39.778958624066185]
We propose Co-Articulated Multi-Modal Learning (COMMA) to handle the limitations of previous methods.
Our method considers prompts from both branches when generating prompts, which enhances the representation alignment of the two branches.
We evaluate our method across three representative tasks: generalization to novel classes, new target datasets, and unseen domain shifts.
arXiv Detail & Related papers (2023-12-30T15:47:36Z)
- TCP: Textual-based Class-aware Prompt tuning for Visual-Language Model [78.77544632773404]
We present Textual-based Class-aware Prompt tuning (TCP), which explicitly incorporates prior knowledge about classes to enhance their discriminability.
TCP consistently achieves superior performance while demanding less training time.
arXiv Detail & Related papers (2023-11-30T03:59:23Z)
- PRE: Vision-Language Prompt Learning with Reparameterization Encoder [26.017809323969285]
Large pre-trained vision-language models such as CLIP have demonstrated great potential in zero-shot transferability to downstream tasks.
To attain optimal performance, the manual selection of prompts is necessary to improve alignment between the downstream image distribution and the textual class descriptions.
To avoid non-trivial prompt engineering, the recent work Context Optimization (CoOp) introduced the concept of prompt learning to the vision domain using learnable textual tokens.
arXiv Detail & Related papers (2023-09-14T14:48:01Z)
- InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding [51.48361798508375]
We develop an information-theoretic framework that formulates soft prompt tuning as maximizing mutual information between prompts and other model parameters.
We show that InfoPrompt can significantly accelerate the convergence of prompt tuning and outperform traditional prompt tuning methods.
arXiv Detail & Related papers (2023-06-08T04:31:48Z)
- KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction [111.74812895391672]
We propose a Knowledge-aware Prompt-tuning approach with synergistic optimization (KnowPrompt).
We inject latent knowledge contained in relation labels into prompt construction with learnable virtual type words and answer words.
arXiv Detail & Related papers (2021-04-15T17:57:43Z)