Related papers: Visual-Language Prompt Tuning with Knowledge-guided Context Optimization

URL: http://arxiv.org/abs/2303.13283v1
Date: Thu, 23 Mar 2023 14:04:23 GMT
Title: Visual-Language Prompt Tuning with Knowledge-guided Context Optimization
Authors: Hantao Yao, Rui Zhang, Changsheng Xu
Abstract summary: Representative CoOp-based work combines the learnable textual tokens with the class tokens to obtain specific textual knowledge. We introduce a novel Knowledge-guided Context Optimization (KgCoOp) to enhance the generalization ability of the learnable prompt for unseen classes.
Score: 96.27531485377871
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Prompt tuning is an effective way to adapt a pre-trained visual-language model (VLM) to a downstream task using task-related textual tokens. Representative CoOp-based work combines the learnable textual tokens with the class tokens to obtain specific textual knowledge. However, this specific textual knowledge generalizes worse to unseen classes because it forgets the essential general textual knowledge that has strong generalization ability. To tackle this issue, we introduce a novel Knowledge-guided Context Optimization (KgCoOp) to enhance the generalization ability of the learnable prompt for unseen classes. The key insight of KgCoOp is that forgetting essential knowledge can be alleviated by reducing the discrepancy between the learnable prompt and the hand-crafted prompt. Specifically, KgCoOp minimizes the discrepancy between the textual embeddings generated by the learned prompts and those generated by the hand-crafted prompts. Finally, adding KgCoOp on top of the contrastive loss yields a prompt that is discriminative for both seen and unseen tasks. Extensive evaluation on several benchmarks demonstrates that the proposed Knowledge-guided Context Optimization is an efficient method for prompt tuning, i.e., it achieves better performance with less training time.
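Illustrative sketch (not from the paper): the abstract describes adding a term that penalizes the discrepancy between learned-prompt and hand-crafted-prompt text embeddings to a contrastive loss. The snippet below shows one way such an objective could look. The function name `kg_regularized_loss`, the use of mean squared Euclidean distance as the discrepancy, the weight `lam`, and the `temperature` value are all assumptions made for illustration; the feature arrays stand in for encoder outputs.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def kg_regularized_loss(image_feats, learned_text, handcrafted_text,
                        labels, lam=1.0, temperature=0.01):
    """Contrastive (cross-entropy) loss plus a discrepancy penalty.

    image_feats:      (batch, dim) image embeddings
    learned_text:     (classes, dim) class embeddings from the learnable prompt
    handcrafted_text: (classes, dim) class embeddings from a fixed prompt
    labels:           (batch,) integer class indices
    lam, temperature: hypothetical hyperparameters chosen for this sketch
    """
    img = l2_normalize(image_feats)
    w = l2_normalize(learned_text)
    w_ref = l2_normalize(handcrafted_text)

    # Image-to-class similarity logits, shifted for numerical stability.
    logits = img @ w.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(labels)), labels].mean()

    # Assumed discrepancy: mean squared Euclidean distance per class.
    kg = np.sum((w - w_ref) ** 2, axis=1).mean()

    return ce + lam * kg, ce, kg
```

When the learned embeddings equal the hand-crafted ones, the penalty term is zero and the total reduces to the contrastive loss alone; as the learned prompt drifts away from the hand-crafted one, the penalty grows in proportion to `lam`.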

Related papers

Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning [82.76635679786732]
Visual In-Context Learning (VICL) enables adaptively solving vision tasks by leveraging pixel demonstrations, mimicking human-like task completion through analogy. Current methods assume the existence of a single "ideal" prompt in a pool of candidates, which in practice may not hold true. We propose a new perspective: prompt condensation. Rather than relying on a single prompt, candidate prompts collaborate to efficiently integrate informative contexts without sacrificing resolution.
arXiv Detail & Related papers (2025-04-30T02:43:03Z)
IPO: Interpretable Prompt Optimization for Vision-Language Models [40.83071220530289]
This paper introduces a simple but interpretable prompt (IPO). IPO utilizes large language models (LLMs) to generate textual prompts dynamically. We incorporate a large multimodal model (LMM) to condition on visual content by generating image descriptions.
arXiv Detail & Related papers (2024-10-20T14:10:22Z)
Generalizable Prompt Tuning for Vision-Language Models [3.1008306011364644]
Learnable soft prompts often perform well in downstream tasks but lack generalizability. The study shows that by treating soft and hand-crafted prompts as dual views of the textual modality, we can better ensemble task-specific and general semantic information. To generate more expressive prompts, the study introduces a class-wise augmentation from the visual modality, resulting in significant robustness to a wider range of unseen classes.
arXiv Detail & Related papers (2024-10-04T07:02:13Z)
Revisiting Prompt Pretraining of Vision-Language Models [13.888505919946578]
We propose a general framework termed Revisiting Prompt Pretraining (RPP). RPP aims to improve fitting and generalization ability from two aspects: prompt structure and prompt supervision. We additionally utilize soft labels derived from zero-shot probability predictions provided by a pretrained Contrastive Language Image Pretraining (CLIP) teacher model.
arXiv Detail & Related papers (2024-09-10T02:36:13Z)
IntCoOp: Interpretability-Aware Vision-Language Prompt Tuning [94.52149969720712]
IntCoOp learns to jointly align attribute-level inductive biases and class embeddings during prompt-tuning. IntCoOp improves CoOp by 7.35% in average performance across 10 diverse datasets.
arXiv Detail & Related papers (2024-06-19T16:37:31Z)
Instructing Prompt-to-Prompt Generation for Zero-Shot Learning [116.33775552866476]
We propose a Prompt-to-Prompt generation methodology (P2P) to distill instructive visual prompts for transferable knowledge discovery. The core of P2P is to mine semantic-related instruction from prompt-conditioned visual features and text instruction on modal-sharing semantic concepts.
arXiv Detail & Related papers (2024-06-05T07:59:48Z)
AAPL: Adding Attributes to Prompt Learning for Vision-Language Models [6.32186874112557]
We propose adversarial token embedding to disentangle low-level visual augmentation features from high-level class information when inducing bias in learnable prompts. We have conducted experiments across 11 datasets, and overall, AAPL shows favorable performances compared to the existing methods in few-shot learning, zero-shot learning, cross-dataset, and domain generalization tasks.
arXiv Detail & Related papers (2024-04-25T17:51:10Z)
TCP:Textual-based Class-aware Prompt tuning for Visual-Language Model [78.77544632773404]
We present a Textual-based Class-aware Prompt tuning (TCP) that explicitly incorporates prior knowledge about classes to enhance their discriminability. TCP consistently achieves superior performance while demanding less training time.
arXiv Detail & Related papers (2023-11-30T03:59:23Z)
PRE: Vision-Language Prompt Learning with Reparameterization Encoder [24.855142164168605]
Large pre-trained vision-language models such as CLIP have demonstrated great potential in zero-shot transferability to downstream tasks. To attain optimal performance, the manual selection of prompts is necessary to improve alignment between the downstream image distribution and the textual class descriptions. To avoid non-trivial prompt engineering, recent work Context Optimization (CoOp) introduced the concept of prompt learning to the vision domain using learnable textual tokens.
arXiv Detail & Related papers (2023-09-14T14:48:01Z)
InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding [51.48361798508375]
We develop an information-theoretic framework that formulates soft prompt tuning as maximizing mutual information between prompts and other model parameters. We show that InfoPrompt can significantly accelerate the convergence of the prompt tuning and outperform traditional prompt tuning methods.
arXiv Detail & Related papers (2023-06-08T04:31:48Z)
KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction [111.74812895391672]
We propose a Knowledge-aware Prompt-tuning approach with synergistic optimization (KnowPrompt). We inject latent knowledge contained in relation labels into prompt construction with learnable virtual type words and answer words.
arXiv Detail & Related papers (2021-04-15T17:57:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.