Visual-Language Prompt Tuning with Knowledge-guided Context Optimization
- URL: http://arxiv.org/abs/2303.13283v1
- Date: Thu, 23 Mar 2023 14:04:23 GMT
- Title: Visual-Language Prompt Tuning with Knowledge-guided Context Optimization
- Authors: Hantao Yao, Rui Zhang, Changsheng Xu
- Abstract summary: Representative CoOp-based work combines the learnable textual tokens with the class tokens to obtain specific textual knowledge.
We introduce a novel Knowledge-guided Context Optimization (KgCoOp) to enhance the generalization ability of the learnable prompt for unseen classes.
- Score: 96.27531485377871
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompt tuning is an effective way to adapt the pre-trained visual-language
model (VLM) to the downstream task using task-related textual tokens.
Representative CoOp-based work combines the learnable textual tokens with the
class tokens to obtain specific textual knowledge. However, this specific
textual knowledge generalizes worse to unseen classes because it forgets the
essential general textual knowledge, which has a strong generalization
ability. To tackle this issue, we introduce a novel Knowledge-guided Context
Optimization (KgCoOp) to enhance the generalization ability of the learnable
prompt for unseen classes. The key insight of KgCoOp is that the forgetting of
essential knowledge can be alleviated by reducing the discrepancy between the
learnable prompt and the hand-crafted prompt. Specifically, KgCoOp minimizes
the discrepancy between the textual embeddings generated by the learned
prompts and those generated by the hand-crafted prompts. Finally, adding
KgCoOp to the contrastive loss produces a prompt that is discriminative for
both seen and unseen tasks. Extensive evaluation on several benchmarks
demonstrates that the proposed Knowledge-guided Context Optimization is an
efficient method for prompt tuning, i.e., it achieves better performance with
less training time.
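As a concrete illustration, below is a minimal PyTorch-style sketch of the objective described in the abstract: a standard contrastive (cross-entropy) term plus a penalty on the distance between the class embeddings produced by the learned prompt and those produced by the hand-crafted prompt. The function and argument names and the weight `lam` are illustrative assumptions, not the authors' reference code.

```python
import torch.nn.functional as F

def kgcoop_loss(logits, labels, learned_text_feats, handcrafted_text_feats,
                lam=8.0):
    """Contrastive loss plus a knowledge-guided regularizer (sketch).

    The regularizer penalizes the discrepancy between the class embeddings
    produced by the learnable prompt and those produced by the fixed
    hand-crafted prompt (e.g. "a photo of a {class}"), so the learned
    prompt does not drift away from CLIP's general textual knowledge.
    """
    # Standard CLIP-style classification loss on the seen classes.
    ce = F.cross_entropy(logits, labels)

    # Mean squared distance between the two sets of normalized class
    # embeddings; `lam` (assumed value) trades off task fitting against
    # retained generality.
    w = F.normalize(learned_text_feats, dim=-1)
    w_clip = F.normalize(handcrafted_text_feats, dim=-1)
    kg = (w - w_clip).pow(2).sum(dim=-1).mean()

    return ce + lam * kg
```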
Related papers
- IPO: Interpretable Prompt Optimization for Vision-Language Models [40.83071220530289]
This paper introduces a simple but interpretable prompt optimizer (IPO).
IPO utilizes large language models (LLMs) to generate textual prompts dynamically.
We incorporate a large multimodal model (LMM) to condition on visual content by generating image descriptions.
arXiv Detail & Related papers (2024-10-20T14:10:22Z)
- Generalizable Prompt Tuning for Vision-Language Models [3.1008306011364644]
Learnable soft prompts often perform well in downstream tasks but lack generalizability.
The study shows that by treating soft and hand-crafted prompts as dual views of the textual modality, we can better ensemble task-specific and general semantic information.
To generate more expressive prompts, the study introduces a class-wise augmentation from the visual modality, resulting in significant robustness to a wider range of unseen classes.
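A minimal sketch of one plausible instantiation of the dual-view idea above, where the class embeddings from the soft prompt and the hand-crafted prompt are interpolated before computing CLIP-style logits; the weighting `alpha` and the temperature are hypothetical defaults, not the paper's exact formulation.

```python
import torch.nn.functional as F

def dual_view_logits(image_feats, soft_text_feats, handcrafted_text_feats,
                     alpha=0.5, temperature=0.01):
    """CLIP-style logits from an ensemble of two textual 'views' (sketch).

    `alpha` interpolates between the task-specific soft-prompt class
    embeddings and the general hand-crafted-prompt class embeddings.
    """
    img = F.normalize(image_feats, dim=-1)           # [batch, dim]
    soft = F.normalize(soft_text_feats, dim=-1)      # [num_classes, dim]
    hand = F.normalize(handcrafted_text_feats, dim=-1)
    text = F.normalize(alpha * soft + (1.0 - alpha) * hand, dim=-1)
    return img @ text.t() / temperature              # [batch, num_classes]
```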
arXiv Detail & Related papers (2024-10-04T07:02:13Z)
- Revisiting Prompt Pretraining of Vision-Language Models [13.888505919946578]
We propose a general framework termed Revisiting Prompt Pretraining (RPP).
RPP targets at improving the fitting and generalization ability from two aspects: prompt structure and prompt supervision.
We additionally utilize soft labels derived from zero-shot probability predictions provided by a pretrained Contrastive Language Image Pretraining (CLIP) teacher model.
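This soft-label supervision can be read as standard knowledge distillation against the frozen zero-shot CLIP teacher; the sketch below shows a generic cross-entropy-plus-KL formulation under that assumption, not RPP's exact loss, and `tau` and `beta` are assumed hyperparameters.

```python
import torch.nn.functional as F

def prompt_distill_loss(student_logits, teacher_logits, labels,
                        tau=1.0, beta=0.5):
    """Hard-label cross-entropy plus KL distillation toward the soft
    labels (zero-shot probabilities) of a frozen CLIP teacher (sketch)."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
    ) * tau ** 2
    return (1.0 - beta) * ce + beta * kl
```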
arXiv Detail & Related papers (2024-09-10T02:36:13Z)
- IntCoOp: Interpretability-Aware Vision-Language Prompt Tuning [94.52149969720712]
IntCoOp learns to jointly align attribute-level inductive biases and class embeddings during prompt-tuning.
IntCoOp improves CoOp by 7.35% in average performance across 10 diverse datasets.
arXiv Detail & Related papers (2024-06-19T16:37:31Z)
- Instructing Prompt-to-Prompt Generation for Zero-Shot Learning [116.33775552866476]
We propose a Prompt-to-Prompt generation methodology (P2P) to distill instructive visual prompts for transferable knowledge discovery.
The core of P2P is to mine semantic-related instruction from prompt-conditioned visual features and text instruction on modal-sharing semantic concepts.
arXiv Detail & Related papers (2024-06-05T07:59:48Z)
- AAPL: Adding Attributes to Prompt Learning for Vision-Language Models [6.32186874112557]
We propose adversarial token embedding to disentangle low-level visual augmentation features from high-level class information when inducing bias in learnable prompts.
We have conducted experiments across 11 datasets, and overall, AAPL shows favorable performances compared to the existing methods in few-shot learning, zero-shot learning, cross-dataset, and domain generalization tasks.
arXiv Detail & Related papers (2024-04-25T17:51:10Z)
- TCP: Textual-based Class-aware Prompt tuning for Visual-Language Model [78.77544632773404]
We present a Textual-based Class-aware Prompt tuning (TCP) that explicitly incorporates prior knowledge about classes to enhance their discriminability.
TCP consistently achieves superior performance while demanding less training time.
arXiv Detail & Related papers (2023-11-30T03:59:23Z)
- PRE: Vision-Language Prompt Learning with Reparameterization Encoder [24.855142164168605]
Large pre-trained vision-language models such as CLIP have demonstrated great potential in zero-shot transferability to downstream tasks.
To attain optimal performance, the manual selection of prompts is necessary to improve alignment between the downstream image distribution and the textual class descriptions.
To avoid non-trivial prompt engineering, recent work Context Optimization (CoOp) introduced the concept of prompt learning to the vision domain using learnable textual tokens.
arXiv Detail & Related papers (2023-09-14T14:48:01Z)
- InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding [51.48361798508375]
We develop an information-theoretic framework that formulates soft prompt tuning as maximizing mutual information between prompts and other model parameters.
We show that InfoPrompt can significantly accelerate the convergence of the prompt tuning and outperform traditional prompt tuning methods.
arXiv Detail & Related papers (2023-06-08T04:31:48Z)
- KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction [111.74812895391672]
We propose a Knowledge-aware Prompt-tuning approach with synergistic optimization (KnowPrompt).
We inject latent knowledge contained in relation labels into prompt construction with learnable virtual type words and answer words.
arXiv Detail & Related papers (2021-04-15T17:57:43Z)