TCP: Textual-based Class-aware Prompt tuning for Visual-Language Model
- URL: http://arxiv.org/abs/2311.18231v2
- Date: Wed, 13 Mar 2024 01:42:51 GMT
- Title: TCP: Textual-based Class-aware Prompt tuning for Visual-Language Model
- Authors: Hantao Yao, Rui Zhang, Changsheng Xu
- Abstract summary: We present a Textual-based Class-aware Prompt tuning (TCP) that explicitly incorporates prior knowledge about classes to enhance their discriminability.
TCP consistently achieves superior performance while demanding less training time.
- Score: 78.77544632773404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompt tuning represents a valuable technique for adapting pre-trained
visual-language models (VLMs) to various downstream tasks. Recent advancements
in CoOp-based methods propose a set of learnable domain-shared or
image-conditional textual tokens to facilitate the generation of task-specific
textual classifiers. However, those textual tokens have limited
generalization ability to unseen domains, as they cannot dynamically
adjust to the distribution of testing classes. To tackle this issue, we present
a novel Textual-based Class-aware Prompt tuning (TCP) that explicitly
incorporates prior knowledge about classes to enhance their discriminability.
The critical concept of TCP involves leveraging Textual Knowledge Embedding
(TKE) to map the high generalizability of class-level textual knowledge into
class-aware textual tokens. By seamlessly integrating these class-aware prompts
into the Text Encoder, a dynamic class-aware classifier is generated to enhance
discriminability for unseen domains. During inference, TKE dynamically
generates class-aware prompts related to the unseen classes. Comprehensive
evaluations demonstrate that TKE serves as a plug-and-play module effortlessly
combinable with existing methods. Furthermore, TCP consistently achieves
superior performance while demanding less training time.
Code: https://github.com/htyao89/Textual-based_Class-aware_prompt_tuning/
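As a reading aid, the mechanism described in the abstract can be sketched in a few lines: a Textual Knowledge Embedding (TKE) module maps frozen class-level text embeddings (e.g., CLIP text features of the class names) into class-aware prompt tokens, which are injected into the text encoder to form the classifier. The sketch below is a minimal, hypothetical PyTorch rendering of that idea only; the module name, the bottleneck-MLP design, and all dimensions are assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class TextualKnowledgeEmbedding(nn.Module):
    """Hypothetical sketch of TKE: maps class-level text embeddings
    (e.g., frozen CLIP text features of the class names) into M
    class-aware prompt tokens per class. All dimensions are assumed."""

    def __init__(self, embed_dim: int = 512, token_dim: int = 512,
                 num_tokens: int = 4, hidden_dim: int = 128):
        super().__init__()
        self.num_tokens = num_tokens
        self.token_dim = token_dim
        # A small bottleneck MLP: one plausible choice for the mapping,
        # not the architecture reported in the paper.
        self.mapper = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, num_tokens * token_dim),
        )

    def forward(self, class_embeddings: torch.Tensor) -> torch.Tensor:
        # class_embeddings: (num_classes, embed_dim)
        tokens = self.mapper(class_embeddings)
        # (num_classes, num_tokens, token_dim): class-aware prompt tokens
        # to be injected into the text encoder's input sequence.
        return tokens.view(-1, self.num_tokens, self.token_dim)

# Because the prompts are a pure function of the class embeddings, unseen
# classes at inference time get prompts from a single forward pass, which
# matches the abstract's claim about dynamic class-aware prompts.
tke = TextualKnowledgeEmbedding()
seen = torch.randn(10, 512)    # stand-in for 10 training-class embeddings
unseen = torch.randn(5, 512)   # stand-in for 5 test-only class embeddings
print(tke(seen).shape, tke(unseen).shape)  # (10, 4, 512) (5, 4, 512)
```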
Related papers
- Prompt-and-Transfer: Dynamic Class-aware Enhancement for Few-shot Segmentation [15.159690685421586]
This paper mimics the visual perception pattern of human beings and proposes a novel and powerful prompt-driven scheme called "Prompt-and-Transfer" (PAT).
PAT constructs a dynamic class-aware prompting paradigm to tune the encoder to focus on the object of interest (the target class) in the current task.
Surprisingly, PAT achieves competitive performance on four different tasks, including standard FSS, cross-domain FSS, weak-label FSS, and zero-shot segmentation.
arXiv Detail & Related papers (2024-09-16T15:24:26Z)
- SEP: Self-Enhanced Prompt Tuning for Visual-Language Model [68.68025991850115]
We introduce a novel approach named Self-Enhanced Prompt Tuning (SEP).
SEP explicitly incorporates discriminative prior knowledge to enhance both textual-level and visual-level embeddings.
Comprehensive evaluations across various benchmarks and tasks confirm SEP's efficacy in prompt tuning.
arXiv Detail & Related papers (2024-05-24T13:35:56Z)
- Can Better Text Semantics in Prompt Tuning Improve VLM Generalization? [28.041879000565874]
We introduce a prompt-tuning method that leverages class descriptions obtained from Large Language Models.
Our approach constructs part-level description-guided image and text features, which are subsequently aligned to learn more generalizable prompts.
Our comprehensive experiments conducted across 11 benchmark datasets show that our method outperforms established methods.
arXiv Detail & Related papers (2024-05-13T16:52:17Z)
- Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery [50.564146730579424]
We propose a Text Embedding Synthesizer (TES) to generate pseudo text embeddings for unlabelled samples.
Our method unlocks the multi-modal potential of CLIP and outperforms the baseline methods by a large margin on all GCD benchmarks.
arXiv Detail & Related papers (2024-03-15T02:40:13Z)
- Text-driven Prompt Generation for Vision-Language Models in Federated Learning [24.005620820818756]
Our work proposes Federated Text-driven Prompt Generation (FedTPG).
FedTPG learns a unified prompt generation network across multiple remote clients in a scalable manner.
Our comprehensive empirical evaluations on nine diverse image classification datasets show that our method is superior to existing federated prompt learning methods.
arXiv Detail & Related papers (2023-10-09T19:57:24Z)
- Visual-Language Prompt Tuning with Knowledge-guided Context Optimization [96.27531485377871]
Representative CoOp-based work combines the learnable textual tokens with the class tokens to obtain class-specific textual knowledge; a minimal sketch of this construction appears after this list.
We introduce a novel Knowledge-guided Context Optimization (KgCoOp) to enhance the generalization ability of the learnable prompt for unseen classes.
arXiv Detail & Related papers (2023-03-23T14:04:23Z)
- LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models [67.19124099815645]
We propose a novel Language-Aware Soft Prompting (LASP) learning method to alleviate base class overfitting.
LASP is inherently amenable to including, during training, virtual classes, i.e. class names for which no visual samples are available.
LASP matches and surpasses, for the first time, the accuracy on novel classes obtained by hand-crafted prompts and CLIP for 8 out of 11 test datasets.
arXiv Detail & Related papers (2022-10-03T17:56:35Z)
- PTR: Prompt Tuning with Rules for Text Classification [64.1655047016891]
Fine-tuned pre-trained language models (PLMs) have achieved impressive performance on almost all NLP tasks.
We propose prompt tuning with rules (PTR) for many-class text classification.
PTR is able to encode prior knowledge of each class into prompt tuning.
arXiv Detail & Related papers (2021-05-24T13:24:02Z)
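Several entries above build on the CoOp-style prompt construction named in the KgCoOp summary: a set of shared learnable context tokens is concatenated with the embedded class-name tokens before the text encoder. The snippet below is a minimal, hedged sketch of that construction; the class name CoOpStylePrompt, the shapes, and the initialization scale are illustrative assumptions rather than any paper's released code.

```python
import torch
import torch.nn as nn

class CoOpStylePrompt(nn.Module):
    """Hedged sketch of the shared construction: M learnable context
    tokens are concatenated with the embedded class-name tokens before
    the text encoder. Names and shapes are illustrative assumptions."""

    def __init__(self, num_ctx: int = 16, token_dim: int = 512):
        super().__init__()
        # Domain-shared learnable context tokens, small random init.
        self.ctx = nn.Parameter(torch.randn(num_ctx, token_dim) * 0.02)

    def forward(self, class_token_embeds: torch.Tensor) -> torch.Tensor:
        # class_token_embeds: (num_classes, num_name_tokens, token_dim),
        # i.e., the frozen embedding-layer output for each class name.
        n = class_token_embeds.size(0)
        ctx = self.ctx.unsqueeze(0).expand(n, -1, -1)
        # Per-class prompt: [ctx_1 ... ctx_M][CLASS tokens] -> text encoder.
        return torch.cat([ctx, class_token_embeds], dim=1)

# Usage with stand-in tensors: 100 classes, 8 embedded name tokens each.
prompt = CoOpStylePrompt()
class_tokens = torch.randn(100, 8, 512)
print(prompt(class_tokens).shape)  # torch.Size([100, 24, 512])
```

KgCoOp's addition to this construction is to keep the text features produced by such learned prompts close to those of CLIP's hand-crafted prompts, which is the knowledge-guided regularization its entry refers to.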