TCP: Textual-based Class-aware Prompt tuning for Visual-Language Model
- URL: http://arxiv.org/abs/2311.18231v2
- Date: Wed, 13 Mar 2024 01:42:51 GMT
- Title: TCP: Textual-based Class-aware Prompt tuning for Visual-Language Model
- Authors: Hantao Yao, Rui Zhang, Changsheng Xu
- Abstract summary: We present a Textual-based Class-aware Prompt tuning (TCP) that explicitly incorporates prior knowledge about classes to enhance their discriminability.
TCP consistently achieves superior performance while demanding less training time.
- Score: 78.77544632773404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompt tuning represents a valuable technique for adapting pre-trained
visual-language models (VLMs) to various downstream tasks. Recent advancements
in CoOp-based methods propose a set of learnable domain-shared or
image-conditional textual tokens to facilitate the generation of task-specific
textual classifiers. However, those textual tokens have limited
generalization ability to unseen domains, as they cannot dynamically
adjust to the distribution of testing classes. To tackle this issue, we present
a novel Textual-based Class-aware Prompt tuning (TCP) that explicitly
incorporates prior knowledge about classes to enhance their discriminability.
The critical concept of TCP involves leveraging Textual Knowledge Embedding
(TKE) to map the high generalizability of class-level textual knowledge into
class-aware textual tokens. By seamlessly integrating these class-aware prompts
into the Text Encoder, a dynamic class-aware classifier is generated to enhance
discriminability for unseen domains. During inference, TKE dynamically
generates class-aware prompts related to the unseen classes. Comprehensive
evaluations demonstrate that TKE serves as a plug-and-play module effortlessly
combinable with existing methods. Furthermore, TCP consistently achieves
superior performance while demanding less training time.
Code: https://github.com/htyao89/Textual-based_Class-aware_prompt_tuning/
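As a reading aid, the mechanism described in the abstract can be sketched in a few lines: a Textual Knowledge Embedding (TKE) module maps frozen class-level text embeddings (e.g., CLIP text features of the class names) into class-aware prompt tokens, which are injected into the text encoder to form the classifier. The sketch below is a minimal, hypothetical PyTorch rendering of that idea only; the module name, the bottleneck-MLP design, and all dimensions are assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class TextualKnowledgeEmbedding(nn.Module):
    """Hypothetical sketch of TKE: maps class-level text embeddings
    (e.g., frozen CLIP text features of the class names) into M
    class-aware prompt tokens per class. All dimensions are assumed."""

    def __init__(self, embed_dim: int = 512, token_dim: int = 512,
                 num_tokens: int = 4, hidden_dim: int = 128):
        super().__init__()
        self.num_tokens = num_tokens
        self.token_dim = token_dim
        # A small bottleneck MLP: one plausible choice for the mapping,
        # not the architecture reported in the paper.
        self.mapper = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, num_tokens * token_dim),
        )

    def forward(self, class_embeddings: torch.Tensor) -> torch.Tensor:
        # class_embeddings: (num_classes, embed_dim)
        tokens = self.mapper(class_embeddings)
        # (num_classes, num_tokens, token_dim): class-aware prompt tokens
        # to be injected into the text encoder's input sequence.
        return tokens.view(-1, self.num_tokens, self.token_dim)

# Because the prompts are a pure function of the class embeddings, unseen
# classes at inference time get prompts from a single forward pass, which
# matches the abstract's claim about dynamic class-aware prompts.
tke = TextualKnowledgeEmbedding()
seen = torch.randn(10, 512)    # stand-in for 10 training-class embeddings
unseen = torch.randn(5, 512)   # stand-in for 5 test-only class embeddings
print(tke(seen).shape, tke(unseen).shape)  # (10, 4, 512) (5, 4, 512)
```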
Related papers
- Prompt-and-Transfer: Dynamic Class-aware Enhancement for Few-shot Segmentation [15.159690685421586]
This paper mimics the visual perception pattern of human beings and proposes a novel and powerful prompt-driven scheme called "Prompt-and-Transfer" (PAT).
PAT constructs a dynamic class-aware prompting paradigm to tune the encoder to focus on the object of interest (the target class) in the current task.
Surprisingly, PAT achieves competitive performance on four different tasks, including standard FSS, cross-domain FSS, weak-label FSS, and zero-shot segmentation.
arXiv Detail & Related papers (2024-09-16T15:24:26Z)
- SEP: Self-Enhanced Prompt Tuning for Visual-Language Model [68.68025991850115]
We introduce a novel approach named Self-Enhanced Prompt Tuning (SEP).
SEP explicitly incorporates discriminative prior knowledge to enhance both textual-level and visual-level embeddings.
Comprehensive evaluations across various benchmarks and tasks confirm SEP's efficacy in prompt tuning.
arXiv Detail & Related papers (2024-05-24T13:35:56Z)
- Can Better Text Semantics in Prompt Tuning Improve VLM Generalization? [28.041879000565874]
We introduce a prompt-tuning method that leverages class descriptions obtained from Large Language Models.
Our approach constructs part-level description-guided image and text features, which are subsequently aligned to learn more generalizable prompts.
Our comprehensive experiments conducted across 11 benchmark datasets show that our method outperforms established methods.
arXiv Detail & Related papers (2024-05-13T16:52:17Z)
- Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery [50.564146730579424]
We propose a Text Embedding Synthesizer (TES) to generate pseudo text embeddings for unlabelled samples.
Our method unlocks the multi-modal potential of CLIP and outperforms the baseline methods by a large margin on all GCD benchmarks.
arXiv Detail & Related papers (2024-03-15T02:40:13Z)
- Text-driven Prompt Generation for Vision-Language Models in Federated Learning [24.005620820818756]
Our work proposes Federated Text-driven Prompt Generation (FedTPG).
FedTPG learns a unified prompt generation network across multiple remote clients in a scalable manner.
Our comprehensive empirical evaluations on nine diverse image classification datasets show that our method is superior to existing federated prompt learning methods.
arXiv Detail & Related papers (2023-10-09T19:57:24Z)
- Visual-Language Prompt Tuning with Knowledge-guided Context Optimization [96.27531485377871]
Representative CoOp-based work combines the learnable textual tokens with the class tokens to obtain class-specific textual knowledge; a minimal sketch of this construction appears after this list.
We introduce a novel Knowledge-guided Context Optimization (KgCoOp) to enhance the generalization ability of the learnable prompt for unseen classes.
arXiv Detail & Related papers (2023-03-23T14:04:23Z)
- LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models [67.19124099815645]
We propose a novel Language-Aware Soft Prompting (LASP) learning method to alleviate base class overfitting.
LASP is inherently amenable to including, during training, virtual classes, i.e. class names for which no visual samples are available.
LASP matches and surpasses, for the first time, the accuracy on novel classes obtained by hand-crafted prompts and CLIP for 8 out of 11 test datasets.
arXiv Detail & Related papers (2022-10-03T17:56:35Z)
- PTR: Prompt Tuning with Rules for Text Classification [64.1655047016891]
Fine-tuned pre-trained language models (PLMs) have achieved impressive performance on almost all NLP tasks.
We propose prompt tuning with rules (PTR) for many-class text classification.
PTR is able to encode prior knowledge of each class into prompt tuning.
arXiv Detail & Related papers (2021-05-24T13:24:02Z)
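Several entries above build on the CoOp-style prompt construction named in the KgCoOp summary: a set of shared learnable context tokens is concatenated with the embedded class-name tokens before the text encoder. The snippet below is a minimal, hedged sketch of that construction; the class name CoOpStylePrompt, the shapes, and the initialization scale are illustrative assumptions rather than any paper's released code.

```python
import torch
import torch.nn as nn

class CoOpStylePrompt(nn.Module):
    """Hedged sketch of the shared construction: M learnable context
    tokens are concatenated with the embedded class-name tokens before
    the text encoder. Names and shapes are illustrative assumptions."""

    def __init__(self, num_ctx: int = 16, token_dim: int = 512):
        super().__init__()
        # Domain-shared learnable context tokens, small random init.
        self.ctx = nn.Parameter(torch.randn(num_ctx, token_dim) * 0.02)

    def forward(self, class_token_embeds: torch.Tensor) -> torch.Tensor:
        # class_token_embeds: (num_classes, num_name_tokens, token_dim),
        # i.e., the frozen embedding-layer output for each class name.
        n = class_token_embeds.size(0)
        ctx = self.ctx.unsqueeze(0).expand(n, -1, -1)
        # Per-class prompt: [ctx_1 ... ctx_M][CLASS tokens] -> text encoder.
        return torch.cat([ctx, class_token_embeds], dim=1)

# Usage with stand-in tensors: 100 classes, 8 embedded name tokens each.
prompt = CoOpStylePrompt()
class_tokens = torch.randn(100, 8, 512)
print(prompt(class_tokens).shape)  # torch.Size([100, 24, 512])
```

KgCoOp's addition to this construction is to keep the text features produced by such learned prompts close to those of CLIP's hand-crafted prompts, which is the knowledge-guided regularization its entry refers to.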