ATPrompt: Textual Prompt Learning with Embedded Attributes
- URL: http://arxiv.org/abs/2412.09442v1
- Date: Thu, 12 Dec 2024 16:57:20 GMT
- Title: ATPrompt: Textual Prompt Learning with Embedded Attributes
- Authors: Zheng Li, Yibing Song, Penghai Zhao, Ming-Ming Cheng, Xiang Li, Jian Yang
- Abstract summary: We introduce an Attribute-embedded Textual Prompt learning method for vision-language models, named ATPrompt.
We transform the text prompt from a category-centric form to an attribute-category hybrid form.
As an easy-to-use plug-in technique, ATPrompt can seamlessly replace the existing prompt format.
- Score: 73.1352833091256
- Abstract: Textual-based prompt learning methods primarily employ multiple learnable soft prompts and hard class tokens in a cascading manner as text prompt inputs, aiming to align image and text (category) spaces for downstream tasks. However, current training is restricted to aligning images with predefined known categories and cannot be associated with unknown categories. In this work, we propose utilizing universal attributes as a bridge to enhance the alignment between images and unknown categories. Specifically, we introduce an Attribute-embedded Textual Prompt learning method for vision-language models, named ATPrompt. This approach expands the learning space of soft prompts from the original one-dimensional category level into the multi-dimensional attribute level by incorporating multiple universal attribute tokens into the learnable soft prompts. Through this modification, we transform the text prompt from a category-centric form to an attribute-category hybrid form. To finalize the attributes for downstream tasks, we propose a differentiable attribute search method that learns to identify representative and suitable attributes from a candidate pool summarized by a large language model. As an easy-to-use plug-in technique, ATPrompt can seamlessly replace the existing prompt format of textual-based methods, offering general improvements at a negligible computational cost. Extensive experiments on 11 datasets demonstrate the effectiveness of our method.
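To make the attribute-category hybrid form concrete, below is a minimal PyTorch sketch of one way the prompt could be assembled: learnable soft tokens interleaved with frozen embeddings of universal attributes ahead of the class token. The module name, shapes, and initialization are illustrative assumptions, not the paper's released implementation; the attributes themselves would already be fixed, e.g. by the paper's differentiable search over an LLM-summarized candidate pool.

```python
# Minimal sketch (assumed shapes/names) of an attribute-category hybrid prompt:
# [soft][attr_1][soft][attr_2]...[soft][class] instead of the usual [soft][class].
import torch
import torch.nn as nn

class AttributeEmbeddedPrompt(nn.Module):
    def __init__(self, attr_embeds, class_embed, n_soft=4, embed_dim=512):
        super().__init__()
        # One learnable soft-token segment per attribute, plus one for the class.
        self.soft_attr = nn.Parameter(torch.randn(len(attr_embeds), n_soft, embed_dim) * 0.02)
        self.soft_cls = nn.Parameter(torch.randn(n_soft, embed_dim) * 0.02)
        # Frozen token embeddings of the searched attributes (e.g. "color", "shape")
        # and of the class name; only the soft segments receive gradients.
        self.attr_embeds = [a.detach() for a in attr_embeds]
        self.class_embed = class_embed.detach()

    def forward(self):
        segments = []
        for soft, attr in zip(self.soft_attr, self.attr_embeds):
            segments += [soft, attr]          # learnable context, then attribute token(s)
        segments += [self.soft_cls, self.class_embed]
        return torch.cat(segments, dim=0)     # (seq_len, embed_dim) text-prompt input
```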
Related papers
- Discriminative-Generative Custom Tokens for Vision-Language Models [101.40245125955306]
This paper explores the possibility of learning custom tokens for representing new concepts in Vision-Language Models (VLMs).
Our aim is to learn tokens that can be effective for both discriminative and generative tasks while composing well with words to form new input queries.
arXiv Detail & Related papers (2025-02-17T18:13:42Z)
- From Open-Vocabulary to Vocabulary-Free Semantic Segmentation [78.62232202171919]
Open-vocabulary semantic segmentation enables models to identify novel object categories beyond their training data.
Current approaches still rely on manually specified class names as input, creating an inherent bottleneck in real-world applications.
This work proposes a Vocabulary-Free Semantic pipeline, eliminating the need for predefined class vocabularies.
arXiv Detail & Related papers (2025-02-17T15:17:08Z)
- Tree of Attributes Prompt Learning for Vision-Language Models [27.64685205305313]
We propose the Tree of Attributes Prompt learning (TAP), which generates a tree of attributes with a "concept - attribute - description" structure for each category.
Unlike existing methods that merely augment category names with a set of unstructured descriptions, our approach essentially distills structured knowledge graphs.
Our approach introduces text and vision prompts designed to explicitly learn the corresponding visual attributes, effectively serving as domain experts.
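The quoted "concept - attribute - description" structure maps naturally onto a small nested data structure. A toy Python example follows; all entries are invented for illustration, not TAP's actual LLM outputs.

```python
# Toy "concept - attribute - description" tree for one category (invented entries).
tap_tree = {
    "concept": "sparrow",
    "attributes": {
        "color": ["brown and grey plumage", "streaked chest"],
        "shape": ["small rounded body", "short conical beak"],
        "habitat": ["often perched on branches", "seen in flocks"],
    },
}

# Flatten the tree into one text prompt per attribute description.
prompts = [
    f"a photo of a {tap_tree['concept']}, which has {desc}"
    for descs in tap_tree["attributes"].values()
    for desc in descs
]
```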
arXiv Detail & Related papers (2024-10-15T02:37:39Z)
- Mixture of Prompt Learning for Vision Language Models [12.828490399811376]
We propose a mixture of soft prompt learning method incorporating a routing module.
This module captures a dataset's varied styles and dynamically selects the most suitable prompts for each instance.
We also implement semantically grouped text-level supervision, initializing each soft prompt with the token embeddings of manually designed templates from its group.
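A hedged sketch of what such a routing module could look like: image features gate a mixture over several learnable soft prompts. The soft (weighted) combination and all names here are assumptions; the paper may instead select prompts hard or route over semantic groups.

```python
# Sketch of instance-conditioned prompt routing (assumed design, not the
# paper's exact module): a linear gate scores each soft prompt per image.
import torch
import torch.nn as nn

class PromptRouter(nn.Module):
    def __init__(self, n_prompts=4, n_ctx=4, embed_dim=512, feat_dim=512):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(n_prompts, n_ctx, embed_dim) * 0.02)
        self.gate = nn.Linear(feat_dim, n_prompts)

    def forward(self, image_feat):                    # image_feat: (B, feat_dim)
        weights = self.gate(image_feat).softmax(-1)   # (B, n_prompts)
        # Weighted mixture; a hard variant would take argmax over `weights`.
        return torch.einsum("bp,pce->bce", weights, self.prompts)  # (B, n_ctx, embed_dim)
```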
arXiv Detail & Related papers (2024-09-18T14:25:02Z)
- CoAPT: Context Attribute words for Prompt Tuning [5.811993982861212]
We propose a novel prompt tuning method called CoAPT for few/zero-shot image classification.
The core motivation is that attributes are descriptive words with rich information about a given concept.
CoAPT integrates attribute words as additional prompts within learnable prompt tuning and can be easily incorporated into various existing prompt tuning methods.
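At the string level, the hybrid prompt this describes could look like the toy template below; the attribute words and the [V*] placeholders for learnable context tokens are invented for illustration.

```python
# Toy CoAPT-style prompt for the class "sparrow": [V1]..[V4] stand for
# learnable soft-context tokens; the attribute words enter as hard tokens.
attributes = ["small", "brown", "feathered"]   # e.g. produced by an LLM or lexicon
prompt = "[V1] [V2] [V3] [V4] " + " ".join(attributes) + " sparrow."
print(prompt)  # [V1] [V2] [V3] [V4] small brown feathered sparrow.
```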
arXiv Detail & Related papers (2024-07-18T08:58:01Z)
- Open-Vocabulary Temporal Action Localization using Multimodal Guidance [67.09635853019005]
OVTAL enables a model to recognize any desired action category in videos without the need to explicitly curate training data for all categories.
This flexibility poses significant challenges, as the model must recognize not only the action categories seen during training but also novel categories specified at inference.
We introduce OVFormer, a novel open-vocabulary framework extending ActionFormer with three key contributions.
arXiv Detail & Related papers (2024-06-21T18:00:05Z)
- TCP: Textual-based Class-aware Prompt tuning for Visual-Language Model [78.77544632773404]
We present a Textual-based Class-aware Prompt tuning (TCP) that explicitly incorporates prior knowledge about classes to enhance their discriminability.
TCP consistently achieves superior performance while demanding less training time.
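One plausible reading of "class-aware" prompting, sketched below: a small learnable projector maps each class's frozen text feature into extra prompt tokens, so the prompt itself depends on the class being scored. The projector design and shapes are assumptions for illustration, not the paper's reference code.

```python
# Sketch of class-aware prompt generation (assumed mechanics): class text
# features are projected into per-class prompt tokens.
import torch
import torch.nn as nn

class ClassAwarePrompt(nn.Module):
    def __init__(self, feat_dim=512, n_ctx=4, embed_dim=512):
        super().__init__()
        self.proj = nn.Linear(feat_dim, n_ctx * embed_dim)
        self.n_ctx, self.embed_dim = n_ctx, embed_dim

    def forward(self, class_feats):                   # (C, feat_dim), frozen
        tokens = self.proj(class_feats)               # (C, n_ctx * embed_dim)
        return tokens.view(-1, self.n_ctx, self.embed_dim)  # (C, n_ctx, embed_dim)
```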
arXiv Detail & Related papers (2023-11-30T03:59:23Z)
- Multi-Prompt with Depth Partitioned Cross-Modal Learning [25.239388488952375]
Partitioned Multi-modal Prompt (PMPO) is a multi-modal prompting technique that extends the soft prompt from a single learnable prompt to multiple prompts.
Our method divides the visual encoder depths and connects learnable prompts to the separated visual depths, enabling different prompts to capture hierarchical contextual depths.
We evaluate the effectiveness of our approach on three challenging tasks: new class generalization, cross-dataset evaluation, and domain generalization.
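A sketch of the depth-partitioned idea under stated assumptions: the visual encoder's blocks are split into contiguous partitions, and each partition gets its own learnable prompt tokens prepended to its input sequence. The partitioning and token-swapping details are illustrative, not PMPO's exact mechanics.

```python
# Sketch of depth-partitioned visual prompting (assumed mechanics).
import torch
import torch.nn as nn

class DepthPartitionedPrompts(nn.Module):
    def __init__(self, blocks, n_partitions=3, n_ctx=4, embed_dim=768):
        super().__init__()
        self.blocks = blocks                          # nn.ModuleList of ViT blocks
        per = len(blocks) // n_partitions
        self.starts = [i * per for i in range(n_partitions)]
        self.prompts = nn.Parameter(torch.randn(n_partitions, n_ctx, embed_dim) * 0.02)
        self.n_ctx = n_ctx

    def forward(self, x):                             # x: (B, seq, embed_dim)
        p = 0
        for i, blk in enumerate(self.blocks):
            if p < len(self.starts) and i == self.starts[p]:
                if p > 0:                             # swap out the previous prompts
                    x = x[:, self.n_ctx:]
                x = torch.cat([self.prompts[p].expand(x.size(0), -1, -1), x], dim=1)
                p += 1
            x = blk(x)
        return x
```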
arXiv Detail & Related papers (2023-05-10T14:54:29Z)
- Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery [55.905769757007185]
We describe an approach to robustly optimize hard text prompts through efficient gradient-based optimization.
Our approach automatically generates hard text-based prompts for both text-to-image and text-to-text applications.
In the text-to-text setting, we show that hard prompts can be automatically discovered that are effective in tuning LMs for classification.
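A minimal sketch of one way to realize this kind of optimization: keep a continuous prompt embedding, project it to the nearest vocabulary embeddings for the forward pass, and apply the gradient taken at the projected point back to the continuous copy. `loss_fn`, shapes, and hyperparameters are assumptions, not the paper's reference code.

```python
# Sketch of hard-prompt discovery via projection (assumed loss_fn and shapes).
import torch

def optimize_hard_prompt(vocab_embeds, loss_fn, n_tokens=8, steps=500, lr=0.1):
    # vocab_embeds: (V, D) frozen token-embedding matrix of the target model.
    soft = vocab_embeds[torch.randint(len(vocab_embeds), (n_tokens,))].clone()
    soft.requires_grad_(True)
    opt = torch.optim.Adam([soft], lr=lr)
    for _ in range(steps):
        ids = torch.cdist(soft, vocab_embeds).argmin(dim=-1)  # nearest real tokens
        hard = vocab_embeds[ids]
        proj = soft + (hard - soft).detach()  # forward with hard, backprop into soft
        loss = loss_fn(proj)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Final projection gives the discrete token ids of the discovered prompt.
    return torch.cdist(soft.detach(), vocab_embeds).argmin(dim=-1)
```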
arXiv Detail & Related papers (2023-02-07T18:40:18Z)
- Prompt-Learning for Short Text Classification [30.53216712864025]
In short texts, extreme brevity, feature sparsity, and high ambiguity pose significant challenges for classification tasks.
In this paper, we propose a simple short text classification approach that makes use of prompt-learning based on knowledgeable expansion.
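A toy sketch of the knowledge-expansion idea: each class is verbalized by several related label words rather than one, and a class score aggregates masked-LM probabilities over all of them. The label words and the cloze template are invented examples.

```python
# Toy knowledge-expanded verbalizer for short-text classification
# (label words invented for illustration).
label_words = {
    "sports": ["sports", "football", "game"],
    "tech": ["technology", "software", "computer"],
}

def classify(mask_word_probs):
    # mask_word_probs: word -> P(word at [MASK]) from a masked LM run on a
    # cloze template such as "<text> . This topic is about [MASK]."
    scores = {
        label: sum(mask_word_probs.get(w, 0.0) for w in words) / len(words)
        for label, words in label_words.items()
    }
    return max(scores, key=scores.get)
```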
arXiv Detail & Related papers (2022-02-23T08:07:06Z)