Texts as Images in Prompt Tuning for Multi-Label Image Recognition
- URL: http://arxiv.org/abs/2211.12739v1
- Date: Wed, 23 Nov 2022 07:00:11 GMT
- Title: Texts as Images in Prompt Tuning for Multi-Label Image Recognition
- Authors: Zixian Guo, Bowen Dong, Zhilong Ji, Jinfeng Bai, Yiwen Guo, Wangmeng
Zuo
- Abstract summary: We advocate that image-text contrastive learning makes it feasible to treat texts as images for prompt tuning and introduce TaI prompting.
Particularly, we apply TaI prompting to multi-label image recognition, where sentences in the wild serve as alternatives to images for prompt tuning.
Our proposed TaI-DPT outperforms zero-shot CLIP by a large margin on multiple benchmarks.
- Score: 70.9310322461598
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompt tuning has been employed as an efficient way to adapt large
vision-language pre-trained models (e.g., CLIP) to various downstream tasks in
data-limited or label-limited settings. Nonetheless, visual data (e.g., images)
are by default a prerequisite for learning prompts in existing methods. In this
work, we advocate that the effectiveness of image-text contrastive learning in
aligning the two modalities (for training CLIP) further makes it feasible to
treat texts as images for prompt tuning and introduce TaI prompting. In
contrast to the visual data, text descriptions are easy to collect, and their
class labels can be directly derived. Particularly, we apply TaI prompting to
multi-label image recognition, where sentences in the wild serve as
alternatives to images for prompt tuning. Moreover, with TaI, double-grained
prompt tuning (TaI-DPT) is further presented to extract both coarse-grained and
fine-grained embeddings for enhancing the multi-label recognition performance.
Experimental results show that our proposed TaI-DPT outperforms zero-shot CLIP
by a large margin on multiple benchmarks, e.g., MS-COCO, VOC2007, and NUS-WIDE,
while it can be combined with existing methods of prompting from images to
improve recognition performance further. Code is released at
https://github.com/guozix/TaI-DPT.
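As a rough illustration of the text-as-image idea, the sketch below tunes learnable context vectors against caption features from a frozen text encoder, using multi-hot labels derived from class-name matches. The encoder is a toy stand-in for CLIP's, and the BCE loss and single (coarse) grain are simplifications of the paper's setup.

```python
# Toy sketch of Texts-as-Images (TaI) prompt tuning. Captions whose labels
# can be derived from class-name matches replace images as training data.
# FrozenTextEncoder stands in for CLIP's (frozen) text encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
NUM_CLASSES, DIM, CTX_LEN = 20, 512, 16

class FrozenTextEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(DIM, DIM)
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, token_embs):  # (B, L, DIM) -> (B, DIM)
        return F.normalize(self.proj(token_embs.mean(dim=1)), dim=-1)

encoder = FrozenTextEncoder()
# Learnable context vectors shared across classes, prepended to each
# (here randomly initialized) class-name embedding.
ctx = nn.Parameter(torch.randn(CTX_LEN, DIM) * 0.02)
class_name_embs = torch.randn(NUM_CLASSES, 1, DIM)  # frozen class tokens

def class_embeddings():
    prompts = torch.cat(
        [ctx.unsqueeze(0).expand(NUM_CLASSES, -1, -1), class_name_embs], dim=1)
    return encoder(prompts)  # (NUM_CLASSES, DIM)

# "Texts as images": caption token embeddings plus multi-hot labels derived
# from class-name matches in the caption (dummy data here).
captions = torch.randn(4, 12, DIM)
labels = (torch.rand(4, NUM_CLASSES) > 0.8).float()

opt = torch.optim.Adam([ctx], lr=1e-3)
for _ in range(100):
    feats = encoder(captions)                        # (B, DIM)
    logits = feats @ class_embeddings().t() / 0.07   # temperature-scaled
    # BCE is a simplification; the paper optimizes a ranking loss.
    loss = F.binary_cross_entropy_with_logits(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final loss: {loss.item():.4f}")
```

In the full TaI-DPT setup, a second set of prompts scores token-level (fine-grained) features in the same way, and the coarse- and fine-grained similarities are combined at test time.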
Related papers
- CoAPT: Context Attribute words for Prompt Tuning [5.811993982861212]
We propose a novel prompt tuning method called CoAPT for few/zero-shot image classification.
The core motivation is that attributes are descriptive words with rich information about a given concept.
CoAPT integrates words as additional prompts within learnable prompt tuning and can be easily incorporated into various existing prompt tuning methods.
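A minimal sketch of how attribute words could slot into a learnable prompt, assuming simple concatenation before the class token; the names and layout below are illustrative, not CoAPT's actual API.

```python
# Hedged sketch of the CoAPT idea: embeddings of attribute words (e.g.
# "striped", "four-legged") are appended to the learnable context before
# the class token, and the whole sequence is fed to a text encoder.
import torch
import torch.nn as nn

DIM, CTX_LEN, N_ATTR = 512, 4, 3
ctx = nn.Parameter(torch.randn(CTX_LEN, DIM) * 0.02)   # learnable context
attr_embs = torch.randn(N_ATTR, DIM)                    # frozen attribute words
cls_emb = torch.randn(1, DIM)                           # frozen class token

def build_prompt():
    # [learnable ctx][attribute words][class name]
    return torch.cat([ctx, attr_embs, cls_emb], dim=0)  # (CTX_LEN+N_ATTR+1, DIM)

print(build_prompt().shape)  # torch.Size([8, 512])
```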
arXiv Detail & Related papers (2024-07-18T08:58:01Z)
- TAI++: Text as Image for Multi-Label Image Classification by Co-Learning Transferable Prompt [15.259819430801402]
We propose a pseudo-visual prompt (PVP) module for implicit visual prompt tuning to address this problem.
Specifically, we first learn the pseudo-visual prompt for each category, mining diverse visual knowledge by the well-aligned space of pre-trained vision-language models.
Experimental results on the VOC2007, MS-COCO, and NUS-WIDE datasets demonstrate that our method can surpass state-of-the-art (SOTA) methods.
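The sketch below shows one plausible reading of a pseudo-visual prompt: a learnable vector per category in the shared embedding space, trained to agree with text features of that category. The loss choice and shapes are assumptions.

```python
# Sketch of a pseudo-visual prompt (PVP): one learnable vector per category,
# living in the shared image-text embedding space of a pre-trained
# vision-language model and trained against caption features.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES, DIM = 20, 512
pvp = nn.Parameter(torch.randn(NUM_CLASSES, DIM) * 0.02)

def pvp_alignment_loss(text_feats, labels):
    """text_feats: (B, DIM) caption features; labels: (B, C) multi-hot."""
    sims = F.normalize(text_feats, dim=-1) @ F.normalize(pvp, dim=-1).t()
    return F.binary_cross_entropy_with_logits(sims / 0.07, labels)

loss = pvp_alignment_loss(torch.randn(4, DIM),
                          (torch.rand(4, NUM_CLASSES) > 0.8).float())
print(loss.item())
```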
arXiv Detail & Related papers (2024-05-11T06:11:42Z)
- VIXEN: Visual Text Comparison Network for Image Difference Captioning [58.16313862434814]
We present VIXEN, a technique that succinctly summarizes in text the visual differences between a pair of images.
Our proposed network linearly maps image features in a pairwise manner, constructing a soft prompt for a pretrained large language model.
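A hedged sketch of the pairwise linear mapping described above: features of the two images are concatenated and projected into soft prompt tokens for a language model. Dimensions and the prefix length are assumptions.

```python
# Sketch of a VIXEN-style recipe: features of an image pair are concatenated
# and linearly projected into the embedding space of a language model,
# forming a soft prefix prompt for caption generation.
import torch
import torch.nn as nn

IMG_DIM, LM_DIM, PREFIX_LEN = 768, 1024, 8

class PairToPrefix(nn.Module):
    def __init__(self):
        super().__init__()
        # linear map from concatenated pair features to PREFIX_LEN soft tokens
        self.proj = nn.Linear(2 * IMG_DIM, PREFIX_LEN * LM_DIM)

    def forward(self, feat_a, feat_b):  # each (B, IMG_DIM)
        pair = torch.cat([feat_a, feat_b], dim=-1)
        return self.proj(pair).view(-1, PREFIX_LEN, LM_DIM)

prefix = PairToPrefix()(torch.randn(2, IMG_DIM), torch.randn(2, IMG_DIM))
print(prefix.shape)  # (2, 8, 1024) -- prepended to the LM's token embeddings
```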
arXiv Detail & Related papers (2024-02-29T12:56:18Z)
- Iterative Prompt Learning for Unsupervised Backlit Image Enhancement [86.90993077000789]
We propose a novel unsupervised backlit image enhancement method, abbreviated as CLIP-LIT.
We show that the open-world CLIP prior aids in distinguishing between backlit and well-lit images.
Our method alternates between updating the prompt learning framework and enhancement network until visually pleasing results are achieved.
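The alternating scheme could look roughly like the sketch below: one phase tunes a prompt to rank well-lit above backlit images, the next tunes the enhancement network so its outputs score as well-lit under that prompt. Both models are toy stand-ins for frozen CLIP and the real enhancement network.

```python
# Toy sketch of CLIP-LIT-style alternating optimization between a learnable
# prompt and an enhancement network. prompt_score stands in for CLIP
# image-prompt similarity.
import torch
import torch.nn as nn

prompt = nn.Parameter(torch.randn(16, 512) * 0.02)   # learnable prompt tokens
enhancer = nn.Conv2d(3, 3, 3, padding=1)             # toy enhancement network
opt_p = torch.optim.Adam([prompt], lr=1e-4)
opt_e = torch.optim.Adam(enhancer.parameters(), lr=1e-4)

def prompt_score(img):
    # stand-in for CLIP similarity between an image and the prompt embedding
    return img.mean() * prompt.mean()

backlit = torch.rand(1, 3, 32, 32) * 0.3
well_lit = torch.rand(1, 3, 32, 32)
for _ in range(3):
    # Phase 1: tune the prompt to rank well-lit above backlit images.
    loss_p = torch.relu(1.0 + prompt_score(backlit) - prompt_score(well_lit))
    opt_p.zero_grad(); loss_p.backward(); opt_p.step()
    # Phase 2: tune the enhancer so enhanced backlit images score as well-lit.
    loss_e = -prompt_score(enhancer(backlit))
    opt_e.zero_grad(); loss_e.backward(); opt_e.step()
```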
arXiv Detail & Related papers (2023-03-30T17:37:14Z)
- Task-Oriented Multi-Modal Mutual Learning for Vision-Language Models [52.3032592038514]
We propose a class-aware text prompt to enrich generated prompts with label-related image information.
We achieve an average improvement of 4.03% on new classes and 3.19% on the harmonic mean over eleven classification benchmarks.
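A rough sketch of what a class-aware text prompt might look like, assuming the label-related image information is fused into the learnable context via a linear map and addition; both choices are illustrative assumptions.

```python
# Sketch of a class-aware text prompt: learnable context is combined with
# image-conditioned, label-related features before text encoding. The
# additive fusion is a guess, not the paper's exact design.
import torch
import torch.nn as nn

DIM, CTX_LEN = 512, 8
ctx = nn.Parameter(torch.randn(CTX_LEN, DIM) * 0.02)
img_to_ctx = nn.Linear(DIM, DIM)                 # maps image feature to a bias

def class_aware_prompt(img_feat, cls_emb):       # (DIM,), (DIM,)
    bias = img_to_ctx(img_feat + cls_emb)        # label-related image info
    return torch.cat([ctx + bias, cls_emb.unsqueeze(0)], dim=0)

print(class_aware_prompt(torch.randn(DIM), torch.randn(DIM)).shape)  # (9, 512)
```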
arXiv Detail & Related papers (2023-03-30T06:02:40Z)
- CLIP-ReID: Exploiting Vision-Language Model for Image Re-Identification without Concrete Text Labels [28.42405456691034]
We propose a two-stage strategy to facilitate a better visual representation in image re-identification tasks.
The key idea is to fully exploit the cross-modal description ability in CLIP through a set of learnable text tokens for each ID.
The effectiveness of the proposed strategy is validated on several datasets for the person or vehicle ReID tasks.
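A minimal sketch of the first stage, assuming learnable per-identity tokens are trained against frozen encoders with a cross-entropy objective over identities; the toy encoder and loss stand in for the paper's exact components.

```python
# Sketch of stage 1 of a CLIP-ReID-style strategy: learnable text tokens
# per identity substitute for absent text labels. text_head stands in for
# CLIP's frozen text encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_IDS, TOK_PER_ID, DIM = 100, 4, 512
id_tokens = nn.Parameter(torch.randn(NUM_IDS, TOK_PER_ID, DIM) * 0.02)
text_head = nn.Linear(DIM, DIM)
for p in text_head.parameters():
    p.requires_grad_(False)

def id_text_features():
    return F.normalize(text_head(id_tokens.mean(dim=1)), dim=-1)  # (NUM_IDS, DIM)

# Pull each image feature toward its identity's learned text description.
img_feats = F.normalize(torch.randn(8, DIM), dim=-1)
ids = torch.randint(0, NUM_IDS, (8,))
logits = img_feats @ id_text_features().t() / 0.07
stage1_loss = F.cross_entropy(logits, ids)
print(stage1_loss.item())
```

Stage 2 would then fine-tune the image encoder using the learned ID descriptions as classifiers.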
arXiv Detail & Related papers (2022-11-25T09:41:57Z)
- Dual Modality Prompt Tuning for Vision-Language Pre-Trained Model [39.722927180264584]
We propose a novel Dual-modality Prompt Tuning (DPT) paradigm that learns text and visual prompts simultaneously.
To make the final image feature concentrate more on the target visual concept, a Class-Aware Visual Prompt Tuning scheme is proposed.
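A small sketch of prompting both modalities at once: learnable text context on one side, learnable tokens prepended to the image patch sequence on the other. The class-aware conditioning shown (shifting visual prompts by the mean class embedding) is an illustrative guess, not the paper's exact scheme.

```python
# Sketch of dual-modality prompt tuning: learnable prompts on both sides.
import torch
import torch.nn as nn

DIM, CTX_LEN, VIS_LEN, NUM_CLASSES = 512, 8, 4, 10
text_ctx = nn.Parameter(torch.randn(CTX_LEN, DIM) * 0.02)     # text prompt
vis_prompt = nn.Parameter(torch.randn(VIS_LEN, DIM) * 0.02)   # visual prompt
cls_embs = torch.randn(NUM_CLASSES, DIM)                       # frozen class tokens

def prompted_patches(patches):  # patches: (B, N, DIM)
    B = patches.size(0)
    # class-aware twist: shift visual prompts toward the mean class embedding
    vp = vis_prompt + cls_embs.mean(dim=0)
    return torch.cat([vp.expand(B, -1, -1), patches], dim=1)

out = prompted_patches(torch.randn(2, 49, DIM))
print(out.shape)  # torch.Size([2, 53, 512])
```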
arXiv Detail & Related papers (2022-08-17T15:06:36Z)
- DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations [61.41339201200135]
We propose Dual Context Optimization (DualCoOp) as a unified framework for partial-label MLR and zero-shot MLR.
Since DualCoOp introduces only a lightweight learnable overhead on the pretrained vision-language framework, it can quickly adapt to multi-label recognition tasks.
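The dual-context idea can be sketched as a positive and a negative learnable prompt per class, with presence decided by which one the image feature agrees with; the mean-pooled "text encoder" below is a stand-in.

```python
# Sketch of the DualCoOp idea: each class carries a pair of learnable
# contexts, and a label is predicted present when the image agrees more
# with the positive prompt than the negative one.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES, DIM, CTX_LEN = 20, 512, 8
pos_ctx = nn.Parameter(torch.randn(NUM_CLASSES, CTX_LEN, DIM) * 0.02)
neg_ctx = nn.Parameter(torch.randn(NUM_CLASSES, CTX_LEN, DIM) * 0.02)

def prompt_feature(ctx):  # (C, CTX_LEN, DIM) -> (C, DIM)
    return F.normalize(ctx.mean(dim=1), dim=-1)  # stand-in for text encoder

img = F.normalize(torch.randn(4, DIM), dim=-1)
pos_logit = img @ prompt_feature(pos_ctx).t()    # (B, C)
neg_logit = img @ prompt_feature(neg_ctx).t()
# softmax over the (positive, negative) pair per class
prob_present = torch.softmax(
    torch.stack([pos_logit, neg_logit], dim=-1) / 0.07, dim=-1)[..., 0]
print(prob_present.shape)  # torch.Size([4, 20])
```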
arXiv Detail & Related papers (2022-06-20T02:36:54Z)
- CRIS: CLIP-Driven Referring Image Segmentation [71.56466057776086]
We propose an end-to-end CLIP-Driven Referring Image Segmentation framework (CRIS).
CRIS resorts to vision-language decoding and contrastive learning to achieve text-to-pixel alignment.
Our proposed framework significantly outperforms the previous state of the art without any post-processing.
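A compact sketch of text-to-pixel alignment, assuming a sentence feature is compared against every pixel feature and supervised by the referred mask; feature extractors are omitted and the loss is a simplification.

```python
# Sketch of text-to-pixel alignment in the spirit of CRIS: push pixels
# inside the referred mask toward the sentence feature, the rest away.
import torch
import torch.nn.functional as F

B, DIM, H, W = 2, 256, 14, 14
text = F.normalize(torch.randn(B, DIM), dim=-1)          # sentence features
pixels = F.normalize(torch.randn(B, DIM, H, W), dim=1)   # per-pixel features
mask = (torch.rand(B, H, W) > 0.7).float()               # GT referred region

# similarity map: dot product between the text vector and each pixel vector
sim = torch.einsum("bd,bdhw->bhw", text, pixels)         # (B, H, W)
loss = F.binary_cross_entropy_with_logits(sim / 0.07, mask)
print(loss.item())
```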
arXiv Detail & Related papers (2021-11-30T07:29:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.