Improving Zero-Shot Generalization for CLIP with Synthesized Prompts
- URL: http://arxiv.org/abs/2307.07397v1
- Date: Fri, 14 Jul 2023 15:15:45 GMT
- Title: Improving Zero-Shot Generalization for CLIP with Synthesized Prompts
- Authors: Zhengbo Wang, Jian Liang, Ran He, Nan Xu, Zilei Wang, Tieniu Tan
- Abstract summary: Most existing methods require labeled data for all classes, which may not hold in real-world applications.
We propose a plug-and-play generative approach called SyntHesIzed Prompts (SHIP) to improve existing fine-tuning methods.
- Score: 135.4317555866831
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the growing interest in pretrained vision-language models like CLIP,
recent research has focused on adapting these models to downstream tasks.
Despite achieving promising results, most existing methods require labeled data
for all classes, which may not hold in real-world applications due to the long
tail and Zipf's law. For example, some classes may lack labeled data entirely,
such as emerging concepts. To address this problem, we propose a plug-and-play
generative approach called SyntHesIzed Prompts (SHIP) to improve existing
fine-tuning methods.
Specifically, we follow variational autoencoders to introduce a generator that
reconstructs the visual features by inputting the synthesized prompts and the
corresponding class names to the textual encoder of CLIP. In this manner, we
easily obtain the synthesized features for the remaining label-only classes.
Thereafter, we fine-tune CLIP with off-the-shelf methods by combining labeled
and synthesized features. Extensive experiments on base-to-new generalization,
cross-dataset transfer learning, and generalized zero-shot learning demonstrate
the superiority of our approach. The code is available at
https://github.com/mrflogs/SHIP.
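
The following is a minimal sketch of the idea described in the abstract (a VAE-style generator whose synthesized prompts, together with class names, are passed through CLIP's frozen text encoder to reconstruct visual features, and which is then sampled to synthesize features for label-only classes). It is not the authors' implementation; the released code is at https://github.com/mrflogs/SHIP. All module names, dimensions, and the `encode_text_with_prompts` helper are illustrative assumptions.

```python
# Sketch only: assumes an external helper encode_text_with_prompts(prompts, class_name)
# that prepends the synthesized prompt tokens to the class-name token embeddings and
# runs CLIP's frozen textual encoder, returning a feature of dimension feat_dim.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptGenerator(nn.Module):
    """Encode an image feature into a latent code and decode it into soft
    prompt tokens (the 'synthesized prompts')."""
    def __init__(self, feat_dim=512, latent_dim=128, n_ctx=4, ctx_dim=512):
        super().__init__()
        self.to_mu = nn.Linear(feat_dim, latent_dim)
        self.to_logvar = nn.Linear(feat_dim, latent_dim)
        self.decoder = nn.Linear(latent_dim, n_ctx * ctx_dim)
        self.n_ctx, self.ctx_dim = n_ctx, ctx_dim

    def encode(self, img_feat):
        return self.to_mu(img_feat), self.to_logvar(img_feat)

    def decode(self, z):
        return self.decoder(z).view(-1, self.n_ctx, self.ctx_dim)

def vae_step(generator, encode_text_with_prompts, img_feat, class_name):
    """One VAE training step: reconstruct the visual feature by routing the
    synthesized prompts and the class name through CLIP's text encoder."""
    mu, logvar = generator.encode(img_feat)
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
    prompts = generator.decode(z)
    recon = encode_text_with_prompts(prompts, class_name)
    recon_loss = F.mse_loss(recon, img_feat)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl

@torch.no_grad()
def synthesize_features(generator, encode_text_with_prompts, new_class_names,
                        n_per_class=16, latent_dim=128):
    """After training, sample latent codes to synthesize features for classes
    that only have names (no labeled images); these synthesized features are
    then combined with real labeled features to fine-tune CLIP with
    off-the-shelf methods."""
    fake = {}
    for name in new_class_names:
        z = torch.randn(n_per_class, latent_dim)
        prompts = generator.decode(z)
        fake[name] = encode_text_with_prompts(prompts, name)
    return fake
```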
Related papers
- Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation [72.47110803885235]
We introduce a novel framework named Cascade-CLIP for zero-shot semantic segmentation.
Our framework achieves superior zero-shot performance on segmentation benchmarks like COCO-Stuff, Pascal-VOC, and Pascal-Context.
arXiv Detail & Related papers (2024-06-02T08:32:51Z)
- Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery [50.564146730579424]
We propose a Text Embedding Synthesizer (TES) to generate pseudo text embeddings for unlabelled samples.
Our method unlocks the multi-modal potentials of CLIP and outperforms the baseline methods by a large margin on all GCD benchmarks.
arXiv Detail & Related papers (2024-03-15T02:40:13Z)
- Semantic Residual Prompts for Continual Learning [21.986800282078498]
We show that our method significantly outperforms both state-of-the-art CL approaches and the zero-shot CLIP test.
Our findings hold true even for datasets with a substantial domain gap w.r.t. the pre-training knowledge of the backbone model.
arXiv Detail & Related papers (2024-03-11T16:23:38Z)
- Learning to Prompt with Text Only Supervision for Vision-Language Models [107.282881515667]
One branch of methods adapts CLIP by learning prompts using visual information.
An alternative approach resorts to training-free methods by generating class descriptions from large language models.
We propose to combine the strengths of both streams by learning prompts using only text data.
arXiv Detail & Related papers (2024-01-04T18:59:49Z)
- ProxyDet: Synthesizing Proxy Novel Classes via Classwise Mixup for Open-Vocabulary Object Detection [7.122652901894367]
Open-vocabulary object detection (OVOD) aims to recognize novel objects whose categories are not included in the training set.
We present a novel, yet simple technique that helps generalization on the overall distribution of novel classes.
arXiv Detail & Related papers (2023-12-12T13:45:56Z)
- AttriCLIP: A Non-Incremental Learner for Incremental Knowledge Learning [53.32576252950481]
Continual learning aims to enable a model to incrementally learn knowledge from sequentially arrived data.
In this paper, we propose a non-incremental learner, named AttriCLIP, to incrementally extract knowledge of new classes or tasks.
arXiv Detail & Related papers (2023-05-19T07:39:17Z)
- OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression [94.28253749970534]
We propose to learn the rank concepts from the rich semantic CLIP latent space.
OrdinalCLIP consists of learnable context tokens and learnable rank embeddings.
Experimental results show that our paradigm achieves competitive performance in general ordinal regression tasks.
arXiv Detail & Related papers (2022-06-06T03:54:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.