Multi-Class Textual-Inversion Secretly Yields a Semantic-Agnostic Classifier
- URL: http://arxiv.org/abs/2410.22317v1
- Date: Tue, 29 Oct 2024 17:55:02 GMT
- Title: Multi-Class Textual-Inversion Secretly Yields a Semantic-Agnostic Classifier
- Authors: Kai Wang, Fei Yang, Bogdan Raducanu, Joost van de Weijer,
- Abstract summary: Text-to-Image (T2I) personalization methods aim to adapt T2I models to unseen concepts by learning new tokens.
We show that new concept tokens possess both generation and classification capabilities by regarding each category as a single concept.
We propose Multi-Class textual inversion, which includes a discriminative regularization term for the token updating process.
- Score: 20.95472997379712
- License:
- Abstract: With the advent of large pre-trained vision-language models such as CLIP, prompt learning methods aim to enhance the transferability of the CLIP model. They learn the prompt given few samples from the downstream task given the specific class names as prior knowledge, which we term as semantic-aware classification. However, in many realistic scenarios, we only have access to few samples and knowledge of the class names (e.g., when considering instances of classes). This challenging scenario represents the semantic-agnostic discriminative case. Text-to-Image (T2I) personalization methods aim to adapt T2I models to unseen concepts by learning new tokens and endowing these tokens with the capability of generating the learned concepts. These methods do not require knowledge of class names as a semantic-aware prior. Therefore, in this paper, we first explore Textual Inversion and reveal that the new concept tokens possess both generation and classification capabilities by regarding each category as a single concept. However, learning classifiers from single-concept textual inversion is limited since the learned tokens are suboptimal for the discriminative tasks. To mitigate this issue, we propose Multi-Class textual inversion, which includes a discriminative regularization term for the token updating process. Using this technique, our method MC-TI achieves stronger Semantic-Agnostic Classification while preserving the generation capability of these modifier tokens given only few samples per category. In the experiments, we extensively evaluate MC-TI on 12 datasets covering various scenarios, which demonstrates that MC-TI achieves superior results in terms of both classification and generation outcomes.
Related papers
- Evolving Interpretable Visual Classifiers with Large Language Models [34.4903887876357]
Multimodal pre-trained models, such as CLIP, are popular for zero-shot classification due to their open-vocabulary flexibility and high performance.
vision-language models, which compute similarity scores between images and class labels, are largely black-box, with limited interpretability, risk for bias, and inability to discover new visual concepts not written down.
We present a novel method that discovers interpretable yet discriminative sets of attributes for visual recognition.
arXiv Detail & Related papers (2024-04-15T17:09:53Z) - Generative Multi-modal Models are Good Class-Incremental Learners [51.5648732517187]
We propose a novel generative multi-modal model (GMM) framework for class-incremental learning.
Our approach directly generates labels for images using an adapted generative model.
Under the Few-shot CIL setting, we have improved by at least 14% accuracy over all the current state-of-the-art methods with significantly less forgetting.
arXiv Detail & Related papers (2024-03-27T09:21:07Z) - Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery [50.564146730579424]
We propose a Text Embedding Synthesizer (TES) to generate pseudo text embeddings for unlabelled samples.
Our method unlocks the multi-modal potentials of CLIP and outperforms the baseline methods by a large margin on all GCD benchmarks.
arXiv Detail & Related papers (2024-03-15T02:40:13Z) - Data-free Multi-label Image Recognition via LLM-powered Prompt Tuning [23.671999163027284]
This paper proposes a novel framework for multi-label image recognition without any training data.
It uses knowledge of pre-trained Large Language Model to learn prompts to adapt pretrained Vision-Language Model like CLIP to multilabel classification.
Our framework presents a new way to explore the synergies between multiple pre-trained models for novel category recognition.
arXiv Detail & Related papers (2024-03-02T13:43:32Z) - Boosting Prompt-Based Self-Training With Mapping-Free Automatic
Verbalizer for Multi-Class Classification [3.647905567437244]
We introduce a novel, efficient verbalizer structure, named Mapping-free Automatic Verbal Modelingizer (MAV)
MAV serves as a trainable verbalizer that automatically extracts the requisite word features for classification by capitalizing on all available information from predictions.
Experimental results on five multi-class classification datasets indicate MAV's superior self-training efficacy.
arXiv Detail & Related papers (2023-12-08T11:43:00Z) - XAI-CLASS: Explanation-Enhanced Text Classification with Extremely Weak
Supervision [6.406111099707549]
XAI-CLASS is a novel explanation-enhanced weakly-supervised text classification method.
It incorporates word saliency prediction as an auxiliary task.
XAI-CLASS outperforms other weakly-supervised text classification methods significantly.
arXiv Detail & Related papers (2023-10-31T23:24:22Z) - TagCLIP: Improving Discrimination Ability of Open-Vocabulary Semantic Segmentation [53.974228542090046]
Contrastive Language-Image Pre-training (CLIP) has recently shown great promise in pixel-level zero-shot learning tasks.
Existing approaches utilizing CLIP's text and patch embeddings to generate semantic masks often misidentify input pixels from unseen classes.
We propose TagCLIP (Trusty-aware guided CLIP) to address this issue.
arXiv Detail & Related papers (2023-04-15T12:52:23Z) - Task-Specific Embeddings for Ante-Hoc Explainable Text Classification [6.671252951387647]
We propose an alternative training objective in which we learn task-specific embeddings of text.
Our proposed objective learns embeddings such that all texts that share the same target class label should be close together.
We present extensive experiments which show that the benefits of ante-hoc explainability and incremental learning come at no cost in overall classification accuracy.
arXiv Detail & Related papers (2022-11-30T19:56:25Z) - CCPrefix: Counterfactual Contrastive Prefix-Tuning for Many-Class
Classification [57.62886091828512]
We propose a brand-new prefix-tuning method, Counterfactual Contrastive Prefix-tuning (CCPrefix) for many-class classification.
Basically, an instance-dependent soft prefix, derived from fact-counterfactual pairs in the label space, is leveraged to complement the language verbalizers in many-class classification.
arXiv Detail & Related papers (2022-11-11T03:45:59Z) - Multi-Modal Few-Shot Object Detection with Meta-Learning-Based
Cross-Modal Prompting [77.69172089359606]
We study multi-modal few-shot object detection (FSOD) in this paper, using both few-shot visual examples and class semantic information for detection.
Our approach is motivated by the high-level conceptual similarity of (metric-based) meta-learning and prompt-based learning.
We comprehensively evaluate the proposed multi-modal FSOD models on multiple few-shot object detection benchmarks, achieving promising results.
arXiv Detail & Related papers (2022-04-16T16:45:06Z) - Dual Prototypical Contrastive Learning for Few-shot Semantic
Segmentation [55.339405417090084]
We propose a dual prototypical contrastive learning approach tailored to the few-shot semantic segmentation (FSS) task.
The main idea is to encourage the prototypes more discriminative by increasing inter-class distance while reducing intra-class distance in prototype feature space.
We demonstrate that the proposed dual contrastive learning approach outperforms state-of-the-art FSS methods on PASCAL-5i and COCO-20i datasets.
arXiv Detail & Related papers (2021-11-09T08:14:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.