Enhancing Visual Classification using Comparative Descriptors
- URL: http://arxiv.org/abs/2411.05357v2
- Date: Mon, 11 Nov 2024 02:24:36 GMT
- Title: Enhancing Visual Classification using Comparative Descriptors
- Authors: Hankyeol Lee, Gawon Seo, Wonseok Choi, Geunyoung Jung, Kyungwoo Song, Jiyoung Jung
- Abstract summary: We introduce a novel concept of comparative descriptors.
These descriptors emphasize the unique features of a target class against its most similar classes, enhancing differentiation.
An additional filtering process ensures that these descriptors are closer to the image embeddings in the CLIP space.
- Score: 13.094102298155736
- License:
- Abstract: The performance of vision-language models (VLMs) such as CLIP on visual classification tasks has been enhanced by leveraging semantic knowledge from large language models (LLMs), including GPT. Recent studies have shown that in zero-shot classification tasks, descriptors incorporating additional cues, high-level concepts, or even random characters often outperform those using only the category name. In many classification tasks, while the top-1 accuracy may be relatively low, the top-5 accuracy is often significantly higher. This gap implies that most misclassifications occur among a few similar classes, highlighting the model's difficulty in distinguishing classes with subtle differences. To address this challenge, we introduce the novel concept of comparative descriptors. These descriptors emphasize the unique features of a target class against its most similar classes, enhancing differentiation. By generating and integrating these comparative descriptors into the classification framework, we refine the semantic focus and improve classification accuracy. An additional filtering process ensures that these descriptors are closer to the image embeddings in the CLIP space, further enhancing performance. Our approach demonstrates improved accuracy and robustness in visual classification tasks by addressing the specific challenge of subtle inter-class differences.
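The abstract outlines a four-step pipeline: score classes with descriptor-augmented prompts, identify the most similar (confusable) classes, generate comparative descriptors against them with an LLM, and keep only descriptors whose text embeddings lie close to the image embedding in CLIP space. The sketch below is a minimal illustration of that loop, not the authors' released code; the CLIP checkpoint, the prompt templates, the llm_comparative_descriptors stand-in for a GPT call, and the top-m filtering rule are all illustrative assumptions.

```python
# Minimal sketch of comparative-descriptor zero-shot classification with CLIP.
# Assumptions (not from the paper's code): checkpoint name, prompt templates,
# the llm_comparative_descriptors placeholder, and the top-m descriptor filter.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def embed_texts(texts):
    inputs = processor(text=texts, return_tensors="pt", padding=True, truncation=True)
    feats = model.get_text_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

@torch.no_grad()
def embed_image(image):  # image: a PIL.Image
    inputs = processor(images=image, return_tensors="pt")
    feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

def class_scores(img_emb, descriptors_by_class):
    """Score each class by the mean cosine similarity of its descriptor prompts."""
    scores = {}
    for cls, descs in descriptors_by_class.items():
        txt = embed_texts([f"a photo of a {cls}, {d}" for d in descs])
        scores[cls] = (img_emb @ txt.T).mean().item()
    return scores

def llm_comparative_descriptors(target, confusable):
    # Hypothetical stand-in for an LLM prompt such as
    # "List visual features that distinguish a {target} from a {confusable}."
    return [f"distinguishing features of a {target} compared to a {confusable}"]

def classify(image, descriptors_by_class, top_k=5, keep=3):
    img_emb = embed_image(image)

    # 1) Baseline descriptor scoring; take the top-k most similar (confusable) classes.
    base = class_scores(img_emb, descriptors_by_class)
    candidates = sorted(base, key=base.get, reverse=True)[:top_k]

    # 2) Generate descriptors contrasting each candidate against the others, then
    # 3) keep only the descriptors whose CLIP text embedding is closest to the image.
    refined = {}
    for cls in candidates:
        comparative = []
        for other in candidates:
            if other != cls:
                comparative += llm_comparative_descriptors(cls, other)
        txt = embed_texts([f"a photo of a {cls}, {d}" for d in comparative])
        sims = (img_emb @ txt.T).squeeze(0)
        kept = sims.topk(min(keep, len(comparative))).values
        refined[cls] = kept.mean().item()

    # 4) Final prediction over the re-scored candidate set only.
    return max(refined, key=refined.get)
```

Averaging per-descriptor similarities follows the classification-by-description recipe listed under the related papers; only the confusable candidate set is re-scored, so classes that were never confused keep their original ranking.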
Related papers
- RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition [78.97487780589574]
Multimodal Large Language Models (MLLMs) struggle when classifying fine-grained categories.
This paper introduces a Retrieving And Ranking augmented method for MLLMs.
Our proposed approach not only addresses the inherent limitations in fine-grained recognition but also preserves the model's comprehensive knowledge base.
arXiv Detail & Related papers (2024-03-20T17:59:55Z) - LLMs as Visual Explainers: Advancing Image Classification with Evolving
Visual Descriptions [13.546494268784757]
We propose a framework that integrates large language models (LLMs) and vision-language models (VLMs) to find the optimal class descriptors.
Our training-free approach develops an LLM-based agent with an evolutionary optimization strategy to iteratively refine class descriptors.
arXiv Detail & Related papers (2023-11-20T16:37:45Z) - Mitigating Word Bias in Zero-shot Prompt-based Classifiers [55.60306377044225]
We show that matching class priors correlates strongly with the oracle upper bound performance.
We also demonstrate large consistent performance gains for prompt settings over a range of NLP tasks.
arXiv Detail & Related papers (2023-09-10T10:57:41Z) - Waffling around for Performance: Visual Classification with Random Words
and Broad Concepts [121.60918966567657]
WaffleCLIP is a framework for zero-shot visual classification which simply replaces LLM-generated descriptors with random character and word descriptors.
We conduct an extensive experimental study on the impact and shortcomings of additional semantics introduced with LLM-generated descriptors.
arXiv Detail & Related papers (2023-06-12T17:59:48Z) - Visual Classification via Description from Large Language Models [23.932495654407425]
Vision-language models (VLMs) have shown promising performance on a variety of recognition tasks.
We present an alternative framework for classification with VLMs, which we call classification by description.
arXiv Detail & Related papers (2022-10-13T17:03:46Z) - Fine-Grained Visual Classification using Self Assessment Classifier [12.596520707449027]
Extracting discriminative features plays a crucial role in the fine-grained visual classification task.
In this paper, we introduce a Self Assessment Classifier, which simultaneously leverages the image representation and the top-k predicted classes.
We show that our method achieves new state-of-the-art results on CUB200-2011, Stanford Dog, and FGVC Aircraft datasets.
arXiv Detail & Related papers (2022-05-21T07:41:27Z) - Comparison Knowledge Translation for Generalizable Image Classification [31.530232003512957]
We build a generalizable framework that emulates the human recognition mechanism in the image classification task.
We put forward a Comparison Classification Translation Network (CCT-Net), which comprises a comparison classifier and a matching discriminator.
CCT-Net achieves surprising generalization ability on unseen categories and SOTA performance on target categories.
arXiv Detail & Related papers (2022-05-07T11:05:18Z) - On Guiding Visual Attention with Language Specification [76.08326100891571]
We use high-level language specification as advice for constraining the classification evidence to task-relevant features rather than distractors.
We show that supervising spatial attention in this way improves performance on classification tasks with biased and noisy data.
arXiv Detail & Related papers (2022-02-17T22:40:19Z) - CLLD: Contrastive Learning with Label Distance for Text Classificatioin [0.6299766708197883]
We propose Contrastive Learning with Label Distance (CLLD) for learning contrastive classes.
CLLD preserves flexibility over the subtle differences that lead to different label assignments.
Our experiments suggest that the learned label distance relieves the adversarial relationships between closely related classes.
arXiv Detail & Related papers (2021-10-25T07:07:14Z) - Multi-Label Image Classification with Contrastive Learning [57.47567461616912]
We show that a direct application of contrastive learning yields little improvement in multi-label cases.
We propose a novel framework for multi-label classification with contrastive learning in a fully supervised setting.
arXiv Detail & Related papers (2021-07-24T15:00:47Z) - Learning and Evaluating Representations for Deep One-class
Classification [59.095144932794646]
We present a two-stage framework for deep one-class classification.
We first learn self-supervised representations from one-class data, and then build one-class classifiers on the learned representations; a minimal sketch of this two-stage recipe appears after this list.
In experiments, we demonstrate state-of-the-art performance on visual domain one-class classification benchmarks.
arXiv Detail & Related papers (2020-11-04T23:33:41Z)