Uniformly Distributed Category Prototype-Guided Vision-Language
Framework for Long-Tail Recognition
- URL: http://arxiv.org/abs/2308.12522v2
- Date: Mon, 6 Nov 2023 16:16:02 GMT
- Title: Uniformly Distributed Category Prototype-Guided Vision-Language
Framework for Long-Tail Recognition
- Authors: Siming Fu, Xiaoxuan He, Xinpeng Ding, Yuchen Cao, Hualiang Wang
- Abstract summary: We propose a uniformly distributed category prototype-guided vision-language framework to effectively mitigate feature space bias caused by data imbalance.
Our method outperforms previous vision-language long-tailed learning methods by a large margin and achieves state-of-the-art performance.
- Score: 11.110124286206467
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, large-scale pre-trained vision-language models have shown
benefits for alleviating class imbalance in long-tailed recognition. However,
the long-tailed data distribution can corrupt the representation space, where
the distance between head and tail categories is much larger than the distance
between two tail categories. This uneven feature space distribution causes the
model to exhibit unclear and inseparable decision boundaries on the uniformly
distributed test set, which lowers its performance. To address these
challenges, we propose a uniformly distributed category prototype-guided
vision-language framework to effectively mitigate feature space bias caused by
data imbalance.
Specifically, we generate a set of category prototypes uniformly distributed
on a hypersphere. Our category prototype-guided mechanism for image-text
matching drives the features of different classes to converge to these
distinct, uniformly distributed prototypes, maintaining a uniform distribution
in the feature space and improving class boundaries. Additionally, our proposed
irrelevant text filtering and attribute enhancement module allows the model to
ignore irrelevant noisy text and focus more on key attribute information,
thereby enhancing the robustness of our framework. In the image recognition
fine-tuning stage, to address the positive bias problem of the learnable
classifier, we design the class feature prototype-guided classifier, which
compensates for the performance of tail classes while maintaining the
performance of head classes. Our method outperforms previous vision-language
long-tailed learning methods by a large margin and achieves state-of-the-art
performance.
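The abstract does not spell out how the uniformly distributed category prototypes are constructed. One common way to approximate a uniform arrangement on the unit hypersphere is projected gradient descent that minimizes pairwise cosine similarity; the sketch below illustrates that idea only, and the function name and hyperparameters are hypothetical rather than taken from the paper.

```python
import numpy as np

def uniform_prototypes(num_classes: int, dim: int,
                       steps: int = 500, lr: float = 0.1, seed: int = 0):
    """Spread `num_classes` unit vectors over the hypersphere by gradient
    descent on the sum of squared pairwise cosine similarities.
    Hypothetical sketch; the paper's exact construction may differ."""
    rng = np.random.default_rng(seed)
    P = rng.normal(size=(num_classes, dim))
    P /= np.linalg.norm(P, axis=1, keepdims=True)
    for _ in range(steps):
        sim = P @ P.T                      # pairwise cosine similarities
        np.fill_diagonal(sim, 0.0)         # ignore self-similarity
        grad = sim @ P                     # gradient (up to a constant factor)
        P -= lr * grad                     # push prototypes apart
        P /= np.linalg.norm(P, axis=1, keepdims=True)  # project back to sphere
    return P
```

Once fixed, such prototypes can serve as the targets that image and text features of each class are pulled toward during matching, which is what keeps the learned feature space uniformly distributed.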
Related papers
- Simple-Sampling and Hard-Mixup with Prototypes to Rebalance Contrastive Learning for Text Classification [11.072083437769093]
We propose a novel model named SharpReCL for imbalanced text classification tasks.
Our model even outperforms popular large language models across several datasets.
arXiv Detail & Related papers (2024-05-19T11:33:49Z)
- Subclass-balancing Contrastive Learning for Long-tailed Recognition [38.31221755013738]
Long-tailed recognition with imbalanced class distribution naturally emerges in practical machine learning applications.
We propose a novel "subclass-balancing contrastive learning" approach that clusters each head class into multiple subclasses whose sizes are similar to those of the tail classes.
We evaluate SBCL on a range of long-tailed benchmark datasets, where it achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-06-28T05:08:43Z)
- Unicom: Universal and Compact Representation Learning for Image Retrieval [65.96296089560421]
We cluster the large-scale LAION400M into one million pseudo classes based on the joint textual and visual features extracted by the CLIP model.
To alleviate the inter-class conflict introduced by this clustering, we randomly select a subset of inter-class prototypes to construct the margin-based softmax loss.
Our method significantly outperforms state-of-the-art unsupervised and supervised image retrieval approaches on multiple benchmarks.
arXiv Detail & Related papers (2023-04-12T14:25:52Z)
- Constructing Balance from Imbalance for Long-tailed Image Recognition [50.6210415377178]
The imbalance between majority (head) classes and minority (tail) classes severely skews the data-driven deep neural networks.
Previous methods tackle data imbalance from the viewpoints of data distribution, feature space, and model design.
We propose a concise paradigm by progressively adjusting label space and dividing the head classes and tail classes.
Our proposed model also provides a feature evaluation method and paves the way for long-tailed feature learning.
arXiv Detail & Related papers (2022-08-04T10:22:24Z)
- CAD: Co-Adapting Discriminative Features for Improved Few-Shot Classification [11.894289991529496]
Few-shot classification is a challenging problem that aims to learn a model that can adapt to unseen classes given a few labeled samples.
Recent approaches pre-train a feature extractor, and then fine-tune for episodic meta-learning.
We propose a strategy to cross-attend and re-weight discriminative features for few-shot classification.
arXiv Detail & Related papers (2022-03-25T06:14:51Z)
- Dual Prototypical Contrastive Learning for Few-shot Semantic Segmentation [55.339405417090084]
We propose a dual prototypical contrastive learning approach tailored to the few-shot semantic segmentation (FSS) task.
The main idea is to make the prototypes more discriminative by increasing inter-class distance while reducing intra-class distance in the prototype feature space.
We demonstrate that the proposed dual contrastive learning approach outperforms state-of-the-art FSS methods on PASCAL-5i and COCO-20i datasets.
arXiv Detail & Related papers (2021-11-09T08:14:50Z)
- Learning Debiased and Disentangled Representations for Semantic Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
arXiv Detail & Related papers (2021-10-31T16:15:09Z) - Learning and Evaluating Representations for Deep One-class
Classification [59.095144932794646]
We present a two-stage framework for deep one-class classification.
We first learn self-supervised representations from one-class data, and then build one-class classifiers on learned representations.
In experiments, we demonstrate state-of-the-art performance on visual domain one-class classification benchmarks.
arXiv Detail & Related papers (2020-11-04T23:33:41Z) - Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition
from a Domain Adaptation Perspective [98.70226503904402]
Object frequency in the real world often follows a power law, leading to a mismatch between the long-tailed class distributions models are trained on and the expectation that they perform well on all classes.
We propose to augment the classic class-balanced learning by explicitly estimating the differences between the class-conditioned distributions with a meta-learning approach.
arXiv Detail & Related papers (2020-03-24T11:28:42Z)
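Among the related papers above, the subclass-balancing idea (clustering each head class into subclasses of roughly tail-class size) is concrete enough to sketch. Below is a minimal NumPy illustration: a plain k-means stands in for whatever clustering that paper actually uses, and the function name, initialization, and metric are all hypothetical.

```python
import numpy as np

def split_head_class(features: np.ndarray, tail_size: int,
                     iters: int = 20) -> np.ndarray:
    """Cluster one head class's feature vectors into roughly tail-sized
    subclasses with a tiny k-means. Hypothetical sketch of the
    subclass-balancing clustering step."""
    n = len(features)
    k = max(1, n // tail_size)                 # number of subclasses
    # simple deterministic spread-out initialization
    centers = features[np.linspace(0, n - 1, k, dtype=int)].copy()
    labels = np.zeros(n, dtype=int)
    for _ in range(iters):
        # assign every sample to its nearest center
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute centers, keeping the old one if a cluster empties
        for c in range(k):
            members = features[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return labels
```

The resulting subclass labels would then replace the original class label inside the contrastive loss, so that every (sub)class contributes a comparable number of samples.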
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.