Uniformly Distributed Category Prototype-Guided Vision-Language
Framework for Long-Tail Recognition
- URL: http://arxiv.org/abs/2308.12522v2
- Date: Mon, 6 Nov 2023 16:16:02 GMT
- Title: Uniformly Distributed Category Prototype-Guided Vision-Language
Framework for Long-Tail Recognition
- Authors: Siming Fu, Xiaoxuan He, Xinpeng Ding, Yuchen Cao, Hualiang Wang
- Abstract summary: We propose a uniformly distributed category prototype-guided vision-language framework to effectively mitigate feature space bias caused by data imbalance.
Our method outperforms previous vision-language long-tailed learning methods by a large margin and achieves state-of-the-art performance.
- Score: 11.110124286206467
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, large-scale pre-trained vision-language models have shown
benefits for alleviating class imbalance in long-tailed recognition. However,
the long-tailed data distribution can corrupt the representation space, where
the distance between head and tail categories is much larger than the distance
between two tail categories. This uneven feature space distribution causes the
model to exhibit unclear and inseparable decision boundaries on the uniformly
distributed test set, which lowers its performance. To address these
challenges, we propose a uniformly distributed category prototype-guided
vision-language framework to effectively mitigate feature space bias caused by
data imbalance.
Specifically, we generate a set of category prototypes uniformly distributed
on a hypersphere. Our category prototype-guided mechanism for image-text
matching drives the features of different classes to converge to these
distinct, uniformly distributed prototypes, maintaining a uniform distribution
in the feature space and improving class boundaries. Additionally, our proposed
irrelevant text filtering and attribute enhancement module allows the model to
ignore irrelevant noisy text and focus more on key attribute information,
thereby enhancing the robustness of our framework. In the image recognition
fine-tuning stage, to address the positive bias problem of the learnable
classifier, we design the class feature prototype-guided classifier, which
compensates for the performance of tail classes while maintaining the
performance of head classes. Our method outperforms previous vision-language
long-tailed learning methods by a large margin and achieves state-of-the-art
performance.
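The abstract does not spell out how the uniformly distributed category prototypes are constructed. One common way to approximate a uniform arrangement on the unit hypersphere is projected gradient descent that minimizes pairwise cosine similarity; the sketch below illustrates that idea only, and the function name and hyperparameters are hypothetical rather than taken from the paper.

```python
import numpy as np

def uniform_prototypes(num_classes: int, dim: int,
                       steps: int = 500, lr: float = 0.1, seed: int = 0):
    """Spread `num_classes` unit vectors over the hypersphere by gradient
    descent on the sum of squared pairwise cosine similarities.
    Hypothetical sketch; the paper's exact construction may differ."""
    rng = np.random.default_rng(seed)
    P = rng.normal(size=(num_classes, dim))
    P /= np.linalg.norm(P, axis=1, keepdims=True)
    for _ in range(steps):
        sim = P @ P.T                      # pairwise cosine similarities
        np.fill_diagonal(sim, 0.0)         # ignore self-similarity
        grad = sim @ P                     # gradient (up to a constant factor)
        P -= lr * grad                     # push prototypes apart
        P /= np.linalg.norm(P, axis=1, keepdims=True)  # project back to sphere
    return P
```

Once fixed, such prototypes can serve as the targets that image and text features of each class are pulled toward during matching, which is what keeps the learned feature space uniformly distributed.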
Related papers
- Simple-Sampling and Hard-Mixup with Prototypes to Rebalance Contrastive Learning for Text Classification [11.072083437769093]
We propose a novel model named SharpReCL for imbalanced text classification tasks.
Our model even outperforms popular large language models across several datasets.
arXiv Detail & Related papers (2024-05-19T11:33:49Z)
- Subclass-balancing Contrastive Learning for Long-tailed Recognition [38.31221755013738]
Long-tailed recognition with imbalanced class distribution naturally emerges in practical machine learning applications.
We propose a novel "subclass-balancing contrastive learning" approach that clusters each head class into multiple subclasses whose sizes are similar to those of the tail classes.
We evaluate SBCL on a range of long-tailed benchmark datasets, where it achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-06-28T05:08:43Z)
- Unicom: Universal and Compact Representation Learning for Image Retrieval [65.96296089560421]
We cluster the large-scale LAION400M into one million pseudo classes based on the joint textual and visual features extracted by the CLIP model.
To alleviate the inter-class conflict introduced by this clustering, we randomly select a subset of inter-class prototypes to construct the margin-based softmax loss.
Our method significantly outperforms state-of-the-art unsupervised and supervised image retrieval approaches on multiple benchmarks.
arXiv Detail & Related papers (2023-04-12T14:25:52Z)
- Constructing Balance from Imbalance for Long-tailed Image Recognition [50.6210415377178]
The imbalance between majority (head) classes and minority (tail) classes severely skews the data-driven deep neural networks.
Previous methods tackle data imbalance from the viewpoints of data distribution, feature space, and model design.
We propose a concise paradigm by progressively adjusting label space and dividing the head classes and tail classes.
Our proposed model also provides a feature evaluation method and paves the way for long-tailed feature learning.
arXiv Detail & Related papers (2022-08-04T10:22:24Z)
- CAD: Co-Adapting Discriminative Features for Improved Few-Shot Classification [11.894289991529496]
Few-shot classification is a challenging problem that aims to learn a model that can adapt to unseen classes given a few labeled samples.
Recent approaches pre-train a feature extractor, and then fine-tune for episodic meta-learning.
We propose a strategy to cross-attend and re-weight discriminative features for few-shot classification.
arXiv Detail & Related papers (2022-03-25T06:14:51Z)
- Dual Prototypical Contrastive Learning for Few-shot Semantic Segmentation [55.339405417090084]
We propose a dual prototypical contrastive learning approach tailored to the few-shot semantic segmentation (FSS) task.
The main idea is to make the prototypes more discriminative by increasing inter-class distance while reducing intra-class distance in the prototype feature space.
We demonstrate that the proposed dual contrastive learning approach outperforms state-of-the-art FSS methods on PASCAL-5i and COCO-20i datasets.
arXiv Detail & Related papers (2021-11-09T08:14:50Z)
- Learning Debiased and Disentangled Representations for Semantic Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
arXiv Detail & Related papers (2021-10-31T16:15:09Z) - Learning and Evaluating Representations for Deep One-class
Classification [59.095144932794646]
We present a two-stage framework for deep one-class classification.
We first learn self-supervised representations from one-class data, and then build one-class classifiers on learned representations.
In experiments, we demonstrate state-of-the-art performance on visual domain one-class classification benchmarks.
arXiv Detail & Related papers (2020-11-04T23:33:41Z) - Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition
from a Domain Adaptation Perspective [98.70226503904402]
Object frequency in the real world often follows a power law, leading to a mismatch between the long-tailed class distributions models are trained on and the expectation that they perform well on all classes.
We propose to augment the classic class-balanced learning by explicitly estimating the differences between the class-conditioned distributions with a meta-learning approach.
arXiv Detail & Related papers (2020-03-24T11:28:42Z)
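Among the related papers above, the subclass-balancing idea (clustering each head class into subclasses of roughly tail-class size) is concrete enough to sketch. Below is a minimal NumPy illustration: a plain k-means stands in for whatever clustering that paper actually uses, and the function name, initialization, and metric are all hypothetical.

```python
import numpy as np

def split_head_class(features: np.ndarray, tail_size: int,
                     iters: int = 20) -> np.ndarray:
    """Cluster one head class's feature vectors into roughly tail-sized
    subclasses with a tiny k-means. Hypothetical sketch of the
    subclass-balancing clustering step."""
    n = len(features)
    k = max(1, n // tail_size)                 # number of subclasses
    # simple deterministic spread-out initialization
    centers = features[np.linspace(0, n - 1, k, dtype=int)].copy()
    labels = np.zeros(n, dtype=int)
    for _ in range(iters):
        # assign every sample to its nearest center
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute centers, keeping the old one if a cluster empties
        for c in range(k):
            members = features[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return labels
```

The resulting subclass labels would then replace the original class label inside the contrastive loss, so that every (sub)class contributes a comparable number of samples.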
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.