Exploring Vision-Language Models for Imbalanced Learning
- URL: http://arxiv.org/abs/2304.01457v2
- Date: Wed, 21 Jun 2023 15:44:19 GMT
- Title: Exploring Vision-Language Models for Imbalanced Learning
- Authors: Yidong Wang, Zhuohao Yu, Jindong Wang, Qiang Heng, Hao Chen, Wei Ye,
Rui Xie, Xing Xie, Shikun Zhang
- Abstract summary: Vision-Language models (VLMs) that use contrastive language-image pre-training have shown promising zero-shot classification performance.
Our study highlights the significance of imbalanced learning algorithms in the face of VLMs pre-trained on massive data.
- Score: 29.235472353759388
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision-Language models (VLMs) that use contrastive language-image
pre-training have shown promising zero-shot classification performance.
However, their performance on imbalanced datasets, where the distribution of
classes in the training data is skewed, is relatively poor, leading to poor
performance on minority classes. For instance, CLIP achieved only 5%
accuracy on the iNaturalist18 dataset. We propose to add a lightweight decoder
to VLMs to avoid the OOM (out-of-memory) problem caused by the large number of
classes and to capture nuanced features for tail classes. Then, we explore
improvements to VLMs using prompt tuning, fine-tuning, and imbalanced learning
algorithms such as Focal Loss, Balanced Softmax, and Distribution Alignment.
Experiments demonstrate that the performance of VLMs can be further boosted
when they are combined with the decoder and imbalanced learning methods.
Specifically, our improved VLMs significantly outperform zero-shot
classification, with average accuracy gains of 6.58%, 69.82%, and 6.17% on
ImageNet-LT, iNaturalist18, and Places-LT, respectively. We further analyze
the influence of pre-training data size, backbones, and training cost. Our
study highlights the significance of imbalanced learning algorithms in the
face of VLMs pre-trained on massive data. We release our code at
https://github.com/Imbalance-VLM/Imbalance-VLM.
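A minimal sketch of the recipe described in the abstract, assuming image features from a frozen CLIP-style encoder: a small trainable decoder head is attached on top of the features and trained with a Balanced Softmax loss. The names (`LightweightDecoder`, `balanced_softmax_loss`), the MLP architecture, the 768-dimensional feature size, and the placeholder tensors are illustrative assumptions, not the released Imbalance-VLM implementation; see the repository linked above for the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightweightDecoder(nn.Module):
    """Small MLP head on top of frozen VLM image features.

    Stands in for the lightweight decoder described in the abstract; the
    exact layer sizes and normalization are assumptions, not the authors'
    released architecture.
    """
    def __init__(self, feat_dim: int, num_classes: int, hidden_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(feat_dim),
            nn.Linear(feat_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.net(feats)

def balanced_softmax_loss(logits: torch.Tensor,
                          targets: torch.Tensor,
                          class_counts: torch.Tensor) -> torch.Tensor:
    """Balanced Softmax: shift logits by log class frequency before the
    cross-entropy so training compensates for the skewed class prior."""
    adjusted = logits + torch.log(class_counts.float().clamp(min=1.0))
    return F.cross_entropy(adjusted, targets)

# Usage with placeholder tensors; in practice `image_features` would come
# from a frozen CLIP-style image encoder and `class_counts` from the
# training set's label histogram.
num_classes, feat_dim = 8142, 768            # iNaturalist18-sized label space
class_counts = torch.randint(1, 1000, (num_classes,))
decoder = LightweightDecoder(feat_dim, num_classes)

image_features = torch.randn(32, feat_dim)   # stand-in for encoder outputs
labels = torch.randint(0, num_classes, (32,))
loss = balanced_softmax_loss(decoder(image_features), labels, class_counts)
loss.backward()
```

Focal Loss or Distribution Alignment could be swapped in for `balanced_softmax_loss` in the same training loop, which is one way to read the abstract's comparison of imbalanced learning methods.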
Related papers
- Conformal-in-the-Loop for Learning with Imbalanced Noisy Data [5.69777817429044]
Class imbalance and label noise are pervasive in large-scale datasets.
Much of machine learning research assumes well-labeled, balanced data, which rarely reflects real world conditions.
We propose Conformal-in-the-Loop (CitL), a novel training framework that addresses both challenges with a conformal prediction-based approach.
arXiv Detail & Related papers (2024-11-04T17:09:58Z)
- Entropy Law: The Story Behind Data Compression and LLM Performance [115.70395740286422]
We find that model performance is negatively correlated with the compression ratio of the training data, which usually yields a lower training loss.
Based on the findings of the entropy law, we propose an efficient and universal data selection method.
We also present an interesting application of entropy law that can detect potential performance risks at the beginning of model training.
arXiv Detail & Related papers (2024-07-09T08:14:29Z)
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion: adaptively setting the label smoothing value during training according to the uncertainty of individual samples (a minimal sketch of this idea appears after this list).
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
- Why are Visually-Grounded Language Models Bad at Image Classification? [39.76294811955341]
We revisit the image classification task using visually-grounded language models (VLMs) such as GPT-4V and LLaVA.
We find that existing proprietary and public VLMs significantly underperform CLIP on standard image classification benchmarks like ImageNet.
Our analysis reveals that the primary cause is data-related: critical information for image classification is encoded in the VLM's latent space but can only be effectively decoded with enough training data.
arXiv Detail & Related papers (2024-05-28T17:57:06Z)
- Class-Imbalanced Semi-Supervised Learning for Large-Scale Point Cloud Semantic Segmentation via Decoupling Optimization [64.36097398869774]
Semi-supervised learning (SSL) has been an active research topic for large-scale 3D scene understanding.
The existing SSL-based methods suffer from severe training bias due to class imbalance and long-tail distributions of the point cloud data.
We introduce a new decoupling optimization framework, which disentangles feature representation learning and the classifier in an alternating optimization manner to shift the biased decision boundary effectively.
arXiv Detail & Related papers (2024-01-13T04:16:40Z)
- Addressing Class Variable Imbalance in Federated Semi-supervised Learning [10.542178602467885]
We propose Federated Semi-supervised Learning for Class Variable Imbalance (FCVI) to solve class variable imbalance.
FCVI is used to mitigate the data imbalance caused by changes in the number of classes.
Our scheme is shown to be significantly better than baseline methods, while maintaining client privacy.
arXiv Detail & Related papers (2023-03-21T12:50:17Z)
- Efficient Augmentation for Imbalanced Deep Learning [8.38844520504124]
We study a convolutional neural network's internal representation of imbalanced image data.
We measure the generalization gap between a model's feature embeddings in the training and test sets, showing that the gap is wider for minority classes.
This insight enables us to design an efficient three-phase CNN training framework for imbalanced data.
arXiv Detail & Related papers (2022-07-13T09:43:17Z)
- ZeroGen$^+$: Self-Guided High-Quality Data Generation in Efficient Zero-Shot Learning [97.2907428983142]
ZeroGen attempts to use only a PLM to generate data and train a tiny model without relying on task-specific annotation.
We propose a noise-robust bi-level re-weighting framework which is able to learn the per-sample weights measuring the data quality without requiring any gold data.
arXiv Detail & Related papers (2022-05-25T11:38:48Z)
- How Sensitive are Meta-Learners to Dataset Imbalance? [13.60699610822265]
We show that meta-learning methods are more robust against meta-dataset imbalance than against imbalance at the task level.
Overall, these results highlight an implicit strength of meta-learning algorithms, which are capable of learning generalizable features under dataset imbalance and domain shift.
arXiv Detail & Related papers (2021-04-12T10:47:42Z)
- Deep F-measure Maximization for End-to-End Speech Understanding [52.36496114728355]
We propose a differentiable approximation to the F-measure and train the network with this objective using standard backpropagation.
We perform experiments on two standard fairness datasets, Adult and Communities and Crime, as well as on speech-to-intent detection on the ATIS dataset and speech-to-image concept classification on the Speech-COCO dataset.
In all four of these tasks, the F-measure objective yields improved micro-F1 scores, with absolute improvements of up to 8% compared to models trained with the cross-entropy loss function.
arXiv Detail & Related papers (2020-08-08T03:02:27Z)
- Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning [59.65752299209042]
We investigate learning a ConvNet classifier under such an imbalanced scenario.
We find that the ConvNet significantly overfits the minority classes.
We propose to incorporate class-dependent temperatures (CDT) when training the ConvNet.
arXiv Detail & Related papers (2020-01-06T03:52:11Z)
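The uncertainty-aware learning (UAL) entry above describes adaptively setting the label smoothing value per training sample according to its uncertainty. Below is a minimal sketch of that idea, assuming each sample comes with an uncertainty score in [0, 1]; the function name `ual_loss`, the linear scaling rule, and the `max_smoothing` cap are illustrative assumptions rather than the UAL authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def ual_loss(logits: torch.Tensor,
             targets: torch.Tensor,
             uncertainty: torch.Tensor,
             max_smoothing: float = 0.2) -> torch.Tensor:
    """Cross-entropy with per-sample label smoothing scaled by uncertainty.

    More uncertain samples receive softer targets; confident samples are
    trained with (nearly) one-hot targets.
    """
    num_classes = logits.size(-1)
    # Per-sample smoothing strength in [0, max_smoothing], shape (B, 1).
    smoothing = (uncertainty.clamp(0.0, 1.0) * max_smoothing).unsqueeze(-1)
    one_hot = F.one_hot(targets, num_classes).float()
    soft_targets = one_hot * (1.0 - smoothing) + smoothing / num_classes
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()

# Usage with placeholder tensors; in practice the uncertainty score could
# come from predictive entropy or an ensemble.
logits = torch.randn(16, 10)
targets = torch.randint(0, 10, (16,))
uncertainty = torch.rand(16)
loss = ual_loss(logits, targets, uncertainty)
```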