A Theoretical Analysis of the Learning Dynamics under Class Imbalance
- URL: http://arxiv.org/abs/2207.00391v4
- Date: Mon, 19 Feb 2024 10:29:39 GMT
- Title: A Theoretical Analysis of the Learning Dynamics under Class Imbalance
- Authors: Emanuele Francazi, Marco Baity-Jesi, Aurelien Lucchi
- Abstract summary: We show that the learning curves for minority and majority classes follow sub-optimal trajectories when training with a gradient-based optimizer.
This slowdown is related to the imbalance ratio and can be traced back to a competition between the optimization of different classes.
We find that GD is not guaranteed to decrease the loss for each class but that this problem can be addressed by performing a per-class normalization of the gradient.
- Score: 0.10231119246773925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data imbalance is a common problem in machine learning that can have a
critical effect on the performance of a model. Various solutions exist but
their impact on the convergence of the learning dynamics is not understood.
Here, we elucidate the significant negative impact of data imbalance on
learning, showing that the learning curves for minority and majority classes
follow sub-optimal trajectories when training with a gradient-based optimizer.
This slowdown is related to the imbalance ratio and can be traced back to a
competition between the optimization of different classes. Our main
contribution is the analysis of the convergence of full-batch (GD) and
stochastic gradient descent (SGD), and of variants that renormalize the
contribution of each per-class gradient. We find that GD is not guaranteed to
decrease the loss for each class but that this problem can be addressed by
performing a per-class normalization of the gradient. With SGD, class imbalance
has an additional effect on the direction of the gradients: the minority class
suffers from a higher directional noise, which reduces the effectiveness of the
per-class gradient normalization. Our findings not only allow us to understand
the potential and limitations of strategies involving the per-class gradients,
but also the reason for the effectiveness of previously used solutions for
class imbalance such as oversampling.
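To make the per-class normalization idea above concrete, here is a minimal sketch assuming a PyTorch classifier trained with full-batch GD; the helper name `per_class_normalized_step` and the plain descent update are illustrative assumptions, not the authors' implementation.
```python
# Illustrative sketch (not the authors' code): one full-batch GD step in which
# the gradient of each class's loss is renormalized before being summed, so
# that no single (majority) class dominates the descent direction.
import torch


def per_class_normalized_step(model, loss_fn, x, y, num_classes, lr=0.1, eps=1e-12):
    """Hypothetical helper: GD update built from per-class normalized gradients."""
    params = [p for p in model.parameters() if p.requires_grad]
    update = [torch.zeros_like(p) for p in params]

    for c in range(num_classes):
        mask = (y == c)
        if mask.sum() == 0:
            continue
        # Loss restricted to the samples of class c.
        loss_c = loss_fn(model(x[mask]), y[mask])
        grads_c = torch.autograd.grad(loss_c, params)
        # Rescale so every class contributes a direction of comparable
        # magnitude, regardless of how many samples it has.
        norm_c = torch.sqrt(sum(g.pow(2).sum() for g in grads_c)) + eps
        for u, g in zip(update, grads_c):
            u += g / norm_c

    with torch.no_grad():
        for p, u in zip(params, update):
            p -= lr * u
```
Under full-batch GD such a rescaling keeps the majority-class gradient from dictating the update; as the abstract notes, with SGD the minority class still suffers from higher directional noise, which limits how much per-class normalization can help.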
Related papers
- Gradient-based Class Weighting for Unsupervised Domain Adaptation in Dense Prediction Visual Tasks [3.776249047528669]
This paper proposes a class-imbalance mitigation strategy that incorporates class weights into the UDA learning losses.
The novelty lies in estimating these weights dynamically through the loss gradient, which defines Gradient-based Class Weighting (GBW) learning (a rough sketch of the idea appears after this list).
GBW naturally increases the contribution of classes whose learning is hindered by classes with larger representation.
arXiv Detail & Related papers (2024-07-01T14:34:25Z)
- Gradient-Aware Logit Adjustment Loss for Long-tailed Classifier [30.931850375858573]
In the real-world setting, data often follows a long-tailed distribution, where head classes contain significantly more training samples than tail classes.
We propose the Gradient-Aware Logit Adjustment (GALA) loss, which adjusts the logits based on accumulated gradients to balance the optimization process.
Our approach achieves top-1 accuracies of 48.5%, 41.4%, and 73.3% on popular long-tailed recognition benchmark datasets.
arXiv Detail & Related papers (2024-03-14T02:21:01Z)
- Gradient Reweighting: Towards Imbalanced Class-Incremental Learning [8.438092346233054]
Class-Incremental Learning (CIL) trains a model to continually recognize new classes from non-stationary data.
A major challenge of CIL arises when it is applied to real-world data characterized by non-uniform distributions.
We show that this dual imbalance issue causes skewed gradient updates with biased weights in FC layers, thus inducing over/under-fitting and catastrophic forgetting in CIL.
arXiv Detail & Related papers (2024-02-28T18:08:03Z)
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- Class Gradient Projection For Continual Learning [99.105266615448]
Catastrophic forgetting is one of the most critical challenges in Continual Learning (CL).
We propose Class Gradient Projection (CGP), which calculates the gradient subspace from individual classes rather than tasks.
arXiv Detail & Related papers (2023-11-25T02:45:56Z)
- Long-Tailed Learning as Multi-Objective Optimization [29.012779934262973]
We argue that the seesaw dilemma stems from the gradient imbalance between different classes.
We propose a Gradient-Balancing Grouping (GBG) strategy to gather the classes with similar gradient directions.
arXiv Detail & Related papers (2023-10-31T14:30:31Z)
- A Unified Generalization Analysis of Re-Weighting and Logit-Adjustment for Imbalanced Learning [129.63326990812234]
We propose a technique named data-dependent contraction to capture how modified losses handle different classes.
On top of this technique, a fine-grained generalization bound is established for imbalanced learning, which helps reveal the mystery of re-weighting and logit-adjustment.
arXiv Detail & Related papers (2023-10-07T09:15:08Z)
- The Equalization Losses: Gradient-Driven Training for Long-tailed Object Recognition [84.51875325962061]
We propose a gradient-driven training mechanism to tackle the long-tail problem.
We introduce a new family of gradient-driven loss functions, namely equalization losses.
Our method consistently outperforms the baseline models.
arXiv Detail & Related papers (2022-10-11T16:00:36Z)
- Neural Collapse Inspired Attraction-Repulsion-Balanced Loss for Imbalanced Learning [97.81549071978789]
We propose Attraction-Repulsion-Balanced Loss (ARB-Loss) to balance the different components of the gradients.
We perform experiments on large-scale classification and segmentation datasets, and our ARB-Loss achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-04-19T08:23:23Z)
- The Break-Even Point on Optimization Trajectories of Deep Neural Networks [64.7563588124004]
We argue for the existence of the "break-even" point on this trajectory.
We show that using a large learning rate in the initial phase of training reduces the variance of the gradient.
We also show that using a low learning rate results in bad conditioning of the loss surface even for a neural network with batch normalization layers.
arXiv Detail & Related papers (2020-02-21T22:55:51Z)
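As mentioned in the Gradient-based Class Weighting entry above, the sketch below illustrates one way to weight a cross-entropy loss by per-class gradient magnitudes. The inverse-norm weighting rule and the helper names are assumptions made for illustration, not the GBW paper's actual formulation.
```python
# Hypothetical illustration of gradient-based class weighting: classes whose
# per-class loss gradient is weak (their learning is being crowded out) get a
# larger weight in the total loss. The inverse-norm rule is an assumption.
import torch
import torch.nn.functional as F


def gradient_based_class_weights(model, x, y, num_classes, eps=1e-12):
    params = [p for p in model.parameters() if p.requires_grad]
    norms = torch.full((num_classes,), eps)
    for c in range(num_classes):
        mask = (y == c)
        if mask.sum() == 0:
            continue
        # Gradient norm of the loss restricted to class c.
        loss_c = F.cross_entropy(model(x[mask]), y[mask])
        grads = torch.autograd.grad(loss_c, params)
        norms[c] = torch.sqrt(sum(g.pow(2).sum() for g in grads)) + eps
    weights = 1.0 / norms
    return num_classes * weights / weights.sum()  # normalized to mean 1


def weighted_loss(model, x, y, weights):
    # `weights` must live on the same device as the logits.
    return F.cross_entropy(model(x), y, weight=weights)
```
In practice such weights would be re-estimated periodically during training; this echoes the main paper's finding that renormalizing per-class gradient contributions addresses GD's failure to decrease the loss for each class.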