Rethinking Class Imbalance in Machine Learning
- URL: http://arxiv.org/abs/2305.03900v1
- Date: Sat, 6 May 2023 02:36:39 GMT
- Title: Rethinking Class Imbalance in Machine Learning
- Authors: Ou Wu
- Abstract summary: Imbalance learning is a subfield of machine learning that focuses on learning tasks in the presence of class imbalance.
This study presents a new taxonomy of class imbalance in machine learning with a broader scope.
We propose a new logit perturbation-based imbalance learning loss for cases where proportion, variance, and distance imbalances exist simultaneously.
- Score: 1.4467794332678536
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imbalance learning is a subfield of machine learning that focuses on learning
tasks in the presence of class imbalance. Nearly all existing studies refer to
class imbalance as a proportion imbalance, where the proportion of training
samples in each class is not balanced. Ignoring proportion imbalance results
in unfairness between/among classes and poor generalization capability.
Previous literature has presented numerous theoretical/empirical analyses and
new methods for imbalance learning. This study presents a new taxonomy of
class imbalance in machine
learning with a broader scope. Four other types of imbalance, namely, variance,
distance, neighborhood, and quality imbalances between/among classes, which may
exist in machine learning tasks, are summarized. Two different levels of
imbalance, global and local, are also presented. Theoretical analysis
is used to illustrate the significant impact of the new imbalance types on
learning fairness. Moreover, our taxonomy and theoretical conclusions are used
to analyze the shortcomings of several classical methods. As an example, we
propose a new logit perturbation-based imbalance learning loss for cases where
proportion, variance, and distance imbalances exist simultaneously. Several
classical losses become special cases of our proposed method. Meta learning is
utilized to infer the hyper-parameters related to the three types of imbalance.
Experimental results on several benchmark corpora validate the effectiveness of
the proposed method.
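The abstract does not give the loss in closed form; as a minimal sketch of the general logit-perturbation idea (not the paper's exact formulation), one can add a learnable per-class offset to the logits before the cross-entropy, with `delta` standing in for the imbalance-related hyper-parameters that the paper infers via meta learning:

```python
import torch
import torch.nn.functional as F

class LogitPerturbationLoss(torch.nn.Module):
    """Minimal sketch of a logit-perturbation loss (not the paper's exact form).

    Each class receives an additive offset before the softmax; `delta`
    stands in for the proportion/variance/distance hyper-parameters that
    the paper infers with meta learning.
    """

    def __init__(self, num_classes: int):
        super().__init__()
        # Learnable per-class perturbation, initialized to zero (plain CE).
        self.delta = torch.nn.Parameter(torch.zeros(num_classes))

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # Shift every logit by its class offset, then apply standard CE.
        return F.cross_entropy(logits + self.delta, targets)
```

This also shows how classical losses arise as special cases: with `delta` fixed to the log class priors the loss reduces to the logit-adjusted loss, and with `delta = 0` to plain cross-entropy.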
Related papers
- Balanced Data, Imbalanced Spectra: Unveiling Class Disparities with Spectral Imbalance [11.924440950433658]
We introduce the concept of spectral imbalance in features as a potential source of class disparities.
We derive exact expressions for the per-class error in a high-dimensional mixture model setting.
We study this phenomenon in 11 different state-of-the-art pretrained encoders.
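The summary leaves the estimator implicit; assuming "spectral imbalance" refers to differences between the eigenvalue spectra of per-class feature covariance matrices, a minimal NumPy probe might look like this (the function name and setup are illustrative, not the paper's code):

```python
import numpy as np

def per_class_spectra(features: np.ndarray, labels: np.ndarray) -> dict:
    """Eigenvalue spectrum of each class's feature covariance matrix.

    Comparing these spectra across classes is one way to quantify
    'spectral imbalance', even when class proportions are balanced.
    """
    spectra = {}
    for c in np.unique(labels):
        X = features[labels == c]                   # samples of class c
        cov = np.cov(X, rowvar=False)               # d x d covariance estimate
        spectra[c] = np.linalg.eigvalsh(cov)[::-1]  # eigenvalues, descending
    return spectra
```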
arXiv Detail & Related papers (2024-02-18T23:59:54Z)
- Simplifying Neural Network Training Under Class Imbalance [77.39968702907817]
Real-world datasets are often highly class-imbalanced, which can adversely impact the performance of deep learning models.
The majority of research on training neural networks under class imbalance has focused on specialized loss functions, sampling techniques, or two-stage training procedures.
We demonstrate that simply tuning existing components of standard deep learning pipelines, such as the batch size, data augmentation, and label smoothing, can achieve state-of-the-art performance without any such specialized class imbalance methods.
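As a hypothetical illustration of such tuning with stock components only (the settings below are placeholders, not the paper's tuned values), PyTorch already provides class-balanced batch composition and label smoothing out of the box:

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

def balanced_loader(dataset, labels, batch_size=128):
    """Class-balanced batches using only standard PyTorch machinery."""
    labels = torch.as_tensor(labels)
    class_counts = torch.bincount(labels).float()
    sample_weights = 1.0 / class_counts[labels]  # inverse-frequency weights
    sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels))
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)

# Label smoothing is likewise a one-line change to the stock loss.
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)  # placeholder value
```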
arXiv Detail & Related papers (2023-12-05T05:52:44Z)
- Learning to Re-weight Examples with Optimal Transport for Imbalanced Classification [74.62203971625173]
Imbalanced data pose challenges for deep-learning-based classification models.
One of the most widely used approaches for tackling imbalanced data is re-weighting.
We propose a novel re-weighting method based on optimal transport (OT) from a distributional point of view.
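The OT machinery itself is beyond a one-liner; for orientation, the static baseline that such methods generalize, inverse-frequency re-weighting inside the loss, is sketched below (the OT approach instead learns per-example weights by matching a balanced target distribution):

```python
import torch
import torch.nn.functional as F

def inverse_frequency_ce(logits, targets, class_counts):
    """Classic re-weighting baseline, not the OT method itself.

    Each class is weighted by its inverse training frequency; the
    OT-based approach above learns the weights instead of fixing them.
    """
    weights = 1.0 / torch.as_tensor(class_counts, dtype=torch.float)
    weights = weights * len(class_counts) / weights.sum()  # mean weight = 1
    return F.cross_entropy(logits, targets, weight=weights)
```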
arXiv Detail & Related papers (2022-08-05T01:23:54Z)
- Neural Collapse Inspired Attraction-Repulsion-Balanced Loss for Imbalanced Learning [97.81549071978789]
We propose Attraction-Repulsion-Balanced Loss (ARB-Loss) to balance the different components of the gradients.
We perform experiments on the large-scale classification and segmentation datasets and our ARB-Loss can achieve state-of-the-art performance.
arXiv Detail & Related papers (2022-04-19T08:23:23Z)
- Towards Inter-class and Intra-class Imbalance in Class-imbalanced Learning [24.01370257373491]
Imbalanced Learning (IL) is an important problem that arises widely in data mining applications.
We present Duple-Balanced Ensemble (DUBE), a versatile ensemble learning framework.
Unlike prevailing methods, DUBE directly performs inter-class and intra-class balancing without relying on heavy distance-based computation.
arXiv Detail & Related papers (2021-11-24T20:50:54Z)
- Fairness-aware Class Imbalanced Learning [57.45784950421179]
We evaluate long-tail learning methods for tweet sentiment and occupation classification.
We extend a margin-loss based approach with methods to enforce fairness.
arXiv Detail & Related papers (2021-09-21T22:16:30Z)
- Energy Aligning for Biased Models [39.00256193731365]
Training on class-imbalanced data usually results in biased models that tend to predict samples into the majority classes.
We propose a simple and effective method named Energy Aligning to eliminate the bias.
Experimental results show that energy aligning can effectively alleviate the class-imbalance issue and outperform state-of-the-art methods on several benchmarks.
arXiv Detail & Related papers (2021-06-07T05:12:26Z)
- Few-Shot Learning with Class Imbalance [13.60699610822265]
Few-shot learning aims to train models on a limited number of labeled samples given in a support set in order to generalize to unseen samples from a query set.
In the standard setup, the support set contains an equal amount of data points for each class.
We present a detailed study of few-shot class-imbalance along three axes: meta-dataset vs. task imbalance, effect of different imbalance distributions (linear, step, random), and effect of rebalancing techniques.
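To make the three imbalance distributions concrete, here is one plausible reading of how per-class support-set sizes could be drawn (the paper's exact definitions may differ):

```python
import numpy as np

def imbalanced_counts(num_classes, k_min, k_max, scheme="linear", rng=None):
    """Per-class sample counts under the three imbalance schemes named above.

    A hedged sketch: 'linear' interpolates counts from k_min to k_max,
    'step' gives half the classes k_min and half k_max, and 'random'
    draws counts uniformly; the paper may define these differently.
    """
    rng = rng or np.random.default_rng()
    if scheme == "linear":
        return np.linspace(k_min, k_max, num_classes).round().astype(int)
    if scheme == "step":
        half = num_classes // 2
        return np.array([k_min] * half + [k_max] * (num_classes - half))
    if scheme == "random":
        return rng.integers(k_min, k_max + 1, size=num_classes)
    raise ValueError(f"unknown scheme: {scheme}")
```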
arXiv Detail & Related papers (2021-01-07T12:54:32Z)
- Counterfactual Representation Learning with Balancing Weights [74.67296491574318]
Key to causal inference with observational data is achieving balance in predictive features associated with each treatment type.
Recent literature has explored representation learning to achieve this goal.
We develop an algorithm for flexible, scalable and accurate estimation of causal effects.
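As a simple point of reference (not the paper's algorithm), the classical balancing-weight scheme is inverse propensity weighting, where each unit is weighted by the inverse probability of its observed treatment:

```python
import numpy as np

def ipw_weights(propensity: np.ndarray, treatment: np.ndarray) -> np.ndarray:
    """Classical inverse propensity weights, the simplest balancing scheme.

    `propensity` holds P(T=1 | x) estimates; treated units get 1/e(x),
    controls 1/(1 - e(x)). The referenced paper pursues balance through
    representation learning instead.
    """
    return np.where(treatment == 1, 1.0 / propensity, 1.0 / (1.0 - propensity))
```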
arXiv Detail & Related papers (2020-10-23T19:06:03Z)
- Long-Tailed Recognition Using Class-Balanced Experts [128.73438243408393]
We propose an ensemble of class-balanced experts that combines the strength of diverse classifiers.
Our ensemble of class-balanced experts reaches results close to state-of-the-art and an extended ensemble establishes a new state-of-the-art on two benchmarks for long-tailed recognition.
arXiv Detail & Related papers (2020-04-07T20:57:44Z)