Throwing Away Data Improves Worst-Class Error in Imbalanced
Classification
- URL: http://arxiv.org/abs/2205.11672v1
- Date: Mon, 23 May 2022 23:43:18 GMT
- Title: Throwing Away Data Improves Worst-Class Error in Imbalanced
Classification
- Authors: Martin Arjovsky, Kamalika Chaudhuri, David Lopez-Paz
- Abstract summary: Class imbalances pervade classification problems, yet their treatment differs in theory and practice.
We take on the challenge of developing learning theory able to describe the worst-class error of classifiers over linearly-separable data.
- Score: 36.91428748713018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Class imbalances pervade classification problems, yet their treatment differs
in theory and practice. On the one hand, learning theory instructs us that
\emph{more data is better}, as sample size relates inversely to the average
test error over the entire data distribution. On the other hand, practitioners
have long developed a plethora of tricks to improve the performance of learning
machines over imbalanced data.
These include data reweighting and subsampling, synthetic construction of
additional samples from minority classes, ensembling expensive one-versus-all
architectures, and tweaking classification losses and thresholds. All of these
are efforts to minimize the worst-class error, which is often associated with
the minority class in the training data, and whose minimization finds additional
motivation in the robustness, fairness, and out-of-distribution literatures.
Here we take on the challenge of developing learning theory able to describe
the worst-class error of classifiers over linearly-separable data when fitted
either on (i) the full training set, or (ii) a subset where the majority class
is subsampled to match the minority class in size. We borrow tools from extreme
value theory to show that, under distributions with certain tail properties,
\emph{throwing away most data from the majority class leads to better
worst-class error}.
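To make the contrast concrete, here is a minimal simulation sketch, not the paper's construction: a linear classifier is fit on (i) the full imbalanced sample and (ii) a subsample where the majority class is cut down to the minority-class size, and the worst per-class error is compared on a balanced test set. The Student-t class conditionals, the means, and all sample sizes are illustrative assumptions, not the distributions analyzed in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample_class(n, mean, df=2.0, d=2):
    # Heavy-tailed (Student-t) cloud around `mean`; smaller df = heavier tails.
    return mean + rng.standard_t(df, size=(n, d))

def worst_class_error(clf, X, y):
    # Maximum per-class test error: the quantity the paper targets.
    return max(np.mean(clf.predict(X[y == c]) != c) for c in (0, 1))

n_maj, n_min = 5000, 50
mu0, mu1 = np.array([-3.0, 0.0]), np.array([3.0, 0.0])
X = np.vstack([sample_class(n_maj, mu0), sample_class(n_min, mu1)])
y = np.concatenate([np.zeros(n_maj, int), np.ones(n_min, int)])

# (i) fit on the full, imbalanced training set
clf_full = LogisticRegression().fit(X, y)

# (ii) subsample the majority class down to the minority-class size
keep = rng.choice(n_maj, size=n_min, replace=False)
X_sub = np.vstack([X[keep], X[n_maj:]])
y_sub = np.concatenate([np.zeros(n_min, int), np.ones(n_min, int)])
clf_sub = LogisticRegression().fit(X_sub, y_sub)

# a balanced test set makes the per-class errors directly comparable
X_te = np.vstack([sample_class(2000, mu0), sample_class(2000, mu1)])
y_te = np.concatenate([np.zeros(2000, int), np.ones(2000, int)])
print("full data  worst-class error:", worst_class_error(clf_full, X_te, y_te))
print("subsampled worst-class error:", worst_class_error(clf_sub, X_te, y_te))
```

Under heavy imbalance, the full-data fit tends to push the decision boundary toward the minority class and inflate its error; the balanced fit trades a little majority-class error for a better worst class.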
Related papers
- Exploring Vacant Classes in Label-Skewed Federated Learning [113.65301899666645]
Label skews, characterized by disparities in local label distribution across clients, pose a significant challenge in federated learning.
This paper introduces FedVLS, a novel approach to label-skewed federated learning that integrates vacant-class distillation and logit suppression simultaneously.
arXiv Detail & Related papers (2024-01-04T16:06:31Z)
- A Unified Generalization Analysis of Re-Weighting and Logit-Adjustment for Imbalanced Learning [129.63326990812234]
We propose a technique named data-dependent contraction to capture how modified losses handle different classes.
On top of this technique, a fine-grained generalization bound is established for imbalanced learning, which helps explain the behavior of re-weighting and logit adjustment (a minimal sketch of both losses appears after this list).
arXiv Detail & Related papers (2023-10-07T09:15:08Z)
- When Noisy Labels Meet Long Tail Dilemmas: A Representation Calibration Method [40.25499257944916]
Real-world datasets are both noisily labeled and class-imbalanced.
We propose a representation calibration method, RCAL.
We derive theoretical results that support the effectiveness of the proposed representation calibration.
arXiv Detail & Related papers (2022-11-20T11:36:48Z)
- Imbalanced Classification via Explicit Gradient Learning From Augmented Data [0.0]
We propose a novel deep meta-learning technique to augment a given imbalanced dataset with new minority instances.
The advantage of the proposed method is demonstrated on synthetic and real-world datasets with various imbalance ratios.
arXiv Detail & Related papers (2022-02-21T22:16:50Z)
- Fair-Net: A Network Architecture For Reducing Performance Disparity Between Identifiable Sub-Populations [0.522145960878624]
We introduce Fair-Net, a multitask neural network architecture that improves both classification accuracy and probability calibration across identifiable sub-populations.
Empirical studies with three real world benchmark datasets demonstrate that Fair-Net improves classification and calibration performance.
arXiv Detail & Related papers (2021-06-01T18:26:08Z)
- Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View [82.80085730891126]
We provide the first modern, precise analysis of linear multiclass classification.
Our analysis reveals that the classification accuracy is highly distribution-dependent.
The insights gained may pave the way for a precise understanding of other classification algorithms.
arXiv Detail & Related papers (2020-11-16T05:17:29Z)
- Class-Wise Difficulty-Balanced Loss for Solving Class-Imbalance [6.875312133832079]
We propose a novel loss function named Class-wise Difficulty-Balanced (CDB) loss.
It dynamically assigns each sample a weight according to the difficulty of the class the sample belongs to.
The results show that CDB loss consistently outperforms recently proposed loss functions on class-imbalanced datasets.
arXiv Detail & Related papers (2020-10-05T07:19:19Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Long-Tailed Recognition Using Class-Balanced Experts [128.73438243408393]
We propose an ensemble of class-balanced experts that combines the strength of diverse classifiers.
Our ensemble of class-balanced experts reaches results close to the state of the art, and an extended ensemble establishes a new state of the art on two benchmarks for long-tailed recognition.
arXiv Detail & Related papers (2020-04-07T20:57:44Z)
- Imbalanced Data Learning by Minority Class Augmentation using Capsule Adversarial Networks [31.073558420480964]
We propose a method to restore balance in imbalanced image datasets by coalescing two concurrent methods.
In our model, generative and discriminative networks play a novel competitive game.
The coalesced capsule-GAN is effective at recognizing highly overlapping classes with far fewer parameters than the convolutional GAN.
arXiv Detail & Related papers (2020-04-05T12:36:06Z)
- M2m: Imbalanced Classification via Major-to-minor Translation [79.09018382489506]
In most real-world scenarios, labeled training datasets are highly class-imbalanced, and deep neural networks struggle to generalize under a balanced testing criterion.
In this paper, we explore a novel yet simple way to alleviate this issue by augmenting less-frequent classes via translating samples from more-frequent classes.
Our experimental results on a variety of class-imbalanced datasets show that the proposed method improves the generalization on minority classes significantly compared to other existing re-sampling or re-weighting methods.
arXiv Detail & Related papers (2020-04-01T13:21:17Z)
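As referenced above, here is a minimal sketch of the two losses compared in the unified generalization analysis: re-weighting scales each sample's cross-entropy by its inverse class prior, while logit adjustment shifts each logit by a scaled log-prior. The definitions follow the standard formulations of these losses; the known class priors `pi` and the temperature `tau` are assumptions of this illustration, not details taken from that paper.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def reweighted_ce(logits, y, pi):
    # Cross-entropy with each sample weighted by the inverse prior of its class.
    p = softmax(logits)
    w = 1.0 / pi[y]
    return np.mean(w * -np.log(p[np.arange(len(y)), y]))

def logit_adjusted_ce(logits, y, pi, tau=1.0):
    # Cross-entropy on prior-shifted logits: z_c + tau * log(pi_c).
    p = softmax(logits + tau * np.log(pi))
    return np.mean(-np.log(p[np.arange(len(y)), y]))

# toy usage: 3 samples, 2 classes, a 90/10 class prior
logits = np.array([[2.0, -1.0], [0.5, 0.5], [-1.0, 2.0]])
y = np.array([0, 1, 1])
pi = np.array([0.9, 0.1])
print(reweighted_ce(logits, y, pi), logit_adjusted_ce(logits, y, pi))
```

Both reduce to the plain cross-entropy when the class priors are uniform; they differ in whether the correction acts on the sample weight or on the logits.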
This list is automatically generated from the titles and abstracts of the papers on this site.