Throwing Away Data Improves Worst-Class Error in Imbalanced
Classification
- URL: http://arxiv.org/abs/2205.11672v1
- Date: Mon, 23 May 2022 23:43:18 GMT
- Title: Throwing Away Data Improves Worst-Class Error in Imbalanced
Classification
- Authors: Martin Arjovsky, Kamalika Chaudhuri, David Lopez-Paz
- Abstract summary: Class imbalances pervade classification problems, yet their treatment differs in theory and practice.
We take on the challenge of developing learning theory able to describe the worst-class error of classifiers over linearly-separable data.
- Score: 36.91428748713018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Class imbalances pervade classification problems, yet their treatment differs
in theory and practice. On the one hand, learning theory instructs us that
\emph{more data is better}, as sample size relates inversely to the average
test error over the entire data distribution. On the other hand, practitioners
have long developed a plethora of tricks to improve the performance of learning
machines over imbalanced data.
These include data reweighting and subsampling, synthetic construction of
additional samples from minority classes, ensembling expensive one-versus-all
architectures, and tweaking classification losses and thresholds. All of these
are efforts to minimize the worst-class error, which is often associated with
the minority class in the training data, and whose minimization finds additional
motivation in the robustness, fairness, and out-of-distribution literatures.
Here we take on the challenge of developing learning theory able to describe
the worst-class error of classifiers over linearly-separable data when fitted
either on (i) the full training set, or (ii) a subset where the majority class
is subsampled to match the minority class in size. We borrow tools from extreme
value theory to show that, under distributions with certain tail properties,
\emph{throwing away most data from the majority class leads to better
worst-class error}.
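To make the contrast concrete, here is a minimal simulation sketch, not the paper's construction: a linear classifier is fit on (i) the full imbalanced sample and (ii) a subsample where the majority class is cut down to the minority-class size, and the worst per-class error is compared on a balanced test set. The Student-t class conditionals, the means, and all sample sizes are illustrative assumptions, not the distributions analyzed in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample_class(n, mean, df=2.0, d=2):
    # Heavy-tailed (Student-t) cloud around `mean`; smaller df = heavier tails.
    return mean + rng.standard_t(df, size=(n, d))

def worst_class_error(clf, X, y):
    # Maximum per-class test error: the quantity the paper targets.
    return max(np.mean(clf.predict(X[y == c]) != c) for c in (0, 1))

n_maj, n_min = 5000, 50
mu0, mu1 = np.array([-3.0, 0.0]), np.array([3.0, 0.0])
X = np.vstack([sample_class(n_maj, mu0), sample_class(n_min, mu1)])
y = np.concatenate([np.zeros(n_maj, int), np.ones(n_min, int)])

# (i) fit on the full, imbalanced training set
clf_full = LogisticRegression().fit(X, y)

# (ii) subsample the majority class down to the minority-class size
keep = rng.choice(n_maj, size=n_min, replace=False)
X_sub = np.vstack([X[keep], X[n_maj:]])
y_sub = np.concatenate([np.zeros(n_min, int), np.ones(n_min, int)])
clf_sub = LogisticRegression().fit(X_sub, y_sub)

# a balanced test set makes the per-class errors directly comparable
X_te = np.vstack([sample_class(2000, mu0), sample_class(2000, mu1)])
y_te = np.concatenate([np.zeros(2000, int), np.ones(2000, int)])
print("full data  worst-class error:", worst_class_error(clf_full, X_te, y_te))
print("subsampled worst-class error:", worst_class_error(clf_sub, X_te, y_te))
```

Under heavy imbalance, the full-data fit tends to push the decision boundary toward the minority class and inflate its error; the balanced fit trades a little majority-class error for a better worst class.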
Related papers
- Exploring Vacant Classes in Label-Skewed Federated Learning [113.65301899666645]
Label skews, characterized by disparities in local label distribution across clients, pose a significant challenge in federated learning.
This paper introduces FedVLS, a novel approach to label-skewed federated learning that integrates vacant-class distillation and logit suppression simultaneously.
arXiv Detail & Related papers (2024-01-04T16:06:31Z)
- A Unified Generalization Analysis of Re-Weighting and Logit-Adjustment for Imbalanced Learning [129.63326990812234]
We propose a technique named data-dependent contraction to capture how modified losses handle different classes.
On top of this technique, a fine-grained generalization bound is established for imbalanced learning, which helps explain the behavior of re-weighting and logit adjustment (a minimal sketch of both losses appears after this list).
arXiv Detail & Related papers (2023-10-07T09:15:08Z)
- When Noisy Labels Meet Long Tail Dilemmas: A Representation Calibration Method [40.25499257944916]
Real-world datasets are both noisily labeled and class-imbalanced.
We propose a representation calibration method, RCAL.
We derive theoretical results that support the effectiveness of the proposed representation calibration.
arXiv Detail & Related papers (2022-11-20T11:36:48Z)
- Imbalanced Classification via Explicit Gradient Learning From Augmented Data [0.0]
We propose a novel deep meta-learning technique to augment a given imbalanced dataset with new minority instances.
The advantage of the proposed method is demonstrated on synthetic and real-world datasets with various imbalance ratios.
arXiv Detail & Related papers (2022-02-21T22:16:50Z)
- Fair-Net: A Network Architecture For Reducing Performance Disparity Between Identifiable Sub-Populations [0.522145960878624]
We introduce Fair-Net, a multitask neural network architecture that improves both classification accuracy and probability calibration across identifiable sub-populations.
Empirical studies with three real world benchmark datasets demonstrate that Fair-Net improves classification and calibration performance.
arXiv Detail & Related papers (2021-06-01T18:26:08Z)
- Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View [82.80085730891126]
We provide the first modern, precise analysis of linear multiclass classification.
Our analysis reveals that the classification accuracy is highly distribution-dependent.
The insights gained may pave the way for a precise understanding of other classification algorithms.
arXiv Detail & Related papers (2020-11-16T05:17:29Z)
- Class-Wise Difficulty-Balanced Loss for Solving Class-Imbalance [6.875312133832079]
We propose a novel loss function named Class-wise Difficulty-Balanced (CDB) loss.
It dynamically assigns each sample a weight according to the difficulty of the class the sample belongs to.
The results show that CDB loss consistently outperforms recently proposed loss functions on class-imbalanced datasets.
arXiv Detail & Related papers (2020-10-05T07:19:19Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Long-Tailed Recognition Using Class-Balanced Experts [128.73438243408393]
We propose an ensemble of class-balanced experts that combines the strength of diverse classifiers.
Our ensemble of class-balanced experts reaches results close to the state of the art, and an extended ensemble establishes a new state of the art on two benchmarks for long-tailed recognition.
arXiv Detail & Related papers (2020-04-07T20:57:44Z)
- Imbalanced Data Learning by Minority Class Augmentation using Capsule Adversarial Networks [31.073558420480964]
We propose a method to restore balance in imbalanced image datasets by coalescing two concurrent methods.
In our model, generative and discriminative networks play a novel competitive game.
The coalesced capsule-GAN is effective at recognizing highly overlapping classes with far fewer parameters than the convolutional GAN.
arXiv Detail & Related papers (2020-04-05T12:36:06Z)
- M2m: Imbalanced Classification via Major-to-minor Translation [79.09018382489506]
In most real-world scenarios, labeled training datasets are highly class-imbalanced, and deep neural networks struggle to generalize under a balanced testing criterion.
In this paper, we explore a novel yet simple way to alleviate this issue by augmenting less-frequent classes via translating samples from more-frequent classes.
Our experimental results on a variety of class-imbalanced datasets show that the proposed method improves the generalization on minority classes significantly compared to other existing re-sampling or re-weighting methods.
arXiv Detail & Related papers (2020-04-01T13:21:17Z)
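As referenced above, here is a minimal sketch of the two losses compared in the unified generalization analysis: re-weighting scales each sample's cross-entropy by its inverse class prior, while logit adjustment shifts each logit by a scaled log-prior. The definitions follow the standard formulations of these losses; the known class priors `pi` and the temperature `tau` are assumptions of this illustration, not details taken from that paper.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def reweighted_ce(logits, y, pi):
    # Cross-entropy with each sample weighted by the inverse prior of its class.
    p = softmax(logits)
    w = 1.0 / pi[y]
    return np.mean(w * -np.log(p[np.arange(len(y)), y]))

def logit_adjusted_ce(logits, y, pi, tau=1.0):
    # Cross-entropy on prior-shifted logits: z_c + tau * log(pi_c).
    p = softmax(logits + tau * np.log(pi))
    return np.mean(-np.log(p[np.arange(len(y)), y]))

# toy usage: 3 samples, 2 classes, a 90/10 class prior
logits = np.array([[2.0, -1.0], [0.5, 0.5], [-1.0, 2.0]])
y = np.array([0, 1, 1])
pi = np.array([0.9, 0.1])
print(reweighted_ce(logits, y, pi), logit_adjusted_ce(logits, y, pi))
```

Both reduce to the plain cross-entropy when the class priors are uniform; they differ in whether the correction acts on the sample weight or on the logits.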
This list is automatically generated from the titles and abstracts of the papers on this site.