Classification Imbalance as Transfer Learning
- URL: http://arxiv.org/abs/2601.10630v1
- Date: Thu, 15 Jan 2026 17:49:36 GMT
- Title: Classification Imbalance as Transfer Learning
- Authors: Eric Xia, Jason M. Klusowski
- Abstract summary: We study a family of oversampling procedures that augment the training data by generating synthetic samples from an estimated minority-class distribution. We show that the excess risk decomposes into the rate achievable under balanced training plus an additional term, the cost of transfer.
- Score: 11.659383493623565
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classification imbalance arises when one class is much rarer than the other. We frame this setting as transfer learning under label (prior) shift between an imbalanced source distribution induced by the observed data and a balanced target distribution under which performance is evaluated. Within this framework, we study a family of oversampling procedures that augment the training data by generating synthetic samples from an estimated minority-class distribution to roughly balance the classes, among which the celebrated SMOTE algorithm is a canonical example. We show that the excess risk decomposes into the rate achievable under balanced training (as if the data had been drawn from the balanced target distribution) and an additional term, the cost of transfer, which quantifies the discrepancy between the estimated and true minority-class distributions. In particular, we show that the cost of transfer for SMOTE dominates that of bootstrapping (random oversampling) in moderately high dimensions, suggesting that we should expect bootstrapping to have better performance than SMOTE in general. We corroborate these findings with experimental evidence. More broadly, our results provide guidance for choosing among augmentation strategies for imbalanced classification.
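To make the comparison concrete, the following is a minimal NumPy sketch, not code from the paper, contrasting the two augmentation strategies the abstract studies: bootstrapping (random oversampling) resamples observed minority points with replacement and therefore stays on the empirical support, whereas a SMOTE-style step synthesizes new points by interpolating between a minority point and one of its k nearest minority neighbors. The function names, the brute-force neighbor search, and the toy dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_oversample(X_min, n_new):
    """Random oversampling: draw minority rows with replacement."""
    idx = rng.integers(0, len(X_min), size=n_new)
    return X_min[idx]

def smote_oversample(X_min, n_new, k=5):
    """SMOTE-style synthesis: interpolate toward a random one of the
    k nearest minority neighbors (brute-force neighbor search)."""
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)              # exclude each point itself
    nbrs = np.argsort(d2, axis=1)[:, :k]      # indices of k nearest neighbors
    base = rng.integers(0, len(X_min), size=n_new)
    pick = nbrs[base, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))              # uniform interpolation weights
    return X_min[base] + lam * (X_min[pick] - X_min[base])

# Toy imbalanced setting: 10 minority points in d = 20 dimensions, the kind of
# moderately high-dimensional regime where the paper argues the cost of
# transfer for SMOTE dominates that of bootstrapping.
X_min = rng.normal(size=(10, 20))
X_boot = bootstrap_oversample(X_min, 190)    # stays on the observed support
X_smote = smote_oversample(X_min, 190)       # interpolated, off-support points
```

In the paper's terms, both procedures pay a cost of transfer measuring the discrepancy between the estimated and true minority-class distributions; the abstract's claim is that SMOTE's cost dominates bootstrapping's in moderately high dimensions.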
Related papers
- Rebalancing with Calibrated Sub-classes (RCS): A Statistical Fusion-based Framework for Robust Imbalanced Classification across Modalities [16.993547305381327]
Rebalancing with Calibrated Sub-classes (RCS) is a novel distribution calibration framework for robust imbalanced classification. RCS fuses statistical information from the majority and intermediate class distributions via a weighted mixture of Gaussian components.
arXiv Detail & Related papers (2025-10-10T00:06:13Z)
- Class-Conditional Distribution Balancing for Group Robust Classification [11.525201208566925]
Spurious correlations that lead models to correct predictions for the wrong reasons pose a critical challenge for robust real-world generalization. We offer a novel perspective by reframing spurious correlations as imbalances or mismatches in class-conditional distributions. We propose a simple yet effective robust learning method that eliminates the need for both bias annotations and predictions.
arXiv Detail & Related papers (2025-04-24T07:15:53Z)
- Class-Balancing Diffusion Models [57.38599989220613]
Class-Balancing Diffusion Models (CBDM) are trained with a distribution adjustment regularizer as a solution to training on class-imbalanced data.
We benchmark generation results on the CIFAR100/CIFAR100LT datasets, where the method shows outstanding performance on the downstream recognition task.
arXiv Detail & Related papers (2023-04-30T20:00:14Z)
- An Embarrassingly Simple Baseline for Imbalanced Semi-Supervised Learning [103.65758569417702]
Semi-supervised learning (SSL) has shown great promise in leveraging unlabeled data to improve model performance.
We consider a more realistic and challenging setting called imbalanced SSL, where imbalanced class distributions occur in both labeled and unlabeled data.
We study a simple yet overlooked baseline -- SimiS -- which tackles data imbalance by simply supplementing labeled data with pseudo-labels.
arXiv Detail & Related papers (2022-11-20T21:18:41Z)
- Adaptive Distribution Calibration for Few-Shot Learning with Hierarchical Optimal Transport [78.9167477093745]
We propose a novel distribution calibration method that learns an adaptive weight matrix between novel samples and base classes.
Experimental results on standard benchmarks demonstrate that our proposed plug-and-play model outperforms competing approaches.
arXiv Detail & Related papers (2022-10-09T02:32:57Z)
- Learnable Distribution Calibration for Few-Shot Class-Incremental Learning [122.2241120474278]
Few-shot class-incremental learning (FSCIL) faces challenges of memorizing old class distributions and estimating new class distributions given few training samples.
We propose a learnable distribution calibration (LDC) approach, with the aim to systematically solve these two challenges using a unified framework.
arXiv Detail & Related papers (2022-10-01T09:40:26Z)
- Learning to Re-weight Examples with Optimal Transport for Imbalanced Classification [74.62203971625173]
Imbalanced data pose challenges for deep learning based classification models.
One of the most widely-used approaches for tackling imbalanced data is re-weighting.
We propose a novel re-weighting method based on optimal transport (OT) from a distributional point of view.
arXiv Detail & Related papers (2022-08-05T01:23:54Z)
- Transfer and Share: Semi-Supervised Learning from Long-Tailed Data [27.88381366842497]
We present TRAS (TRAnsfer and Share), which effectively utilizes long-tailed semi-supervised data.
TRAS transforms the imbalanced pseudo-label distribution of a traditional SSL model.
It then transfers the distribution to a target model such that the minority class will receive significant attention.
arXiv Detail & Related papers (2022-05-26T13:37:59Z)
- An Empirical Study on the Joint Impact of Feature Selection and Data Resampling on Imbalance Classification [4.506770920842088]
This study focuses on the synergy between feature selection and data resampling for imbalanced classification.
We conduct a large number of experiments on 52 publicly available datasets, using 9 feature selection methods, 6 resampling approaches for class imbalance learning, and 3 well-known classification algorithms.
arXiv Detail & Related papers (2021-09-01T06:01:51Z)
- M2m: Imbalanced Classification via Major-to-minor Translation [79.09018382489506]
In most real-world scenarios, labeled training datasets are highly class-imbalanced, and deep neural networks struggle to generalize under a balanced testing criterion.
In this paper, we explore a novel yet simple way to alleviate this issue by augmenting less-frequent classes via translating samples from more-frequent classes (see the sketch after this entry).
Our experimental results on a variety of class-imbalanced datasets show that the proposed method significantly improves generalization on minority classes compared to existing re-sampling and re-weighting methods.
arXiv Detail & Related papers (2020-04-01T13:21:17Z)
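As a rough illustration of the major-to-minor idea above, here is a hedged sketch, entirely a simplification rather than the authors' implementation: perturb a majority-class sample with gradient steps that raise a classifier's confidence in the minority class, then treat the perturbed point as a synthetic minority sample. The linear model, the weights `w`, and the threshold are hypothetical stand-ins for the deep network and acceptance criteria M2m actually uses.

```python
import numpy as np

def translate_major_to_minor(x, w, steps=20, lr=0.5, thresh=0.9):
    """Nudge a majority sample x until a logistic model
    p(minority | x) = sigmoid(w @ x) calls it minority.
    For this linear score the gradient with respect to x is simply w."""
    x = x.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w @ x)))   # current minority confidence
        if p >= thresh:                      # confident enough: accept sample
            break
        x += lr * w                          # gradient ascent on the logit
    return x

rng = np.random.default_rng(1)
w = rng.normal(size=20)                      # hypothetical trained weights
x_major = rng.normal(size=20)                # a majority-class sample
x_synthetic = translate_major_to_minor(x_major, w)
```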