A Theoretical and Empirical Taxonomy of Imbalance in Binary Classification
- URL: http://arxiv.org/abs/2601.04149v1
- Date: Wed, 07 Jan 2026 18:02:11 GMT
- Title: A Theoretical and Empirical Taxonomy of Imbalance in Binary Classification
- Authors: Rose Yvette Bandolo Essomba, Ernest Fokoué
- Abstract summary: We show how imbalance shifts the discriminant boundary, yielding a deterioration slope that predicts four regimes: Normal, Mild, Extreme, and Catastrophic. Across parametric and non-parametric models, empirical degradation closely follows theoretical predictions. These results show that the triplet $(\eta, \kappa, \Delta)$ provides a model-agnostic, geometrically grounded explanation of imbalance-induced deterioration.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Class imbalance significantly degrades classification performance, yet its effects are rarely analyzed from a unified theoretical perspective. We propose a principled framework based on three fundamental scales: the imbalance coefficient $\eta$, the sample-dimension ratio $\kappa$, and the intrinsic separability $\Delta$. Starting from the Gaussian Bayes classifier, we derive closed-form Bayes errors and show how imbalance shifts the discriminant boundary, yielding a deterioration slope that predicts four regimes: Normal, Mild, Extreme, and Catastrophic. Using a balanced high-dimensional genomic dataset, we vary only $\eta$ while keeping $\kappa$ and $\Delta$ fixed. Across parametric and non-parametric models, empirical degradation closely follows theoretical predictions: minority Recall collapses once $\log(\eta)$ exceeds $\Delta\sqrt{\kappa}$, Precision increases asymmetrically, and F1-score and PR-AUC decline in line with the predicted regimes. These results show that the triplet $(\eta, \kappa, \Delta)$ provides a model-agnostic, geometrically grounded explanation of imbalance-induced deterioration.
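The boundary-shift mechanism described in the abstract can be illustrated with a minimal sketch (not the authors' code): a one-dimensional Gaussian Bayes classifier with unit-variance classes separated by $\Delta$, where the prior ratio equals the imbalance coefficient $\eta$. Under these simplifying assumptions, the Bayes boundary moves from $\Delta/2$ by $\log(\eta)/\Delta$ toward the minority class, and minority recall decays accordingly; the specific means, variances, and $\Delta$ value below are illustrative choices, not the paper's setup.

```python
# Sketch, assuming a 1-D Gaussian Bayes classifier: majority class
# N(0, 1), minority class N(delta, 1), prior ratio eta >= 1.
# The imbalance term log(eta)/delta shifts the decision boundary
# away from the minority class, eroding minority recall.
from math import log, sqrt, erf

def phi_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bayes_boundary(delta, eta):
    # Balanced boundary delta/2 plus the imbalance-induced shift.
    return delta / 2.0 + log(eta) / delta

def minority_recall(delta, eta):
    # P(X > boundary | minority class), with X ~ N(delta, 1).
    return 1.0 - phi_cdf(bayes_boundary(delta, eta) - delta)

delta = 2.0  # illustrative separability
for eta in (1, 10, 100, 1000):
    print(f"eta={eta:5d}  boundary={bayes_boundary(delta, eta):.3f}  "
          f"recall={minority_recall(delta, eta):.3f}")
```

Running the loop shows the qualitative pattern the paper predicts: recall is high at $\eta = 1$ and collapses toward zero as $\log(\eta)$ grows relative to the separability.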
Related papers
- Regularized Online RLHF with Generalized Bilinear Preferences [68.44113000390544]
We consider the problem of contextual online RLHF with general preferences. We adopt the Generalized Bilinear Preference Model to capture preferences via low-rank, skew-symmetric matrices. We prove that the dual gap of the greedy policy is bounded by the square of the estimation error.
arXiv Detail & Related papers (2026-02-26T15:27:53Z) - Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs [55.77845440440496]
Push-based decentralized communication enables optimization over communication networks where information exchange may be asymmetric. We develop a unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm. A key technical ingredient is an imbalance-aware generalization bound through two quantities.
arXiv Detail & Related papers (2026-02-24T05:32:03Z) - Mutual Information Collapse Explains Disentanglement Failure in $β$-VAEs [4.155522769716163]
$\beta$-VAE is a framework for unsupervised disentanglement. Benchmarks such as MIG and SAP typically peak at intermediate $\beta$ and collapse as regularization increases. We introduce a modified $\beta$-VAE that decouples regularization pressure from informational collapse.
arXiv Detail & Related papers (2026-02-09T23:38:11Z) - A Relative-Budget Theory for Reinforcement Learning with Verifiable Rewards in Large Language Model Reasoning [48.70183357021465]
Reinforcement learning (RL) is a dominant paradigm for improving the reasoning abilities of large language models. We propose a relative-budget theory explaining this variation through a single quantity, the relative budget $H/\mathbb{E}[T]$. We show that this quantity determines sample efficiency by controlling reward variance and the likelihood of informative trajectories.
arXiv Detail & Related papers (2026-02-02T01:31:52Z) - Phases of the $q$-deformed $\mathrm{SU}(N)$ Yang-Mills theory at large $N$ [0.0]
Yang-Mills theory is characterized by three parameters: the number of colors $N$, the coupling constant $g$, and the level $k$. By treating these as tunable parameters, we explore how key properties of the theory, such as confinement and topological order, emerge in different regimes.
arXiv Detail & Related papers (2026-01-07T12:06:40Z) - Skewness-Robust Causal Discovery in Location-Scale Noise Models [47.09233752567902]
We propose SkewD, a likelihood-based algorithm for causal discovery under location-scale noise models. SkewD extends the usual normal-distribution framework to the skew-normal setting, enabling reliable inference under symmetric and skewed noise. We evaluate SkewD on novel synthetically generated datasets with skewed noise as well as on established benchmark datasets.
arXiv Detail & Related papers (2025-11-18T12:40:41Z) - Understanding Robust Machine Learning for Nonparametric Regression with Heavy-Tailed Noise [10.844819221753042]
We use Huber regression as a close-up example within Tikhonov-regularized risk minimization. We address two central challenges: (i) the breakdown of standard concentration tools under weak moment assumptions, and (ii) the analytical difficulties introduced by unbounded hypothesis spaces. Our study delivers principled rules, extends beyond Huber to other robust losses, and highlights prediction error, not excess risk, as the fundamental lens for analyzing robust learning.
arXiv Detail & Related papers (2025-10-10T21:57:18Z) - Computational-Statistical Tradeoffs at the Next-Token Prediction Barrier: Autoregressive and Imitation Learning under Misspecification [50.717692060500696]
Next-token prediction with the logarithmic loss is a cornerstone of autoregressive sequence modeling. Next-token prediction can be made robust so as to achieve $C = \tilde{O}(H)$, representing moderate error amplification. No computationally efficient algorithm can achieve sub-polynomial approximation factor $C = e^{(\log H)^{1-\Omega(1)}}$.
arXiv Detail & Related papers (2025-02-18T02:52:00Z) - Scaling of Stochastic Normalizing Flows in $\mathrm{SU}(3)$ lattice gauge theory [44.99833362998488]
Non-equilibrium Markov Chain Monte Carlo simulations provide a well-understood framework based on Jarzynski's equality to sample from a target probability distribution. Out-of-equilibrium evolutions share the same framework as flow-based approaches, and the two can be naturally combined into a novel architecture called Stochastic Normalizing Flows (SNFs). We present the first implementation of SNFs for $\mathrm{SU}(3)$ lattice gauge theory in 4 dimensions, defined by introducing gauge-equivariant layers between out-of-equilibrium Monte Carlo updates.
arXiv Detail & Related papers (2024-11-29T19:01:05Z) - Mind the Gap: A Causal Perspective on Bias Amplification in Prediction & Decision-Making [58.06306331390586]
We introduce the notion of a margin complement, which measures how much a prediction score $S$ changes due to a thresholding operation.
We show that under suitable causal assumptions, the influences of $X$ on the prediction score $S$ are equal to the influences of $X$ on the true outcome $Y$.
arXiv Detail & Related papers (2024-05-24T11:22:19Z) - CARD: Classification and Regression Diffusion Models [51.0421331214229]
We introduce classification and regression diffusion (CARD) models, which combine a conditional generative model and a pre-trained conditional mean estimator.
We demonstrate the outstanding ability of CARD in conditional distribution prediction with both toy examples and real-world datasets.
arXiv Detail & Related papers (2022-06-15T03:30:38Z) - Observable adjustments in single-index models for regularized M-estimators [3.5353632767823506]
In the regime where the sample size $n$ and dimension $p$ are both increasing, the behavior of the empirical distribution of $\hat{\beta}$ and the predicted values $X\hat{\beta}$ has been previously characterized. This paper develops a different theory to describe the empirical distribution of $\hat{\beta}$ and $X\hat{\beta}$.
arXiv Detail & Related papers (2022-04-14T14:32:02Z) - Single Trajectory Nonparametric Learning of Nonlinear Dynamics [8.438421942654292]
Given a single trajectory of a dynamical system, we analyze the performance of the nonparametric least squares estimator (LSE). We leverage recently developed information-theoretic methods to establish the optimality of the LSE for nonparametric hypothesis classes. We specialize our results to a number of scenarios of practical interest, such as Lipschitz dynamics, generalized linear models, and dynamics described by functions in certain classes of Reproducing Kernel Hilbert Spaces (RKHS).
arXiv Detail & Related papers (2022-02-16T19:38:54Z) - Analytic Characterization of the Hessian in Shallow ReLU Models: A Tale of Symmetry [9.695960412426672]
We analytically characterize the Hessian at various families of spurious minima.
In particular, we prove that for $d \ge k$ standard Gaussian inputs: (a) of the $dk$ eigenvalues of the Hessian, $dk - O(d)$ concentrate near zero, and (b) $\Omega(d)$ of the eigenvalues grow linearly with $k$.
arXiv Detail & Related papers (2020-08-04T20:08:35Z) - A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-$\ell_1$-Norm Interpolated Classifiers [3.167685495996986]
This paper establishes a precise high-dimensional theory for boosting on separable data.
Under a class of statistical models, we provide an exact analysis of the generalization error of boosting.
We also explicitly pin down the relation between the boosting test error and the optimal Bayes error.
arXiv Detail & Related papers (2020-02-05T00:24:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.