Improved Balanced Classification with Theoretically Grounded Loss Functions
- URL: http://arxiv.org/abs/2512.23947v1
- Date: Tue, 30 Dec 2025 02:34:02 GMT
- Title: Improved Balanced Classification with Theoretically Grounded Loss Functions
- Authors: Corinna Cortes, Mehryar Mohri, Yutao Zhong
- Abstract summary: Generalized Logit-Adjusted (GLA) loss functions and Generalized Class-Aware weighted (GCA) losses are studied. We show that GLA losses are Bayes-consistent, but only $H$-consistent for complete (i.e., unbounded) hypothesis sets. GCA losses are $H$-consistent for any hypothesis set that is bounded or complete, with $H$-consistency bounds that scale more favorably as $1/\sqrt{\mathsf{p}_{\min}}$.
- Score: 41.69461814486466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The balanced loss is a widely adopted objective for multi-class classification under class imbalance. By assigning equal importance to all classes, regardless of their frequency, it promotes fairness and ensures that minority classes are not overlooked. However, directly minimizing the balanced classification loss is typically intractable, which makes the design of effective surrogate losses a central question. This paper introduces and studies two advanced surrogate loss families: Generalized Logit-Adjusted (GLA) loss functions and Generalized Class-Aware weighted (GCA) losses. GLA losses generalize Logit-Adjusted losses, which shift logits based on class priors, to the broader general cross-entropy loss family. GCA loss functions extend the standard class-weighted losses, which scale losses inversely by class frequency, by incorporating class-dependent confidence margins and extending them to the general cross-entropy family. We present a comprehensive theoretical analysis of consistency for both loss families. We show that GLA losses are Bayes-consistent, but only $H$-consistent for complete (i.e., unbounded) hypothesis sets. Moreover, their $H$-consistency bounds depend inversely on the minimum class probability, scaling at least as $1/\mathsf p_{\min}$. In contrast, GCA losses are $H$-consistent for any hypothesis set that is bounded or complete, with $H$-consistency bounds that scale more favorably as $1/\sqrt{\mathsf p_{\min}}$, offering significantly stronger theoretical guarantees in imbalanced settings. We report the results of experiments demonstrating that, empirically, both the GCA losses with calibrated class-dependent confidence margins and GLA losses can greatly outperform straightforward class-weighted losses as well as the LA losses. GLA generally performs slightly better in common benchmarks, whereas GCA exhibits a slight edge in highly imbalanced settings.
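For intuition about the two baseline families that GLA and GCA generalize, here is a minimal PyTorch sketch of the standard Logit-Adjusted (LA) loss, which shifts logits by scaled log class priors, and the standard class-weighted cross-entropy, which scales losses inversely by class frequency. The function names, the temperature parameter `tau`, and the toy data are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def logit_adjusted_loss(logits, targets, class_priors, tau=1.0):
    """LA loss sketch: shift each logit by tau * log(prior) before
    applying cross-entropy, so frequent classes must win by a larger
    margin. GLA generalizes this idea beyond plain cross-entropy."""
    adjusted = logits + tau * torch.log(class_priors).unsqueeze(0)
    return F.cross_entropy(adjusted, targets)

def class_weighted_loss(logits, targets, class_priors):
    """Class-weighted CE sketch: rescale each example's loss by the
    inverse frequency of its class. GCA extends this baseline with
    class-dependent confidence margins."""
    weights = 1.0 / class_priors
    return F.cross_entropy(logits, targets, weight=weights)

# Toy usage on a 3-class problem with imbalanced priors.
logits = torch.randn(8, 3)
targets = torch.randint(0, 3, (8,))
priors = torch.tensor([0.7, 0.2, 0.1])
print(logit_adjusted_loss(logits, targets, priors).item())
print(class_weighted_loss(logits, targets, priors).item())
```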
Related papers
- Reducing Class-Wise Performance Disparity via Margin Regularization [82.81746960548382]
Deep neural networks often exhibit substantial disparities in class-wise accuracy, even when trained on class-balanced data. We present Margin Regularization for Performance Disparity Reduction (MR$^2$), a theoretically principled regularization for classification. Our analysis reveals how per-class feature variability contributes to error, motivating the use of larger margins for hard classes.
arXiv Detail & Related papers (2026-01-30T12:56:08Z)
- Fundamental Novel Consistency Theory: $H$-Consistency Bounds [19.493449206135296]
In machine learning, the loss functions optimized during training often differ from the target loss that defines task performance. We present an in-depth study of the target loss estimation error relative to the surrogate loss estimation error. Our analysis leads to $H$-consistency bounds, which are guarantees accounting for the hypothesis set $H$.
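Schematically, an $H$-consistency bound controls the target estimation error by the surrogate estimation error via a non-decreasing function $\Gamma$ with $\Gamma(0) = 0$. The form below is a paraphrase of the standard statement in this literature, written without the minimizability-gap terms that the full bounds include:

```latex
% Schematic H-consistency bound: R_ell is the target (e.g., zero-one)
% risk, R_Phi the surrogate risk, and the stars denote best-in-class
% risks over the hypothesis set H.
R_{\ell}(h) - R_{\ell}^{*}(H) \le \Gamma\big(R_{\Phi}(h) - R_{\Phi}^{*}(H)\big)
```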
arXiv Detail & Related papers (2025-12-28T11:02:20Z)
- Of Dice and Games: A Theory of Generalized Boosting [61.752303337418475]
We extend the celebrated theory of boosting to incorporate both cost-sensitive and multi-objective losses. We develop a comprehensive theory of cost-sensitive and multi-objective boosting, providing a taxonomy of weak learning guarantees. Our characterization relies on a geometric interpretation of boosting, revealing a surprising equivalence between cost-sensitive and multi-objective losses.
arXiv Detail & Related papers (2024-12-11T01:38:32Z)
- A Universal Growth Rate for Learning with Smooth Surrogate Losses [30.389055604165222]
We prove a square-root growth rate near zero for smooth margin-based surrogate losses in binary classification.
We extend this analysis to multi-class classification with a series of novel results.
arXiv Detail & Related papers (2024-05-09T17:59:55Z)
- Cross-Entropy Loss Functions: Theoretical Analysis and Applications [27.3569897539488]
We present a theoretical analysis of a broad family of loss functions that includes cross-entropy (or logistic loss), generalized cross-entropy, the mean absolute error, and other cross-entropy-like loss functions.
We show that these loss functions are beneficial in the adversarial setting by proving that they admit $H$-consistency bounds.
This leads to new adversarial robustness algorithms that consist of minimizing a regularized smooth adversarial comp-sum loss.
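One member of this family, the generalized cross-entropy of Zhang and Sabuncu, interpolates between cross-entropy and a mean-absolute-error-like loss. A minimal sketch follows; the default parameter value q=0.7 and the function name are illustrative assumptions:

```python
import torch

def generalized_cross_entropy(logits, targets, q=0.7):
    """GCE sketch: loss (1 - p_y^q) / q on the true-class probability
    p_y; recovers standard CE in the limit q -> 0 and an MAE-like
    loss at q = 1."""
    probs = torch.softmax(logits, dim=1)
    p_y = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    return ((1.0 - p_y.pow(q)) / q).mean()
```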
arXiv Detail & Related papers (2023-04-14T17:58:23Z)
- Label Distributionally Robust Losses for Multi-class Classification: Consistency, Robustness and Adaptivity [55.29408396918968]
We study a family of loss functions named label-distributionally robust (LDR) losses for multi-class classification.
Our contributions include both consistency and robustness results, establishing top-$k$ consistency of LDR losses for multi-class classification.
We propose a new adaptive LDR loss that automatically adapts an individualized temperature parameter to the noise level of each instance's class label.
arXiv Detail & Related papers (2021-12-30T00:27:30Z)
- Distribution of Classification Margins: Are All Data Equal? [61.16681488656473]
We motivate theoretically and show empirically that the area under the curve of the margin distribution on the training set is in fact a good measure of generalization.
The resulting subset of "high capacity" features is not consistent across different training runs.
arXiv Detail & Related papers (2021-07-21T16:41:57Z)
- Striking the Right Balance: Recall Loss for Semantic Segmentation [24.047359482606307]
Class imbalance is a fundamental problem in computer vision applications such as semantic segmentation.
We propose a hard-class mining loss by reshaping the vanilla cross entropy loss.
We show that the novel recall loss changes gradually between the standard cross entropy loss and the inverse frequency weighted loss.
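As a rough illustration of the idea, here is a batch-level sketch under simplifying assumptions of my own; the paper's recall loss uses running per-class estimates and differs in its details:

```python
import torch
import torch.nn.functional as F

def recall_weighted_ce(logits, targets, num_classes, eps=0.01):
    """Sketch of recall-style hard-class mining: weight each class's
    CE term by its estimated false-negative rate (1 - recall) on the
    current batch, so poorly recalled classes get larger weight."""
    preds = logits.argmax(dim=1)
    weights = torch.full((num_classes,), eps, device=logits.device)
    for c in range(num_classes):
        mask = targets == c
        if mask.any():
            recall_c = (preds[mask] == c).float().mean()
            weights[c] = (1.0 - recall_c) + eps
    return F.cross_entropy(logits, targets, weight=weights)
```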
arXiv Detail & Related papers (2021-06-28T18:02:03Z)
- Rethinking and Reweighting the Univariate Losses for Multi-Label Ranking: Consistency and Generalization [44.73295800450414]
(Partial) ranking loss is a commonly used evaluation measure for multi-label classification.
There is a gap between existing theory and practice: some pairwise losses can lead to promising performance but lack consistency.
arXiv Detail & Related papers (2021-05-10T09:23:27Z)
- Lower-bounded proper losses for weakly supervised classification [73.974163801142]
We discuss the problem of weakly supervised classification, in which instances are given weak labels.
We derive a representation theorem for proper losses in supervised learning, which dualizes the Savage representation.
We experimentally demonstrate the effectiveness of our proposed approach, as compared to improper or unbounded losses.
arXiv Detail & Related papers (2021-03-04T08:47:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.