Cross-Entropy Loss Functions: Theoretical Analysis and Applications
- URL: http://arxiv.org/abs/2304.07288v2
- Date: Tue, 20 Jun 2023 00:48:23 GMT
- Title: Cross-Entropy Loss Functions: Theoretical Analysis and Applications
- Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong
- Abstract summary: We present a theoretical analysis of a broad family of loss functions, comp-sum losses, that includes cross-entropy (or logistic loss), generalized cross-entropy, the mean absolute error, and other cross-entropy-like loss functions.
We show that these loss functions are beneficial in the adversarial setting by proving that they admit $H$-consistency bounds.
This leads to new adversarial robustness algorithms that consist of minimizing a regularized smooth adversarial comp-sum loss.
- Score: 27.3569897539488
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-entropy is a widely used loss function in applications. It coincides
with the logistic loss applied to the outputs of a neural network, when the
softmax is used. But, what guarantees can we rely on when using cross-entropy
as a surrogate loss? We present a theoretical analysis of a broad family of
loss functions, comp-sum losses, that includes cross-entropy (or logistic
loss), generalized cross-entropy, the mean absolute error and other
cross-entropy-like loss functions. We give the first $H$-consistency bounds for
these loss functions. These are non-asymptotic guarantees that upper bound the
zero-one loss estimation error in terms of the estimation error of a surrogate
loss, for the specific hypothesis set $H$ used. We further show that our bounds
are tight. These bounds depend on quantities called minimizability gaps. To
make them more explicit, we give a specific analysis of these gaps for comp-sum
losses. We also introduce a new family of loss functions, smooth adversarial
comp-sum losses, that are derived from their comp-sum counterparts by adding in
a related smooth term. We show that these loss functions are beneficial in the
adversarial setting by proving that they admit $H$-consistency bounds. This
leads to new adversarial robustness algorithms that consist of minimizing a
regularized smooth adversarial comp-sum loss. While our main purpose is a
theoretical analysis, we also present an extensive empirical analysis comparing
comp-sum losses. We further report the results of a series of experiments
demonstrating that our adversarial robustness algorithms outperform the current
state-of-the-art, while also achieving a superior non-adversarial accuracy.
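To make the abstract's central guarantee more concrete, an $H$-consistency bound has, schematically, the shape below. This is a simplified sketch in our own notation, not a statement from the paper; the exact form of the non-decreasing function $\Gamma$ and of the minimizability gaps $\mathcal{M}$ is given there.

```latex
% Schematic H-consistency bound (simplified, our notation): for every h in H,
\[
  \mathcal{R}_{\ell_{0\text{-}1}}(h) - \mathcal{R}^{*}_{\ell_{0\text{-}1},H}
  + \mathcal{M}_{\ell_{0\text{-}1},H}
  \;\le\;
  \Gamma\!\left( \mathcal{R}_{\ell}(h) - \mathcal{R}^{*}_{\ell,H}
  + \mathcal{M}_{\ell,H} \right)
\]
% where \ell is the surrogate (comp-sum) loss, \mathcal{R}^{*}_{\cdot,H} the
% best-in-class expected loss within H, \mathcal{M}_{\cdot,H} the associated
% minimizability gap, and \Gamma a non-decreasing function depending on \ell.
```

The comp-sum family itself is easy to illustrate. The snippet below is not the authors' code; it is a minimal sketch that evaluates three representative members of the family on softmax outputs, with the loss definitions taken as common-usage assumptions (cross-entropy, generalized cross-entropy with parameter q, and an MAE-type loss).

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over the last axis."""
    z = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(scores, y):
    """Logistic / cross-entropy loss: -log p_y(x)."""
    p = softmax(scores)
    return -np.log(p[np.arange(len(y)), y])

def generalized_cross_entropy(scores, y, q=0.7):
    """Generalized cross-entropy: (1 - p_y(x)^q) / q.
    Recovers cross-entropy as q -> 0 and an MAE-type loss at q = 1."""
    p = softmax(scores)
    p_y = p[np.arange(len(y)), y]
    return (1.0 - p_y ** q) / q

def mae_loss(scores, y):
    """MAE-type loss on the softmax output: 1 - p_y(x),
    i.e. half the L1 distance to the one-hot target."""
    p = softmax(scores)
    return 1.0 - p[np.arange(len(y)), y]

# Example: two samples, three classes.
scores = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]])
labels = np.array([0, 2])
print(cross_entropy(scores, labels))
print(generalized_cross_entropy(scores, labels, q=0.7))
print(mae_loss(scores, labels))
```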
Related papers
- LEARN: An Invex Loss for Outlier Oblivious Robust Online Optimization [56.67706781191521]
We study a robust online optimization framework in which an adversary can introduce outliers by corrupting the loss functions in an arbitrary number of rounds k, unknown to the learner.
arXiv Detail & Related papers (2024-08-12T17:08:31Z) - Byzantine-resilient Federated Learning With Adaptivity to Data Heterogeneity [54.145730036889496]
This paper deals with federated learning (FL) in the presence of malicious Byzantine attacks and data heterogeneity.
A novel Robust Average Gradient Algorithm (RAGA) is proposed, which leverages robust aggregation and allows flexible selection of the number of local-update rounds.
arXiv Detail & Related papers (2024-03-20T08:15:08Z) - Expressive Losses for Verified Robustness via Convex Combinations [67.54357965665676]
We study the relationship between the over-approximation coefficient and performance profiles across different expressive losses.
We show that, while expressivity is essential, better approximations of the worst-case loss are not necessarily linked to superior robustness-accuracy trade-offs.
arXiv Detail & Related papers (2023-05-23T12:20:29Z) - An Analysis of Loss Functions for Binary Classification and Regression [0.0]
This paper explores connections between margin-based loss functions and consistency in binary classification and regression applications.
A simple characterization for conformable (consistent) loss functions is given, which allows for straightforward comparison of different losses.
A relation between the margin and standardized logistic regression residuals is derived, demonstrating that all margin-based losses can be viewed as loss functions of squared standardized logistic regression residuals.
arXiv Detail & Related papers (2023-01-18T16:26:57Z) - Loss Minimization through the Lens of Outcome Indistinguishability [11.709566373491619]
We present a new perspective on convex loss minimization and the recent notion of Omniprediction.
By design, Loss OI implies omniprediction in a direct and intuitive manner.
We show that Loss OI can be achieved for the important set of losses arising from Generalized Linear Models, without requiring full multicalibration.
arXiv Detail & Related papers (2022-10-16T22:25:27Z) - $\mathscr{H}$-Consistency Estimation Error of Surrogate Loss Minimizers [38.56401704010528]
We present a detailed study of estimation errors in terms of surrogate loss estimation errors.
We refer to such guarantees as $\mathscr{H}$-consistency estimation error bounds.
arXiv Detail & Related papers (2022-05-16T23:13:36Z) - On Convergence of Training Loss Without Reaching Stationary Points [62.41370821014218]
We show that neural network weight variables do not converge to stationary points where the gradient of the loss function vanishes.
We propose a new perspective based on the ergodic theory of dynamical systems.
arXiv Detail & Related papers (2021-10-12T18:12:23Z) - Rethinking and Reweighting the Univariate Losses for Multi-Label Ranking: Consistency and Generalization [44.73295800450414]
(Partial) ranking loss is a commonly used evaluation measure for multi-label classification.
There is a gap between existing theory and practice -- some pairwise losses can lead to promising performance but lack consistency.
arXiv Detail & Related papers (2021-05-10T09:23:27Z) - Calibration and Consistency of Adversarial Surrogate Losses [46.04004505351902]
Adversarial robustness is an increasingly critical property of classifiers in applications.
But which surrogate losses should be used and when do they benefit from theoretical guarantees?
We present an extensive study of this question, including a detailed analysis of the H-calibration and H-consistency of adversarial surrogate losses.
arXiv Detail & Related papers (2021-04-19T21:58:52Z) - Approximation Schemes for ReLU Regression [80.33702497406632]
We consider the fundamental problem of ReLU regression.
The goal is to output the best-fitting ReLU with respect to the square loss, given draws from some unknown distribution.
arXiv Detail & Related papers (2020-05-26T16:26:17Z) - Supervised Learning: No Loss No Cry [51.07683542418145]
Supervised learning requires the specification of a loss function to minimise.
This paper revisits the SLIsotron algorithm of Kakade et al. (2011) through a novel lens.
We show how it provides a principled procedure for learning the loss.
arXiv Detail & Related papers (2020-02-10T05:30:52Z)