Related papers: Error Bounds of Supervised Classification from Information-Theoretic Perspective

Error Bounds of Supervised Classification from Information-Theoretic Perspective

URL: http://arxiv.org/abs/2406.04567v2
Date: Thu, 27 Jun 2024 04:47:43 GMT
Title: Error Bounds of Supervised Classification from Information-Theoretic Perspective
Authors: Binchuan Qi, Wei Gong, Li Li,
Abstract summary: We show that errors are bounded by the complexity, influenced by the smoothness of distribution and the sample size, and constitute an upper on the expected risk. Our empirical verification confirms a significant positive correlation between the derived theoretical bounds and the practical expected risk.
Score: 5.281849820329249
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: There remains a list of unanswered research questions on deep learning (DL), including the remarkable generalization power of overparametrized neural networks, the efficient optimization performance despite the non-convexity, and the mechanisms behind flat minima in generalization. In this paper, we adopt an information-theoretic perspective to explore the theoretical foundations of supervised classification using deep neural networks (DNNs). Our analysis introduces the concepts of fitting error and model risk, which, together with generalization error, constitute an upper bound on the expected risk. We demonstrate that the generalization errors are bounded by the complexity, influenced by both the smoothness of distribution and the sample size. Consequently, task complexity serves as a reliable indicator of the dataset's quality, guiding the setting of regularization hyperparameters. Furthermore, the derived upper bound fitting error links the back-propagated gradient, Neural Tangent Kernel (NTK), and the model's parameter count with the fitting error. Utilizing the triangle inequality, we establish an upper bound on the expected risk. This bound offers valuable insights into the effects of overparameterization, non-convex optimization, and the flat minima in DNNs.Finally, empirical verification confirms a significant positive correlation between the derived theoretical bounds and the practical expected risk, confirming the practical relevance of the theoretical findings.

Related papers

Wasserstein Distributionally Robust Nonparametric Regression [9.65010022854885]
This paper studies the generalization properties of Wasserstein distributionally robust nonparametric estimators.<n>We establish non-asymptotic error bounds for the excess local worst-case risk.<n>The robustness of the proposed estimator is evaluated through simulation studies and illustrated with an application to the MNIST dataset.
arXiv Detail & Related papers (2025-05-12T18:07:37Z)
Mitigating optimistic bias in entropic risk estimation and optimization with an application to insurance [5.407319151576265]
The entropic risk measure is widely used to account for tail risks associated with an uncertain loss. To mitigate the bias in the empirical entropic risk estimator, we propose a strongly consistent bootstrapping procedure. We show that our methods suggest a higher (and more accurate) premium to homeowners.
arXiv Detail & Related papers (2024-09-30T04:02:52Z)
Generalization Error of the Tilted Empirical Risk [17.48212403081267]
The generalization error (risk) of a supervised statistical learning algorithm quantifies its prediction ability on previously unseen data. Inspired by exponential tilting, Li et al. (2021) proposed the tilted empirical risk as a non-linear risk metric for machine learning applications.
arXiv Detail & Related papers (2024-09-28T18:31:51Z)
Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We provide training examples for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations. We further extend our analysis to the case where the test point has non-trivial correlations with the training set, setting often encountered in time series forecasting. We validate our theory across a variety of high dimensional data.
arXiv Detail & Related papers (2024-08-08T17:27:29Z)
The Surprising Harmfulness of Benign Overfitting for Adversarial Robustness [13.120373493503772]
We prove a surprising result that even if the ground truth itself is robust to adversarial examples, the benignly overfitted model is benign in terms of the standard'' out-of-sample risk objective. Our finding provides theoretical insights into the puzzling phenomenon observed in practice, where the true target function (e.g., human) is robust against adverasrial attack, while beginly overfitted neural networks lead to models that are not robust.
arXiv Detail & Related papers (2024-01-19T15:40:46Z)
A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime. We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by e.g. the combination of model, parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
On the Importance of Gradient Norm in PAC-Bayesian Bounds [92.82627080794491]
We propose a new generalization bound that exploits the contractivity of the log-Sobolev inequalities. We empirically analyze the effect of this new loss-gradient norm term on different neural architectures.
arXiv Detail & Related papers (2022-10-12T12:49:20Z)
Optimizing the Performative Risk under Weak Convexity Assumptions [0.0]
In performative prediction, a predictive model impacts the distribution that generates future data. Prior work has identified a pair of general conditions on the loss and the mapping from model parameters to distributions that implies convexity the performative risk. In this paper, we relax these assumptions, without sacrificing the amenability of the performative minimization risk problem for iterative optimization methods.
arXiv Detail & Related papers (2022-09-02T01:07:09Z)
Excess risk analysis for epistemic uncertainty with application to variational inference [110.4676591819618]
We present a novel EU analysis in the frequentist setting, where data is generated from an unknown distribution. We show a relation between the generalization ability and the widely used EU measurements, such as the variance and entropy of the predictive distribution. We propose new variational inference that directly controls the prediction and EU evaluation performances based on the PAC-Bayesian theory.
arXiv Detail & Related papers (2022-06-02T12:12:24Z)
Mitigating multiple descents: A model-agnostic framework for risk monotonization [84.6382406922369]
We develop a general framework for risk monotonization based on cross-validation. We propose two data-driven methodologies, namely zero- and one-step, that are akin to bagging and boosting.
arXiv Detail & Related papers (2022-05-25T17:41:40Z)
Empirical Risk Minimization with Relative Entropy Regularization: Optimality and Sensitivity Analysis [7.953455469099826]
The sensitivity of the expected empirical risk to deviations from the solution of the ERM-RER problem is studied. The expectation of the sensitivity is upper bounded, up to a constant factor, by the square root of the lautum information between the models and the datasets.
arXiv Detail & Related papers (2022-02-09T10:55:14Z)
The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks [51.1848572349154]
neural network models that perfectly fit noisy data can generalize well to unseen test data. We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk.
arXiv Detail & Related papers (2021-08-25T22:01:01Z)
Minimum Excess Risk in Bayesian Learning [23.681494934015927]
We analyze the best achievable performance of Bayesian learning under generative models by defining and upper-bounding the minimum excess risk (MER) The definition of MER provides a principled way to define different notions of uncertainties in Bayesian learning.
arXiv Detail & Related papers (2020-12-29T17:41:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.