On the Importance of Gradient Norm in PAC-Bayesian Bounds
- URL: http://arxiv.org/abs/2210.06143v1
- Date: Wed, 12 Oct 2022 12:49:20 GMT
- Title: On the Importance of Gradient Norm in PAC-Bayesian Bounds
- Authors: Itai Gat, Yossi Adi, Alexander Schwing, Tamir Hazan
- Abstract summary: We propose a new generalization bound that exploits the contractivity of the log-Sobolev inequalities.
We empirically analyze the effect of this new loss-gradient norm term on different neural architectures.
- Score: 92.82627080794491
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Generalization bounds, which assess the difference between the true risk
and the empirical risk, have been studied extensively. However, to obtain such
bounds, current techniques rely on strict assumptions such as a uniformly bounded
or a Lipschitz loss function. To avoid these assumptions, this paper follows an
alternative approach: we relax the uniform-boundedness assumptions to on-average
bounded loss and on-average bounded gradient-norm assumptions. Following this
relaxation, we propose a new generalization bound that exploits the contractivity
of the log-Sobolev inequalities. These inequalities add a loss-gradient norm term
to the generalization bound, which intuitively serves as a surrogate for model
complexity. We apply the proposed bound to Bayesian deep nets and empirically
analyze the effect of this new loss-gradient norm term on different neural
architectures.
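The added term is easy to probe empirically. Below is a minimal PyTorch sketch (model, inputs, and targets are hypothetical placeholders) that estimates the on-average loss-gradient norm over a batch, i.e., the complexity surrogate the bound adds on top of the empirical risk.

```python
import torch
import torch.nn.functional as F

def avg_loss_gradient_norm(model, inputs, targets):
    """Estimate the on-average loss-gradient norm: the mean L2 norm of the
    per-sample loss gradient with respect to the model parameters."""
    params = tuple(model.parameters())
    norms = []
    for x, y in zip(inputs, targets):
        # Per-sample loss; x is one input tensor, y its class-index label.
        loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        norms.append(torch.sqrt(sum(g.pow(2).sum() for g in grads)))
    return torch.stack(norms).mean()
```

In practice such a quantity can be logged alongside the training loss to compare the complexity surrogate across architectures, in the spirit of the paper's empirical analysis.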
Related papers
- Error Bounds of Supervised Classification from Information-Theoretic Perspective [0.0]
We explore bounds on the expected risk when using deep neural networks for supervised classification from an information-theoretic perspective.
We introduce model risk and fitting error, which are derived from further decomposing the empirical risk.
arXiv Detail & Related papers (2024-06-07T01:07:35Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC), that can be applied to either risk-seeking or risk-averse policy optimization.
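As a rough illustration of the idea (a generic uncertainty Bellman backup, not necessarily the exact operator from the paper), local uncertainty is propagated through the discounted dynamics until a fixed point is reached; P and local_u below are hypothetical inputs.

```python
import numpy as np

def uncertainty_fixed_point(P, local_u, gamma=0.99, iters=500):
    """Iterate a generic uncertainty Bellman backup to its fixed point.

    P: (S, A, S) transition probabilities; local_u: (S, A) per-step
    uncertainty. Uses the squared discount, as is typical when
    propagating variances rather than means.
    """
    S, A, _ = P.shape
    U = np.zeros((S, A))
    for _ in range(iters):
        next_u = U.mean(axis=1)  # next-state uncertainty under a uniform policy
        U = local_u + gamma**2 * P.reshape(S * A, S).dot(next_u).reshape(S, A)
    return U
```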
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- On Regularization and Inference with Label Constraints [62.60903248392479]
We compare two strategies for encoding label constraints in a machine learning pipeline, regularization with constraints and constrained inference.
For regularization, we show that it narrows the generalization gap by precluding models that are inconsistent with the constraints.
For constrained inference, we show that it reduces the population risk by correcting a model's violation, and hence turns the violation into an advantage.
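The two strategies can be contrasted in a few lines. The sketch below (with a hypothetical allowed_mask encoding a simple "only these labels are permitted" constraint) shows a violation penalty used as a regularizer during training versus masking and renormalizing at inference.

```python
import numpy as np

def constraint_penalty(probs, allowed_mask, lam=1.0):
    """Regularization: penalize predicted mass on labels the constraint forbids."""
    violation = (probs * (1.0 - allowed_mask)).sum(axis=-1)
    return lam * violation.mean()

def constrained_inference(probs, allowed_mask):
    """Constrained inference: zero out forbidden labels and renormalize,
    correcting any violation at prediction time."""
    masked = probs * allowed_mask
    return masked / masked.sum(axis=-1, keepdims=True)
```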
arXiv Detail & Related papers (2023-07-08T03:39:22Z)
- A Non-Asymptotic Moreau Envelope Theory for High-Dimensional Generalized Linear Models [33.36787620121057]
We prove a new generalization bound showing that, for any class of linear predictors in Gaussian space, the training error under any continuous loss can control the test error under the corresponding Moreau envelope of that loss.
We use our finite-sample bound to directly recover the "optimistic rate" of Zhou et al. (2021).
We show that applying our bound using localized Gaussian width will generally be sharp for empirical risk minimizers.
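For reference, the Moreau envelope of a loss l with parameter lam is l_lam(y) = inf_u [ l(u) + (u - y)^2 / (2 lam) ]; the sketch below evaluates it numerically on a grid (a simple illustration, not the paper's analysis).

```python
import numpy as np

def moreau_envelope(loss, y, lam=1.0):
    """Numerically evaluate loss_lam(y) = min_u [loss(u) + (u - y)**2 / (2*lam)]
    by brute-force minimization over a grid of candidate points."""
    u = np.linspace(y - 10.0, y + 10.0, 10001)
    return float(np.min(loss(u) + (u - y) ** 2 / (2.0 * lam)))

# Example: the Moreau envelope smooths the hinge loss at y = 0.3.
print(moreau_envelope(lambda u: np.maximum(0.0, 1.0 - u), y=0.3))
```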
arXiv Detail & Related papers (2022-10-21T16:16:55Z)
- Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
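A classical baseline for this problem (not the paper's estimator) is averaged stochastic approximation: iterate with noisy operator evaluations and return the running average of the iterates, a standard variance-reduction device.

```python
import numpy as np

def averaged_stochastic_approximation(noisy_T, x0, steps=10000):
    """x_{k+1} = x_k + a_k * (noisy_T(x_k) - x_k) with Polyak-Ruppert
    averaging; noisy_T returns an unbiased noisy evaluation of T."""
    x = np.asarray(x0, dtype=float).copy()
    avg = np.zeros_like(x)
    for k in range(steps):
        a = 1.0 / (k + 1) ** 0.75    # slowly decaying step size
        x = x + a * (noisy_T(x) - x)
        avg += (x - avg) / (k + 1)   # running average of iterates
    return avg
```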
arXiv Detail & Related papers (2022-01-21T02:46:57Z)
- Dimension Free Generalization Bounds for Non Linear Metric Learning [61.193693608166114]
We provide uniform generalization bounds for two regimes: the sparse regime and a non-sparse regime.
We show that by relying on a different, new property of the solutions, it is still possible to provide dimension free generalization guarantees.
arXiv Detail & Related papers (2021-02-07T14:47:00Z)
- Failures of model-dependent generalization bounds for least-norm interpolation [39.97534972432276]
We consider bounds on the generalization performance of the least-norm linear regressor.
For a variety of natural joint distributions on training examples, any valid generalization bound must sometimes be very loose.
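The object of study here, the least-norm (minimum-L2-norm) linear interpolator, is easy to compute directly via the Moore-Penrose pseudoinverse:

```python
import numpy as np

def least_norm_interpolator(X, y):
    """Minimum-L2-norm w solving X @ w = y (exists whenever the system
    is consistent, e.g. in the overparameterized regime)."""
    return np.linalg.pinv(X) @ y

# Overparameterized example: 20 samples, 100 features, exact interpolation.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(20, 100)), rng.normal(size=20)
w = least_norm_interpolator(X, y)
assert np.allclose(X @ w, y)
```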
arXiv Detail & Related papers (2020-10-16T16:30:05Z)
- Relative Deviation Margin Bounds [55.22251993239944]
We give two types of learning bounds, both distribution-dependent and valid for general families, in terms of the Rademacher complexity.
We derive distribution-dependent generalization bounds for unbounded loss functions under the assumption of a finite moment.
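For a finite class of predictors, the empirical Rademacher complexity appearing in such bounds can be estimated by Monte Carlo over random sign vectors (a generic illustration with a hypothetical preds matrix):

```python
import numpy as np

def empirical_rademacher(preds, n_draws=1000, seed=0):
    """Monte Carlo estimate of E_sigma[ max_f (1/n) sum_i sigma_i * preds[f, i] ]
    for a finite class; preds has shape (num_predictors, n_samples)."""
    rng = np.random.default_rng(seed)
    _, n = preds.shape
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, n))
    corr = sigma @ preds.T / n   # (n_draws, num_predictors) correlations
    return corr.max(axis=1).mean()
```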
arXiv Detail & Related papers (2020-06-26T12:37:17Z)
- On the Generalization of Bayesian Deep Nets for Multi-class Classification [27.39403411896995]
We propose a new generalization bound for Bayesian deep nets by exploiting the contractivity of the Log-Sobolev inequalities.
Using these inequalities adds a loss-gradient norm term to the generalization bound, which is intuitively a surrogate of the model complexity.
arXiv Detail & Related papers (2020-02-23T09:05:03Z)