Scale-invariant Bayesian Neural Networks with Connectivity Tangent Kernel
- URL: http://arxiv.org/abs/2209.15208v1
- Date: Fri, 30 Sep 2022 03:31:13 GMT
- Title: Scale-invariant Bayesian Neural Networks with Connectivity Tangent Kernel
- Authors: SungYub Kim, Sihwan Park, Kyungsu Kim, Eunho Yang
- Abstract summary: We show that flatness and generalization bounds can be changed arbitrarily according to the scale of a parameter.
We propose new prior and posterior distributions invariant to scaling transformations by decomposing the scale and connectivity of parameters.
We empirically demonstrate that our posterior provides effective flatness and calibration measures with low complexity.
- Score: 30.088226334627375
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Explaining generalization and preventing over-confident predictions are central goals of studies on the loss landscape of neural networks. Flatness, defined as the insensitivity of the loss to perturbations of a pre-trained solution, is widely accepted as a predictor of generalization in this context. However, it has been pointed out that flatness and the resulting generalization bounds can be changed arbitrarily by rescaling parameters, and previous studies resolved this problem only partially and under restrictions: counter-intuitively, their generalization bounds either remained variant under function-preserving parameter scaling transformations or applied only to impractical network structures. As a more fundamental solution, we propose new prior and posterior distributions that are invariant to scaling transformations by \textit{decomposing} the scale and connectivity of parameters, thereby allowing the resulting generalization bound to describe the generalizability of a broad class of networks under practical transformations such as weight decay with batch normalization. We also show that this issue adversely affects the uncertainty calibration of the Laplace approximation, and we propose a solution using our invariant posterior. We empirically demonstrate that our posterior provides effective flatness and calibration measures with low complexity under such practical parameter transformations, supporting its practical effectiveness in line with our rationale.
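The scale-dependence described in the abstract can be made concrete with a few lines of NumPy. The sketch below is an illustration of the problem rather than code from the paper: it builds a small bias-free two-layer ReLU network, applies a function-preserving rescaling (first layer multiplied by alpha, second layer divided by alpha), and compares a common curvature-based sharpness proxy (the trace of a Gauss-Newton-style matrix) before and after. The helper names, the choice of alpha, and the use of this particular proxy are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny bias-free two-layer ReLU regression model: f(x) = w2 . relu(W1 x).
W1 = rng.normal(size=(16, 4))
w2 = rng.normal(size=16)
X = rng.normal(size=(256, 4))

def predict(W1, w2, X):
    return np.maximum(X @ W1.T, 0.0) @ w2

def gauss_newton_trace(W1, w2, X):
    """Trace of (1/n) J^T J for the squared loss, where J is the per-example
    Jacobian of f with respect to all parameters (W1, w2). This is one common
    curvature-based 'sharpness' proxy."""
    pre = X @ W1.T                      # (n, 16) pre-activations
    act = np.maximum(pre, 0.0)          # (n, 16); also equals df/dw2
    mask = (pre > 0.0).astype(float)    # ReLU derivative
    # df/dW1[j, k] = w2[j] * mask[:, j] * X[:, k], so square and sum the blocks.
    jac_W1_sq = (w2 ** 2 * mask)[:, :, None] * (X ** 2)[:, None, :]
    return float(np.mean(np.sum(act ** 2, axis=1) + np.sum(jac_W1_sq, axis=(1, 2))))

# Function-preserving rescaling: ReLU is positively homogeneous, so scaling
# the first layer by alpha and the second by 1/alpha leaves f(x) unchanged.
alpha = 10.0
W1_scaled, w2_scaled = alpha * W1, w2 / alpha

print(np.allclose(predict(W1, w2, X), predict(W1_scaled, w2_scaled, X)))  # True
print(gauss_newton_trace(W1, w2, X))
print(gauss_newton_trace(W1_scaled, w2_scaled, X))
# Identical predictions (hence an identical loss), but the curvature proxy
# changes by more than an order of magnitude under this reparameterization.
```

Because the network function, and hence the loss, is unchanged while the naive sharpness measure is not, such a measure cannot by itself explain generalization; the scale/connectivity decomposition proposed in the paper is designed to make the corresponding prior, posterior, and bound invariant to exactly this kind of transformation.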
Related papers
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z) - Reparameterization invariance in approximate Bayesian inference [32.88960624085645]
We develop a new geometric view of reparametrizations from which we explain the success of linearization.
We demonstrate that these reparameterization invariance properties can be extended to the original neural network predictive.
arXiv Detail & Related papers (2024-06-05T14:49:15Z) - Generalized Laplace Approximation [23.185126261153236]
We introduce a unified theoretical framework to attribute Bayesian inconsistency to model misspecification and inadequate priors.
We propose the generalized Laplace approximation, which involves a simple adjustment to the Hessian matrix of the regularized loss function.
We assess the performance and properties of the generalized Laplace approximation on state-of-the-art neural networks and real-world datasets.
arXiv Detail & Related papers (2024-05-22T11:11:42Z) - Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - Function-Space Regularization in Neural Networks: A Probabilistic
Perspective [51.133793272222874]
We show that we can derive a well-motivated regularization technique that allows explicitly encoding information about desired predictive functions into neural network training.
We evaluate the utility of this regularization technique empirically and demonstrate that the proposed method leads to near-perfect semantic shift detection and highly-calibrated predictive uncertainty estimates.
arXiv Detail & Related papers (2023-12-28T17:50:56Z) - A Lifted Bregman Formulation for the Inversion of Deep Neural Networks [28.03724379169264]
We propose a novel framework for the regularised inversion of deep neural networks.
The framework lifts the parameter space into a higher dimensional space by introducing auxiliary variables.
We present theoretical results and support their practical application with numerical examples.
arXiv Detail & Related papers (2023-03-01T20:30:22Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - Provably tuning the ElasticNet across instances [53.0518090093538]
We consider the problem of tuning the regularization parameters of Ridge regression, LASSO, and the ElasticNet across multiple problem instances.
Our results are the first general learning-theoretic guarantees for this important class of problems.
arXiv Detail & Related papers (2022-07-20T21:22:40Z) - On the Explicit Role of Initialization on the Convergence and Implicit
Bias of Overparametrized Linear Networks [1.0323063834827415]
We present a novel analysis of single-hidden-layer linear networks trained under gradient flow.
We show that the squared loss converges exponentially to its optimum.
We derive a novel non-asymptotic upper-bound on the distance between the trained network and the min-norm solution.
arXiv Detail & Related papers (2021-05-13T15:13:51Z) - Convex Geometry and Duality of Over-parameterized Neural Networks [70.15611146583068]
We develop a convex analytic approach to analyze finite width two-layer ReLU networks.
We show that an optimal solution to the regularized training problem can be characterized as extreme points of a convex set.
In higher dimensions, we show that the training problem can be cast as a finite dimensional convex problem with infinitely many constraints.
arXiv Detail & Related papers (2020-02-25T23:05:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.