Addressing the Inconsistency in Bayesian Deep Learning via Generalized Laplace Approximation
- URL: http://arxiv.org/abs/2405.13535v5
- Date: Mon, 22 Sep 2025 15:14:35 GMT
- Title: Addressing the Inconsistency in Bayesian Deep Learning via Generalized Laplace Approximation
- Authors: Yinsong Chen, Samson S. Yu, Zhong Li, Chee Peng Lim
- Abstract summary: We introduce the generalized Laplace approximation, which requires only a simple modification to the Hessian calculation of the regularized loss. We evaluate the proposed method on state-of-the-art neural networks and real-world datasets, demonstrating that the generalized Laplace approximation enhances predictive performance.
- Score: 17.224114053715528
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, inconsistency in Bayesian deep learning has attracted significant attention. Tempered or generalized posterior distributions are frequently employed as direct and effective solutions. Nonetheless, the underlying mechanisms and the effectiveness of generalized posteriors remain active research topics. In this work, we interpret posterior tempering as a correction for model misspecification via adjustments to the joint probability, and as a recalibration of priors by reducing aleatoric uncertainty. We also introduce the generalized Laplace approximation, which requires only a simple modification to the Hessian calculation of the regularized loss and provides a flexible and scalable framework for high-quality posterior inference. We evaluate the proposed method on state-of-the-art neural networks and real-world datasets, demonstrating that the generalized Laplace approximation enhances predictive performance.
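As a rough illustration of what tempering a posterior does to a Laplace approximation, the following minimal sketch scales the likelihood curvature in the Hessian by a tempering factor. It is not the authors' generalized Laplace approximation, whose exact Hessian modification is not given in this listing. The function name `tempered_laplace`, the parameters `lam`, `noise_var`, and `prior_var`, and the choice of Gaussian linear regression (where the Laplace approximation happens to be exact) are all assumptions made for illustration.

```python
import numpy as np


def tempered_laplace(X, y, noise_var=1.0, prior_var=1.0, lam=1.0):
    """Laplace approximation to a tempered posterior for linear regression.

    Assumed tempered posterior: p(y | X, w)^lam * p(w), with a Gaussian
    likelihood (variance noise_var) and a zero-mean Gaussian prior
    (variance prior_var). Returns the mode and covariance of the
    Gaussian approximation.
    """
    d = X.shape[1]
    # Hessian of the tempered negative log joint: the likelihood curvature
    # is scaled by lam, the prior curvature is left unchanged.
    hessian = lam * (X.T @ X) / noise_var + np.eye(d) / prior_var
    # The loss is quadratic, so the mode solves a linear system.
    mode = np.linalg.solve(hessian, lam * (X.T @ y) / noise_var)
    # Laplace covariance is the inverse Hessian at the mode.
    cov = np.linalg.inv(hessian)
    return mode, cov


# Synthetic data (hypothetical, for demonstration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=50)

mode_full, cov_full = tempered_laplace(X, y, noise_var=0.01, lam=1.0)
mode_temp, cov_temp = tempered_laplace(X, y, noise_var=0.01, lam=0.5)
```

In this toy setting, choosing `lam` below 1 down-weights the data term, so the approximate posterior covariance widens while the mode moves only slightly toward the prior mean. For neural networks the Hessian would have to be approximated instead of formed and inverted directly.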
Related papers
- Variational Deep Learning via Implicit Regularization [11.296548737163599]
Modern deep learning models generalize remarkably well in-distribution, despite being overparametrized and trained with little to no explicit regularization. We propose to regularize variational neural networks solely by relying on the implicit bias of (stochastic) gradient descent.
arXiv Detail & Related papers (2025-05-26T17:15:57Z) - In-Context Parametric Inference: Point or Distribution Estimators? [66.22308335324239]
Our experiments indicate that amortized point estimators generally outperform posterior inference, though the latter remain competitive in some low-dimensional problems.
arXiv Detail & Related papers (2025-02-17T10:00:24Z) - Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z) - Predictive variational inference: Learn the predictively optimal posterior distribution [1.7648680700685022]
Vanilla variational inference finds an optimal approximation to the Bayesian posterior distribution, but even the exact Bayesian posterior is often not meaningful under model misspecification.
We propose predictive variational inference (PVI): a general inference framework that seeks and samples from an optimal posterior density.
This framework applies to both likelihood-exact and likelihood-free models.
arXiv Detail & Related papers (2024-10-18T19:44:57Z) - Reparameterization invariance in approximate Bayesian inference [32.88960624085645]
We develop a new geometric view of reparametrizations from which we explain the success of linearization. We demonstrate that these reparameterization invariance properties can be extended to the original neural network predictive.
arXiv Detail & Related papers (2024-06-05T14:49:15Z) - Simulation-based Inference with the Generalized Kullback-Leibler
Divergence [8.868822699365618]
The goal is to solve the inverse problem when the likelihood is only known implicitly.
We investigate a hybrid model that offers the best of both worlds by learning a normalized base distribution and a learned ratio.
arXiv Detail & Related papers (2023-10-03T05:42:53Z) - Riemannian Laplace approximations for Bayesian neural networks [3.6990978741464904]
We propose a simple parametric approximate posterior that adapts to the shape of the true posterior.
We show that our approach consistently improves over the conventional Laplace approximation across tasks.
arXiv Detail & Related papers (2023-06-12T14:44:22Z) - Bayesian Renormalization [68.8204255655161]
We present a fully information theoretic approach to renormalization inspired by Bayesian statistical inference.
The main insight of Bayesian Renormalization is that the Fisher metric defines a correlation length that plays the role of an emergent RG scale.
We provide insight into how the Bayesian Renormalization scheme relates to existing methods for data compression and data generation.
arXiv Detail & Related papers (2023-05-17T18:00:28Z) - Variational Laplace Autoencoders [53.08170674326728]
Variational autoencoders employ an amortized inference model to approximate the posterior of latent variables.
We present a novel approach that addresses the limited posterior expressiveness of fully-factorized Gaussian assumption.
We also present a general framework named Variational Laplace Autoencoders (VLAEs) for training deep generative models.
arXiv Detail & Related papers (2022-11-30T18:59:27Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - Scale-invariant Bayesian Neural Networks with Connectivity Tangent Kernel [30.088226334627375]
We show that flatness and generalization bounds can be changed arbitrarily according to the scale of a parameter.
We propose new prior and posterior distributions invariant to scaling transformations by decomposing the scale and connectivity of parameters.
We empirically demonstrate our posterior provides effective flatness and calibration measures with low complexity.
arXiv Detail & Related papers (2022-09-30T03:31:13Z) - Generalised Bayesian Inference for Discrete Intractable Likelihood [9.331721990371769]
This paper develops a novel generalised Bayesian inference procedure suitable for discrete intractable likelihood.
Inspired by recent methodological advances for continuous data, the main idea is to update beliefs about model parameters using a discrete Fisher divergence.
The result is a generalised posterior that can be sampled from using standard computational tools, such as Markov chain Monte Carlo.
arXiv Detail & Related papers (2022-06-16T19:36:17Z) - Amortized backward variational inference in nonlinear state-space models [0.0]
We consider the problem of state estimation in general state-space models using variational inference.
We establish for the first time that, under mixing assumptions, the variational approximation of expectations of additive state functionals induces an error which grows at most linearly in the number of observations.
arXiv Detail & Related papers (2022-06-01T08:35:54Z) - Optimal regularizations for data generation with probabilistic graphical models [0.0]
Empirically, well-chosen regularization schemes dramatically improve the quality of the inferred models.
We consider the particular case of L2 and L1 regularizations in the Maximum A Posteriori (MAP) inference of generative pairwise graphical models.
arXiv Detail & Related papers (2021-12-02T14:45:16Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - A Random Matrix Theory Approach to Damping in Deep Learning [0.7614628596146599]
We conjecture that the inherent difference in generalisation between adaptive and non-adaptive gradient methods in deep learning stems from the increased estimation noise.
We develop a novel random matrix theory based damping learner for second order optimiser inspired by linear shrinkage estimation.
arXiv Detail & Related papers (2020-11-15T18:19:42Z) - Bayesian Deep Learning and a Probabilistic Perspective of Generalization [56.69671152009899]
We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization.
We also propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction.
arXiv Detail & Related papers (2020-02-20T15:13:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.