Bayes without Underfitting: Fully Correlated Deep Learning Posteriors via Alternating Projections
- URL: http://arxiv.org/abs/2410.16901v1
- Date: Tue, 22 Oct 2024 11:15:07 GMT
- Title: Bayes without Underfitting: Fully Correlated Deep Learning Posteriors via Alternating Projections
- Authors: Marco Miani, Hrittik Roy, Søren Hauberg
- Abstract summary: Bayesian deep learning all too often underfits so that the Bayesian prediction is less accurate than a simple point estimate.
We propose to build Bayesian approximations in a null space, thereby guaranteeing that the Bayesian predictive does not underfit.
An empirical evaluation shows that the approach scales to large models, including vision transformers with 28 million parameters.
- Score: 11.893371164199312
- Abstract: Bayesian deep learning all too often underfits so that the Bayesian prediction is less accurate than a simple point estimate. Uncertainty quantification then comes at the cost of accuracy. For linearized models, the null space of the generalized Gauss-Newton matrix corresponds to parameters that preserve the training predictions of the point estimate. We propose to build Bayesian approximations in this null space, thereby guaranteeing that the Bayesian predictive does not underfit. We suggest a matrix-free algorithm for projecting onto this null space, which scales linearly with the number of parameters and quadratically with the number of output dimensions. We further propose an approximation that only scales linearly with parameters to make the method applicable to generative models. An extensive empirical evaluation shows that the approach scales to large models, including vision transformers with 28 million parameters.
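The null-space idea in the abstract can be illustrated on a toy problem. This is a minimal sketch under stated assumptions, not the authors' matrix-free algorithm. It assumes a scalar-output model with positive loss curvature, so the GGN is a positively weighted sum of outer products of Jacobian rows and its null space is the intersection of the null spaces of those rows. The Jacobian `J`, the point estimate `theta_star`, and all function names are hypothetical. The sketch projects a Gaussian draw onto that intersection by cyclically projecting onto each row's null space, a naive row-by-row form of alternating projections.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def project_to_row_null_space(v, row):
    """Orthogonal projection of v onto the hyperplane {u : row . u = 0}."""
    rr = dot(row, row)
    if rr == 0.0:
        return list(v)  # a zero Jacobian row imposes no constraint
    c = dot(row, v) / rr
    return [vi - c * ri for vi, ri in zip(v, row)]


def alternating_null_space_projection(v, jacobian_rows, sweeps=500, tol=1e-12):
    """Cyclically project v onto the null space of each Jacobian row.

    For finitely many subspaces, cyclic alternating projections converge to
    the orthogonal projection onto their intersection, i.e. onto null(J).
    """
    for _ in range(sweeps):
        for row in jacobian_rows:
            v = project_to_row_null_space(v, row)
        if max(abs(dot(row, v)) for row in jacobian_rows) < tol:
            break
    return v


# Hypothetical Jacobian of a scalar-output model at theta_star:
# one row per training input, one column per parameter.
J = [[1.0, 2.0, 0.0, -1.0],
     [0.0, 1.0, 1.0, 3.0]]
theta_star = [0.5, -0.3, 0.8, 0.1]

random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in theta_star]  # isotropic Gaussian draw
delta = alternating_null_space_projection(noise, J)
theta_sample = [t + d for t, d in zip(theta_star, delta)]

# The linearized training predictions f(x_i, theta_star) + J_i (theta - theta_star)
# change by J_i . delta, which is numerically zero for every training input.
print([dot(row, delta) for row in J])
```

Each printed entry should be numerically zero, so the sampled parameters leave the linearized training predictions of the point estimate unchanged. The paper's algorithm works on the GGN without forming it explicitly, and per the abstract it scales linearly in parameters. This explicit-Jacobian loop only shows the geometry.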
Related papers
- A Bayesian Approach Toward Robust Multidimensional Ellipsoid-Specific Fitting [0.0]
This work presents a novel and effective method for fitting multidimensional ellipsoids to scattered data contaminated by noise and outliers.
We incorporate a uniform prior distribution to constrain the search for primitive parameters within an ellipsoidal domain.
We apply it to a wide range of practical applications such as microscopy cell counting, 3D reconstruction, geometric shape approximation, and magnetometer calibration tasks.
(arXiv, 2024-07-27T14:31:51Z)
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and, in particular, do not rely on sample-size-dependent smoothing parameters.
(arXiv, 2024-07-11T13:28:34Z)
- A variational Bayes approach to debiased inference for low-dimensional parameters in high-dimensional linear regression [2.7498981662768536]
We propose a scalable variational Bayes method for statistical inference in sparse linear regression.
Our approach relies on assigning a mean-field approximation to the nuisance coordinates.
This requires only a preprocessing step and preserves the computational advantages of mean-field variational Bayes.
(arXiv, 2024-06-18T14:27:44Z)
- Hessian-Free Laplace in Bayesian Deep Learning [44.16006844888796]
Hessian-free Laplace (HFL) approximation uses curvature of both the log posterior and network prediction to estimate its variance.
We show that, under standard assumptions of the Laplace approximation (LA) in Bayesian deep learning, HFL targets the same variance as LA and can be efficiently amortized in a pre-trained network.
(arXiv, 2024-03-15T20:47:39Z)
- A Mean Field Approach to Empirical Bayes Estimation in High-dimensional Linear Regression [8.345523969593492]
We study empirical Bayes estimation in high-dimensional linear regression.
We adopt a variational empirical Bayes approach, introduced originally in Carbonetto and Stephens (2012) and Kim et al. (2022).
This provides the first rigorous empirical Bayes method in a high-dimensional regression setting without sparsity.
(arXiv, 2023-09-28T20:51:40Z)
- Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable to gradient-based optimization and allow trading off estimation accuracy against computational complexity.
(arXiv, 2023-06-06T19:02:57Z)
- Bayesian Analysis for Over-parameterized Linear Model without Sparsity [8.1585306387285]
This study introduces a Bayesian approach that employs a prior distribution dependent on the eigenvectors of data covariance matrices without inducing parameter sparsity.
We also provide contraction rates of the derived posterior estimation and develop a truncated Gaussian approximation of the posterior distribution.
These findings suggest that Bayesian methods capable of handling data spectra and estimating non-sparse high-dimensional parameters are feasible.
(arXiv, 2023-05-25T06:07:47Z)
- Distributed Sketching for Randomized Optimization: Exact Characterization, Concentration and Lower Bounds [54.51566432934556]
We consider distributed optimization methods for problems where forming the Hessian is computationally challenging.
We leverage randomized sketches for reducing the problem dimensions as well as preserving privacy and improving straggler resilience in asynchronous distributed systems.
(arXiv, 2022-03-18T05:49:13Z)
- Pathologies in priors and inference for Bayesian transformers [71.97183475225215]
No successful attempts exist to improve the predictive uncertainty of transformer models using Bayesian inference.
We find that weight-space inference in transformers does not work well, regardless of the approximate posterior.
We propose a novel method based on the implicit reparameterization of the Dirichlet distribution to apply variational inference directly to the attention weights.
(arXiv, 2021-10-08T10:35:27Z)
- Improving predictions of Bayesian neural nets via local linearization [79.21517734364093]
We argue that the Gauss-Newton approximation should be understood as a local linearization of the underlying Bayesian neural network (BNN).
Because we use this linearized model for posterior inference, we should also predict using this modified model instead of the original one.
We refer to this modified predictive as "GLM predictive" and show that it effectively resolves common underfitting problems of the Laplace approximation.
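The linearized ("GLM") predictive described above can be illustrated on a toy model. This is a minimal sketch under stated assumptions, not the cited paper's implementation. It assumes a hypothetical one-hidden-unit scalar network, a hypothetical point estimate `w_star`, and an isotropic Gaussian standing in for a Laplace posterior. The names `f`, `jac`, `f_lin`, and `predictive_means` are illustrative. It compares Monte Carlo predictive means when the same weight samples are pushed through the original network and through its first-order Taylor expansion around `w_star`.

```python
import math
import random


def f(x, w):
    """Hypothetical one-hidden-unit network: w1 * tanh(w0 * x) + w2."""
    return w[1] * math.tanh(w[0] * x) + w[2]


def jac(x, w):
    """Gradient of f with respect to the weights (w0, w1, w2)."""
    t = math.tanh(w[0] * x)
    return [w[1] * (1.0 - t * t) * x, t, 1.0]


def f_lin(x, w, w_star):
    """First-order Taylor expansion of f in the weights around w_star."""
    j = jac(x, w_star)
    return f(x, w_star) + sum(ji * (wi - ws) for ji, wi, ws in zip(j, w, w_star))


def predictive_means(x, w_star, sigma, n_pairs=1000, seed=0):
    """Monte Carlo predictive means under w ~ N(w_star, sigma^2 I).

    Antithetic pairs (w_star + e, w_star - e) make the linearized mean equal
    f(x, w_star) up to rounding; the nonlinear network's mean need not.
    """
    rng = random.Random(seed)
    nn_sum = lin_sum = 0.0
    for _ in range(n_pairs):
        eps = [sigma * rng.gauss(0.0, 1.0) for _ in w_star]
        for sign in (1.0, -1.0):
            w = [ws + sign * e for ws, e in zip(w_star, eps)]
            nn_sum += f(x, w)
            lin_sum += f_lin(x, w, w_star)
    n = 2 * n_pairs
    return nn_sum / n, lin_sum / n


w_star = [1.2, -0.7, 0.3]  # hypothetical point estimate
x = 1.5
nn_mean, lin_mean = predictive_means(x, w_star, sigma=0.5)
print(f(x, w_star), nn_mean, lin_mean)
```

Because the linearized model is affine in the weights, its predictive mean under a posterior centred at `w_star` stays at the point-estimate prediction. Sampling the nonlinear network can shift that mean. This is one mechanism behind the underfitting that the GLM predictive is said to resolve.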
(arXiv, 2020-08-19T12:35:55Z)
- Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this gradient estimation problem that combines sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
(arXiv, 2020-06-04T21:51:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.