VIKING: Deep variational inference with stochastic projections
- URL: http://arxiv.org/abs/2510.23684v1
- Date: Mon, 27 Oct 2025 15:38:35 GMT
- Title: VIKING: Deep variational inference with stochastic projections
- Authors: Samuel G. Fadel, Hrittik Roy, Nicholas Krämer, Yevgen Zainchkovskyy, Stas Syrota, Alejandro Valverde Mahou, Carl Henrik Ek, Søren Hauberg
- Abstract summary: Variational mean field approximations tend to struggle with contemporary overparametrized deep neural networks. We propose a simple variational family that considers two independent linear subspaces of the parameter space. This allows us to build a fully-correlated approximate posterior reflecting the overparametrization.
- Score: 48.946143517489496
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Variational mean field approximations tend to struggle with contemporary overparametrized deep neural networks. Where a Bayesian treatment is usually associated with high-quality predictions and uncertainties, the practical reality has been the opposite, with unstable training, poor predictive power, and subpar calibration. Building upon recent work on reparametrizations of neural networks, we propose a simple variational family that considers two independent linear subspaces of the parameter space. These represent functional changes inside and outside the support of training data. This allows us to build a fully-correlated approximate posterior reflecting the overparametrization that tunes easy-to-interpret hyperparameters. We develop scalable numerical routines that maximize the associated evidence lower bound (ELBO) and sample from the approximate posterior. Empirically, we observe state-of-the-art performance across tasks, models, and datasets compared to a wide array of baseline methods. Our results show that approximate Bayesian inference applied to deep neural networks is far from a lost cause when constructing inference mechanisms that reflect the geometry of reparametrizations.
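The abstract gives no implementation details, but the core construction it describes (a Gaussian variational posterior restricted to two independent linear subspaces of the parameter space, trained by maximizing a reparameterized ELBO) can be sketched in a few lines. The sketch below is illustrative only: the class name, the random orthonormal bases `U_in`/`U_out`, the standard-normal priors, and the Monte Carlo ELBO estimator are assumptions, not the authors' code, which derives the subspaces from the geometry of reparametrizations rather than from random projections.

```python
import torch

def make_basis(dim, k):
    # Orthonormal basis of a random k-dimensional subspace of R^dim
    # (a stand-in for the paper's data-dependent subspaces).
    q, _ = torch.linalg.qr(torch.randn(dim, k))
    return q

class TwoSubspacePosterior(torch.nn.Module):
    """Gaussian posterior over flattened weights that only varies along
    two independent linear subspaces around a pretrained solution."""

    def __init__(self, theta_map, k_in=10, k_out=10):
        super().__init__()
        dim = theta_map.numel()
        self.register_buffer("theta_map", theta_map)           # pretrained/MAP weights
        self.register_buffer("U_in", make_basis(dim, k_in))    # "inside the data support"
        self.register_buffer("U_out", make_basis(dim, k_out))  # "outside the data support"
        # Variational parameters: a mean and log-scale per subspace direction.
        self.mu_in = torch.nn.Parameter(torch.zeros(k_in))
        self.log_s_in = torch.nn.Parameter(torch.zeros(k_in))
        self.mu_out = torch.nn.Parameter(torch.zeros(k_out))
        self.log_s_out = torch.nn.Parameter(torch.zeros(k_out))

    def rsample(self):
        # Reparameterized sample: theta = theta_map + U_in z_in + U_out z_out.
        # The projections induce a fully correlated (low-rank) covariance in weight space.
        z_in = self.mu_in + self.log_s_in.exp() * torch.randn_like(self.mu_in)
        z_out = self.mu_out + self.log_s_out.exp() * torch.randn_like(self.mu_out)
        return self.theta_map + self.U_in @ z_in + self.U_out @ z_out

    def kl_to_standard_normal(self):
        # KL of each diagonal subspace Gaussian to an N(0, I) prior.
        def kl(mu, log_s):
            return 0.5 * (mu.pow(2) + (2 * log_s).exp() - 2 * log_s - 1).sum()
        return kl(self.mu_in, self.log_s_in) + kl(self.mu_out, self.log_s_out)

def elbo(posterior, log_likelihood_fn, n_samples=1):
    # Monte Carlo ELBO: E_q[log p(data | theta)] - KL(q || prior).
    ll = sum(log_likelihood_fn(posterior.rsample()) for _ in range(n_samples)) / n_samples
    return ll - posterior.kl_to_standard_normal()
```

Maximizing this estimator with a standard optimizer and drawing weight samples via `rsample()` at test time mirrors the two scalable routines the abstract mentions (ELBO maximization and posterior sampling), though the paper's actual subspaces, priors, and hyperparameters differ.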
Related papers
- Tubular Riemannian Laplace Approximations for Bayesian Neural Networks [0.0]
Laplace approximations are among the simplest and most practical methods for approximate Bayesian inference in neural networks. Recent work has proposed geometric Gaussian approximations to adapt to this structure. We introduce the Tubular Riemannian Laplace (TRL) approximation.
arXiv Detail & Related papers (2025-12-30T17:50:55Z) - Deep Fréchet Regression [4.915744683251151]
We propose a flexible regression model capable of handling high-dimensional predictors without imposing parametric assumptions. The proposed model outperforms existing methods for non-Euclidean responses.
arXiv Detail & Related papers (2024-07-31T07:54:14Z) - Reparameterization invariance in approximate Bayesian inference [32.88960624085645]
We develop a new geometric view of reparametrizations from which we explain the success of linearization. We demonstrate that these reparametrization invariance properties can be extended to the original neural network predictive.
arXiv Detail & Related papers (2024-06-05T14:49:15Z) - A variational neural Bayes framework for inference on intractable posterior distributions [1.0801976288811024]
Posterior distributions of model parameters are efficiently obtained by feeding observed data into a trained neural network.
We show theoretically that our posteriors converge to the true posteriors in Kullback-Leibler divergence.
arXiv Detail & Related papers (2024-04-16T20:40:15Z) - Neural variational Data Assimilation with Uncertainty Quantification using SPDE priors [28.804041716140194]
Recent advances in the deep learning community make it possible to address the problem with a neural architecture embedded in a variational data assimilation framework. In this work we use the theory of Stochastic Partial Differential Equations (SPDE) and Gaussian Processes (GP) to estimate both the space and time covariance of the state.
arXiv Detail & Related papers (2024-02-02T19:18:12Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Implicit Variational Inference for High-Dimensional Posteriors [7.924706533725115]
In variational inference, the benefits of Bayesian models rely on accurately capturing the true posterior distribution.
We propose using neural samplers that specify implicit distributions, which are well-suited for approximating complex multimodal and correlated posteriors.
Our approach introduces novel bounds for approximate inference using implicit distributions by locally linearising the neural sampler.
arXiv Detail & Related papers (2023-10-10T14:06:56Z) - Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important when forecasting nonstationary processes or processes with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z) - The Geometry of Neural Nets' Parameter Spaces Under Reparametrization [35.5848464226014]
We study the invariance of neural nets under reparametrization from the perspective of Riemannian geometry.
We discuss implications for measuring the flatness of minima, for optimization, and for probability-density maximization.
arXiv Detail & Related papers (2023-02-14T22:48:24Z) - Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z) - A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood (a minimal sketch of the underlying decomposition appears after this list).
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)
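The entry above on training speed and model selection rests on the standard chain-rule decomposition of the marginal likelihood, log p(D) = Σ_i log p(D_i | D_{<i}): a model whose sequential predictions improve quickly accumulates a large sum. The sketch below illustrates this with conjugate Bayesian linear regression, where the one-step-ahead predictives are Gaussian and the sum equals the exact log marginal likelihood; the prior, noise level, and data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sequential_log_marginal(X, y, prior_var=1.0, noise_var=0.1):
    """Sum of one-step-ahead log predictive densities under a conjugate
    Bayesian linear model; equals log p(y | X) for this model."""
    d = X.shape[1]
    S = prior_var * np.eye(d)   # current posterior covariance over weights
    m = np.zeros(d)             # current posterior mean
    total = 0.0
    for x, yi in zip(X, y):
        # One-step-ahead predictive p(y_i | y_<i) = N(x m, x S x + noise_var).
        pred_mean = x @ m
        pred_var = x @ S @ x + noise_var
        total += -0.5 * (np.log(2 * np.pi * pred_var) + (yi - pred_mean) ** 2 / pred_var)
        # Exact conjugate (Kalman-style) posterior update on observing (x, y_i).
        k = S @ x / pred_var
        m = m + k * (yi - pred_mean)
        S = S - np.outer(k, x @ S)
    return total

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ rng.normal(size=3) + 0.3 * rng.normal(size=50)
print(sequential_log_marginal(X, y))  # larger for models that "train" faster
```

For deep networks trained with gradient descent the decomposition is no longer exact, which is why the paper frames training speed as an estimator of the marginal likelihood rather than an exact quantity.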