Related papers: Geometric-Aware Variational Inference: Robust and Adaptive Regularization with Directional Weight Uncertainty

URL: http://arxiv.org/abs/2506.19726v1
Date: Tue, 24 Jun 2025 15:42:00 GMT
Title: Geometric-Aware Variational Inference: Robust and Adaptive Regularization with Directional Weight Uncertainty
Authors: Carlos Stein Brito
Abstract summary: Concentration-Adapted Perturbations (CAP) is a variational framework that models weight uncertainties directly on the unit hypersphere. CAP provides the first complete theoretical framework connecting directional statistics to practical noise regularization in neural networks.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep neural networks require principled uncertainty quantification, yet existing variational inference methods often employ isotropic Gaussian approximations in weight space that poorly match the network's inherent geometry. We address this mismatch by introducing Concentration-Adapted Perturbations (CAP), a variational framework that models weight uncertainties directly on the unit hypersphere using von Mises-Fisher distributions. Building on recent work in radial-directional posterior decompositions and spherical weight constraints, CAP provides the first complete theoretical framework connecting directional statistics to practical noise regularization in neural networks. Our key contribution is an analytical derivation linking vMF concentration parameters to activation noise variance, enabling each layer to learn its optimal uncertainty level through a novel closed-form KL divergence regularizer. In experiments on CIFAR-10, CAP significantly improves model calibration, reducing Expected Calibration Error by 5.6x, while providing interpretable layer-wise uncertainty profiles. CAP requires minimal computational overhead and integrates seamlessly into standard architectures, offering a theoretically grounded yet practical approach to uncertainty quantification in deep learning.
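The abstract reports its calibration result in terms of Expected Calibration Error (ECE). As a rough illustration of what that metric measures, the sketch below computes the commonly used equal-width-bin ECE from per-example confidences and correctness flags. The function name, the default of 10 bins, and the binning convention are assumptions made for this example; the paper's exact evaluation protocol is not stated on this page and may differ.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE (illustrative sketch, not the paper's code).

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct:     booleans, True where the prediction was right.
    Returns the sample-weighted mean of |accuracy - confidence| over bins.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    conf_sum = [0.0] * n_bins   # sum of confidences per bin
    hit_sum = [0.0] * n_bins    # number of correct predictions per bin
    count = [0] * n_bins        # number of predictions per bin

    for c, ok in zip(confidences, correct):
        # Bins are [k/n_bins, (k+1)/n_bins); a confidence of 1.0 goes in the last bin.
        k = min(int(c * n_bins), n_bins - 1)
        conf_sum[k] += c
        hit_sum[k] += 1.0 if ok else 0.0
        count[k] += 1

    ece = 0.0
    for k in range(n_bins):
        if count[k] == 0:
            continue  # empty bins contribute nothing
        avg_conf = conf_sum[k] / count[k]
        avg_acc = hit_sum[k] / count[k]
        ece += (count[k] / n) * abs(avg_acc - avg_conf)
    return ece
```

A model that is 90% confident on four examples but right on only three of them has a gap of 0.15 in that bin, so its ECE under this definition is 0.15. A lower value indicates that stated confidence tracks observed accuracy more closely.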

Related papers

CLUE: Neural Networks Calibration via Learning Uncertainty-Error alignment [7.702016079410588]
We introduce CLUE (Calibration via Learning Uncertainty-Error Alignment), a novel approach that aligns predicted uncertainty with observed error during training. We show that CLUE achieves superior calibration quality and competitive predictive performance with respect to state-of-the-art approaches.
arXiv Detail & Related papers (2025-05-28T19:23:47Z)
Scale-Insensitive Neural Network Significance Tests [0.0]
This paper develops a scale-insensitive framework for neural network significance testing. We replace metric entropy calculations with Rademacher complexity bounds. We weaken the regularity conditions on the target function to require only Sobolev space membership.
arXiv Detail & Related papers (2025-01-27T03:45:26Z)
Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems. We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z)
Revisiting Essential and Nonessential Settings of Evidential Deep Learning [70.82728812001807]
Evidential Deep Learning (EDL) is an emerging method for uncertainty estimation. We propose Re-EDL, a simplified yet more effective variant of EDL.
arXiv Detail & Related papers (2024-10-01T04:27:07Z)
Generalized Gaussian Temporal Difference Error for Uncertainty-aware Reinforcement Learning [0.19418036471925312]
We introduce a novel framework for generalized Gaussian error modeling in deep reinforcement learning. We improve the estimation and mitigation of data-dependent aleatoric uncertainty. Experiments with policy gradient algorithms demonstrate significant performance gains.
arXiv Detail & Related papers (2024-08-05T08:12:25Z)
Alpha-VI DeepONet: A prior-robust variational Bayesian approach for enhancing DeepONets with uncertainty quantification [0.0]
We introduce a novel deep operator network (DeepONet) framework that incorporates generalised variational inference (GVI). By incorporating Bayesian neural networks as the building blocks for the branch and trunk networks, our framework endows DeepONet with uncertainty quantification. We demonstrate that modifying the variational objective function yields superior results in terms of minimising the mean squared error.
arXiv Detail & Related papers (2024-08-01T16:22:03Z)
Ensemble Kalman Filtering Meets Gaussian Process SSM for Non-Mean-Field and Online Inference [47.460898983429374]
We introduce an ensemble Kalman filter (EnKF) into the non-mean-field (NMF) variational inference framework to approximate the posterior distribution of the latent states. This novel marriage between EnKF and GPSSM not only eliminates the need for extensive parameterization in learning variational distributions, but also enables an interpretable, closed-form approximation of the evidence lower bound (ELBO). We demonstrate that the resulting EnKF-aided online algorithm embodies a principled objective function by ensuring data-fitting accuracy while incorporating model regularizations to mitigate overfitting.
arXiv Detail & Related papers (2023-12-10T15:22:30Z)
A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime. We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by e.g. the combination of model, parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD). We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD settings. We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z)
Quantifying Model Predictive Uncertainty with Perturbation Theory [21.591460685054546]
We propose a framework for predictive uncertainty quantification of a neural network. We use perturbation theory from quantum physics to formulate a moment decomposition problem. Our approach provides fast model predictive uncertainty estimates with much greater precision and calibration.
arXiv Detail & Related papers (2021-09-22T17:55:09Z)
A Kernel Framework to Quantify a Model's Local Predictive Uncertainty under Data Distributional Shifts [21.591460685054546]
Internal layer outputs of a trained neural network contain all of the information related to both its mapping function and its input data distribution. We propose a framework for predictive uncertainty quantification of a trained neural network that explicitly estimates the PDF of its raw prediction space. The kernel framework is observed to provide model uncertainty estimates with much greater precision based on the ability to detect model prediction errors.
arXiv Detail & Related papers (2021-03-02T00:31:53Z)
Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation. Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle. We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
Neural Control Variates [71.42768823631918]
We show that a set of neural networks can face the challenge of finding a good approximation of the integrand. We derive a theoretically optimal, variance-minimizing loss function, and propose an alternative, composite loss for stable online training in practice. Specifically, we show that the learned light-field approximation is of sufficient quality for high-order bounces, allowing us to omit the error correction and thereby dramatically reduce the noise at the cost of negligible visible bias.
arXiv Detail & Related papers (2020-06-02T11:17:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.