Variational Deep Learning via Implicit Regularization
- URL: http://arxiv.org/abs/2505.20235v1
- Date: Mon, 26 May 2025 17:15:57 GMT
- Title: Variational Deep Learning via Implicit Regularization
- Authors: Jonathan Wenger, Beau Coker, Juraj Marusic, John P. Cunningham
- Abstract summary: We show how to regularize a variational deep network implicitly via the optimization procedure. We fully characterize the inductive bias of gradient descent in the case of an overparametrized linear model.
- Score: 20.449095674026363
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern deep learning models generalize remarkably well in-distribution, despite being overparametrized and trained with little to no explicit regularization. Instead, current theory credits implicit regularization imposed by the choice of architecture, hyperparameters and optimization procedure. However, deploying deep learning models out-of-distribution, in sequential decision-making tasks, or in safety-critical domains, necessitates reliable uncertainty quantification, not just a point estimate. The machinery of modern approximate inference -- Bayesian deep learning -- should answer the need for uncertainty quantification, but its effectiveness has been challenged by our inability to define useful explicit inductive biases through priors, as well as the associated computational burden. Instead, in this work we demonstrate, both theoretically and empirically, how to regularize a variational deep network implicitly via the optimization procedure, just as for standard deep learning. We fully characterize the inductive bias of (stochastic) gradient descent in the case of an overparametrized linear model as generalized variational inference and demonstrate the importance of the choice of parametrization. Finally, we show empirically that our approach achieves strong in- and out-of-distribution performance without tuning of additional hyperparameters and with minimal time and memory overhead over standard deep learning.
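To make the idea of implicit regularization of a variational model concrete, here is a toy construction of our own. It is an assumption for illustration, not the authors' code or their exact parametrization. A Gaussian variational distribution q(w) = N(m, S Sᵀ) over the weights of a linear model with more weights than observations is trained by plain gradient descent on the expected squared loss, with no KL term, so the prior enters only through the initialization (m0, S0). Because both gradients act only in the row space of X, gradient descent moves the mean to the interpolating solution closest to m0 and shrinks uncertainty only along directions the data constrain, leaving the prior covariance untouched elsewhere.

```python
# Toy sketch (assumption): gradient descent on the expected squared loss
#   E_q ||y - X w||^2 = ||y - X m||^2 + trace(X S S^T X^T),  q(w) = N(m, S S^T).
# No KL/prior term is used; the prior only sets the initialization (m0, S0).

def transpose(A):
    return [list(col) for col in zip(*A)]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def fit_variational_gd(X, y, m0, S0, lr=0.1, steps=500):
    """Gradients: d/dm = -2 X^T (y - X m),  d/dS = 2 X^T X S."""
    m = list(m0)
    S = [list(row) for row in S0]
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    for _ in range(steps):
        resid = [yi - pi for yi, pi in zip(y, matvec(X, m))]
        grad_m = [-2.0 * g for g in matvec(Xt, resid)]
        grad_S = [[2.0 * g for g in row] for row in matmul(XtX, S)]
        m = [mi - lr * gi for mi, gi in zip(m, grad_m)]
        S = [[s - lr * g for s, g in zip(srow, grow)] for srow, grow in zip(S, grad_S)]
    return m, S

def predictive_variance(x, S):
    """Var[x^T w] under q, i.e. x^T S S^T x = ||S^T x||^2."""
    v = matvec(transpose(S), x)
    return sum(vi * vi for vi in v)

# One observation, two weights: the model is overparametrized.
X = [[1.0, 1.0]]
y = [2.0]
m, S = fit_variational_gd(X, y, m0=[0.5, -0.5], S0=[[1.0, 0.0], [0.0, 1.0]])
print([round(v, 6) for v in m])                      # approximately [1.5, 0.5]
print(round(predictive_variance([1.0, 1.0], S), 6))  # observed direction: ~0
print(round(predictive_variance([1.0, -1.0], S), 6)) # unseen direction: stays at prior value 2
```

The mean ends at m0 + X⁺(y - X m0), the interpolant nearest the initialization, and the predictive variance collapses along the observed input (1, 1) while remaining at its prior value along (1, -1). Function names, the step size and the step count are illustrative choices.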
Related papers
- Deep Equilibrium models for Poisson Imaging Inverse problems via Mirror Descent [7.248102801711294]
Deep Equilibrium Models (DEQs) are implicit neural networks with fixed points. We introduce a novel DEQ formulation based on Mirror Descent defined in terms of a tailored non-Euclidean geometry. We propose computational strategies that enable both efficient training and fully parameter-free inference.
Date: 2025-07-15T16:33:01Z
- Conflicting Biases at the Edge of Stability: Norm versus Sharpness Regularization [12.58055746943097]
We argue that a comprehensive understanding of the generalization performance of gradient descent requires analyzing the interaction between these various forms of implicit regularization. We prove for diagonal linear networks trained on a simple regression task that neither implicit bias alone minimizes the generalization error.
Date: 2025-05-27T16:51:06Z
- Towards Understanding the Optimization Mechanisms in Deep Learning [5.281849820329249]
In this paper, we adopt a distribution estimation perspective to explore the mechanisms of supervised classification using deep neural networks. For the latter, we provide theoretical insights into mechanisms such as over- and probability randomization.
Date: 2025-03-29T08:46:13Z
- Adaptive Conformal Inference by Betting [51.272991377903274]
We consider the problem of adaptive conformal inference without any assumptions about the data generating process. Existing approaches for adaptive conformal inference are based on optimizing the pinball loss using variants of online gradient descent. We propose a different approach for adaptive conformal inference that leverages parameter-free online convex optimization techniques.
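The summary names the baseline it departs from: online gradient descent on the pinball loss. A minimal sketch of that baseline (not the proposed parameter-free method) follows, assuming a fixed step size `lr`, a target miscoverage `alpha`, and synthetic uniform scores standing in for real nonconformity scores. The threshold `q` defines the prediction set {score ≤ q} and is nudged up after each miss and down after each cover.

```python
import random

def pinball_grad(score, q, tau):
    """Subgradient of the pinball loss rho_tau(score - q) with respect to q."""
    return -tau if score > q else 1.0 - tau

def online_quantile_tracking(scores, alpha=0.1, lr=0.05, q0=0.0):
    """Online (sub)gradient descent on the pinball loss at level tau = 1 - alpha.

    Returns the threshold used at each step and the 0/1 miscoverage indicators.
    """
    tau = 1.0 - alpha
    q = q0
    thresholds, misses = [], []
    for s in scores:
        thresholds.append(q)
        misses.append(1 if s > q else 0)   # the set {score <= q} missed this point
        q -= lr * pinball_grad(s, q, tau)  # equivalently q += lr * (miss - alpha)
    return thresholds, misses

rng = random.Random(0)
scores = [rng.random() for _ in range(10_000)]  # hypothetical nonconformity scores
thresholds, misses = online_quantile_tracking(scores, alpha=0.1, lr=0.05)
print(sum(misses) / len(misses))  # long-run miscoverage, close to alpha = 0.1
```

Summing the updates gives q_T - q_0 = lr · Σ(miss_t - alpha), so as long as `q` stays bounded the average miscoverage approaches `alpha`. The step size `lr` is a hyperparameter of this baseline; parameter-free online learners are designed to avoid setting such a step size by hand.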
Date: 2024-12-26T18:42:08Z
- Generalized Gaussian Temporal Difference Error for Uncertainty-aware Reinforcement Learning [0.19418036471925312]
We introduce a novel framework for generalized Gaussian error modeling in deep reinforcement learning. We improve the estimation and mitigation of data-dependent aleatoric uncertainty. Experiments with policy gradient algorithms demonstrate significant performance gains.
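The summary does not state the exact loss. As an assumption, a natural building block for generalized Gaussian error modeling is the negative log-likelihood of a generalized Gaussian distribution with location `mu`, scale `alpha` and shape `beta`; the shape controls tail weight (beta = 2 recovers a Gaussian, beta = 1 a Laplace). The sketch evaluates it on a hypothetical TD error and checks the Gaussian special case.

```python
import math

def generalized_gaussian_nll(x, mu, alpha, beta):
    """Negative log-density of a generalized Gaussian:
        p(x) = beta / (2 * alpha * Gamma(1/beta)) * exp(-(|x - mu| / alpha) ** beta)
    with scale alpha > 0 and shape beta > 0.
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    return (
        math.log(2.0 * alpha)
        + math.lgamma(1.0 / beta)
        - math.log(beta)
        + (abs(x - mu) / alpha) ** beta
    )

def gaussian_nll(x, mu, sigma):
    """Negative log-density of N(mu, sigma^2), for comparison."""
    return 0.5 * math.log(2.0 * math.pi * sigma**2) + (x - mu) ** 2 / (2.0 * sigma**2)

td_error = 0.7  # hypothetical TD error
sigma = 0.5
# With beta = 2 and alpha = sigma * sqrt(2) the two values coincide.
print(generalized_gaussian_nll(td_error, 0.0, sigma * math.sqrt(2.0), 2.0))
print(gaussian_nll(td_error, 0.0, sigma))
```

Treating `beta` (and `alpha`) as quantities a critic could predict per transition is one way such a loss could capture heavier- or lighter-tailed TD errors than a Gaussian; whether the paper does exactly this is not stated in the summary.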
Date: 2024-08-05T08:12:25Z
- Bayesian Inference for Consistent Predictions in Overparameterized Nonlinear Regression [0.0]
This study explores the predictive properties of overparameterized nonlinear regression within the Bayesian framework.
Posterior contraction is established for generalized linear and single-neuron models with Lipschitz continuous activation functions.
The proposed method was validated via numerical simulations and a real data application.
Date: 2024-04-06T04:22:48Z
- Selective Learning: Towards Robust Calibration with Dynamic Regularization [79.92633587914659]
Miscalibration in deep learning refers to a discrepancy between predicted confidence and actual performance.
We introduce Dynamic Regularization (DReg), which aims to learn what should be learned during training, thereby circumventing the confidence adjusting trade-off.
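One standard way to quantify the confidence/performance discrepancy defined above is the expected calibration error (ECE) with equal-width confidence bins. It is shown only as an illustration; the summary does not say which calibration metric the DReg paper reports, and the bin count `n_bins` is an assumed default.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE = sum over bins b of (|B_b| / n) * |acc(B_b) - conf(B_b)|."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equally sized, non-empty inputs")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes into the last bin
        bins[idx].append((c, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(1 for _, ok in b if ok) / len(b)
        ece += len(b) / n * abs(acc - avg_conf)
    return ece

# Overconfident toy predictor: 90% confidence, but only half the predictions are correct.
conf = [0.9] * 10
hits = [True] * 5 + [False] * 5
print(expected_calibration_error(conf, hits))  # about 0.4
```

A perfectly calibrated predictor (e.g. 50% confidence and half its predictions correct) gets an ECE of 0.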
Date: 2024-02-13T11:25:20Z
- Bayesian Nonparametrics Meets Data-Driven Distributionally Robust Optimization [29.24821214671497]
Training machine learning and statistical models often involves optimizing a data-driven risk criterion.
We propose a novel robust criterion by combining insights from Bayesian nonparametric (i.e., Dirichlet process) theory and a recent decision-theoretic model of smooth ambiguity-averse preferences.
For practical implementation, we propose and study tractable approximations of the criterion based on well-known Dirichlet process representations.
Date: 2024-01-28T21:19:15Z
- Function-Space Regularization in Neural Networks: A Probabilistic Perspective [51.133793272222874]
We show that we can derive a well-motivated regularization technique that allows explicitly encoding information about desired predictive functions into neural network training.
We evaluate the utility of this regularization technique empirically and demonstrate that the proposed method leads to near-perfect semantic shift detection and highly-calibrated predictive uncertainty estimates.
Date: 2023-12-28T17:50:56Z
- Scalable Bayesian Meta-Learning through Generalized Implicit Gradients [64.21628447579772]
The implicit Bayesian meta-learning (iBaML) method not only broadens the scope of learnable priors but also quantifies the associated uncertainty.
Analytical error bounds are established to demonstrate the precision and efficiency of the generalized implicit gradient over the explicit one.
Date: 2023-03-31T02:10:30Z
- The Unreasonable Effectiveness of Deep Evidential Regression [72.30888739450343]
A new approach with uncertainty-aware regression-based neural networks (NNs) shows promise over traditional deterministic methods and typical Bayesian NNs.
We detail the theoretical shortcomings and analyze the performance on synthetic and real-world data sets, showing that Deep Evidential Regression is a heuristic rather than an exact uncertainty quantification.
Date: 2022-05-20T10:10:32Z
- Squared $\ell_2$ Norm as Consistency Loss for Leveraging Augmented Data to Learn Robust and Invariant Representations [76.85274970052762]
Regularizing distance between embeddings/representations of original samples and augmented counterparts is a popular technique for improving robustness of neural networks.
In this paper, we explore these various regularization choices, seeking to provide a general understanding of how we should regularize the embeddings.
We show that the generic approach we identified (squared $\ell_2$ regularized augmentation) outperforms several recent methods, which are each specially designed for one task.
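A minimal sketch of the squared ℓ2 consistency term described above, assuming embeddings are plain lists of floats and a hypothetical weight `lam` that balances the penalty against the task loss. The penalty is the batch mean of ‖e(x) − e(aug(x))‖².

```python
def squared_l2_consistency(emb_orig, emb_aug):
    """Batch mean of ||e(x) - e(aug(x))||_2^2."""
    if len(emb_orig) != len(emb_aug) or not emb_orig:
        raise ValueError("need equally sized, non-empty batches")
    total = 0.0
    for e, e_aug in zip(emb_orig, emb_aug):
        if len(e) != len(e_aug):
            raise ValueError("embedding dimensions differ")
        total += sum((a - b) ** 2 for a, b in zip(e, e_aug))
    return total / len(emb_orig)

def regularized_loss(task_loss, emb_orig, emb_aug, lam=0.1):
    """Task loss plus lam times the squared-l2 consistency penalty (lam is hypothetical)."""
    return task_loss + lam * squared_l2_consistency(emb_orig, emb_aug)

batch = [[1.0, 2.0], [0.0, 1.0]]
batch_aug = [[1.5, 1.0], [0.0, 1.0]]  # second embedding unchanged by augmentation
print(squared_l2_consistency(batch, batch_aug))  # (0.25 + 1.0 + 0.0) / 2 = 0.625
print(regularized_loss(0.3, batch, batch_aug))   # 0.3 + 0.1 * 0.625
```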
Date: 2020-11-25T22:40:09Z
- Revisiting Explicit Regularization in Neural Networks for Well-Calibrated Predictive Uncertainty [6.09170287691728]
In this work, we revisit the importance of explicit regularization for obtaining well-calibrated predictive uncertainty.
We introduce a measure of calibration performance, which is lower bounded by the log-likelihood.
We then explore explicit regularization techniques for improving the log-likelihood on unseen samples, which provides well-calibrated predictive uncertainty.
Date: 2020-06-11T13:14:01Z
This list is automatically generated from the titles and abstracts of the papers on this site.