Stability-based Generalization Bounds for Variational Inference
- URL: http://arxiv.org/abs/2502.12353v1
- Date: Mon, 17 Feb 2025 22:40:26 GMT
- Title: Stability-based Generalization Bounds for Variational Inference
- Authors: Yadi Wei, Roni Khardon,
- Abstract summary: Variational inference (VI) is widely used for approximate inference in Bayesian machine learning.
This paper develops stability based generalization bounds for a class of approximate Bayesian algorithms.
The new approach complements PAC-Bayes analysis and can provide tighter bounds in some cases.
- Score: 3.146069168382982
- License:
- Abstract: Variational inference (VI) is widely used for approximate inference in Bayesian machine learning. In addition to this practical success, generalization bounds for variational inference and related algorithms have been developed, mostly through the connection to PAC-Bayes analysis. A second line of work has provided algorithm-specific generalization bounds through stability arguments or using mutual information bounds, and has shown that the bounds are tight in practice, but unfortunately these bounds do not directly apply to approximate Bayesian algorithms. This paper fills this gap by developing algorithm-specific stability based generalization bounds for a class of approximate Bayesian algorithms that includes VI, specifically when using stochastic gradient descent to optimize their objective. As in the non-Bayesian case, the generalization error is bounded by by expected parameter differences on a perturbed dataset. The new approach complements PAC-Bayes analysis and can provide tighter bounds in some cases. An experimental illustration shows that the new approach yields non-vacuous bounds on modern neural network architectures and datasets and that it can shed light on performance differences between variant approximate Bayesian algorithms.
Related papers
- Quantitative Error Bounds for Scaling Limits of Stochastic Iterative Algorithms [10.022615790746466]
We derive non-asymptotic functional approximation error bounds between the algorithm sample paths and the Ornstein-Uhlenbeck approximation.
We use our main result to construct error bounds in terms of two common metrics: the L'evy-Prokhorov and bounded Wasserstein distances.
arXiv Detail & Related papers (2025-01-21T15:29:11Z) - Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z) - An Information-Theoretic Approach to Generalization Theory [27.87324770020133]
We analyze information-theoretic bounds that quantify the dependence between a learning algorithm and the training data.
We show that algorithms with a bounded maximal leakage guarantee generalization even with a constant privacy parameter.
arXiv Detail & Related papers (2024-08-20T10:08:21Z) - A general error analysis for randomized low-rank approximation with application to data assimilation [42.57210316104905]
We propose a framework for the analysis of the low-rank approximation error in Frobenius norm for centered and non-standard matrices.
Under minimal assumptions, we derive accurate bounds in expectation and probability.
Our bounds have clear interpretations that enable us to derive properties and motivate practical choices.
arXiv Detail & Related papers (2024-05-08T04:51:56Z) - Scalable Bayesian Meta-Learning through Generalized Implicit Gradients [64.21628447579772]
Implicit Bayesian meta-learning (iBaML) method broadens the scope of learnable priors, but also quantifies the associated uncertainty.
Analytical error bounds are established to demonstrate the precision and efficiency of the generalized implicit gradient over the explicit one.
arXiv Detail & Related papers (2023-03-31T02:10:30Z) - Amortized Implicit Differentiation for Stochastic Bilevel Optimization [53.12363770169761]
We study a class of algorithms for solving bilevel optimization problems in both deterministic and deterministic settings.
We exploit a warm-start strategy to amortize the estimation of the exact gradient.
By using this framework, our analysis shows these algorithms to match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z) - Simple Stochastic and Online Gradient DescentAlgorithms for Pairwise
Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair instances.
Online descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple and online descent to methods for pairwise learning.
arXiv Detail & Related papers (2021-11-23T18:10:48Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded on the complexity' of the fractal structure that underlies its generalization measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one hidden/layered neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - Optimal oracle inequalities for solving projected fixed-point equations [53.31620399640334]
We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space.
We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation.
arXiv Detail & Related papers (2020-12-09T20:19:32Z) - Distributed Variational Bayesian Algorithms Over Sensor Networks [6.572330981878818]
We propose two novel distributed VB algorithms for general Bayesian inference problem.
The proposed algorithms have excellent performance, which are almost as good as the corresponding centralized VB algorithm relying on all data available in a fusion center.
arXiv Detail & Related papers (2020-11-27T08:12:18Z) - Distributed Value Function Approximation for Collaborative Multi-Agent
Reinforcement Learning [2.7071541526963805]
We propose several novel distributed gradient-based temporal difference algorithms for multi-agent off-policy learning.
The proposed algorithms differ by their form, definition of eligibility traces, selection of time scales and the way of incorporating consensus iterations.
It is demonstrated how the adopted methodology can be applied to temporal-difference algorithms under weaker information structure constraints.
arXiv Detail & Related papers (2020-06-18T11:46:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.