Geometric Convergence Analysis of Variational Inference via Bregman Divergences
- URL: http://arxiv.org/abs/2510.15548v1
- Date: Fri, 17 Oct 2025 11:30:05 GMT
- Title: Geometric Convergence Analysis of Variational Inference via Bregman Divergences
- Authors: Sushil Bohara, Amedeo Roberto Esposito
- Abstract summary: Variational Inference (VI) provides a scalable framework for Bayesian inference by optimizing the Evidence Lower Bound (ELBO). We establish a novel theoretical framework for analyzing VI convergence by exploiting the exponential family structure of distributions.
- Score: 3.7098038388802252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Variational Inference (VI) provides a scalable framework for Bayesian inference by optimizing the Evidence Lower Bound (ELBO), but convergence analysis remains challenging due to the objective's non-convexity and non-smoothness in Euclidean space. We establish a novel theoretical framework for analyzing VI convergence by exploiting the exponential family structure of distributions. We express negative ELBO as a Bregman divergence with respect to the log-partition function, enabling a geometric analysis of the optimization landscape. We show that this Bregman representation admits a weak monotonicity property that, while weaker than convexity, provides sufficient structure for rigorous convergence analysis. By deriving bounds on the objective function along rays in parameter space, we establish properties governed by the spectral characteristics of the Fisher information matrix. Under this geometric framework, we prove non-asymptotic convergence rates for gradient descent algorithms with both constant and diminishing step sizes.
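The abstract's central identity can be checked numerically: for an exponential family, the KL divergence between two members equals the Bregman divergence of the log-partition function $A$ with swapped arguments, KL(p_{θ1} || p_{θ2}) = B_A(θ2, θ1). The sketch below is our own illustration (not code from the paper), using the simplest case, a Bernoulli family in natural (logit) parameters:

```python
import numpy as np

def log_partition(theta):
    """Log-partition A(theta) of the Bernoulli family in natural (logit) form."""
    return np.log1p(np.exp(theta))

def bregman_A(t1, t2):
    """Bregman divergence B_A(t1, t2) = A(t1) - A(t2) - A'(t2) (t1 - t2),
    where A'(theta) = sigmoid(theta) is the mean parameter."""
    mean_t2 = 1.0 / (1.0 + np.exp(-t2))
    return log_partition(t1) - log_partition(t2) - mean_t2 * (t1 - t2)

def kl_bernoulli(t1, t2):
    """KL(p_{t1} || p_{t2}) for Bernoulli distributions with logits t1, t2."""
    m1 = 1.0 / (1.0 + np.exp(-t1))
    m2 = 1.0 / (1.0 + np.exp(-t2))
    return m1 * np.log(m1 / m2) + (1 - m1) * np.log((1 - m1) / (1 - m2))
```

Since the negative ELBO decomposes into such a KL term, convexity-like structure of $A$ (its Hessian is the Fisher information matrix) transfers to the optimization landscape, which is the geometric handle the paper exploits.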
Related papers
- Algorithmic Stability in Infinite Dimensions: Characterizing Unconditional Convergence in Banach Spaces [0.0]
A distinction between conditional, unconditional, and absolute convergence in infinite-dimensional spaces has fundamental implications for computational algorithms. We present a comprehensive characterization theorem unifying seven equivalent conditions for unconditional convergence. Our work bridges classical functional analysis with contemporary computational practice, providing rigorous foundations for order-independent and numerically robust processes.
arXiv Detail & Related papers (2026-01-13T12:51:58Z) - Elucidating Subspace Perturbation in Zeroth-Order Optimization: Theory and Practice at Scale [33.38543010618118]
Zeroth-order (ZO) optimization has emerged as a promising alternative to gradient-based backpropagation methods. We show that high dimensionality is the primary bottleneck and introduce the notion of subspace alignment to explain how subspace perturbations reduce gradient noise and accelerate convergence. We propose an efficient ZO method using block coordinate descent (MeZO-BCD), which perturbs and updates only a subset of parameters at each step.
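The block-coordinate zeroth-order idea can be illustrated with a minimal SPSA-style step that perturbs only one parameter block per iteration. This is a generic sketch under our own assumptions, not the authors' MeZO-BCD implementation:

```python
import numpy as np

def zo_bcd_step(params, loss_fn, block, eps=1e-3, lr=0.05, rng=None):
    """One block-coordinate zeroth-order step: perturb only the chosen block,
    estimate a directional derivative from two loss evaluations, and update
    that block alone (a hypothetical sketch, not the paper's algorithm)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    z = np.zeros_like(params)
    z[block] = rng.standard_normal(len(block))  # perturbation restricted to block
    # Two-point finite-difference estimate of the derivative along z.
    g_scalar = (loss_fn(params + eps * z) - loss_fn(params - eps * z)) / (2 * eps)
    new_params = params.copy()
    new_params[block] -= lr * g_scalar * z[block]  # update only the block
    return new_params
```

Cycling the block index over the coordinates and repeating this step drives a smooth loss downward while only ever evaluating the loss, never its gradient.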
arXiv Detail & Related papers (2025-01-31T12:46:04Z) - A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variations of the stochastic proximal point method (SPPM).
arXiv Detail & Related papers (2024-05-24T21:09:19Z) - On the Uniform Convergence of Subdifferentials in Stochastic Optimization and Learning [1.5229257192293195]
We investigate the uniform convergence of subdifferential mappings from empirical risk to population risk in nonsmooth, nonconvex stochastic optimization. These guarantees offer new insight into the geometry of problems arising in robust statistics and related applications.
arXiv Detail & Related papers (2024-05-16T17:49:46Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Curvature-Independent Last-Iterate Convergence for Games on Riemannian Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - Convergence of Adam Under Relaxed Assumptions [72.24779199744954]
We show that Adam converges to $\epsilon$-stationary points with $O(\epsilon^{-4})$ gradient complexity under far more realistic conditions.
We also propose a variance-reduced version of Adam with an accelerated gradient complexity of $O(\epsilon^{-3})$.
arXiv Detail & Related papers (2023-04-27T06:27:37Z) - Convergence beyond the over-parameterized regime using Rayleigh quotients [18.728779959566946]
We show that Rayleigh quotients provide a unified view for several convergence analysis techniques in the literature.
Our strategy produces a proof of convergence for various examples of parametric learning.
arXiv Detail & Related papers (2023-01-19T15:18:23Z) - Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency [111.83670279016599]
We study reinforcement learning for partially observable Markov decision processes (POMDPs) with infinite observation and state spaces.
We make the first attempt at partial observability and function approximation for a class of POMDPs with a linear structure.
arXiv Detail & Related papers (2022-04-20T21:15:38Z) - Convex Analysis of the Mean Field Langevin Dynamics [49.66486092259375]
A convergence rate analysis of the mean field Langevin dynamics is presented.
The proximal Gibbs distribution $p_q$ associated with the dynamics allows us to develop a convergence theory parallel to classical results in convex optimization.
arXiv Detail & Related papers (2022-01-25T17:13:56Z) - On Asymptotic Linear Convergence of Projected Gradient Descent for Constrained Least Squares [22.851500417035947]
This manuscript presents a unified framework for the analysis of projected gradient descent in the context of constrained least squares.
We present a recipe for the convergence analysis of PGD and demonstrate it via a beginning-to-end application of the recipe on four fundamental problems.
arXiv Detail & Related papers (2021-12-22T09:49:51Z) - The Last-Iterate Convergence Rate of Optimistic Mirror Descent in Stochastic Variational Inequalities [29.0058976973771]
We show an intricate relation between the algorithm's rate of convergence and the local geometry induced by the method's underlying Bregman function, which we quantify by means of an associated Legendre exponent.
We show that this exponent determines both the optimal step-size policy of the algorithm and the optimal rates attained.
arXiv Detail & Related papers (2021-07-05T09:54:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.