From Tail Universality to Bernstein-von Mises: A Unified Statistical Theory of Semi-Implicit Variational Inference
- URL: http://arxiv.org/abs/2512.06107v1
- Date: Fri, 05 Dec 2025 19:26:25 GMT
- Title: From Tail Universality to Bernstein-von Mises: A Unified Statistical Theory of Semi-Implicit Variational Inference
- Authors: Sean Plummer
- Abstract summary: Semi-implicit variational inference (SIVI) constructs approximate posteriors of the form $q(\theta) = \int k(\theta \mid z)\, r(dz)$. This paper develops a unified "approximation-optimization-statistics" theory for such families.
- Score: 0.12183405753834557
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semi-implicit variational inference (SIVI) constructs approximate posteriors of the form $q(\theta) = \int k(\theta \mid z)\, r(dz)$, where the conditional kernel is parameterized and the mixing base is fixed and tractable. This paper develops a unified "approximation-optimization-statistics" theory for such families. On the approximation side, we show that under compact $L^1$-universality and a mild tail-dominance condition, semi-implicit families are dense in $L^1$ and can achieve arbitrarily small forward Kullback-Leibler (KL) error. We also identify two sharp obstructions to global approximation: (i) an Orlicz tail-mismatch condition that induces a strictly positive forward-KL gap, and (ii) structural restrictions, such as non-autoregressive Gaussian kernels, that force "branch collapse" in conditional distributions. For each obstruction we give a minimal structural modification that restores approximability. On the optimization side, we establish finite-sample oracle inequalities and prove that the empirical SIVI objectives $L_{K,n}$ $\Gamma$-converge to their population limit as $n$ and $K$ tend to infinity. These results give consistency of empirical maximizers, quantitative control of the finite-$K$ surrogate bias, and stability of the resulting variational posteriors. Combining the approximation and optimization analyses yields the first general end-to-end statistical theory for SIVI: we characterize precisely when SIVI can recover the target distribution, when it cannot, and how architectural and algorithmic choices govern the attainable asymptotic behavior.
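To make the construction concrete, here is a minimal sketch of a semi-implicit family and the finite-K surrogate density the abstract refers to, assuming a Gaussian conditional kernel and a standard-normal mixing base; the maps `kernel_mean` and `kernel_std` and the value of `K` are illustrative choices, not taken from the paper.

```python
# Semi-implicit family: z ~ r (standard normal), theta | z ~ N(mu(z), sd(z)^2).
import numpy as np

rng = np.random.default_rng(0)

def kernel_mean(z):   # toy parameterized conditional mean
    return 2.0 * z + 0.5

def kernel_std(z):    # toy parameterized conditional scale (softplus keeps it > 0)
    return np.log1p(np.exp(0.1 * z)) + 0.1

def sample_q(n):
    """Ancestral sampling from q: draw z ~ r, then theta ~ k(. | z)."""
    z = rng.standard_normal(n)
    return kernel_mean(z) + kernel_std(z) * rng.standard_normal(n)

def log_q_hat(theta, K=256):
    """Finite-K Monte Carlo surrogate for log q(theta):
    log (1/K) sum_k N(theta; mu(z_k), sd(z_k)^2) with z_k ~ r.
    The paper's Gamma-convergence results control the bias of such
    surrogates as K (and the sample size n) tend to infinity."""
    z = rng.standard_normal((K, 1))
    mu, sd = kernel_mean(z), kernel_std(z)
    log_k = -0.5 * ((theta - mu) / sd) ** 2 - np.log(sd) - 0.5 * np.log(2 * np.pi)
    return np.logaddexp.reduce(log_k, axis=0) - np.log(K)

theta = sample_q(5)
print(theta)
print(log_q_hat(theta))
```

Sampling from q is always cheap; the statistical difficulty the paper studies is that log q(theta) itself is an intractable mixture, so training must rely on finite-K surrogates like `log_q_hat` above.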
Related papers
- Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs [55.77845440440496]
Push-based decentralized communication enables optimization over communication networks where information exchange may be asymmetric. We develop a unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm. A key technical ingredient is an imbalance-aware generalization bound built from two quantities. (A toy push-sum sketch follows below.)
arXiv Detail & Related papers (2026-02-24T05:32:03Z)
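As a concrete reference point for the SGP algorithm analyzed above, here is a toy deterministic gradient-push loop on a directed ring, assuming column-stochastic mixing weights; the quadratic objectives, step size, and topology are illustrative assumptions, not the paper's setting.

```python
# Gradient push on a directed ring: each node mixes a numerator x and a
# weight w via a column-stochastic matrix A; x/w de-biases the estimate.
import numpy as np

n, T, eta = 5, 400, 0.05
targets = np.arange(n, dtype=float)      # local objectives f_i(x) = 0.5*(x - i)^2
# Column-stochastic mixing: each node keeps half its mass, pushes half onward.
A = 0.5 * (np.eye(n) + np.roll(np.eye(n), 1, axis=0))

x = np.zeros(n)   # push-sum numerators
w = np.ones(n)    # push-sum weights
for _ in range(T):
    z = x / w                    # de-biased local estimates
    grad = z - targets           # local gradients evaluated at z
    x = A @ (x - eta * grad)     # take a gradient step, then mix
    w = A @ w                    # mix the weights the same way
print(x / w)  # all nodes approach the global minimizer, mean(targets) = 2.0
```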
- Statistical Inference under Adaptive Sampling with LinUCB [15.167069362020426]
We show that the linear upper confidence bound (LinUCB) algorithm for linear bandits satisfies a property called stability. Building on stability, we establish a central limit theorem for LinUCB, giving asymptotic normality of the estimation error. (A minimal LinUCB sketch follows below.)
arXiv Detail & Related papers (2025-11-28T21:48:18Z)
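The following is a standard LinUCB sketch matching the setting above (linear rewards plus noise); the arm set, confidence width `alpha`, and noise level are illustrative choices, not the paper's.

```python
# LinUCB: pick the arm maximizing estimated reward + elliptical-norm bonus.
import numpy as np

rng = np.random.default_rng(1)
d, T, alpha = 3, 2000, 1.0
theta_star = np.array([0.5, -0.2, 0.8])   # unknown reward parameter
arms = rng.standard_normal((10, d))       # fixed finite arm set

V = np.eye(d)       # regularized Gram matrix
b = np.zeros(d)     # sum of reward-weighted features
for t in range(T):
    theta_hat = np.linalg.solve(V, b)     # ridge estimate of theta_star
    Vinv = np.linalg.inv(V)
    # UCB index: <x, theta_hat> + alpha * sqrt(x^T V^{-1} x)
    ucb = arms @ theta_hat + alpha * np.sqrt(np.einsum('ij,jk,ik->i', arms, Vinv, arms))
    x = arms[np.argmax(ucb)]
    r = x @ theta_star + 0.1 * rng.standard_normal()
    V += np.outer(x, x)
    b += r * x
print("estimate:", np.linalg.solve(V, b), "truth:", theta_star)
```

The paper's point is that, despite this adaptive (non-i.i.d.) sampling, the estimation error of `theta_hat` still admits a central limit theorem.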
- Revisiting Zeroth-Order Optimization: Minimum-Variance Two-Point Estimators and Directionally Aligned Perturbations [57.179679246370114]
We identify the distribution of random perturbations that minimizes the two-point estimator's variance as the perturbation stepsize tends to zero. Our findings reveal that the desired perturbations align directionally with the true gradient rather than maintaining a fixed length. (A baseline two-point estimator is sketched below.)
arXiv Detail & Related papers (2025-10-22T19:06:39Z)
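For context, this is the classical two-point estimator the paper revisits, here with the common Gaussian-perturbation baseline rather than the minimum-variance, directionally aligned distribution the paper derives; the test function and smoothing radius `mu` are illustrative.

```python
# Two-point zeroth-order gradient estimate: g = (f(x+mu*u) - f(x-mu*u)) / (2*mu) * u
import numpy as np

rng = np.random.default_rng(2)

def f(x):  # smooth test function
    return 0.5 * np.sum(x ** 2) + np.sin(x[0])

def two_point_grad(f, x, mu=1e-4):
    u = rng.standard_normal(x.shape)   # baseline: isotropic Gaussian perturbation
    return (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u

x = np.array([1.0, -2.0, 0.5])
est = np.mean([two_point_grad(f, x) for _ in range(5000)], axis=0)
print("estimate: ", est)
print("true grad:", x + np.array([np.cos(x[0]), 0.0, 0.0]))
```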
- Natural Gradient VI: Guarantees for Non-Conjugate Models [29.717497256432186]
Natural Gradient Variational Inference (NGVI) is a widely used method for approximating posterior distributions in probabilistic models. We show that NGVI convergence guarantees do not extend to the non-conjugate setting. We propose a modified NGVI algorithm and prove its global non-asymptotic convergence to a stationary point. (A toy natural-gradient step is sketched below.)
arXiv Detail & Related papers (2025-10-22T01:46:31Z)
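A minimal illustration of a natural-gradient VI step, assuming a 1-D Gaussian variational family q = N(m, s^2) in the (m, s) parameterization, whose Fisher information is diag(1/s^2, 2/s^2); the quartic target, step size, and sample count are toy assumptions, and this is not the paper's modified algorithm.

```python
# Natural-gradient ascent on the ELBO: precondition gradients by the inverse Fisher.
import numpy as np

rng = np.random.default_rng(3)

def dlogp(theta):  # score of the non-conjugate target log p ~ -theta^4/4 - theta^2/2
    return -theta ** 3 - theta

m, s, eta = 1.5, 1.0, 0.05
for _ in range(300):
    eps = rng.standard_normal(64)
    theta = m + s * eps                        # reparameterized samples from q
    g_m = np.mean(dlogp(theta))                # d ELBO / dm
    g_s = np.mean(dlogp(theta) * eps) + 1 / s  # d ELBO / ds (includes entropy term)
    # Inverse Fisher for N(m, s^2) in (m, s): diag(s^2, s^2 / 2)
    m += eta * (s ** 2) * g_m
    s = max(s + eta * (s ** 2 / 2) * g_s, 1e-3)
print("variational mean, std:", m, s)
```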
- Graph-based Clustering Revisited: A Relaxation of Kernel $k$-Means Perspective [73.18641268511318]
We propose a graph-based clustering algorithm that relaxes only the orthonormal constraint to derive clustering results. To handle the doubly stochastic constraint, we transform the non-negative constraint into a class-probability parameter. (The classical spectral relaxation is sketched below.)
arXiv Detail & Related papers (2025-09-23T09:14:39Z)
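For orientation, below is the classical spectral relaxation of kernel k-means that such work revisits: relax the discrete indicator matrix to an orthonormal one (solved by the top-k eigenvectors of the kernel matrix), then discretize. The RBF kernel, toy data, and farthest-point initialization are illustrative; this is the standard baseline, not the paper's proposed relaxation.

```python
# Spectral relaxation of kernel k-means on two well-separated 2-D blobs.
import numpy as np

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
k = 2

# RBF kernel matrix
K = np.exp(-np.sum((X[:, None] - X[None]) ** 2, axis=-1))

# Relaxed problem: max_H trace(H^T K H) s.t. H^T H = I  ->  top-k eigenvectors
vals, vecs = np.linalg.eigh(K)
H = vecs[:, -k:]

# Discretization: a few Lloyd (k-means) steps on the rows of H,
# with farthest-point initialization (adequate for this toy k = 2 case).
C = np.array([H[0], H[np.argmax(((H - H[0]) ** 2).sum(1))]])
for _ in range(20):
    labels = np.argmin(((H[:, None] - C[None]) ** 2).sum(-1), axis=1)
    C = np.array([H[labels == j].mean(0) for j in range(k)])
print(labels)
```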
- Asymptotically Unbiased Instance-wise Regularized Partial AUC Optimization: Theory and Algorithm [101.44676036551537]
One-way Partial AUC (OPAUC) and Two-way Partial AUC (TPAUC) measure the average performance of a binary classifier over a restricted range of the ROC curve. Most existing methods can only optimize PAUC approximately, leading to inevitable, uncontrollable biases. We present a simpler reformulation of the PAUC problem via distributionally robust optimization. (An empirical OPAUC computation is sketched below.)
arXiv Detail & Related papers (2022-10-08T08:26:22Z)
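To fix ideas, here is a naive empirical estimator of the One-way Partial AUC being optimized, restricting the FPR to [0, beta] by pairing positives against the hardest beta-fraction of negatives; the score distributions and `beta` are illustrative, and the paper's point is that directly optimizing such a quantity without bias is the hard part.

```python
# Empirical One-way Partial AUC (OPAUC) with FPR restricted to [0, beta].
import numpy as np

rng = np.random.default_rng(5)
pos = rng.normal(1.0, 1.0, 200)   # classifier scores on positive examples
neg = rng.normal(0.0, 1.0, 300)   # classifier scores on negative examples

def opauc(pos, neg, beta=0.3):
    # The top beta-fraction of negatives by score are misranked first,
    # so pairs against them correspond to the FPR <= beta region.
    m = int(np.floor(beta * len(neg)))
    hard_neg = np.sort(neg)[-m:]
    # Fraction of correctly ranked positive/negative pairs in that region.
    return np.mean(pos[:, None] > hard_neg[None, :])

print("OPAUC(beta=0.3):", opauc(pos, neg))
```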
- Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees for SGDA and SCO (stochastic consensus optimization). We prove linear convergence of both methods to a neighborhood of the solution when they use a constant step size. Our convergence guarantees hold under the arbitrary-sampling paradigm, and we give insights into the complexity of minibatching. (A last-iterate SGDA simulation follows below.)
arXiv Detail & Related papers (2021-06-30T18:32:46Z)
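A compact simulation in the spirit of the result above: constant-step SGDA on a strongly monotone quadratic min-max problem, whose last iterate settles in a noise-dominated neighborhood of the saddle point; the game f(x, y) = 0.5*x^2 + x*y - 0.5*y^2, noise scale, and step size are illustrative.

```python
# SGDA with constant step size: descent in x, ascent in y, noisy gradients.
import numpy as np

rng = np.random.default_rng(6)
x, y, eta = 3.0, -2.0, 0.05
for _ in range(2000):
    gx = x + y + 0.1 * rng.standard_normal()  # stochastic grad of f wrt x
    gy = x - y + 0.1 * rng.standard_normal()  # stochastic grad of f wrt y
    x, y = x - eta * gx, y + eta * gy         # simultaneous descent-ascent
print("last iterate:", x, y)  # hovers near the saddle point (0, 0)
```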
- Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance [14.06947898164194]
Heavy tails emerge in stochastic gradient descent (SGD) in various scenarios. We provide convergence guarantees for SGD under state-dependent, heavy-tailed noise with potentially infinite variance. Our results indicate that even under heavy-tailed noise with infinite variance, SGD can converge to the global optimum. (A heavy-tailed SGD simulation follows below.)
arXiv Detail & Related papers (2021-02-20T13:45:11Z)
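A quick simulation consistent with the message above: SGD on a strongly convex quadratic with symmetric Pareto gradient noise of tail index alpha = 1.5 (so the noise variance is infinite), using a decaying step size; all settings are illustrative choices, not the paper's assumptions.

```python
# SGD with infinite-variance gradient noise still drifts to the optimum.
import numpy as np

rng = np.random.default_rng(7)

def heavy_noise(alpha=1.5, size=None):
    # Symmetric Pareto noise: infinite variance whenever alpha < 2.
    return rng.pareto(alpha, size) * rng.choice([-1.0, 1.0], size)

x, T = 10.0, 20000
for t in range(1, T + 1):
    g = x + heavy_noise()    # gradient of 0.5*x^2 plus heavy-tailed noise
    x -= (0.5 / t) * g       # decaying step size tames the heavy tails
print("final iterate:", x)   # close to the optimum x* = 0
```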
- Posterior-Aided Regularization for Likelihood-Free Inference [23.708122045184698]
Posterior-Aided Regularization (PAR) is applicable to learning the density estimator regardless of the model structure. We provide a unified estimation method that estimates both the reverse KL term and the mutual information term of PAR with a single neural network. (A density-ratio toy example follows below.)
arXiv Detail & Related papers (2021-02-15T16:59:30Z)
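As a hedged illustration of one generic ingredient behind sample-based reverse-KL estimation (not the paper's PAR estimator), the density-ratio trick below trains a binary classifier between samples of q and p; with balanced classes its logit approximates log(q/p), so averaging it under q estimates KL(q || p). The Gaussians and quadratic features are illustrative.

```python
# Density-ratio trick: classifier logits between q- and p-samples estimate log(q/p).
import numpy as np

rng = np.random.default_rng(8)
q_samp = rng.normal(0.5, 1.0, 4000)   # samples from q = N(0.5, 1)
p_samp = rng.normal(0.0, 1.0, 4000)   # samples from p = N(0, 1)

# Classify q (label 1) against p (label 0); the Gaussian log-ratio is
# at most quadratic in x, so quadratic features suffice here.
X = np.concatenate([q_samp, p_samp])
Y = np.concatenate([np.ones(4000), np.zeros(4000)])
F = np.stack([np.ones_like(X), X, X ** 2], axis=1)

w = np.zeros(3)
for _ in range(3000):                         # plain logistic regression by GD
    p_hat = 1 / (1 + np.exp(-F @ w))
    w -= 0.1 * F.T @ (p_hat - Y) / len(Y)

# Reverse-KL estimate: average the classifier logit over q-samples.
Fq = np.stack([np.ones_like(q_samp), q_samp, q_samp ** 2], axis=1)
print("estimated KL(q||p):", np.mean(Fq @ w))
print("true KL(q||p):     ", 0.5 * 0.5 ** 2)  # = 0.125 for these Gaussians
```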
- Linear Last-iterate Convergence in Constrained Saddle-point Optimization [48.44657553192801]
We significantly expand the understanding of last-iterate convergence for Optimistic Gradient Descent Ascent (OGDA) and Optimistic Multiplicative Weights Update (OMWU). We show that when the equilibrium is unique, linear last-iterate convergence is achieved with a learning rate set to a universal constant. We further show that bilinear games over any polytope satisfy this condition, and that OGDA converges exponentially fast even without the unique-equilibrium assumption. (An OGDA-versus-GDA sketch follows below.)
arXiv Detail & Related papers (2020-06-16T20:53:04Z)
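A small sketch contrasting plain GDA with OGDA on the unconstrained bilinear game min_x max_y x*y, where the optimistic correction 2*g_t - g_{t-1} is what yields last-iterate convergence; the step size is an illustrative constant, and the constrained-polytope setting of the paper is not modeled.

```python
# GDA spirals outward on a bilinear game; OGDA's extrapolated gradient converges.
import numpy as np

def run(optimistic, eta=0.1, T=500):
    x, y = 1.0, 1.0
    gx_prev, gy_prev = y, x           # previous gradients (init = current)
    for _ in range(T):
        gx, gy = y, x                 # gradients of f(x, y) = x * y
        if optimistic:
            x -= eta * (2 * gx - gx_prev)
            y += eta * (2 * gy - gy_prev)
        else:
            x -= eta * gx
            y += eta * gy
        gx_prev, gy_prev = gx, gy
    return x, y

print("GDA last iterate: ", run(False))  # drifts away from (0, 0)
print("OGDA last iterate:", run(True))   # converges to the equilibrium (0, 0)
```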