Related papers: Kernel Semi-Implicit Variational Inference

Kernel Semi-Implicit Variational Inference

URL: http://arxiv.org/abs/2405.18997v1
Date: Wed, 29 May 2024 11:21:25 GMT
Title: Kernel Semi-Implicit Variational Inference
Authors: Ziheng Cheng, Longlin Yu, Tianyu Xie, Shiyue Zhang, Cheng Zhang,
Abstract summary: Semi-implicit variational inference (SIVI) extends traditional variational families with semi-implicit distributions defined in a hierarchical manner. A recent advancement in SIVI, named SIVI-SM, utilizes an alternative score matching objective made tractable via a minimax formulation. We propose kernel SIVI (KSIVI), a variant of SIVI-SM that eliminates the need for lower-level optimization through kernel tricks.
Score: 27.61976547543748
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Semi-implicit variational inference (SIVI) extends traditional variational families with semi-implicit distributions defined in a hierarchical manner. Due to the intractable densities of semi-implicit distributions, classical SIVI often resorts to surrogates of evidence lower bound (ELBO) that would introduce biases for training. A recent advancement in SIVI, named SIVI-SM, utilizes an alternative score matching objective made tractable via a minimax formulation, albeit requiring an additional lower-level optimization. In this paper, we propose kernel SIVI (KSIVI), a variant of SIVI-SM that eliminates the need for lower-level optimization through kernel tricks. Specifically, we show that when optimizing over a reproducing kernel Hilbert space (RKHS), the lower-level problem has an explicit solution. This way, the upper-level objective becomes the kernel Stein discrepancy (KSD), which is readily computable for stochastic gradient descent due to the hierarchical structure of semi-implicit variational distributions. An upper bound for the variance of the Monte Carlo gradient estimators of the KSD objective is derived, which allows us to establish novel convergence guarantees of KSIVI. We demonstrate the effectiveness and efficiency of KSIVI on both synthetic distributions and a variety of real data Bayesian inference tasks.

Related papers

Semi-Implicit Variational Inference via Kernelized Path Gradient Descent [12.300415631357406]
Training with the Kullback-Leibler divergence can be challenging due to high variance and bias in high-dimensional settings.<n>We propose a kernelized KL divergence estimator that stabilizes training through nonparametric smoothing.<n>Our method's bias in function space is benign, leading to more stable and efficient optimization.
arXiv Detail & Related papers (2025-06-05T14:34:37Z)
Stochastic Optimization with Optimal Importance Sampling [49.484190237840714]
We propose an iterative-based algorithm that jointly updates the decision and the IS distribution without requiring time-scale separation between the two. Our method achieves the lowest possible variable variance and guarantees global convergence under convexity of the objective and mild assumptions on the IS distribution family.
arXiv Detail & Related papers (2025-04-04T16:10:18Z)
A New Formulation of Lipschitz Constrained With Functional Gradient Learning for GANs [52.55025869932486]
This paper introduces a promising alternative method for training Generative Adversarial Networks (GANs) on large-scale datasets with clear theoretical guarantees. We propose a novel Lipschitz-constrained Functional Gradient GANs learning (Li-CFG) method to stabilize the training of GAN. We demonstrate that the neighborhood size of the latent vector can be reduced by increasing the norm of the discriminator gradient.
arXiv Detail & Related papers (2025-01-20T02:48:07Z)
A Fresh Look at Generalized Category Discovery through Non-negative Matrix Factorization [83.12938977698988]
Generalized Category Discovery (GCD) aims to classify both base and novel images using labeled base data. Current approaches inadequately address the intrinsic optimization of the co-occurrence matrix $barA$ based on cosine similarity. We propose a Non-Negative Generalized Category Discovery (NN-GCD) framework to address these deficiencies.
arXiv Detail & Related papers (2024-10-29T07:24:11Z)
A Trust-Region Method for Graphical Stein Variational Inference [3.5516599670943774]
Stein variational (SVI) is a sample-based approximate inference technique that generates a sample set by jointly optimizing the samples locations to an information-theoretic measure. We propose a novel trust-conditioned approach for SVI that successfully addresses each these challenges.
arXiv Detail & Related papers (2024-10-21T16:59:01Z)
Particle Semi-Implicit Variational Inference [2.555222031881788]
Semi-implicit variational inference (SIVI) enriches the expressiveness of variational families by utilizing a kernel and a mixing distribution. Existing SIVI methods parameterize the mixing distribution using implicit distributions, leading to intractable variational densities. We propose a novel method for SIVI called Particle Variational Inference (PVI) which employs empirical measures to approximate the optimal mixing distributions.
arXiv Detail & Related papers (2024-06-30T10:21:41Z)
Taming Nonconvex Stochastic Mirror Descent with General Bregman Divergence [25.717501580080846]
This paper revisits the convergence of gradient Forward Mirror (SMD) in the contemporary non optimization setting. For the training, we develop provably convergent algorithms for the problem of linear networks.
arXiv Detail & Related papers (2024-02-27T17:56:49Z)
Hierarchical Semi-Implicit Variational Inference with Application to Diffusion Model Acceleration [20.820242501141834]
Semi-implicit variational inference (SIVI) has been introduced to expand the analytical variational families. We propose hierarchical semi-implicit variational inference, called HSIVI, which generalizes SIVI to allow more expressive multi-layer construction.
arXiv Detail & Related papers (2023-10-26T04:52:28Z)
Semi-Implicit Variational Inference via Score Matching [9.654640791869431]
Semi-implicit variational inference (SIVI) greatly enriches the expressiveness of variational families. Current SIVI approaches often use surrogate evidence lower bounds (ELBOs) or employ expensive inner-loop MCMC runs for unbiased ELBOs for training. We propose SIVI-SM, a new method for SIVI based on an alternative training objective via score matching.
arXiv Detail & Related papers (2023-08-19T13:32:54Z)
Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution. We propose the Constant Rate AIS algorithm and its efficient implementation for $alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z)
FEDNEST: Federated Bilevel, Minimax, and Compositional Optimization [53.78643974257301]
Many contemporary ML problems fall under nested bilevel programming that subsumes minimax and compositional optimization. We propose FedNest: A federated alternating gradient method to address general nested problems.
arXiv Detail & Related papers (2022-05-04T17:48:55Z)
Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process. We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator. We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
Variational Refinement for Importance Sampling Using the Forward Kullback-Leibler Divergence [77.06203118175335]
Variational Inference (VI) is a popular alternative to exact sampling in Bayesian inference. Importance sampling (IS) is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures. We propose a novel combination of optimization and sampling techniques for approximate Bayesian inference.
arXiv Detail & Related papers (2021-06-30T11:00:24Z)
A Note on Optimizing Distributions using Kernel Mean Embeddings [94.96262888797257]
Kernel mean embeddings represent probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space. We show that when the kernel is characteristic, distributions with a kernel sum-of-squares density are dense. We provide algorithms to optimize such distributions in the finite-sample setting.
arXiv Detail & Related papers (2021-06-18T08:33:45Z)
Efficient Semi-Implicit Variational Inference [65.07058307271329]
We propose an efficient and scalable semi-implicit extrapolational (SIVI) Our method maps SIVI's evidence to a rigorous inference of lower gradient values.
arXiv Detail & Related papers (2021-01-15T11:39:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.