Alpha-divergence Variational Inference Meets Importance Weighted
Auto-Encoders: Methodology and Asymptotics
- URL: http://arxiv.org/abs/2210.06226v2
- Date: Wed, 19 Jul 2023 13:08:21 GMT
- Title: Alpha-divergence Variational Inference Meets Importance Weighted
Auto-Encoders: Methodology and Asymptotics
- Authors: Kamélia Daudel, Joe Benton, Yuyang Shi, Arnaud Doucet
- Abstract summary: We formalize and study the VR-IWAE bound, a generalization of the Importance Weighted Auto-Encoder (IWAE) bound.
We show that the VR-IWAE bound enjoys several desirable properties and notably leads to the same gradient descent procedure as the VR bound.
We then provide two complementary theoretical analyses of the VR-IWAE bound and thus of the standard IWAE bound.
- Score: 21.51266421854127
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Several algorithms involving the Variational Rényi (VR) bound have been
proposed to minimize an alpha-divergence between a target posterior
distribution and a variational distribution. Despite promising empirical
results, those algorithms resort to biased stochastic gradient descent
procedures and thus lack theoretical guarantees. In this paper, we formalize
and study the VR-IWAE bound, a generalization of the Importance Weighted
Auto-Encoder (IWAE) bound. We show that the VR-IWAE bound enjoys several
desirable properties and notably leads to the same stochastic gradient descent
procedure as the VR bound in the reparameterized case, but this time by relying
on unbiased gradient estimators. We then provide two complementary theoretical
analyses of the VR-IWAE bound and thus of the standard IWAE bound. Those
analyses shed light on the benefits or lack thereof of these bounds. Lastly, we
illustrate our theoretical claims over toy and real-data examples.
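As a reading aid, here is a hedged reconstruction of the bound in question, written in standard IWAE/VR notation; it is inferred from the abstract above, not copied from the paper's exact statement.

```latex
% Hedged reconstruction of the VR-IWAE bound from the abstract's description;
% w_{\theta,\phi} is the usual importance weight and \alpha lies in [0, 1).
\[
  \ell_{N}^{(\alpha)}(\theta, \phi; x)
  \;=\;
  \frac{1}{1-\alpha}\,
  \mathbb{E}_{z_{1:N} \stackrel{\mathrm{iid}}{\sim} q_{\phi}(\cdot \mid x)}
  \left[
    \log \Big( \frac{1}{N} \sum_{i=1}^{N} w_{\theta,\phi}(z_i; x)^{\,1-\alpha} \Big)
  \right],
  \qquad
  w_{\theta,\phi}(z; x) = \frac{p_{\theta}(x, z)}{q_{\phi}(z \mid x)}.
\]
```

Written this way, setting α = 0 recovers the IWAE bound, setting N = 1 recovers the standard ELBO for any α, and letting N → ∞ recovers the VR bound under suitable integrability conditions, which is consistent with the abstract's claim that the VR-IWAE bound generalizes the IWAE bound while leading to the same stochastic gradient procedure as the VR bound.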
Related papers
- Theoretical Convergence Guarantees for Variational Autoencoders [2.8167997311962942]
Variational Autoencoders (VAE) are popular generative models used to sample from complex data distributions.
This paper aims to bridge the gap between their practical success and theory by providing non-asymptotic convergence guarantees for VAEs trained using both Gradient Descent and Adam.
Our theoretical analysis applies to both Linear VAEs and Deep Gaussian VAEs, as well as several VAE variants, including $\beta$-VAE and IWAE.
arXiv Detail & Related papers (2024-10-22T07:12:38Z) - Learning with Importance Weighted Variational Inference: Asymptotics for Gradient Estimators of the VR-IWAE Bound [3.115375810642661]
Several popular variational bounds involving importance weighting ideas have been proposed to generalize and improve on the Evidence Lower Bound (ELBO).
The VR-IWAE bound was introduced as a variational bound that unifies the ELBO, IWAE and VR bound methodologies; a hedged estimator sketch for this bound is given after this list.
arXiv Detail & Related papers (2024-10-15T20:09:06Z) - A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variations of the stochastic proximal point method (SPPM).
arXiv Detail & Related papers (2024-05-24T21:09:19Z) - Optimal Rates for Vector-Valued Spectral Regularization Learning Algorithms [28.046728466038022]
We study theoretical properties of a broad class of regularized algorithms with vector-valued output.
We rigorously confirm the so-called saturation effect for ridge regression with vector-valued output.
We present an upper bound on the finite-sample risk of general vector-valued spectral algorithms.
arXiv Detail & Related papers (2024-05-23T16:45:52Z) - Taming Nonconvex Stochastic Mirror Descent with General Bregman
Divergence [25.717501580080846]
This paper revisits the convergence of stochastic mirror descent (SMD) in the contemporary nonconvex optimization setting.
We also develop provably convergent algorithms for the problem of training linear networks.
arXiv Detail & Related papers (2024-02-27T17:56:49Z) - SOFARI: High-Dimensional Manifold-Based Inference [8.860162863559163]
We introduce two SOFARI variants to handle strongly and weakly latent factors, where the latter covers a broader range of applications.
We show that SOFARI provides bias-corrected estimators of both the latent left factor vectors and the singular values, which are shown to enjoy mean-zero normal distributions with sparse estimable variances.
We illustrate the effectiveness of SOFARI and justify our theoretical results through simulation examples and a real data application in economic forecasting.
arXiv Detail & Related papers (2023-09-26T16:01:54Z) - Variational Laplace Autoencoders [53.08170674326728]
Variational autoencoders employ an amortized inference model to approximate the posterior of latent variables.
We present a novel approach that addresses the limited posterior expressiveness of the fully-factorized Gaussian assumption.
We also present a general framework named Variational Laplace Autoencoders (VLAEs) for training deep generative models.
arXiv Detail & Related papers (2022-11-30T18:59:27Z) - Regularizing Variational Autoencoder with Diversity and Uncertainty
Awareness [61.827054365139645]
Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds for RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work is on zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that, with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
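As referenced in the VR-IWAE entry above, the following is a minimal, hedged Python sketch of a Monte Carlo estimate of the VR-IWAE objective for a diagonal-Gaussian encoder. The reparameterization trick is used so that automatic differentiation of this estimate yields an unbiased gradient of the N-sample bound; the names `log_joint`, `mu` and `log_sigma` are hypothetical placeholders, not identifiers from any of the papers listed.

```python
# Hedged sketch: one Monte Carlo estimate of the VR-IWAE objective with a
# diagonal-Gaussian encoder. Differentiating this estimate (reparameterization
# trick) gives an unbiased gradient of the N-sample bound for alpha in [0, 1).
import math
import torch

def vr_iwae_estimate(log_joint, mu, log_sigma, x, N=8, alpha=0.5):
    """log_joint(x, z) -> log p_theta(x, z) per sample, shape (N,).
    mu, log_sigma     -> parameters of the Gaussian q_phi(z | x), each of shape (d,)."""
    d = mu.shape[0]
    eps = torch.randn(N, d)                      # reparameterization noise
    z = mu + torch.exp(log_sigma) * eps          # z_i ~ q_phi(. | x), differentiable in phi
    # log-density of the diagonal Gaussian q_phi(z | x), shape (N,)
    log_q = (-0.5 * ((z - mu) / torch.exp(log_sigma)) ** 2
             - log_sigma - 0.5 * math.log(2 * math.pi)).sum(dim=1)
    log_w = log_joint(x, z) - log_q              # log importance weights, shape (N,)
    # (1 / (1 - alpha)) * log( (1/N) * sum_i w_i^(1 - alpha) ), via a stable logsumexp
    lse = torch.logsumexp((1.0 - alpha) * log_w, dim=0) - math.log(N)
    return lse / (1.0 - alpha)                   # alpha = 0 recovers the IWAE estimate

# Hypothetical usage: maximize the bound by gradient ascent on (theta, phi), e.g.
#   loss = -vr_iwae_estimate(log_joint, mu, log_sigma, x)
#   loss.backward()
```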
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.