The ELBO of Variational Autoencoders Converges to a Sum of Three
Entropies
- URL: http://arxiv.org/abs/2010.14860v5
- Date: Thu, 20 Apr 2023 08:00:35 GMT
- Title: The ELBO of Variational Autoencoders Converges to a Sum of Three
Entropies
- Authors: Simon Damm, Dennis Forster, Dmytro Velychko, Zhenwen Dai, Asja
Fischer, Jörg Lücke
- Abstract summary: The central objective function of a variational autoencoder (VAE) is its variational lower bound (the ELBO).
Here we show that for standard (i.e., Gaussian) VAEs the ELBO converges to a value given by the sum of three entropies.
Our derived analytical results are exact and apply to small as well as to intricate deep networks for encoder and decoder.
- Score: 16.119724102324934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The central objective function of a variational autoencoder (VAE) is its
variational lower bound (the ELBO). Here we show that for standard (i.e.,
Gaussian) VAEs the ELBO converges to a value given by the sum of three
entropies: the (negative) entropy of the prior distribution, the expected
(negative) entropy of the observable distribution, and the average entropy of
the variational distributions (the latter is already part of the ELBO). Our
derived analytical results are exact and apply to small as well as to
intricate deep networks for encoder and decoder. Furthermore, they apply for
finitely and infinitely many data points and at any stationary point (including
local maxima and saddle points). The result implies that, for standard VAEs,
the ELBO can often be computed in closed form at stationary points, whereas the
original ELBO requires numerical approximation of integrals. As a main
contribution, we prove that, at stationary points, the ELBO of VAEs is equal to
a sum of entropies. Numerical experiments then show that the obtained
analytical results remain sufficiently precise in the vicinities of stationary
points that are reached in practice. Furthermore, we discuss how the novel
entropy form of the ELBO can be used to analyze and understand learning
behavior. More generally, we believe that our contributions can be useful for
future theoretical and practical studies on VAE learning, as they provide novel
information on those points in parameter space to which optimization of VAEs
converges.
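To make the stated decomposition concrete, the following is an illustrative sketch and not taken from the paper itself. Writing $q_\Phi(z \mid x^{(n)})$ for the variational (encoder) distributions, $p_\Theta(z)$ for the prior, $p_\Theta(x \mid z)$ for the observable (decoder) distribution, and $N$ for the number of data points (all notation assumed here for illustration), the entropy-sum form described in the abstract reads
\[
  \mathcal{L}(\Phi, \Theta) \;=\; \frac{1}{N}\sum_{n=1}^{N} \mathcal{H}\big[q_\Phi(z \mid x^{(n)})\big]
  \;-\; \mathcal{H}\big[p_\Theta(z)\big]
  \;-\; \frac{1}{N}\sum_{n=1}^{N} \mathbb{E}_{q_\Phi(z \mid x^{(n)})}\Big[\mathcal{H}\big[p_\Theta(x \mid z)\big]\Big]
\]
at stationary points, where $\mathcal{H}[\cdot]$ denotes differential entropy. For a standard Gaussian VAE all three entropies have closed forms, which is what allows the ELBO to be evaluated without numerical integration at such points. A minimal Python sketch of this closed-form evaluation, assuming a standard normal prior, diagonal-Gaussian encoders, and a decoder with one shared variance (function and variable names are hypothetical):

    import numpy as np

    LOG_2PI_E = np.log(2.0 * np.pi * np.e)

    def gaussian_entropy(variances):
        """Differential entropy of a diagonal Gaussian with given per-dimension variances."""
        variances = np.asarray(variances, dtype=float)
        return 0.5 * np.sum(LOG_2PI_E + np.log(variances))

    def elbo_entropy_sum(encoder_vars, latent_dim, data_dim, decoder_var):
        """Entropy-sum form of the ELBO (claimed to hold at stationary points).

        encoder_vars : (N, H) array of per-sample encoder variances sigma_n^2
        latent_dim   : H, dimensionality of the latent z
        data_dim     : D, dimensionality of the observation x
        decoder_var  : tau^2, shared decoder variance
        """
        avg_encoder_entropy = np.mean([gaussian_entropy(v) for v in encoder_vars])
        prior_entropy = gaussian_entropy(np.ones(latent_dim))  # prior N(0, I_H)
        # For a fixed decoder variance, H[p(x|z)] does not depend on z,
        # so its expectation under q reduces to a constant.
        decoder_entropy = gaussian_entropy(decoder_var * np.ones(data_dim))
        return avg_encoder_entropy - prior_entropy - decoder_entropy

    # Toy example: 3 data points, 2-dim latent space, 5-dim observations
    rng = np.random.default_rng(0)
    enc_vars = rng.uniform(0.1, 1.0, size=(3, 2))
    print(elbo_entropy_sum(enc_vars, latent_dim=2, data_dim=5, decoder_var=0.25))

In this sketch, the value depends only on the encoder variances and the decoder variance, illustrating the abstract's point that the entropy-sum form is available in closed form, in contrast to the integral form of the original ELBO.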
Related papers
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
- Learning Sparse Codes with Entropy-Based ELBOs [13.906627869457232]
We derive a solely entropy-based learning objective for the parameters of standard sparse coding.
The novel variational objective has the following features: (A) unlike MAP approximations, it uses non-trivial posterior approximations for probabilistic inference.
arXiv Detail & Related papers (2023-11-03T13:03:41Z)
- Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z)
- Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated.
We formalize these results both in the infinite-sample regime and in the finite-sample regime.
arXiv Detail & Related papers (2022-10-03T06:09:01Z)
- On the Convergence of the ELBO to Entropy Sums [3.345575993695074]
We show that, for a very large class of generative models, the variational lower bound is at all stationary points of learning equal to a sum of entropies.
arXiv Detail & Related papers (2022-09-07T11:33:32Z)
- Structural aspects of FRG in quantum tunnelling computations [68.8204255655161]
We probe both the unidimensional quartic harmonic oscillator and the double well potential.
Two partial differential equations for the potential $V_k(\varphi)$ and the wave function renormalization $Z_k(\varphi)$ are studied.
arXiv Detail & Related papers (2022-06-14T15:23:25Z)
- Interpretable transformed ANOVA approximation on the example of the prevention of forest fires [0.0]
In this paper, we apply transformation ideas in order to design a complete orthonormal system in the $\mathrm{L}_2$ space of functions.
We are able to apply the explainable ANOVA approximation for this basis and use Z-score transformed data.
We demonstrate the applicability of this procedure on the well-known forest fires data set from the UCI machine learning repository.
arXiv Detail & Related papers (2021-10-14T13:39:05Z)
- Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption.
They can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z)
- Efficient Semi-Implicit Variational Inference [65.07058307271329]
We propose an efficient and scalable semi-implicit variational inference (SIVI) method.
Our method maps SIVI's evidence to a rigorous inference of lower gradient values.
arXiv Detail & Related papers (2021-01-15T11:39:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.