On the Convergence of the ELBO to Entropy Sums
- URL: http://arxiv.org/abs/2209.03077v5
- Date: Mon, 29 Apr 2024 07:41:34 GMT
- Title: On the Convergence of the ELBO to Entropy Sums
- Authors: Jörg Lücke, Jan Warnken
- Abstract summary: We show that, for a very large class of generative models, the variational lower bound is at all stationary points of learning equal to a sum of entropies.
- Score: 3.345575993695074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The variational lower bound (a.k.a. ELBO or free energy) is the central objective for many established as well as many novel algorithms for unsupervised learning. During learning such algorithms change model parameters to increase the variational lower bound. Learning usually proceeds until parameters have converged to values close to a stationary point of the learning dynamics. In this purely theoretical contribution, we show that (for a very large class of generative models) the variational lower bound is at all stationary points of learning equal to a sum of entropies. For standard machine learning models with one set of latents and one set of observed variables, the sum consists of three entropies: (A) the (average) entropy of the variational distributions, (B) the negative entropy of the model's prior distribution, and (C) the (expected) negative entropy of the observable distribution. The obtained result applies under realistic conditions including: finite numbers of data points, at any stationary point (including saddle points) and for any family of (well behaved) variational distributions. The class of generative models for which we show the equality to entropy sums contains many well-known generative models. As concrete examples we discuss Sigmoid Belief Networks, probabilistic PCA and (Gaussian and non-Gaussian) mixture models. The result also applies for standard (Gaussian) variational autoencoders, a special case that has been shown previously (Damm et al., 2023). The prerequisites we use to show equality to entropy sums are relatively mild. Concretely, the distributions of a given generative model have to be of the exponential family, and the model has to satisfy a parameterization criterion (which is usually fulfilled). Proving the equality of the ELBO to entropy sums at stationary points (under the stated conditions) is the main contribution of this work.
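To make the stated result concrete, the following is a minimal sketch of the claimed identity for the standard setting described in the abstract (one set of latents z, one set of observed variables x, N data points); the notation is assumed for illustration and may differ from the paper's symbols. At any stationary point of learning, the ELBO equals the entropy sum

```latex
% Hedged sketch of the entropy-sum identity at stationary points (notation assumed):
% (A) average entropy of the variational distributions,
% (B) negative entropy of the prior,
% (C) expected negative entropy of the observable distribution.
\mathcal{F}(\Phi,\Theta)
  = \frac{1}{N}\sum_{n=1}^{N} \mathcal{H}\!\big[q_\Phi(z \mid x^{(n)})\big]
  - \mathcal{H}\big[p_\Theta(z)\big]
  - \frac{1}{N}\sum_{n=1}^{N} \mathbb{E}_{q_\Phi(z \mid x^{(n)})}\Big[\mathcal{H}\big[p_\Theta(x \mid z)\big]\Big].
```

Since probabilistic PCA is listed among the covered models and has Gaussian posteriors in closed form, the identity can be checked numerically by running EM to (approximate) convergence and comparing both sides. The snippet below is such a sketch under these assumptions; the EM updates follow Tipping & Bishop (1999), and all variable names are illustrative rather than taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 2000, 8, 3                          # data points, observed dim, latent dim

# Synthetic data from a ground-truth probabilistic PCA model: x = W z + noise
W_true = rng.normal(size=(D, K))
X = rng.normal(size=(N, K)) @ W_true.T + 0.3 * rng.normal(size=(N, D))

mu = X.mean(axis=0)
Xc = X - mu
S = Xc.T @ Xc / N                             # sample covariance

# EM for probabilistic PCA until (approximate) convergence to a stationary point
W = rng.normal(size=(D, K))
sigma2 = 1.0
for _ in range(3000):
    M = W.T @ W + sigma2 * np.eye(K)
    Minv = np.linalg.inv(M)
    Ez = Xc @ W @ Minv                        # E[z_n | x_n], shape (N, K)
    Ezz = N * sigma2 * Minv + Ez.T @ Ez       # sum_n E[z_n z_n^T]
    W = (Xc.T @ Ez) @ np.linalg.inv(Ezz)      # M-step for W
    sigma2 = (np.sum(Xc**2)
              - 2.0 * np.sum(Ez * (Xc @ W))   # sum_n E[z_n]^T W^T (x_n - mu)
              + np.trace(Ezz @ (W.T @ W))) / (N * D)

# Entropy sum at the stationary point (Gaussian entropies in closed form)
M = W.T @ W + sigma2 * np.eye(K)
Sigma_q = sigma2 * np.linalg.inv(M)           # shared posterior covariance of z given x
H_q = 0.5 * np.linalg.slogdet(2 * np.pi * np.e * Sigma_q)[1]   # (A) variational entropy
H_prior = 0.5 * K * np.log(2 * np.pi * np.e)                   # (B) entropy of p(z) = N(0, I)
H_noise = 0.5 * D * np.log(2 * np.pi * np.e * sigma2)          # (C) entropy of p(x|z)
entropy_sum = H_q - H_prior - H_noise

# ELBO with exact Gaussian posteriors equals the average log-likelihood log N(x_n; mu, C)
C = W @ W.T + sigma2 * np.eye(D)
elbo = -0.5 * (D * np.log(2 * np.pi)
               + np.linalg.slogdet(C)[1]
               + np.trace(np.linalg.solve(C, S)))

print(f"ELBO per data point: {elbo:.6f}")
print(f"Entropy sum        : {entropy_sum:.6f}")  # should agree at convergence
```

Both printed values should agree to several decimal places once EM has effectively converged, illustrating (but of course not proving) the equality stated in the abstract.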
Related papers
- Learning Sparse Codes with Entropy-Based ELBOs [13.906627869457232]
We derive a solely entropy-based learning objective for the parameters of standard sparse coding.
Unlike MAP approximations, the novel variational objective uses non-trivial posterior approximations for probabilistic inference.
arXiv Detail & Related papers (2023-11-03T13:03:41Z)
- Statistical Properties of the Entropy from Ordinal Patterns [55.551675080361335]
Knowing the joint distribution of the entropy-statistical complexity pair for a large class of time series models would allow statistical tests that are unavailable to date.
We characterize the distribution of the empirical Shannon entropy for any model under which the true normalized entropy is neither zero nor one.
We present a bilateral test that verifies whether there is enough evidence to reject the hypothesis that two signals produce ordinal patterns with the same Shannon entropy.
arXiv Detail & Related papers (2022-09-15T23:55:58Z)
- On the Strong Correlation Between Model Invariance and Generalization [54.812786542023325]
Generalization captures a model's ability to classify unseen data.
Invariance measures consistency of model predictions on transformations of the data.
From a dataset-centric view, we find that a given model's accuracy and invariance are linearly correlated across different test sets.
arXiv Detail & Related papers (2022-07-14T17:08:25Z)
- Entropy Production and the Role of Correlations in Quantum Brownian Motion [77.34726150561087]
We perform a study on quantum entropy production, different kinds of correlations, and their interplay in the driven Caldeira-Leggett model of quantum Brownian motion.
arXiv Detail & Related papers (2021-08-05T13:11:05Z)
- Loss function based second-order Jensen inequality and its application to particle variational inference [112.58907653042317]
Particle variational inference (PVI) uses an ensemble of models as an empirical approximation for the posterior distribution.
PVI iteratively updates each model with a repulsion force to ensure the diversity of the optimized models.
We derive a novel generalization error bound and show that it can be reduced by enhancing the diversity of models.
arXiv Detail & Related papers (2021-06-09T12:13:51Z)
- The ELBO of Variational Autoencoders Converges to a Sum of Three Entropies [16.119724102324934]
The central objective function of a variational autoencoder (VAE) is its variational lower bound (the ELBO).
Here we show that for standard (i.e., Gaussian) VAEs the ELBO converges to a value given by the sum of three entropies.
Our derived analytical results are exact and apply to small as well as to intricate deep networks for encoder and decoder.
arXiv Detail & Related papers (2020-10-28T10:13:28Z)
- Flexible mean field variational inference using mixtures of non-overlapping exponential families [6.599344783327053]
I show that standard mean field variational inference can fail to produce sensible results for models with sparsity-inducing priors.
I also show that any mixture of a diffuse exponential family and a point mass at zero used to model sparsity forms an exponential family.
arXiv Detail & Related papers (2020-10-14T01:46:56Z)
- GANs with Variational Entropy Regularizers: Applications in Mitigating the Mode-Collapse Issue [95.23775347605923]
Building on the success of deep learning, Generative Adversarial Networks (GANs) provide a modern approach to learn a probability distribution from observed samples.
GANs often suffer from the mode collapse issue where the generator fails to capture all existing modes of the input distribution.
We take an information-theoretic approach and maximize a variational lower bound on the entropy of the generated samples to increase their diversity.
arXiv Detail & Related papers (2020-09-24T19:34:37Z)
- Eigenstate Entanglement Entropy in Random Quadratic Hamiltonians [0.0]
In integrable models, the volume-law coefficient depends on the subsystem fraction.
We show that the average entanglement entropy of eigenstates of the power-law random banded matrix model is close to, but not the same as, the result for quadratic models.
arXiv Detail & Related papers (2020-06-19T18:01:15Z)
- Generalized Entropy Regularization or: There's Nothing Special about Label Smoothing [83.78668073898001]
We introduce a family of entropy regularizers, which includes label smoothing as a special case.
We find that variance in model performance can be explained largely by the resulting entropy of the model.
We advise the use of other entropy regularization methods in its place.
arXiv Detail & Related papers (2020-05-02T12:46:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.