Learning Sparse Codes with Entropy-Based ELBOs
- URL: http://arxiv.org/abs/2311.01888v2
- Date: Tue, 9 Apr 2024 18:23:58 GMT
- Title: Learning Sparse Codes with Entropy-Based ELBOs
- Authors: Dmytro Velychko, Simon Damm, Asja Fischer, Jörg Lücke
- Abstract summary: We derive a solely entropy-based learning objective for the parameters of standard sparse coding.
The novel variational objective has the following features: (A) unlike MAP approximations, it uses non-trivial posterior approximations for probabilistic inference.
- Score: 13.906627869457232
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Standard probabilistic sparse coding assumes a Laplace prior, a linear mapping from latents to observables, and Gaussian observable distributions. We here derive a solely entropy-based learning objective for the parameters of standard sparse coding. The novel variational objective has the following features: (A) unlike MAP approximations, it uses non-trivial posterior approximations for probabilistic inference; (B) unlike for previous non-trivial approximations, the novel objective is fully analytical; and (C) the objective allows for a novel principled form of annealing. The objective is derived by first showing that the standard ELBO objective converges to a sum of entropies, which matches similar recent results for generative models with Gaussian priors. The conditions under which the ELBO becomes equal to entropies are then shown to have analytical solutions, which leads to the fully analytical objective. Numerical experiments are used to demonstrate the feasibility of learning with such entropy-based ELBOs. We investigate different posterior approximations including Gaussians with correlated latents and deep amortized approximations. Furthermore, we numerically investigate entropy-based annealing which results in improved learning. Our main contributions are theoretical, however, and they are twofold: (1) for non-trivial posterior approximations, we provide the (to the knowledge of the authors) first analytical ELBO objective for standard probabilistic sparse coding; and (2) we provide the first demonstration on how a recently shown convergence of the ELBO to entropy sums can be used for learning.
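To make the setting concrete, the generative model and objective described in the abstract can be written as follows. This is a sketch in generic notation (a dictionary W, observation variance σ², Laplace scales b_h), which need not match the paper's exact parameterization.

```latex
% Standard probabilistic sparse coding (illustrative notation):
% Laplace prior over latents, linear map to Gaussian observables.
p(z) = \prod_{h=1}^{H} \frac{1}{2 b_h}\, \exp\!\big(-|z_h| / b_h\big), \qquad
p(x \mid z) = \mathcal{N}\!\big(x;\, W z,\; \sigma^2 \mathbf{I}\big).

% Standard ELBO for a variational posterior q_\Phi(z \mid x); the paper's
% contribution is showing that, under the derived conditions, this objective
% can be replaced by a fully analytical, solely entropy-based expression.
\mathcal{F}(\Phi,\Theta)
  = \frac{1}{N}\sum_{n=1}^{N} \Big( \mathbb{E}_{q_\Phi(z \mid x^{(n)})}\!\big[\log p_\Theta(x^{(n)} \mid z)\big]
  - D_{\mathrm{KL}}\!\big(q_\Phi(z \mid x^{(n)}) \,\|\, p_\Theta(z)\big) \Big).
```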
Related papers
- Graph Stochastic Neural Process for Inductive Few-shot Knowledge Graph Completion [63.68647582680998]
We focus on a task called inductive few-shot knowledge graph completion (I-FKGC).
Inspired by the idea of inductive reasoning, we cast I-FKGC as an inductive reasoning problem.
We present a neural process-based hypothesis extractor that models the joint distribution of hypotheses, from which we can sample a hypothesis for predictions.
In the second module, based on the hypothesis, we propose a graph attention-based predictor to test if the triple in the query set aligns with the extracted hypothesis.
arXiv Detail & Related papers (2024-08-03T13:37:40Z) - von Mises Quasi-Processes for Bayesian Circular Regression [57.88921637944379]
We explore a family of expressive and interpretable distributions over circle-valued random functions.
The resulting probability model has connections with continuous spin models in statistical physics.
For posterior inference, we introduce a new Stratonovich-like augmentation that lends itself to fast Markov Chain Monte Carlo sampling.
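For background, the von Mises distribution underlying such circle-valued models has the density below; this is a standard fact, not a detail taken from the paper.

```latex
% von Mises density on the circle, with mean direction \mu and concentration \kappa;
% I_0 is the modified Bessel function of the first kind, order zero.
p(\theta \mid \mu, \kappa) = \frac{\exp\!\big(\kappa \cos(\theta - \mu)\big)}{2\pi I_0(\kappa)},
\qquad \theta \in [0, 2\pi).
```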
arXiv Detail & Related papers (2024-06-19T01:57:21Z) - On the Convergence of the ELBO to Entropy Sums [3.345575993695074]
We show that, for a very large class of generative models, the variational lower bound is at all stationary points of learning equal to a sum of entropies.
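A schematic form of this result (illustrative notation; see the paper for the precise model class and conditions): at any stationary point of learning, the ELBO decomposes into entropy terms.

```latex
% At stationary points, the ELBO equals a sum of entropies (schematic form;
% the signs follow the decomposition ELBO = E_q[log p(x|z)] + E_q[log p(z)] + H[q]):
\mathcal{F}(\Phi,\Theta)
  = \frac{1}{N}\sum_{n=1}^{N} \mathcal{H}\big[q_\Phi(z \mid x^{(n)})\big]
  \;-\; \mathcal{H}\big[p_\Theta(z)\big]
  \;-\; \frac{1}{N}\sum_{n=1}^{N} \mathbb{E}_{q_\Phi(z \mid x^{(n)})}\Big[\mathcal{H}\big[p_\Theta(x \mid z)\big]\Big].
```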
arXiv Detail & Related papers (2022-09-07T11:33:32Z) - Optimal regularizations for data generation with probabilistic graphical models [0.0]
Empirically, well-chosen regularization schemes dramatically improve the quality of the inferred models.
We consider the particular case of L2 and L1 regularizations in Maximum A Posteriori (MAP) inference of generative pairwise graphical models.
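As a concrete instance of this setting, a pairwise graphical model with L1- or L2-regularized MAP inference of its parameters can be written as follows; this is a generic sketch, not necessarily the paper's exact model or regularization strengths.

```latex
% Pairwise (Ising/Potts-like) model with fields h and couplings J:
p(x \mid h, J) = \frac{1}{Z(h,J)} \exp\!\Big( \sum_i h_i x_i + \sum_{i<j} J_{ij}\, x_i x_j \Big).

% Regularized MAP estimation of the parameters from data x^{(1)},\dots,x^{(N)}:
(\hat h, \hat J) = \arg\max_{h,J}\; \sum_{n=1}^{N} \log p\big(x^{(n)} \mid h, J\big)
  \;-\; \lambda\, \Omega(J),
\qquad \Omega(J) \in \Big\{ \|J\|_1,\; \tfrac{1}{2}\|J\|_2^2 \Big\}.
```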
arXiv Detail & Related papers (2021-12-02T14:45:16Z) - Loss function based second-order Jensen inequality and its application to particle variational inference [112.58907653042317]
Particle variational inference (PVI) uses an ensemble of models as an empirical approximation for the posterior distribution.
PVI iteratively updates each model with a repulsion force to ensure the diversity of the optimized models.
We derive a novel generalization error bound and show that it can be reduced by enhancing the diversity of models.
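The repulsion-based update can be illustrated with a minimal Stein variational gradient descent (SVGD) step, one common instance of particle variational inference; this is a generic sketch (RBF kernel, fixed step size), not the paper's specific algorithm.

```python
import numpy as np

def svgd_step(particles, grad_log_p, step_size=0.1, bandwidth=1.0):
    """One SVGD update on an ensemble of particles (models).

    particles:   array of shape (n, d), the model ensemble.
    grad_log_p:  function mapping (n, d) positions to (n, d) scores.
    """
    n = particles.shape[0]
    diffs = particles[:, None, :] - particles[None, :, :]      # x_i - x_j, shape (n, n, d)
    sq_dists = np.sum(diffs ** 2, axis=-1)                      # (n, n)
    K = np.exp(-sq_dists / (2.0 * bandwidth ** 2))              # RBF kernel matrix

    scores = grad_log_p(particles)                              # (n, d)
    attraction = K @ scores                                     # pulls particles to high density
    repulsion = np.sum(diffs / bandwidth ** 2 * K[:, :, None], axis=1)  # keeps particles diverse
    return particles + step_size * (attraction + repulsion) / n

# Example: particles approximating a standard 2D Gaussian posterior.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(50, 2)) * 3.0
    for _ in range(200):
        x = svgd_step(x, grad_log_p=lambda p: -p)               # score of N(0, I)
    print(x.mean(axis=0), x.std(axis=0))
```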
arXiv Detail & Related papers (2021-06-09T12:13:51Z) - Posterior-Aided Regularization for Likelihood-Free Inference [23.708122045184698]
Posterior-Aided Regularization (PAR) is applicable to learning the density estimator, regardless of the model structure.
We provide a unified estimation method of PAR to estimate both reverse KL term and mutual information term with a single neural network.
arXiv Detail & Related papers (2021-02-15T16:59:30Z) - Efficient Semi-Implicit Variational Inference [65.07058307271329]
We propose an efficient and scalable solver for semi-implicit variational inference (SIVI).
Our method maps SIVI's evidence lower bound (ELBO) to a form that admits rigorous inference with lower-variance gradient estimates.
arXiv Detail & Related papers (2021-01-15T11:39:09Z) - The ELBO of Variational Autoencoders Converges to a Sum of Three Entropies [16.119724102324934]
The central objective function of a variational autoencoder (VAE) is its variational lower bound (the ELBO).
Here we show that for standard (i.e., Gaussian) VAEs the ELBO converges to a value given by the sum of three entropies.
Our derived analytical results are exact and apply to both small and complex deep networks for encoder and decoder.
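In the Gaussian-VAE case the three entropies are available in closed form. Below is a minimal sketch of the resulting entropy-sum value, assuming a diagonal-Gaussian encoder, a standard normal prior, and a Gaussian decoder with a global variance sigma2; this is the standard setting, not necessarily the paper's exact parameterization.

```python
import numpy as np

def entropy_sum_elbo(log_var_q, latent_dim, data_dim, sigma2):
    """Value the ELBO converges to at stationary points: a sum of three entropies.

    log_var_q:  (N, latent_dim) array of per-datapoint encoder log-variances.
    sigma2:     global decoder (observation) variance.
    """
    # Average entropy of the diagonal-Gaussian encoder q(z | x_n).
    h_q = 0.5 * np.mean(np.sum(np.log(2 * np.pi * np.e) + log_var_q, axis=1))
    # Entropy of the standard normal prior p(z) = N(0, I).
    h_prior = 0.5 * latent_dim * np.log(2 * np.pi * np.e)
    # Entropy of the Gaussian decoder p(x | z) = N(x; mu(z), sigma2 * I).
    h_decoder = 0.5 * data_dim * np.log(2 * np.pi * np.e * sigma2)
    return h_q - h_prior - h_decoder
```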
arXiv Detail & Related papers (2020-10-28T10:13:28Z) - Mean-Field Approximation to Gaussian-Softmax Integral with Application to Uncertainty Estimation [23.38076756988258]
We propose a new single-model based approach to quantify uncertainty in deep neural networks.
We use a mean-field approximation formula to compute an analytically intractable integral.
Empirically, the proposed approach performs competitively when compared to state-of-the-art methods.
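A common mean-field-style approximation for such Gaussian-softmax integrals is the probit-inspired scaling of the logits sketched below; this illustrates the general idea and is not necessarily the exact formula derived in the paper.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - np.max(a, axis=axis, keepdims=True)   # numerical stability
    e = np.exp(a)
    return e / np.sum(e, axis=axis, keepdims=True)

def mean_field_gaussian_softmax(mu, var):
    """Approximate E[softmax(a)] for logits a ~ N(mu, diag(var)).

    Uses the probit-style scaling common in mean-field treatments:
    each logit is shrunk according to its variance before the softmax.
    """
    kappa = 1.0 / np.sqrt(1.0 + np.pi * var / 8.0)
    return softmax(kappa * mu)
```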
arXiv Detail & Related papers (2020-06-13T07:32:38Z) - On the Convergence Rate of Projected Gradient Descent for a Back-Projection based Objective [58.33065918353532]
We consider a back-projection (BP) based fidelity term as an alternative to the common least squares (LS) term.
We show that using the BP term, rather than the LS term, requires fewer iterations of optimization algorithms.
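For intuition, the two fidelity terms for a linear inverse problem y = Ax + e can be written as follows; this is a standard formulation of the BP objective and the paper's exact variant may differ in details.

```latex
% Least-squares (LS) fidelity term:
\ell_{\mathrm{LS}}(x) = \tfrac{1}{2}\,\| A x - y \|_2^2.

% Back-projection (BP) based fidelity term, using the pseudoinverse
% A^\dagger = A^{\top}(A A^{\top})^{-1} (A assumed to have full row rank):
\ell_{\mathrm{BP}}(x) = \tfrac{1}{2}\,\| A^{\dagger}(A x - y) \|_2^2
                      = \tfrac{1}{2}\,\big\| (A A^{\top})^{-1/2}(A x - y) \big\|_2^2.
```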
arXiv Detail & Related papers (2020-05-03T00:58:23Z) - SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
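The core idea of a Fourier-feature expansion that also covers derivative observations can be sketched with plain random Fourier features for an RBF kernel; the paper replaces the Monte Carlo frequencies with quadrature nodes to obtain its deterministic guarantees, so treat the sampling below as illustrative only.

```python
import numpy as np

def fourier_features(X, omega, b):
    """Fourier features approximating an RBF kernel: k(x, x') ~= phi(x) @ phi(x')."""
    m = omega.shape[0]
    return np.sqrt(2.0 / m) * np.cos(X @ omega.T + b)            # (N, m)

def fourier_feature_gradients(X, omega, b):
    """Features for derivative observations d f / d x_k, obtained by
    differentiating phi analytically (no additional approximation)."""
    m = omega.shape[0]
    s = -np.sqrt(2.0 / m) * np.sin(X @ omega.T + b)              # (N, m)
    # grad[n, k, i] = d phi_i(x_n) / d x_{n,k}
    return s[:, None, :] * omega.T[None, :, :]

# Example setup for an RBF kernel with lengthscale ell (hypothetical values):
rng = np.random.default_rng(0)
d, m, ell = 2, 200, 0.5
omega = rng.normal(scale=1.0 / ell, size=(m, d))                  # spectral frequencies
b = rng.uniform(0.0, 2.0 * np.pi, size=m)
X = rng.normal(size=(10, d))
Phi, dPhi = fourier_features(X, omega, b), fourier_feature_gradients(X, omega, b)
```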
arXiv Detail & Related papers (2020-03-05T14:33:20Z)