GFlowNet-EM for learning compositional latent variable models
- URL: http://arxiv.org/abs/2302.06576v2
- Date: Sat, 3 Jun 2023 18:02:08 GMT
- Title: GFlowNet-EM for learning compositional latent variable models
- Authors: Edward J. Hu, Nikolay Malkin, Moksh Jain, Katie Everett, Alexandros Graikos, Yoshua Bengio
- Abstract summary: A key tradeoff in modeling the posteriors over latents is between expressivity and tractable optimization.
We propose the use of GFlowNets, algorithms for sampling from an unnormalized density.
By training GFlowNets to sample from the posterior over latents, we take advantage of their strengths as amortized variational algorithms.
- Score: 115.96660869630227
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Latent variable models (LVMs) with discrete compositional latents are an
important but challenging setting due to a combinatorially large number of
possible configurations of the latents. A key tradeoff in modeling the
posteriors over latents is between expressivity and tractable optimization. For
algorithms based on expectation-maximization (EM), the E-step is often
intractable without restrictive approximations to the posterior. We propose the
use of GFlowNets, algorithms for sampling from an unnormalized density by
learning a stochastic policy for sequential construction of samples, for this
intractable E-step. By training GFlowNets to sample from the posterior over
latents, we take advantage of their strengths as amortized variational
inference algorithms for complex distributions over discrete structures. Our
approach, GFlowNet-EM, enables the training of expressive LVMs with discrete
compositional latents, as shown by experiments on non-context-free grammar
induction and on images using discrete variational autoencoders (VAEs) without
conditional independence enforced in the encoder.
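Below is a minimal PyTorch sketch of the alternating scheme the abstract describes. The interfaces (`model.log_joint`, `gfn.sample_trajectory`, `gfn.log_Z`) are hypothetical placeholders, and trajectory balance is one standard GFlowNet objective; this is an illustration of the idea, not the authors' released code.

```python
import torch

def trajectory_balance_loss(log_Z, log_pf, log_pb, log_reward):
    # Trajectory balance: drive Z * P_F(trajectory) toward R(z) * P_B(trajectory | z).
    return (log_Z + log_pf - log_reward - log_pb) ** 2

def gflownet_em_step(model, gfn, x, opt_model, opt_gfn):
    # E-step: fit the GFlowNet sampler to the unnormalized posterior,
    # using the joint density p(x, z) as the reward for a sampled latent z.
    z, log_pf, log_pb = gfn.sample_trajectory(x)   # sequential construction of z
    log_reward = model.log_joint(x, z).detach()    # reward = log p(x, z), model frozen
    loss_e = trajectory_balance_loss(gfn.log_Z(x), log_pf, log_pb, log_reward).mean()
    opt_gfn.zero_grad(); loss_e.backward(); opt_gfn.step()

    # M-step: update the generative model on latents drawn from the
    # amortized (approximate) posterior sampler, as in standard EM.
    with torch.no_grad():
        z, _, _ = gfn.sample_trajectory(x)
    loss_m = -model.log_joint(x, z).mean()
    opt_model.zero_grad(); loss_m.backward(); opt_model.step()
```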
Related papers
- On the Trajectory Regularity of ODE-based Diffusion Sampling [79.17334230868693]
Diffusion-based generative models use differential equations to establish a smooth connection between a complex data distribution and a tractable prior distribution.
In this paper, we identify several intriguing trajectory properties in the ODE-based sampling process of diffusion models.
arXiv Detail & Related papers (2024-05-18T15:59:41Z)
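For background on the entry above: a toy Euler integrator for the probability-flow ODE of a variance-preserving diffusion, the kind of deterministic data-to-prior connection the summary refers to. The noise schedule and `score_fn` are illustrative assumptions, not this paper's sampler.

```python
import torch

def ode_sample(score_fn, shape, n_steps=50, beta=lambda t: 0.1 + 19.9 * t):
    # Integrate the probability-flow ODE of a VP diffusion from t=1 to t~0:
    #   dx/dt = -0.5 * beta(t) * (x + score(x, t))
    x = torch.randn(shape)                      # start from the tractable prior
    ts = torch.linspace(1.0, 1e-3, n_steps + 1)
    for i in range(n_steps):
        t, dt = ts[i], ts[i + 1] - ts[i]        # dt < 0: we integrate backward in t
        drift = -0.5 * beta(t) * (x + score_fn(x, t))
        x = x + drift * dt                      # Euler step along the smooth trajectory
    return x
```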
- Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z)
- Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of the latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
- SIReN-VAE: Leveraging Flows and Amortized Inference for Bayesian Networks [2.8597160727750564]
This work explores incorporating arbitrary dependency structures, as specified by Bayesian networks, into VAEs.
This is achieved by extending both the prior and inference network with graphical residual flows.
We compare our model's performance on several synthetic datasets and show its potential in data-sparse settings.
arXiv Detail & Related papers (2022-04-23T10:31:08Z)
- Direct Evolutionary Optimization of Variational Autoencoders With Binary Latents [0.0]
We show that it is possible to train Variational Autoencoders (VAEs) with discrete latents without sampling-based approximations and without reparameterization.
In contrast to large supervised networks, the VAEs investigated here can, for example, denoise a single image without prior training on clean data or on large image datasets.
arXiv Detail & Related papers (2020-11-27T12:42:12Z)
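A toy sketch of the sampling-free, direct optimization of binary latents that the entry above describes: per datapoint, a small population of latent states is evolved by bit-flip mutation and selection on the log-joint. The variation operators and the `log_joint` interface are our assumptions, not the paper's exact algorithm.

```python
import torch

def evolve_latents(log_joint, x, states, n_generations=10, n_children=4):
    # states: (P, D) 0/1 integer tensor, the current latent population for x.
    for _ in range(n_generations):
        parents = states[torch.randint(len(states), (n_children,))]
        flips = torch.zeros_like(parents)
        flips[torch.arange(n_children),
              torch.randint(parents.shape[1], (n_children,))] = 1
        children = (parents + flips) % 2          # flip one random bit per child
        pool = torch.unique(torch.cat([states, children]), dim=0)
        scores = log_joint(x, pool)               # log p(x, z) for every candidate
        keep = scores.topk(min(len(states), len(pool))).indices
        states = pool[keep]                       # keep the highest-scoring states
    return states
```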
- Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM), where we parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores).
For AR-CSM models, the divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z)
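In symbols, the parameterisation described in the entry above amounts to the following (notation ours, not the paper's):

```latex
% Autoregressive factorization of the joint, and the univariate
% conditional scores that AR-CSM parameterizes directly:
\log p(x) \;=\; \sum_{i=1}^{D} \log p(x_i \mid x_{<i}),
\qquad
s_i(x_{\le i}) \;=\; \frac{\partial}{\partial x_i} \log p(x_i \mid x_{<i}).
```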
- Relaxed-Responsibility Hierarchical Discrete VAEs [3.976291254896486]
We introduce Relaxed-Responsibility Vector-Quantisation, a novel way to parameterise discrete latent variables.
We achieve state-of-the-art bits-per-dim results for various standard datasets.
arXiv Detail & Related papers (2020-07-14T19:10:05Z)
- Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity [26.518803984578867]
Training neural network models with discrete (categorical or structured) latent variables can be computationally challenging.
One typically resorts to sampling-based approximations of the true marginal.
We propose a new training strategy which replaces these estimators by an exact yet efficient marginalization.
arXiv Detail & Related papers (2020-07-03T19:36:35Z)
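A minimal sketch of the sparsity idea in the entry above: if the posterior over a categorical latent is made sparse (here via sparsemax, one concrete choice; the paper's estimators may differ), the expectation in the training loss can be summed exactly over the few latents with nonzero probability, with no sampling.

```python
import torch

def sparsemax(logits):
    # Project logits onto the probability simplex (Martins & Astudillo, 2016);
    # unlike softmax, the result can assign exactly zero probability.
    z, _ = torch.sort(logits, descending=True)
    k = torch.arange(1, len(z) + 1, dtype=logits.dtype)
    cumsum = torch.cumsum(z, dim=0)
    support = 1 + k * z > cumsum                 # dimensions kept in the support
    k_max = support.sum()
    tau = (cumsum[k_max - 1] - 1) / k_max
    return torch.clamp(logits - tau, min=0.0)

def expected_loss(logits, loss_per_latent):
    # Exact E_{z ~ p}[loss(z)]: enumerate only the z with p(z) > 0,
    # giving exact gradients instead of a sampling-based estimate.
    p = sparsemax(logits)
    support = p.nonzero(as_tuple=True)[0]
    return sum(p[z] * loss_per_latent(z) for z in support)
```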
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
VAEs with a strong auto-regressive decoder tend to ignore the latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)