Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming
- URL: http://arxiv.org/abs/2306.02568v3
- Date: Tue, 25 Jun 2024 06:13:38 GMT
- Title: Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming
- Authors: Xinlei Niu, Christian Walder, Jing Zhang, Charles Patrick Martin
- Abstract summary: We show the equivalence of the Gibbs distribution to a message-passing algorithm by the properties of the Gumbel distribution.
We propose the BDP-VAE which captures structured sparse optimal paths as latent variables.
- Score: 12.249274845167415
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We propose the stochastic optimal path which solves the classical optimal path problem by a probability-softening solution. This unified approach transforms a wide range of DP problems into directed acyclic graphs in which all paths follow a Gibbs distribution. We show the equivalence of the Gibbs distribution to a message-passing algorithm by the properties of the Gumbel distribution and give all the ingredients required for variational Bayesian inference of a latent path, namely Bayesian dynamic programming (BDP). We demonstrate the usage of BDP in the latent space of variational autoencoders (VAEs) and propose the BDP-VAE which captures structured sparse optimal paths as latent variables. This enables end-to-end training for generative tasks in which models rely on unobserved structural information. At last, we validate the behavior of our approach and showcase its applicability in two real-world applications: text-to-speech and singing voice synthesis. Our implementation code is available at \url{https://github.com/XinleiNIU/LatentOptimalPathsBayesianDP}.
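To make the abstract concrete, here is a minimal sketch (not the authors' implementation, which lives at the repository linked above) of the core construction: source-to-sink paths of a directed acyclic graph carry a Gibbs distribution p(path) ∝ exp(score(path)/T), the log-partition function is obtained by the classical dynamic-programming recursion with max replaced by log-sum-exp (exactly the quantity that Gumbel max-stability propagates), and an exact path sample is drawn by walking the graph with those messages. The toy graph, weights, and function names are invented for illustration.

```python
import numpy as np

def logsumexp(x):
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def sample_gibbs_path(nodes, edges, source, sink, temperature=1.0, rng=None):
    """Sample a source-to-sink path from p(path) ∝ exp(score(path) / temperature).

    nodes must be in topological order with `sink` last; edges maps (u, v) -> weight.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Backward message: beta[v] = log of the soft (log-sum-exp) total weight of all
    # v -> sink paths. Replacing logsumexp with max recovers classical DP; with
    # logsumexp it is the location parameter given by Gumbel max-stability.
    beta = {sink: 0.0}
    for u in reversed(nodes[:-1]):
        succ = [(v, w) for (a, v), w in edges.items() if a == u]
        beta[u] = logsumexp(np.array([w / temperature + beta[v] for v, w in succ]))
    # Forward sampling: pick each successor with probability ∝ exp(weight/T + beta[successor]).
    path, u = [source], source
    while u != sink:
        succ = [(v, w) for (a, v), w in edges.items() if a == u]
        logits = np.array([w / temperature + beta[v] for v, w in succ])
        probs = np.exp(logits - logsumexp(logits))
        u = succ[rng.choice(len(succ), p=probs)][0]
        path.append(u)
    return path, beta[source]   # an exact path sample and the log-partition function

# Toy DAG with two routes from "a" to "d"; lowering the temperature concentrates
# the distribution on the higher-scoring route a -> b -> d.
nodes = ["a", "b", "c", "d"]
edges = {("a", "b"): 2.0, ("a", "c"): 0.5, ("b", "d"): 1.0, ("c", "d"): 1.0}
print(sample_gibbs_path(nodes, edges, "a", "d", temperature=0.5))
```

Raising the temperature spreads probability mass over suboptimal paths while lowering it concentrates on the classical optimal path, which is the probability-softening the abstract refers to.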
Related papers
- Learning local neighborhoods of non-Gaussian graphical models: A measure transport approach [0.3749861135832072]
We propose a scalable algorithm to infer the conditional independence relationships of each variable by exploiting the local Markov property.
The proposed method, named Localized Sparsity Identification for Non-Gaussian Distributions (L-SING), estimates the graph by using flexible classes of transport maps.
arXiv Detail & Related papers (2025-03-18T04:53:22Z)
- Beyond Bayesian Model Averaging over Paths in Probabilistic Programs with Stochastic Support [20.53123189114551]
We show that making predictions with this full posterior implicitly performs a Bayesian model averaging (BMA) over paths.
We propose alternative mechanisms for path weighting: one based on stacking and one based on ideas from PAC-Bayes.
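As a generic illustration of the contrast (my notation and array shapes, not the paper's), BMA mixes per-path predictive densities using posterior path probabilities, while stacking instead fits simplex weights to maximize held-out predictive density:

```python
import numpy as np

def bma_predictive(lik, path_posterior):
    """BMA over paths: mix per-path predictive densities by posterior path mass."""
    return lik @ path_posterior                  # lik: [n_points, n_paths] densities

def stacking_weights(lik_heldout, steps=500, lr=0.1):
    """Fit simplex weights that maximize held-out log predictive density instead."""
    theta = np.zeros(lik_heldout.shape[1])
    for _ in range(steps):
        w = np.exp(theta - theta.max()); w /= w.sum()      # softmax keeps w on the simplex
        mix = lik_heldout @ w                              # mixture density at each point
        grad_w = (lik_heldout / mix[:, None]).sum(axis=0)  # d/dw of sum_i log mix_i
        theta += lr * w * (grad_w - w @ grad_w)            # chain rule through the softmax
    w = np.exp(theta - theta.max())
    return w / w.sum()
```

Stacking can down-weight a path that carries large posterior mass but predicts poorly out of sample, which is one motivation for moving beyond plain BMA over paths.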
arXiv Detail & Related papers (2023-10-23T12:57:03Z)
- Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
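In that latent-variable reading (my notation, not the paper's), the chain of thought z sits unobserved between the prompt x and the answer y:

```latex
p(y \mid x) \;=\; \sum_{z} p_{\mathrm{LM}}(z \mid x)\, p_{\mathrm{LM}}(y \mid x, z),
\qquad
p(z \mid x, y) \;=\; \frac{p_{\mathrm{LM}}(z \mid x)\, p_{\mathrm{LM}}(y \mid x, z)}{p(y \mid x)} .
```

The posterior over chains on the right is the intractable distribution that the amortized sampler is trained to match.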
arXiv Detail & Related papers (2023-10-06T16:36:08Z)
- Delta-AI: Local objectives for amortized inference in sparse graphical models [64.5938437823851]
We present a new algorithm for amortized inference in sparse probabilistic graphical models (PGMs)
Our approach is based on the observation that when the sampling of variables in a PGM is seen as a sequence of actions taken by an agent, sparsity of the PGM enables local credit assignment in the agent's policy learning objective.
We illustrate $\Delta$-AI's effectiveness for sampling from synthetic PGMs and training latent variable models with sparse factor structure.
arXiv Detail & Related papers (2023-10-03T20:37:03Z)
- Thompson sampling for improved exploration in GFlowNets [75.89693358516944]
Generative flow networks (GFlowNets) are amortized variational inference algorithms that treat sampling from a distribution over compositional objects as a sequential decision-making problem with a learnable action policy.
We show in two domains that TS-GFN yields improved exploration and thus faster convergence to the target distribution than the off-policy exploration strategies used in past work.
arXiv Detail & Related papers (2023-06-30T14:19:44Z)
- A Langevin-like Sampler for Discrete Distributions [15.260564469562542]
The discrete Langevin proposal (DLP) is a simple and scalable gradient-based proposal for sampling complex high-dimensional discrete distributions.
DLP is able to update all coordinates in parallel in a single step and the magnitude of changes is controlled by a stepsize.
We develop several variants of sampling algorithms, including unadjusted, adjusted, and preconditioned versions.
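A minimal sketch of what such a proposal can look like for binary variables, based on my reading of the abstract rather than the paper's code (the exact functional form below is an assumption): every coordinate is refreshed in parallel from a closed-form Bernoulli built from the gradient of the log target, with the stepsize setting the scale of the move.

```python
import numpy as np

def dlp_step(x, grad_log_p, stepsize, rng):
    """One unadjusted, fully parallel update for a binary vector x in {0, 1}^d.

    grad_log_p(x) is the gradient of the log target density at the relaxed
    (float-valued) x. Every coordinate independently chooses between keeping
    its value and flipping it, so the whole vector is refreshed in one step.
    """
    g = grad_log_p(x.astype(float))
    flip = 1.0 - 2.0 * x                          # +1 where x_i = 0, -1 where x_i = 1
    # Assumed proposal form: first-order change in log-density minus a
    # quadratic penalty whose scale is set by the stepsize.
    logit_flip = 0.5 * g * flip - 1.0 / (2.0 * stepsize)
    p_flip = 1.0 / (1.0 + np.exp(-logit_flip))
    return np.where(rng.random(x.shape) < p_flip, 1 - x, x)

# Hypothetical usage on a quadratic log-density f(x) = x^T W x.
rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(8, 8))
x = rng.integers(0, 2, size=8)
for _ in range(100):
    x = dlp_step(x, lambda z: (W + W.T) @ z, stepsize=0.5, rng=rng)
print(x)
```

The adjusted variant mentioned in the summary would wrap this step in a Metropolis-Hastings accept/reject correction using the reverse proposal.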
arXiv Detail & Related papers (2022-06-20T17:36:03Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
- LDC-VAE: A Latent Distribution Consistency Approach to Variational AutoEncoders [26.349085280990657]
We propose a latent distribution consistency approach to avoid substantial inconsistency between the posterior and prior latent distributions.
Our method has achieved comparable or even better performance than several powerful improvements of VAEs.
arXiv Detail & Related papers (2021-09-22T10:34:40Z)
- Branch-and-Pruning Optimization Towards Global Optimality in Deep Learning [34.5772952287821]
We propose a novel approximation algorithm, BPGrad, towards optimizing deep models globally via branch and pruning.
We prove that, by repeating such a branch-and-pruning procedure, we can locate the global optimality within finite iterations.
An efficient adaptive solver based on BPGrad for deep learning is also proposed, and it empirically outperforms conventional solvers such as Adagrad, Adadelta, RMSProp, and Adam.
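The branch-and-pruning idea is easiest to see in one dimension: a Lipschitz assumption yields a certified lower bound on each subinterval, and any subinterval whose bound cannot beat the incumbent is discarded. The sketch below is that textbook construction, not the paper's BPGrad solver, and all names are illustrative; BPGrad applies the analogous certify-and-discard logic to deep-model training.

```python
import math

def lipschitz_branch_and_prune(f, lo, hi, lipschitz, tol=1e-3):
    """Global 1-D minimization by branching on intervals and pruning those whose
    Lipschitz lower bound cannot improve on the best value found so far."""
    best_x, best_f = lo, f(lo)
    intervals = [(lo, hi)]
    while intervals:
        a, b = intervals.pop()
        mid = 0.5 * (a + b)
        fm = f(mid)
        if fm < best_f:
            best_x, best_f = mid, fm
        # Certified lower bound of f on [a, b] from the midpoint value; prune the
        # interval if even this bound cannot beat the incumbent, else branch it.
        if fm - lipschitz * 0.5 * (b - a) < best_f - tol:
            intervals += [(a, mid), (mid, b)]
    return best_x, best_f

# |d/dx (sin(3x) + 0.5x)| <= 3.5 everywhere, so 3.5 is a valid Lipschitz constant.
print(lipschitz_branch_and_prune(lambda x: math.sin(3 * x) + 0.5 * x, -2.0, 2.0, lipschitz=3.5))
```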
arXiv Detail & Related papers (2021-04-05T00:43:03Z)
- Accelerating Metropolis-Hastings with Lightweight Inference Compilation [1.2633299843878945]
Lightweight Inference Compilation (LIC) implements amortized inference within an open-universe probabilistic programming language.
LIC forgoes importance sampling of linear execution traces in favor of operating directly on Bayesian networks.
Experimental results show LIC can produce proposers which have fewer parameters, greater robustness to nuisance random variables, and improved posterior sampling.
arXiv Detail & Related papers (2020-10-23T02:05:37Z)
- Probabilistic Circuits for Variational Inference in Discrete Graphical Models [101.28528515775842]
Inference in discrete graphical models with variational methods is difficult.
Many sampling-based methods have been proposed for estimating the evidence lower bound (ELBO).
We propose a new approach that leverages the tractability of probabilistic circuit models, such as Sum Product Networks (SPNs).
We show that selective-SPNs are suitable as an expressive variational distribution, and prove that when the log-density of the target model is a polynomial, the corresponding ELBO can be computed analytically.
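One way to unpack the analytic-ELBO claim (my notation; a reading of the abstract rather than a derivation taken from the paper): with an unnormalized target density and variational distribution q,

```latex
\mathrm{ELBO}(q) \;=\; \mathbb{E}_{q(x)}\!\big[\log \tilde p(x)\big] \;+\; H(q) .
```

If the log-density log p̃ is a polynomial, the expectation term reduces to a finite sum of moments of q, so a variational family whose moments and entropy are exactly computable, such as the selective circuits proposed here, yields the ELBO without Monte Carlo estimation.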
arXiv Detail & Related papers (2020-10-22T05:04:38Z)
- Implicit Distributional Reinforcement Learning [61.166030238490634]
The implicit distributional actor-critic (IDAC) is built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.