Related papers: Discrete Probabilistic Inference as Control in Multi-path Environments

URL: http://arxiv.org/abs/2402.10309v2
Date: Mon, 27 May 2024 20:58:38 GMT
Title: Discrete Probabilistic Inference as Control in Multi-path Environments
Authors: Tristan Deleu, Padideh Nouri, Nikolay Malkin, Doina Precup, Yoshua Bengio
Abstract summary: We consider the problem of sampling from a discrete and structured distribution as a sequential decision problem. We show that GFlowNets learn a policy that samples objects proportionally to their reward by enforcing a conservation of flows. We also prove that some flow-matching objectives found in the GFlowNet literature are in fact equivalent to well-established MaxEnt RL algorithms with a corrected reward.
Score: 84.67055173040107
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We consider the problem of sampling from a discrete and structured distribution as a sequential decision problem, where the objective is to find a stochastic policy such that objects are sampled at the end of this sequential process proportionally to some predefined reward. While we could use maximum entropy Reinforcement Learning (MaxEnt RL) to solve this problem for some distributions, it has been shown that in general, the distribution over states induced by the optimal policy may be biased in cases where there are multiple ways to generate the same object. To address this issue, Generative Flow Networks (GFlowNets) learn a stochastic policy that samples objects proportionally to their reward by approximately enforcing a conservation of flows across the whole Markov Decision Process (MDP). In this paper, we extend recent methods correcting the reward in order to guarantee that the marginal distribution induced by the optimal MaxEnt RL policy is proportional to the original reward, regardless of the structure of the underlying MDP. We also prove that some flow-matching objectives found in the GFlowNet literature are in fact equivalent to well-established MaxEnt RL algorithms with a corrected reward. Finally, we study empirically the performance of multiple MaxEnt RL and GFlowNet algorithms on multiple problems involving sampling from discrete distributions.
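
The bias described in the abstract can be reproduced on a very small example. The sketch below is illustrative only and is not taken from the paper: the three-level DAG, the state names, the rewards, and the helper functions are all hypothetical, and the correction assumes a uniform backward policy whose log-probability is added to each transition reward. It computes the optimal MaxEnt RL policy by soft value iteration (temperature 1, deterministic transitions) and compares the induced distribution over terminal objects with and without the correction.

```python
import math
from collections import defaultdict

# Hypothetical toy DAG: object "X" is reachable by two paths, "Y" by one.
CHILDREN = {"s0": ["A", "B"], "A": ["X", "Y"], "B": ["X"], "X": [], "Y": []}
REWARD = {"X": 1.0, "Y": 3.0}  # unnormalized target: P(X) = 0.25, P(Y) = 0.75
TOPO_ORDER = ["s0", "A", "B"]  # non-terminal states, parents before children


def parents(state):
    """All states with an edge into `state`."""
    return [s for s, kids in CHILDREN.items() if state in kids]


def transition_reward(s, s_next, corrected):
    """log R(x) on entering a terminal object, plus an optional correction."""
    r = math.log(REWARD[s_next]) if s_next in REWARD else 0.0
    if corrected:
        # Assumption: uniform backward policy, so log P_B(s | s_next) = -log(#parents).
        r -= math.log(len(parents(s_next)))
    return r


def soft_value(s, corrected):
    """V(s) = log sum_{s'} exp(r(s, s') + V(s')), with V = 0 at terminal objects."""
    if not CHILDREN[s]:
        return 0.0
    return math.log(sum(
        math.exp(transition_reward(s, c, corrected) + soft_value(c, corrected))
        for c in CHILDREN[s]
    ))


def terminal_distribution(corrected):
    """Marginal over terminal objects under the optimal soft policy."""
    mass = defaultdict(float)
    mass["s0"] = 1.0
    for s in TOPO_ORDER:
        v = soft_value(s, corrected)
        for c in CHILDREN[s]:
            # Optimal MaxEnt policy: pi(s' | s) = exp(r(s, s') + V(s') - V(s)).
            p = math.exp(transition_reward(s, c, corrected) + soft_value(c, corrected) - v)
            mass[c] += mass[s] * p
    return {x: mass[x] for x in REWARD}


if __name__ == "__main__":
    print("uncorrected:", terminal_distribution(corrected=False))
    print("corrected:  ", terminal_distribution(corrected=True))
```

In this toy setup the uncorrected policy over-samples "X" because two trajectories lead to it, whereas the corrected reward brings the marginal back to the target proportions. The example only illustrates the multi-path issue; it does not reproduce the paper's algorithms or experiments.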

Related papers

Boosting Maximum Entropy Reinforcement Learning via One-Step Flow Matching [8.665369041430969]
Flow Matching (FM) enables one-step generation, but integrating it into Maximum Entropy Reinforcement Learning (MaxEnt RL) is challenging. We propose Flow-based Log-likelihood-Aware Maximum Entropy RL (FLAME), a principled framework that addresses these challenges.
arXiv Detail & Related papers (2026-02-02T03:54:11Z)
Generative Diffusion Models for Resource Allocation in Wireless Networks [77.36145730415045]
We train a policy to imitate an expert and generate new samples from the optimal distribution. We achieve near-optimal performance through sequential execution of the generated samples. We present numerical results in a case study of power control in multi-user interference networks.
arXiv Detail & Related papers (2025-04-28T21:44:31Z)
Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization [4.158255103170876]
GFlowNets are a family of generative models that learn to sample objects proportionally to a given reward function. Recent results show a close relationship between GFlowNet training and entropy-regularized reinforcement learning problems. We introduce a simple backward policy optimization algorithm that involves direct maximization of the value function in an entropy-regularized Markov Decision Process.
arXiv Detail & Related papers (2024-10-20T19:12:14Z)
Sampling from Energy-based Policies using Diffusion [14.542411354617983]
We introduce a diffusion-based approach for sampling from energy-based policies, where the negative Q-function defines the energy function. We show that our approach enhances sample efficiency in continuous control tasks and captures multimodal behaviors, addressing key limitations of existing methods.
arXiv Detail & Related papers (2024-10-02T08:09:33Z)
On Policy Evaluation Algorithms in Distributional Reinforcement Learning [0.0]
We introduce a novel class of algorithms to efficiently approximate the unknown return distributions in policy evaluation problems from distributional reinforcement learning (DRL). For a plain instance of our proposed class of algorithms, we prove error bounds in both the Wasserstein and Kolmogorov-Smirnov distances. For return distributions that have probability density functions, the algorithms yield approximations of these densities; error bounds are given in the supremum norm.
arXiv Detail & Related papers (2024-07-19T10:06:01Z)
Random Policy Evaluation Uncovers Policies of Generative Flow Networks [12.294107455811496]
GFlowNets share a strong connection with reinforcement learning (RL) that typically aims to maximize reward. In this paper, we reveal a fundamental connection between GFlowNets and one of the most basic components of RL -- policy evaluation. We introduce a rectified random policy evaluation (RPE) algorithm, which achieves the same reward-matching effect as GFlowNets.
arXiv Detail & Related papers (2024-06-04T11:11:53Z)
Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions. We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training. As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z)
Conditional Sampling of Variational Autoencoders via Iterated Approximate Ancestral Sampling [7.357511266926065]
Conditional sampling of variational autoencoders (VAEs) is needed in various applications, such as missing data imputation, but is computationally intractable. A principled choice for asymptotically exact conditional sampling is Metropolis-within-Gibbs (MWG).
arXiv Detail & Related papers (2023-08-17T16:08:18Z)
Policy Evaluation in Distributional LQR [70.63903506291383]
We provide a closed-form expression of the distribution of the random return. We show that this distribution can be approximated by a finite number of random variables. Using the approximate return distribution, we propose a zeroth-order policy gradient algorithm for risk-averse LQR.
arXiv Detail & Related papers (2023-03-23T20:27:40Z)
GFlowNet Foundations [66.69854262276391]
Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates in an active learning context. We show a number of additional theoretical properties of GFlowNets.
arXiv Detail & Related papers (2021-11-17T17:59:54Z)
Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model. The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
Implicit Distributional Reinforcement Learning [61.166030238490634]
Implicit distributional actor-critic (IDAC) is built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution. We observe that IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.