Generative Flow Networks as Entropy-Regularized RL
- URL: http://arxiv.org/abs/2310.12934v3
- Date: Sun, 25 Feb 2024 19:39:24 GMT
- Title: Generative Flow Networks as Entropy-Regularized RL
- Authors: Daniil Tiapkin, Nikita Morozov, Alexey Naumov, Dmitry Vetrov
- Abstract summary: generative flow networks (GFlowNets) are a method of training a policy to sample compositional objects with probabilities proportional to a given reward via a sequence of actions.
We demonstrate how the task of learning a generative flow network can be efficiently redefined as an entropy-regularized reinforcement learning problem.
Contrary to previously reported results, we show that entropic RL approaches can be competitive against established GFlowNet training methods.
- Score: 4.857649518812728
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recently proposed generative flow networks (GFlowNets) are a method of
training a policy to sample compositional discrete objects with probabilities
proportional to a given reward via a sequence of actions. GFlowNets exploit the
sequential nature of the problem, drawing parallels with reinforcement learning
(RL). Our work extends the connection between RL and GFlowNets to a general
case. We demonstrate how the task of learning a generative flow network can be
efficiently redefined as an entropy-regularized RL problem with a specific
reward and regularizer structure. Furthermore, we illustrate the practical
efficiency of this reformulation by applying standard soft RL algorithms to
GFlowNet training across several probabilistic modeling tasks. Contrary to
previously reported results, we show that entropic RL approaches can be
competitive against established GFlowNet training methods. This perspective
opens a direct path for integrating RL principles into the realm of generative
flow networks.
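Concretely, the reward and regularizer structure takes the following form (our paraphrase): each intermediate transition s -> s' earns reward log P_B(s | s') for a fixed backward policy P_B, a transition into a terminal object x additionally earns log R(x), and the entropy-regularization coefficient is 1; the optimal soft policy then coincides with the GFlowNet forward policy, and the optimal value of the initial state equals log Z. Below is a minimal sketch of this correspondence via soft value iteration on a toy DAG with a uniform backward policy; the graph, rewards, and names are illustrative assumptions, not the paper's experiments.

```python
import math
from collections import defaultdict

# Toy DAG (illustrative assumption): each state maps to its children;
# x1..x3 are terminal objects with unnormalized rewards R(x).
children = {"s0": ["a", "b"], "a": ["x1", "x2"], "b": ["x2", "x3"]}
R = {"x1": 1.0, "x2": 2.0, "x3": 3.0}

# Parents define the uniform backward policy P_B(s | s') = 1/|parents(s')|.
parents = defaultdict(list)
for s, kids in children.items():
    for c in kids:
        parents[c].append(s)

def soft_value(s):
    """Soft Bellman backup V(s) = log sum_{s'} exp(r(s -> s') + V(s'))
    with entropy coefficient 1, where r(s -> s') = log P_B(s | s'),
    plus log R(s') when s' is terminal."""
    if s not in children:  # terminal state: episode ends
        return 0.0
    logits = []
    for c in children[s]:
        r = math.log(1.0 / len(parents[c]))  # log P_B(s | c)
        if c not in children:                # entering a terminal object
            r += math.log(R[c])
        logits.append(r + soft_value(c))
    return math.log(sum(math.exp(v) for v in logits))

# If the correspondence holds, V(s0) = log Z with Z = sum_x R(x).
print(soft_value("s0"), math.log(sum(R.values())))  # both ~1.7918 (log 6)
```

On this toy graph both printed values equal log 6, and the softmax policy induced by these soft values samples each terminal x with probability R(x)/Z, which is exactly the GFlowNet sampling goal.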
Related papers
- Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization [4.158255103170876]
GFlowNets are a family of generative models that learn to sample objects proportional to a given reward function.
Recent results show a close relationship between GFlowNet training and entropy-regularized reinforcement learning problems.
We introduce a simple backward policy optimization algorithm that involves direct maximization of the value function in an entropy-regularized Markov Decision Process.
arXiv Detail & Related papers (2024-10-20T19:12:14Z) - GFlowNet Training by Policy Gradients [11.02335801879944]
We propose a new GFlowNet training framework, with policy-dependent rewards, that bridges the flow-balance conditions of GFlowNets with optimizing the expected accumulated reward in traditional Reinforcement Learning (RL).
This enables the derivation of new policy-based GFlowNet training methods, in contrast to existing ones resembling value-based RL.
arXiv Detail & Related papers (2024-08-12T01:24:49Z) - On Generalization for Generative Flow Networks [54.20924253330039]
Generative Flow Networks (GFlowNets) have emerged as an innovative learning paradigm designed to address the challenge of sampling from an unnormalized probability distribution.
This paper attempts to formalize generalization in the context of GFlowNets, to link generalization with stability, and also to design experiments that assess the capacity of these models to uncover unseen parts of the reward function.
arXiv Detail & Related papers (2024-07-03T13:42:21Z) - Rectifying Reinforcement Learning for Reward Matching [12.294107455811496]
We establish a new connection between GFlowNets and policy evaluation for a uniform policy.
We propose a novel rectified policy evaluation algorithm, which achieves the same reward-matching effect as GFlowNets.
arXiv Detail & Related papers (2024-06-04T11:11:53Z) - Evolution Guided Generative Flow Networks [11.609895436955242]
Generative Flow Networks (GFlowNets) learn to sample compositional objects proportional to their rewards.
One big challenge of GFlowNets is training them effectively when dealing with long time horizons and sparse rewards.
We propose Evolution Guided Generative Flow Networks (EGFN), a simple but powerful augmentation of GFlowNet training using evolutionary algorithms (EA).
arXiv Detail & Related papers (2024-02-03T15:28:53Z) - Pre-Training and Fine-Tuning Generative Flow Networks [61.90529626590415]
We introduce a novel approach for reward-free pre-training of GFlowNets.
By framing the training as a self-supervised problem, we propose an outcome-conditioned GFlowNet that learns to explore the candidate space.
We show that the pre-trained OC-GFN model allows direct extraction of a policy capable of sampling from new reward functions in downstream tasks.
arXiv Detail & Related papers (2023-10-05T09:53:22Z) - Stochastic Generative Flow Networks [89.34644133901647]
Generative Flow Networks (or GFlowNets) learn to sample complex structures through the lens of "inference as control".
Existing GFlowNets can be applied only to deterministic environments, and fail in more general tasks with stochastic dynamics.
This paper introduces Stochastic GFlowNets, a new algorithm that extends GFlowNets to stochastic environments.
arXiv Detail & Related papers (2023-02-19T03:19:40Z) - Distributional GFlowNets with Quantile Flows [73.73721901056662]
Generative Flow Networks (GFlowNets) are a new family of probabilistic samplers where an agent learns a policy for generating complex structure through a series of decision-making steps.
In this work, we adopt a distributional paradigm for GFlowNets, turning each flow function into a distribution, thus providing more informative learning signals during training.
Our proposed quantile matching GFlowNet learning algorithm is able to learn a risk-sensitive policy, an essential component for handling scenarios with risk uncertainty.
arXiv Detail & Related papers (2023-02-11T22:06:17Z) - Learning GFlowNets from partial episodes for improved convergence and stability [56.99229746004125]
Generative flow networks (GFlowNets) are algorithms for training a sequential sampler of discrete objects under an unnormalized target density.
Existing training objectives for GFlowNets are either local to states or transitions, or propagate a reward signal over an entire sampling trajectory.
Inspired by the TD($\lambda$) algorithm in reinforcement learning, we introduce subtrajectory balance, or SubTB($\lambda$), a GFlowNet training objective that can learn from partial action subsequences of varying lengths (a sketch of the balance condition appears after this list).
arXiv Detail & Related papers (2022-09-26T15:44:24Z) - Generative Flow Networks for Discrete Probabilistic Modeling [118.81967600750428]
We present energy-based generative flow networks (EB-GFN).
EB-GFN is a novel probabilistic modeling algorithm for high-dimensional discrete data.
We show how GFlowNets can approximately perform large-block Gibbs sampling to mix between modes.
arXiv Detail & Related papers (2022-02-03T01:27:11Z)
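As a sketch of the SubTB($\lambda$) objective mentioned above (our paraphrase of the standard statement, not a quotation from that paper): for a subtrajectory s_m -> ... -> s_n, the learned state flows F and the forward/backward policies P_F, P_B are trained to satisfy a balance condition, with the squared log-ratio losses over all subtrajectories weighted in proportion to $\lambda^{n-m}$.

```latex
% Subtrajectory balance condition for s_m -> ... -> s_n:
F(s_m) \prod_{i=m}^{n-1} P_F(s_{i+1} \mid s_i)
  \;=\; F(s_n) \prod_{i=m}^{n-1} P_B(s_i \mid s_{i+1})
```

Trajectory balance (whole trajectories) and detailed balance (single transitions) are recovered as the two extreme cases of the subtrajectory length.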