Better Training of GFlowNets with Local Credit and Incomplete
Trajectories
- URL: http://arxiv.org/abs/2302.01687v2
- Date: Sun, 18 Jun 2023 08:45:28 GMT
- Title: Better Training of GFlowNets with Local Credit and Incomplete
Trajectories
- Authors: Ling Pan, Nikolay Malkin, Dinghuai Zhang, Yoshua Bengio
- Abstract summary: We consider the case where the energy function can be applied not just to terminal states but also to intermediate states.
This is achieved, for example, when the energy function is additive, with terms available along the trajectory.
This enables a training objective that can be applied to update parameters even with incomplete trajectories.
- Score: 81.14310509871935
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative Flow Networks or GFlowNets are related to Markov chain
Monte Carlo (MCMC) methods (as they sample from a distribution specified by an
energy function),
reinforcement learning (as they learn a policy to sample composed objects
through a sequence of steps), generative models (as they learn to represent and
sample from a distribution) and amortized variational methods (as they can be
used to learn to approximate and sample from an otherwise intractable
posterior, given a prior and a likelihood). They are trained to generate an
object $x$ through a sequence of steps with probability proportional to some
reward function $R(x)$ (or $\exp(-\mathcal{E}(x))$ with $\mathcal{E}(x)$
denoting the energy function), given at the end of the generative trajectory.
As in other RL settings where the reward is given only at the end, training
efficiency and credit assignment may suffer when trajectories are long. In
previous GFlowNet work, no learning was possible from incomplete trajectories
(those lacking a terminal state and hence the associated reward). In this
paper, we consider the case where the energy
function can be applied not just to terminal states but also to intermediate
states. This is for example achieved when the energy function is additive, with
terms available along the trajectory. We show how to reparameterize the
GFlowNet state flow function to take advantage of the partial reward already
accrued at each state. This enables a training objective that can be applied to
update parameters even with incomplete trajectories. Even when complete
trajectories are available, being able to obtain more localized credit and
gradients is found to speed up training convergence, as demonstrated across
many simulations.
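For intuition, here is a minimal runnable sketch of a detailed-balance-style objective in which the state flow is reparameterized around the energy already accrued at each state, in the spirit the abstract describes. The factorization $F(s) = \exp(-\mathcal{E}(s))\,\tilde{F}(s)$, the function names, and the network interface below are illustrative assumptions, not the paper's verbatim formulation.

```python
# Sketch (hypothetical interface): detailed-balance residual with the state
# flow reparameterized as F(s) = exp(-E(s)) * F_tilde(s). The residual for a
# transition s -> s' then depends only on local quantities (E(s), E(s') and
# the learner's outputs), so it can be evaluated on incomplete trajectories.
import torch

def forward_looking_db_loss(log_F_tilde_s, log_F_tilde_next,
                            log_PF, log_PB,
                            energy_s, energy_next):
    """Squared log-space residual of the reparameterized detailed-balance
    condition for one transition s -> s':
        exp(-E(s)) F~(s) P_F(s'|s) = exp(-E(s')) F~(s') P_B(s|s')
    All arguments are tensors of shape [batch]."""
    lhs = -energy_s + log_F_tilde_s + log_PF        # log of left-hand side
    rhs = -energy_next + log_F_tilde_next + log_PB  # log of right-hand side
    return ((lhs - rhs) ** 2).mean()
```

Because each residual is local to a single transition, the loss can be averaged over the transitions of a truncated trajectory without ever reaching a terminal state; this is the sense in which credit assignment becomes localized.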
Related papers
- Towards Understanding and Improving GFlowNet Training [71.85707593318297]
We introduce an efficient evaluation strategy to compare the learned sampling distribution to the target reward distribution.
We propose prioritized replay training of high-reward $x$, relative edge flow policy parametrization, and a novel guided trajectory balance objective.
arXiv Detail & Related papers (2023-05-11T22:50:41Z)
- Learning GFlowNets from partial episodes for improved convergence and stability [56.99229746004125]
Generative flow networks (GFlowNets) are algorithms for training a sequential sampler of discrete objects under an unnormalized target density.
Existing training objectives for GFlowNets are either local to states or transitions, or propagate a reward signal over an entire sampling trajectory.
Inspired by the TD($\lambda$) algorithm in reinforcement learning, we introduce subtrajectory balance or SubTB($\lambda$), a GFlowNet training objective that can learn from partial action subsequences of varying lengths.
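For concreteness, a minimal sketch of a SubTB($\lambda$)-style loss over a single trajectory, assuming precomputed log state flows and transition log-probabilities; the geometric length weighting and the interface are assumptions based on the summary above, not the paper's exact code.

```python
# Sketch: every subtrajectory s_i..s_j contributes a squared subtrajectory-
# balance residual, weighted by lambda**(j - i). log_F, log_PF, log_PB are
# assumed to be precomputed along one sampled trajectory.
import torch

def subtb_lambda_loss(log_F, log_PF, log_PB, lam=0.9):
    """log_F: [n+1] log state flows along a trajectory s_0..s_n;
    log_PF, log_PB: [n] per-transition log-probabilities."""
    n = log_PF.shape[0]
    cum_PF = torch.cat([torch.zeros(1), torch.cumsum(log_PF, 0)])  # prefix sums
    cum_PB = torch.cat([torch.zeros(1), torch.cumsum(log_PB, 0)])
    total, weight = 0.0, 0.0
    for i in range(n):
        for j in range(i + 1, n + 1):
            resid = (log_F[i] + (cum_PF[j] - cum_PF[i])
                     - log_F[j] - (cum_PB[j] - cum_PB[i]))
            w = lam ** (j - i)
            total = total + w * resid ** 2
            weight += w
    return total / weight
```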
arXiv Detail & Related papers (2022-09-26T15:44:24Z)
- Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward [66.81579829897392]
We propose a novel offline reinforcement learning algorithm called Pessimistic vAlue iteRaTion with rEward Decomposition (PARTED).
PARTED decomposes the trajectory return into per-step proxy rewards via least-squares-based reward redistribution, and then performs pessimistic value iteration based on the learned proxy rewards.
To the best of our knowledge, PARTED is the first offline RL algorithm that is provably efficient in general MDP with trajectory-wise reward.
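A hedged sketch of the least-squares reward-redistribution step described above, in its simplest linear form; the feature map `phi` and the closed-form fit are illustrative assumptions, not PARTED's exact estimator.

```python
# Sketch: fit per-step proxy rewards r_hat(s, a) = phi(s, a) @ w so that
# their sum over each trajectory matches the trajectory-wise return.
import numpy as np

def redistribute_rewards(trajectories, returns, phi):
    """trajectories: list of [(s, a), ...] step sequences; returns: array of
    trajectory returns; phi: (s, a) -> feature vector (assumed)."""
    # Each row is the summed feature vector of one trajectory.
    X = np.stack([sum(phi(s, a) for s, a in traj) for traj in trajectories])
    w, *_ = np.linalg.lstsq(X, np.asarray(returns), rcond=None)
    # Per-step proxy reward for any state-action pair:
    return lambda s, a: phi(s, a) @ w
```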
arXiv Detail & Related papers (2022-06-13T19:11:22Z)
- GFlowNet Foundations [66.69854262276391]
Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates in an active learning context.
We show a number of additional theoretical properties of GFlowNets.
arXiv Detail & Related papers (2021-11-17T17:59:54Z)
- Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation [110.09855163856326]
This paper is about the problem of learning a policy for generating an object from a sequence of actions.
We propose GFlowNet, based on a view of the generative process as a flow network.
We prove that any global minimum of the proposed objectives yields a policy which samples from the desired distribution.
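As a reminder of the flow-network view (the standard GFlowNet flow matching condition, stated here for context rather than quoted from the paper):

```latex
% Flow matching: at every non-terminal state s, incoming flow equals
% outgoing flow; flow reaching a terminal object x must match its reward.
\sum_{s':\, s' \to s} F(s' \to s) \;=\; \sum_{s'':\, s \to s''} F(s \to s''),
\qquad
\sum_{s':\, s' \to x} F(s' \to x) = R(x) \quad \text{for terminal } x.
```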
arXiv Detail & Related papers (2021-06-08T14:21:10Z)