Trajectory balance: Improved credit assignment in GFlowNets
- URL: http://arxiv.org/abs/2201.13259v3
- Date: Wed, 4 Oct 2023 16:30:14 GMT
- Title: Trajectory balance: Improved credit assignment in GFlowNets
- Authors: Nikolay Malkin, Moksh Jain, Emmanuel Bengio, Chen Sun, Yoshua Bengio
- Abstract summary: We find previously proposed learning objectives for GFlowNets, flow matching and detailed balance, to be prone to inefficient credit propagation across long action sequences.
We propose a new learning objective for GFlowNets, trajectory balance, as a more efficient alternative to previously used objectives.
In experiments on four distinct domains, we empirically demonstrate the benefits of the trajectory balance objective for GFlowNet convergence, diversity of generated samples, and robustness to long action sequences and large action spaces.
- Score: 63.687669765579585
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative flow networks (GFlowNets) are a method for learning a stochastic
policy for generating compositional objects, such as graphs or strings, from a
given unnormalized density by sequences of actions, where many possible action
sequences may lead to the same object. We find previously proposed learning
objectives for GFlowNets, flow matching and detailed balance, which are
analogous to temporal difference learning, to be prone to inefficient credit
propagation across long action sequences. We thus propose a new learning
objective for GFlowNets, trajectory balance, as a more efficient alternative to
previously used objectives. We prove that any global minimizer of the
trajectory balance objective can define a policy that samples exactly from the
target distribution. In experiments on four distinct domains, we empirically
demonstrate the benefits of the trajectory balance objective for GFlowNet
convergence, diversity of generated samples, and robustness to long action
sequences and large action spaces.
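To make the objective described above concrete, the following is a minimal sketch (not the authors' reference implementation) of a trajectory balance loss. It assumes per-trajectory quantities that are named here only for illustration: `log_pf` and `log_pb` as summed forward/backward log-probabilities along a sampled trajectory, `log_reward` as the log of the unnormalized target density at the terminal object, and `log_Z` as a learned estimate of the log-partition function.

```python
# Minimal sketch of a trajectory balance (TB) style loss, assuming
# per-trajectory sums of log-probabilities are already available.
import torch

def trajectory_balance_loss(log_Z: torch.Tensor,
                            log_pf: torch.Tensor,      # sum_t log P_F(s_{t+1} | s_t), shape [batch]
                            log_pb: torch.Tensor,      # sum_t log P_B(s_t | s_{t+1}), shape [batch]
                            log_reward: torch.Tensor   # log R(x) at the terminal object, shape [batch]
                            ) -> torch.Tensor:
    # TB drives log Z + sum_t log P_F to match log R(x) + sum_t log P_B
    # on every sampled trajectory; the squared residual is the loss.
    residual = (log_Z + log_pf) - (log_reward + log_pb)
    return residual.pow(2).mean()

# Illustrative usage with random values for a batch of 4 trajectories.
log_Z = torch.zeros(1, requires_grad=True)
loss = trajectory_balance_loss(log_Z, torch.randn(4), torch.randn(4), torch.randn(4))
loss.backward()
```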
Related papers
- On Generalization for Generative Flow Networks [54.20924253330039]
Generative Flow Networks (GFlowNets) have emerged as an innovative learning paradigm designed to address the challenge of sampling from an unnormalized probability distribution.
This paper attempts to formalize generalization in the context of GFlowNets, to link generalization with stability, and also to design experiments that assess the capacity of these models to uncover unseen parts of the reward function.
arXiv Detail & Related papers (2024-07-03T13:42:21Z) - Baking Symmetry into GFlowNets [58.932776403471635]
GFlowNets have exhibited promising performance in generating diverse candidates with high rewards.
This study aims to integrate symmetries into GFlowNets by identifying equivalent actions during the generation process.
arXiv Detail & Related papers (2024-06-08T10:11:10Z) - Looking Backward: Retrospective Backward Synthesis for Goal-Conditioned GFlowNets [27.33222647437964]
Generative Flow Networks (GFlowNets) are amortized sampling methods for learning a policy to sequentially generate objects with probabilities proportional to their rewards.
GFlowNets exhibit a remarkable ability to generate diverse sets of high-reward objects, in contrast to standard reinforcement learning approaches.
Recent work has explored goal-conditioned GFlowNets, aiming to train a single GFlowNet capable of achieving different goals as the task specifies.
We propose a novel method named Retrospective Backward Synthesis (RBS) to address these challenges; specifically, RBS synthesizes new backward trajectories.
arXiv Detail & Related papers (2024-06-03T09:44:10Z) - Pre-Training and Fine-Tuning Generative Flow Networks [61.90529626590415]
We introduce a novel approach for reward-free pre-training of GFlowNets.
By framing the training as a self-supervised problem, we propose an outcome-conditioned GFlowNet (OC-GFN) that learns to explore the candidate space.
We show that the pre-trained OC-GFN model allows direct extraction of a policy capable of sampling from any new reward function in downstream tasks.
arXiv Detail & Related papers (2023-10-05T09:53:22Z) - Distributional GFlowNets with Quantile Flows [73.73721901056662]
Generative Flow Networks (GFlowNets) are a new family of probabilistic samplers where an agent learns a policy for generating complex structure through a series of decision-making steps.
In this work, we adopt a distributional paradigm for GFlowNets, turning each flow function into a distribution, thus providing more informative learning signals during training.
Our proposed quantile matching GFlowNet learning algorithm is able to learn a risk-sensitive policy, an essential component for handling scenarios with risk uncertainty.
arXiv Detail & Related papers (2023-02-11T22:06:17Z) - A Variational Perspective on Generative Flow Networks [21.97829447881589]
Generative flow networks (GFNs) are models for sequential sampling of composite objects.
We define variational objectives for GFNs in terms of the Kullback-Leibler (KL) divergences between the forward and backward distributions.
arXiv Detail & Related papers (2022-10-14T17:45:59Z) - Improving Generative Flow Networks with Path Regularization [8.848799220256366]
Generative Flow Networks (GFlowNets) are recently proposed models for learning policies that generate compositional objects by sequences of actions, with probability proportional to a given reward function.
In this work, we propose a novel path regularization method based on optimal transport theory that places prior constraints on the underlying structure of the GFlowNets.
arXiv Detail & Related papers (2022-09-29T20:54:41Z) - Learning GFlowNets from partial episodes for improved convergence and
stability [56.99229746004125]
Generative flow networks (GFlowNets) are algorithms for training a sequential sampler of discrete objects under an unnormalized target density.
Existing training objectives for GFlowNets are either local to states or transitions, or propagate a reward signal over an entire sampling trajectory.
Inspired by the TD(λ) algorithm in reinforcement learning, we introduce subtrajectory balance, or SubTB(λ), a GFlowNet training objective that can learn from partial action subsequences of varying lengths (a rough sketch of such an objective appears after this list).
arXiv Detail & Related papers (2022-09-26T15:44:24Z)
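For the SubTB(λ) entry above, the sketch below illustrates how a subtrajectory-balance-style objective can be computed for a single trajectory. This is an illustrative reconstruction under stated assumptions, not the paper's reference code: `log_F` is assumed to hold log state-flow estimates along the trajectory (with the terminal entry typically replaced by the log-reward), `log_pf` and `log_pb` hold per-step forward/backward log-probabilities, and `lam` is the λ weight on subtrajectory length.

```python
# Rough sketch of a SubTB(lambda)-style objective: every subtrajectory (i, j)
# contributes a squared log-ratio between forward and backward partial flows,
# weighted by lambda ** (j - i); the weighted average is the loss.
import torch

def subtb_lambda_loss(log_F: torch.Tensor,   # log state flows, shape [n + 1]
                      log_pf: torch.Tensor,  # per-step log P_F, shape [n]
                      log_pb: torch.Tensor,  # per-step log P_B, shape [n]
                      lam: float = 0.9) -> torch.Tensor:
    n = log_pf.shape[0]
    # Prefix sums so that cum[j] - cum[i] = sum of log-probs over steps i..j-1.
    cum_pf = torch.cat([torch.zeros(1), log_pf.cumsum(0)])
    cum_pb = torch.cat([torch.zeros(1), log_pb.cumsum(0)])
    num, den = 0.0, 0.0
    for i in range(n + 1):
        for j in range(i + 1, n + 1):
            w = lam ** (j - i)
            resid = (log_F[i] + cum_pf[j] - cum_pf[i]) \
                    - (log_F[j] + cum_pb[j] - cum_pb[i])
            num = num + w * resid ** 2
            den = den + w
    return num / den

# Illustrative usage with a random length-5 trajectory.
loss = subtb_lambda_loss(torch.randn(6), torch.randn(5), torch.randn(5))
```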
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.