Pessimistic Backward Policy for GFlowNets
- URL: http://arxiv.org/abs/2405.16012v3
- Date: Tue, 29 Oct 2024 03:11:17 GMT
- Title: Pessimistic Backward Policy for GFlowNets
- Authors: Hyosoon Jang, Yunhui Jang, Minsu Kim, Jinkyoo Park, Sungsoo Ahn
- Abstract summary: We study Generative Flow Networks (GFlowNets), which learn to sample objects proportionally to a given reward function.
In this work, we observe that GFlowNets tend to under-exploit high-reward objects when trained on an insufficient number of trajectories.
We propose a pessimistic backward policy for GFlowNets, which maximizes the observed flow to align closely with the true reward for the object.
- Score: 40.00805723326561
- Abstract: This paper studies Generative Flow Networks (GFlowNets), which learn to sample objects proportionally to a given reward function through trajectories of state transitions. In this work, we observe that GFlowNets tend to under-exploit high-reward objects due to training on an insufficient number of trajectories, which may lead to a large gap between the estimated flow and the (known) reward value. In response to this challenge, we propose a pessimistic backward policy for GFlowNets (PBP-GFN), which maximizes the observed flow to align closely with the true reward for the object. We extensively evaluate PBP-GFN across eight benchmarks, including hyper-grid environment, bag generation, structured set generation, molecular generation, and four RNA sequence generation tasks. In particular, PBP-GFN enhances the discovery of high-reward objects, maintains the diversity of the objects, and consistently outperforms existing methods.
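As a reader aid, the LaTeX block below spells out the sampling goal and the flow-reward gap that the abstract refers to. The notation (P_F for the forward policy, \tau for a trajectory, F for flow, R for reward) is the common convention of the GFlowNet literature and is assumed here, not quoted from the paper.

```latex
% Standard GFlowNet goal (common notation, assumed rather than quoted):
% a forward policy P_F builds a trajectory \tau = (s_0 \to s_1 \to \dots \to x),
% and training aims for terminal objects x to be sampled proportionally to reward:
\[
  p(x) \;=\; \sum_{\tau \to x} \prod_{t} P_F(s_{t+1} \mid s_t) \;\propto\; R(x).
\]
% The under-exploitation described in the abstract is a gap between the flow
% estimated for an observed high-reward object x and its known reward,
\[
  \hat{F}(x) \;<\; R(x),
\]
% which PBP-GFN addresses by maximizing the observed flow into x.
```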
Related papers
- Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization [4.158255103170876]
GFlowNets are a family of generative models that learn to sample objects proportional to a given reward function.
Recent results show a close relationship between GFlowNet training and entropy-regularized reinforcement learning problems.
We introduce a simple backward policy optimization algorithm that involves direct maximization of the value function in an entropy-regularized Markov Decision Process.
arXiv Detail & Related papers (2024-10-20T19:12:14Z)
- Baking Symmetry into GFlowNets [58.932776403471635]
GFlowNets have exhibited promising performance in generating diverse candidates with high rewards.
This study aims to integrate symmetries into GFlowNets by identifying equivalent actions during the generation process.
arXiv Detail & Related papers (2024-06-08T10:11:10Z)
- Looking Backward: Retrospective Backward Synthesis for Goal-Conditioned GFlowNets [27.33222647437964]
Generative Flow Networks (GFlowNets) are amortized sampling methods that learn a policy to sequentially generate objects with probabilities proportional to their rewards.
GFlowNets exhibit a remarkable ability to generate diverse sets of high-reward objects, in contrast to standard reinforcement learning approaches.
Recent work has explored goal-conditioned GFlowNets with various useful properties, aiming to train a single GFlowNet capable of achieving different goals as the task specifies.
We propose a novel method named Retrospective Backward Synthesis (RBS) to address these challenges. Specifically, RBS synthesizes new backward trajectories to enrich the training trajectories with enhanced quality and diversity.
arXiv Detail & Related papers (2024-06-03T09:44:10Z)
- Evolution Guided Generative Flow Networks [11.609895436955242]
Generative Flow Networks (GFlowNets) learn to sample compositional objects proportional to their rewards.
One big challenge of GFlowNets is training them effectively when dealing with long time horizons and sparse rewards.
We propose Evolution Guided Generative Flow Networks (EGFN), a simple but powerful augmentation to GFlowNet training using evolutionary algorithms (EA).
arXiv Detail & Related papers (2024-02-03T15:28:53Z)
- Pre-Training and Fine-Tuning Generative Flow Networks [61.90529626590415]
We introduce a novel approach for reward-free pre-training of GFlowNets.
By framing the training as a self-supervised problem, we propose an outcome-conditioned GFlowNet (OC-GFN) that learns to explore the candidate space.
We show that the pre-trained OC-GFN model allows direct extraction of a policy capable of sampling from any new reward function in downstream tasks.
arXiv Detail & Related papers (2023-10-05T09:53:22Z)
- Stochastic Generative Flow Networks [89.34644133901647]
Generative Flow Networks (or GFlowNets) learn to sample complex structures through the lens of "inference as control".
Existing GFlowNets can be applied only to deterministic environments, and fail in more general tasks with stochastic dynamics.
This paper introduces Stochastic GFlowNets, a new algorithm that extends GFlowNets to stochastic environments.
arXiv Detail & Related papers (2023-02-19T03:19:40Z)
- Generative Augmented Flow Networks [88.50647244459009]
We propose Generative Augmented Flow Networks (GAFlowNets) to incorporate intermediate rewards into GFlowNets.
GAFlowNets can leverage edge-based and state-based intrinsic rewards in a joint way to improve exploration.
arXiv Detail & Related papers (2022-10-07T03:33:56Z)
- Trajectory balance: Improved credit assignment in GFlowNets [63.687669765579585]
We find previously proposed learning objectives for GFlowNets, flow matching and detailed balance, to be prone to inefficient credit propagation across long action sequences.
We propose a new learning objective for GFlowNets, trajectory balance, as a more efficient alternative to previously used objectives.
In experiments on four distinct domains, we empirically demonstrate the benefits of the trajectory balance objective for GFlowNet convergence, diversity of generated samples, and robustness to long action sequences and large action spaces.
arXiv Detail & Related papers (2022-01-31T14:07:49Z)
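Since the last entry turns on the flow matching, detailed balance, and trajectory balance objectives, their published forms are reproduced below as a reader aid; these are the standard formulas from the GFlowNet literature, not text from the summaries above.

```latex
% Detailed balance: forward and backward flow agree on every edge s -> s':
\[
  F(s)\, P_F(s' \mid s) \;=\; F(s')\, P_B(s \mid s').
\]
% Trajectory balance: one constraint per complete trajectory
% \tau = (s_0 \to \dots \to x), trained as a squared log-ratio loss
% with a learned partition estimate Z:
\[
  \mathcal{L}_{\mathrm{TB}}(\tau) \;=\;
  \left( \log \frac{Z \prod_{t} P_F(s_{t+1} \mid s_t)}
                   {R(x) \prod_{t} P_B(s_t \mid s_{t+1})} \right)^{2}.
\]
```

A minimal PyTorch sketch of the trajectory balance loss follows, assuming per-step log-probabilities are already computed and padded steps contribute zeros; the function name and tensor layout are illustrative, not from any of the papers above.

```python
import torch

def trajectory_balance_loss(log_Z: torch.Tensor,
                            log_pf: torch.Tensor,
                            log_pb: torch.Tensor,
                            log_reward: torch.Tensor) -> torch.Tensor:
    """Squared trajectory-balance residual, averaged over a batch.

    log_Z:      scalar, learned estimate of log Z.
    log_pf:     (batch, T) forward log-probs log P_F(s_{t+1} | s_t); pad with 0.
    log_pb:     (batch, T) backward log-probs log P_B(s_t | s_{t+1}); pad with 0.
    log_reward: (batch,) log R(x) of each terminal object.
    """
    # log of the TB ratio: log Z + sum log P_F - log R(x) - sum log P_B
    residual = log_Z + log_pf.sum(dim=1) - log_reward - log_pb.sum(dim=1)
    return (residual ** 2).mean()
```

In this setup, log_Z is typically a learned scalar parameter optimized jointly with the policy networks; backward-policy methods such as PBP-GFN intervene in how the backward log-probabilities are trained rather than in the shape of this loss.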