Looking Backward: Retrospective Backward Synthesis for Goal-Conditioned GFlowNets
- URL: http://arxiv.org/abs/2406.01150v2
- Date: Sun, 23 Feb 2025 12:56:58 GMT
- Title: Looking Backward: Retrospective Backward Synthesis for Goal-Conditioned GFlowNets
- Authors: Haoran He, Can Chang, Huazhe Xu, Ling Pan
- Abstract summary: Generative Flow Networks (GFlowNets) have demonstrated remarkable capabilities to generate diverse sets of high-reward candidates. However, training such models is challenging due to extremely sparse rewards. We propose a novel method called Retrospective Backward Synthesis (RBS) to address these problems.
- Score: 27.33222647437964
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative Flow Networks (GFlowNets), a new family of probabilistic samplers, have demonstrated remarkable capabilities to generate diverse sets of high-reward candidates, in contrast to standard return maximization approaches (e.g., reinforcement learning) which often converge to a single optimal solution. Recent works have focused on developing goal-conditioned GFlowNets, which aim to train a single GFlowNet capable of achieving different outcomes as the task specifies. However, training such models is challenging due to extremely sparse rewards, particularly in high-dimensional problems. Moreover, previous methods suffer from the limited coverage of explored trajectories during training, which presents more pronounced challenges when only offline data is available. In this work, we propose a novel method called Retrospective Backward Synthesis (RBS) to address these critical problems. Specifically, RBS synthesizes new backward trajectories in goal-conditioned GFlowNets to enrich training trajectories with enhanced quality and diversity, thereby introducing copious learnable signals for effectively tackling the sparse reward problem. Extensive empirical results show that our method improves sample efficiency by a large margin and outperforms strong baselines on various standard evaluation benchmarks.
Related papers
- Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization [4.158255103170876]
GFlowNets are a family of generative models that learn to sample objects proportional to a given reward function.
Recent results show a close relationship between GFlowNet training and entropy-regularized reinforcement learning problems.
We introduce a simple backward policy optimization algorithm that involves direct maximization of the value function in an entropy-regularized Markov Decision Process.
arXiv Detail & Related papers (2024-10-20T19:12:14Z)
- Beyond Squared Error: Exploring Loss Design for Enhanced Training of Generative Flow Networks [36.084318189865066]
We show that distinct regression losses correspond to specific divergence measures, enabling us to design and analyze regression losses according to the desired properties of the corresponding divergence measures.
Based on our theoretical framework, we propose three novel regression losses, namely, Shifted-Cosh, Linex(1/2), and Linex(1).
Our proposed losses are compatible with most existing training algorithms and significantly improve their performance in terms of convergence speed, sample diversity, and robustness.
arXiv Detail & Related papers (2024-10-03T15:37:22Z)
- On Generalization for Generative Flow Networks [54.20924253330039]
Generative Flow Networks (GFlowNets) have emerged as an innovative learning paradigm designed to address the challenge of sampling from an unnormalized probability distribution.
This paper attempts to formalize generalization in the context of GFlowNets, to link generalization with stability, and also to design experiments that assess the capacity of these models to uncover unseen parts of the reward function.
arXiv Detail & Related papers (2024-07-03T13:42:21Z)
- Bifurcated Generative Flow Networks [32.40020432840822]
Bifurcated GFlowNets (BN) are a novel approach to factorize the flows into separate representations for state flows and edge-based flow allocation.
We show that BN significantly improves learning efficiency and effectiveness compared to strong baselines.
arXiv Detail & Related papers (2024-06-04T02:12:27Z)
- LIRE: listwise reward enhancement for preference alignment [27.50204023448716]
We propose a gradient-based reward optimization approach that incorporates the offline rewards of multiple responses into a streamlined listwise framework.
LIRE is straightforward to implement, requiring minimal parameter tuning, and seamlessly aligns with the pairwise paradigm.
Our experiments demonstrate that LIRE consistently outperforms existing methods across several benchmarks on dialogue and summarization tasks.
arXiv Detail & Related papers (2024-05-22T10:21:50Z)
- Pre-Training and Fine-Tuning Generative Flow Networks [61.90529626590415]
We introduce a novel approach for reward-free pre-training of GFlowNets.
By framing the training as a self-supervised problem, we propose an outcome-conditioned GFlowNet that learns to explore the candidate space.
We show that the pre-trained OC-GFN model can allow for a direct extraction of a policy capable of sampling from any new reward function in downstream tasks.
arXiv Detail & Related papers (2023-10-05T09:53:22Z)
- Local Search GFlowNets [85.0053493167887]
Generative Flow Networks (GFlowNets) are amortized sampling methods that learn a distribution over discrete objects proportional to their rewards.
GFlowNets exhibit a remarkable ability to generate diverse samples, yet occasionally struggle to consistently produce samples with high rewards due to over-exploration of a wide sample space.
This paper proposes to train GFlowNets with local search, which focuses on exploiting high-reward regions of the sample space to resolve this issue.
arXiv Detail & Related papers (2023-10-04T10:27:17Z)
- Stochastic Generative Flow Networks [89.34644133901647]
Generative Flow Networks (or GFlowNets) learn to sample complex structures through the lens of "inference as control".
Existing GFlowNets can be applied only to deterministic environments, and fail in more general tasks with stochastic dynamics.
This paper introduces Stochastic GFlowNets, a new algorithm that extends GFlowNets to stochastic environments.
arXiv Detail & Related papers (2023-02-19T03:19:40Z)
- Generative Augmented Flow Networks [88.50647244459009]
We propose Generative Augmented Flow Networks (GAFlowNets) to incorporate intermediate rewards into GFlowNets.
GAFlowNets can leverage edge-based and state-based intrinsic rewards in a joint way to improve exploration.
arXiv Detail & Related papers (2022-10-07T03:33:56Z)
- Learning GFlowNets from partial episodes for improved convergence and stability [56.99229746004125]
Generative flow networks (GFlowNets) are algorithms for training a sequential sampler of discrete objects under an unnormalized target density.
Existing training objectives for GFlowNets are either local to states or transitions, or propagate a reward signal over an entire sampling trajectory.
Inspired by the TD($\lambda$) algorithm in reinforcement learning, we introduce subtrajectory balance or SubTB($\lambda$), a GFlowNet training objective that can learn from partial action subsequences of varying lengths.
arXiv Detail & Related papers (2022-09-26T15:44:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.