Related papers: Looking Backward: Retrospective Backward Synthesis for Goal-Conditioned GFlowNets

URL: http://arxiv.org/abs/2406.01150v2
Date: Sun, 23 Feb 2025 12:56:58 GMT
Title: Looking Backward: Retrospective Backward Synthesis for Goal-Conditioned GFlowNets
Authors: Haoran He, Can Chang, Huazhe Xu, Ling Pan
Abstract summary: Generative Flow Networks (GFlowNets) have demonstrated remarkable capabilities to generate diverse sets of high-reward candidates. However, training such models is challenging due to extremely sparse rewards. We propose a novel method called Retrospective Backward Synthesis (RBS) to address these problems.
Score: 27.33222647437964
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generative Flow Networks (GFlowNets), a new family of probabilistic samplers, have demonstrated remarkable capabilities to generate diverse sets of high-reward candidates, in contrast to standard return maximization approaches (e.g., reinforcement learning) which often converge to a single optimal solution. Recent works have focused on developing goal-conditioned GFlowNets, which aim to train a single GFlowNet capable of achieving different outcomes as the task specifies. However, training such models is challenging due to extremely sparse rewards, particularly in high-dimensional problems. Moreover, previous methods suffer from the limited coverage of explored trajectories during training, which presents more pronounced challenges when only offline data is available. In this work, we propose a novel method called Retrospective Backward Synthesis (RBS) to address these critical problems. Specifically, RBS synthesizes new backward trajectories in goal-conditioned GFlowNets to enrich training trajectories with enhanced quality and diversity, thereby introducing copious learnable signals for effectively tackling the sparse reward problem. Extensive empirical results show that our method improves sample efficiency by a large margin and outperforms strong baselines on various standard evaluation benchmarks.

Related papers

Boosted GFlowNets: Improving Exploration via Sequential Learning [13.119757506183392]
Boosted GFlowNets are a method that sequentially trains an ensemble of GFlowNets, each optimizing a residual reward that compensates for the mass already captured by previous models. We show that Boosted GFlowNets achieve substantially better exploration and sample diversity on multimodal synthetic benchmarks and peptide design tasks.
arXiv Detail & Related papers (2025-11-12T19:30:11Z)
Proxy-Free GFlowNet [39.964801793885485]
Generative Flow Networks (GFlowNets) are designed to sample diverse, high-reward structures by modeling distributions over compositional objects. Most existing methods adopt a model-based approach, learning a proxy model from the dataset to approximate the reward function. We propose Trajectory-Distilled GFlowNet (TD-GFN), a proxy-free training framework that eliminates the need for out-of-dataset reward queries.
arXiv Detail & Related papers (2025-05-26T15:12:22Z)
Loss-Guided Auxiliary Agents for Overcoming Mode Collapse in GFlowNets [22.653875450786444]
Loss-Guided GFlowNets (LGGFN) is a novel approach where an auxiliary GFlowNet's exploration is directly driven by the main GFlowNet's training loss. This targeted exploration significantly accelerates the discovery of diverse, high-reward samples.
arXiv Detail & Related papers (2025-05-21T08:27:10Z)
Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization [4.158255103170876]
GFlowNets are a family of generative models that learn to sample objects proportional to a given reward function. Recent results show a close relationship between GFlowNet training and entropy-regularized reinforcement learning problems. We introduce a simple backward policy optimization algorithm that involves direct maximization of the value function in an entropy-regularized Markov Decision Process.
arXiv Detail & Related papers (2024-10-20T19:12:14Z)
Beyond Squared Error: Exploring Loss Design for Enhanced Training of Generative Flow Networks [36.084318189865066]
We show that distinct regression losses correspond to specific divergence measures, enabling us to design and analyze regression losses according to the desired properties of the corresponding divergence measures. Based on our theoretical framework, we propose three novel regression losses, namely, Shifted-Cosh, Linex(1/2), and Linex(1). Our proposed losses are compatible with most existing training algorithms, and significantly improve the performances of the algorithms concerning convergence speed, sample diversity, and robustness.
arXiv Detail & Related papers (2024-10-03T15:37:22Z)
On Generalization for Generative Flow Networks [54.20924253330039]
Generative Flow Networks (GFlowNets) have emerged as an innovative learning paradigm designed to address the challenge of sampling from an unnormalized probability distribution. This paper attempts to formalize generalization in the context of GFlowNets, to link generalization with stability, and also to design experiments that assess the capacity of these models to uncover unseen parts of the reward function.
arXiv Detail & Related papers (2024-07-03T13:42:21Z)
Bifurcated Generative Flow Networks [32.40020432840822]
Bifurcated GFlowNets (BN) are a novel approach to factorize the flows into separate representations for state flows and edge-based flow allocation. We show that BN significantly improves learning efficiency and effectiveness compared to strong baselines.
arXiv Detail & Related papers (2024-06-04T02:12:27Z)
LIRE: listwise reward enhancement for preference alignment [27.50204023448716]
We propose a gradient-based reward optimization approach that incorporates the offline rewards of multiple responses into a streamlined listwise framework. LIRE is straightforward to implement, requiring minimal parameter tuning, and seamlessly aligns with the pairwise paradigm. Our experiments demonstrate that LIRE consistently outperforms existing methods across several benchmarks on dialogue and summarization tasks.
arXiv Detail & Related papers (2024-05-22T10:21:50Z)
Pre-Training and Fine-Tuning Generative Flow Networks [61.90529626590415]
We introduce a novel approach for reward-free pre-training of GFlowNets. By framing the training as a self-supervised problem, we propose an outcome-conditioned GFlowNet that learns to explore the candidate space. We show that the pre-trained OC-GFN model can allow for a direct extraction of a policy capable of sampling from any new reward functions in downstream tasks.
arXiv Detail & Related papers (2023-10-05T09:53:22Z)
Local Search GFlowNets [85.0053493167887]
Generative Flow Networks (GFlowNets) are amortized sampling methods that learn a distribution over discrete objects proportional to their rewards. GFlowNets exhibit a remarkable ability to generate diverse samples, yet occasionally struggle to consistently produce samples with high rewards due to over-exploration on wide sample space. This paper proposes to train GFlowNets with local search, which focuses on exploiting high-rewarded sample space to resolve this issue.
arXiv Detail & Related papers (2023-10-04T10:27:17Z)
Stochastic Generative Flow Networks [89.34644133901647]
Generative Flow Networks (or GFlowNets) learn to sample complex structures through the lens of "inference as control". Existing GFlowNets can be applied only to deterministic environments, and fail in more general tasks with stochastic dynamics. This paper introduces Stochastic GFlowNets, a new algorithm that extends GFlowNets to stochastic environments.
arXiv Detail & Related papers (2023-02-19T03:19:40Z)
Generative Augmented Flow Networks [88.50647244459009]
We propose Generative Augmented Flow Networks (GAFlowNets) to incorporate intermediate rewards into GFlowNets. GAFlowNets can leverage edge-based and state-based intrinsic rewards in a joint way to improve exploration.
arXiv Detail & Related papers (2022-10-07T03:33:56Z)
Learning GFlowNets from partial episodes for improved convergence and stability [56.99229746004125]
Generative flow networks (GFlowNets) are algorithms for training a sequential sampler of discrete objects under an unnormalized target density. Existing training objectives for GFlowNets are either local to states or transitions, or propagate a reward signal over an entire sampling trajectory. Inspired by the TD($\lambda$) algorithm in reinforcement learning, we introduce subtrajectory balance or SubTB($\lambda$), a GFlowNet training objective that can learn from partial action subsequences of varying lengths.
arXiv Detail & Related papers (2022-09-26T15:44:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.