Looking Backward: Retrospective Backward Synthesis for Goal-Conditioned GFlowNets
- URL: http://arxiv.org/abs/2406.01150v1
- Date: Mon, 3 Jun 2024 09:44:10 GMT
- Title: Looking Backward: Retrospective Backward Synthesis for Goal-Conditioned GFlowNets
- Authors: Haoran He, Can Chang, Huazhe Xu, Ling Pan
- Abstract summary: Generative Flow Networks (GFlowNets) are amortized sampling methods for learning a policy to sequentially generate objects with probabilities proportional to their rewards.
GFlowNets exhibit a remarkable ability to generate diverse sets of high-reward objects, in contrast to standard reinforcement learning approaches.
Recent works have arisen for learning goal-conditioned GFlowNets to acquire various useful properties, aiming to train a single GFlowNet capable of achieving different goals as the task specifies.
We propose a novel method named Retrospective Backward Synthesis (RBS) to address these challenges. Specifically, RBS synthesizes a new backward trajectory based on the backward policy in GFlowNets to enrich training trajectories with enhanced quality and diversity, thereby efficiently solving the sparse reward problem.
- Score: 27.33222647437964
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative Flow Networks (GFlowNets) are amortized sampling methods for learning a stochastic policy to sequentially generate compositional objects with probabilities proportional to their rewards. GFlowNets exhibit a remarkable ability to generate diverse sets of high-reward objects, in contrast to standard return maximization reinforcement learning approaches, which often converge to a single optimal solution. Recent works have arisen for learning goal-conditioned GFlowNets to acquire various useful properties, aiming to train a single GFlowNet capable of achieving different goals as the task specifies. However, training a goal-conditioned GFlowNet poses critical challenges due to extremely sparse rewards, which is further exacerbated in large state spaces. In this work, we propose a novel method named Retrospective Backward Synthesis (RBS) to address these challenges. Specifically, RBS synthesizes a new backward trajectory based on the backward policy in GFlowNets to enrich training trajectories with enhanced quality and diversity, thereby efficiently solving the sparse reward problem. Extensive empirical results show that our method improves sample efficiency by a large margin and outperforms strong baselines on various standard evaluation benchmarks.
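The abstract describes RBS only at a high level, so the following is a minimal Python sketch of one plausible reading of the idea, not the authors' implementation. The interfaces `backward_policy.sample_parent(state)` and `is_initial(state)` are assumptions introduced here for illustration: a goal state is walked back to the initial state under the backward policy P_B, and the reversed sequence becomes a complete goal-reaching trajectory that can be added to the training data.

```python
# Sketch of the Retrospective Backward Synthesis (RBS) idea from the abstract.
# Hypothetical interfaces (not from the paper): backward_policy.sample_parent(state)
# samples a parent state under P_B, and is_initial(state) tests for the initial state s_0.

def synthesize_backward_trajectory(goal_state, backward_policy, is_initial, max_steps=100):
    """Walk backward from a goal state with P_B, then reverse into a forward trajectory."""
    states = [goal_state]
    state = goal_state
    for _ in range(max_steps):
        if is_initial(state):
            break
        state = backward_policy.sample_parent(state)  # s_parent ~ P_B(. | state)
        states.append(state)
    states.reverse()  # now ordered s_0 -> ... -> goal_state
    return list(zip(states[:-1], states[1:]))  # forward transitions usable for training


def enrich_training_batch(on_policy_trajectories, goals, backward_policy, is_initial):
    """Mix on-policy rollouts with synthesized goal-reaching trajectories."""
    synthetic = [
        synthesize_backward_trajectory(g, backward_policy, is_initial) for g in goals
    ]
    return list(on_policy_trajectories) + synthetic
```

Read this way, RBS acts as hindsight-style data augmentation: every goal in a batch is paired with at least one trajectory that actually reaches it, which is one way the sparse-reward problem described above can be mitigated.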
Related papers
- Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization [4.158255103170876]
GFlowNets are a family of generative models that learn to sample objects proportional to a given reward function.
Recent results show a close relationship between GFlowNet training and entropy-regularized reinforcement learning problems.
We introduce a simple backward policy optimization algorithm that involves direct maximization of the value function in an entropy-regularized Markov Decision Process.
arXiv Detail & Related papers (2024-10-20T19:12:14Z) - On Generalization for Generative Flow Networks [54.20924253330039]
Generative Flow Networks (GFlowNets) have emerged as an innovative learning paradigm designed to address the challenge of sampling from an unnormalized probability distribution.
This paper attempts to formalize generalization in the context of GFlowNets, to link generalization with stability, and also to design experiments that assess the capacity of these models to uncover unseen parts of the reward function.
arXiv Detail & Related papers (2024-07-03T13:42:21Z) - Pre-Training and Fine-Tuning Generative Flow Networks [61.90529626590415]
We introduce a novel approach for reward-free pre-training of GFlowNets.
By framing the training as a self-supervised problem, we propose an outcome-conditioned GFlowNet that learns to explore the candidate space.
We show that the pre-trained OC-GFN model can allow for a direct extraction of a policy capable of sampling from any new reward functions in downstream tasks.
arXiv Detail & Related papers (2023-10-05T09:53:22Z) - Local Search GFlowNets [85.0053493167887]
Generative Flow Networks (GFlowNets) are amortized sampling methods that learn a distribution over discrete objects proportional to their rewards.
GFlowNets exhibit a remarkable ability to generate diverse samples, yet occasionally struggle to consistently produce samples with high rewards due to over-exploration of the wide sample space.
This paper proposes to train GFlowNets with local search, which focuses on exploiting high-reward regions of the sample space to resolve this issue.
arXiv Detail & Related papers (2023-10-04T10:27:17Z) - Stochastic Generative Flow Networks [89.34644133901647]
Generative Flow Networks (or GFlowNets) learn to sample complex structures through the lens of "inference as control".
Existing GFlowNets can be applied only to deterministic environments, and fail in more general tasks with stochastic dynamics.
This paper introduces Stochastic GFlowNets, a new algorithm that extends GFlowNets to stochastic environments.
arXiv Detail & Related papers (2023-02-19T03:19:40Z) - Generative Augmented Flow Networks [88.50647244459009]
We propose Generative Augmented Flow Networks (GAFlowNets) to incorporate intermediate rewards into GFlowNets.
GAFlowNets can leverage edge-based and state-based intrinsic rewards in a joint way to improve exploration.
arXiv Detail & Related papers (2022-10-07T03:33:56Z) - Learning GFlowNets from partial episodes for improved convergence and stability [56.99229746004125]
Generative flow networks (GFlowNets) are algorithms for training a sequential sampler of discrete objects under an unnormalized target density.
Existing training objectives for GFlowNets are either local to states or transitions, or propagate a reward signal over an entire sampling trajectory.
Inspired by the TD($\lambda$) algorithm in reinforcement learning, we introduce subtrajectory balance or SubTB($\lambda$), a GFlowNet training objective that can learn from partial action subsequences of varying lengths.
arXiv Detail & Related papers (2022-09-26T15:44:24Z)
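For context on the last entry, the subtrajectory balance objective is commonly written as below; this is a sketch of the standard form from the GFlowNet literature, and the exact weighting used in the cited paper may differ. Here $F$ is the learned state flow and $P_F$, $P_B$ are the forward and backward policies.

```latex
% Subtrajectory balance loss for a subtrajectory s_m -> ... -> s_n of a trajectory of
% length T (standard form; details may differ from the cited paper).
\mathcal{L}_{\mathrm{SubTB}}(s_m{:}s_n) =
  \left( \log \frac{F(s_m)\prod_{t=m}^{n-1} P_F(s_{t+1}\mid s_t)}
                   {F(s_n)\prod_{t=m}^{n-1} P_B(s_t\mid s_{t+1})} \right)^{2},
\qquad
\mathcal{L}_{\mathrm{SubTB}(\lambda)} =
  \frac{\sum_{0 \le m < n \le T} \lambda^{\,n-m}\, \mathcal{L}_{\mathrm{SubTB}}(s_m{:}s_n)}
       {\sum_{0 \le m < n \le T} \lambda^{\,n-m}}.
```

Intuitively, large $\lambda$ emphasizes long subtrajectories (approaching trajectory balance) and small $\lambda$ emphasizes short ones (approaching detailed balance), mirroring the role of $\lambda$ in TD($\lambda$).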