An Empirical Study of the Effectiveness of Using a Replay Buffer on Mode
Discovery in GFlowNets
- URL: http://arxiv.org/abs/2307.07674v2
- Date: Tue, 18 Jul 2023 01:11:01 GMT
- Title: An Empirical Study of the Effectiveness of Using a Replay Buffer on Mode
Discovery in GFlowNets
- Authors: Nikhil Vemgal, Elaine Lau, Doina Precup
- Abstract summary: Reinforcement Learning (RL) algorithms aim to learn an optimal policy by iteratively sampling actions to learn how to maximize the total expected return, $R(x)$.
GFlowNets are a special class of algorithms designed to generate diverse candidates, $x$, from a discrete set, by learning a policy that approximates the proportional sampling of $R(x)$.
- Score: 47.82697599507171
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement Learning (RL) algorithms aim to learn an optimal policy by
iteratively sampling actions to learn how to maximize the total expected
return, $R(x)$. GFlowNets are a special class of algorithms designed to
generate diverse candidates, $x$, from a discrete set, by learning a policy
that approximates the proportional sampling of $R(x)$. GFlowNets exhibit
improved mode discovery compared to conventional RL algorithms, which is very
useful for applications such as drug discovery and combinatorial search.
However, since GFlowNets are a relatively recent class of algorithms, many
techniques which are useful in RL have not yet been associated with them. In
this paper, we study the utilization of a replay buffer for GFlowNets. We
explore empirically various replay buffer sampling techniques and assess the
impact on the speed of mode discovery and the quality of the modes discovered.
Our experimental results in the Hypergrid toy domain and a molecule synthesis
environment demonstrate significant improvements in mode discovery when
training with a replay buffer, compared to training only with trajectories
generated on-policy.
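The abstract above describes mixing on-policy trajectories with trajectories drawn from a replay buffer under different sampling rules. As a rough illustration only (not the authors' code; the class and function names below, such as TrajectoryReplayBuffer and training_step, are hypothetical), the following Python sketch shows a buffer supporting uniform and reward-prioritized sampling and a training step that combines replayed trajectories with freshly sampled ones.

```python
import random
from collections import deque

class TrajectoryReplayBuffer:
    """Fixed-capacity buffer of (trajectory, reward) pairs for GFlowNet training.

    Hypothetical sketch, not the authors' implementation: it supports uniform and
    reward-prioritized sampling, two strategies one might compare empirically.
    """

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, trajectory, reward):
        self.buffer.append((trajectory, reward))

    def sample(self, batch_size, strategy="uniform"):
        batch_size = min(batch_size, len(self.buffer))
        items = list(self.buffer)
        if strategy == "uniform":
            return random.sample(items, batch_size)
        if strategy == "reward_prioritized":
            # Bias replay toward high-R(x) candidates by sampling in proportion to reward.
            weights = [reward for _, reward in items]
            return random.choices(items, weights=weights, k=batch_size)
        raise ValueError(f"unknown sampling strategy: {strategy}")


def training_step(sample_trajectory, gflownet_loss, optimizer, buffer,
                  n_onpolicy=16, n_replay=16, strategy="reward_prioritized"):
    """One update mixing on-policy trajectories with replayed ones (illustrative).

    sample_trajectory: callable returning an on-policy (trajectory, reward) pair.
    gflownet_loss: callable mapping a batch of (trajectory, reward) pairs to a
        differentiable loss (e.g. a trajectory balance loss); assumed PyTorch-style.
    """
    onpolicy = [sample_trajectory() for _ in range(n_onpolicy)]
    for trajectory, reward in onpolicy:
        buffer.add(trajectory, reward)
    replayed = buffer.sample(n_replay, strategy=strategy)
    loss = gflownet_loss(onpolicy + replayed)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the sampling rule is simply a hyperparameter, mirroring the kind of comparison the abstract describes between different buffer sampling techniques and purely on-policy training.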
Related papers
- Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization [4.158255103170876]
GFlowNets are a family of generative models that learn to sample objects proportional to a given reward function.
Recent results show a close relationship between GFlowNet training and entropy-regularized reinforcement learning problems.
We introduce a simple backward policy optimization algorithm that involves direct maximization of the value function in an entropy-regularized Markov Decision Process.
arXiv Detail & Related papers (2024-10-20T19:12:14Z)
- Rectifying Reinforcement Learning for Reward Matching [12.294107455811496]
We establish a new connection between GFlowNets and policy evaluation for a uniform policy.
We propose a novel rectified policy evaluation algorithm, which achieves the same reward-matching effect as GFlowNets.
arXiv Detail & Related papers (2024-06-04T11:11:53Z)
- Looking Backward: Retrospective Backward Synthesis for Goal-Conditioned GFlowNets [27.33222647437964]
Generative Flow Networks (GFlowNets) are amortized sampling methods for learning a policy to sequentially generate objects with probabilities proportional to their rewards.
GFlowNets exhibit a remarkable ability to generate diverse sets of high-reward objects, in contrast to standard reinforcement learning approaches.
Recent work has focused on learning goal-conditioned GFlowNets with various useful properties, aiming to train a single GFlowNet capable of achieving different goals as the task specifies.
We propose a novel method named Retrospective Backward Synthesis (RBS) to address these challenges. Specifically, RBS synthesizes new backward trajectories for goal-conditioned GFlowNets to enrich the training signal.
arXiv Detail & Related papers (2024-06-03T09:44:10Z)
- Local Search GFlowNets [85.0053493167887]
Generative Flow Networks (GFlowNets) are amortized sampling methods that learn a distribution over discrete objects proportional to their rewards.
GFlowNets exhibit a remarkable ability to generate diverse samples, yet occasionally struggle to consistently produce samples with high rewards due to over-exploration of a wide sample space.
This paper proposes to train GFlowNets with local search, which focuses on exploiting high-reward regions of the sample space to resolve this issue.
arXiv Detail & Related papers (2023-10-04T10:27:17Z)
- Thompson sampling for improved exploration in GFlowNets [75.89693358516944]
Generative flow networks (GFlowNets) are amortized variational inference algorithms that treat sampling from a distribution over compositional objects as a sequential decision-making problem with a learnable action policy.
We show in two domains that the proposed Thompson-sampling-based GFlowNet (TS-GFN) yields improved exploration and thus faster convergence to the target distribution than the off-policy exploration strategies used in past work (a rough illustration of this exploration scheme appears after this list).
arXiv Detail & Related papers (2023-06-30T14:19:44Z)
- Towards Understanding and Improving GFlowNet Training [71.85707593318297]
We introduce an efficient evaluation strategy to compare the learned sampling distribution to the target reward distribution.
We propose prioritized replay training of high-reward $x$, relative edge flow policy parametrization, and a novel guided trajectory balance objective.
arXiv Detail & Related papers (2023-05-11T22:50:41Z)
- Learning GFlowNets from partial episodes for improved convergence and stability [56.99229746004125]
Generative flow networks (GFlowNets) are algorithms for training a sequential sampler of discrete objects under an unnormalized target density.
Existing training objectives for GFlowNets are either local to states or transitions, or propagate a reward signal over an entire sampling trajectory.
Inspired by the TD($\lambda$) algorithm in reinforcement learning, we introduce subtrajectory balance, or SubTB($\lambda$), a GFlowNet training objective that can learn from partial action subsequences of varying lengths (the standard trajectory balance and SubTB($\lambda$) forms are sketched after this list).
arXiv Detail & Related papers (2022-09-26T15:44:24Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement-learning-based ZO algorithm (ZO-RL) that learns the sampling policy for generating the perturbations in ZO optimization, instead of using random sampling.
Our results show that ZO-RL can effectively reduce the variance of the ZO gradient by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
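Two objectives referenced in the entries above, trajectory balance (TB) and subtrajectory balance, have standard forms in the GFlowNet literature. The LaTeX below is a sketch of those standard formulations, not of any one paper's specific variant: $Z$ is a learned partition function, $P_F$ and $P_B$ are the forward and backward policies, $F(s)$ is a learned state flow, and $R(x)$ is the reward of a terminal object $x$.

```latex
% Trajectory balance for a complete trajectory tau = (s_0 -> s_1 -> ... -> s_n = x):
\mathcal{L}_{\mathrm{TB}}(\tau) =
  \left( \log \frac{Z \,\prod_{t=0}^{n-1} P_F(s_{t+1} \mid s_t)}
                   {R(x) \,\prod_{t=0}^{n-1} P_B(s_t \mid s_{t+1})} \right)^{\!2}

% Subtrajectory balance for a partial trajectory s_m -> ... -> s_n
% (with F(s_n) replaced by the reward when s_n is terminal); SubTB(lambda)
% averages this loss over all subtrajectories with weights proportional
% to lambda^{n-m}:
\mathcal{L}_{\mathrm{SubTB}}(s_m{:}s_n) =
  \left( \log \frac{F(s_m) \,\prod_{t=m}^{n-1} P_F(s_{t+1} \mid s_t)}
                   {F(s_n) \,\prod_{t=m}^{n-1} P_B(s_t \mid s_{t+1})} \right)^{\!2}
```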
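The Thompson-sampling entry above frames exploration as drawing forward policies from an approximate posterior. As a rough, hypothetical illustration of that general idea (not the authors' exact TS-GFN algorithm; the env and policy interfaces below are assumptions), one can keep a small ensemble of forward-policy heads, draw one at random for each trajectory, and update it with a GFlowNet loss on the result.

```python
import random

def thompson_sampling_rollouts(policy_ensemble, env, gflownet_update, n_episodes=100):
    """Illustrative Thompson-sampling-style exploration for a GFlowNet sampler.

    policy_ensemble: list of forward-policy networks approximating a posterior.
    env: assumed environment with reset()/step() for sequential object construction,
        where step() returns (next_state, reward, done).
    gflownet_update: callable applying a GFlowNet loss (e.g. TB) to a trajectory.
    """
    for _ in range(n_episodes):
        policy = random.choice(policy_ensemble)      # draw one "posterior sample"
        state, done, trajectory = env.reset(), False, []
        while not done:
            action = policy.sample_action(state)     # assumed policy interface
            next_state, reward, done = env.step(action)
            trajectory.append((state, action, next_state))
            state = next_state
        gflownet_update(policy, trajectory, reward)  # train the sampled ensemble member
```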