An Empirical Study of the Effectiveness of Using a Replay Buffer on Mode
Discovery in GFlowNets
- URL: http://arxiv.org/abs/2307.07674v2
- Date: Tue, 18 Jul 2023 01:11:01 GMT
- Title: An Empirical Study of the Effectiveness of Using a Replay Buffer on Mode
Discovery in GFlowNets
- Authors: Nikhil Vemgal, Elaine Lau, Doina Precup
- Abstract summary: Reinforcement Learning (RL) algorithms aim to learn an optimal policy by iteratively sampling actions to learn how to maximize the total expected return, $R(x)$.
GFlowNets are a special class of algorithms designed to generate diverse candidates, $x$, from a discrete set, by learning a policy that approximates the proportional sampling of $R(x)$.
- Score: 47.82697599507171
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement Learning (RL) algorithms aim to learn an optimal policy by
iteratively sampling actions to learn how to maximize the total expected
return, $R(x)$. GFlowNets are a special class of algorithms designed to
generate diverse candidates, $x$, from a discrete set, by learning a policy
that approximates the proportional sampling of $R(x)$. GFlowNets exhibit
improved mode discovery compared to conventional RL algorithms, which is very
useful for applications such as drug discovery and combinatorial search.
However, since GFlowNets are a relatively recent class of algorithms, many
techniques which are useful in RL have not yet been associated with them. In
this paper, we study the utilization of a replay buffer for GFlowNets. We
explore empirically various replay buffer sampling techniques and assess the
impact on the speed of mode discovery and the quality of the modes discovered.
Our experimental results in the Hypergrid toy domain and a molecule synthesis
environment demonstrate significant improvements in mode discovery when
training with a replay buffer, compared to training only with trajectories
generated on-policy.
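The abstract above describes mixing on-policy trajectories with trajectories drawn from a replay buffer under different sampling rules. As a rough illustration only (not the authors' code; the class and function names below, such as TrajectoryReplayBuffer and training_step, are hypothetical), the following Python sketch shows a buffer supporting uniform and reward-prioritized sampling and a training step that combines replayed trajectories with freshly sampled ones.

```python
import random
from collections import deque

class TrajectoryReplayBuffer:
    """Fixed-capacity buffer of (trajectory, reward) pairs for GFlowNet training.

    Hypothetical sketch, not the authors' implementation: it supports uniform and
    reward-prioritized sampling, two strategies one might compare empirically.
    """

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, trajectory, reward):
        self.buffer.append((trajectory, reward))

    def sample(self, batch_size, strategy="uniform"):
        batch_size = min(batch_size, len(self.buffer))
        items = list(self.buffer)
        if strategy == "uniform":
            return random.sample(items, batch_size)
        if strategy == "reward_prioritized":
            # Bias replay toward high-R(x) candidates by sampling in proportion to reward.
            weights = [reward for _, reward in items]
            return random.choices(items, weights=weights, k=batch_size)
        raise ValueError(f"unknown sampling strategy: {strategy}")


def training_step(sample_trajectory, gflownet_loss, optimizer, buffer,
                  n_onpolicy=16, n_replay=16, strategy="reward_prioritized"):
    """One update mixing on-policy trajectories with replayed ones (illustrative).

    sample_trajectory: callable returning an on-policy (trajectory, reward) pair.
    gflownet_loss: callable mapping a batch of (trajectory, reward) pairs to a
        differentiable loss (e.g. a trajectory balance loss); assumed PyTorch-style.
    """
    onpolicy = [sample_trajectory() for _ in range(n_onpolicy)]
    for trajectory, reward in onpolicy:
        buffer.add(trajectory, reward)
    replayed = buffer.sample(n_replay, strategy=strategy)
    loss = gflownet_loss(onpolicy + replayed)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the sampling rule is simply a hyperparameter, mirroring the kind of comparison the abstract describes between different buffer sampling techniques and purely on-policy training.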
Related papers
- Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization [4.158255103170876]
GFlowNets are a family of generative models that learn to sample objects proportional to a given reward function.
Recent results show a close relationship between GFlowNet training and entropy-regularized reinforcement learning problems.
We introduce a simple backward policy optimization algorithm that involves direct maximization of the value function in an entropy-regularized Markov Decision Process.
arXiv Detail & Related papers (2024-10-20T19:12:14Z)
- Rectifying Reinforcement Learning for Reward Matching [12.294107455811496]
We establish a new connection between GFlowNets and policy evaluation for a uniform policy.
We propose a novel rectified policy evaluation algorithm, which achieves the same reward-matching effect as GFlowNets.
arXiv Detail & Related papers (2024-06-04T11:11:53Z)
- Looking Backward: Retrospective Backward Synthesis for Goal-Conditioned GFlowNets [27.33222647437964]
Generative Flow Networks (GFlowNets) are amortized sampling methods for learning a policy to sequentially generate objects with probabilities proportional to their rewards.
GFlowNets exhibit a remarkable ability to generate diverse sets of high-reward objects, in contrast to standard reinforcement learning approaches.
Recent work has focused on learning goal-conditioned GFlowNets with various useful properties, aiming to train a single GFlowNet capable of achieving different goals as the task specifies.
We propose a novel method named Retrospective Backward Synthesis (RBS) to address these challenges. Specifically, RBS synthesizes new backward trajectories for goal-conditioned GFlowNets to enrich the training signal.
arXiv Detail & Related papers (2024-06-03T09:44:10Z)
- Local Search GFlowNets [85.0053493167887]
Generative Flow Networks (GFlowNets) are amortized sampling methods that learn a distribution over discrete objects proportional to their rewards.
GFlowNets exhibit a remarkable ability to generate diverse samples, yet occasionally struggle to consistently produce samples with high rewards due to over-exploration of a wide sample space.
This paper proposes to train GFlowNets with local search, which focuses on exploiting high-reward regions of the sample space to resolve this issue.
arXiv Detail & Related papers (2023-10-04T10:27:17Z)
- Thompson sampling for improved exploration in GFlowNets [75.89693358516944]
Generative flow networks (GFlowNets) are amortized variational inference algorithms that treat sampling from a distribution over compositional objects as a sequential decision-making problem with a learnable action policy.
We show in two domains that the proposed Thompson-sampling-based GFlowNet (TS-GFN) yields improved exploration and thus faster convergence to the target distribution than the off-policy exploration strategies used in past work (a rough illustration of this exploration scheme appears after this list).
arXiv Detail & Related papers (2023-06-30T14:19:44Z)
- Towards Understanding and Improving GFlowNet Training [71.85707593318297]
We introduce an efficient evaluation strategy to compare the learned sampling distribution to the target reward distribution.
We propose prioritized replay training of high-reward $x$, relative edge flow policy parametrization, and a novel guided trajectory balance objective.
arXiv Detail & Related papers (2023-05-11T22:50:41Z)
- Learning GFlowNets from partial episodes for improved convergence and stability [56.99229746004125]
Generative flow networks (GFlowNets) are algorithms for training a sequential sampler of discrete objects under an unnormalized target density.
Existing training objectives for GFlowNets are either local to states or transitions, or propagate a reward signal over an entire sampling trajectory.
Inspired by the TD($\lambda$) algorithm in reinforcement learning, we introduce subtrajectory balance, or SubTB($\lambda$), a GFlowNet training objective that can learn from partial action subsequences of varying lengths (the standard trajectory balance and SubTB($\lambda$) forms are sketched after this list).
arXiv Detail & Related papers (2022-09-26T15:44:24Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement-learning-based ZO algorithm (ZO-RL) that learns the sampling policy for generating the perturbations in ZO optimization, instead of using random sampling.
Our results show that ZO-RL can effectively reduce the variance of the ZO gradient by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
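Two objectives referenced in the entries above, trajectory balance (TB) and subtrajectory balance, have standard forms in the GFlowNet literature. The LaTeX below is a sketch of those standard formulations, not of any one paper's specific variant: $Z$ is a learned partition function, $P_F$ and $P_B$ are the forward and backward policies, $F(s)$ is a learned state flow, and $R(x)$ is the reward of a terminal object $x$.

```latex
% Trajectory balance for a complete trajectory tau = (s_0 -> s_1 -> ... -> s_n = x):
\mathcal{L}_{\mathrm{TB}}(\tau) =
  \left( \log \frac{Z \,\prod_{t=0}^{n-1} P_F(s_{t+1} \mid s_t)}
                   {R(x) \,\prod_{t=0}^{n-1} P_B(s_t \mid s_{t+1})} \right)^{\!2}

% Subtrajectory balance for a partial trajectory s_m -> ... -> s_n
% (with F(s_n) replaced by the reward when s_n is terminal); SubTB(lambda)
% averages this loss over all subtrajectories with weights proportional
% to lambda^{n-m}:
\mathcal{L}_{\mathrm{SubTB}}(s_m{:}s_n) =
  \left( \log \frac{F(s_m) \,\prod_{t=m}^{n-1} P_F(s_{t+1} \mid s_t)}
                   {F(s_n) \,\prod_{t=m}^{n-1} P_B(s_t \mid s_{t+1})} \right)^{\!2}
```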
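The Thompson-sampling entry above frames exploration as drawing forward policies from an approximate posterior. As a rough, hypothetical illustration of that general idea (not the authors' exact TS-GFN algorithm; the env and policy interfaces below are assumptions), one can keep a small ensemble of forward-policy heads, draw one at random for each trajectory, and update it with a GFlowNet loss on the result.

```python
import random

def thompson_sampling_rollouts(policy_ensemble, env, gflownet_update, n_episodes=100):
    """Illustrative Thompson-sampling-style exploration for a GFlowNet sampler.

    policy_ensemble: list of forward-policy networks approximating a posterior.
    env: assumed environment with reset()/step() for sequential object construction,
        where step() returns (next_state, reward, done).
    gflownet_update: callable applying a GFlowNet loss (e.g. TB) to a trajectory.
    """
    for _ in range(n_episodes):
        policy = random.choice(policy_ensemble)      # draw one "posterior sample"
        state, done, trajectory = env.reset(), False, []
        while not done:
            action = policy.sample_action(state)     # assumed policy interface
            next_state, reward, done = env.step(action)
            trajectory.append((state, action, next_state))
            state = next_state
        gflownet_update(policy, trajectory, reward)  # train the sampled ensemble member
```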