Towards Understanding and Improving GFlowNet Training
- URL: http://arxiv.org/abs/2305.07170v1
- Date: Thu, 11 May 2023 22:50:41 GMT
- Title: Towards Understanding and Improving GFlowNet Training
- Authors: Max W. Shen, Emmanuel Bengio, Ehsan Hajiramezanali, Andreas Loukas,
Kyunghyun Cho, Tommaso Biancalani
- Abstract summary: We introduce an efficient evaluation strategy to compare the learned sampling distribution to the target reward distribution.
We propose prioritized replay training of high-reward $x$, relative edge flow policy parametrization, and a novel guided trajectory balance objective.
- Score: 71.85707593318297
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative flow networks (GFlowNets) are a family of algorithms that learn a
generative policy to sample discrete objects $x$ with non-negative reward
$R(x)$. Learning objectives guarantee the GFlowNet samples $x$ from the target
distribution $p^*(x) \propto R(x)$ when loss is globally minimized over all
states or trajectories, but it is unclear how well they perform with practical
limits on training resources. We introduce an efficient evaluation strategy to
compare the learned sampling distribution to the target reward distribution. As
flows can be underdetermined given training data, we clarify the importance of
learned flows to generalization and matching $p^*(x)$ in practice. We
investigate how to learn better flows, and propose (i) prioritized replay
training of high-reward $x$, (ii) relative edge flow policy parametrization,
and (iii) a novel guided trajectory balance objective, and show how it can
solve a substructure credit assignment problem. We substantially improve sample
efficiency on biochemical design tasks.
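To make the trajectory balance (TB) family of objectives referenced in (iii) concrete, here is a minimal PyTorch-style sketch of the standard (unguided) TB loss for a single trajectory; names and shapes are illustrative and not taken from the paper's code:
```python
import torch

def trajectory_balance_loss(log_Z, log_pf_steps, log_pb_steps, log_reward):
    """Squared trajectory balance residual for one complete trajectory.

    log_Z        : learned scalar estimate of the log-partition function log Z
    log_pf_steps : (T,) log P_F(s_{t+1} | s_t) for each forward step
    log_pb_steps : (T,) log P_B(s_t | s_{t+1}) for each backward step
    log_reward   : log R(x) of the terminal object x
    """
    forward = log_Z + log_pf_steps.sum()
    backward = log_reward + log_pb_steps.sum()
    return (forward - backward) ** 2
```
The guided TB objective proposed in the paper builds on this residual but adds guidance for assigning credit to high-reward substructures; the sketch above shows only the standard TB form.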
Related papers
- On Divergence Measures for Training GFlowNets [3.7277730514654555]
Generative Flow Networks (GFlowNets) are amortized inference models designed to sample from unnormalized distributions over composable objects.
Traditionally, the training procedure for GFlowNets seeks to minimize the expected log-squared difference between a proposal (forward policy) and a target (backward policy) distribution.
We review four divergence measures, namely the Rényi-$\alpha$, Tsallis-$\alpha$, reverse KL, and forward KL divergences, and design statistically efficient estimators for their gradients in the context of training GFlowNets.
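For reference, the Rényi-$\alpha$ divergence between discrete distributions $p$ and $q$, one of the four measures reviewed, is commonly written as
```latex
D_{\alpha}(p \,\|\, q) = \frac{1}{\alpha - 1} \log \sum_{x} p(x)^{\alpha}\, q(x)^{1-\alpha},
\qquad \alpha > 0,\ \alpha \neq 1,
```
which recovers the forward KL divergence in the limit $\alpha \to 1$.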
arXiv Detail & Related papers (2024-10-12T03:46:52Z)
- Evolution Guided Generative Flow Networks [11.609895436955242]
Generative Flow Networks (GFlowNets) learn to sample compositional objects proportional to their rewards.
One big challenge of GFlowNets is training them effectively when dealing with long time horizons and sparse rewards.
We propose Evolution guided generative flow networks (EGFN), a simple but powerful augmentation of GFlowNet training using evolutionary algorithms (EA).
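As a rough illustration of the EA component only (a schematic sketch; the population structure, mutation operator, and the coupling with gradient-based GFlowNet updates in EGFN are not specified here):
```python
import copy
import random

def evolve_generation(population, fitness_fn, mutate_fn, n_elite=2):
    """One generation of a simple elitist evolutionary loop over sets of
    policy parameters. Schematic only: EGFN additionally interleaves such a
    loop with gradient-based GFlowNet training and a shared replay buffer.
    """
    ranked = sorted(population, key=fitness_fn, reverse=True)
    elites = ranked[:n_elite]                      # carry the best members over unchanged
    children = [mutate_fn(copy.deepcopy(random.choice(elites)))
                for _ in range(len(population) - n_elite)]
    return elites + children
```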
arXiv Detail & Related papers (2024-02-03T15:28:53Z)
- Pre-Training and Fine-Tuning Generative Flow Networks [61.90529626590415]
We introduce a novel approach for reward-free pre-training of GFlowNets.
By framing the training as a self-supervised problem, we propose an outcome-conditioned GFlowNet that learns to explore the candidate space.
We show that the pre-trained OC-GFN model allows direct extraction of a policy capable of sampling from any new reward function in downstream tasks.
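A minimal sketch of what an outcome-conditioned forward policy might look like (architecture and dimensions are placeholders, not the paper's actual model):
```python
import torch
import torch.nn as nn

class OutcomeConditionedPolicy(nn.Module):
    """Illustrative forward policy P_F(a | s, y) conditioned on a desired outcome y."""

    def __init__(self, state_dim, outcome_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + outcome_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state, outcome):
        # Concatenate state and outcome, then return log-probabilities over actions.
        logits = self.net(torch.cat([state, outcome], dim=-1))
        return torch.log_softmax(logits, dim=-1)
```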
arXiv Detail & Related papers (2023-10-05T09:53:22Z)
- Distributional GFlowNets with Quantile Flows [73.73721901056662]
Generative Flow Networks (GFlowNets) are a new family of probabilistic samplers where an agent learns a policy for generating complex structure through a series of decision-making steps.
In this work, we adopt a distributional paradigm for GFlowNets, turning each flow function into a distribution, thus providing more informative learning signals during training.
Our proposed quantile matching GFlowNet learning algorithm is able to learn a risk-sensitive policy, an essential component for handling scenarios with risk uncertainty.
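For intuition, fitting a set of quantile estimates is typically done with a pinball (quantile regression) loss; the sketch below shows that generic loss, not the paper's exact quantile matching objective on flows:
```python
import torch

def pinball_loss(pred_quantiles, target, taus):
    """Generic quantile regression (pinball) loss.

    pred_quantiles : (N,) predicted quantile values
    target         : scalar target sample (broadcast against the quantiles)
    taus           : (N,) quantile levels in (0, 1)
    """
    diff = target - pred_quantiles
    return torch.max(taus * diff, (taus - 1.0) * diff).mean()
```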
arXiv Detail & Related papers (2023-02-11T22:06:17Z)
- Better Training of GFlowNets with Local Credit and Incomplete Trajectories [81.14310509871935]
We consider the case where the energy function can be applied not just to terminal states but also to intermediate states.
This is for example achieved when the energy function is additive, with terms available along the trajectory.
This enables a training objective that can be applied to update parameters even with incomplete trajectories.
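A small sketch of the additive-energy setting (function and variable names are illustrative, not from the paper):
```python
def partial_energy(states, step_energy):
    """Energy of a possibly incomplete trajectory when the energy function is
    additive over transitions: E(s_0 -> ... -> s_k) = sum_t e(s_t, s_{t+1}).
    Because each partial sum is available, a training signal can be computed
    before the trajectory reaches a terminal state.

    states      : list of states [s_0, s_1, ..., s_k]
    step_energy : callable e(s, s_next) returning a float
    """
    return sum(step_energy(s, s_next) for s, s_next in zip(states[:-1], states[1:]))
```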
arXiv Detail & Related papers (2023-02-03T12:19:42Z)
- Learning GFlowNets from partial episodes for improved convergence and stability [56.99229746004125]
Generative flow networks (GFlowNets) are algorithms for training a sequential sampler of discrete objects under an unnormalized target density.
Existing training objectives for GFlowNets are either local to states or transitions, or propagate a reward signal over an entire sampling trajectory.
Inspired by the TD($\lambda$) algorithm in reinforcement learning, we introduce subtrajectory balance, or SubTB($\lambda$), a GFlowNet training objective that can learn from partial action subsequences of varying lengths.
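A simplified sketch of a SubTB($\lambda$)-style loss over all subtrajectories of a single sampled trajectory (notation simplified; in the actual objective the flow at a terminal state is tied to the reward):
```python
import torch

def subtb_lambda_loss(log_F, log_pf, log_pb, lam=0.9):
    """Lambda-weighted average of squared subtrajectory balance residuals.

    log_F  : (T+1,) log state-flow estimates along the trajectory
    log_pf : (T,)   log P_F(s_{t+1} | s_t) for each transition
    log_pb : (T,)   log P_B(s_t | s_{t+1}) for each transition
    lam    : subtrajectories of length L are weighted by lam ** L
    """
    T = log_pf.shape[0]
    total, weight_sum = 0.0, 0.0
    for i in range(T):
        for j in range(i + 1, T + 1):
            residual = (log_F[i] + log_pf[i:j].sum()
                        - log_F[j] - log_pb[i:j].sum())
            w = lam ** (j - i)
            total = total + w * residual ** 2
            weight_sum += w
    return total / weight_sum
```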
arXiv Detail & Related papers (2022-09-26T15:44:24Z)
- Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward [66.81579829897392]
We propose a novel offline reinforcement learning algorithm called Pessimistic vAlue iteRaTion with rEward Decomposition (PARTED).
PARTED decomposes the trajectory return into per-step proxy rewards via least-squares-based reward redistribution, and then performs pessimistic value iteration based on the learned proxy rewards.
To the best of our knowledge, PARTED is the first offline RL algorithm that is provably efficient in general MDPs with trajectory-wise reward.
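To illustrate the reward-redistribution step, here is a linear least-squares sketch under assumed per-step features; the paper's estimator and function class may differ:
```python
import numpy as np

def least_squares_redistribution(trajectory_features, trajectory_returns, ridge=1e-3):
    """Fit a linear per-step proxy reward r_hat(s, a) = phi(s, a) @ w so that the
    per-step proxy rewards of each trajectory sum to its observed return.

    trajectory_features : list of (T_i, d) arrays of per-step features phi(s_t, a_t)
    trajectory_returns  : list of scalar trajectory returns
    """
    # Each trajectory contributes one regression row: the sum of its step features.
    X = np.stack([feats.sum(axis=0) for feats in trajectory_features])
    y = np.asarray(trajectory_returns)
    d = X.shape[1]
    w = np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ y)
    return w  # per-step proxy reward of a step with features phi is phi @ w
```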
arXiv Detail & Related papers (2022-06-13T19:11:22Z)