MG2FlowNet: Accelerating High-Reward Sample Generation via Enhanced MCTS and Greediness Control
- URL: http://arxiv.org/abs/2510.00805v2
- Date: Sat, 04 Oct 2025 15:33:53 GMT
- Title: MG2FlowNet: Accelerating High-Reward Sample Generation via Enhanced MCTS and Greediness Control
- Authors: Rui Zhu, Xuan Yu, Yudong Zhang, Chen Zhang, Xu Wang, Yang Wang,
- Abstract summary: Generative Flow Networks (GFlowNets) have emerged as a powerful tool for generating diverse and high-reward structured objects by learning to sample from a distribution proportional to a given reward function.<n>In this work, we integrate an enhanced Monte Carlo Tree Search (MCTS) into the GFlowNets sampling process to balance exploration and exploitation adaptively.<n>Our method can not only accelerate the speed of discovering high-reward regions but also continuously generate high-reward samples, while preserving the diversity of the generative distribution.
- Score: 19.49552596070782
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative Flow Networks (GFlowNets) have emerged as a powerful tool for generating diverse and high-reward structured objects by learning to sample from a distribution proportional to a given reward function. Unlike conventional reinforcement learning (RL) approaches that prioritize optimization of a single trajectory, GFlowNets seek to balance diversity and reward by modeling the entire trajectory distribution. This capability makes them especially suitable for domains such as molecular design and combinatorial optimization. However, existing GFlowNets sampling strategies tend to overexplore and struggle to consistently generate high-reward samples, particularly in large search spaces with sparse high-reward regions. Therefore, improving the probability of generating high-reward samples without sacrificing diversity remains a key challenge under this premise. In this work, we integrate an enhanced Monte Carlo Tree Search (MCTS) into the GFlowNets sampling process, using MCTS-based policy evaluation to guide the generation toward high-reward trajectories and Polynomial Upper Confidence Trees (PUCT) to balance exploration and exploitation adaptively, and we introduce a controllable mechanism to regulate the degree of greediness. Our method enhances exploitation without sacrificing diversity by dynamically balancing exploration and reward-driven guidance. The experimental results show that our method can not only accelerate the speed of discovering high-reward regions but also continuously generate high-reward samples, while preserving the diversity of the generative distribution. All implementations are available at https://github.com/ZRNB/MG2FlowNet.
Related papers
- Avoid What You Know: Divergent Trajectory Balance for GFlowNets [14.524997986396713]
We propose an exploration GFlowNet explicitly trained to search for high-reward states in regions underexplored by the canonical GFlowNet.<n>We show that ACE significantly improves upon prior work in terms of approximation accuracy to the target distribution and discovery rate of diverse high-reward states.
arXiv Detail & Related papers (2026-02-19T20:47:28Z) - Boosted GFlowNets: Improving Exploration via Sequential Learning [13.119757506183392]
Boosted GFlowNets are a method that sequentially trains an ensemble of GFlowNets, each optimizing a residual reward that compensates for the mass already captured by previous models.<n>We show that Boosted GFlowNets achieve substantially better exploration and sample diversity on multimodal synthetic benchmarks and peptide design tasks.
arXiv Detail & Related papers (2025-11-12T19:30:11Z) - Loss-Guided Auxiliary Agents for Overcoming Mode Collapse in GFlowNets [22.653875450786444]
Loss-Guided GFlowNets (LGGFN) is a novel approach where an auxiliary GFlowNet's exploration is textbfdirectly driven by the main GFlowNet's training loss<n>This targeted exploration significantly accelerates the discovery of diverse, high-reward samples.
arXiv Detail & Related papers (2025-05-21T08:27:10Z) - Generative Diffusion Models for Resource Allocation in Wireless Networks [74.84410305593006]
We train a policy to imitate an expert and generate new samples from the optimal distribution.<n>We achieve near-optimal performance through the sequential execution of the generated samples.<n>We present numerical results in a case study of power control.
arXiv Detail & Related papers (2025-04-28T21:44:31Z) - On Generalization for Generative Flow Networks [54.20924253330039]
Generative Flow Networks (GFlowNets) have emerged as an innovative learning paradigm designed to address the challenge of sampling from an unnormalized probability distribution.
This paper attempts to formalize generalization in the context of GFlowNets, to link generalization with stability, and also to design experiments that assess the capacity of these models to uncover unseen parts of the reward function.
arXiv Detail & Related papers (2024-07-03T13:42:21Z) - Looking Backward: Retrospective Backward Synthesis for Goal-Conditioned GFlowNets [27.33222647437964]
Generative Flow Networks (GFlowNets) have demonstrated remarkable capabilities to generate diverse sets of high-reward candidates.<n>However, training such models is challenging due to extremely sparse rewards.<n>We propose a novel method called textbfRetrospective textbfBackward textbfSynthesis (textbfRBS) to address these problems.
arXiv Detail & Related papers (2024-06-03T09:44:10Z) - Local Search GFlowNets [85.0053493167887]
Generative Flow Networks (GFlowNets) are amortized sampling methods that learn a distribution over discrete objects proportional to their rewards.
GFlowNets exhibit a remarkable ability to generate diverse samples, yet occasionally struggle to consistently produce samples with high rewards due to over-exploration on wide sample space.
This paper proposes to train GFlowNets with local search, which focuses on exploiting high-rewarded sample space to resolve this issue.
arXiv Detail & Related papers (2023-10-04T10:27:17Z) - Stochastic Generative Flow Networks [89.34644133901647]
Generative Flow Networks (or GFlowNets) learn to sample complex structures through the lens of "inference as control"
Existing GFlowNets can be applied only to deterministic environments, and fail in more general tasks with dynamics.
This paper introduces GFlowNets, a new algorithm that extends GFlowNets to environments.
arXiv Detail & Related papers (2023-02-19T03:19:40Z) - Generative Augmented Flow Networks [88.50647244459009]
We propose Generative Augmented Flow Networks (GAFlowNets) to incorporate intermediate rewards into GFlowNets.
GAFlowNets can leverage edge-based and state-based intrinsic rewards in a joint way to improve exploration.
arXiv Detail & Related papers (2022-10-07T03:33:56Z) - Learning GFlowNets from partial episodes for improved convergence and
stability [56.99229746004125]
Generative flow networks (GFlowNets) are algorithms for training a sequential sampler of discrete objects under an unnormalized target density.
Existing training objectives for GFlowNets are either local to states or transitions, or propagate a reward signal over an entire sampling trajectory.
Inspired by the TD($lambda$) algorithm in reinforcement learning, we introduce subtrajectory balance or SubTB($lambda$), a GFlowNet training objective that can learn from partial action subsequences of varying lengths.
arXiv Detail & Related papers (2022-09-26T15:44:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.