Monte Carlo Tree Diffusion for System 2 Planning
- URL: http://arxiv.org/abs/2502.07202v2
- Date: Fri, 11 Apr 2025 00:14:32 GMT
- Title: Monte Carlo Tree Diffusion for System 2 Planning
- Authors: Jaesik Yoon, Hyeonseo Cho, Doojin Baek, Yoshua Bengio, Sungjin Ahn
- Abstract summary: We introduce Monte Carlo Tree Diffusion (MCTD), a novel framework that integrates the generative strength of diffusion models with the adaptive search capabilities of Monte Carlo Tree Search (MCTS). MCTD achieves the benefits of MCTS, such as controlling exploration-exploitation trade-offs, within the diffusion framework.
- Score: 57.50512800900167
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have recently emerged as a powerful tool for planning. However, unlike Monte Carlo Tree Search (MCTS), whose performance naturally improves with additional test-time computation (TTC), standard diffusion-based planners offer only limited avenues for TTC scalability. In this paper, we introduce Monte Carlo Tree Diffusion (MCTD), a novel framework that integrates the generative strength of diffusion models with the adaptive search capabilities of MCTS. Our method reconceptualizes denoising as a tree-structured process, allowing partially denoised plans to be iteratively evaluated, pruned, and refined. By selectively expanding promising trajectories while retaining the flexibility to revisit and improve suboptimal branches, MCTD achieves the benefits of MCTS, such as controlling exploration-exploitation trade-offs, within the diffusion framework. Empirical results on challenging long-horizon tasks show that MCTD outperforms diffusion baselines, yielding higher-quality solutions as TTC increases.
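The abstract describes the core mechanism only at a high level: denoising is recast as a tree-structured process in which partially denoised plans are scored, expanded, pruned, and refined under an exploration-exploitation trade-off. Below is a minimal Python sketch of that idea under stated assumptions; `denoise_step` and `score_plan` are illustrative placeholders standing in for a trained diffusion planner and a task-specific evaluator, and nothing here reproduces the authors' implementation.

```python
# Hypothetical sketch of tree-structured denoising in the spirit of MCTD.
# denoise_step and score_plan are placeholders, not the paper's code.
import math, random

class Node:
    def __init__(self, plan, level, parent=None):
        self.plan = plan          # partially denoised trajectory (list of floats here)
        self.level = level        # remaining denoising steps
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def denoise_step(plan):
    """Placeholder for one stochastic denoising step of a diffusion planner."""
    return [x + random.gauss(0, 0.1) for x in plan]

def score_plan(plan):
    """Placeholder reward: prefer plans whose states stay near a goal of 1.0."""
    return -sum((x - 1.0) ** 2 for x in plan) / len(plan)

def uct(child, parent, c=1.4):
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(math.log(parent.visits) / child.visits)

def search(root, iterations=200, branching=3):
    for _ in range(iterations):
        # Selection: descend by UCT until reaching a node with no children.
        node = root
        while node.children:
            node = max(node.children, key=lambda ch: uct(ch, node))
        # Expansion: sample a few alternative denoisings of the current partial plan.
        if node.level > 0:
            for _ in range(branching):
                node.children.append(Node(denoise_step(node.plan), node.level - 1, node))
            node = random.choice(node.children)
        # Evaluation and backpropagation of the partially denoised plan's score.
        reward = score_plan(node.plan)
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits)

root = Node(plan=[random.gauss(0, 1) for _ in range(8)], level=5)
best = search(root)
print("best child score:", score_plan(best.plan))
```

In this sketch each expansion samples a few alternative denoisings of the same partial plan, so deeper tree levels correspond to less noisy trajectories, and the UCT rule decides which partial plans receive further compute; that allocation of search effort is where the test-time-compute scaling argument enters.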
Related papers
- Trust-Region Twisted Policy Improvement [8.73717644648873]
Monte-Carlo tree search (MCTS) has driven many recent breakthroughs in deep reinforcement learning (RL).
We tailor Monte-Carlo planners specifically for RL by improving data generation within the planner through constrained action sampling and explicit terminal state handling.
This leads to our Trust-Region Twisted SMC (TRT-SMC), which shows improved runtime and sample-efficiency over baseline MCTS and SMC methods in both discrete and continuous domains.
arXiv Detail & Related papers (2025-04-08T13:47:07Z)
- Adding Additional Control to One-Step Diffusion with Joint Distribution Matching [58.37264951734603]
JDM is a novel approach that minimizes the reverse KL divergence between image-condition joint distributions.
By deriving a tractable upper bound, JDM decouples fidelity learning from condition learning.
This asymmetric distillation scheme enables our one-step student to handle controls unknown to the teacher model.
arXiv Detail & Related papers (2025-03-09T15:06:50Z)
- Towards Widening The Distillation Bottleneck for Reasoning Models [39.22557129190619]
Distillation, i.e., post-training on LRM-generated data, is a straightforward yet effective method to enhance the reasoning abilities of smaller models.
We found that distilled long CoT data poses learning difficulties for small models and leads to the inheritance of biases.
We propose constructing tree-based CoT data from scratch via Monte Carlo Tree Search.
arXiv Detail & Related papers (2025-03-03T12:17:36Z)
- T-SCEND: Test-time Scalable MCTS-enhanced Diffusion Model [7.250494262573953]
Test-time Scalable MCTS-enhanced Diffusion Model (T-SCEND) is a novel framework that significantly improves the diffusion model's reasoning capabilities.
T-SCEND integrates the denoising process with a novel hybrid Monte Carlo Tree Search.
We demonstrate the effectiveness of T-SCEND's training objective and scalable inference method.
arXiv Detail & Related papers (2025-02-04T04:07:48Z)
- Boosting MCTS with Free Energy Minimization [0.0]
We propose a new planning framework that integrates Monte Carlo Tree Search (MCTS) with active inference objectives.
MCTS can be naturally extended to incorporate free energy minimization by blending expected rewards with information gain (see the sketch after this list).
This synergy allows our planner to maintain coherent estimates of value and uncertainty throughout planning, without sacrificing computational tractability.
arXiv Detail & Related papers (2025-01-22T18:45:15Z)
- Unleashing Network Potentials for Semantic Scene Completion [50.95486458217653]
This paper proposes a novel SSC framework, the Adversarial Modality Modulation Network (AMMNet).
AMMNet introduces two core modules: a cross-modal modulation enabling the interdependence of gradient flows between modalities, and a customized adversarial training scheme leveraging dynamic gradient competition.
Extensive experimental results demonstrate that AMMNet outperforms state-of-the-art SSC methods by a large margin.
arXiv Detail & Related papers (2024-03-12T11:48:49Z)
- Learning Energy-Based Prior Model with Diffusion-Amortized MCMC [89.95629196907082]
The common practice of learning latent space EBMs with non-convergent short-run MCMC for prior and posterior sampling hinders the model from further progress.
We introduce a simple but effective diffusion-based amortization method for long-run MCMC sampling and develop a novel learning algorithm for the latent space EBM based on it.
arXiv Detail & Related papers (2023-10-05T00:23:34Z)
- Bayesian Decision Trees Inspired from Evolutionary Algorithms [64.80360020499555]
We propose replacing Markov Chain Monte Carlo (MCMC) with an inherently parallel algorithm, Sequential Monte Carlo (SMC).
Experiments show that SMC combined with Evolutionary Algorithms (EA) can produce more accurate results than MCMC in 100 times fewer iterations.
arXiv Detail & Related papers (2023-05-30T06:17:35Z)
- Continuous Monte Carlo Graph Search [61.11769232283621]
Continuous Monte Carlo Graph Search (CMCGS) is an extension of Monte Carlo Tree Search (MCTS) to online planning.
CMCGS takes advantage of the insight that, during planning, sharing the same action policy between several states can yield high performance.
It can be scaled up through parallelization, and it outperforms the Cross-Entropy Method (CEM) in continuous control with learned dynamics models.
arXiv Detail & Related papers (2022-10-04T07:34:06Z)
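As referenced in the "Boosting MCTS with Free Energy Minimization" entry above, that paper's summary mentions blending expected rewards with information gain during planning. The following is a hedged sketch of what such a blended node value could look like; the entropy-based information-gain proxy and the `beta` weight are assumptions made for illustration, not that paper's actual formulation.

```python
# Hypothetical illustration of a free-energy-style node value for MCTS:
# blend extrinsic reward with an information-gain bonus. The entropy-based
# proxy and the beta weight are illustrative assumptions, not the paper's.
import math

def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)

def blended_value(expected_reward, belief_before, belief_after, beta=0.5):
    """Value = extrinsic reward + beta * reduction in belief entropy."""
    info_gain = entropy(belief_before) - entropy(belief_after)
    return expected_reward + beta * info_gain

# Example: an action with modest reward but a large reduction in uncertainty.
prior = [0.25, 0.25, 0.25, 0.25]
posterior = [0.7, 0.1, 0.1, 0.1]
print(blended_value(expected_reward=0.3, belief_before=prior, belief_after=posterior))
```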