Related papers: Monte Carlo Tree Diffusion for System 2 Planning

Elucidated Rolling Diffusion Models for Probabilistic Weather Forecasting [52.6508222408558]
We introduce Elucidated Rolling Diffusion Models (ERDM). ERDM is the first framework to unify a rolling forecast structure with the principled, performant design of Elucidated Diffusion Models (EDM). On 2D Navier-Stokes simulations and ERA5 global weather forecasting at 1.5° resolution, ERDM consistently outperforms key diffusion-based baselines.
arXiv Detail & Related papers (2025-06-24T21:44:31Z)
Fast Monte Carlo Tree Diffusion: 100x Speedup via Parallel Sparse Planning [61.694143925237206]
Recently proposed Monte Carlo Tree Diffusion (MCTD) offers a promising solution by combining diffusion with tree-based search. Fast-MCTD integrates two techniques: Parallel MCTD, which enables parallel rollouts via delayed tree updates and redundancy-aware selection; and Sparse MCTD, which reduces rollout length through trajectory coarsening. Experiments show that Fast-MCTD achieves up to 100x speedup over standard MCTD while maintaining or improving planning performance.
arXiv Detail & Related papers (2025-06-11T08:17:40Z)
Test-Time Alignment of Discrete Diffusion Models with Sequential Monte Carlo [19.81513273510523]
We propose a training-free method based on Sequential Monte Carlo (SMC) to sample from the reward-aligned target distribution at test time. Our approach leverages twisted SMC with an approximate locally optimal proposal, obtained via a first-order Taylor expansion of the reward function. To address the challenge of ill-defined gradients in discrete spaces, we incorporate a Gumbel-Softmax relaxation, enabling efficient gradient-based approximation within the discrete generative framework.
arXiv Detail & Related papers (2025-05-28T16:12:03Z)
Trust-Region Twisted Policy Improvement [8.73717644648873]
Monte-Carlo tree search (MCTS) has driven many recent breakthroughs in deep reinforcement learning (RL). We tailor Monte-Carlo planners specifically for RL by improving data generation within the planner through constrained action sampling and explicit terminal state handling. This leads to our Trust-Region Twisted SMC (TRT-SMC), which shows improved runtime and sample-efficiency over baseline MCTS and SMC methods in both discrete and continuous domains.
arXiv Detail & Related papers (2025-04-08T13:47:07Z)
Adding Additional Control to One-Step Diffusion with Joint Distribution Matching [58.37264951734603]
JDM is a novel approach that minimizes the reverse KL divergence between image-condition joint distributions. By deriving a tractable upper bound, JDM decouples fidelity learning from condition learning. This asymmetric distillation scheme enables our one-step student to handle controls unknown to the teacher model.
arXiv Detail & Related papers (2025-03-09T15:06:50Z)
Towards Widening The Distillation Bottleneck for Reasoning Models [39.22557129190619]
Distillation--post-training on LRMs-generated data--is a straightforward yet effective method to enhance the reasoning abilities of smaller models. We found that distilled long CoT data poses learning difficulty for small models and leads to the inheritance of biases. We propose constructing tree-based CoT data from scratch via Monte Carlo Tree Search.
arXiv Detail & Related papers (2025-03-03T12:17:36Z)
DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs [70.91804882618243]
This paper proposes DSMoE, a novel approach that achieves sparsification by partitioning pre-trained FFN layers into computational blocks. We implement adaptive expert routing using sigmoid activation and straight-through estimators, enabling tokens to flexibly access different aspects of model knowledge. Experiments on LLaMA models demonstrate that under equivalent computational constraints, DSMoE achieves superior performance compared to existing pruning and MoE approaches.
arXiv Detail & Related papers (2025-02-18T02:37:26Z)
T-SCEND: Test-time Scalable MCTS-enhanced Diffusion Model [7.250494262573953]
Test-time Scalable MCTS-enhanced Diffusion Model (T-SCEND) is a novel framework that significantly improves diffusion models' reasoning capabilities. T-SCEND integrates the denoising process with a novel hybrid Monte Carlo Tree Search. We demonstrate the effectiveness of T-SCEND's training objective and scalable inference method.
arXiv Detail & Related papers (2025-02-04T04:07:48Z)
Boosting MCTS with Free Energy Minimization [0.0]
We propose a new planning framework that integrates Monte Carlo Tree Search (MCTS) with active inference objectives. MCTS can be naturally extended to incorporate free energy minimization by blending expected rewards with information gain. This synergy allows our planner to maintain coherent estimates of value and uncertainty throughout planning, without sacrificing computational tractability.
arXiv Detail & Related papers (2025-01-22T18:45:15Z)
Monte Carlo Planning for Stochastic Control on Constrained Markov Decision Processes [1.445706856497821]
This work defines an MDP framework, the SD-MDP, where we disentangle the causal structure of MDPs' transition and reward dynamics. We derive theoretical guarantees on the estimation error of the value function under an optimal policy by allowing independent value estimation from Monte Carlo sampling.
arXiv Detail & Related papers (2024-06-23T16:22:40Z)
Unleashing Network Potentials for Semantic Scene Completion [50.95486458217653]
This paper proposes a novel SSC framework, the Adversarial Modality Modulation Network (AMMNet). AMMNet introduces two core modules: a cross-modal modulation enabling the interdependence of gradient flows between modalities, and a customized adversarial training scheme leveraging dynamic gradient competition. Extensive experimental results demonstrate that AMMNet outperforms state-of-the-art SSC methods by a large margin.
arXiv Detail & Related papers (2024-03-12T11:48:49Z)
Spatial Attention-based Distribution Integration Network for Human Pose Estimation [0.8052382324386398]
We present the Spatial Attention-based Distribution Integration Network (SADI-NET) to improve the accuracy of localization. Our network consists of three efficient models: the receptive fortified module (RFM), spatial fusion module (SFM), and distribution learning module (DLM). Our model obtained a remarkable 92.10% accuracy on the MPII test dataset, demonstrating significant improvements over existing models and establishing state-of-the-art performance.
arXiv Detail & Related papers (2023-11-09T12:43:01Z)
Learning Energy-Based Prior Model with Diffusion-Amortized MCMC [89.95629196907082]
Common practice of learning latent space EBMs with non-convergent short-run MCMC for prior and posterior sampling is hindering the model from further progress. We introduce a simple but effective diffusion-based amortization method for long-run MCMC sampling and develop a novel learning algorithm for the latent space EBM based on it.
arXiv Detail & Related papers (2023-10-05T00:23:34Z)
Bayesian Decision Trees Inspired from Evolutionary Algorithms [64.80360020499555]
We propose a replacement of the Markov Chain Monte Carlo (MCMC) with an inherently parallel algorithm, the Sequential Monte Carlo (SMC). Experiments show that SMC combined with the Evolutionary Algorithms (EA) can produce more accurate results compared to MCMC in 100 times fewer iterations.
arXiv Detail & Related papers (2023-05-30T06:17:35Z)
When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL). Our follow-up derived bounds reveal the relationship between model shifts and performance improvement. A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
Continuous Monte Carlo Graph Search [61.11769232283621]
Continuous Monte Carlo Graph Search (CMCGS) is an extension of Monte Carlo Tree Search (MCTS) to online planning. CMCGS takes advantage of the insight that, during planning, sharing the same action policy between several states can yield high performance. It can be scaled up through parallelization, and it outperforms the Cross-Entropy Method (CEM) in continuous control with learned dynamics models.
arXiv Detail & Related papers (2022-10-04T07:34:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.