Doubly Robust Monte Carlo Tree Search
- URL: http://arxiv.org/abs/2502.01672v1
- Date: Sat, 01 Feb 2025 19:32:46 GMT
- Title: Doubly Robust Monte Carlo Tree Search
- Authors: Manqing Liu, Andrew L. Beam
- Abstract summary: We present Doubly Robust Monte Carlo Tree Search (DR-MCTS), a novel algorithm that integrates Doubly Robust (DR) off-policy estimation into Monte Carlo Tree Search (MCTS).
Our approach combines MCTS rollouts with DR estimation, offering theoretical guarantees of unbiasedness and variance reduction under specified conditions.
Empirical evaluations in Tic-Tac-Toe and the partially observable VirtualHome environment demonstrate DR-MCTS's superior performance over standard MCTS.
- Abstract: We present Doubly Robust Monte Carlo Tree Search (DR-MCTS), a novel algorithm that integrates Doubly Robust (DR) off-policy estimation into Monte Carlo Tree Search (MCTS) to enhance sample efficiency and decision quality in complex environments. Our approach introduces a hybrid estimator that combines MCTS rollouts with DR estimation, offering theoretical guarantees of unbiasedness and variance reduction under specified conditions. Empirical evaluations in Tic-Tac-Toe and the partially observable VirtualHome environment demonstrate DR-MCTS's superior performance over standard MCTS. In Tic-Tac-Toe, DR-MCTS achieves an 88% win rate compared to a 10% win rate for standard MCTS. In compound VirtualHome tasks, DR-MCTS attains a 20.7% success rate versus 10.3% for standard MCTS. Our scaling analysis reveals that DR-MCTS exhibits better sample efficiency, notably outperforming standard MCTS with larger language models while using a smaller model. These results underscore DR-MCTS's potential for efficient decision-making in complex, real-world scenarios where sample efficiency is paramount.
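The abstract describes a hybrid estimator that combines MCTS rollouts with doubly robust (DR) off-policy estimation. As a rough illustration of the DR idea only (not the paper's exact hybrid rule; all function names here are illustrative), the standard per-trajectory DR value estimate recurses backwards through behavior-policy data, combining a learned value model with an importance-weighted correction:

```python
def dr_value_estimate(transitions, q_hat, v_hat, pi_target, pi_behavior, gamma=1.0):
    """One-step doubly robust value estimate for a single trajectory.

    transitions: list of (state, action, reward) tuples collected under the
        behavior policy.
    q_hat / v_hat: approximate action-value and state-value models (callables);
        if they are accurate, the correction term shrinks and variance drops.
    pi_target / pi_behavior: callables returning action probabilities, used to
        form the per-step importance ratio.
    """
    # Backward recursion:
    #   V_DR(t) = v_hat(s_t) + rho_t * (r_t + gamma * V_DR(t+1) - q_hat(s_t, a_t))
    v_dr = 0.0
    for (s, a, r) in reversed(transitions):
        rho = pi_target(s, a) / pi_behavior(s, a)  # importance ratio
        v_dr = v_hat(s) + rho * (r + gamma * v_dr - q_hat(s, a))
    return v_dr
```

With a zero value model and matching policies (ratio 1), the estimate reduces to the plain Monte Carlo return, which is one way to see the estimator's unbiasedness.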
Related papers
- Monte Carlo Tree Diffusion for System 2 Planning [57.50512800900167]
We introduce Monte Carlo Tree Diffusion (MCTD), a novel framework that integrates the generative strength of diffusion models with the adaptive search capabilities of Monte Carlo Tree Search (MCTS)
MCTD achieves the benefits of MCTS such as controlling exploration-exploitation trade-offs within the diffusion framework.
arXiv Detail & Related papers (2025-02-11T02:51:42Z)
- Lipschitz Lifelong Monte Carlo Tree Search for Mastering Non-Stationary Tasks [19.42056439537988]
This paper presents LiZero for Lipschitz lifelong planning using Monte Carlo Tree Search (MCTS)
We propose a novel concept of adaptive UCT (aUCT) to transfer knowledge from a source task to the exploration/exploitation of a new task.
Experiment results show that LiZero significantly outperforms existing MCTS and lifelong learning baselines in terms of much faster convergence to optimal rewards.
arXiv Detail & Related papers (2025-02-02T02:45:20Z)
- Rethinking the "Heatmap + Monte Carlo Tree Search" Paradigm for Solving Large Scale TSP [11.388824026057735]
This paper revisits the "heatmap + Monte Carlo Tree Search (MCTS)" paradigm that has recently gained traction for learning-based solutions.
Our findings demonstrate that a rudimentary and parameter-free heatmap, derived from the intrinsic $k$-nearest nature of the Travelling Salesman Problem, can rival or even surpass the performance of complicated heatmaps.
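The "parameter-free heatmap derived from the intrinsic $k$-nearest nature" of TSP can be sketched roughly as follows; this is a guess at the construction from the summary alone (the paper's exact weighting may differ), giving each edge from a city to one of its k nearest neighbors equal mass:

```python
import math

def knn_heatmap(coords, k=5):
    """Illustrative parameter-free TSP heatmap: edge (i, j) gets weight 1/k
    if j is among i's k nearest neighbors by Euclidean distance, else 0."""
    n = len(coords)
    heat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Sort all other cities by distance to city i.
        dists = sorted(
            (math.dist(coords[i], coords[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            heat[i][j] = 1.0 / k
    return heat
```

Each row sums to 1, so the heatmap can be read as a per-city proposal distribution for the MCTS expansion step.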
arXiv Detail & Related papers (2024-11-14T07:13:08Z)
- MITA: Bridging the Gap between Model and Data for Test-time Adaptation [68.62509948690698]
Test-Time Adaptation (TTA) has emerged as a promising paradigm for enhancing the generalizability of models.
We propose Meet-In-The-Middle based MITA, which introduces energy-based optimization to encourage mutual adaptation of the model and data from opposing directions.
arXiv Detail & Related papers (2024-10-12T07:02:33Z)
- Discovering Mathematical Formulas from Data via GPT-guided Monte Carlo Tree Search [13.136507215114722]
We introduce SR-GPT, a novel algorithm for symbolic regression.
It integrates Monte Carlo Tree Search (MCTS) with a Generative Pre-Trained Transformer (GPT)
arXiv Detail & Related papers (2024-01-24T07:47:04Z)
- Monte Carlo Tree Search in the Presence of Transition Uncertainty [33.40823938089618]
We show that the discrepancy between the model and the actual environment can lead to significant performance degradation with standard MCTS.
We develop Uncertainty Adapted MCTS (UA-MCTS), a more robust algorithm within the MCTS framework.
We prove, in the corrupted bandit case, that adding uncertainty information to adapt UCB leads to tighter regret bound than standard UCB.
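One simple way to fold uncertainty information into UCB-style selection is to scale down the exploration bonus for actions whose transition model is unreliable. This sketch is only illustrative of that general idea; the paper's actual aUCT/UA-MCTS adaptation rule is not specified in the summary and may differ:

```python
import math

def uncertainty_adapted_ucb(value_sum, visit_count, parent_visits,
                            uncertainty, c=1.4):
    """UCB1-style score with an uncertainty-scaled exploration term.

    `uncertainty` in [0, 1] is a per-action estimate of transition-model
    error; 0 recovers plain UCB1, 1 removes the exploration bonus entirely.
    """
    if visit_count == 0:
        return float("inf")  # unvisited actions are tried first
    exploit = value_sum / visit_count
    explore = c * math.sqrt(math.log(parent_visits) / visit_count)
    return exploit + (1.0 - uncertainty) * explore
```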
arXiv Detail & Related papers (2023-12-18T17:02:27Z)
- Low-variance estimation in the Plackett-Luce model via quasi-Monte Carlo sampling [58.14878401145309]
We develop a novel approach to producing more sample-efficient estimators of expectations in the PL model.
We illustrate our findings both theoretically and empirically using real-world recommendation data from Amazon Music and the Yahoo learning-to-rank challenge.
arXiv Detail & Related papers (2022-05-12T11:15:47Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Monte Carlo Tree Search for high precision manufacturing [55.60116686945561]
Common reasons for this are that there is no efficient simulator of the process available or there exist problems in applying MCTS to the complex rules of the process.
We make use of an expert-based simulator and adapt the MCTS default policy to deal with the manufacturing process.
arXiv Detail & Related papers (2021-07-28T14:56:17Z)
- Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription [73.66530509749305]
In this paper, we argue that, even in difficult cases, some end-to-end approaches show performance close to the hybrid baseline.
We experimentally compare and analyze CTC-Attention versus RNN-Transducer approaches along with RNN versus Transformer architectures.
Our best end-to-end model, based on the RNN-Transducer together with improved beam search, reaches a quality only 3.8% WER (absolute) worse than the LF-MMI TDNN-F CHiME-6 Challenge baseline.
arXiv Detail & Related papers (2020-04-22T19:08:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.