Doubly Robust Monte Carlo Tree Search
- URL: http://arxiv.org/abs/2502.01672v1
- Date: Sat, 01 Feb 2025 19:32:46 GMT
- Title: Doubly Robust Monte Carlo Tree Search
- Authors: Manqing Liu, Andrew L. Beam
- Abstract summary: We present Doubly Robust Monte Carlo Tree Search (DR-MCTS), a novel algorithm that integrates Doubly Robust (DR) off-policy estimation into Monte Carlo Tree Search (MCTS).
Our approach combines MCTS rollouts with DR estimation, offering theoretical guarantees of unbiasedness and variance reduction under specified conditions.
Empirical evaluations in Tic-Tac-Toe and the partially observable VirtualHome environment demonstrate DR-MCTS's superior performance over standard MCTS.
- Abstract: We present Doubly Robust Monte Carlo Tree Search (DR-MCTS), a novel algorithm that integrates Doubly Robust (DR) off-policy estimation into Monte Carlo Tree Search (MCTS) to enhance sample efficiency and decision quality in complex environments. Our approach introduces a hybrid estimator that combines MCTS rollouts with DR estimation, offering theoretical guarantees of unbiasedness and variance reduction under specified conditions. Empirical evaluations in Tic-Tac-Toe and the partially observable VirtualHome environment demonstrate DR-MCTS's superior performance over standard MCTS. In Tic-Tac-Toe, DR-MCTS achieves an 88% win rate compared to a 10% win rate for standard MCTS. In compound VirtualHome tasks, DR-MCTS attains a 20.7% success rate versus 10.3% for standard MCTS. Our scaling analysis reveals that DR-MCTS exhibits better sample efficiency, notably outperforming standard MCTS with larger language models while using a smaller model. These results underscore DR-MCTS's potential for efficient decision-making in complex, real-world scenarios where sample efficiency is paramount.
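The abstract describes a hybrid estimator that combines MCTS rollouts with doubly robust (DR) off-policy estimation. As a rough illustration of the DR idea only (not the paper's exact hybrid rule; all function names here are illustrative), the standard per-trajectory DR value estimate recurses backwards through behavior-policy data, combining a learned value model with an importance-weighted correction:

```python
def dr_value_estimate(transitions, q_hat, v_hat, pi_target, pi_behavior, gamma=1.0):
    """One-step doubly robust value estimate for a single trajectory.

    transitions: list of (state, action, reward) tuples collected under the
        behavior policy.
    q_hat / v_hat: approximate action-value and state-value models (callables);
        if they are accurate, the correction term shrinks and variance drops.
    pi_target / pi_behavior: callables returning action probabilities, used to
        form the per-step importance ratio.
    """
    # Backward recursion:
    #   V_DR(t) = v_hat(s_t) + rho_t * (r_t + gamma * V_DR(t+1) - q_hat(s_t, a_t))
    v_dr = 0.0
    for (s, a, r) in reversed(transitions):
        rho = pi_target(s, a) / pi_behavior(s, a)  # importance ratio
        v_dr = v_hat(s) + rho * (r + gamma * v_dr - q_hat(s, a))
    return v_dr
```

With a zero value model and matching policies (ratio 1), the estimate reduces to the plain Monte Carlo return, which is one way to see the estimator's unbiasedness.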
Related papers
- Monte Carlo Tree Diffusion for System 2 Planning [57.50512800900167]
We introduce Monte Carlo Tree Diffusion (MCTD), a novel framework that integrates the generative strength of diffusion models with the adaptive search capabilities of Monte Carlo Tree Search (MCTS)
MCTD achieves the benefits of MCTS such as controlling exploration-exploitation trade-offs within the diffusion framework.
arXiv Detail & Related papers (2025-02-11T02:51:42Z)
- Lipschitz Lifelong Monte Carlo Tree Search for Mastering Non-Stationary Tasks [19.42056439537988]
This paper presents LiZero for Lipschitz lifelong planning using Monte Carlo Tree Search (MCTS)
We propose a novel concept of adaptive UCT (aUCT) to transfer knowledge from a source task to the exploration/exploitation of a new task.
Experiment results show that LiZero significantly outperforms existing MCTS and lifelong learning baselines in terms of much faster convergence to optimal rewards.
arXiv Detail & Related papers (2025-02-02T02:45:20Z)
- Rethinking the "Heatmap + Monte Carlo Tree Search" Paradigm for Solving Large Scale TSP [11.388824026057735]
This paper revisits the "heatmap + Monte Carlo Tree Search (MCTS)" paradigm that has recently gained traction for learning-based solutions.
Our findings demonstrate that a rudimentary and parameter-free heatmap, derived from the intrinsic $k$-nearest nature of the Travelling Salesman Problem, can rival or even surpass the performance of complicated heatmaps.
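The "parameter-free heatmap derived from the intrinsic $k$-nearest nature" of TSP can be sketched roughly as follows; this is a guess at the construction from the summary alone (the paper's exact weighting may differ), giving each edge from a city to one of its k nearest neighbors equal mass:

```python
import math

def knn_heatmap(coords, k=5):
    """Illustrative parameter-free TSP heatmap: edge (i, j) gets weight 1/k
    if j is among i's k nearest neighbors by Euclidean distance, else 0."""
    n = len(coords)
    heat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Sort all other cities by distance to city i.
        dists = sorted(
            (math.dist(coords[i], coords[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            heat[i][j] = 1.0 / k
    return heat
```

Each row sums to 1, so the heatmap can be read as a per-city proposal distribution for the MCTS expansion step.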
arXiv Detail & Related papers (2024-11-14T07:13:08Z)
- MITA: Bridging the Gap between Model and Data for Test-time Adaptation [68.62509948690698]
Test-Time Adaptation (TTA) has emerged as a promising paradigm for enhancing the generalizability of models.
We propose Meet-In-The-Middle based MITA, which introduces energy-based optimization to encourage mutual adaptation of the model and data from opposing directions.
arXiv Detail & Related papers (2024-10-12T07:02:33Z)
- Discovering Mathematical Formulas from Data via GPT-guided Monte Carlo Tree Search [13.136507215114722]
We introduce SR-GPT, a novel algorithm for symbolic regression.
It integrates Monte Carlo Tree Search (MCTS) with a Generative Pre-Trained Transformer (GPT)
arXiv Detail & Related papers (2024-01-24T07:47:04Z)
- Monte Carlo Tree Search in the Presence of Transition Uncertainty [33.40823938089618]
We show that the discrepancy between the model and the actual environment can lead to significant performance degradation with standard MCTS.
We develop Uncertainty Adapted MCTS (UA-MCTS), a more robust algorithm within the MCTS framework.
We prove, in the corrupted bandit case, that adding uncertainty information to adapt UCB leads to tighter regret bound than standard UCB.
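One simple way to fold uncertainty information into UCB-style selection is to scale down the exploration bonus for actions whose transition model is unreliable. This sketch is only illustrative of that general idea; the paper's actual aUCT/UA-MCTS adaptation rule is not specified in the summary and may differ:

```python
import math

def uncertainty_adapted_ucb(value_sum, visit_count, parent_visits,
                            uncertainty, c=1.4):
    """UCB1-style score with an uncertainty-scaled exploration term.

    `uncertainty` in [0, 1] is a per-action estimate of transition-model
    error; 0 recovers plain UCB1, 1 removes the exploration bonus entirely.
    """
    if visit_count == 0:
        return float("inf")  # unvisited actions are tried first
    exploit = value_sum / visit_count
    explore = c * math.sqrt(math.log(parent_visits) / visit_count)
    return exploit + (1.0 - uncertainty) * explore
```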
arXiv Detail & Related papers (2023-12-18T17:02:27Z)
- Low-variance estimation in the Plackett-Luce model via quasi-Monte Carlo sampling [58.14878401145309]
We develop a novel approach to producing more sample-efficient estimators of expectations in the PL model.
We illustrate our findings both theoretically and empirically using real-world recommendation data from Amazon Music and the Yahoo learning-to-rank challenge.
arXiv Detail & Related papers (2022-05-12T11:15:47Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Monte Carlo Tree Search for high precision manufacturing [55.60116686945561]
Common reasons for this are that there is no efficient simulator of the process available or there exist problems in applying MCTS to the complex rules of the process.
We make use of an expert-based simulator and adapt the MCTS default policy to deal with the manufacturing process.
arXiv Detail & Related papers (2021-07-28T14:56:17Z)
- Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription [73.66530509749305]
In this paper, we argue that, even in difficult cases, some end-to-end approaches show performance close to the hybrid baseline.
We experimentally compare and analyze CTC-Attention versus RNN-Transducer approaches along with RNN versus Transformer architectures.
Our best end-to-end model, based on the RNN-Transducer together with improved beam search, reaches a quality only 3.8% WER (absolute) worse than the LF-MMI TDNN-F CHiME-6 Challenge baseline.
arXiv Detail & Related papers (2020-04-22T19:08:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.