Related papers: Monte Carlo Tree Search in the Presence of Transition Uncertainty

URL: http://arxiv.org/abs/2312.11348v1
Date: Mon, 18 Dec 2023 17:02:27 GMT
Title: Monte Carlo Tree Search in the Presence of Transition Uncertainty
Authors: Farnaz Kohankhaki, Kiarash Aghakasiri, Hongming Zhang, Ting-Han Wei, Chao Gao, Martin Müller
Abstract summary: We show that the discrepancy between the model and the actual environment can lead to significant performance degradation with standard MCTS. We develop Uncertainty Adapted MCTS (UA-MCTS), a more robust algorithm within the MCTS framework. We prove, in the corrupted bandit case, that adding uncertainty information to adapt UCB leads to a tighter regret bound than standard UCB.
Score: 33.40823938089618
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Monte Carlo Tree Search (MCTS) is an immensely popular search-based framework used for decision making. It is traditionally applied to domains where a perfect simulation model of the environment is available. We study and improve MCTS in the context where the environment model is given but imperfect. We show that the discrepancy between the model and the actual environment can lead to significant performance degradation with standard MCTS. We therefore develop Uncertainty Adapted MCTS (UA-MCTS), a more robust algorithm within the MCTS framework. We estimate the transition uncertainty in the given model, and direct the search towards more certain transitions in the state space. We modify all four MCTS phases to improve the search behavior by considering these estimates. We prove, in the corrupted bandit case, that adding uncertainty information to adapt UCB leads to a tighter regret bound than standard UCB. Empirically, we evaluate UA-MCTS and its individual components on the deterministic domains from the MinAtar test suite. Our results demonstrate that UA-MCTS strongly improves MCTS in the presence of model transition errors.

Related papers

Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness [61.87055159919641]
Multi-modal semantic segmentation (MMSS) addresses the limitations of single-modality data by integrating complementary information across modalities. Despite notable progress, a significant gap persists between research and real-world deployment due to variability and uncertainty in multi-modal data quality. We introduce a robustness benchmark that evaluates MMSS models under three scenarios: Entire-Missing Modality (EMM), Random-Missing Modality (RMM), and Noisy Modality (NM).
arXiv Detail & Related papers (2025-03-24T08:46:52Z)
Reward-Centered ReST-MCTS: A Robust Decision-Making Framework for Robotic Manipulation in High Uncertainty Environments [0.0]
This paper introduces Reward-Centered ReST-MCTS, a novel framework that enhances Monte Carlo Tree Search. The core of our approach is the Rewarding Center, which refines search trajectories by dynamically assigning partial rewards. Compared to baseline methods, our framework achieves a 2-4% accuracy improvement while maintaining computational feasibility.
arXiv Detail & Related papers (2025-03-07T08:25:04Z)
Monte Carlo Tree Diffusion for System 2 Planning [57.50512800900167]
We introduce Monte Carlo Tree Diffusion (MCTD), a novel framework that integrates the generative strength of diffusion models with the adaptive search capabilities of Monte Carlo Tree Search (MCTS). MCTD achieves the benefits of MCTS such as controlling exploration-exploitation trade-offs within the diffusion framework.
arXiv Detail & Related papers (2025-02-11T02:51:42Z)
Doubly Robust Monte Carlo Tree Search [0.0]
We present Doubly Robust Monte Carlo Tree Search (DR-MCTS), a novel algorithm that integrates Doubly Robust (DR) off-policy estimation into Monte Carlo Tree Search (MCTS). Our approach combines MCTS rollouts with DR estimation, offering theoretical guarantees of unbiasedness and variance reduction under specified conditions. Empirical evaluations in Tic-Tac-Toe and the partially observable VirtualHome environment demonstrate DR-MCTS's superior performance over standard MCTS.
arXiv Detail & Related papers (2025-02-01T19:32:46Z)
Diagnosing and Rectifying Fake OOD Invariance: A Restructured Causal Approach [51.012396632595554]
Invariant representation learning (IRL) encourages the prediction from invariant causal features to labels de-confounded from the environments. Recent theoretical results verified that some causal features recovered by IRLs merely pretend domain-invariantly in the training environments but fail in unseen domains. We develop an approach based on conditional mutual information with respect to RS-SCM, then rigorously rectify the spurious and fake invariant effects.
arXiv Detail & Related papers (2023-12-15T12:58:05Z)
Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis [70.78170766633039]
We address the need for means of assessing MTS forecasting proposals reliably and fairly. BasicTS+ is a benchmark designed to enable fair, comprehensive, and reproducible comparison of MTS forecasting solutions. We apply BasicTS+ along with rich datasets to assess the capabilities of more than 45 MTS forecasting solutions.
arXiv Detail & Related papers (2023-10-09T19:52:22Z)
Towards Real-World Test-Time Adaptation: Tri-Net Self-Training with Balanced Normalization [52.03927261909813]
Existing works mainly consider real-world test-time adaptation under non-i.i.d. data stream and continual domain shift. We argue failure of state-of-the-art methods is first caused by indiscriminately adapting normalization layers to imbalanced testing data. The final TTA model, termed as TRIBE, is built upon a tri-net architecture with balanced batchnorm layers.
arXiv Detail & Related papers (2023-09-26T14:06:26Z)
On Pitfalls of Test-Time Adaptation [82.8392232222119]
Test-Time Adaptation (TTA) has emerged as a promising approach for tackling the robustness challenge under distribution shifts. We present TTAB, a test-time adaptation benchmark that encompasses ten state-of-the-art algorithms, a diverse array of distribution shifts, and two evaluation protocols.
arXiv Detail & Related papers (2023-06-06T09:35:29Z)
Continuous Monte Carlo Graph Search [61.11769232283621]
Continuous Monte Carlo Graph Search (CMCGS) is an extension of Monte Carlo Tree Search (MCTS) to online planning. CMCGS takes advantage of the insight that, during planning, sharing the same action policy between several states can yield high performance. It can be scaled up through parallelization, and it outperforms the Cross-Entropy Method (CEM) in continuous control with learned dynamics models.
arXiv Detail & Related papers (2022-10-04T07:34:06Z)
Evolving the MCTS Upper Confidence Bounds for Trees Using a Semantic-inspired Evolutionary Algorithm in the Game of Carcassonne [0.0]
We propose a Semantic-inspired Evolutionary Algorithm in Monte Carlo Tree Search (MCTS). We use Evolutionary Algorithms (EAs) to evolve mathematical expressions with the goal of replacing the Upper Confidence Bounds for Trees formula. We show how SIEA-MCTS is able to successfully evolve mathematical expressions that yield better or competitive results compared to UCT without the need to tune these evolved expressions.
arXiv Detail & Related papers (2022-08-29T13:31:06Z)
Decision Making in Non-Stationary Environments with Policy-Augmented Monte Carlo Tree Search [2.20439695290991]
Decision-making under uncertainty (DMU) is present in many important problems. An open challenge is DMU in non-stationary environments, where the dynamics of the environment can change over time. We present a novel hybrid decision-making approach that combines the strengths of RL and planning while mitigating their weaknesses.
arXiv Detail & Related papers (2022-02-25T22:31:37Z)
MCTSteg: A Monte Carlo Tree Search-based Reinforcement Learning Framework for Universal Non-additive Steganography [40.622844703837046]
We propose an automatic non-additive steganographic distortion learning framework called MCTSteg. Due to its self-learning characteristic and domain-independent reward function, MCTSteg has become the first reported universal non-additive steganographic framework.
arXiv Detail & Related papers (2021-03-25T09:12:08Z)
Pairwise Covariates-adjusted Block Model for Community Detection [9.423321226644891]
Community detection is one of the most fundamental problems in network study. We introduce a pairwise covariates-adjusted block model (PCABM). We show that PCABM is consistent under suitable sparsity conditions.
arXiv Detail & Related papers (2018-07-10T03:37:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.