Monte Carlo Tree Search in the Presence of Transition Uncertainty
- URL: http://arxiv.org/abs/2312.11348v1
- Date: Mon, 18 Dec 2023 17:02:27 GMT
- Title: Monte Carlo Tree Search in the Presence of Transition Uncertainty
- Authors: Farnaz Kohankhaki, Kiarash Aghakasiri, Hongming Zhang, Ting-Han Wei,
Chao Gao, Martin Müller
- Abstract summary: We show that the discrepancy between the model and the actual environment can lead to significant performance degradation with standard MCTS.
We develop Uncertainty Adapted MCTS (UA-MCTS), a more robust algorithm within the MCTS framework.
We prove, in the corrupted bandit case, that adding uncertainty information to adapt UCB leads to a tighter regret bound than standard UCB.
- Score: 33.40823938089618
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monte Carlo Tree Search (MCTS) is an immensely popular search-based framework
used for decision making. It is traditionally applied to domains where a
perfect simulation model of the environment is available. We study and improve
MCTS in the context where the environment model is given but imperfect. We show
that the discrepancy between the model and the actual environment can lead to
significant performance degradation with standard MCTS. We therefore develop
Uncertainty Adapted MCTS (UA-MCTS), a more robust algorithm within the MCTS
framework. We estimate the transition uncertainty in the given model, and
direct the search towards more certain transitions in the state space. We
modify all four MCTS phases to improve the search behavior by considering these
estimates. We prove, in the corrupted bandit case, that adding uncertainty
information to adapt UCB leads to a tighter regret bound than standard UCB.
Empirically, we evaluate UA-MCTS and its individual components on the
deterministic domains from the MinAtar test suite. Our results demonstrate that
UA-MCTS strongly improves MCTS in the presence of model transition errors.
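The idea of steering selection toward more certain transitions can be illustrated with a small sketch. The code below is not the UA-MCTS formula from the paper, which this summary does not reproduce. It assumes a hypothetical per-child uncertainty estimate in [0, 1] and shrinks the standard UCB1 exploration bonus by (1 - uncertainty). The function names, the tuple layout, and the scaling rule are all assumptions made for illustration.

```python
import math


def ucb1(mean_value, visits, parent_visits, c=math.sqrt(2)):
    """Standard UCB1 score for a child with visits >= 1."""
    return mean_value + c * math.sqrt(math.log(parent_visits) / visits)


def adapted_ucb(mean_value, visits, parent_visits, uncertainty, c=math.sqrt(2)):
    """Hypothetical uncertainty-adapted score (an assumption, not the paper's rule).

    `uncertainty` is an assumed estimate in [0, 1] of how unreliable the model's
    transition into this child is. The exploration bonus is scaled by
    (1 - uncertainty), so uncertain children are explored less eagerly.
    """
    if not 0.0 <= uncertainty <= 1.0:
        raise ValueError("uncertainty must lie in [0, 1]")
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return mean_value + (1.0 - uncertainty) * bonus


def select_child(children, parent_visits):
    """Return the index of the child with the highest adapted score.

    Each child is a (mean_value, visits, uncertainty) tuple with visits >= 1.
    """
    return max(
        range(len(children)),
        key=lambda i: adapted_ucb(
            children[i][0], children[i][1], parent_visits, children[i][2]
        ),
    )


if __name__ == "__main__":
    # Two children with identical statistics; only the uncertainty differs.
    children = [(0.5, 10, 0.8), (0.5, 10, 0.0)]
    print(select_child(children, parent_visits=20))
```

With equal value estimates and visit counts, the child reached through the more certain transition receives the higher score, so the search is steered toward it. With uncertainty 0 the score reduces to plain UCB1.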
Related papers
- Diagnosing and Rectifying Fake OOD Invariance: A Restructured Causal Approach [51.012396632595554]
Invariant representation learning (IRL) encourages the prediction from invariant causal features to labels de-confounded from the environments.
Recent theoretical results verified that some causal features recovered by IRLs merely pretend domain-invariantly in the training environments but fail in unseen domains.
We develop an approach based on conditional mutual information with respect to RS-SCM, then rigorously rectify the spurious and fake invariant effects.
arXiv Detail & Related papers (2023-12-15T12:58:05Z)
- Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis [70.78170766633039]
We address the need for means of assessing MTS forecasting proposals reliably and fairly.
BasicTS+ is a benchmark designed to enable fair, comprehensive, and reproducible comparison of MTS forecasting solutions.
We apply BasicTS+ along with rich datasets to assess the capabilities of more than 45 MTS forecasting solutions.
arXiv Detail & Related papers (2023-10-09T19:52:22Z)
- Towards Real-World Test-Time Adaptation: Tri-Net Self-Training with Balanced Normalization [52.03927261909813]
Existing works mainly consider real-world test-time adaptation under non-i.i.d. data stream and continual domain shift.
We argue failure of state-of-the-art methods is first caused by indiscriminately adapting normalization layers to imbalanced testing data.
The final TTA model, termed as TRIBE, is built upon a tri-net architecture with balanced batchnorm layers.
arXiv Detail & Related papers (2023-09-26T14:06:26Z)
- On Pitfalls of Test-Time Adaptation [82.8392232222119]
Test-Time Adaptation (TTA) has emerged as a promising approach for tackling the robustness challenge under distribution shifts.
We present TTAB, a test-time adaptation benchmark that encompasses ten state-of-the-art algorithms, a diverse array of distribution shifts, and two evaluation protocols.
arXiv Detail & Related papers (2023-06-06T09:35:29Z)
- Continuous Monte Carlo Graph Search [61.11769232283621]
Continuous Monte Carlo Graph Search (CMCGS) is an extension of Monte Carlo Tree Search (MCTS) to online planning.
CMCGS takes advantage of the insight that, during planning, sharing the same action policy between several states can yield high performance.
It can be scaled up through parallelization, and it outperforms the Cross-Entropy Method (CEM) in continuous control with learned dynamics models.
arXiv Detail & Related papers (2022-10-04T07:34:06Z)
- Evolving the MCTS Upper Confidence Bounds for Trees Using a Semantic-inspired Evolutionary Algorithm in the Game of Carcassonne [0.0]
We propose a Semantic-inspired Evolutionary Algorithm in Monte Carlo Tree Search (MCTS).
We use Evolutionary Algorithms (EAs) to evolve mathematical expressions with the goal to substitute the Upper Confidence Bounds for Trees formula.
We show how SIEA-MCTS is able to successfully evolve mathematical expressions that yield better or competitive results compared to UCT without the need of tuning these evolved expressions.
arXiv Detail & Related papers (2022-08-29T13:31:06Z)
- Decision Making in Non-Stationary Environments with Policy-Augmented Monte Carlo Tree Search [2.20439695290991]
Decision-making under uncertainty (DMU) is present in many important problems.
An open challenge is DMU in non-stationary environments, where the dynamics of the environment can change over time.
We present a novel hybrid decision-making approach that combines the strengths of RL and planning while mitigating their weaknesses.
arXiv Detail & Related papers (2022-02-25T22:31:37Z)
- MCTSteg: A Monte Carlo Tree Search-based Reinforcement Learning Framework for Universal Non-additive Steganography [40.622844703837046]
We propose an automatic non-additive steganographic distortion learning framework called MCTSteg.
Due to its self-learning characteristic and domain-independent reward function, MCTSteg has become the first reported universal non-additive steganographic framework.
arXiv Detail & Related papers (2021-03-25T09:12:08Z)
- Pairwise Covariates-adjusted Block Model for Community Detection [9.423321226644891]
Community detection is one of the most fundamental problems in network study.
We introduce a pairwise co-adjusted generalization block model (PCABM).
We show that PCABM is consistent under suitable sparsity conditions.
arXiv Detail & Related papers (2018-07-10T03:37:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.