Towards Understanding the Effects of Evolving the MCTS UCT Selection
Policy
- URL: http://arxiv.org/abs/2302.03352v1
- Date: Tue, 7 Feb 2023 09:50:55 GMT
- Title: Towards Understanding the Effects of Evolving the MCTS UCT Selection
Policy
- Authors: Fred Valdez Ameneyro and Edgar Galvan
- Abstract summary: Upper Confidence Bounds for Trees (UCT) is widely used in Monte Carlo Tree Search (MCTS)
We show how the evolution of the UCT might be beneficial in multimodal and deceptive scenarios.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monte Carlo Tree Search (MCTS) is a sampling best-first method to search for
optimal decisions. The success of MCTS depends heavily on how the MCTS
statistical tree is built and the selection policy plays a fundamental role in
this. A particular selection policy that works particularly well, widely
adopted in MCTS, is the Upper Confidence Bounds for Trees, referred to as UCT.
Other more sophisticated bounds have been proposed by the community with the
goal to improve MCTS performance on particular problems. Thus, it is evident
that while the MCTS UCT behaves generally well, some variants might behave
better. As a result of this, multiple works have been proposed to evolve a
selection policy to be used in MCTS. Although all these works are inspiring,
none of them have carried out an in-depth analysis shedding light under what
circumstances an evolved alternative of MCTS UCT might be beneficial in MCTS
due to focusing on a single type of problem. In sharp contrast to this, in this
work we use five functions of different nature, going from a unimodal function,
covering multimodal functions to deceptive functions. We demonstrate how the
evolution of the MCTS UCT might be beneficial in multimodal and deceptive
scenarios, whereas the MCTS UCT is robust in unimodal scenarios and competitive
in the rest of the scenarios used in this study.
Related papers
- CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing [70.25689961697523]
We propose a generalizable algorithm that enhances sequential reasoning by cross-task experience sharing and selection.
Our work bridges the gap between existing sequential reasoning paradigms and validates the effectiveness of leveraging cross-task experiences.
arXiv Detail & Related papers (2024-10-22T03:59:53Z) - Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond [58.39457881271146]
We introduce a novel framework of multi-armed bandits (CMAB) with multivariant and probabilistically triggering arms (CMAB-MT)
Compared with existing CMAB works, CMAB-MT not only enhances the modeling power but also allows improved results by leveraging distinct statistical properties for multivariant random variables.
Our framework can include many important problems as applications, such as episodic reinforcement learning (RL) and probabilistic maximum coverage for goods distribution.
arXiv Detail & Related papers (2024-06-03T14:48:53Z) - What Makes Multimodal In-Context Learning Work? [58.48612721156335]
We present a framework for investigating Multimodal ICL (M-ICL) in the context of Large Multimodal Models.
M-ICL primarily relies on text-driven mechanisms, showing little to no influence from the image modality.
We identify several biases and limitations of M-ICL that warrant consideration prior to deployment.
arXiv Detail & Related papers (2024-04-24T08:50:45Z) - Monte Carlo Tree Search in the Presence of Transition Uncertainty [33.40823938089618]
We show that the discrepancy between the model and the actual environment can lead to significant performance degradation with standard MCTS.
We develop Uncertainty Adapted MCTS (UA-MCTS), a more robust algorithm within the MCTS framework.
We prove, in the corrupted bandit case, that adding uncertainty information to adapt UCB leads to tighter regret bound than standard UCB.
arXiv Detail & Related papers (2023-12-18T17:02:27Z) - An Analysis on the Effects of Evolving the Monte Carlo Tree Search Upper
Confidence for Trees Selection Policy on Unimodal, Multimodal and Deceptive
Landscapes [0.0]
The Monte Carlo Tree Search (MCTS) is a best-first sampling method employed in the search for optimal decisions.
A selection policy that works particularly well in MCTS is the Upper Confidence Bounds for Trees, referred to as UCT.
This work explores the use of five functions of different natures, ranging from unimodal to multimodal and deceptive functions.
arXiv Detail & Related papers (2023-11-21T20:40:34Z) - Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis [70.78170766633039]
We address the need for means of assessing MTS forecasting proposals reliably and fairly.
BasicTS+ is a benchmark designed to enable fair, comprehensive, and reproducible comparison of MTS forecasting solutions.
We apply BasicTS+ along with rich datasets to assess the capabilities of more than 45 MTS forecasting solutions.
arXiv Detail & Related papers (2023-10-09T19:52:22Z) - Multimodal Chain-of-Thought Reasoning in Language Models [94.70184390935661]
We propose Multimodal-CoT that incorporates language (text) and vision (images) modalities into a two-stage framework.
Experimental results on ScienceQA and A-OKVQA benchmark datasets show the effectiveness of our proposed approach.
arXiv Detail & Related papers (2023-02-02T07:51:19Z) - Evolving the MCTS Upper Confidence Bounds for Trees Using a
Semantic-inspired Evolutionary Algorithm in the Game of Carcassonne [0.0]
We propose a Semantic-inspired Evolutionary Algorithm in Monte Carlo Tree Search (MCTS)
We use Evolutionary Algorithms (EAs) to evolve mathematical expressions with the goal to substitute the Upper Confidence Bounds for Trees formula.
We show how SIEA-MCTS is able to successfully evolve mathematical expressions that yield better or competitive results compared to UCT without the need of tuning these evolved expressions.
arXiv Detail & Related papers (2022-08-29T13:31:06Z) - CAMEO: Curiosity Augmented Metropolis for Exploratory Optimal Policies [62.39667564455059]
We consider and study a distribution of optimal policies.
In experimental simulations we show that CAMEO indeed obtains policies that all solve classic control problems.
We further show that the different policies we sample present different risk profiles, corresponding to interesting practical applications in interpretability.
arXiv Detail & Related papers (2022-05-19T09:48:56Z) - On the Evolution of the MCTS Upper Confidence Bounds for Trees by Means
of Evolutionary Algorithms in the Game of Carcassonne [0.0]
Monte Carlo Tree Search (MCTS) is a sampling best-first method to search for optimal decisions.
We use Evolutionary Algorithms (EAs) to evolve mathematical expressions with the goal to substitute the Upper Confidence Bounds for Trees (UCT) mathematical expression.
We show how the ES-MCTS controller, is able to outperform all these 10 intelligent controllers, including robust UCT controllers.
arXiv Detail & Related papers (2021-12-17T18:06:21Z) - Monte Carlo Tree Search: A Review of Recent Modifications and
Applications [0.17205106391379024]
Monte Carlo Tree Search (MCTS) is a powerful approach to designing game-playing bots or solving sequential decision problems.
The method relies on intelligent tree search that balances exploration and exploitation.
The method has become a state-of-the-art technique for games, however, in more complex games.
arXiv Detail & Related papers (2021-03-08T17:44:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.