Learning to Stop: Dynamic Simulation Monte-Carlo Tree Search
- URL: http://arxiv.org/abs/2012.07910v1
- Date: Mon, 14 Dec 2020 19:49:25 GMT
- Title: Learning to Stop: Dynamic Simulation Monte-Carlo Tree Search
- Authors: Li-Cheng Lan, Meng-Yu Tsai, Ti-Rong Wu, I-Chen Wu, Cho-Jui Hsieh
- Abstract summary: Monte Carlo tree search (MCTS) has achieved state-of-the-art results in many domains such as Go and Atari games.
We propose to save search resources by predicting the uncertainty of the current search status and using that prediction to decide whether to stop searching.
- Score: 66.34387649910046
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Monte Carlo tree search (MCTS) has achieved state-of-the-art results in many domains, such as Go and Atari games, when combined with deep neural networks (DNNs). Running more simulations lets MCTS achieve higher performance, but it also requires enormous amounts of CPU and GPU resources. However, not all states require a long search to identify the best action the agent can find. For example, in 19x19 Go and NoGo, we found that for more than half of the states, the best action predicted by the DNN remains unchanged even after two minutes of search. This implies that a significant amount of resources can be saved if we can stop the search early once we are confident in the current result. In this paper, we propose to achieve this goal by predicting the uncertainty of the current search status and using that prediction to decide whether to stop searching. With our algorithm, called Dynamic Simulation MCTS (DS-MCTS), we can speed up a NoGo agent trained by AlphaZero by a factor of 2.5 while maintaining a similar winning rate. Also, under the same average simulation count, our method achieves a 61% winning rate against the original program.
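The abstract's stop-early idea fits in a few lines of Python. The sketch below is illustrative only: it uses a flat one-level search instead of a full tree, and `predicted_uncertainty` is a hand-made proxy standing in for the learned DNN uncertainty predictor; the check interval and threshold are hypothetical.

```python
import math
import random

def ucb_pick(stats, c=1.4):
    """Pick an action index by UCB1 over (visits, total_value) stats."""
    n = sum(v for v, _ in stats) or 1
    def score(i):
        visits, total = stats[i]
        if visits == 0:
            return float("inf")  # try every action at least once
        return total / visits + c * math.sqrt(math.log(n) / visits)
    return max(range(len(stats)), key=score)

def predicted_uncertainty(stats):
    """Stand-in for the learned uncertainty predictor: a hand-made proxy
    based on how dominant the most-visited action is. DS-MCTS instead
    trains a DNN on features of the search status."""
    visits = sorted((v for v, _ in stats), reverse=True)
    if visits[0] == 0:
        return 1.0
    return visits[1] / visits[0]  # near 1.0 => best action still contested

def ds_mcts_sketch(n_actions=4, max_sims=1600, check_every=100, threshold=0.2):
    random.seed(0)
    true_means = [random.random() for _ in range(n_actions)]
    stats = [(0, 0.0)] * n_actions
    for sim in range(1, max_sims + 1):
        i = ucb_pick(stats)
        reward = random.gauss(true_means[i], 0.3)  # toy simulation result
        visits, total = stats[i]
        stats[i] = (visits + 1, total + reward)
        # Periodically ask "how uncertain is the current search status?"
        # and stop early once the best action looks settled.
        if sim % check_every == 0 and predicted_uncertainty(stats) < threshold:
            break
    best = max(range(n_actions), key=lambda i: stats[i][0])
    return best, sim  # chosen action and simulations actually spent

print(ds_mcts_sketch())
```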
Related papers
- Tree Search for Language Model Agents [69.43007235771383]
We propose an inference-time search algorithm for LM agents to perform exploration and multi-step planning in interactive web environments.
Our approach is a form of best-first tree search that operates within the actual environment space.
It is the first tree search algorithm for LM agents that shows effectiveness on realistic web tasks.
arXiv Detail & Related papers (2024-07-01T17:07:55Z)
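The best-first search described in the entry above can be sketched generically. This is a minimal sketch, not the paper's algorithm: `expand` and `value` are hypothetical stand-ins for the environment step and the LM-based value model, and the toy usage replaces web states with integers.

```python
import heapq
import itertools

def best_first_search(start, expand, value, max_nodes=50):
    """Generic best-first tree search: repeatedly pop the highest-value
    frontier state and expand it in the (real or simulated) environment."""
    counter = itertools.count()  # tie-breaker so the heap never compares states
    frontier = [(-value(start), next(counter), start)]
    best_state, best_score = start, value(start)
    for _ in range(max_nodes):
        if not frontier:
            break
        neg_score, _, state = heapq.heappop(frontier)
        if -neg_score > best_score:
            best_score, best_state = -neg_score, state
        for child in expand(state):
            heapq.heappush(frontier, (-value(child), next(counter), child))
    return best_state, best_score

# Toy usage: states are integers, children are n+1 and 2n, value favors 42.
print(best_first_search(1, lambda n: [n + 1, 2 * n], lambda n: -abs(42 - n)))
```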
- AlphaZeroES: Direct score maximization outperforms planning loss minimization [61.17702187957206]
Planning at execution time has been shown to dramatically improve performance for agents in both single-agent and multi-agent settings.
One family of approaches to planning at execution time is AlphaZero and its variants, which use Monte Carlo Tree Search together with a neural network that guides the search by predicting state values and action probabilities.
We show that, across multiple environments, directly maximizing the episode score outperforms minimizing the planning loss.
arXiv Detail & Related papers (2024-06-12T23:00:59Z)
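Reading "ES" as evolution strategies (an assumption; the entry above does not spell it out), direct score maximization can be sketched as a black-box gradient estimate of the episode score. `episode_score` is a hypothetical stand-in for a full planner rollout.

```python
import numpy as np

def es_step(theta, episode_score, sigma=0.1, lr=0.02, pop=32, rng=None):
    """One evolution-strategies update that directly maximizes episode score.

    Rather than minimizing a surrogate planning loss, perturb the parameters,
    measure the (possibly non-differentiable) episode score, and step along
    the score-weighted average of the perturbations."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal((pop, theta.size))
    scores = np.array([episode_score(theta + sigma * e) for e in eps])
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)  # normalize
    grad = eps.T @ scores / (pop * sigma)  # black-box gradient estimate
    return theta + lr * grad

# Toy usage: the "episode score" peaks when theta matches a hidden target.
target = np.array([1.0, -2.0, 0.5])
theta, rng = np.zeros(3), np.random.default_rng(0)
for _ in range(200):
    theta = es_step(theta, lambda t: -np.sum((t - target) ** 2), rng=rng)
print(np.round(theta, 1))  # should land near the target
```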
- Spending Thinking Time Wisely: Accelerating MCTS with Virtual Expansions [89.89612827542972]
This paper proposes a variant of Monte-Carlo tree search (MCTS) that adaptively spends more search time on harder states and less on simpler ones.
We evaluate the performance and computations on $9 \times 9$ Go board games and Atari games.
Experiments show that our method achieves performance comparable to the original search algorithm while requiring less than $50\%$ of the search time on average.
arXiv Detail & Related papers (2022-10-23T06:39:20Z)
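One generic way to "spend thinking time wisely" is to scale the per-state simulation budget with a difficulty proxy. The sketch below uses prior-policy entropy as that proxy; the paper's virtual-expansion mechanism is different, so treat this purely as an illustration of the adaptive-budget idea.

```python
import math

def adaptive_budget(policy, min_sims=50, max_sims=800):
    """Scale the per-state search budget with a difficulty proxy.

    Here difficulty is the entropy of the prior policy at the state: a
    sharply peaked policy suggests an "easy" state needing little search.
    This is a generic proxy, not the paper's exact criterion."""
    entropy = -sum(p * math.log(p) for p in policy if p > 0)
    max_entropy = math.log(len(policy))
    hardness = entropy / max_entropy if max_entropy else 0.0
    return int(min_sims + hardness * (max_sims - min_sims))

print(adaptive_budget([0.97, 0.01, 0.01, 0.01]))  # easy state -> small budget
print(adaptive_budget([0.25, 0.25, 0.25, 0.25]))  # hard state -> full budget
```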
- Batch Monte Carlo Tree Search [9.114710429587479]
We build on the property that batched neural-network inference is much faster than sequential inference to propose Monte Carlo Tree Search algorithms using batched inferences.
The transposition table contains the results of the inferences, while the search tree contains the statistics of Monte Carlo Tree Search.
We also propose and analyze multiple heuristics that improve the search: the $\mu$ FPU, the Virtual Mean, the Last Iteration, and the Second Move.
arXiv Detail & Related papers (2021-04-09T09:54:21Z)
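The division of labor in the entry above (transposition table for inference results, tree for MCTS statistics) suggests the following sketch of batched leaf evaluation. `net_batch` is a hypothetical batched-inference stand-in; the real algorithm's batching strategy is more involved.

```python
def evaluate_leaves(leaves, net_batch, table):
    """Evaluate tree leaves with one batched network call.

    `table` is a transposition table mapping a state key to its cached
    network output, so a position reached twice is only inferred once.
    `net_batch(states) -> list[(value, policy)]` is a hypothetical
    batched-inference stand-in for the real DNN."""
    todo = [leaf for leaf in leaves if leaf not in table]
    if todo:
        for leaf, out in zip(todo, net_batch(todo)):
            table[leaf] = out  # cache the inference result
    return [table[leaf] for leaf in leaves]

# Toy usage: "states" are strings; the fake net returns per-state outputs.
table = {}
fake_net = lambda states: [(len(s) / 10.0, [0.5, 0.5]) for s in states]
print(evaluate_leaves(["a1", "b2", "a1"], fake_net, table))
print(len(table))  # 2: the duplicate "a1" was served from the table
```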
- Dual Monte Carlo Tree Search [0.0]
Dual MCTS uses two different search trees, a single deep neural network, and a new update technique for the search trees that combines PUCB, a sliding window, and the epsilon-greedy algorithm.
We show that Dual MCTS performs better than AlphaZero, one of the most widely used neural MCTS algorithms, for various symmetric and asymmetric games.
arXiv Detail & Related papers (2021-03-21T23:34:11Z)
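Dual MCTS's update technique builds on PUCB; a standard PUCB/PUCT-style selection rule with an epsilon-greedy mix looks like the sketch below. The constants and the exact mixing are illustrative, not the paper's formula, and the sliding-window component is omitted.

```python
import math
import random

def pucb_select(children, c_puct=1.5, epsilon=0.1):
    """PUCB-style child selection with an epsilon-greedy escape hatch.

    Each child is (prior, visits, total_value). The score follows the
    familiar AlphaZero PUCT form; the epsilon-greedy mix and constants
    are illustrative, not Dual MCTS's exact rule."""
    if random.random() < epsilon:
        return random.randrange(len(children))  # occasional uniform exploration
    n_parent = sum(v for _, v, _ in children)
    def score(i):
        prior, visits, total = children[i]
        q = total / visits if visits else 0.0
        u = c_puct * prior * math.sqrt(n_parent + 1) / (1 + visits)
        return q + u
    return max(range(len(children)), key=score)

# Toy usage: (prior, visits, total_value) per child.
random.seed(0)
print(pucb_select([(0.6, 10, 6.0), (0.3, 2, 1.5), (0.1, 0, 0.0)]))
```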
- Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism [107.48538091418412]
We study exploration in multi-armed bandits when we have access to a divisible resource that can be allocated in varying amounts to arm pulls.
We focus in particular on the allocation of distributed computing resources, where we may obtain results faster by allocating more resources per pull.
arXiv Detail & Related papers (2020-10-31T18:19:29Z)
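As a rough illustration of allocating a divisible compute resource across arm pulls, the sketch below runs successive halving and splits a fixed CPU pool evenly over the surviving arms each round. This is a generic scheme under stated assumptions, not the adaptive-parallelism algorithm the entry above analyzes.

```python
import random

def successive_halving(means, total_cpus=64, rng=None):
    """Split a divisible compute pool evenly over the surviving arms.

    More CPUs per pull is modeled, crudely, as more samples per arm per
    round, standing in for "results come back faster with more resources".
    Constants and the halving schedule are illustrative."""
    rng = rng or random.Random(0)
    arms = list(range(len(means)))
    while len(arms) > 1:
        per_arm = total_cpus // len(arms)  # adaptive parallelism per pull
        est = {a: sum(rng.gauss(means[a], 1.0) for _ in range(per_arm)) / per_arm
               for a in arms}
        arms.sort(key=lambda a: est[a], reverse=True)
        arms = arms[: len(arms) // 2]  # keep the better half
    return arms[0]

print(successive_halving([0.1, 0.2, 0.9, 0.4]))  # usually identifies arm 2
```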
- On Effective Parallelization of Monte Carlo Tree Search [51.15940034629022]
Monte Carlo Tree Search (MCTS) is computationally expensive, as it requires a substantial number of rollouts to construct the search tree.
How to design effective parallel MCTS algorithms has not been systematically studied and remains poorly understood.
We demonstrate how the proposed necessary conditions can be adopted to design more effective parallel MCTS algorithms.
arXiv Detail & Related papers (2020-06-15T21:36:00Z)
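A standard ingredient of parallel MCTS, and the "virtual loss" the next entry mentions, is a temporary penalty that steers concurrent workers away from paths that already have simulations in flight. A minimal thread-safe sketch, with illustrative constants and a generic structure rather than any specific paper's scheme:

```python
import threading

VIRTUAL_LOSS = 3  # temporary penalty discouraging duplicate in-flight paths

class NodeStats:
    """Per-node MCTS statistics with a virtual-loss mechanism.

    Before a worker descends through a node it applies a virtual loss, so
    concurrent workers are steered toward different subtrees; the loss is
    reverted when the real simulation result is backed up."""
    def __init__(self):
        self.lock = threading.Lock()
        self.visits = 0
        self.total_value = 0.0

    def apply_virtual_loss(self):
        with self.lock:
            self.visits += VIRTUAL_LOSS       # looks well-explored...
            self.total_value -= VIRTUAL_LOSS  # ...and looks bad

    def revert_and_backup(self, value):
        with self.lock:
            self.visits += 1 - VIRTUAL_LOSS   # undo fake visits, add the real one
            self.total_value += value + VIRTUAL_LOSS

    def mean_value(self):
        with self.lock:
            return self.total_value / self.visits if self.visits else 0.0

node = NodeStats()
node.apply_virtual_loss()    # a worker enters the node
node.revert_and_backup(0.8)  # its simulation finishes with value 0.8
print(node.visits, node.mean_value())  # 1 0.8
```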
- Single-Agent Optimization Through Policy Iteration Using Monte-Carlo Tree Search [8.22379888383833]
The combination of Monte-Carlo Tree Search (MCTS) and deep reinforcement learning is state-of-the-art in two-player perfect-information games.
We describe a search algorithm that uses a variant of MCTS enhanced by 1) a novel action-value normalization mechanism for games with potentially unbounded rewards, 2) a virtual loss function that enables effective search parallelization, and 3) a policy network, trained by generations of self-play, that guides the search.
arXiv Detail & Related papers (2020-05-22T18:02:36Z)
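The action-value normalization mentioned in point 1) above can be illustrated with a simple min-max rescaling of child values into [0, 1], which keeps unbounded scores from swamping the exploration term of the selection formula. A generic sketch, not the paper's exact mechanism:

```python
def normalized_q(child_values, child_index):
    """Min-max normalize a child's mean value into [0, 1].

    With unbounded game scores, raw Q values can dominate the exploration
    term of the selection formula; rescaling by the range seen at the node
    keeps exploitation and exploration comparable."""
    lo, hi = min(child_values), max(child_values)
    if hi == lo:
        return 0.5  # no information yet: treat all children alike
    return (child_values[child_index] - lo) / (hi - lo)

# Toy usage: raw mean scores from a game with unbounded rewards.
scores = [1200.0, 450.0, 3075.0]
print([round(normalized_q(scores, i), 2) for i in range(len(scores))])
```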