Single-Agent Optimization Through Policy Iteration Using Monte-Carlo
Tree Search
- URL: http://arxiv.org/abs/2005.11335v1
- Date: Fri, 22 May 2020 18:02:36 GMT
- Title: Single-Agent Optimization Through Policy Iteration Using Monte-Carlo
Tree Search
- Authors: Arta Seify and Michael Buro
- Abstract summary: The combination of Monte-Carlo Tree Search (MCTS) and deep reinforcement learning is state-of-the-art in two-player perfect-information games.
We describe a search algorithm that uses a variant of MCTS which we enhanced by 1) a novel action value normalization mechanism for games with potentially unbounded rewards, 2) defining a virtual loss function that enables effective search parallelization, and 3) a policy network, trained by generations of self-play, to guide the search.
- Score: 8.22379888383833
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The combination of Monte-Carlo Tree Search (MCTS) and deep reinforcement
learning is state-of-the-art in two-player perfect-information games. In this
paper, we describe a search algorithm that uses a variant of MCTS which we
enhanced by 1) a novel action value normalization mechanism for games with
potentially unbounded rewards (which is the case in many optimization
problems), 2) defining a virtual loss function that enables effective search
parallelization, and 3) a policy network, trained by generations of self-play,
to guide the search. We gauge the effectiveness of our method in "SameGame", a
popular single-player test domain. Our experimental results indicate that our
method outperforms baseline algorithms on several board sizes. Additionally, it
is competitive with state-of-the-art search algorithms on a public set of
positions.
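To make the abstract's first two enhancements concrete, below is a minimal Python sketch of a UCT-style selection rule with (1) min-max normalization of action values, one common way to handle unbounded rewards, and (2) a virtual-loss penalty for parallel search. The constants `C` and `VIRTUAL_LOSS` and the exact formulas are illustrative assumptions, not the paper's definitions.

```python
import math

C = 1.4            # exploration constant (assumed value)
VIRTUAL_LOSS = 1   # pending-visit penalty for parallel search (assumed value)

class Node:
    def __init__(self):
        self.children = {}       # action -> Node
        self.visits = 0
        self.value_sum = 0.0
        self.virtual_visits = 0  # simulations currently "in flight"

def select(node, q_min, q_max):
    """Pick the child with the highest normalized UCT score.
    Virtual visits dilute a child's mean value while parallel
    simulations are pending, steering other threads elsewhere."""
    def score(child):
        n = child.visits + child.virtual_visits
        if n == 0:
            return float("inf")  # always try unvisited children first
        q = child.value_sum / n  # in-flight visits count as zero value
        # Min-max normalization maps unbounded returns into [0, 1].
        q_norm = (q - q_min) / (q_max - q_min) if q_max > q_min else 0.5
        total = node.visits + node.virtual_visits
        return q_norm + C * math.sqrt(math.log(total + 1) / n)

    action, child = max(node.children.items(), key=lambda kv: score(kv[1]))
    child.virtual_visits += VIRTUAL_LOSS  # undone during backpropagation
    return action, child
```

In a full implementation, `q_min` and `q_max` would be running statistics over values observed in the tree so far, and the policy network from point 3 would typically contribute a prior term to the score.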
Related papers
- Tree Search for Language Model Agents [69.43007235771383]
We propose an inference-time search algorithm for LM agents to perform exploration and multi-step planning in interactive web environments.
Our approach is a form of best-first tree search that operates within the actual environment space.
It is the first tree search algorithm for LM agents that shows effectiveness on realistic web tasks.
arXiv Detail & Related papers (2024-07-01T17:07:55Z)
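As a generic illustration of the best-first scheme described in the entry above (not the paper's implementation), this sketch expands the most promising frontier state first; `expand` and `value` are placeholders for an environment step function and a learned value estimate:

```python
import heapq
import itertools

def best_first_search(start, expand, value, budget=50):
    """Generic best-first tree search: repeatedly pop the most promising
    state, expand it, and remember the best state seen so far."""
    counter = itertools.count()  # tie-breaker so states are never compared
    frontier = [(-value(start), next(counter), start)]
    best = start
    for _ in range(budget):
        if not frontier:
            break
        neg_v, _, state = heapq.heappop(frontier)
        if -neg_v > value(best):
            best = state
        for child in expand(state):  # execute actions in the environment
            heapq.heappush(frontier, (-value(child), next(counter), child))
    return best
```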
- Injecting Combinatorial Optimization into MCTS: Application to the Board Game boop [0.0]
Combinatorial Optimization and Monte Carlo Tree Search can be combined efficiently.
Our method beats the Monte Carlo Tree Search baseline 96% of the time.
We also pitted our AI against human players on the Board Game Arena platform.
arXiv Detail & Related papers (2024-06-13T02:55:08Z)
- Playing Board Games with the Predict Results of Beam Search Algorithm [0.0]
This paper introduces a novel algorithm for two-player deterministic games with perfect information, which we call PROBS (Predict Results of Beam Search).
We evaluate the performance of our algorithm across a selection of board games, where it consistently demonstrates an increased winning ratio against baseline opponents.
A key result of this study is that the PROBS algorithm operates effectively, even when the beam search size is considerably smaller than the average number of turns in the game.
arXiv Detail & Related papers (2024-04-23T20:10:27Z)
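The PROBS entry above builds on beam search; the following generic sketch shows the beam-search primitive over game states with a placeholder learned evaluator (the PROBS training loop itself is not reproduced here):

```python
def beam_search(state, legal_moves, apply_move, evaluate, width=4, depth=3):
    """Keep only the `width` most promising states at each ply, scored
    by a placeholder learned evaluator. A real two-player version would
    alternate the evaluation perspective between plies."""
    beam = [state]
    for _ in range(depth):
        candidates = [apply_move(s, m) for s in beam for m in legal_moves(s)]
        if not candidates:
            break
        candidates.sort(key=evaluate, reverse=True)
        beam = candidates[:width]
    return max(beam, key=evaluate)
```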
- Monte-Carlo Tree Search for Multi-Agent Pathfinding: Preliminary Results [60.4817465598352]
We introduce an original variant of Monte-Carlo Tree Search (MCTS) tailored to multi-agent pathfinding.
Specifically, we use individual paths to assist the agents with the goal-reaching behavior.
We also use a dedicated decomposition technique to reduce the branching factor of the tree search procedure.
arXiv Detail & Related papers (2023-07-25T12:33:53Z)
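The branching-factor reduction mentioned above can be illustrated abstractly: expanding joint actions multiplies the per-agent action counts, while a per-agent decomposition (one illustrative scheme, not necessarily the paper's) adds them:

```python
from itertools import product

def joint_branching(action_sets):
    """Naive joint expansion: one tree edge per combination of
    per-agent actions, so the branching factor is prod(|A_i|)."""
    return list(product(*action_sets))

def decomposed_branching(action_sets):
    """Decomposed expansion: agents commit one at a time, one per tree
    level, so each level branches by only |A_i|."""
    return [(agent, action)
            for agent, actions in enumerate(action_sets)
            for action in actions]
```

For three agents with five moves each, the joint expansion creates 125 edges where the decomposed one creates 15 spread over three levels.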
- Proof Number Based Monte-Carlo Tree Search [1.93674821880689]
This paper proposes a new game-search algorithm, PN-MCTS, which combines Monte-Carlo Tree Search (MCTS) and Proof-Number Search (PNS).
We define three areas where the additional knowledge provided by the proof and disproof numbers gathered in MCTS trees might be used.
Experiments show that PN-MCTS is able to outperform MCTS in all tested game domains, achieving win rates up to 96.2% for Lines of Action.
arXiv Detail & Related papers (2023-03-16T16:27:07Z)
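One natural way to use the proof and disproof numbers mentioned above is as a selection bonus; the weighting below is an assumption for illustration, not the formula from the paper:

```python
import math

def pn_mcts_score(child, parent_visits, c_uct=1.4, c_pn=0.5):
    """UCT score plus a bonus for children close to a proven win (small
    proof number). `child.proof_number` is assumed to be maintained by
    PNS-style backups; c_uct and c_pn are illustrative constants."""
    if child.visits == 0:
        return float("inf")
    q = child.value_sum / child.visits
    exploration = c_uct * math.sqrt(math.log(parent_visits) / child.visits)
    pn_bonus = c_pn / (1.0 + child.proof_number)  # smaller pn => larger bonus
    return q + exploration + pn_bonus
```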
- Spending Thinking Time Wisely: Accelerating MCTS with Virtual Expansions [89.89612827542972]
This paper proposes a variant of Monte-Carlo tree search (MCTS) that adaptively spends more search time on harder states and less search time on simpler states.
We evaluate the performance and computational cost on $9 \times 9$ Go and Atari games.
Experiments show that our method can achieve performance comparable to the original search algorithm while requiring less than 50% of the search time on average.
arXiv Detail & Related papers (2022-10-23T06:39:20Z)
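The adaptive time allocation above can be approximated with a simple stopping heuristic: keep simulating while the top two actions remain contested. The margin rule is an assumed difficulty proxy, not the paper's virtual-expansion mechanism:

```python
def adaptive_search(root, run_one_simulation, min_sims=100, max_sims=1600,
                    margin=1.3):
    """Spend fewer simulations on easy states: stop early once the
    most-visited action clearly dominates the runner-up."""
    sims = 0
    while sims < max_sims:
        run_one_simulation(root)
        sims += 1
        if sims >= min_sims:
            counts = sorted((c.visits for c in root.children.values()),
                            reverse=True)
            if len(counts) < 2 or counts[0] >= margin * counts[1]:
                break  # decision settled; bank the remaining budget
    return sims
```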
- A Ranking Game for Imitation Learning [22.028680861819215]
We treat imitation as a two-player ranking-based Stackelberg game between a policy and a reward function.
This game encompasses a large subset of both inverse reinforcement learning (IRL) methods and methods which learn from offline preferences.
We theoretically analyze the requirements of the loss function used for ranking policy performances to facilitate near-optimal imitation learning at equilibrium.
arXiv Detail & Related papers (2022-02-07T19:38:22Z)
- No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
arXiv Detail & Related papers (2022-01-30T06:10:04Z)
- Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set.
Based on a projection-free online learning algorithm for finite polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z)
- Learning to Stop: Dynamic Simulation Monte-Carlo Tree Search [66.34387649910046]
Monte Carlo tree search (MCTS) has achieved state-of-the-art results in many domains such as Go and Atari games.
We propose to predict the uncertainty of the current search status and use the result to decide whether we should stop searching.
arXiv Detail & Related papers (2020-12-14T19:49:25Z)
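A hypothetical version of the uncertainty-based stopping test described above might summarize the search statistics into features and consult a learned predictor; the feature set and threshold here are assumptions:

```python
def should_stop(root, predict_uncertainty, threshold=0.1):
    """Ask a (placeholder) learned model whether the current search
    outcome is already settled enough to stop early."""
    counts = sorted((c.visits for c in root.children.values()), reverse=True)
    if not counts:
        return False
    total = sum(counts)
    features = [
        counts[0] / total,                                            # top-action share
        (counts[0] - counts[1]) / total if len(counts) > 1 else 1.0,  # visit gap
        float(len(counts)),                                           # branching factor
    ]
    return predict_uncertainty(features) < threshold
```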
- Monte Carlo Tree Search for a single target search game on a 2-D lattice [0.0]
This project imagines a game in which an AI player searches for a stationary target within a 2-D lattice.
We analyze its behavior with different target distributions and compare its efficiency to the Lévy Flight Search, a model for animal foraging behavior.
arXiv Detail & Related papers (2020-11-29T01:07:45Z)
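For the Lévy Flight Search baseline mentioned above, a minimal lattice version draws step lengths from a heavy-tailed power law; the exponent `mu` is an assumed parameter:

```python
import random

def levy_flight_on_lattice(start, steps=1000, mu=2.0):
    """2-D lattice random walk whose step lengths follow a power law
    P(l) ~ l**(-mu), a standard model of foraging-style search."""
    x, y = start
    path = [(x, y)]
    for _ in range(steps):
        u = 1.0 - random.random()  # uniform in (0, 1], avoids division by zero
        length = max(1, int(u ** (-1.0 / (mu - 1.0))))  # inverse-transform sample
        dx, dy = random.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x, y = x + dx * length, y + dy * length
        path.append((x, y))
    return path
```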