Related papers: Proof Number Based Monte-Carlo Tree Search

Proof Number Based Monte-Carlo Tree Search

URL: http://arxiv.org/abs/2303.09449v4
Date: Wed, 29 May 2024 06:49:31 GMT
Title: Proof Number Based Monte-Carlo Tree Search
Authors: Jakub Kowalski, Elliot Doe, Mark H. M. Winands, Daniel Górski, Dennis J. N. J. Soemers,
Abstract summary: This paper proposes a new game-search algorithm, PN-MCTS, which combines Monte-Carlo Tree Search (MCTS) and Proof-Number Search (PNS) We define three areas where the additional knowledge provided by the proof and disproof numbers gathered in MCTS trees might be used. Experiments show that PN-MCTS is able to outperform MCTS in all tested game domains, achieving win rates up to 96.2% for Lines of Action.
Score: 1.93674821880689
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper proposes a new game-search algorithm, PN-MCTS, which combines Monte-Carlo Tree Search (MCTS) and Proof-Number Search (PNS). These two algorithms have been successfully applied for decision making in a range of domains. We define three areas where the additional knowledge provided by the proof and disproof numbers gathered in MCTS trees might be used: final move selection, solving subtrees, and the UCB1 selection mechanism. We test all possible combinations on different time settings, playing against vanilla UCT on several games: Lines of Action ($7$$\times$$7$ and $8$$\times$$8$ board sizes), MiniShogi, Knightthrough, and Awari. Furthermore, we extend this new algorithm to properly address games with draws, like Awari, by adding an additional layer of PNS on top of the MCTS tree. The experiments show that PN-MCTS is able to outperform MCTS in all tested game domains, achieving win rates up to 96.2% for Lines of Action.

Related papers

Generalized Proof-Number Monte-Carlo Tree Search [2.052209054561478]
Generalized Proof-Number Monte-Carlo Tree Search: a generalization of recently proposed combinations of Proof-Number Search (PNS) with Monte-Carlo Tree Search (MCTS)<n>We propose three core modifications of prior combinations of PNS with MCTS.<n>This reduces code complexity in the sense that we no longer need disproof numbers, and generalizes the technique to be applicable to games with more than two players.
arXiv Detail & Related papers (2025-06-16T08:45:36Z)
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search [74.46681227410038]
We propose Collective Monte Carlo Tree Search (CoMCTS) for effective and efficient reasoning-path searching and learning. We construct Mulberry-260k, a multimodal dataset with a tree of rich, explicit and well-defined reasoning nodes for each question. We perform collective SFT to train our model, Mulberry, a series of MLLMs with o1-like step-by-step Reasoning and Reflection capabilities.
arXiv Detail & Related papers (2024-12-24T10:07:51Z)
LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget. Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
arXiv Detail & Related papers (2024-06-29T05:14:04Z)
Can Large Language Models Play Games? A Case Study of A Self-Play Approach [61.15761840203145]
Large Language Models (LLMs) harness extensive data from the Internet, storing a broad spectrum of prior knowledge. Monte-Carlo Tree Search (MCTS) is a search algorithm that provides reliable decision-making solutions. This work introduces an innovative approach that bolsters LLMs with MCTS self-play to efficiently resolve turn-based zero-sum games.
arXiv Detail & Related papers (2024-03-08T19:16:29Z)
Monte-Carlo Tree Search for Multi-Agent Pathfinding: Preliminary Results [60.4817465598352]
We introduce an original variant of Monte-Carlo Tree Search (MCTS) tailored to multi-agent pathfinding. Specifically, we use individual paths to assist the agents with the the goal-reaching behavior. We also use a dedicated decomposition technique to reduce the branching factor of the tree search procedure.
arXiv Detail & Related papers (2023-07-25T12:33:53Z)
Combining Monte-Carlo Tree Search with Proof-Number Search [5.354801701968199]
Proof-Number Search (PNS) and Monte-Carlo Tree Search (MCTS) have been successfully applied for decision making in a range of games. This paper proposes a new approach called PN-MCTS that combines these two tree-search methods by incorporating the concept of proof and disproof numbers into the UCT formula of MCTS. Experimental results demonstrate that PN-MCTS outperforms basic MCTS in several games including Lines of Action, MiniShogi, Knightthrough, and Awari, achieving win rates up to 94.0%.
arXiv Detail & Related papers (2022-06-08T15:28:42Z)
On the Evolution of the MCTS Upper Confidence Bounds for Trees by Means of Evolutionary Algorithms in the Game of Carcassonne [0.0]
Monte Carlo Tree Search (MCTS) is a sampling best-first method to search for optimal decisions. We use Evolutionary Algorithms (EAs) to evolve mathematical expressions with the goal to substitute the Upper Confidence Bounds for Trees (UCT) mathematical expression. We show how the ES-MCTS controller, is able to outperform all these 10 intelligent controllers, including robust UCT controllers.
arXiv Detail & Related papers (2021-12-17T18:06:21Z)
MCTS Based Agents for Multistage Single-Player Card Game [0.0]
The article presents the use of Monte Carlo Tree Search algorithms for the card game Lord of the Rings. The main challenge was the complexity of the game mechanics, in which each round consists of 5 decision stages and 2 random stages. To test various decision-making algorithms, a game simulator has been implemented.
arXiv Detail & Related papers (2021-09-24T10:56:54Z)
Dual Monte Carlo Tree Search [0.0]
We show that Dual MCTS performs better than one of the most widely used neural MCTS algorithms, AlphaZero, for various symmetric and asymmetric games. Dual MCTS uses two different search trees, a single deep neural network, and a new update technique for the search trees using a combination of the PUCB, a sliding-window, and the epsilon-greedy algorithm.
arXiv Detail & Related papers (2021-03-21T23:34:11Z)
Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation [92.99933928528797]
We study reinforcement learning for two-player zero-sum Markov games with simultaneous moves. We propose an algorithm Nash-UCRL-VTR based on the principle "Optimism-in-Face-of-Uncertainty" We show that Nash-UCRL-VTR can provably achieve an $tildeO(dHsqrtT)$ regret, where $d$ is the linear function dimension.
arXiv Detail & Related papers (2021-02-15T09:09:16Z)
Learning to Stop: Dynamic Simulation Monte-Carlo Tree Search [66.34387649910046]
Monte Carlo tree search (MCTS) has achieved state-of-the-art results in many domains such as Go and Atari games. We propose to achieve this goal by predicting the uncertainty of the current searching status and use the result to decide whether we should stop searching.
arXiv Detail & Related papers (2020-12-14T19:49:25Z)
On Effective Parallelization of Monte Carlo Tree Search [51.15940034629022]
Monte Carlo Tree Search (MCTS) is computationally expensive as it requires a substantial number of rollouts to construct the search tree. How to design effective parallel MCTS algorithms has not been systematically studied and remains poorly understood. We demonstrate how proposed necessary conditions can be adopted to design more effective parallel MCTS algorithms.
arXiv Detail & Related papers (2020-06-15T21:36:00Z)
Single-Agent Optimization Through Policy Iteration Using Monte-Carlo Tree Search [8.22379888383833]
Combination of Monte-Carlo Tree Search (MCTS) and deep reinforcement learning is state-of-the-art in two-player perfect-information games. We describe a search algorithm that uses a variant of MCTS which we enhanced by 1) a novel action value normalization mechanism for games with potentially unbounded rewards, 2) defining a virtual loss function that enables effective search parallelization, and 3) a policy network, trained by generations of self-play, to guide the search.
arXiv Detail & Related papers (2020-05-22T18:02:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.