Generalized Proof-Number Monte-Carlo Tree Search
- URL: http://arxiv.org/abs/2506.13249v1
- Date: Mon, 16 Jun 2025 08:45:36 GMT
- Title: Generalized Proof-Number Monte-Carlo Tree Search
- Authors: Jakub Kowalski, Dennis J. N. J. Soemers, Szymon Kosakowski, Mark H. M. Winands,
- Abstract summary: Generalized Proof-Number Monte-Carlo Tree Search: a generalization of recently proposed combinations of Proof-Number Search (PNS) with Monte-Carlo Tree Search (MCTS)<n>We propose three core modifications of prior combinations of PNS with MCTS.<n>This reduces code complexity in the sense that we no longer need disproof numbers, and generalizes the technique to be applicable to games with more than two players.
- Score: 2.052209054561478
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents Generalized Proof-Number Monte-Carlo Tree Search: a generalization of recently proposed combinations of Proof-Number Search (PNS) with Monte-Carlo Tree Search (MCTS), which use (dis)proof numbers to bias UCB1-based Selection strategies towards parts of the search that are expected to be easily (dis)proven. We propose three core modifications of prior combinations of PNS with MCTS. First, we track proof numbers per player. This reduces code complexity in the sense that we no longer need disproof numbers, and generalizes the technique to be applicable to games with more than two players. Second, we propose and extensively evaluate different methods of using proof numbers to bias the selection strategy, achieving strong performance with strategies that are simpler to implement and compute. Third, we merge our technique with Score Bounded MCTS, enabling the algorithm to prove and leverage upper and lower bounds on scores - as opposed to only proving wins or not-wins. Experiments demonstrate substantial performance increases, reaching the range of 80% for 8 out of the 11 tested board games.
Related papers
- Anytime Sequential Halving in Monte-Carlo Tree Search [1.3820916757781068]
This paper proposes an anytime version of the algorithm, which can be halted at any arbitrary time and still return a satisfactory result.
Empirical results in synthetic MAB problems and ten different board games demonstrate that the algorithm's performance is competitive with Sequential Halving and UCB1.
arXiv Detail & Related papers (2024-11-11T17:49:47Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novelgreedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $tin [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - Robust Collaborative Filtering to Popularity Distribution Shift [56.78171423428719]
We present a simple yet effective debiasing strategy, PopGo, which quantifies and reduces the interaction-wise popularity shortcut without assumptions on the test data.
On both ID and OOD test sets, PopGo achieves significant gains over the state-of-the-art debiasing strategies.
arXiv Detail & Related papers (2023-10-16T04:20:52Z) - Monte-Carlo Tree Search for Multi-Agent Pathfinding: Preliminary Results [60.4817465598352]
We introduce an original variant of Monte-Carlo Tree Search (MCTS) tailored to multi-agent pathfinding.
Specifically, we use individual paths to assist the agents with the the goal-reaching behavior.
We also use a dedicated decomposition technique to reduce the branching factor of the tree search procedure.
arXiv Detail & Related papers (2023-07-25T12:33:53Z) - Proof Number Based Monte-Carlo Tree Search [1.93674821880689]
This paper proposes a new game-search algorithm, PN-MCTS, which combines Monte-Carlo Tree Search (MCTS) and Proof-Number Search (PNS)
We define three areas where the additional knowledge provided by the proof and disproof numbers gathered in MCTS trees might be used.
Experiments show that PN-MCTS is able to outperform MCTS in all tested game domains, achieving win rates up to 96.2% for Lines of Action.
arXiv Detail & Related papers (2023-03-16T16:27:07Z) - Combining Monte-Carlo Tree Search with Proof-Number Search [5.354801701968199]
Proof-Number Search (PNS) and Monte-Carlo Tree Search (MCTS) have been successfully applied for decision making in a range of games.
This paper proposes a new approach called PN-MCTS that combines these two tree-search methods by incorporating the concept of proof and disproof numbers into the UCT formula of MCTS.
Experimental results demonstrate that PN-MCTS outperforms basic MCTS in several games including Lines of Action, MiniShogi, Knightthrough, and Awari, achieving win rates up to 94.0%.
arXiv Detail & Related papers (2022-06-08T15:28:42Z) - MCTS Based Agents for Multistage Single-Player Card Game [0.0]
The article presents the use of Monte Carlo Tree Search algorithms for the card game Lord of the Rings.
The main challenge was the complexity of the game mechanics, in which each round consists of 5 decision stages and 2 random stages.
To test various decision-making algorithms, a game simulator has been implemented.
arXiv Detail & Related papers (2021-09-24T10:56:54Z) - Sample Complexity of Tree Search Configuration: Cutting Planes and
Beyond [98.92725321081994]
State-of-the-art solvers integrate a myriad of cutting-plane techniques to speed up the underlying tree-search algorithm.
We prove the first guarantees for learning high-performing cut-selection policies tailored to the instance distribution.
arXiv Detail & Related papers (2021-06-08T00:57:59Z) - Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit
Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set.
Based on a projection-free online learning algorithm for finite polytopes, it is the first computationally efficient algorithm which is convexally optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z) - Faster Algorithms for Optimal Ex-Ante Coordinated Collusive Strategies
in Extensive-Form Zero-Sum Games [123.76716667704625]
We focus on the problem of finding an optimal strategy for a team of two players that faces an opponent in an imperfect-information zero-sum extensive-form game.
In that setting, it is known that the best the team can do is sample a profile of potentially randomized strategies (one per player) from a joint (a.k.a. correlated) probability distribution at the beginning of the game.
We provide an algorithm that computes such an optimal distribution by only using profiles where only one of the team members gets to randomize in each profile.
arXiv Detail & Related papers (2020-09-21T17:51:57Z) - Single-Agent Optimization Through Policy Iteration Using Monte-Carlo
Tree Search [8.22379888383833]
Combination of Monte-Carlo Tree Search (MCTS) and deep reinforcement learning is state-of-the-art in two-player perfect-information games.
We describe a search algorithm that uses a variant of MCTS which we enhanced by 1) a novel action value normalization mechanism for games with potentially unbounded rewards, 2) defining a virtual loss function that enables effective search parallelization, and 3) a policy network, trained by generations of self-play, to guide the search.
arXiv Detail & Related papers (2020-05-22T18:02:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.