Game Solving with Online Fine-Tuning
- URL: http://arxiv.org/abs/2311.07178v1
- Date: Mon, 13 Nov 2023 09:09:52 GMT
- Title: Game Solving with Online Fine-Tuning
- Authors: Ti-Rong Wu, Hung Guei, Ting Han Wei, Chung-Chin Shih, Jui-Te Chin,
I-Chen Wu
- Abstract summary: This paper investigates applying online fine-tuning while searching and proposes two methods to learn tailor-designed heuristics for game solving.
Our experiments show that using online fine-tuning can solve a series of challenging 7x7 Killall-Go problems using only 23.54% of the computation time compared to the baseline.
- Score: 17.614045403579244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Game solving is a similar, yet more difficult task than mastering a game.
Solving a game typically means finding the game-theoretic value (outcome given
optimal play), and optionally a full strategy to follow in order to achieve
that outcome. The AlphaZero algorithm has demonstrated super-human level play,
and its powerful policy and value predictions have also served as heuristics in
game solving. However, to solve a game and obtain a full strategy, a winning
response must be found for all possible moves by the losing player. This
includes very poor lines of play from the losing side, which the AlphaZero
self-play process never encounters. AlphaZero-based heuristics can be highly
inaccurate when evaluating these out-of-distribution positions, which occur
throughout the entire search. To address this issue, this paper investigates
applying online fine-tuning while searching and proposes two methods to learn
tailor-designed heuristics for game solving. Our experiments show that using
online fine-tuning can solve a series of challenging 7x7 Killall-Go problems,
using only 23.54% of computation time compared to the baseline without online
fine-tuning. Results suggest that the savings scale with problem size. Our
method can further be extended to any tree search algorithm for problem
solving. Our code is available at
https://rlg.iis.sinica.edu.tw/papers/neurips2023-online-fine-tuning-solver.
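To make the idea concrete, here is a minimal, hypothetical sketch of online fine-tuning inside a solving search. It is not the paper's algorithm or API: `SimpleHeuristic`, `solve`, and the tabular "fine-tuning" stand-in are illustrative assumptions. The point it shows is the loop structure — the search periodically retrains its heuristic on positions it has already proven, so later expansions are guided by labels tailored to this specific problem rather than by self-play priors alone.

```python
# Hypothetical sketch: online fine-tuning during a best-first solving search.
# SimpleHeuristic and solve are illustrative names, not the paper's code.

class SimpleHeuristic:
    """Trivial tabular stand-in for a value network: position -> win estimate."""
    def __init__(self):
        self.values = {}

    def predict(self, pos):
        return self.values.get(pos, 0.5)  # uninformed prior for unseen positions

    def fine_tune(self, labeled):
        # "Fine-tune" on positions the search itself has labeled.
        for pos, outcome in labeled:
            self.values[pos] = outcome


def solve(root, children, is_win, heuristic, tune_every=2):
    """Best-first search that periodically fine-tunes its heuristic on
    already-proven positions, steering subsequent expansions."""
    frontier = [root]
    proven = {}  # position -> label established so far (1.0 win, 0.0 loss)
    expanded = 0
    while frontier:
        # Expand the position the current heuristic rates most promising.
        frontier.sort(key=heuristic.predict, reverse=True)
        pos = frontier.pop(0)
        if is_win(pos):
            proven[pos] = 1.0
        else:
            kids = children(pos)
            if not kids:
                proven[pos] = 0.0  # terminal position with no winning move
            else:
                frontier.extend(kids)
        expanded += 1
        if expanded % tune_every == 0 and proven:
            heuristic.fine_tune(proven.items())  # online update mid-search
    return proven
```

Usage on a toy game tree (positions as strings, children given by a dict): the heuristic starts uniform, but after the first fine-tuning step its predictions reflect outcomes the search has proven, which is the mechanism the abstract describes at scale with a neural network.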
Related papers
- Semi-Strongly solved: a New Definition Leading Computer to Perfect Gameplay [0.0]
Several definitions exist for 'solving the game,' but they are markedly different regarding computational cost and the detail of insights derived.
We introduce a novel definition called 'semi-strongly solved' and propose an algorithm to achieve this type of solution efficiently.
arXiv Detail & Related papers (2024-11-01T21:00:46Z)
- People use fast, goal-directed simulation to reason about novel games [75.25089384921557]
We study how people reason about a range of simple but novel connect-n style board games.
We ask people to judge how fair and how fun the games are from very little experience.
arXiv Detail & Related papers (2024-07-19T07:59:04Z)
- Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games [70.19141208203227]
We consider the problem of decentralized multi-agent reinforcement learning in Markov games.
We show that no algorithm attains no-regret in general-sum games when executed independently by all players.
We show that our lower bounds hold even for the seemingly easier setting in which all agents are controlled by a centralized algorithm.
arXiv Detail & Related papers (2023-03-22T03:28:12Z)
- Generalised agent for solving higher board states of tic tac toe using Reinforcement Learning [0.0]
This study is aimed at providing a generalized algorithm for higher board states of tic tac toe to make precise moves in a short period.
The idea is to pose the tic tac toe game as a well-posed learning problem.
The study and its results are promising, giving a high win to draw ratio with each epoch of training.
arXiv Detail & Related papers (2022-12-23T10:58:27Z)
- An AlphaZero-Inspired Approach to Solving Search Problems [63.24965775030674]
We adapt the methods and techniques used in AlphaZero for solving search problems.
We describe possible representations in terms of easy-instance solvers and self-reductions.
We also describe a version of Monte Carlo tree search adapted for search problems.
arXiv Detail & Related papers (2022-07-02T23:39:45Z)
- No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
arXiv Detail & Related papers (2022-01-30T06:10:04Z)
- Generating Diverse and Competitive Play-Styles for Strategy Games [58.896302717975445]
We propose Portfolio Monte Carlo Tree Search with Progressive Unpruning for playing a turn-based strategy game (Tribes).
We show how it can be parameterized so a quality-diversity algorithm (MAP-Elites) is used to achieve different play-styles while keeping a competitive level of play.
Our results show that this algorithm is capable of achieving these goals even for an extensive collection of game levels beyond those used for training.
arXiv Detail & Related papers (2021-04-17T20:33:24Z)
- Safe Search for Stackelberg Equilibria in Extensive-Form Games [24.557177222572786]
Stackelberg equilibrium is a solution concept in two-player games where the leader has commitment rights over the follower.
We present a theoretically sound and empirically effective way to apply search to the computation of Stackelberg equilibria in general-sum games.
arXiv Detail & Related papers (2021-02-02T22:01:19Z)
- Efficient exploration of zero-sum stochastic games [83.28949556413717]
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
arXiv Detail & Related papers (2020-02-24T20:30:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.