Game Solving with Online Fine-Tuning
- URL: http://arxiv.org/abs/2311.07178v1
- Date: Mon, 13 Nov 2023 09:09:52 GMT
- Title: Game Solving with Online Fine-Tuning
- Authors: Ti-Rong Wu, Hung Guei, Ting Han Wei, Chung-Chin Shih, Jui-Te Chin,
I-Chen Wu
- Abstract summary: This paper investigates applying online fine-tuning while searching and proposes two methods to learn tailor-designed heuristics for game solving.
Our experiments show that using online fine-tuning can solve a series of challenging 7x7 Killall-Go problems using only 23.54% of the computation time compared to the baseline.
- Score: 17.614045403579244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Game solving is a similar, yet more difficult task than mastering a game.
Solving a game typically means finding the game-theoretic value (outcome given
optimal play), and optionally a full strategy to follow in order to achieve
that outcome. The AlphaZero algorithm has demonstrated super-human level play,
and its powerful policy and value predictions have also served as heuristics in
game solving. However, to solve a game and obtain a full strategy, a winning
response must be found for all possible moves by the losing player. This
includes very poor lines of play from the losing side, which the AlphaZero
self-play process never encounters. AlphaZero-based heuristics can be highly
inaccurate when evaluating these out-of-distribution positions, which occur
throughout the entire search. To address this issue, this paper investigates
applying online fine-tuning while searching and proposes two methods to learn
tailor-designed heuristics for game solving. Our experiments show that using
online fine-tuning can solve a series of challenging 7x7 Killall-Go problems,
using only 23.54% of computation time compared to the baseline without online
fine-tuning. Results suggest that the savings scale with problem size. Our
method can further be extended to any tree search algorithm for problem
solving. Our code is available at
https://rlg.iis.sinica.edu.tw/papers/neurips2023-online-fine-tuning-solver.
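To make the idea concrete, here is a minimal, hypothetical sketch of online fine-tuning inside a solving search. It is not the paper's algorithm or API: `SimpleHeuristic`, `solve`, and the tabular "fine-tuning" stand-in are illustrative assumptions. The point it shows is the loop structure — the search periodically retrains its heuristic on positions it has already proven, so later expansions are guided by labels tailored to this specific problem rather than by self-play priors alone.

```python
# Hypothetical sketch: online fine-tuning during a best-first solving search.
# SimpleHeuristic and solve are illustrative names, not the paper's code.

class SimpleHeuristic:
    """Trivial tabular stand-in for a value network: position -> win estimate."""
    def __init__(self):
        self.values = {}

    def predict(self, pos):
        return self.values.get(pos, 0.5)  # uninformed prior for unseen positions

    def fine_tune(self, labeled):
        # "Fine-tune" on positions the search itself has labeled.
        for pos, outcome in labeled:
            self.values[pos] = outcome


def solve(root, children, is_win, heuristic, tune_every=2):
    """Best-first search that periodically fine-tunes its heuristic on
    already-proven positions, steering subsequent expansions."""
    frontier = [root]
    proven = {}  # position -> label established so far (1.0 win, 0.0 loss)
    expanded = 0
    while frontier:
        # Expand the position the current heuristic rates most promising.
        frontier.sort(key=heuristic.predict, reverse=True)
        pos = frontier.pop(0)
        if is_win(pos):
            proven[pos] = 1.0
        else:
            kids = children(pos)
            if not kids:
                proven[pos] = 0.0  # terminal position with no winning move
            else:
                frontier.extend(kids)
        expanded += 1
        if expanded % tune_every == 0 and proven:
            heuristic.fine_tune(proven.items())  # online update mid-search
    return proven
```

Usage on a toy game tree (positions as strings, children given by a dict): the heuristic starts uniform, but after the first fine-tuning step its predictions reflect outcomes the search has proven, which is the mechanism the abstract describes at scale with a neural network.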
Related papers
- Semi-Strongly solved: a New Definition Leading Computer to Perfect Gameplay [0.0]
Several definitions exist for 'solving the game,' but they are markedly different regarding computational cost and the detail of insights derived.
We introduce a novel definition called 'semi-strongly solved' and propose an algorithm to achieve this type of solution efficiently.
arXiv Detail & Related papers (2024-11-01T21:00:46Z)
- People use fast, goal-directed simulation to reason about novel games [75.25089384921557]
We study how people reason about a range of simple but novel connect-n style board games.
We ask people to judge how fair and how fun the games are from very little experience.
arXiv Detail & Related papers (2024-07-19T07:59:04Z)
- Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games [70.19141208203227]
We consider the problem of decentralized multi-agent reinforcement learning in Markov games.
We show that no algorithm attains no-regret in general-sum games when executed independently by all players.
We show that our lower bounds hold even for the seemingly easier setting in which all agents are controlled by a centralized algorithm.
arXiv Detail & Related papers (2023-03-22T03:28:12Z)
- Generalised agent for solving higher board states of tic tac toe using Reinforcement Learning [0.0]
This study is aimed at providing a generalized algorithm for higher board states of tic tac toe to make precise moves in a short period.
The idea is to pose the tic tac toe game as a well-posed learning problem.
The study and its results are promising, giving a high win to draw ratio with each epoch of training.
arXiv Detail & Related papers (2022-12-23T10:58:27Z)
- An AlphaZero-Inspired Approach to Solving Search Problems [63.24965775030674]
We adapt the methods and techniques used in AlphaZero for solving search problems.
We describe possible representations in terms of easy-instance solvers and self-reductions.
We also describe a version of Monte Carlo tree search adapted for search problems.
arXiv Detail & Related papers (2022-07-02T23:39:45Z)
- No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
arXiv Detail & Related papers (2022-01-30T06:10:04Z)
- Generating Diverse and Competitive Play-Styles for Strategy Games [58.896302717975445]
We propose Portfolio Monte Carlo Tree Search with Progressive Unpruning for playing a turn-based strategy game (Tribes).
We show how it can be parameterized so a quality-diversity algorithm (MAP-Elites) is used to achieve different play-styles while keeping a competitive level of play.
Our results show that this algorithm is capable of achieving these goals even for an extensive collection of game levels beyond those used for training.
arXiv Detail & Related papers (2021-04-17T20:33:24Z)
- Safe Search for Stackelberg Equilibria in Extensive-Form Games [24.557177222572786]
Stackelberg equilibrium is a solution concept in two-player games where the leader has commitment rights over the follower.
We present a theoretically sound and empirically effective way to apply search to the computation of Stackelberg equilibria in general-sum games.
arXiv Detail & Related papers (2021-02-02T22:01:19Z)
- Efficient exploration of zero-sum stochastic games [83.28949556413717]
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
arXiv Detail & Related papers (2020-02-24T20:30:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.