A Novel Approach to Solving Goal-Achieving Problems for Board Games
- URL: http://arxiv.org/abs/2112.02563v1
- Date: Sun, 5 Dec 2021 13:23:10 GMT
- Title: A Novel Approach to Solving Goal-Achieving Problems for Board Games
- Authors: Chung-Chin Shih, Ti-Rong Wu, Ting Han Wei, and I-Chen Wu
- Abstract summary: This paper first proposes a novel RZ-based approach, called the RZ-Based Search (RZS), to solving L&D problems for Go.
RZS tries moves before determining whether they are null moves post-hoc.
We also propose a new training method called Faster to Life (FTL), which modifies AlphaZero to entice it to win more quickly.
- Score: 8.882671058559016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Goal-achieving problems are puzzles that set up a specific situation with a
clear objective. An example that is well-studied is the category of
life-and-death (L&D) problems for Go, which helps players hone their skill of
identifying region safety. Many previous methods like lambda search try null
moves first, then derive so-called relevance zones (RZs), outside of which the
opponent does not need to search. This paper first proposes a novel RZ-based
approach, called the RZ-Based Search (RZS), to solving L&D problems for Go. RZS
tries moves before determining whether they are null moves post-hoc. This means
we do not need to rely on null move heuristics, resulting in a more elegant
algorithm, so that it can also be seamlessly incorporated into AlphaZero's
super-human level play in our solver. To repurpose AlphaZero for solving, we
also propose a new training method called Faster to Life (FTL), which modifies
AlphaZero to entice it to win more quickly. We use RZS and FTL to solve L&D
problems on Go, namely solving 68 among 106 problems from a professional L&D
book, while a previous program solves only 11. Finally, we discuss that the
approach is generic in the sense that RZS is applicable to solving many other
goal-achieving problems for board games.
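To make the search idea concrete, below is a minimal Python sketch of a relevance-zone-style AND/OR search in the spirit of RZS: moves are tried first, and only after the child search returns is it decided post hoc whether a move acted as a null move, i.e., whether it fell outside the relevance zone that the child search produced. Everything here (GoState, legal_moves, play, goal_achieved, winning_zone, and the zone bookkeeping) is an illustrative assumption, not the authors' implementation.

    def rz_search(state, attacker_to_move):
        # Return (attacker_wins, zone): whether the attacker can achieve the goal,
        # and a relevance zone outside of which defender replies need not be searched.
        # GoState and its methods are assumed interfaces for this sketch only.
        if state.goal_achieved():
            return True, state.winning_zone()      # seed the zone from the achieved goal

        zone = set()
        out_of_zone_refuted = False                # set once a defender reply outside the
                                                   # zone has been tried and shown to fail
        for move in state.legal_moves():
            if (not attacker_to_move) and out_of_zone_refuted and move not in zone:
                continue                           # prune: reply lies outside the zone

            attacker_wins, child_zone = rz_search(state.play(move), not attacker_to_move)

            if attacker_to_move:                   # OR node: one winning move suffices
                if attacker_wins:
                    return True, child_zone | {move}
                zone |= child_zone                 # every move has failed so far; merge zones
            else:                                  # AND node: every reply must be refuted
                if not attacker_wins:
                    return False, child_zone       # defender found a working reply
                zone |= child_zone
                if move not in child_zone:
                    # Decided post hoc: this reply fell outside the zone it produced, so
                    # it behaved like a null move and covers other out-of-zone replies too.
                    out_of_zone_refuted = True

        return (False, zone) if attacker_to_move else (True, zone)

Note that out-of-zone defender replies are skipped only after one such reply has already been refuted, which is the sense in which null-move reasoning is applied after the fact rather than as an up-front heuristic.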
Related papers
- Game Solving with Online Fine-Tuning [17.614045403579244]
This paper investigates applying online fine-tuning while searching and proposes two methods to learn tailor-designed computations for game solving.
Our experiments show that using online fine-tuning can solve a series of challenging 7x7 Killall-Go problems, using only 23.54% of time compared to the baseline.
arXiv Detail & Related papers (2023-11-13T09:09:52Z)
- Responsible AI (RAI) Games and Ensembles [30.110052769733247]
We provide a general framework for studying problems, which we refer to as Responsible AI (RAI) games.
We provide two classes of algorithms for solving these games: (a) game-play based algorithms, and (b) greedy stagewise estimation algorithms.
We empirically demonstrate the applicability and competitive performance of our techniques for solving several RAI problems, particularly around subpopulation shift.
arXiv Detail & Related papers (2023-10-28T22:17:30Z)
- Provably Efficient Generalized Lagrangian Policy Optimization for Safe Multi-Agent Reinforcement Learning [105.7510838453122]
We examine online safe multi-agent reinforcement learning using constrained Markov games.
We develop an upper confidence reinforcement learning algorithm to solve this Lagrangian problem.
Our algorithm updates the minimax decision primal variables via online mirror descent and the dual variable via projected gradient step.
arXiv Detail & Related papers (2023-05-31T22:09:24Z)
- SPRING: Studying the Paper and Reasoning to Play Games [102.5587155284795]
We propose a novel approach, SPRING, to read the game's original academic paper and use the knowledge learned to reason and play the game through a large language model (LLM).
In experiments, we study the quality of in-context "reasoning" induced by different forms of prompts under the setting of the Crafter open-world environment.
Our experiments suggest that LLMs, when prompted with consistent chain-of-thought, have great potential in completing sophisticated high-level trajectories.
arXiv Detail & Related papers (2023-05-24T18:14:35Z)
- Guessing Winning Policies in LTL Synthesis by Semantic Learning [0.0]
We provide a learning-based technique for guessing a winning strategy in a parity game originating from an LTL synthesis problem.
Not only can the guessed strategy be applied as best-effort in cases where the game's huge size prohibits rigorous approaches, but it can also increase the scalability of rigorous synthesis in several ways.
arXiv Detail & Related papers (2023-05-24T12:57:53Z)
- Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games [70.19141208203227]
We consider the problem of decentralized multi-agent reinforcement learning in Markov games.
We show that no algorithm attains no-regret in general-sum games when executed independently by all players.
We show that our lower bounds hold even for the seemingly easier setting in which all agents are controlled by a centralized algorithm.
arXiv Detail & Related papers (2023-03-22T03:28:12Z)
- An AlphaZero-Inspired Approach to Solving Search Problems [63.24965775030674]
We adapt the methods and techniques used in AlphaZero for solving search problems.
We describe possible representations in terms of easy-instance solvers and self-reductions.
We also describe a version of Monte Carlo tree search adapted for search problems.
arXiv Detail & Related papers (2022-07-02T23:39:45Z)
- Learning in Mean Field Games: A Survey [44.93300994923148]
Mean Field Games (MFGs) rely on a mean-field approximation to allow the number of players to grow to infinity.
Recent literature has proposed Reinforcement Learning methods to learn equilibria and social optima in MFGs.
We present a general framework for classical iterative methods to solve MFGs in an exact way.
arXiv Detail & Related papers (2022-05-25T17:49:37Z)
- BeBold: Exploration Beyond the Boundary of Explored Regions [66.88415950549556]
In this paper, we propose the regulated difference of inverse visitation counts as a simple but effective criterion for intrinsic reward (IR); a simplified sketch of this criterion appears after this list.
The criterion helps the agent explore Beyond the Boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment.
The resulting method, BeBold, solves the 12 most challenging procedurally-generated tasks in MiniGrid with just 120M environment steps, without any curriculum learning.
arXiv Detail & Related papers (2020-12-15T21:26:54Z)
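As an aside on the BeBold entry above, the regulated difference of inverse visitation counts can be written down in a few lines of Python. The sketch below is a hypothetical simplification that drops the episodic-count factor and any scaling constants used in the paper, so the function name and defaults are assumptions.

    def bebold_intrinsic_reward(count_next, count_curr):
        # Reward transitions into less-visited states and clip negative
        # differences to zero, so that moving back toward familiar states
        # is never rewarded (the "regulated" part of the criterion).
        diff = 1.0 / max(count_next, 1) - 1.0 / max(count_curr, 1)
        return max(diff, 0.0)

    # Example: stepping from a state visited 10 times to one visited twice.
    print(bebold_intrinsic_reward(count_next=2, count_curr=10))  # prints 0.4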
This list is automatically generated from the titles and abstracts of the papers on this site.