Generalised agent for solving higher board states of tic tac toe using
Reinforcement Learning
- URL: http://arxiv.org/abs/2212.12252v1
- Date: Fri, 23 Dec 2022 10:58:27 GMT
- Title: Generalised agent for solving higher board states of tic tac toe using
Reinforcement Learning
- Authors: Bhavuk Kalra
- Abstract summary: This study is aimed at providing a generalized algorithm for higher board states of tic tac toe to make precise moves in a short period.
The idea is to pose the tic tac toe game as a well-posed learning problem.
The study and its results are promising, giving a high win to draw ratio with each epoch of training.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tic Tac Toe is among the most well-known games. It has already been
shown that it is a biased game, giving the first player more chances to win and
leaving only a draw or a loss as possibilities for the opponent, assuming both
players play optimally. Thus, on average, the majority of games played result in
a draw. Most of the recent research on solving a tic tac toe board state employs
strategies such as Genetic Algorithms, Neural Networks, Co-Evolution, and
Evolutionary Programming. However, these approaches deal with the trivial 3x3
board state, and very little research has been done on a generalized algorithm
for 4x4, 5x5, 6x6, and higher board states. An exact algorithm, Minimax, does
exist, but it takes a long time to produce an ideal move because of its
recursive implementation. A sample has been created at
\url{https://bk-tic-tac-toe.herokuapp.com/} to demonstrate this fact. This is
the main problem that this study aims to solve: providing a generalized
algorithm (an approximate, learning-based method) for higher board states of
tic tac toe that makes precise moves in a short period of time. Moreover, the
code changes needed to accommodate higher board states are nominal. The idea is
to pose the tic tac toe game as a well-posed learning problem. The study and
its results are promising, giving a high win-to-draw ratio with each epoch of
training. This study could also encourage other researchers to apply the same
algorithm to similar games such as Minesweeper, Chess, and Go, to find
efficient strategies and compare results.
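
As a concrete illustration of what posing the game "as a well-posed learning
problem" could look like, below is a minimal Python sketch of a generic n x n
tic tac toe environment trained by tabular self-play with Monte Carlo-style
value updates. This is not the paper's published code: the board size n, the
win length k, the hyperparameters, and the particular update rule are
assumptions chosen only to keep the example self-contained and runnable.

```python
import random
from collections import defaultdict

def winning_lines(n, k):
    """Precompute every horizontal, vertical, and diagonal line of length k."""
    lines = []
    for r in range(n):
        for c in range(n):
            if c + k <= n:
                lines.append([(r, c + i) for i in range(k)])      # horizontal
            if r + k <= n:
                lines.append([(r + i, c) for i in range(k)])      # vertical
            if r + k <= n and c + k <= n:
                lines.append([(r + i, c + i) for i in range(k)])  # diagonal
            if r + k <= n and c - k + 1 >= 0:
                lines.append([(r + i, c - i) for i in range(k)])  # anti-diagonal
    return lines

def winner(board, n, lines):
    """Return +1 or -1 if that player owns a full line, else 0."""
    for line in lines:
        total = sum(board[r * n + c] for r, c in line)
        if abs(total) == len(line):
            return 1 if total > 0 else -1
    return 0

def legal_moves(board):
    return [i for i, v in enumerate(board) if v == 0]

def self_play_train(n=4, k=4, episodes=20000, alpha=0.3, gamma=0.9, eps=0.1):
    """Self-play training of a tabular value table keyed by (board, move)."""
    lines = winning_lines(n, k)
    Q = defaultdict(float)                      # values from player +1's perspective
    for _ in range(episodes):
        board, player, history = [0] * (n * n), 1, []
        while True:
            state, moves = tuple(board), legal_moves(board)
            if random.random() < eps:           # epsilon-greedy exploration
                move = random.choice(moves)
            else:                               # player -1 minimises the shared value
                move = max(moves, key=lambda m: player * Q[(state, m)])
            history.append((state, move))
            board[move] = player
            result = winner(board, n, lines)
            if result != 0 or not legal_moves(board):
                ret = float(result)             # +1 X win, -1 O win, 0 draw
                for s, m in reversed(history):  # discounted Monte Carlo backup
                    Q[(s, m)] += alpha * (ret - Q[(s, m)])
                    ret *= gamma
                break
            player = -player
    return Q

if __name__ == "__main__":
    # Hypothetical usage: the same code handles 3x3, 4x4, 5x5, ... boards.
    Q = self_play_train(n=3, k=3, episodes=5000)
    print(f"learned values for {len(Q)} (state, move) pairs")
```

Note that moving from a 3x3 to a 5x5 or 6x6 board only changes the arguments to
self_play_train, which is the sense in which the code changes needed for higher
board states remain nominal.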
Related papers
- Game Solving with Online Fine-Tuning [17.614045403579244]
This paper investigates applying online fine-tuning while searching and proposes two methods to learn tailor-designed computations for game solving.
Our experiments show that using online fine-tuning can solve a series of challenging 7x7 Killall-Go problems, using only 23.54% of the time required by the baseline.
arXiv Detail & Related papers (2023-11-13T09:09:52Z)
- Learning to Play Stochastic Two-player Perfect-Information Games without Knowledge [5.071342645033634]
We extend the Descent framework, which enables learning and planning in the context of two-player games with perfect information.
We evaluate them on the game EinStein würfelt nicht! against state-of-the-art algorithms.
It is our generalization of Descent which obtains the best results.
arXiv Detail & Related papers (2023-02-08T20:27:45Z)
- No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
arXiv Detail & Related papers (2022-01-30T06:10:04Z)
- Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback [29.553652241608997]
We study the class of smooth and strongly monotone games and optimal no-regret learning therein.
We first construct a new bandit learning algorithm and show that it achieves the single-agent optimal regret of $\tilde{\Theta}(n\sqrt{T})$.
Our results thus settle this open problem and contribute to the broad landscape of bandit game-theoretical learning.
arXiv Detail & Related papers (2021-12-06T08:27:54Z)
- Generating Diverse and Competitive Play-Styles for Strategy Games [58.896302717975445]
We propose Portfolio Monte Carlo Tree Search with Progressive Unpruning for playing a turn-based strategy game (Tribes).
We show how it can be parameterized so a quality-diversity algorithm (MAP-Elites) is used to achieve different play-styles while keeping a competitive level of play.
Our results show that this algorithm is capable of achieving these goals even for an extensive collection of game levels beyond those used for training.
arXiv Detail & Related papers (2021-04-17T20:33:24Z)
- Mastering Terra Mystica: Applying Self-Play to Multi-agent Cooperative Board Games [0.0]
In this paper, we explore and compare multiple algorithms for solving the complex strategy game of Terra Mystica.
We apply these breakthroughs to a novel state-representation of TM with the goal of creating an AI that will rival human players.
In the end, we discuss the success and shortcomings of this method by comparing against multiple baselines and typical human scores.
arXiv Detail & Related papers (2021-02-21T07:53:34Z)
- Faster Algorithms for Optimal Ex-Ante Coordinated Collusive Strategies in Extensive-Form Zero-Sum Games [123.76716667704625]
We focus on the problem of finding an optimal strategy for a team of two players that faces an opponent in an imperfect-information zero-sum extensive-form game.
In that setting, it is known that the best the team can do is sample a profile of potentially randomized strategies (one per player) from a joint (a.k.a. correlated) probability distribution at the beginning of the game.
We provide an algorithm that computes such an optimal distribution by only using profiles where only one of the team members gets to randomize in each profile.
arXiv Detail & Related papers (2020-09-21T17:51:57Z)
- Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's responses.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
- Near-Optimal Reinforcement Learning with Self-Play [50.29853537456737]
We focus on self-play algorithms, which learn the optimal policy by playing against themselves without any direct supervision.
We propose an optimistic variant of the Nash Q-learning algorithm with sample complexity $\tilde{\mathcal{O}}(SAB)$, and a new Nash V-learning algorithm with sample complexity $\tilde{\mathcal{O}}(S(A+B))$.
arXiv Detail & Related papers (2020-06-22T05:00:13Z)
- Efficient exploration of zero-sum stochastic games [83.28949556413717]
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
arXiv Detail & Related papers (2020-02-24T20:30:38Z)