Multi-Armed Bandits for Minesweeper: Profiting from
Exploration-Exploitation Synergy
- URL: http://arxiv.org/abs/2007.12824v2
- Date: Thu, 17 Jun 2021 21:18:03 GMT
- Title: Multi-Armed Bandits for Minesweeper: Profiting from
Exploration-Exploitation Synergy
- Authors: Igor Q. Lordeiro, Diego B. Haddad, Douglas O. Cardoso
- Abstract summary: A popular computer puzzle, the game of Minesweeper requires its human players to have a mix of both luck and strategy to succeed.
We develop a novel methodology based on Reinforcement Learning to tackle the problem presented by this game.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A popular computer puzzle, the game of Minesweeper requires its human players
to have a mix of both luck and strategy to succeed. Analyzing these aspects
more formally, in our research we assessed the feasibility of a novel
methodology based on Reinforcement Learning as an adequate approach to tackle
the problem presented by this game. For this purpose we employed Multi-Armed
Bandit algorithms, carefully adapted so that they could define autonomous
computational players, aiming to make the best use of some of the game's
peculiarities. Experimental evaluation showed that this approach was indeed
successful, especially on smaller game boards such as the standard beginner
level. Nevertheless, the main contribution of this work is a detailed
examination of Minesweeper from a learning perspective, which led to various
original insights that are thoroughly discussed.
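The paper itself ships no code, but the core idea of treating board positions as bandit arms can be illustrated concretely. Below is a minimal sketch of a UCB1-style policy over Minesweeper cells, assuming arms correspond to unrevealed cells, the reward is 1 for a safe reveal and 0 for hitting a mine, and statistics accumulate over repeated games; all names are hypothetical illustrations, not the authors' implementation.

```python
import math
import random

class UCB1CellPolicy:
    """UCB1-style bandit over board cells: a minimal sketch, not the paper's method.

    Arms are cells; reward is 1 for a safe reveal, 0 for hitting a mine.
    Statistics are assumed to accumulate across repeated games, since within a
    single game each cell is revealed at most once.
    """

    def __init__(self, cells):
        self.counts = {c: 0 for c in cells}    # times each cell-arm was tried
        self.values = {c: 0.0 for c in cells}  # running mean reward per arm

    def select(self, available):
        untried = [c for c in available if self.counts[c] == 0]
        if untried:                            # play every arm once first
            return random.choice(untried)
        total = sum(self.counts[c] for c in available)
        # UCB1: mean reward plus an exploration bonus that shrinks with visits
        return max(available, key=lambda c: self.values[c]
                   + math.sqrt(2 * math.log(total) / self.counts[c]))

    def update(self, cell, reward):
        self.counts[cell] += 1
        self.values[cell] += (reward - self.values[cell]) / self.counts[cell]
```

A training loop would then call `policy.select(unrevealed_cells)` each turn and `policy.update(cell, 0 if mine else 1)` after the reveal; reward shaping from neighbouring number clues is one of the game-specific adaptations the paper explores.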
Related papers
- Multi-Player Approaches for Dueling Bandits
We show that the direct use of a Follow Your Leader black-box approach matches the lower bound for this setting.
We also analyze a message-passing fully distributed approach with a novel Condorcet-winner recommendation protocol.
arXiv Detail & Related papers (2024-05-25T10:25:48Z)
- Two-Step Reinforcement Learning for Multistage Strategy Card Game
This study introduces a two-step reinforcement learning (RL) strategy tailored for "The Lord of the Rings: The Card Game" (LOTRCG).
This research diverges from conventional RL methods by adopting a phased learning approach.
The paper also explores a multi-agent system, where distinct RL agents are employed for various decision-making aspects of the game.
arXiv Detail & Related papers (2023-11-29T01:31:21Z)
- Incentivizing Combinatorial Bandit Exploration
Consider a bandit algorithm that recommends actions to self-interested users in a recommendation system.
Users are free to choose other actions and need to be incentivized to follow the algorithm's recommendations.
While the users prefer to exploit, the algorithm can incentivize them to explore by leveraging the information collected from the previous users.
arXiv Detail & Related papers (2022-06-01T13:46:25Z)
- Optimisation of MCTS Player for The Lord of the Rings: The Card Game
The article presents research on the use of Monte-Carlo Tree Search (MCTS) methods to create an artificial player for the popular card game "The Lord of the Rings".
arXiv Detail & Related papers (2021-09-24T14:42:32Z)
- MCTS Based Agents for Multistage Single-Player Card Game
The article presents the use of Monte Carlo Tree Search algorithms for the card game Lord of the Rings.
The main challenge was the complexity of the game mechanics, in which each round consists of 5 decision stages and 2 random stages.
To test various decision-making algorithms, a game simulator has been implemented.
arXiv Detail & Related papers (2021-09-24T10:56:54Z)
- Strategically Efficient Exploration in Competitive Multi-agent Reinforcement Learning
This work seeks to understand the role of optimistic exploration in non-cooperative multi-agent settings.
We will show that, in zero-sum games, optimistic exploration can cause the learner to waste time sampling parts of the state space that are irrelevant to strategic play.
To address this issue, we introduce a formal notion of strategically efficient exploration in Markov games, and use this to develop two strategically efficient learning algorithms for finite Markov games.
arXiv Detail & Related papers (2021-07-30T15:22:59Z)
- Generating Diverse and Competitive Play-Styles for Strategy Games
We propose Portfolio Monte Carlo Tree Search with Progressive Unpruning for playing a turn-based strategy game (Tribes).
We show how it can be parameterized so a quality-diversity algorithm (MAP-Elites) is used to achieve different play-styles while keeping a competitive level of play.
Our results show that this algorithm is capable of achieving these goals even for an extensive collection of game levels beyond those used for training.
arXiv Detail & Related papers (2021-04-17T20:33:24Z)
- Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback
Combinatorial bandits generalize multi-armed bandits: the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set (see the sketch after this list).
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting where the structure of the answer set differs from that of the action set.
Based on a projection-free online learning algorithm for finite polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z)
- Learning to Play Sequential Games versus Unknown Opponents
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's responses.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
- Efficient exploration of zero-sum stochastic games
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
arXiv Detail & Related papers (2020-02-24T20:30:38Z)
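As referenced in the combinatorial-bandits entry above, semi-bandit feedback means the learner observes one noisy reward per arm in the chosen set rather than a single aggregate reward. Below is a minimal sketch of that protocol, under an assumed Gaussian noise model and a hypothetical random-subset learner; none of this reproduces the cited paper's algorithm.

```python
import random

def semi_bandit_round(choose_set, arm_means, noise=0.1):
    """One round of semi-bandit feedback: a per-arm reward for every chosen arm."""
    chosen = choose_set()                        # a subset of arms, e.g. {0, 3, 4}
    return {a: arm_means[a] + random.gauss(0, noise) for a in chosen}

# Hypothetical driver: play random 3-of-5 subsets, track per-arm empirical means.
arm_means = [0.1, 0.5, 0.3, 0.9, 0.2]            # unknown to the learner
counts = [0] * 5
means = [0.0] * 5
for _ in range(1000):
    feedback = semi_bandit_round(lambda: random.sample(range(5), 3), arm_means)
    for a, r in feedback.items():                # per-arm observations drive estimation
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]
best = max(range(5), key=lambda a: means[a])     # pure exploration: identify the best arm
```

Under plain bandit feedback only the total `sum(feedback.values())` would be revealed; the per-arm observations are what make per-arm means estimable and are the starting point for efficient pure-exploration methods such as the one in the cited paper.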