Reinforcement Learning In Two Player Zero Sum Simultaneous Action Games
- URL: http://arxiv.org/abs/2110.04835v1
- Date: Sun, 10 Oct 2021 16:03:44 GMT
- Title: Reinforcement Learning In Two Player Zero Sum Simultaneous Action Games
- Authors: Patrick Phillips
- Abstract summary: Two player zero sum simultaneous action games are common in video games, financial markets, war, business competition, and many other settings.
We introduce the fundamental concepts of reinforcement learning in two player zero sum simultaneous action games and discuss the unique challenges this type of game poses.
We introduce two novel agents that attempt to handle these challenges by using joint action Deep Q-Networks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Two player zero sum simultaneous action games are common in video games,
financial markets, war, business competition, and many other settings. We first
introduce the fundamental concepts of reinforcement learning in two player zero
sum simultaneous action games and discuss the unique challenges this type of
game poses. Then we introduce two novel agents that attempt to handle these
challenges by using joint action Deep Q-Networks (DQN). The first agent, called
the Best Response AgenT (BRAT), builds an explicit model of its opponent's
policy using imitation learning, and then uses this model to find the best
response to exploit the opponent's strategy. The second agent, Meta-Nash DQN,
builds an implicit model of its opponent's policy in order to produce a context
variable that is used as part of the Q-value calculation. An explicit minimax
over Q-values is used to find actions close to Nash equilibrium. We find
empirically that both agents converge to Nash equilibrium in a self-play
setting for simple matrix games, while also performing well in games with
larger state and action spaces. These novel algorithms are evaluated against
vanilla RL algorithms as well as recent state of the art multi-agent and two
agent algorithms. This work combines ideas from traditional reinforcement
learning, game theory, and meta learning.
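In the simplest single-state (matrix game) case, the two mechanisms named above reduce to (i) a best response against a modelled opponent policy and (ii) a minimax solution of the matrix game defined by joint-action Q-values. The sketch below illustrates only that reduction; it is not the paper's implementation (which uses Deep Q-Networks, imitation learning of the opponent, and a learned context variable), and the function names and the matching-pennies payoffs are illustrative assumptions.

```python
# Minimal sketch, assuming tabular joint-action Q-values for a single matrix-game
# state. best_response mirrors BRAT's exploitation step; maximin_policy mirrors the
# explicit minimax over Q-values described for Meta-Nash DQN. Not the paper's code.
import numpy as np
from scipy.optimize import linprog


def best_response(Q: np.ndarray, opponent_policy: np.ndarray) -> int:
    """Pick the action maximizing expected value under a modelled opponent policy."""
    expected_q = Q @ opponent_policy          # E_{b ~ opponent}[Q[a, b]] for each own action a
    return int(np.argmax(expected_q))


def maximin_policy(Q: np.ndarray):
    """Solve the zero-sum matrix game Q[a, b] (row player's payoff) for the row
    player's maximin mixed strategy and the game value via a linear program."""
    n_own, n_opp = Q.shape
    # Variables z = [x_0, ..., x_{n_own-1}, v]; maximize v  <=>  minimize -v.
    c = np.zeros(n_own + 1)
    c[-1] = -1.0
    # For every opponent action b:  v - sum_a x_a * Q[a, b] <= 0.
    A_ub = np.hstack([-Q.T, np.ones((n_opp, 1))])
    b_ub = np.zeros(n_opp)
    # The mixed strategy must sum to one.
    A_eq = np.hstack([np.ones((1, n_own)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * n_own + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n_own], res.x[-1]


if __name__ == "__main__":
    # Matching pennies: Nash play mixes 50/50 with value 0, while a best response
    # to a biased (heads-heavy) opponent exploits the bias.
    Q = np.array([[1.0, -1.0],
                  [-1.0, 1.0]])
    print(maximin_policy(Q))                        # (~[0.5, 0.5], ~0.0)
    print(best_response(Q, np.array([0.8, 0.2])))   # -> 0
```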
Related papers
- Toward Optimal LLM Alignments Using Two-Player Games [86.39338084862324]
In this paper, we investigate alignment through the lens of two-agent games, involving iterative interactions between an adversarial and a defensive agent.
We theoretically demonstrate that this iterative reinforcement learning optimization converges to a Nash Equilibrium for the game induced by the agents.
Experimental results in safety scenarios demonstrate that learning in such a competitive environment not only fully trains agents but also leads to policies with enhanced generalization capabilities for both adversarial and defensive agents.
arXiv Detail & Related papers (2024-06-16T15:24:50Z) - Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning [95.78031053296513]
No-press Diplomacy is a complex strategy game involving both cooperation and competition.
We introduce a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy; a generic sketch of this style of KL-regularized objective appears after this list.
We show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL.
arXiv Detail & Related papers (2022-10-11T14:47:35Z) - Reinforcement Learning Agents in Colonel Blotto [0.0]
We focus on a specific instance of agent-based models, which uses reinforcement learning (RL) to teach the agent how to act in its environment.
We find that the RL agent handily beats a single opponent, and still performs quite well when the number of opponents is increased.
We also analyze the RL agent and examine the strategies it has arrived at by inspecting the actions to which it assigns the highest and lowest Q-values.
arXiv Detail & Related papers (2022-04-04T16:18:01Z) - Towards convergence to Nash equilibria in two-team zero-sum games [17.4461045395989]
Two-team zero-sum games are defined as multi-player games where players are split into two competing sets of agents.
We focus on the solution concept of Nash equilibria (NE).
We show that computing NE for this class of games is hard for the complexity class CLS.
arXiv Detail & Related papers (2021-11-07T21:15:35Z) - Discovering Multi-Agent Auto-Curricula in Two-Player Zero-Sum Games [31.97631243571394]
We introduce a framework, LMAC, that automates the discovery of the update rule without explicit human design.
Surprisingly, even without human design, the discovered MARL algorithms achieve competitive or even better performance.
We show that LMAC is able to generalise from small games to large games, for example training on Kuhn Poker and outperforming PSRO.
arXiv Detail & Related papers (2021-06-04T22:30:25Z) - L2E: Learning to Exploit Your Opponent [66.66334543946672]
We propose a novel Learning to Exploit framework for implicit opponent modeling.
L2E acquires the ability to exploit opponents by a few interactions with different opponents during training.
We propose a novel opponent strategy generation algorithm that produces effective opponents for training automatically.
arXiv Detail & Related papers (2021-02-18T14:27:59Z) - An Empirical Study on the Generalization Power of Neural Representations Learned via Visual Guessing Games [79.23847247132345]
This work investigates how well an artificial agent can benefit from playing guessing games when later asked to perform on novel NLP downstream tasks such as Visual Question Answering (VQA).
We propose two ways to exploit playing guessing games: 1) a supervised learning scenario in which the agent learns to mimic successful guessing games and 2) a novel way for an agent to play by itself, called Self-play via Iterated Experience Learning (SPIEL).
arXiv Detail & Related papers (2021-01-31T10:30:48Z) - Multi-Agent Collaboration via Reward Attribution Decomposition [75.36911959491228]
We propose Collaborative Q-learning (CollaQ) that achieves state-of-the-art performance in the StarCraft multi-agent challenge.
CollaQ is evaluated on various StarCraft maps and shows that it outperforms existing state-of-the-art techniques.
arXiv Detail & Related papers (2020-10-16T17:42:11Z) - Efficient Competitive Self-Play Policy Optimization [20.023522000925094]
We propose a new algorithmic framework for competitive self-play reinforcement learning in two-player zero-sum games.
Our method trains several agents simultaneously and intelligently pairs them against each other as opponents based on simple adversarial rules.
We prove theoretically that our algorithm converges to an approximate equilibrium with high probability in convex-concave games.
arXiv Detail & Related papers (2020-09-13T21:01:38Z) - Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the learner's algorithm that depend on the regularity of the opponent's response.
arXiv Detail & Related papers (2020-07-10T09:33:05Z) - Learning to Play No-Press Diplomacy with Best Response Policy Iteration [31.367850729299665]
We apply deep reinforcement learning methods to Diplomacy, a 7-player board game.
We show that our agents convincingly outperform the previous state-of-the-art, and game theoretic equilibrium analysis shows that the new process yields consistent improvements.
arXiv Detail & Related papers (2020-06-08T14:33:31Z)
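The DiL-piKL entry above regularizes a reward-maximizing policy toward a human imitation-learned policy. As a generic illustration of that flavour of objective only (not the DiL-piKL algorithm itself; the function name, payoffs, anchor distribution, and temperature values below are assumptions), a KL-regularized response toward an anchor policy has a simple closed form that trades off reward against staying close to the anchor:

```python
# Sketch of a KL-regularized response toward an anchor (e.g. human) policy:
# maximize E_pi[q] - lam * KL(pi || anchor); the maximizer is
# pi(a) proportional to anchor(a) * exp(q(a) / lam).
import numpy as np


def kl_regularized_response(q_values: np.ndarray, anchor: np.ndarray, lam: float) -> np.ndarray:
    """Small lam approaches a greedy best response; large lam stays near the anchor."""
    logits = np.log(anchor) + q_values / lam
    logits -= logits.max()                 # numerical stability before exponentiating
    pi = np.exp(logits)
    return pi / pi.sum()


if __name__ == "__main__":
    q = np.array([1.0, 0.0, 0.0])          # action 0 looks best to the learner
    human = np.array([0.1, 0.8, 0.1])      # but the human anchor overwhelmingly plays action 1
    for lam in (0.1, 1.0, 10.0):
        print(lam, kl_regularized_response(q, human, lam))
```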