Does DQN really learn? Exploring adversarial training schemes in Pong
- URL: http://arxiv.org/abs/2203.10614v1
- Date: Sun, 20 Mar 2022 18:12:55 GMT
- Title: Does DQN really learn? Exploring adversarial training schemes in Pong
- Authors: Bowen He, Sreehari Rammohan, Jessica Forde, Michael Littman
- Abstract summary: We study two self-play training schemes, Chainer and Pool, and show they lead to improved agent performance in Atari Pong.
We show that training agents with Chainer or Pool leads to richer network activations with greater predictive power to estimate critical game-state features.
- Score: 1.0323063834827415
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we study two self-play training schemes, Chainer and Pool, and
show they lead to improved agent performance in Atari Pong compared to a
standard DQN agent -- trained against the built-in Atari opponent. To measure
agent performance, we define a robustness metric that captures how difficult it
is to learn a strategy that beats the agent's learned policy. Through playing
past versions of themselves, Chainer and Pool are able to target weaknesses in
their policies and improve their resistance to attack. Agents trained using
these methods score well on our robustness metric and can easily defeat the
standard DQN agent. We conclude by using linear probing to illuminate what
internal structures the different agents develop to play the game. We show that
training agents with Chainer or Pool leads to richer network activations with
greater predictive power to estimate critical game-state features compared to
the standard DQN agent.
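
The abstract describes the Chainer and Pool schemes only at a high level. As a rough illustration, the sketch below shows one way a pool of frozen past selves could be maintained and sampled as opponents for a learning DQN agent; the network size, snapshot interval, and the elided episode loop are placeholder assumptions, not the paper's actual implementation.

```python
import copy
import random

import torch
import torch.nn as nn


class QNet(nn.Module):
    """Tiny stand-in for a DQN Q-network (a real Pong agent would use a conv net)."""

    def __init__(self, obs_dim: int = 8, n_actions: int = 6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def eps_greedy(qnet: QNet, obs: torch.Tensor, eps: float = 0.05) -> int:
    """Standard epsilon-greedy action selection from Q-values."""
    if random.random() < eps:
        return random.randrange(qnet.net[-1].out_features)
    with torch.no_grad():
        return int(qnet(obs).argmax().item())


learner = QNet()
opponent_pool = [copy.deepcopy(learner)]  # pool starts with one frozen copy of the learner

SNAPSHOT_EVERY = 1_000  # episodes between snapshots (illustrative value)
for episode in range(10_000):
    # "Pool": the opponent for this episode is any frozen past self.
    # A Chainer-style variant might instead always use opponent_pool[-1].
    opponent = random.choice(opponent_pool)

    # ... play one episode of two-player Pong here: `learner` controls one paddle,
    # the frozen `opponent` controls the other, and the learner's replay buffer
    # and DQN updates proceed exactly as in standard single-agent training ...

    if (episode + 1) % SNAPSHOT_EVERY == 0:
        opponent_pool.append(copy.deepcopy(learner))
```

Playing against past versions of itself in this way is what lets the agent target weaknesses in its own policy, rather than overfitting to the fixed built-in Atari opponent.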
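The linear-probing analysis mentioned at the end of the abstract can likewise be pictured as fitting a simple linear model from recorded hidden-layer activations to a game-state feature such as the ball's position. The snippet below uses synthetic arrays and scikit-learn's Ridge regression purely to illustrate the procedure; the paper's actual probe targets and probe model are not specified in this abstract.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: in practice `activations` would be hidden-layer outputs
# recorded from an agent's Q-network while it plays, and `ball_y` the matching
# ground-truth game-state feature extracted from the emulator.
rng = np.random.default_rng(0)
activations = rng.normal(size=(5_000, 512))                     # (frames, hidden units)
ball_y = activations[:, :4].sum(axis=1) + 0.1 * rng.normal(size=5_000)

X_tr, X_te, y_tr, y_te = train_test_split(activations, ball_y, test_size=0.2, random_state=0)
probe = Ridge(alpha=1.0).fit(X_tr, y_tr)
print("held-out probe R^2:", r2_score(y_te, probe.predict(X_te)))
```

Comparing held-out R^2 across agents trained with the different schemes is one concrete way to operationalize "greater predictive power" of the activations.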
Related papers
- Toward Optimal LLM Alignments Using Two-Player Games [86.39338084862324]
In this paper, we investigate alignment through the lens of two-agent games, involving iterative interactions between an adversarial and a defensive agent.
We theoretically demonstrate that this iterative reinforcement learning optimization converges to a Nash Equilibrium for the game induced by the agents.
Experimental results in safety scenarios demonstrate that learning in such a competitive environment not only fully trains agents but also leads to policies with enhanced generalization capabilities for both adversarial and defensive agents.
arXiv Detail & Related papers (2024-06-16T15:24:50Z)
- Minimax Exploiter: A Data Efficient Approach for Competitive Self-Play [12.754819077905061]
The Minimax Exploiter is a game-theoretic approach to exploiting Main Agents that leverages knowledge of its opponents.
We validate our approach in a diversity of settings, including simple turn based games, the arcade learning environment, and For Honor, a modern video game.
arXiv Detail & Related papers (2023-11-28T19:34:40Z)
- Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning [95.78031053296513]
No-press Diplomacy is a complex strategy game involving both cooperation and competition.
We introduce a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy.
We show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL.
arXiv Detail & Related papers (2022-10-11T14:47:35Z)
- A Technique to Create Weaker Abstract Board Game Agents via Reinforcement Learning [0.0]
Board games need at least one other player to play.
We created AI agents to play against us when an opponent is missing.
In this work, we describe how to create weaker AI agents that play board games.
arXiv Detail & Related papers (2022-09-01T20:13:20Z)
- Reinforcement Learning In Two Player Zero Sum Simultaneous Action Games [0.0]
Two player zero sum simultaneous action games are common in video games, financial markets, war, business competition, and many other settings.
We introduce the fundamental concepts of reinforcement learning in two player zero sum simultaneous action games and discuss the unique challenges this type of game poses.
We introduce two novel agents that attempt to handle these challenges by using joint action Deep Q-Networks.
arXiv Detail & Related papers (2021-10-10T16:03:44Z)
- BACKDOORL: Backdoor Attack against Competitive Reinforcement Learning [80.99426477001619]
We migrate backdoor attacks to more complex RL systems involving multiple agents.
As a proof of concept, we demonstrate that an adversary agent can trigger the backdoor of the victim agent with its own action.
The results show that when the backdoor is activated, the winning rate of the victim drops by 17% to 37% compared to when not activated.
arXiv Detail & Related papers (2021-05-02T23:47:55Z)
- Robust Reinforcement Learning on State Observations with Learned Optimal Adversary [86.0846119254031]
We study the robustness of reinforcement learning with adversarially perturbed state observations.
With a fixed agent policy, we demonstrate that an optimal adversary to perturb state observations can be found.
For DRL settings, this leads to a novel empirical adversarial attack to RL agents via a learned adversary that is much stronger than previous ones.
arXiv Detail & Related papers (2021-01-21T05:38:52Z)
- Multi-Agent Collaboration via Reward Attribution Decomposition [75.36911959491228]
We propose Collaborative Q-learning (CollaQ) that achieves state-of-the-art performance in the StarCraft multi-agent challenge.
CollaQ is evaluated on various StarCraft maps and is shown to outperform existing state-of-the-art techniques.
arXiv Detail & Related papers (2020-10-16T17:42:11Z)
- Evaluating the Rainbow DQN Agent in Hanabi with Unseen Partners [4.4532936483984065]
Hanabi is a cooperative game that challenges existing AI techniques due to its focus on modeling the mental states of other players to interpret and predict their behavior.
In this paper, we show that agents trained through self-play using the popular Rainbow DQN architecture fail to cooperate well with simple rule-based agents that were not seen during training.
arXiv Detail & Related papers (2020-04-28T04:24:44Z)
- Learning from Learners: Adapting Reinforcement Learning Agents to be Competitive in a Card Game [71.24825724518847]
We present a study on how popular reinforcement learning algorithms can be adapted to learn and to play a real-world implementation of a competitive multiplayer card game.
We propose specific training and validation routines for the learning agents, in order to evaluate how the agents learn to be competitive and explain how they adapt to each other's playing style.
arXiv Detail & Related papers (2020-04-08T14:11:05Z)
- "Other-Play" for Zero-Shot Coordination [21.607428852157273]
The Other-Play learning algorithm enhances self-play by looking for more robust strategies.
We study the cooperative card game Hanabi and show that OP agents achieve higher scores when paired with independently trained agents.
arXiv Detail & Related papers (2020-03-06T00:39:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information listed here and is not responsible for any consequences of its use.