Approximate exploitability: Learning a best response in large games
- URL: http://arxiv.org/abs/2004.09677v5
- Date: Thu, 3 Nov 2022 21:16:06 GMT
- Title: Approximate exploitability: Learning a best response in large games
- Authors: Finbarr Timbers, Nolan Bard, Edward Lockhart, Marc Lanctot, Martin
Schmid, Neil Burch, Julian Schrittwieser, Thomas Hubert, Michael Bowling
- Abstract summary: We introduce ISMCTS-BR, a scalable search-based deep reinforcement learning algorithm for learning a best response to an agent.
We demonstrate the technique in several two-player zero-sum games against a variety of agents.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Researchers have demonstrated that neural networks are vulnerable to
adversarial examples and subtle environment changes, both of which one can view
as a form of distribution shift. To humans, the resulting errors can look like
blunders, eroding trust in these agents. In prior games research, agent
evaluation often focused on game outcomes observed in practice. While valuable,
such evaluation typically fails to capture robustness to worst-case outcomes. Prior
research in computer poker has examined how to assess such worst-case
performance, both exactly and approximately. Unfortunately, exact computation
is infeasible with larger domains, and existing approximations rely on
poker-specific knowledge. We introduce ISMCTS-BR, a scalable search-based deep
reinforcement learning algorithm for learning a best response to an agent,
thereby approximating worst-case performance. We demonstrate the technique in
several two-player zero-sum games against a variety of agents, including
several AlphaZero-based agents.
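To ground the central idea, the sketch below learns an approximate best response to a fixed agent in a tiny zero-sum matrix game using a plain epsilon-greedy bandit learner. This is a minimal stand-in for the paper's search-based deep RL, not ISMCTS-BR itself; the game, the fixed policy's bias, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Row player's payoffs in a zero-sum matrix game (matching pennies);
# the fixed agent to exploit is a column player with a biased mixed
# strategy. Both the game and the bias are illustrative assumptions.
payoffs = np.array([[1.0, -1.0],
                    [-1.0, 1.0]])
fixed_policy = np.array([0.7, 0.3])  # over-plays its first action

# Learn an approximate best response with an epsilon-greedy bandit,
# a tiny stand-in for search-based deep RL (this is not ISMCTS-BR).
q = np.zeros(len(payoffs))  # running value estimate per action
n = np.zeros(len(payoffs))  # visit counts
for _ in range(5000):
    a = rng.integers(len(q)) if rng.random() < 0.1 else int(q.argmax())
    opp = rng.choice(len(fixed_policy), p=fixed_policy)
    r = payoffs[a, opp]
    n[a] += 1
    q[a] += (r - q[a]) / n[a]  # incremental mean update

# The learned greedy policy's true value lower-bounds the fixed
# agent's exploitability; compare against the exact best response.
learned = payoffs[int(q.argmax())] @ fixed_policy
exact = (payoffs @ fixed_policy).max()
print(f"learned best-response value: {learned:.3f} (exact: {exact:.3f})")
```

The learned value is only a lower bound on true exploitability, which is precisely what makes it a usable approximation when exact best-response computation is infeasible.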
Related papers
- Toward Optimal LLM Alignments Using Two-Player Games [86.39338084862324]
In this paper, we investigate alignment through the lens of two-agent games, involving iterative interactions between an adversarial and a defensive agent.
We theoretically demonstrate that this iterative reinforcement learning optimization converges to a Nash Equilibrium for the game induced by the agents.
Experimental results in safety scenarios demonstrate that learning in such a competitive environment not only fully trains agents but also leads to policies with enhanced generalization capabilities for both adversarial and defensive agents.
arXiv Detail & Related papers (2024-06-16T15:24:50Z)
- Impact of Decentralized Learning on Player Utilities in Stackelberg Games [57.08270857260131]
In many two-agent systems, each agent learns separately and the rewards of the two agents are not perfectly aligned.
We model these systems as Stackelberg games with decentralized learning and show that standard regret benchmarks result in worst-case linear regret for at least one player.
We develop algorithms to achieve near-optimal $O(T^{2/3})$ regret for both players with respect to these benchmarks.
arXiv Detail & Related papers (2024-02-29T23:38:28Z)
- Malicious Agent Detection for Robust Multi-Agent Collaborative Perception [52.261231738242266]
Multi-agent collaborative (MAC) perception is more vulnerable to adversarial attacks than single-agent perception.
We propose Malicious Agent Detection (MADE), a reactive defense specific to MAC perception.
We conduct comprehensive evaluations on a benchmark 3D dataset V2X-sim and a real-road dataset DAIR-V2X.
arXiv Detail & Related papers (2023-10-18T11:36:42Z)
- Scaling Laws for Imitation Learning in Single-Agent Games [29.941613597833133]
We investigate whether carefully scaling up model and data size can bring similar improvements in the imitation learning setting for single-agent games.
We first demonstrate our findings on a variety of Atari games, and thereafter focus on the extremely challenging game of NetHack.
We find that IL loss and mean return scale smoothly with the compute budget and are strongly correlated, resulting in power laws for training compute-optimal IL agents.
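As an illustrative aside, a power law loss = a * compute^b is a straight line in log-log space, so such fits reduce to a linear regression on logarithms (all data points below are made up):

```python
import numpy as np

# Hypothetical (compute, imitation-learning loss) measurements; a
# power law loss = a * compute**b becomes linear after taking logs.
compute = np.array([1e15, 1e16, 1e17, 1e18])
loss = np.array([2.10, 1.35, 0.88, 0.57])

b, log_a = np.polyfit(np.log(compute), np.log(loss), deg=1)
print(f"fit: loss ~= {np.exp(log_a):.3g} * compute^{b:.3f}")
```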
arXiv Detail & Related papers (2023-07-18T16:43:03Z)
- Adversarial Training Should Be Cast as a Non-Zero-Sum Game [121.95628660889628]
The two-player zero-sum paradigm of adversarial training has not engendered sufficient levels of robustness.
We show that the surrogate-based relaxation commonly used in adversarial training algorithms voids all guarantees on robustness.
A novel non-zero-sum bilevel formulation of adversarial training yields a framework that matches and in some cases outperforms state-of-the-art attacks.
arXiv Detail & Related papers (2023-06-19T16:00:48Z)
- Population-based Evaluation in Repeated Rock-Paper-Scissors as a Benchmark for Multiagent Reinforcement Learning [14.37986882249142]
We propose a benchmark for multiagent learning based on repeated play of the simple game Rock, Paper, Scissors.
We describe metrics to measure the quality of agents based both on average returns and exploitability.
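To make the exploitability metric concrete, the sketch below computes the exact exploitability of a fixed mixed strategy in one-shot Rock, Paper, Scissors as the best responder's expected return against it (a hypothetical illustration, not the benchmark's own evaluation code):

```python
import numpy as np

# Row player's payoff matrix for Rock, Paper, Scissors (zero-sum).
RPS = np.array([[0, -1, 1],    # rock     vs (rock, paper, scissors)
                [1, 0, -1],    # paper
                [-1, 1, 0]])   # scissors

def exploitability(policy: np.ndarray) -> float:
    """Best-response value against a mixed strategy; 0 iff unexploitable."""
    return float((RPS @ policy).max())

print(exploitability(np.array([1/3, 1/3, 1/3])))  # 0.0 at the equilibrium
print(exploitability(np.array([0.5, 0.3, 0.2])))  # 0.3: biased play is exploitable
```

Repeated play adds the dimension the benchmark targets: an exploiter must adapt online to an opponent's revealed biases rather than best-respond to a known mixture.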
arXiv Detail & Related papers (2023-03-02T15:06:52Z)
- An Empirical Study on the Generalization Power of Neural Representations Learned via Visual Guessing Games [79.23847247132345]
This work investigates how well an artificial agent can benefit from playing guessing games when later asked to perform on novel NLP downstream tasks such as Visual Question Answering (VQA).
We propose two ways to exploit playing guessing games: 1) a supervised learning scenario in which the agent learns to mimic successful guessing games, and 2) a novel way for an agent to play by itself, called Self-play via Iterated Experience Learning (SPIEL).
arXiv Detail & Related papers (2021-01-31T10:30:48Z)
- Hindsight and Sequential Rationality of Correlated Play [18.176128899338433]
We look at algorithms that ensure strong performance in hindsight relative to what could have been achieved with modified behavior.
We develop and advocate for this hindsight framing of learning in general sequential decision-making settings.
We present examples illustrating the distinct strengths and weaknesses of each type of equilibrium in the literature.
arXiv Detail & Related papers (2020-12-10T18:30:21Z)
- Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs.
We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
arXiv Detail & Related papers (2020-08-05T07:49:42Z)
- Contextual Search in the Presence of Adversarial Corruptions [33.28268414842846]
We study contextual search, a generalization of binary search in higher dimensions.
We present algorithms that attain near-optimal regret in the absence of adversarial corruptions.
Our techniques draw inspiration from learning theory, game theory, high-dimensional geometry, and convex analysis.
arXiv Detail & Related papers (2020-02-26T17:25:53Z)