Playing Large Games with Oracles and AI Debate
- URL: http://arxiv.org/abs/2312.04792v4
- Date: Wed, 10 Jul 2024 18:11:40 GMT
- Title: Playing Large Games with Oracles and AI Debate
- Authors: Xinyi Chen, Angelica Chen, Dean Foster, Elad Hazan,
- Abstract summary: Existing algorithms for online game playing require per-iteration in the number of actions, which can be prohibitive for large games.
We give a novel efficient algorithm for simultaneous external and internal regret minimization whose regret depends logarithmically on the number of actions.
- Score: 27.355621483737913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider regret minimization in repeated games with a very large number of actions. Such games are inherent in the setting of AI Safety via Debate \cite{irving2018ai}, and more generally games whose actions are language-based. Existing algorithms for online game playing require per-iteration computation polynomial in the number of actions, which can be prohibitive for large games. We thus consider oracle-based algorithms, as oracles naturally model access to AI agents. With oracle access, we characterize when internal and external regret can be minimized efficiently. We give a novel efficient algorithm for simultaneous external and internal regret minimization whose regret depends logarithmically on the number of actions. We conclude with experiments in the setting of AI Safety via Debate that shows the benefit of insights from our algorithmic analysis.
Related papers
- DanZero+: Dominating the GuanDan Game through Reinforcement Learning [95.90682269990705]
We develop an AI program for an exceptionally complex and popular card game called GuanDan.
We first put forward an AI program named DanZero for this game.
In order to further enhance the AI's capabilities, we apply policy-based reinforcement learning algorithm to GuanDan.
arXiv Detail & Related papers (2023-12-05T08:07:32Z) - Online Learning and Solving Infinite Games with an ERM Oracle [20.1330044382824]
We propose an algorithm for online binary classification setting that relies solely on ERM oracle calls.
We show that it has finite regret in the realizable setting and sublinearly growing regret in the agnostic setting.
Our algorithms apply to both binary-valued and real-valued games and can be viewed as providing justification for the wide use of double oracle and multiple oracle algorithms in the practice of solving large games.
arXiv Detail & Related papers (2023-07-04T12:51:21Z) - Hardness of Independent Learning and Sparse Equilibrium Computation in
Markov Games [70.19141208203227]
We consider the problem of decentralized multi-agent reinforcement learning in Markov games.
We show that no algorithm attains no-regret in general-sum games when executed independently by all players.
We show that our lower bounds hold even for seemingly easier setting in which all agents are controlled by a centralized algorithm.
arXiv Detail & Related papers (2023-03-22T03:28:12Z) - On the Convergence of No-Regret Learning Dynamics in Time-Varying Games [89.96815099996132]
We characterize the convergence of optimistic gradient descent (OGD) in time-varying games.
Our framework yields sharp convergence bounds for the equilibrium gap of OGD in zero-sum games.
We also provide new insights on dynamic regret guarantees in static games.
arXiv Detail & Related papers (2023-01-26T17:25:45Z) - Are AlphaZero-like Agents Robust to Adversarial Perturbations? [73.13944217915089]
AlphaZero (AZ) has demonstrated that neural-network-based Go AIs can surpass human performance by a large margin.
We ask whether adversarial states exist for Go AIs that may lead them to play surprisingly wrong actions.
We develop the first adversarial attack on Go AIs that can efficiently search for adversarial states by strategically reducing the search space.
arXiv Detail & Related papers (2022-11-07T18:43:25Z) - Public Information Representation for Adversarial Team Games [31.29335755664997]
adversarial team games reside in the asymmetric information available to the team members during the play.
Our algorithms convert a sequential team game with adversaries to a classical two-player zero-sum game.
Due to the NP-hard nature of the problem, the resulting Public Team game may be exponentially larger than the original one.
arXiv Detail & Related papers (2022-01-25T15:07:12Z) - Revisiting Game Representations: The Hidden Costs of Efficiency in
Sequential Decision-making Algorithms [0.6749750044497732]
Recent advancements in algorithms for sequential decision-making under imperfect information have shown remarkable success in large games.
These algorithms traditionally formalize the games using the extensive-form game formalism.
We show that a popular workaround involves using a specialized representation based on player specific information-state trees.
arXiv Detail & Related papers (2021-12-20T22:34:19Z) - Model-Free Online Learning in Unknown Sequential Decision Making
Problems and Games [114.90723492840499]
In large two-player zero-sum imperfect-information games, modern extensions of counterfactual regret minimization (CFR) are currently the practical state of the art for computing a Nash equilibrium.
We formalize an online learning setting in which the strategy space is not known to the agent.
We give an efficient algorithm that achieves $O(T3/4)$ regret with high probability for that setting, even when the agent faces an adversarial environment.
arXiv Detail & Related papers (2021-03-08T04:03:24Z) - Complexity and Algorithms for Exploiting Quantal Opponents in Large
Two-Player Games [16.43565579998679]
Solution concepts of traditional game theory assume entirely rational players; therefore, their ability to exploit subrational opponents is limited.
This paper aims to analyze and propose scalable algorithms for computing effective and robust strategies against a quantal opponent in normal-form and extensive-form games.
arXiv Detail & Related papers (2020-09-30T09:14:56Z) - Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include algorithm's regret guarantees that depend on the regularity of the opponent's response.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.