No-Regret Learning in Time-Varying Zero-Sum Games
- URL: http://arxiv.org/abs/2201.12736v1
- Date: Sun, 30 Jan 2022 06:10:04 GMT
- Title: No-Regret Learning in Time-Varying Zero-Sum Games
- Authors: Mengxiao Zhang, Peng Zhao, Haipeng Luo, Zhi-Hua Zhou
- Abstract summary: Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning from repeated play in a fixed two-player zero-sum game is a classic
problem in game theory and online learning. We consider a variant of this
problem where the game payoff matrix changes over time, possibly in an
adversarial manner. We first present three performance measures to guide the
algorithmic design for this problem: 1) the well-studied individual regret, 2)
an extension of the duality gap, and 3) a new measure called dynamic Nash
Equilibrium regret, which quantifies the cumulative difference between the
player's payoff and the minimax game value. Next, we develop a single
parameter-free algorithm that simultaneously enjoys favorable guarantees under
all three of these performance measures. These guarantees are adaptive to
different non-stationarity measures of the payoff matrices and, importantly,
recover the best known results when the payoff matrix is fixed. Our algorithm
is based on a two-layer structure with a meta-algorithm learning over a group
of black-box base-learners satisfying a certain property, along with several
novel ingredients specifically designed for the time-varying game setting.
Empirical results further validate the effectiveness of our algorithm.
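The two-layer structure described above follows a pattern that is common in non-stationary online learning: a meta-algorithm (e.g., Hedge) aggregates a pool of base-learners run with different step sizes, so the meta-layer can track whichever configuration suits the current level of non-stationarity. Below is a minimal Python sketch of that generic pattern for online linear losses over the simplex; the step-size grid, the use of Hedge at both layers, and the loss-feedback scheme are illustrative assumptions, not the authors' exact construction or the specific property they require of base-learners.

```python
import numpy as np

def hedge_update(w, loss, eta):
    """Multiplicative-weights (Hedge) update; returns a probability vector."""
    w = w * np.exp(-eta * loss)
    return w / w.sum()

def two_layer_play(loss_seq, n_actions, base_etas=(0.05, 0.1, 0.2, 0.4),
                   meta_eta=0.1):
    """Hedge meta-learner over Hedge base-learners with different step sizes.

    loss_seq: iterable of per-round loss vectors with entries in [0, 1].
    Returns the sequence of mixed strategies actually played.
    """
    bases = [np.ones(n_actions) / n_actions for _ in base_etas]
    meta = np.ones(len(base_etas)) / len(base_etas)
    plays = []
    for loss in loss_seq:
        x = sum(p * b for p, b in zip(meta, bases))        # aggregate strategy
        plays.append(x)
        base_losses = np.array([b @ loss for b in bases])  # each base's own loss
        meta = hedge_update(meta, base_losses, meta_eta)   # reweight base-learners
        bases = [hedge_update(b, loss, eta) for b, eta in zip(bases, base_etas)]
    return plays
```

The reason for this architecture, in general, is that no single fixed step size is good for every degree of non-stationarity; the meta-layer learns which base configuration currently works best.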
Related papers
- Distributed Online Bandit Nonconvex Optimization with One-Point Residual Feedback via Dynamic Regret
This paper considers the distributed online bandit optimization problem with nonconvex loss functions over a time-varying digraph.
At each round, a player selects an action, and the adversary then assigns an arbitrary nonconvex loss function to this player.
The expected regret of our algorithms is comparable to that of existing algorithms that use two-point feedback.
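For intuition, here is a minimal single-agent sketch of a one-point residual-feedback gradient estimator of the kind this line of work builds on: the single function query made this round is compared against last round's query to form a gradient estimate, which reduces the variance of classical one-point estimators. The sampling scheme, step sizes, and the omission of the distributed/digraph structure are simplifying assumptions, not this paper's exact method.

```python
import numpy as np

def residual_feedback_step(f_t, x, prev_query_value, delta=0.05, eta=0.01,
                           rng=np.random.default_rng()):
    """One round of one-point residual-feedback bandit optimization.

    f_t: this round's loss function (queried exactly once).
    prev_query_value: last round's loss at its own perturbed point.
    Returns the updated iterate and this round's query value.
    """
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)               # uniform direction on the unit sphere
    value = f_t(x + delta * u)           # the single query this round
    grad_est = (d / delta) * (value - prev_query_value) * u
    x_next = x - eta * grad_est          # plain (projection-free) gradient step
    return x_next, value
```

On the first round one can pass the initial loss value (or simply 0.0) as prev_query_value; thereafter each round feeds the returned query value into the next call.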
arXiv Detail & Related papers (2024-09-24T02:37:33Z)
- Responsible AI (RAI) Games and Ensembles
We provide a general framework for studying a family of problems, which we refer to as Responsible AI (RAI) games.
We provide two classes of algorithms for solving these games: (a) game-play based algorithms, and (b) greedy stagewise estimation algorithms.
We empirically demonstrate the applicability and competitive performance of our techniques for solving several RAI problems, particularly around subpopulation shift.
arXiv Detail & Related papers (2023-10-28T22:17:30Z)
- Online Learning and Solving Infinite Games with an ERM Oracle
We propose an algorithm for the online binary classification setting that relies solely on ERM oracle calls.
We show that it has finite regret in the realizable setting and sublinearly growing regret in the agnostic setting.
Our algorithms apply to both binary-valued and real-valued games and can be viewed as providing justification for the wide use of double oracle and multiple oracle algorithms in the practice of solving large games.
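Since the summary appeals to double oracle methods, a minimal sketch of the classical double oracle loop for a zero-sum matrix game may help: each player keeps a restricted action set, the restricted game is solved exactly, and best responses against the current restricted equilibrium are added until neither player can improve. The LP formulation via scipy.optimize.linprog and the arbitrary starting actions are standard choices assumed here, not details from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(A):
    """Row player's maximin strategy and value for payoff matrix A,
    where the row player maximizes x^T A y."""
    m, n = A.shape
    c = np.concatenate([np.zeros(m), [-1.0]])       # variables (x, v); minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])       # v <= (A^T x)_j for all columns j
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)  # sum(x) = 1
    bounds = [(0, None)] * m + [(None, None)]       # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.x[:m], -res.fun

def double_oracle(A, iters=50):
    """Grow restricted action sets by best responses until neither improves."""
    rows, cols = [0], [0]                           # arbitrary starting actions
    for _ in range(iters):
        sub = A[np.ix_(rows, cols)]
        x_sub, _ = solve_matrix_game(sub)
        y_sub, _ = solve_matrix_game(-sub.T)        # column player minimizes
        x = np.zeros(A.shape[0]); x[rows] = x_sub
        y = np.zeros(A.shape[1]); y[cols] = y_sub
        br_row = int(np.argmax(A @ y))              # row best response to y
        br_col = int(np.argmin(x @ A))              # column best response to x
        if br_row in rows and br_col in cols:
            break                                   # neither player can improve
        if br_row not in rows: rows.append(br_row)
        if br_col not in cols: cols.append(br_col)
    return x, y
```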
arXiv Detail & Related papers (2023-07-04T12:51:21Z)
- Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games
We consider the problem of decentralized multi-agent reinforcement learning in Markov games.
We show that no algorithm attains no-regret in general-sum games when executed independently by all players.
We show that our lower bounds hold even for the seemingly easier setting in which all agents are controlled by a centralized algorithm.
arXiv Detail & Related papers (2023-03-22T03:28:12Z)
- On the Convergence of No-Regret Learning Dynamics in Time-Varying Games
We characterize the convergence of optimistic gradient descent (OGD) in time-varying games.
Our framework yields sharp convergence bounds for the equilibrium gap of OGD in zero-sum games.
We also provide new insights on dynamic regret guarantees in static games.
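For reference, here is a minimal sketch of optimistic gradient descent/ascent on a sequence of payoff matrices: each player's step reuses the previous round's gradient as a prediction of the next one, which is what distinguishes OGD from plain gradient descent. The step size, the Euclidean simplex projection, and the min/max role assignment are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np

def proj_simplex(v):
    """Euclidean projection onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def ogd_time_varying(payoffs, eta=0.1):
    """Optimistic gradient descent/ascent over payoff matrices A_t.

    payoffs: list of matrices; x minimizes and y maximizes x^T A_t y.
    Returns the trajectory of strategy pairs (x_t, y_t).
    """
    m, n = payoffs[0].shape
    xh, yh = np.ones(m) / m, np.ones(n) / n   # secondary ("hallucinated") iterates
    gx, gy = np.zeros(m), np.zeros(n)         # last observed gradients
    traj = []
    for A in payoffs:
        x = proj_simplex(xh - eta * gx)       # optimistic step: reuse last gradient
        y = proj_simplex(yh + eta * gy)
        traj.append((x, y))
        gx, gy = A @ y, A.T @ x               # this round's gradients
        xh = proj_simplex(xh - eta * gx)      # secondary (base) step
        yh = proj_simplex(yh + eta * gy)
    return traj
```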
arXiv Detail & Related papers (2023-01-26T17:25:45Z)
- A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games
This work studies an algorithm, which we call magnetic mirror descent, that is inspired by mirror descent and the non-Euclidean proximal gradient algorithm.
Our contribution is demonstrating the virtues of magnetic mirror descent as both an equilibrium solver and as an approach to reinforcement learning in two-player zero-sum games.
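A plausible single-step sketch of a magnetic mirror descent update on the simplex: standard mirror descent with an additional KL "magnet" term that pulls the policy toward a reference distribution. The closed form below solves the entropic proximal problem stated in the docstring; the choice of magnet, step size eta, and magnet strength alpha are assumptions, not parameters from the paper.

```python
import numpy as np

def mmd_step(pi, q, magnet, eta=0.1, alpha=0.05):
    """One magnetic mirror descent update on the simplex.

    Minimizes  eta*<pi, q> + eta*alpha*KL(pi, magnet) + KL(pi, pi_t),
    whose closed form on the simplex is the softmax below.
    pi, magnet: strictly positive probability vectors; q: loss vector.
    """
    logits = (np.log(pi) + eta * alpha * np.log(magnet) - eta * q) / (1 + eta * alpha)
    w = np.exp(logits - logits.max())    # numerically stabilized softmax
    return w / w.sum()
```

With alpha = 0 this reduces to the usual multiplicative-weights / mirror descent update, which makes the magnet term's role as regularization toward the reference policy explicit.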
arXiv Detail & Related papers (2022-06-12T19:49:14Z)
- Learning to Play Sequential Games versus Unknown Opponents
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's responses.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
- Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games.
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.
In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
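The duality gap minimized in the offline setting (and extended to time-varying games as the second performance measure of the main paper above) has a simple closed form for a single matrix game, sketched below under the assumption that the row player maximizes x^T A y; the pair (x, y) is a Nash equilibrium exactly when the gap is zero.

```python
import numpy as np

def duality_gap(A, x, y):
    """Duality gap of strategy pair (x, y) in the matrix game A,
    where the row player maximizes x^T A y."""
    best_row = np.max(A @ y)   # row player's best-response payoff against y
    best_col = np.min(x @ A)   # column player's best-response payoff against x
    return best_row - best_col # nonnegative; zero iff (x, y) is an equilibrium
```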
arXiv Detail & Related papers (2020-02-17T17:04:16Z)