Robust No-Regret Learning in Min-Max Stackelberg Games
- URL: http://arxiv.org/abs/2203.14126v1
- Date: Sat, 26 Mar 2022 18:12:40 GMT
- Title: Robust No-Regret Learning in Min-Max Stackelberg Games
- Authors: Denizalp Goktas, Jiayi Zhao, Amy Greenwald
- Abstract summary: We investigate the behavior of no-regret learning in min-max games with dependent strategy sets.
We show that no-regret dynamics converge to a Stackelberg equilibrium.
We prove that OMD dynamics are robust for a large class of online min-max games with independent strategy sets.
- Score: 1.6500749121196987
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The behavior of no-regret learning algorithms is well understood in
two-player min-max (i.e., zero-sum) games. In this paper, we investigate the
behavior of no-regret learning in min-max games with dependent strategy sets,
where the strategy of the first player constrains the behavior of the second.
Such games are best understood as sequential, i.e., min-max Stackelberg, games.
We consider two settings, one in which only the first player chooses their
actions using a no-regret algorithm while the second player best responds, and
one in which both players use no-regret algorithms. For the former case, we
show that no-regret dynamics converge to a Stackelberg equilibrium. For the
latter case, we introduce a new type of regret, which we call Lagrangian
regret, and show that if both players minimize their Lagrangian regrets, then
play converges to a Stackelberg equilibrium. We then observe that online mirror
descent (OMD) dynamics in these two settings correspond respectively to a known
nested (i.e., sequential) gradient descent-ascent (GDA) algorithm and a new
simultaneous GDA-like algorithm, thereby establishing convergence of these
algorithms to Stackelberg equilibrium. Finally, we analyze the robustness of
OMD dynamics to perturbations by investigating online min-max Stackelberg
games. We prove that OMD dynamics are robust for a large class of online
min-max games with independent strategy sets. In the dependent case, we
demonstrate the robustness of OMD dynamics experimentally by simulating them in
online Fisher markets, a canonical example of a min-max Stackelberg game with
dependent strategy sets.
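The formal algorithms are given in the paper; as a rough illustration only, the sketch below runs simultaneous GDA-like dynamics, i.e., projected-gradient (Euclidean OMD) steps for both players, on the Lagrangian of a toy convex-concave min-max Stackelberg game. It is not the authors' implementation: the objective f, constraint g, step size, and horizon are all hypothetical.

```python
# Minimal sketch (not the authors' implementation) of simultaneous GDA-like
# dynamics on the Lagrangian of a toy min-max Stackelberg game
#     min_{x in [0, 1]}  max_{y >= 0 : g(x, y) >= 0}  f(x, y),
# in which the leader's choice of x constrains the follower's feasible set.
# The objective f, constraint g, step size, and horizon are hypothetical.
import numpy as np

def f(x, y):
    # Convex in x, concave in y: the leader minimizes, the follower maximizes.
    return (x - 0.5) ** 2 - (y - x) ** 2

def g(x, y):
    # Coupling constraint g(x, y) >= 0: feasibility of y depends on x.
    return 1.0 - x - y

def lagrangian_grads(x, y, lam):
    # Partial derivatives of L(x, y, lam) = f(x, y) + lam * g(x, y).
    dL_dx = 2.0 * (x - 0.5) + 2.0 * (y - x) - lam
    dL_dy = -2.0 * (y - x) - lam
    dL_dlam = g(x, y)
    return dL_dx, dL_dy, dL_dlam

x, y, lam, eta = 0.9, 0.1, 1.0, 0.05
for t in range(5000):
    dx, dy, dlam = lagrangian_grads(x, y, lam)
    # Projected-gradient steps, i.e., OMD with the Euclidean regularizer:
    x = float(np.clip(x - eta * dx, 0.0, 1.0))  # leader (min player): descent
    y = max(0.0, y + eta * dy)                  # follower (max player): ascent
    lam = max(0.0, lam - eta * dlam)            # multiplier: descent, kept >= 0
print(f"x ~ {x:.3f}, y ~ {y:.3f}, lambda ~ {lam:.3f}, f ~ {f(x, y):.3f}")
```

Replacing the follower's single ascent step with an inner loop that (approximately) best responds to each leader iterate yields the nested GDA variant discussed above.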
Related papers
- Provably Efficient Generalized Lagrangian Policy Optimization for Safe Multi-Agent Reinforcement Learning [105.7510838453122]
We examine online safe multi-agent reinforcement learning using constrained Markov games.
We develop an upper confidence reinforcement learning algorithm to solve the resulting Lagrangian problem.
Our algorithm updates the minimax decision primal variables via online mirror descent and the dual variable via a projected gradient step.
arXiv Detail & Related papers (2023-05-31T22:09:24Z)
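As a schematic illustration of the primal-dual update pattern described in the entry above (not the cited paper's algorithm, which additionally relies on upper-confidence exploration and the Markov-game structure), the sketch below takes an entropic online-mirror-descent (exponentiated-gradient) step on a primal decision variable over the probability simplex and a projected gradient step on a nonnegative dual variable. The rewards, costs, budget, and step size are hypothetical.

```python
# Schematic sketch of a generic primal-dual update: entropic OMD on the primal,
# projected gradient on the dual. Not the cited paper's algorithm; all data
# (rewards, costs, budget, step size) are hypothetical placeholders.
import numpy as np

def entropic_omd_step(pi, grad, eta):
    # Online mirror descent with the negative-entropy mirror map over the simplex
    # (an exponentiated-gradient / Hedge-style update).
    logits = np.log(pi) - eta * grad
    w = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return w / w.sum()

def projected_dual_step(lam, violation, eta):
    # Gradient step on the dual variable, projected back onto lam >= 0.
    return max(0.0, lam + eta * violation)

rng = np.random.default_rng(0)
n_actions, eta = 4, 0.1
pi = np.full(n_actions, 1.0 / n_actions)  # primal variable on the probability simplex
lam = 0.0                                 # dual variable for the constraint
budget = 0.0                              # hypothetical constraint budget
for t in range(500):
    reward = rng.normal(size=n_actions)   # hypothetical per-action reward estimates
    cost = rng.normal(size=n_actions)     # hypothetical per-action constraint costs
    # Primal: ascend the Lagrangian reward - lam * cost via entropic OMD
    # (OMD is written for minimization, so pass the negated gradient).
    pi = entropic_omd_step(pi, -(reward - lam * cost), eta)
    # Dual: move lam in proportion to the current constraint violation.
    lam = projected_dual_step(lam, float(pi @ cost - budget), eta)
print(pi, lam)
```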
- No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
arXiv Detail & Related papers (2022-01-30T06:10:04Z)
- Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopic Followers? [156.5760265539888]
We study multi-player general-sum Markov games with one of the players designated as the leader and the other players regarded as followers.
For such a game, our goal is to find a Stackelberg-Nash equilibrium (SNE), which is a policy pair $(\pi^*, \nu^*)$.
We develop sample-efficient reinforcement learning (RL) algorithms for solving for an SNE in both online and offline settings.
arXiv Detail & Related papers (2021-12-27T05:41:14Z)
- Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback [29.553652241608997]
We study the class of smooth and strongly monotone games and optimal no-regret learning therein.
We first construct a new bandit learning algorithm and show that it achieves the single-agent optimal regret of $\tilde{\Theta}(n\sqrt{T})$.
Our results thus settle this open problem and contribute to the broad landscape of bandit game-theoretical learning.
arXiv Detail & Related papers (2021-12-06T08:27:54Z)
- Fast Rates for Nonparametric Online Learning: From Realizability to Learning in Games [36.969021834291745]
We propose a proper learning algorithm which gets a near-optimal mistake bound in terms of the sequential fat-shattering dimension of the hypothesis class.
This result answers the question of whether proper learners can achieve near-optimal mistake bounds.
For the real-valued (regression) setting, the optimal mistake bound was not even known for improper learners.
arXiv Detail & Related papers (2021-11-17T05:24:21Z)
- Convex-Concave Min-Max Stackelberg Games [0.0]
We introduce two first-order methods that solve a class of convex-concave min-max Stackelberg games.
We show that our methods converge in polynomial time.
We observe that the computation of competitive equilibria in Fisher markets can also be cast as a min-max Stackelberg game.
arXiv Detail & Related papers (2021-10-05T06:09:45Z)
- Towards General Function Approximation in Zero-Sum Markov Games [126.58493169301012]
This paper considers two-player zero-sum finite-horizon Markov games with simultaneous moves.
Provably efficient algorithms for both decoupled and coordinated settings are developed.
arXiv Detail & Related papers (2021-07-30T15:25:13Z)
- Gap-Dependent Bounds for Two-Player Markov Games [122.20541282562363]
We analyze the cumulative regret when running a Nash-learning algorithm on 2-player turn-based Markov games (2-TBSG).
This bound matches the theoretical lower bound up to a logarithmic term.
We extend the conclusion to the discounted setting with infinite horizon and propose a similar gap-dependent logarithmic regret bound.
arXiv Detail & Related papers (2021-07-01T18:25:07Z)
- Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium [116.56359444619441]
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games.
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.
In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
arXiv Detail & Related papers (2020-02-17T17:04:16Z)