Differentiable Arbitrating in Zero-sum Markov Games
- URL: http://arxiv.org/abs/2302.10058v1
- Date: Mon, 20 Feb 2023 16:05:04 GMT
- Title: Differentiable Arbitrating in Zero-sum Markov Games
- Authors: Jing Wang, Meichen Song, Feng Gao, Boyi Liu, Zhaoran Wang, Yi Wu
- Abstract summary: We study how to perturb the reward in a zero-sum Markov game with two players to induce a desirable Nash equilibrium, namely arbitrating.
The lower level requires solving the Nash equilibrium under a given reward function, which makes the overall problem challenging to optimize in an end-to-end way.
We propose a backpropagation scheme that differentiates through the Nash equilibrium, which provides the gradient feedback for the upper level.
- Score: 59.62061049680365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We initiate the study of how to perturb the reward in a zero-sum Markov game
with two players to induce a desirable Nash equilibrium, namely arbitrating.
Such a problem admits a bi-level optimization formulation. The lower level
requires solving the Nash equilibrium under a given reward function, which
makes the overall problem challenging to optimize in an end-to-end way. We
propose a backpropagation scheme that differentiates through the Nash
equilibrium, which provides the gradient feedback for the upper level. In
particular, our method only requires a black-box solver for the (regularized)
Nash equilibrium (NE). We develop the convergence analysis for the proposed
framework with proper black-box NE solvers and demonstrate the empirical
successes in two multi-agent reinforcement learning (MARL) environments.
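To make the bi-level structure concrete, here is a minimal, hedged sketch (not the paper's implementation): the lower level computes an entropy-regularized NE of a small zero-sum matrix game by an unrolled smoothed best-response iteration, and the upper level adjusts a reward perturbation by backpropagating through that solver. The perturbation `theta`, the target strategy `x_target`, the matching-pennies payoff, and all constants are illustrative assumptions.

```python
# A minimal sketch of the "arbitrating" idea, NOT the paper's implementation:
# perturb the payoff of an entropy-regularized zero-sum matrix game and
# backpropagate an upper-level objective through an unrolled smoothed
# best-response solver. The perturbation `theta`, target `x_target`, the toy
# matching-pennies payoff, and all constants are illustrative assumptions;
# the paper instead differentiates through a black-box (regularized) NE solver.
import jax
import jax.numpy as jnp


def regularized_ne(payoff, tau=1.0, damping=0.1, iters=1000):
    """Approximate the entropy-regularized NE of a zero-sum matrix game
    by damped, smoothed (softmax) best-response iterations."""
    m, n = payoff.shape
    x = jnp.ones(m) / m  # row player's mixed strategy (maximizer)
    y = jnp.ones(n) / n  # column player's mixed strategy (minimizer)

    def step(carry, _):
        x, y = carry
        x_br = jax.nn.softmax(payoff @ y / tau)     # smoothed best response, row player
        y_br = jax.nn.softmax(-payoff.T @ x / tau)  # smoothed best response, column player
        x = (1.0 - damping) * x + damping * x_br
        y = (1.0 - damping) * y + damping * y_br
        return (x, y), None

    (x, y), _ = jax.lax.scan(step, (x, y), None, length=iters)
    return x, y


def upper_loss(theta, base_payoff, x_target):
    """Upper level: push the equilibrium induced by the perturbed reward
    toward a desired row-player strategy."""
    x, _ = regularized_ne(base_payoff + theta)
    return jnp.sum((x - x_target) ** 2)


base_payoff = jnp.array([[1.0, -1.0], [-1.0, 1.0]])  # matching pennies
x_target = jnp.array([0.7, 0.3])                     # desired row strategy
theta = jnp.zeros_like(base_payoff)                  # reward perturbation

grad_fn = jax.jit(jax.grad(upper_loss))  # gradient flows through the unrolled solver
for _ in range(300):
    theta = theta - 0.2 * grad_fn(theta, base_payoff, x_target)

print("arbitrated equilibrium:", regularized_ne(base_payoff + theta)[0])
```

In this sketch the gradient reaches the upper level only because the lower-level solver is unrolled; the paper's backpropagation scheme instead obtains the upper-level gradient by differentiating at the solver's equilibrium output, which is why it only needs black-box access to a (regularized) NE solver.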
Related papers
- A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning [53.83345471268163]
We investigate learning the equilibria in non-stationary multi-agent systems.
We show how to test for various types of equilibria by a black-box reduction to single-agent learning.
arXiv Detail & Related papers (2023-06-12T23:48:24Z) - Soft-Bellman Equilibrium in Affine Markov Games: Forward Solutions and Inverse Learning [37.176741793213694]
We formulate a class of Markov games, termed affine Markov games, where an affine reward function couples the players' actions.
We introduce a novel solution concept, the soft-Bellman equilibrium, where each player is boundedly rational and chooses a soft-Bellman policy.
We then solve the inverse game problem of inferring the players' reward parameters from observed state-action trajectories via a projected-gradient algorithm.
arXiv Detail & Related papers (2023-03-31T22:50:47Z) - Global Nash Equilibrium in Non-convex Multi-player Game: Theory and Algorithms [66.8634598612777]
We show that a Nash equilibrium (NE) is acceptable to all players in a multi-player game, since no player can benefit from unilaterally deviating from it.
arXiv Detail & Related papers (2023-01-19T11:36:50Z) - Learning Two-Player Mixture Markov Games: Kernel Function Approximation and Correlated Equilibrium [157.0902680672422]
We consider learning Nash equilibria in two-player zero-sum Markov Games with nonlinear function approximation.
We propose a novel online learning algorithm to find a Nash equilibrium by minimizing the duality gap.
arXiv Detail & Related papers (2022-08-10T14:21:54Z) - Regularized Gradient Descent Ascent for Two-Player Zero-Sum Markov Games [16.09467599829253]
We study the problem of finding Nash equilibrium in a two-player zero-sum game.
Our main contribution is to show that under proper choices of the regularization parameter, gradient descent ascent converges to the Nash equilibrium of the original unregularized problem.
arXiv Detail & Related papers (2022-05-27T03:24:12Z) - Towards convergence to Nash equilibria in two-team zero-sum games [17.4461045395989]
Two-team zero-sum games are defined as multi-player games where players are split into two competing sets of agents.
We focus on the solution concept of Nash equilibria (NE).
We show that computing NE for this class of games is $\textit{hard}$ for the complexity class $\mathrm{CLS}$.
arXiv Detail & Related papers (2021-11-07T21:15:35Z) - Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games [78.65798135008419]
It remains vastly open how to learn the Stackelberg equilibrium in general-sum games efficiently from samples.
This paper initiates the theoretical study of sample-efficient learning of the Stackelberg equilibrium in two-player turn-based general-sum games.
arXiv Detail & Related papers (2021-02-23T05:11:07Z) - Linear Regression Games: Convergence Guarantees to Approximate Out-of-Distribution Solutions [35.313551211453266]
In this work, we extend the framework in Ahuja et al. for linear regressions by projecting the ensemble-game on an $\ell_\infty$ ball.
We show that such projections help achieve non-trivial OOD guarantees despite not achieving perfect invariance.
arXiv Detail & Related papers (2020-10-28T21:10:24Z) - GANs May Have No Nash Equilibria [12.691047660244331]
Generative adversarial networks (GANs) represent a zero-sum game between two machine players, a generator and a discriminator.
We show through several theoretical and numerical results that indeed GAN zero-sum games may not have any local Nash equilibria.
We propose a new approach, which we call proximal training, for solving GAN problems.
arXiv Detail & Related papers (2020-02-21T04:30:05Z) - Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium [116.56359444619441]
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games.
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.
In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
arXiv Detail & Related papers (2020-02-17T17:04:16Z)