Soft-Bellman Equilibrium in Affine Markov Games: Forward Solutions and Inverse Learning
- URL: http://arxiv.org/abs/2304.00163v2
- Date: Fri, 8 Sep 2023 17:33:12 GMT
- Title: Soft-Bellman Equilibrium in Affine Markov Games: Forward Solutions and Inverse Learning
- Authors: Shenghui Chen, Yue Yu, David Fridovich-Keil, Ufuk Topcu
- Abstract summary: We formulate a class of Markov games, termed affine Markov games, where an affine reward function couples the players' actions.
We introduce a novel solution concept, the soft-Bellman equilibrium, where each player is boundedly rational and chooses a soft-Bellman policy.
We then solve the inverse game problem of inferring the players' reward parameters from observed state-action trajectories via a projected-gradient algorithm.
- Score: 37.176741793213694
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Markov games model interactions among multiple players in a stochastic,
dynamic environment. Each player in a Markov game maximizes its expected total
discounted reward, which depends upon the policies of the other players. We
formulate a class of Markov games, termed affine Markov games, where an affine
reward function couples the players' actions. We introduce a novel solution
concept, the soft-Bellman equilibrium, where each player is boundedly rational
and chooses a soft-Bellman policy rather than a purely rational policy as in
the well-known Nash equilibrium concept. We provide conditions for the
existence and uniqueness of the soft-Bellman equilibrium and propose a
nonlinear least-squares algorithm to compute such an equilibrium in the forward
problem. We then solve the inverse game problem of inferring the players'
reward parameters from observed state-action trajectories via a
projected-gradient algorithm. Experiments in a predator-prey OpenAI Gym
environment show that the reward parameters inferred by the proposed algorithm
outperform those inferred by a baseline algorithm: they reduce the
Kullback-Leibler divergence between the equilibrium policies and observed
policies by at least two orders of magnitude.
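To make the solution concept concrete, below is a minimal single-player sketch of the soft-Bellman backup: standard value iteration with the hard max replaced by a log-sum-exp, whose greedy policy is a softmax over the soft Q-values. This is only an illustration under assumed names (`soft_bellman_policy`, the toy MDP sizes), not the paper's code, and it drops the affine coupling between players.

```python
import numpy as np

def soft_bellman_policy(P, r, gamma=0.95, iters=1000, tol=1e-10):
    """Soft value iteration for one boundedly rational player.

    P : (S, A, S) transition tensor, r : (S, A) reward matrix.
    The hard max of standard value iteration is replaced by log-sum-exp,
    so the greedy policy becomes a softmax over the soft Q-values.
    """
    V = np.zeros(r.shape[0])
    for _ in range(iters):
        Q = r + gamma * P @ V                        # (S, A) soft Q-values
        m = Q.max(axis=1)                            # stabilized log-sum-exp
        V_new = m + np.log(np.exp(Q - m[:, None]).sum(axis=1))
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    Q = r + gamma * P @ V
    m = Q.max(axis=1, keepdims=True)
    logZ = m + np.log(np.exp(Q - m).sum(axis=1, keepdims=True))
    return np.exp(Q - logZ)                          # softmax policy, rows sum to 1

# Sanity check on a random MDP. The paper's inverse problem would instead
# search (by projected gradient) for reward parameters whose induced
# soft-Bellman policy minimizes the KL divergence to the observed policies.
rng = np.random.default_rng(0)
S, A = 5, 3
P = rng.dirichlet(np.ones(S), size=(S, A))           # valid transition kernel
r = rng.normal(size=(S, A))
pi = soft_bellman_policy(P, r)
assert np.allclose(pi.sum(axis=1), 1.0)
```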
Related papers
- Convex Markov Games: A Framework for Fairness, Imitation, and Creativity in Multi-Agent Learning [31.958202912400925]
We introduce the class of convex Markov games that allow general convex preferences over occupancy measures.
Despite the infinite time horizon and strictly greater generality than Markov games, pure-strategy Nash equilibria exist under strict convexity.
Our experiments imitate human choices in ultimatum games, reveal novel solutions to the repeated prisoner's dilemma, and find fair solutions in a repeated asymmetric coordination game.
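As a rough sketch of what preferences over occupancy measures mean, the snippet below computes the discounted state-action occupancy of a policy in a toy MDP and scores it with a concave entropy utility instead of a linear reward. All names here are hypothetical; this illustrates only the object being optimized, not that paper's algorithm.

```python
import numpy as np

def occupancy_measure(P, pi, mu0, gamma=0.95):
    """Discounted state-action occupancy d(s, a) of a policy pi.

    P : (S, A, S) transitions, pi : (S, A) policy, mu0 : (S,) initial
    distribution. Solves the linear flow equation
    mu = (1 - gamma) * mu0 + gamma * P_pi^T mu for the state occupancy,
    then spreads it over actions via pi.
    """
    S = mu0.size
    P_pi = np.einsum('sa,sat->st', pi, P)        # policy-induced Markov chain
    mu = np.linalg.solve(np.eye(S) - gamma * P_pi.T, (1 - gamma) * mu0)
    return mu[:, None] * pi                      # (S, A), entries sum to 1

def entropy_utility(d, eps=1e-12):
    """A concave 'creativity-style' preference over the occupancy measure,
    in contrast to the linear objective <d, r> of an ordinary Markov game."""
    return -np.sum(d * np.log(d + eps))
```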
arXiv Detail & Related papers (2024-10-22T00:55:04Z) - Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property [89.96815099996132]
We develop a new framework to characterize optimistic policy gradient methods in multi-player games with a single controller.
Our approach relies on a natural generalization of the classical Minty property that we introduce, which we anticipate to have further applications beyond Markov games.
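For intuition, here is a minimal sketch of the optimistic gradient update on a toy unconstrained bilinear game; this is not that paper's Markov-game algorithm, and the step size and function names are assumptions.

```python
import numpy as np

def optimistic_gda(A, eta=0.05, steps=2000):
    """Optimistic gradient descent-ascent on the unconstrained bilinear
    game min_x max_y x^T A y.

    The optimistic step x_{t+1} = x_t - eta * (2 g_t - g_{t-1})
    extrapolates with the previous gradient; this is what yields
    last-iterate convergence where vanilla GDA merely cycles.
    """
    n, m = A.shape
    x, y = np.ones(n), np.ones(m)
    gx_prev, gy_prev = A @ y, A.T @ x
    for _ in range(steps):
        gx, gy = A @ y, A.T @ x                  # current gradients
        x = x - eta * (2 * gx - gx_prev)         # minimizing player
        y = y + eta * (2 * gy - gy_prev)         # maximizing player
        gx_prev, gy_prev = gx, gy
    return x, y

x, y = optimistic_gda(np.array([[1.0]]))         # f(x, y) = x * y
print(x, y)                                      # both approach the saddle at 0
```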
arXiv Detail & Related papers (2023-12-19T11:34:10Z) - Bayes correlated equilibria and no-regret dynamics [9.89901717499058]
This paper explores equilibrium concepts for Bayesian games, which are fundamental models of games with incomplete information.
We focus on communication equilibria, which can be realized by a mediator who gathers each player's private information and then sends correlated recommendations to the players.
We present an efficient algorithm for minimizing untruthful swap regret with a sublinear upper bound, which we prove to be tight up to a multiplicative constant.
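Swap regret itself is easy to state in code: for every action, compare the utility actually collected on the rounds where that action carried probability mass against the best fixed replacement. Below is a minimal sketch of ordinary swap regret, not that paper's "untruthful" Bayesian refinement; the names are illustrative.

```python
import numpy as np

def swap_regret(plays, utils):
    """Swap regret of a sequence of mixed strategies.

    plays : (T, A) rows are the distributions actually played.
    utils : (T, A) utility of each action in each round.
    For each action a, measure the gain from rerouting all of a's
    probability mass to the best fixed alternative b.
    """
    weighted = plays.T @ utils                  # [a, b] = sum_t plays[t, a] * utils[t, b]
    own = np.diag(weighted)                     # payoff actually collected via each action
    return np.sum(weighted.max(axis=1) - own)   # best swap per action, summed
```

Sublinear growth of this quantity over time is what drives the empirical play toward a correlated equilibrium.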
arXiv Detail & Related papers (2023-04-11T06:22:51Z) - Differentiable Arbitrating in Zero-sum Markov Games [59.62061049680365]
We study how to perturb the reward in a zero-sum Markov game with two players to induce a desirable Nash equilibrium, namely arbitrating.
The lower level requires solving the Nash equilibrium under a given reward function, which makes the overall problem challenging to optimize in an end-to-end way.
We propose a backpropagation scheme that differentiates through the Nash equilibrium, which provides the gradient feedback for the upper level.
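A toy version of differentiating through an equilibrium can be written by unrolling a smoothed best-response iteration and letting autodiff run through it. The sketch below uses JAX on a matrix game; the quantal-response smoothing, the fixed-point solver, and all names are assumptions standing in for that paper's scheme.

```python
import jax
import jax.numpy as jnp

def qre_policies(A, tau=1.0, iters=200):
    """Smoothed (quantal-response) equilibrium of the zero-sum matrix game
    min_x max_y x^T A y, via a damped softmax best-response iteration.

    Because the loop is unrolled, jax.grad can push gradients from the
    equilibrium policies back into the payoff matrix A.
    """
    n, m = A.shape
    x, y = jnp.ones(n) / n, jnp.ones(m) / m
    for _ in range(iters):
        x = 0.5 * x + 0.5 * jax.nn.softmax(-(A @ y) / tau)   # minimizer
        y = 0.5 * y + 0.5 * jax.nn.softmax((A.T @ x) / tau)  # maximizer
    return x, y

def arbitration_loss(theta, A, target_y):
    """Upper level: perturb the payoffs by theta, then measure how far the
    induced equilibrium is from a desired column-player policy."""
    _, y = qre_policies(A + theta)
    return jnp.sum((y - target_y) ** 2)

grad_fn = jax.grad(arbitration_loss)   # gradient feedback through the equilibrium

A = jnp.array([[1.0, -1.0], [-1.0, 1.0]])            # matching pennies
g = grad_fn(jnp.zeros_like(A), A, jnp.array([0.7, 0.3]))
```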
arXiv Detail & Related papers (2023-02-20T16:05:04Z) - Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation [56.715186432566576]
We propose a new model, independent linear Markov game, for reinforcement learning with a large state space and a large number of agents.
We design new algorithms for learning Markov coarse correlated equilibria (CCE) and Markov correlated equilibria (CE) with sample complexity bounds that scale only with each agent's own function class complexity.
Our algorithms rely on two key technical innovations: (1) using policy replay to tackle the non-stationarity incurred by multiple agents and by function approximation; and (2) separating the learning of Markov equilibria from exploration in Markov games.
arXiv Detail & Related papers (2023-02-07T18:47:48Z) - Offline Learning in Markov Games with General Function Approximation [22.2472618685325]
We study offline multi-agent reinforcement learning (RL) in Markov games.
We provide the first framework for sample-efficient offline learning in Markov games.
arXiv Detail & Related papers (2023-02-06T05:22:27Z) - Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of continuous-action games without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
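The gradient-free ingredient can be sketched with a standard zeroth-order (evolution-strategies) estimator: perturb the policy parameters, query only payoffs, and average. This is a generic stand-in under assumed names, not that paper's exact method.

```python
import numpy as np

def es_gradient(payoff, theta, sigma=0.1, samples=64, rng=None):
    """Zeroth-order (evolution-strategies) estimate of grad payoff(theta).

    payoff is a black box returning a scalar expected utility, so no
    analytic gradients are needed, matching the gradient-free setting.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    eps = rng.normal(size=(samples, theta.size))               # random directions
    scores = np.array([payoff(theta + sigma * e) for e in eps])
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)  # variance reduction
    return eps.T @ scores / (samples * sigma)                  # ascent direction
```

In an iterated-best-response loop, each player would ascend its own estimated gradient while the other's randomized policy network is held fixed.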
arXiv Detail & Related papers (2022-11-29T05:16:41Z) - Efficiently Computing Nash Equilibria in Adversarial Team Markov Games [19.717850955051837]
We introduce a class of games in which a team of identically interested players competes against an adversarial player.
This setting allows for a unifying treatment of zero-sum Markov games and Markov potential games.
Our main contribution is the first algorithm for computing stationary $\epsilon$-approximate Nash equilibria in adversarial team Markov games.
arXiv Detail & Related papers (2022-08-03T16:41:01Z) - Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium [116.56359444619441]
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games.
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.
In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
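The duality gap the offline setting minimizes has a simple matrix-game analogue: the sum of both players' best-response improvements, which vanishes exactly at a Nash equilibrium. A minimal sketch (a matrix game rather than a finite-horizon Markov game; names are assumptions):

```python
import numpy as np

def duality_gap(A, x, y):
    """Duality gap of mixed strategies (x, y) in the zero-sum matrix game
    where the row player maximizes x^T A y and the column player minimizes.

    gap = max_{x'} x'^T A y - min_{y'} x^T A y' >= 0, with equality
    exactly at a Nash equilibrium, so driving it to zero finds one.
    """
    return np.max(A @ y) - np.min(A.T @ x)
```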
arXiv Detail & Related papers (2020-02-17T17:04:16Z)