Regret Minimization and Convergence to Equilibria in General-sum Markov
Games
- URL: http://arxiv.org/abs/2207.14211v1
- Date: Thu, 28 Jul 2022 16:27:59 GMT
- Title: Regret Minimization and Convergence to Equilibria in General-sum Markov
Games
- Authors: Liad Erez, Tal Lancewicki, Uri Sherman, Tomer Koren and Yishay Mansour
- Abstract summary: We present the first algorithm for learning in general-sum Markov games that provides sublinear regret guarantees when executed by all agents.
Our algorithm is decentralized, computationally efficient, and does not require any communication between agents.
- Score: 57.568118148036376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An abundance of recent impossibility results establish that regret
minimization in Markov games with adversarial opponents is both statistically
and computationally intractable. Nevertheless, none of these results preclude
the possibility of regret minimization under the assumption that all parties
adopt the same learning procedure. In this work, we present the first (to our
knowledge) algorithm for learning in general-sum Markov games that provides
sublinear regret guarantees when executed by all agents. The bounds we obtain
are for swap regret, and thus, along the way, imply convergence to a correlated
equilibrium. Our algorithm is decentralized, computationally efficient, and
does not require any communication between agents. Our key observation is that
online learning via policy optimization in Markov games essentially reduces to
a form of weighted regret minimization, with unknown weights determined by the
path length of the agents' policy sequence. Consequently, controlling the path
length leads to weighted regret objectives for which sufficiently adaptive
algorithms provide sublinear regret guarantees.
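To make the key observation above concrete, here is a minimal illustrative sketch (not the paper's actual algorithm) of a weighted external-regret minimizer based on multiplicative-weights updates with an adaptive, data-dependent learning rate. The `weights` argument is a stand-in for the unknown per-round weights that, in the paper, are governed by the path length of the agents' policy sequence; the function and variable names are hypothetical.

```python
import numpy as np


def adaptive_weighted_hedge(loss_vectors, n_actions, weights=None):
    """Illustrative sketch: Hedge-style updates with an adaptive learning
    rate applied to weighted per-round losses (not the paper's algorithm)."""
    T = len(loss_vectors)
    weights = np.ones(T) if weights is None else np.asarray(weights, dtype=float)
    cum_loss = np.zeros(n_actions)   # cumulative weighted loss per action
    cum_sq = 0.0                     # running proxy driving the adaptive step size
    learner_loss = 0.0
    for t in range(T):
        eta = 1.0 / np.sqrt(1.0 + cum_sq)                 # adaptive learning rate
        p = np.exp(-eta * (cum_loss - cum_loss.min()))    # current mixed action
        p /= p.sum()
        loss = weights[t] * np.asarray(loss_vectors[t], dtype=float)
        learner_loss += float(p @ loss)                   # learner's weighted loss
        cum_loss += loss
        cum_sq += float(loss @ loss)
    # weighted regret against the best fixed action in hindsight
    return learner_loss - float(cum_loss.min())


# Toy usage: random weighted losses over 3 actions for 1000 rounds.
rng = np.random.default_rng(0)
losses = rng.random((1000, 3))
round_weights = rng.random(1000)  # stands in for path-length-driven weights
print(adaptive_weighted_hedge(losses, n_actions=3, weights=round_weights))
```

The adaptive step size, which shrinks as the accumulated (weighted) loss grows, is what lets such a learner cope with weights it does not know in advance; this is only a generic illustration of that idea, not the decentralized policy-optimization procedure developed in the paper.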
Related papers
- Convex Markov Games: A Framework for Fairness, Imitation, and Creativity in Multi-Agent Learning [31.958202912400925]
We introduce the class of convex Markov games that allow general convex preferences over occupancy measures.
Despite the infinite time horizon and strictly greater generality than Markov games, pure-strategy Nash equilibria exist under strict convexity.
Our experiments imitate human choices in ultimatum games, reveal novel solutions to the repeated prisoner's dilemma, and find fair solutions in a repeated asymmetric coordination game.
arXiv Detail & Related papers (2024-10-22T00:55:04Z) - Taming Equilibrium Bias in Risk-Sensitive Multi-Agent Reinforcement Learning [14.571671587217764]
We study risk-sensitive multi-agent reinforcement learning under general-sum Markov games.
We show that naively adapting the regret notion from the existing literature as a performance metric can induce policies with equilibrium bias.
We propose a novel notion of regret, which we call risk-balanced regret, and show through a lower bound that it overcomes the issue of equilibrium bias.
arXiv Detail & Related papers (2024-05-04T17:47:45Z) - Hardness of Independent Learning and Sparse Equilibrium Computation in
Markov Games [70.19141208203227]
We consider the problem of decentralized multi-agent reinforcement learning in Markov games.
We show that no algorithm attains no-regret in general-sum games when executed independently by all players.
We show that our lower bounds hold even for the seemingly easier setting in which all agents are controlled by a centralized algorithm.
arXiv Detail & Related papers (2023-03-22T03:28:12Z) - Breaking the Curse of Multiagents in a Large State Space: RL in Markov
Games with Independent Linear Function Approximation [56.715186432566576]
We propose a new model, independent linear Markov game, for reinforcement learning with a large state space and a large number of agents.
We design new algorithms for learning Markov coarse correlated equilibria (CCE) and Markov correlated equilibria (CE) with sample complexity bounds that scale only with each agent's own function class complexity.
Our algorithms rely on two key technical innovations: (1) using policy replay to tackle the non-stationarity incurred by multiple agents and the use of function approximation; and (2) separating the learning of Markov equilibria from exploration in the Markov games.
arXiv Detail & Related papers (2023-02-07T18:47:48Z) - Offline Learning in Markov Games with General Function Approximation [22.2472618685325]
We study offline multi-agent reinforcement learning (RL) in Markov games.
We provide the first framework for sample-efficient offline learning in Markov games.
arXiv Detail & Related papers (2023-02-06T05:22:27Z) - Learning Rationalizable Equilibria in Multiplayer Games [38.922957434291554]
Existing algorithms require a number of samples exponential in the number of players to learn rationalizable equilibria under bandit feedback.
This paper develops the first line of efficient algorithms for learning rationalizable Coarse Correlated Equilibria (CCE) and Correlated Equilibria (CE).
Our algorithms incorporate several novel techniques to guarantee rationalizability and no (swap-)regret simultaneously, including a correlated exploration scheme and adaptive learning rates.
arXiv Detail & Related papers (2022-10-20T16:49:00Z) - No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
arXiv Detail & Related papers (2022-01-30T06:10:04Z) - Model-Free Online Learning in Unknown Sequential Decision Making
Problems and Games [114.90723492840499]
In large two-player zero-sum imperfect-information games, modern extensions of counterfactual regret minimization (CFR) are currently the practical state of the art for computing a Nash equilibrium.
We formalize an online learning setting in which the strategy space is not known to the agent.
We give an efficient algorithm that achieves $O(T^{3/4})$ regret with high probability for that setting, even when the agent faces an adversarial environment.
arXiv Detail & Related papers (2021-03-08T04:03:24Z) - Learning Zero-Sum Simultaneous-Move Markov Games Using Function
Approximation and Correlated Equilibrium [116.56359444619441]
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games.
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.
In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
arXiv Detail & Related papers (2020-02-17T17:04:16Z)