Breaking the Curse of Multiagents in a Large State Space: RL in Markov
Games with Independent Linear Function Approximation
- URL: http://arxiv.org/abs/2302.03673v3
- Date: Wed, 21 Jun 2023 21:15:10 GMT
- Title: Breaking the Curse of Multiagents in a Large State Space: RL in Markov
Games with Independent Linear Function Approximation
- Authors: Qiwen Cui, Kaiqing Zhang, Simon S. Du
- Abstract summary: We propose a new model, independent linear Markov game, for reinforcement learning with a large state space and a large number of agents.
We design new algorithms for learning Markov coarse correlated equilibria (CCE) and Markov correlated equilibria (CE) with sample complexity bounds that scale only with each agent's own function class complexity.
Our algorithms rely on two key technical innovations: (1) utilizing policy replay to tackle the non-stationarity incurred by multiple agents and the use of function approximation; and (2) separating the learning of Markov equilibria from exploration in Markov games.
- Score: 56.715186432566576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new model, independent linear Markov game, for multi-agent
reinforcement learning with a large state space and a large number of agents.
This is a class of Markov games with independent linear function approximation,
where each agent has its own function approximation for the state-action value
functions that are marginalized by other players' policies. We design new
algorithms for learning the Markov coarse correlated equilibria (CCE) and
Markov correlated equilibria (CE) with sample complexity bounds that only scale
polynomially with each agent's own function class complexity, thus breaking the
curse of multiagents. In contrast, existing works for Markov games with
function approximation have sample complexity bounds that scale with the size of the
\emph{joint action space} when specialized to the canonical tabular Markov game
setting, which is exponentially large in the number of agents. Our algorithms
rely on two key technical innovations: (1) utilizing policy replay to tackle
non-stationarity incurred by multiple agents and the use of function
approximation; (2) separating the learning of Markov equilibria from exploration in
Markov games, which allows us to use the full-information no-regret learning
oracle instead of the stronger bandit-feedback no-regret learning oracle used
in the tabular setting. Furthermore, we propose an iterative-best-response type
algorithm that can learn pure Markov Nash equilibria in independent linear
Markov potential games. In the tabular case, by adapting the policy replay
mechanism for independent linear Markov games, we propose an algorithm with
$\widetilde{O}(\epsilon^{-2})$ sample complexity to learn Markov CCE, which
improves the state-of-the-art result $\widetilde{O}(\epsilon^{-3})$ in
Daskalakis et al. (2022), where $\epsilon$ is the desired accuracy, and also
significantly improves the dependence on other problem parameters.
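As a rough illustration (notation ours; a sketch of the assumption rather than the paper's formal definition): each agent $i$ has its own feature map $\phi_i: \mathcal{S} \times \mathcal{A}_i \to \mathbb{R}^{d_i}$ over states and its \emph{own} actions, and for every policy $\pi^{-i}$ of the other agents, the marginalized state-action value function at step $h$ is assumed to be linear in these features:
$$Q_h^{i,\pi^{-i}}(s, a_i) = \big\langle \phi_i(s, a_i), \theta_h^{i,\pi^{-i}} \big\rangle.$$
Under such an assumption, each agent's learning problem depends on its own feature dimension $d_i$, so sample complexity can scale with $\max_i d_i$ rather than with the joint action space size $\prod_i |\mathcal{A}_i|$, which grows exponentially with the number of agents in the tabular case.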
Related papers
- Optimistic Policy Gradient in Multi-Player Markov Games with a Single
Controller: Convergence Beyond the Minty Property [89.96815099996132]
We develop a new framework to characterize optimistic policy gradient methods in multi-player games with a single controller.
Our approach relies on a natural generalization of the classical Minty property that we introduce, which we anticipate to have further applications beyond Markov games.
arXiv Detail & Related papers (2023-12-19T11:34:10Z)
- Representation Learning for General-sum Low-rank Markov Games [63.119870889883224]
We study multi-agent general-sum Markov games with nonlinear function approximation.
We focus on low-rank Markov games whose transition matrix admits a hidden low-rank structure on top of an unknown non-linear representation.
arXiv Detail & Related papers (2022-10-30T22:58:22Z)
- The Complexity of Markov Equilibrium in Stochastic Games [44.77547027158141]
We show that computing approximate stationary Markov coarse correlated equilibria (CCE) in general-sum games is computationally intractable.
Our results stand in sharp contrast to normal-form games where exact CCEs are efficiently computable.
arXiv Detail & Related papers (2022-04-08T10:51:01Z)
- Towards General Function Approximation in Zero-Sum Markov Games [126.58493169301012]
This paper considers two-player zero-sum finite-horizon Markov games with simultaneous moves.
Provably efficient algorithms for both decoupled and coordinated settings are developed.
arXiv Detail & Related papers (2021-07-30T15:25:13Z)
- A Sharp Analysis of Model-based Reinforcement Learning with Self-Play [49.88233710867315]
We present a sharp analysis of model-based self-play algorithms for multi-agent Markov games.
We design an algorithm -- Optimistic Nash Value Iteration (Nash-VI) for two-player zero-sum Markov games.
We further adapt our analysis to designing a provably efficient task-agnostic algorithm for zero-sum Markov games.
arXiv Detail & Related papers (2020-10-04T15:27:39Z)
- Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium [116.56359444619441]
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games.
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.
In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
arXiv Detail & Related papers (2020-02-17T17:04:16Z)