Breaking the Curse of Multiagency: Provably Efficient Decentralized
Multi-Agent RL with Function Approximation
- URL: http://arxiv.org/abs/2302.06606v1
- Date: Mon, 13 Feb 2023 18:59:25 GMT
- Title: Breaking the Curse of Multiagency: Provably Efficient Decentralized
Multi-Agent RL with Function Approximation
- Authors: Yuanhao Wang, Qinghua Liu, Yu Bai, Chi Jin
- Abstract summary: This paper presents the first line of MARL algorithms that provably resolve the curse of multiagency under function approximation.
The main algorithm always outputs Markov CCEs and achieves an optimal rate of $\widetilde{\mathcal{O}}(\epsilon^{-2})$ for finding $\epsilon$-optimal solutions.
In exchange for learning a weaker version of CCEs, a second algorithm applies to a wider range of problems under generic function approximation.
- Score: 44.051717720483595
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A unique challenge in Multi-Agent Reinforcement Learning (MARL) is the curse
of multiagency, where the description length of the game as well as the
complexity of many existing learning algorithms scale exponentially with the
number of agents. While recent works successfully address this challenge under
the model of tabular Markov Games, their mechanisms critically rely on the
number of states being finite and small, and do not extend to practical
scenarios with enormous state spaces where function approximation must be used
to approximate value functions or policies.
This paper presents the first line of MARL algorithms that provably resolve
the curse of multiagency under function approximation. We design a new
decentralized algorithm -- V-Learning with Policy Replay, which gives the first
polynomial sample complexity results for learning approximate Coarse Correlated
Equilibria (CCEs) of Markov Games under decentralized linear function
approximation. Our algorithm always outputs Markov CCEs, and achieves an
optimal rate of $\widetilde{\mathcal{O}}(\epsilon^{-2})$ for finding
$\epsilon$-optimal solutions. Also, when restricted to the tabular case, our
result improves over the current best decentralized result
$\widetilde{\mathcal{O}}(\epsilon^{-3})$ for finding Markov CCEs. We further
present an alternative algorithm -- Decentralized Optimistic Policy Mirror
Descent, which finds policy-class-restricted CCEs using a polynomial number of
samples. In exchange for learning a weaker version of CCEs, this algorithm
applies to a wider range of problems under generic function approximation, such
as linear quadratic games and MARL problems with low "marginal" Eluder
dimension.
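As a rough illustration of the decentralized ingredient behind V-Learning-style algorithms, the sketch below shows a single agent's exponential-weights (Hedge/EXP3-style) no-regret learner under bandit feedback. It is a generic subroutine with illustrative names, not the paper's V-Learning with Policy Replay procedure, which additionally maintains optimistic value estimates and replays historical policies to extract a Markov CCE.

```python
import numpy as np

class HedgeBandit:
    """Exponential-weights learner with bandit feedback (EXP3-style).

    A minimal sketch of the per-state, per-agent no-regret subroutine that
    decentralized V-Learning-style methods build on; class and method names
    are illustrative, not taken from the paper.
    """

    def __init__(self, num_actions: int, learning_rate: float):
        self.num_actions = num_actions
        self.lr = learning_rate
        self.log_weights = np.zeros(num_actions)

    def policy(self) -> np.ndarray:
        """Current mixed strategy over this agent's own actions."""
        w = np.exp(self.log_weights - self.log_weights.max())
        return w / w.sum()

    def sample_action(self, rng: np.random.Generator) -> int:
        return int(rng.choice(self.num_actions, p=self.policy()))

    def update(self, action: int, loss: float) -> None:
        """Importance-weighted loss estimate for the played action only."""
        p = self.policy()[action]
        loss_hat = np.zeros(self.num_actions)
        loss_hat[action] = loss / max(p, 1e-12)  # bandit feedback correction
        self.log_weights -= self.lr * loss_hat   # Hedge update in log space
```

In a decentralized run, each agent keeps one such learner per encountered state and observes only its own actions and rewards; an approximate CCE is then read off from a mixture of the historical policies rather than from the final iterate.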
Related papers
- RL in Markov Games with Independent Function Approximation: Improved Sample Complexity Bound under the Local Access Model [15.596599935486534]
We introduce a new algorithm, Lin-Confident-FTRL, for learning coarse correlated equilibria (CCE) with local access to the simulator.
Up to a logarithmic dependence on the size of the state space, Lin-Confident-FTRL learns $\epsilon$-CCE with a provably optimal accuracy bound of $O(\epsilon^{-2})$.
Our analysis generalizes the virtual policy bounds in the single-agent local planning literature.
arXiv Detail & Related papers (2024-03-18T07:54:11Z)
- Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games [66.2085181793014]
We show that a model-free stage-based Q-learning algorithm can enjoy the same optimality in the $H$ dependence as model-based algorithms.
A key novel design of our algorithm is updating the reference value functions as a pair of optimistic and pessimistic value functions.
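One generic way to realize such an optimistic/pessimistic pair is to run upper- and lower-confidence-bound Q-updates side by side. The sketch below is that textbook-style update with a simplified count-based bonus and illustrative names, not the paper's stage-based scheme.

```python
import numpy as np

def update_q_bounds(q_up, q_lo, counts, s, a, r, v_up_next, v_lo_next, horizon):
    """One optimistic (upper) and pessimistic (lower) Q-update at (s, a).

    A simplified Hoeffding-bonus update in the spirit of the paired value
    functions described above; the paper's actual algorithm is stage-based
    and uses sharper bonuses.
    """
    counts[s, a] += 1
    n = counts[s, a]
    lr = (horizon + 1) / (horizon + n)   # standard H-dependent step size
    bonus = horizon * np.sqrt(1.0 / n)   # simplified exploration bonus
    q_up[s, a] = (1 - lr) * q_up[s, a] + lr * (r + v_up_next + bonus)
    q_lo[s, a] = (1 - lr) * q_lo[s, a] + lr * (r + v_lo_next - bonus)
```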
arXiv Detail & Related papers (2023-08-17T08:34:58Z)
- Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation [56.715186432566576]
We propose a new model, independent linear Markov game, for reinforcement learning with a large state space and a large number of agents.
We design new algorithms for learning Markov coarse correlated equilibria (CCE) and Markov correlated equilibria (CE) with sample complexity bounds that scale only polynomially with each agent's own function class complexity.
Our algorithms rely on two key technical innovations: (1) using policy replay to tackle the non-stationarity incurred by multiple agents and by the use of function approximation; and (2) separating the learning of Markov equilibria from exploration in the Markov games.
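The policy-replay idea in point (1) can be pictured with a small buffer of committed policies: later epochs replay a uniformly sampled snapshot, so each agent effectively faces a fixed mixture of the others' past behavior. The sketch below illustrates only that idea under assumed names; it is not the paper's pseudocode.

```python
import random

class PolicyReplayBuffer:
    """Stores policy snapshots committed at the end of earlier epochs."""

    def __init__(self):
        self.snapshots = []

    def commit(self, policy) -> None:
        """Freeze the current policy once an epoch of learning finishes."""
        self.snapshots.append(policy)

    def replay(self, rng: random.Random):
        """Sample one past policy uniformly and play it for a whole episode,
        which makes the environment look (near-)stationary to the others."""
        return rng.choice(self.snapshots)
```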
arXiv Detail & Related papers (2023-02-07T18:47:48Z)
- Representation Learning for General-sum Low-rank Markov Games [63.119870889883224]
We study multi-agent general-sum Markov games with nonlinear function approximation.
We focus on low-rank Markov games whose transition matrix admits a hidden low-rank structure on top of an unknown non-linear representation.
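For context, the hidden low-rank structure here is typically a factorized transition of the form $\mathbb{P}(s' \mid s, \boldsymbol{a}) = \langle \phi(s, \boldsymbol{a}), \mu(s') \rangle$ with $\phi(s, \boldsymbol{a}) \in \mathbb{R}^d$ an unknown non-linear representation; the paper's exact assumptions may differ in detail.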
arXiv Detail & Related papers (2022-10-30T22:58:22Z)
- Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis [27.21581944906418]
Actor-critic (AC) algorithms have been widely adopted in decentralized multi-agent systems.
We develop two decentralized algorithms, an AC and a natural AC (NAC) variant, that are private as well as sample- and communication-efficient.
arXiv Detail & Related papers (2021-09-08T15:02:21Z)
- Parallel Stochastic Mirror Descent for MDPs [72.75921150912556]
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs).
A variant of Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals.
We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method.
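The mirror-descent update referred to here (and, in spirit, by the main paper's Decentralized Optimistic Policy Mirror Descent) has a simple closed form on the probability simplex with the negative-entropy mirror map. The sketch below is that generic exponentiated-gradient step with illustrative names, not the specific parallel algorithm of this paper.

```python
import numpy as np

def mirror_descent_step(policy: np.ndarray,
                        q_values: np.ndarray,
                        step_size: float) -> np.ndarray:
    """One mirror-descent step on the simplex with the negative-entropy map.

    Equivalent to the exponentiated-gradient / multiplicative-weights update
    pi'(a) proportional to pi(a) * exp(step_size * q_values[a]).
    """
    logits = np.log(policy + 1e-12) + step_size * q_values  # ascent on value
    logits -= logits.max()                                   # numerical stability
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum()
```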
arXiv Detail & Related papers (2021-02-27T19:28:39Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)