The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces
- URL: http://arxiv.org/abs/2106.03352v1
- Date: Mon, 7 Jun 2021 05:39:09 GMT
- Title: The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces
- Authors: Chi Jin, Qinghua Liu, Tiancheng Yu
- Abstract summary: We propose an algorithm that can provably find the Nash equilibrium policy using a number of samples.
A key component of our new algorithm is the exploiter, which facilitates the learning of the main player by deliberately exploiting her weakness.
Our theoretical framework is generic and applies to a wide range of models, including but not limited to tabular MGs, MGs with linear or kernel function approximation, and MGs with rich observations.
- Score: 36.097537237660234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern reinforcement learning (RL) commonly engages practical problems with
large state spaces, where function approximation must be deployed to
approximate either the value function or the policy. While recent progress in
RL theory addresses a rich set of RL problems with general function
approximation, such successes are mostly restricted to the single-agent
setting. It remains elusive how to extend these results to multi-agent RL,
especially due to the new challenges arising from its game-theoretical nature.
This paper considers two-player zero-sum Markov Games (MGs). We propose a new
algorithm that can provably find the Nash equilibrium policy using a polynomial
number of samples, for any MG with low multi-agent Bellman-Eluder dimension --
a new complexity measure adapted from its single-agent version (Jin et al.,
2021). A key component of our new algorithm is the exploiter, which facilitates
the learning of the main player by deliberately exploiting her weakness. Our
theoretical framework is generic and applies to a wide range of models
including but not limited to tabular MGs, MGs with linear or kernel function
approximation, and MGs with rich observations.
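As a concrete illustration of the exploiter idea, the sketch below plays a one-step zero-sum matrix game (rock-paper-scissors): the main player runs a no-regret (multiplicative-weights) update while an exploiter best-responds to her current policy, i.e. deliberately exploits her weakness, and her time-averaged policy drifts toward the Nash equilibrium. This is only a minimal analogue written for this summary, not the paper's algorithm, which handles Markov games with general function approximation and optimistic value estimates; all names and parameters in the snippet are illustrative.

```python
# Minimal analogue of main-player-vs-exploiter training on a one-step
# zero-sum matrix game (rock-paper-scissors; Nash equilibrium = uniform play).
# Illustrative only -- not the algorithm from the paper.
import numpy as np

# Payoff matrix for the main (row) player; the exploiter is the column player.
A = np.array([[ 0.0, -1.0,  1.0],   # rock
              [ 1.0,  0.0, -1.0],   # paper
              [-1.0,  1.0,  0.0]])  # scissors

n_actions = A.shape[0]
T = 5000
eta = np.sqrt(np.log(n_actions) / T)  # multiplicative-weights step size

weights = np.ones(n_actions)
avg_policy = np.zeros(n_actions)

for _ in range(T):
    main_policy = weights / weights.sum()

    # Exploiter: best-respond to the main player's current policy,
    # i.e. pick the column that minimizes her expected payoff.
    exploiter_action = int(np.argmin(main_policy @ A))

    # Main player: no-regret update on the payoffs induced by the exploiter.
    weights *= np.exp(eta * A[:, exploiter_action])

    avg_policy += main_policy

avg_policy /= T
print("average main-player policy:", np.round(avg_policy, 3))  # close to uniform
```

In the paper's Markov-game setting the exploiter plays the same facilitating role, but both players must learn the game from samples, with function approximation replacing the known payoff matrix.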
Related papers
- Independent Policy Mirror Descent for Markov Potential Games: Scaling to Large Number of Players [17.55330497310932]
Markov Potential Games (MPGs) form an important sub-class of Markov games.
MPGs include as a special case the identical-interest setting where all the agents share the same reward function.
Scaling the performance of Nash equilibrium learning algorithms to a large number of agents is crucial for multi-agent systems.
arXiv Detail & Related papers (2024-08-15T11:02:05Z)
- Refined Sample Complexity for Markov Games with Independent Linear Function Approximation [49.5660193419984]
Markov Games (MGs) are an important model for Multi-Agent Reinforcement Learning (MARL).
This paper first refines the AVLPR framework by Wang et al. (2023), with the insight of designing a pessimistic estimate of the sub-optimality gap.
We give the first algorithm that simultaneously tackles the curse of multi-agents, attains the optimal $O(T^{-1/2})$ convergence rate, and avoids $\text{poly}(A_{\max})$ dependency.
arXiv Detail & Related papers (2024-02-11T01:51:15Z)
- Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL [57.745700271150454]
We study the sample complexity of reinforcement learning in Mean-Field Games (MFGs) with model-based function approximation.
We introduce the Partial Model-Based Eluder Dimension (P-MBED), a more effective notion to characterize the model class complexity.
arXiv Detail & Related papers (2024-02-08T14:54:47Z)
- Multi-agent Policy Reciprocity with Theoretical Guarantee [24.65151626601257]
We propose a novel multi-agent policy reciprocity (PR) framework, where each agent can fully exploit cross-agent policies even in mismatched states.
Experimental results on discrete and continuous environments demonstrate that PR outperforms various existing RL and transfer RL methods.
arXiv Detail & Related papers (2023-04-12T06:27:10Z)
- Online Sub-Sampling for Reinforcement Learning with General Function Approximation [111.01990889581243]
In this paper, we establish an efficient online sub-sampling framework that measures the information gain of data points collected by an RL algorithm.
For a value-based method with a complexity-bounded function class, we show that the policy only needs to be updated $\propto \operatorname{polylog}(K)$ times (a generic lazy-update sketch is given after this list).
In contrast to existing approaches that update the policy at least $\Omega(K)$ times, our approach drastically reduces the number of optimization calls in solving for a policy.
arXiv Detail & Related papers (2021-06-14T07:36:25Z)
- Distributed Reinforcement Learning for Cooperative Multi-Robot Object Manipulation [53.262360083572005]
We consider solving a cooperative multi-robot object manipulation task using reinforcement learning (RL).
We propose two distributed multi-agent RL approaches: distributed approximate RL (DA-RL) and game-theoretic RL (GT-RL).
Although we focus on a small system of two agents in this paper, both DA-RL and GT-RL apply to general multi-agent systems, and are expected to scale well to large systems.
arXiv Detail & Related papers (2020-03-21T00:43:54Z)
- Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic [54.2180984002807]
Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems.
We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works.
arXiv Detail & Related papers (2020-02-24T20:30:45Z)
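The lazy update schedule in the Online Sub-Sampling entry above (only $\propto \operatorname{polylog}(K)$ policy updates over $K$ episodes) can be pictured with a doubling trigger: recompute the policy only when an accumulated information measure has doubled since the last update. The sketch below is a generic illustration of that schedule under this assumption; `information_gain` and `recompute_policy` are hypothetical placeholders, and the doubling rule is not the paper's actual sub-sampling criterion.

```python
# Generic illustration of a lazy (infrequent) policy-update schedule:
# recompute the policy only when accumulated "information" has doubled,
# giving roughly log2(K) updates over K episodes instead of one per episode.

def run_lazy_updates(K, information_gain, recompute_policy):
    """information_gain(k) and recompute_policy(k) are user-supplied
    placeholders standing in for the actual RL machinery."""
    total_info = 1.0
    info_at_last_update = 1.0
    policy = recompute_policy(0)
    n_updates = 1

    for k in range(1, K + 1):
        total_info += information_gain(k)
        # Doubling trigger: update only when information has doubled.
        if total_info >= 2.0 * info_at_last_update:
            policy = recompute_policy(k)
            info_at_last_update = total_info
            n_updates += 1
    return policy, n_updates

# Toy usage: unit information gain per episode -> roughly log2(K) updates.
_, updates = run_lazy_updates(
    K=100_000,
    information_gain=lambda k: 1.0,
    recompute_policy=lambda k: f"policy@{k}",
)
print("episodes: 100000, policy updates:", updates)  # ~17
```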