Decentralized Cooperative Multi-Agent Reinforcement Learning with
Exploration
- URL: http://arxiv.org/abs/2110.05707v1
- Date: Tue, 12 Oct 2021 02:45:12 GMT
- Title: Decentralized Cooperative Multi-Agent Reinforcement Learning with
Exploration
- Authors: Weichao Mao, Tamer Başar, Lin F. Yang, Kaiqing Zhang
- Abstract summary: We study multi-agent reinforcement learning in the most basic cooperative setting -- Markov teams.
We propose an algorithm in which each agent independently runs a stage-based V-learning style algorithm.
We show that the agents can learn an $\epsilon$-approximate Nash equilibrium policy in at most $\widetilde{O}(1/\epsilon^4)$ episodes.
- Score: 35.75029940279768
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many real-world applications of multi-agent reinforcement learning (RL), such
as multi-robot navigation and decentralized control of cyber-physical systems,
involve the cooperation of agents as a team with aligned objectives. We study
multi-agent RL in the most basic cooperative setting -- Markov teams -- a class
of Markov games where the cooperating agents share a common reward. We propose
an algorithm in which each agent independently runs stage-based V-learning (a
Q-learning style algorithm) to efficiently explore the unknown environment,
while using a stochastic gradient descent (SGD) subroutine for policy updates.
We show that the agents can learn an $\epsilon$-approximate Nash equilibrium
policy in at most $\widetilde{O}(1/\epsilon^4)$ episodes. Our results
advocate the use of a novel \emph{stage-based} V-learning approach to create a
stage-wise stationary environment. We also show that under certain smoothness
assumptions of the team, our algorithm can achieve a nearly \emph{team-optimal}
Nash equilibrium. Simulation results corroborate our theoretical findings. One
key feature of our algorithm is being \emph{decentralized}, in the sense that
each agent has access to only the state and its local actions, and is even
\emph{oblivious} to the presence of the other agents. Neither communication
among teammates nor coordination by a central controller is required during
learning. Hence, our algorithm can readily generalize to an arbitrary number of
agents, without suffering from the exponential dependence on the number of
agents.
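
To make the algorithmic structure concrete, here is a minimal Python sketch of one such decentralized agent. It mirrors the ingredients named in the abstract (per-stage value estimates, a decaying exploration bonus, and an SGD-style policy step over local actions), but the class name, constants, and update rules are illustrative simplifications, not the paper's exact stage-based procedure.

    import numpy as np

    class DecentralizedVLearner:
        """One agent: observes the shared state and only its OWN actions."""

        def __init__(self, n_states, n_actions, horizon, lr=0.1, c_bonus=1.0):
            self.H = horizon
            self.V = np.zeros((horizon + 1, n_states))      # V[H] stays 0
            self.logits = np.zeros((horizon, n_states, n_actions))
            self.counts = np.zeros((horizon, n_states), dtype=int)
            self.lr, self.c_bonus = lr, c_bonus

        def policy(self, h, s):
            z = np.exp(self.logits[h, s] - self.logits[h, s].max())
            return z / z.sum()

        def act(self, h, s, rng):
            return int(rng.choice(len(self.logits[h, s]), p=self.policy(h, s)))

        def update(self, h, s, a, team_reward, s_next):
            # V-learning style value update with a decaying optimism bonus
            # (an illustrative stand-in for the paper's stage-based scheme).
            self.counts[h, s] += 1
            t = self.counts[h, s]
            alpha = (self.H + 1) / (self.H + t)
            target = team_reward + self.V[h + 1, s_next] + self.c_bonus / np.sqrt(t)
            self.V[h, s] = min((1 - alpha) * self.V[h, s] + alpha * target, self.H)
            # SGD-style policy step: nudge the softmax toward actions that
            # outperformed the current value estimate.
            adv = target - self.V[h, s]
            grad = -self.policy(h, s)
            grad[a] += 1.0
            self.logits[h, s] += self.lr * adv * grad

Each agent runs its own independent copy of such a learner, observing only the shared state, its own action, and the team reward; no communication or central coordination is needed, which is the decentralization property the abstract emphasizes.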
Related papers
- N-Agent Ad Hoc Teamwork [36.10108537776956]
Current approaches to learning cooperative multi-agent behaviors assume relatively restrictive settings.
This paper formalizes the problem, and proposes the Policy Optimization with Agent Modelling (POAM) algorithm.
POAM is a policy gradient, multi-agent reinforcement learning approach to the NAHT (N-agent ad hoc teamwork) problem that enables adaptation to diverse teammate behaviors.
arXiv Detail & Related papers (2024-04-16T17:13:08Z)
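
As a rough illustration of the agent-modelling component named above, the following sketch conditions a policy on a learned embedding of teammate behavior inferred from the controlled agent's own observation history. The module names and architecture are assumptions made for illustration, not POAM's actual design.

    import torch
    import torch.nn as nn

    class TeammateEncoder(nn.Module):
        """Summarizes an observation history into a teammate embedding."""

        def __init__(self, obs_dim, embed_dim=16):
            super().__init__()
            self.rnn = nn.GRU(obs_dim, embed_dim, batch_first=True)

        def forward(self, obs_history):          # (batch, time, obs_dim)
            _, h = self.rnn(obs_history)
            return h.squeeze(0)                  # (batch, embed_dim)

    class AdaptivePolicy(nn.Module):
        """Policy conditioned on the inferred teammate embedding."""

        def __init__(self, obs_dim, n_actions, embed_dim=16):
            super().__init__()
            self.encoder = TeammateEncoder(obs_dim, embed_dim)
            self.head = nn.Sequential(
                nn.Linear(obs_dim + embed_dim, 64), nn.Tanh(),
                nn.Linear(64, n_actions),
            )

        def forward(self, obs, obs_history):
            z = self.encoder(obs_history)        # who am I playing with?
            logits = self.head(torch.cat([obs, z], dim=-1))
            return torch.distributions.Categorical(logits=logits)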
- Decentralized and Lifelong-Adaptive Multi-Agent Collaborative Learning [57.652899266553035]
Decentralized and lifelong-adaptive multi-agent collaborative learning aims to enhance collaboration among multiple agents without a central server.
We propose DeLAMA, a decentralized multi-agent lifelong collaborative learning algorithm with dynamic collaboration graphs.
arXiv Detail & Related papers (2024-03-11T09:21:11Z)
- MADiff: Offline Multi-agent Learning with Diffusion Models [79.18130544233794]
Diffusion models (DMs) have recently achieved huge success in various scenarios, including offline reinforcement learning.
We propose MADiff, a novel generative multi-agent learning framework based on diffusion models.
Our experiments show the superior performance of MADiff compared to baseline algorithms in a wide range of multi-agent learning tasks.
arXiv Detail & Related papers (2023-05-27T02:14:09Z)
- ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward [29.737986509769808]
We propose a self-supervised intrinsic reward, ELIGN (expectation alignment).
Similar to how animals collaborate in a decentralized manner with those in their vicinity, agents trained with expectation alignment learn behaviors that match their neighbors' expectations.
We show that agent coordination improves through expectation alignment because agents learn to divide tasks amongst themselves, break coordination symmetries, and confuse adversaries.
arXiv Detail & Related papers (2022-10-09T22:24:44Z)
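
The expectation-alignment idea above can be made concrete with a small sketch: neighbors predict an agent's next observation, and the agent earns an intrinsic bonus for behaving as its neighbors anticipated. The function name, distance measure, and aggregation are hypothetical simplifications of the ELIGN reward.

    import numpy as np

    def alignment_bonus(neighbor_predictions, actual_next_obs, scale=1.0):
        """Intrinsic reward: negative expectation violation, averaged over
        the neighbors that made a prediction about this agent."""
        errors = [np.linalg.norm(p - actual_next_obs) for p in neighbor_predictions]
        return -scale * float(np.mean(errors)) if errors else 0.0

    # Usage: each neighbor predicts agent i's next observation from what it
    # can see; agent i's training reward becomes r_env + alignment_bonus(...).
    preds = [np.array([0.9, 1.1]), np.array([1.0, 1.0])]
    obs_next = np.array([1.0, 1.0])
    r_intrinsic = alignment_bonus(preds, obs_next)   # near 0 when aligned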
- Provably Efficient Reinforcement Learning in Decentralized General-Sum Markov Games [5.205867750232226]
This paper addresses the problem of learning an equilibrium efficiently in general-sum Markov games.
We propose an algorithm in which each agent independently runs optimistic V-learning to efficiently explore the unknown environment.
We show that the agents can find an $\epsilon$-approximate CCE in at most $\widetilde{O}(H^6 S A/\epsilon^2)$ episodes.
arXiv Detail & Related papers (2021-10-12T02:01:22Z)
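
A minimal sketch of the optimistic V-learning update referenced above, assuming a tabular episodic setting; the step size follows the usual (H+1)/(H+t) schedule, while the bonus constant and clipping are illustrative choices meant only to show how optimism drives exploration without any coordination between agents.

    import numpy as np

    def optimistic_v_update(V, counts, h, s, reward, s_next, H, c=1.0):
        """One optimistic V-learning step at stage h; V has shape (H+1, S)."""
        counts[h, s] += 1
        t = counts[h, s]
        alpha = (H + 1) / (H + t)            # standard V-learning step size
        bonus = c * np.sqrt(H ** 3 / t)      # under-visited states look good
        target = reward + V[h + 1, s_next] + bonus
        V[h, s] = min((1 - alpha) * V[h, s] + alpha * target, H - h)
        return V[h, s]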
- Learning to Coordinate in Multi-Agent Systems: A Coordinated Actor-Critic Algorithm and Finite-Time Guarantees [43.10380224532313]
We study the emergence of coordinated behavior by autonomous agents using an actor-critic (AC) algorithm.
We propose and analyze a class of coordinated actor-critic algorithms (CAC) in which individually parametrized policies have a shared part and a personalized part.
This work provides the first finite-sample guarantee for decentralized AC algorithms with partially personalized policies.
arXiv Detail & Related papers (2021-10-11T20:26:16Z)
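
The shared-plus-personalized policy split described above can be sketched as one trunk network reused by every agent plus a small per-agent head, so coordination can emerge through the shared parameters without forcing the agents to be identical. Module names are hypothetical, and the architecture is only indicative of the parameter-sharing pattern.

    import torch
    import torch.nn as nn

    class CoordinatedPolicies(nn.Module):
        def __init__(self, n_agents, obs_dim, n_actions, hidden=64):
            super().__init__()
            # Shared part: one trunk whose parameters all agents update.
            self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
            # Personalized part: a private output head per agent.
            self.heads = nn.ModuleList(
                [nn.Linear(hidden, n_actions) for _ in range(n_agents)]
            )

        def forward(self, agent_id, obs):
            logits = self.heads[agent_id](self.trunk(obs))
            return torch.distributions.Categorical(logits=logits)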
- Distributed Heuristic Multi-Agent Path Finding with Communication [7.854890646114447]
Multi-Agent Path Finding (MAPF) is essential to large-scale robotic systems.
Recent methods have applied reinforcement learning (RL) to learn decentralized policies in partially observable environments.
This paper combines communication with deep Q-learning to provide a novel learning based method for MAPF.
arXiv Detail & Related papers (2021-06-21T18:50:58Z)
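
A hedged sketch of how communication can be folded into deep Q-learning for MAPF-style tasks: each agent encodes an outgoing message from its observation, mean-pools the messages it receives from nearby agents, and feeds the pooled vector into its Q-network. This is an illustrative pattern, not the paper's architecture.

    import torch
    import torch.nn as nn

    class CommQNet(nn.Module):
        def __init__(self, obs_dim, n_actions, msg_dim=16, hidden=64):
            super().__init__()
            self.msg = nn.Linear(obs_dim, msg_dim)     # outgoing message
            self.q = nn.Sequential(
                nn.Linear(obs_dim + msg_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, n_actions),
            )

        def forward(self, obs, neighbor_obs):
            # Mean-pool neighbor messages (empty neighborhood -> zeros).
            if neighbor_obs.numel() > 0:
                inbox = self.msg(neighbor_obs).mean(dim=0, keepdim=True)
            else:
                inbox = torch.zeros(1, self.msg.out_features)
            inbox = inbox.expand(obs.size(0), -1)
            return self.q(torch.cat([obs, inbox], dim=-1))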
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
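
The successor-feature machinery that $\Psi\Phi$-learning builds on can be sketched in a few lines, assuming a tabular setting where the reward is linear in known features, $r = \phi(s,a)^\top w$, so that $Q(s,a) = \psi(s,a)^\top w$. Variable names are illustrative, and ITD's inference of $w$ from demonstrations is omitted here.

    import numpy as np

    def psi_td_update(psi, phi, s, a, s_next, a_next, gamma=0.99, lr=0.1):
        """One TD step for successor features: psi ~ phi + gamma * psi'.
        Shapes: psi, phi are (S, A, d); indexing selects (d,) vectors."""
        target = phi[s, a] + gamma * psi[s_next, a_next]
        psi[s, a] += lr * (target - psi[s, a])

    def q_values(psi, w, s):
        """Q(s, .) = psi(s, .) @ w. Swapping in a new w retargets the same
        psi to a new task, which is what makes the decomposition useful."""
        return psi[s] @ w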
- Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge? [100.48692829396778]
Independent PPO (IPPO) is a form of independent learning in which each agent simply estimates its local value function.
IPPO's strong performance may be due to its robustness to some forms of environment non-stationarity.
arXiv Detail & Related papers (2020-11-18T20:29:59Z)
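
Since IPPO is essentially PPO run independently per agent with a local value function, the core of an implementation is the standard clipped surrogate applied to each agent's own trajectories; the sketch below shows that loss (illustrative, not the paper's code), with advantages computed from the agent's local critic.

    import torch

    def ppo_clip_loss(logp_new, logp_old, advantage, eps=0.2):
        """Standard PPO clipped surrogate, applied per agent independently;
        no centralized critic is involved anywhere."""
        ratio = torch.exp(logp_new - logp_old)
        unclipped = ratio * advantage
        clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantage
        return -torch.min(unclipped, clipped).mean()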
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible, fully decentralized actor-critic MARL framework that can handle large-scale general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
- Distributed Reinforcement Learning for Cooperative Multi-Robot Object Manipulation [53.262360083572005]
We consider solving a cooperative multi-robot object manipulation task using reinforcement learning (RL).
We propose two distributed multi-agent RL approaches: distributed approximate RL (DA-RL) and game-theoretic RL (GT-RL).
Although we focus on a small system of two agents in this paper, both DA-RL and GT-RL apply to general multi-agent systems, and are expected to scale well to large systems.
arXiv Detail & Related papers (2020-03-21T00:43:54Z)