Learning to Coordinate in Multi-Agent Systems: A Coordinated
Actor-Critic Algorithm and Finite-Time Guarantees
- URL: http://arxiv.org/abs/2110.05597v1
- Date: Mon, 11 Oct 2021 20:26:16 GMT
- Title: Learning to Coordinate in Multi-Agent Systems: A Coordinated
Actor-Critic Algorithm and Finite-Time Guarantees
- Authors: Siliang Zeng, Tianyi Chen, Alfredo Garcia, Mingyi Hong
- Abstract summary: We study the emergence of coordinated behavior by autonomous agents using an actor-critic (AC) algorithm.
We propose and analyze a class of coordinated actor-critic algorithms (CAC) in which individually parametrized policies have a {\it shared} part and a {\it personalized} part.
This work provides the first finite-sample guarantee for a decentralized AC algorithm with partially personalized policies.
- Score: 43.10380224532313
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-agent reinforcement learning (MARL) has attracted much research
attention recently. However, unlike in the single-agent setting, many
theoretical and algorithmic aspects of MARL are not yet well understood. In
this paper, we study the emergence of coordinated behavior by autonomous agents
using an actor-critic (AC) algorithm. Specifically, we propose and analyze a
class of coordinated actor-critic algorithms (CAC) in which individually
parametrized policies have a {\it shared} part (which is jointly optimized
among all agents) and a {\it personalized} part (which is only locally
optimized). Such a {\it partially personalized} policy allows agents to learn
to coordinate by leveraging peers' past experience while adapting to their
individual tasks. The flexibility of our design allows the proposed MARL-CAC
algorithm to be used in a {\it fully decentralized} setting, where the agents
can only communicate with their neighbors, as well as a {\it federated}
setting, where the agents occasionally communicate with a server while
optimizing their (partially personalized) local models. Theoretically, we show
that under some standard regularity assumptions, the proposed MARL-CAC
algorithm requires $\mathcal{O}(\epsilon^{-\frac{5}{2}})$ samples to achieve an
$\epsilon$-stationary solution (defined as a solution at which the squared
norm of the gradient of the objective function is less than $\epsilon$). To
the best of our knowledge, this work provides the first finite-sample
guarantee for a decentralized AC algorithm with partially personalized
policies.
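To make the shared/personalized split concrete, here is a minimal sketch with a linear-Gaussian policy. The class and function names, the single-sample update, and the plain averaging of the shared block are illustrative simplifications (the federated variant; a decentralized variant would average over graph neighbors), not the paper's exact CAC updates; in particular, the advantage estimates would come from each agent's critic.

```python
import numpy as np

class PartiallyPersonalizedPolicy:
    """Gaussian policy whose mean combines a shared weight vector
    (jointly optimized by all agents) with a personalized one
    (optimized locally only). Structure and names are illustrative."""

    def __init__(self, state_dim, sigma=0.5, seed=0):
        self.rng = np.random.default_rng(seed)
        self.shared_w = np.zeros(state_dim)    # jointly optimized part
        self.personal_w = np.zeros(state_dim)  # locally optimized part
        self.sigma = sigma

    def act(self, s):
        mean = s @ (self.shared_w + self.personal_w)
        return mean + self.sigma * self.rng.normal()

    def grad_log_prob(self, s, a):
        # For a ~ N(s^T w, sigma^2): d/dw log pi = (a - s^T w) / sigma^2 * s
        mean = s @ (self.shared_w + self.personal_w)
        return (a - mean) / self.sigma ** 2 * s


def cac_actor_step(agents, batches, lr=1e-2):
    """One illustrative actor update: every agent takes a local
    policy-gradient step on both parameter blocks, then the shared blocks
    are averaged at the server, as in the federated setting."""
    for agent, (s, a, advantage) in zip(agents, batches):
        g = advantage * agent.grad_log_prob(s, a)
        agent.shared_w += lr * g      # shared part: local step ...
        agent.personal_w += lr * g    # personalized part: stays local
    shared_avg = np.mean([ag.shared_w for ag in agents], axis=0)
    for ag in agents:                 # ... then consensus on shared part
        ag.shared_w = shared_avg
```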
Related papers
- Multi-agent Deep Covering Skill Discovery [50.812414209206054]
We propose Multi-agent Deep Covering Option Discovery, which constructs multi-agent options by minimizing the expected cover time of the agents' joint state space.
We also propose a novel framework for adopting these multi-agent options in the MARL process.
We show that the proposed algorithm can effectively capture agent interactions with its attention mechanism, successfully identify multi-agent options, and significantly outperform prior works that use single-agent options or no options.
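The blurb above is compact, so as a reference point here is a minimal single-agent sketch of how covering options are typically anchored: at the extremes of the graph Laplacian's Fiedler vector, the pair of states whose connection most reduces expected cover time. The paper's multi-agent version operates on the joint state space; the function name and toy graph below are illustrative.

```python
import numpy as np

def covering_option_endpoints(adjacency):
    """Return the two states at the extremes of the Fiedler vector (the
    eigenvector for the second-smallest Laplacian eigenvalue). A covering
    option connecting these two poorly-connected regions is what most
    reduces the graph's expected cover time."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues ascending
    fiedler = eigvecs[:, 1]
    return int(np.argmin(fiedler)), int(np.argmax(fiedler))

# Toy 4-state chain: the option endpoints are the two ends of the chain
# (returned in either order, since the eigenvector's sign is arbitrary).
chain = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
print(covering_option_endpoints(chain))
```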
arXiv Detail & Related papers (2022-10-07T00:40:59Z) - Actor-Critic based Improper Reinforcement Learning [61.430513757337486]
We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process.
We propose two algorithms: (1) a Policy Gradient-based approach; and (2) an algorithm that can switch between a simple Actor-Critic scheme and a Natural Actor-Critic scheme.
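A hedged sketch of what such an "improper" learner can look like, assuming the policy-gradient variant: the base controllers stay fixed and only a softmax mixture over them is trained. The helper `rollout_return` and all names are hypothetical, and the paper's actual algorithms (including the AC/NAC switching scheme) are more elaborate.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def learn_controller_mixture(base_controllers, rollout_return,
                             iters=500, lr=0.1, seed=0):
    """Illustrative 'improper' learner: the M base controllers are fixed
    black boxes; only the softmax mixing weights theta are learned, via a
    REINFORCE-style score-function gradient. rollout_return(controller)
    is a hypothetical helper that runs one episode and returns its return."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(base_controllers))
    for _ in range(iters):
        p = softmax(theta)
        k = rng.choice(len(base_controllers), p=p)   # sample a controller
        G = rollout_return(base_controllers[k])      # episode return
        grad_log_p = -p
        grad_log_p[k] += 1.0                         # gradient of log p_k
        theta += lr * G * grad_log_p                 # REINFORCE step
    return softmax(theta)
```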
arXiv Detail & Related papers (2022-07-19T05:55:02Z) - RACA: Relation-Aware Credit Assignment for Ad-Hoc Cooperation in
Multi-Agent Deep Reinforcement Learning [55.55009081609396]
We propose a novel method, called Relation-Aware Credit Assignment (RACA), which achieves zero-shot generalization in ad-hoc cooperation scenarios.
RACA takes advantage of a graph-based relation encoder to encode the topological structure between agents.
Our method outperforms baseline methods on the StarCraft II micromanagement benchmark and in ad-hoc cooperation scenarios.
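As a sketch of the relation-encoder idea, the snippet below implements plain masked attention over an agent graph, which is one standard way to realize it. It is not RACA's exact architecture; the function name and the omission of learned projections are simplifications.

```python
import numpy as np

def relation_aware_embedding(features, adjacency):
    """Single-head attention over graph neighbors: each agent aggregates
    its neighbors' features, so adding or removing teammates (the ad-hoc
    setting) changes only the attention support, never the parameter
    shapes. adjacency must include self-loops. Purely illustrative."""
    n, d = features.shape
    scores = features @ features.T / np.sqrt(d)        # pairwise relevance
    scores = np.where(adjacency > 0, scores, -np.inf)  # mask non-neighbors
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ features  # row i: neighbor-aggregated embedding of agent i
```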
arXiv Detail & Related papers (2022-06-02T03:39:27Z) - Decentralized Multi-Agent Reinforcement Learning: An Off-Policy Method [6.261762915564555]
We discuss the problem of decentralized multi-agent reinforcement learning (MARL) in this work.
In our setting, the global state, action, and reward are assumed to be fully observable, while each agent keeps its local policy private, so it cannot be shared with others.
Policy evaluation and policy improvement algorithms are designed for discrete and continuous state-action-space Markov Decision Processes (MDPs), respectively.
arXiv Detail & Related papers (2021-10-31T09:08:46Z) - Decentralized Cooperative Multi-Agent Reinforcement Learning with
Exploration [35.75029940279768]
We study multi-agent reinforcement learning in the most basic cooperative setting -- Markov teams.
We propose an algorithm in which each agent independently runs a stage-based V-learning style algorithm.
We show that the agents can learn an $\epsilon$-approximate Nash equilibrium policy in a number of episodes at most proportional to $\widetilde{\mathcal{O}}(1/\epsilon^{4})$.
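For intuition, here is one optimistic V-learning update of the textbook kind that each agent could run independently; the paper's stage-based scheduling and certified output policy are omitted, and the step-size and bonus choices below are illustrative.

```python
import numpy as np

def v_learning_step(V, counts, s, r, s_next, bonus_scale=1.0):
    """One optimistic V-learning update, run by each agent independently
    on its own trajectory (no joint-action table, which is what keeps the
    method decentralized). Step size and bonus are standard choices, not
    the paper's exact constants."""
    counts[s] += 1
    t = counts[s]
    alpha = 2.0 / (t + 1.0)             # decaying learning rate
    bonus = bonus_scale / np.sqrt(t)    # optimism drives exploration
    V[s] = (1.0 - alpha) * V[s] + alpha * (r + V[s_next] + bonus)
    return V[s]
```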
arXiv Detail & Related papers (2021-10-12T02:45:12Z) - Sample and Communication-Efficient Decentralized Actor-Critic Algorithms
with Finite-Time Analysis [27.21581944906418]
Actor-critic (AC) algorithms have been widely adopted in decentralized multi-agent systems.
We develop two decentralized algorithms, an AC and a natural AC (NAC) variant, that are private as well as sample- and communication-efficient.
arXiv Detail & Related papers (2021-09-08T15:02:21Z) - Multi-Agent Trust Region Policy Optimization [34.91180300856614]
We show that the policy update of TRPO can be transformed into a distributed consensus optimization problem for multi-agent cases.
We propose a decentralized MARL algorithm, which we call multi-agent TRPO (MATRPO)
arXiv Detail & Related papers (2020-10-15T17:49:47Z) - FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity
to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data residing at different devices without moving it to the cloud, algorithms such as Federated Averaging (FedAvg) adopt a "computation then aggregation" (CTA) model.
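A minimal sketch of one CTA round in the FedAvg style, assuming a least-squares loss as a stand-in objective; names and hyperparameters are illustrative (FedPD itself replaces this plain averaging with a primal-dual scheme).

```python
import numpy as np

def fedavg_round(global_w, device_data, local_steps=5, lr=0.1):
    """One 'computation then aggregation' (CTA) round in the FedAvg style:
    every device starts from the current global model, runs several local
    SGD steps on its own data, and the server then averages the resulting
    models. The least-squares gradient is an illustrative stand-in."""
    local_models = []
    for X, y in device_data:                 # each device's private data
        w = global_w.copy()
        for _ in range(local_steps):         # computation: local SGD
            grad = X.T @ (X @ w - y) / len(y)
            w -= lr * grad
        local_models.append(w)
    return np.mean(local_models, axis=0)     # aggregation: server average
```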
arXiv Detail & Related papers (2020-05-22T23:07:42Z) - F2A2: Flexible Fully-decentralized Approximate Actor-critic for
Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible, fully decentralized actor-critic MARL framework that can handle large-scale, general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments while reducing information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)