Asynchronous, Option-Based Multi-Agent Policy Gradient: A Conditional
Reasoning Approach
- URL: http://arxiv.org/abs/2203.15925v3
- Date: Wed, 2 Aug 2023 05:57:05 GMT
- Title: Asynchronous, Option-Based Multi-Agent Policy Gradient: A Conditional
Reasoning Approach
- Authors: Xubo Lyu, Amin Banitalebi-Dehkordi, Mo Chen, Yong Zhang
- Abstract summary: Multi-agent policy gradient (MAPG) methods are commonly used to learn such policies.
In complex problems with large state and action spaces, it is advantageous to extend MAPG methods to use higher-level actions.
We propose a novel, conditional reasoning approach to address this problem.
- Score: 10.904610735933145
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cooperative multi-agent problems often require coordination between agents,
which can be achieved through a centralized policy that considers the global
state. Multi-agent policy gradient (MAPG) methods are commonly used to learn
such policies, but they are often limited to problems with low-level action
spaces. In complex problems with large state and action spaces, it is
advantageous to extend MAPG methods to use higher-level actions, also known as
options, to improve the policy search efficiency. However, multi-robot option
executions are often asynchronous, that is, agents may select and complete
their options at different time steps. This makes it difficult for MAPG methods
to derive a centralized policy and evaluate its gradient, as a centralized policy
always selects new options at the same time. In this work, we propose a novel,
conditional reasoning approach to address this problem and demonstrate its
effectiveness on representative option-based multi-agent cooperative tasks
through empirical validation. Find code and videos at:
https://sites.google.com/view/mahrlsupp/
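To make the asynchrony issue above concrete, here is a minimal, hypothetical sketch (not the authors' implementation; names such as centralized_policy and option_duration are illustrative placeholders) of an option-based rollout in which agents' options terminate at different time steps, so new options can only be chosen for the subset of agents whose options just ended, conditioned on the options the other agents are still executing:

```python
import random

NUM_AGENTS = 3
OPTIONS = ["goto_A", "goto_B", "wait"]

def centralized_policy(state, terminated, ongoing_options):
    """Hypothetical centralized high-level policy: picks new options only for
    the agents in `terminated`, conditioned on the global state and on the
    options the other agents are still executing (the conditional part)."""
    return {i: random.choice(OPTIONS) for i in terminated}

def option_duration(option):
    # Placeholder: real durations depend on the environment dynamics.
    return random.randint(1, 4)

state = {"t": 0}
current = {i: random.choice(OPTIONS) for i in range(NUM_AGENTS)}
remaining = {i: option_duration(current[i]) for i in range(NUM_AGENTS)}

for t in range(10):
    # Advance every agent's current option by one low-level step.
    remaining = {i: r - 1 for i, r in remaining.items()}
    terminated = [i for i, r in remaining.items() if r <= 0]

    if terminated:  # asynchronous: usually a strict subset of the agents
        ongoing = {i: o for i, o in current.items() if i not in terminated}
        for i, o in centralized_policy(state, terminated, ongoing).items():
            current[i] = o
            remaining[i] = option_duration(o)

    state["t"] = t + 1
```

A fully synchronous MAPG formulation would instead assume that `terminated` contains every agent at every high-level decision point, which is precisely the assumption the abstract notes does not hold for multi-robot option execution.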
Related papers
- TAPE: Leveraging Agent Topology for Cooperative Multi-Agent Policy
Gradient [36.83464785085713]
We propose an agent topology framework, which decides whether other agents should be considered in the policy.
Agents can use coalition utility as the learning objective, instead of the global utility given by centralized critics or the local utility given by individual critics.
We prove the policy improvement theorem for TAPE and give a theoretical explanation for the improved cooperation among agents.
arXiv Detail & Related papers (2023-12-25T09:24:33Z)
- Optimistic Multi-Agent Policy Gradient [23.781837938235036]
Relative overgeneralization (RO) occurs when agents converge towards a suboptimal joint policy.
No methods have been proposed for addressing RO in multi-agent policy gradient (MAPG) methods.
We propose a general yet simple framework to enable optimistic updates in MAPG methods, which alleviates the RO problem.
arXiv Detail & Related papers (2023-11-03T14:47:54Z)
- Policy Diversity for Cooperative Agents [8.689289576285095]
Multi-agent reinforcement learning aims to find the optimal team cooperative policy to complete a task.
There may exist multiple different ways of cooperating, which domain experts often need.
Unfortunately, there is a general lack of effective policy diversity approaches specifically designed for the multi-agent domain.
arXiv Detail & Related papers (2023-08-28T05:23:16Z)
- Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning [139.53668999720605]
We present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.
We prove that with standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2023-05-08T16:20:03Z)
- Distributed-Training-and-Execution Multi-Agent Reinforcement Learning for Power Control in HetNet [48.96004919910818]
We propose a multi-agent deep reinforcement learning (MADRL) based power control scheme for the HetNet.
To promote cooperation among agents, we develop a penalty-based Q learning (PQL) algorithm for MADRL systems.
In this way, an agent's policy can be learned by other agents more easily, resulting in a more efficient collaboration process.
arXiv Detail & Related papers (2022-12-15T17:01:56Z)
- Multi-agent Deep Covering Skill Discovery [50.812414209206054]
We propose Multi-agent Deep Covering Option Discovery, which constructs the multi-agent options through minimizing the expected cover time of the multiple agents' joint state space.
Also, we propose a novel framework to adopt the multi-agent options in the MARL process.
We show that the proposed algorithm can effectively capture the agent interactions with the attention mechanism, successfully identify multi-agent options, and significantly outperform prior works that use single-agent options or no options.
arXiv Detail & Related papers (2022-10-07T00:40:59Z)
- Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification [74.10976684469435]
Offline reinforcement learning (RL) algorithms can be transferred to multi-agent settings directly.
We propose a simple yet effective method, Offline Multi-Agent RL with Actor Rectification (OMAR), to tackle this critical challenge.
OMAR significantly outperforms strong baselines with state-of-the-art performance in multi-agent continuous control benchmarks.
arXiv Detail & Related papers (2021-11-22T13:27:42Z)
- MACRPO: Multi-Agent Cooperative Recurrent Policy Optimization [17.825845543579195]
We propose a new multi-agent actor-critic method called Multi-Agent Cooperative Recurrent Proximal Policy Optimization (MACRPO).
We use a recurrent layer in the critic's network architecture and propose a new framework that uses a meta-trajectory to train this recurrent layer.
We evaluate our algorithm on three challenging multi-agent environments with continuous and discrete action spaces.
arXiv Detail & Related papers (2021-09-02T12:43:35Z)
- Multi-agent Policy Optimization with Approximatively Synchronous Advantage Estimation [55.96893934962757]
In a multi-agent system, the policies of different agents need to be evaluated jointly.
In current methods, value functions or advantage functions use counter-factual joint actions, which are evaluated asynchronously.
In this work, we propose the approximatively synchronous advantage estimation.
arXiv Detail & Related papers (2020-12-07T07:29:19Z)
- Multiagent Value Iteration Algorithms in Dynamic Programming and Reinforcement Learning [0.0]
We consider infinite horizon dynamic programming problems, where the control at each stage consists of several distinct decisions.
In an earlier work we introduced a policy iteration algorithm, where the policy improvement is done one-agent-at-a-time in a given order.
arXiv Detail & Related papers (2020-05-04T16:34:24Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
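For intuition about the factored centralised critic mentioned in the FACMAC entry above, the following is a rough, hypothetical sketch (not the FACMAC reference implementation): per-agent utilities are combined by a state-conditioned mixing function into a single joint action-value Q_tot; the random linear maps and the monotonic mixing here are simplifications chosen only to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, obs_dim, act_dim, state_dim = 3, 4, 2, 6

# Hypothetical per-agent utility functions (random linear maps stand in for
# the per-agent critic networks).
W_q = [rng.normal(size=obs_dim + act_dim) for _ in range(n_agents)]

def agent_utility(i, obs, act):
    return float(W_q[i] @ np.concatenate([obs, act]))

# Hypothetical state-conditioned mixing function producing non-negative
# weights (a monotonic mix, used here only to keep the sketch short).
W_mix = rng.normal(size=(state_dim, n_agents))

def q_tot(state, per_agent_q):
    weights = np.abs(state @ W_mix)
    return float(weights @ np.asarray(per_agent_q))

state = rng.normal(size=state_dim)
obs = [rng.normal(size=obs_dim) for _ in range(n_agents)]
acts = [rng.normal(size=act_dim) for _ in range(n_agents)]

per_agent = [agent_utility(i, obs[i], acts[i]) for i in range(n_agents)]
print("Q_tot for this joint action:", q_tot(state, per_agent))
```

In FACMAC itself, the factored critic supports a centralised policy-gradient update over the entire joint action space and is not restricted to monotonic mixing.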
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.