Multi-agent Policy Optimization with Approximatively Synchronous
Advantage Estimation
- URL: http://arxiv.org/abs/2012.03488v3
- Date: Sat, 8 May 2021 07:13:37 GMT
- Title: Multi-agent Policy Optimization with Approximatively Synchronous
Advantage Estimation
- Authors: Lipeng Wan, Xuwei Song, Xuguang Lan, Nanning Zheng
- Abstract summary: In multi-agent systems, the policies of different agents need to be evaluated jointly.
In current methods, value functions or advantage functions use counterfactual joint actions that are evaluated asynchronously.
In this work, we propose the approximatively synchronous advantage estimation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cooperative multi-agent tasks require agents to deduce their own
contributions from shared global rewards, known as the challenge of credit
assignment. General methods for policy-based multi-agent reinforcement learning
address this challenge by introducing differentiated value functions or
advantage functions for individual agents. In a multi-agent system, the
policies of different agents need to be evaluated jointly. In order to update
policies synchronously, such value functions or advantage functions also
require synchronous evaluation. However, in current methods, value functions or
advantage functions use counterfactual joint actions that are evaluated
asynchronously and thus suffer from an inherent estimation bias. In this work,
we propose approximatively synchronous advantage estimation. We first derive
the marginal advantage function, an extension of the single-agent advantage
function to multi-agent systems. Furthermore, we introduce a policy
approximation for synchronous advantage estimation and break the multi-agent
policy optimization problem down into multiple sub-problems of single-agent
policy optimization. Our method is compared with baseline algorithms on
StarCraft multi-agent challenges and shows the best performance on most of the
tasks.
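As a rough illustration of the marginal-advantage idea from the abstract, here is a minimal Monte-Carlo sketch in Python: agent i's action is evaluated with the other agents' actions marginalised out under their current policies. The callables `q`, `v`, and `other_policies` are hypothetical stand-ins, not the paper's actual interfaces.
```python
import numpy as np

def marginal_advantage(q, v, state, action_i, other_policies,
                       n_samples=64, rng=None):
    """Monte-Carlo estimate of agent i's marginal advantage: the joint
    action value with the other agents' actions marginalised out under
    their current policies, minus the state value."""
    rng = rng if rng is not None else np.random.default_rng()
    total = 0.0
    for _ in range(n_samples):
        # Sample the other agents' actions from their current policies.
        a_others = [pi(state, rng) for pi in other_policies]
        total += q(state, action_i, a_others)
    return total / n_samples - v(state)
```
Averaging over sampled joint actions, rather than fixing a single counterfactual joint action, is what makes the estimate (approximately) synchronous across agents.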
Related papers
- On the Hardness of Decentralized Multi-Agent Policy Evaluation under Byzantine Attacks
We study a fully-decentralized multi-agent policy evaluation problem in the presence of up to $f$ faulty agents.
In particular, we focus on the so-called Byzantine fault model in a model-poisoning setting.
arXiv Detail & Related papers (2024-09-19T16:27:08Z)
- Distributed Optimization via Kernelized Multi-armed Bandits
We model a distributed optimization problem as a multi-agent kernelized multi-armed bandit problem with a heterogeneous reward setting.
We present a fully decentralized algorithm, Multi-agent IGP-UCB (MA-IGP-UCB), which achieves a sub-linear regret bound for popular classes of kernels.
We also propose an extension, Multi-agent Delayed IGP-UCB (MAD-IGP-UCB) algorithm, which reduces the dependence of the regret bound on the number of agents in the network.
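For context, the building block behind IGP-UCB-style methods is an upper-confidence acquisition over a Gaussian-process posterior; a generic sketch follows (the `beta` scaling is an illustrative constant, not the paper's schedule, and this is not MA-IGP-UCB itself).
```python
import numpy as np

def ucb_select(mu, sigma, beta=2.0):
    """Generic GP-UCB rule: choose the arm maximising posterior mean plus
    a scaled posterior standard deviation (exploration bonus)."""
    return int(np.argmax(mu + beta * sigma))
```
In the decentralized variants, each agent would apply such a rule to its own posterior while exchanging information with its neighbours.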
arXiv Detail & Related papers (2023-12-07T21:57:48Z)
- Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning
We present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.
We prove that with standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate.
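The per-agent local update that the summary refers to is the standard PPO clipped surrogate; a minimal PyTorch sketch (tensor names and shapes are illustrative):
```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate loss for one agent's local update."""
    ratio = torch.exp(logp_new - logp_old)          # importance ratio
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Pessimistic (clipped) surrogate, negated for gradient descent.
    return -torch.min(ratio * advantage, clipped * advantage).mean()
```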
arXiv Detail & Related papers (2023-05-08T16:20:03Z)
- Multi-agent Deep Covering Skill Discovery
We propose Multi-agent Deep Covering Option Discovery, which constructs the multi-agent options through minimizing the expected cover time of the multiple agents' joint state space.
Also, we propose a novel framework to adopt the multi-agent options in the MARL process.
We show that the proposed algorithm can effectively capture the agent interactions with the attention mechanism, successfully identify multi-agent options, and significantly outperform prior works using single-agent options or no options.
arXiv Detail & Related papers (2022-10-07T00:40:59Z)
- MACRPO: Multi-Agent Cooperative Recurrent Policy Optimization
We propose a new multi-agent actor-critic method called Multi-Agent Cooperative Recurrent Proximal Policy Optimization (MACRPO).
We use a recurrent layer in the critic's network architecture and propose a new framework that uses a meta-trajectory to train the recurrent layer.
We evaluate our algorithm on three challenging multi-agent environments with continuous and discrete action spaces.
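A minimal sketch of a critic with a recurrent layer, in the spirit of the architecture the summary describes; the use of a GRU over concatenated joint observations and the dimensions are assumptions, not the paper's exact design.
```python
import torch
import torch.nn as nn

class RecurrentCritic(nn.Module):
    """A critic with a recurrent (GRU) layer over joint observations."""
    def __init__(self, obs_dim, n_agents, hidden_dim=64):
        super().__init__()
        self.gru = nn.GRU(obs_dim * n_agents, hidden_dim, batch_first=True)
        self.value_head = nn.Linear(hidden_dim, 1)

    def forward(self, joint_obs_seq, h0=None):
        # joint_obs_seq: (batch, time, obs_dim * n_agents)
        out, hn = self.gru(joint_obs_seq, h0)
        return self.value_head(out), hn  # per-step values, final hidden state
```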
arXiv Detail & Related papers (2021-09-02T12:43:35Z)
- Simple Agent, Complex Environment: Efficient Reinforcement Learning with Agent State
We design a simple reinforcement learning agent that can operate in any environment.
The agent maintains only visitation counts and value estimates for each agent-state-action pair.
There is no further dependence on the number of environment states or mixing times associated with other policies or statistics of history.
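As a concrete picture of how little state such an agent carries, here is a dict-based sketch; the optimism bonus and running-average step size are illustrative choices, not the paper's algorithm.
```python
from collections import defaultdict

class CountValueAgent:
    """Keeps only visitation counts and value estimates per
    (agent_state, action) pair, with a simple optimism bonus."""
    def __init__(self, n_actions, optimism=1.0):
        self.n_actions = n_actions
        self.optimism = optimism
        self.counts = defaultdict(int)    # (agent_state, action) -> visits
        self.values = defaultdict(float)  # (agent_state, action) -> estimate

    def act(self, agent_state):
        # Choose the action with the highest optimistic value estimate.
        def score(a):
            n = self.counts[(agent_state, a)]
            return self.values[(agent_state, a)] + self.optimism / (1 + n)
        return max(range(self.n_actions), key=score)

    def update(self, agent_state, action, target):
        key = (agent_state, action)
        self.counts[key] += 1
        step = 1.0 / self.counts[key]     # running-average step size
        self.values[key] += step * (target - self.values[key])
```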
arXiv Detail & Related papers (2021-02-10T04:53:12Z)
- Modeling the Interaction between Agents in Cooperative Multi-Agent Reinforcement Learning
We propose a novel cooperative MARL algorithm named interactive actor-critic (IAC).
IAC models the interaction of agents from perspectives of policy and value function.
We extend the value decomposition methods to continuous control tasks and evaluate IAC on benchmark tasks including classic control and multi-agent particle environments.
arXiv Detail & Related papers (2021-02-10T01:58:28Z)
- Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning
Multi-agent settings in the real world often involve tasks with varying types and quantities of agents and non-agent entities.
Our method aims to leverage these commonalities by asking the question: "What is the expected utility of each agent when only considering a randomly selected sub-group of its observed entities?"
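One way the randomly selected sub-group could be realised is by masking entities before computing each agent's utility; a hedged sketch (the function name and keep probability are assumptions, not the paper's method).
```python
import numpy as np

def random_entity_subgroup(entities, rng, keep_prob=0.5):
    """Keep a random subset of an agent's observed entities, retaining
    at least one, before computing the agent's utility on the subset."""
    mask = rng.random(len(entities)) < keep_prob
    if not mask.any():
        mask[rng.integers(len(entities))] = True
    return [e for e, m in zip(entities, mask) if m]
```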
arXiv Detail & Related papers (2020-06-07T18:28:41Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
- Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic
Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems.
We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works.
arXiv Detail & Related papers (2020-02-24T20:30:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.