Asynchronous Actor-Critic for Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2209.10113v1
- Date: Tue, 20 Sep 2022 16:36:23 GMT
- Title: Asynchronous Actor-Critic for Multi-Agent Reinforcement Learning
- Authors: Yuchen Xiao, Weihao Tan and Christopher Amato
- Abstract summary: Synchronizing decisions across multiple agents in realistic settings is problematic since it requires agents to wait for other agents to terminate and communicate about termination reliably.
We formulate a set of asynchronous multi-agent actor-critic methods that allow agents to directly optimize asynchronous policies in three standard training paradigms.
- Score: 19.540926205375857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Synchronizing decisions across multiple agents in realistic settings is
problematic since it requires agents to wait for other agents to terminate and
communicate about termination reliably. Ideally, agents should learn and
execute asynchronously instead. Such asynchronous methods also allow temporally
extended actions that can take different amounts of time based on the situation
and action executed. Unfortunately, current policy gradient methods are not
applicable in asynchronous settings, as they assume that agents synchronously
reason about action selection at every time step. To allow asynchronous
learning and decision-making, we formulate a set of asynchronous multi-agent
actor-critic methods that allow agents to directly optimize asynchronous
policies in three standard training paradigms: decentralized learning,
centralized learning, and centralized training for decentralized execution.
Empirical results (in simulation and hardware) in a variety of realistic
domains demonstrate the superiority of our approaches in large multi-agent
problems and validate the effectiveness of our algorithms for learning
high-quality and asynchronous solutions.
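The abstract describes the approach only at a high level; as a rough illustration (a minimal sketch under assumed toy settings, not the authors' algorithm or released code), the snippet below shows what an asynchronous, macro-action-level actor-critic update can look like in the decentralized paradigm: each agent accumulates reward over its own variable-length macro-action and updates its actor and critic only when that macro-action terminates, without waiting for other agents. The class name, toy environment, and hyperparameters are all invented for illustration.

```python
# A minimal, hypothetical sketch of the core idea (not the authors' released
# implementation): each agent accumulates reward over its own variable-length
# macro-action and applies a tabular actor-critic update only when that
# macro-action terminates, so agents never wait for one another.
import numpy as np

GAMMA = 0.95
N_OBS, N_MACRO_ACTIONS = 4, 3            # toy problem sizes (invented)

class AsyncActorCritic:
    """One agent: tabular softmax actor and state-value critic."""

    def __init__(self, lr_pi=0.05, lr_v=0.1, seed=0):
        self.theta = np.zeros((N_OBS, N_MACRO_ACTIONS))  # policy logits
        self.value = np.zeros(N_OBS)                      # critic V(o)
        self.lr_pi, self.lr_v = lr_pi, lr_v
        self.rng = np.random.default_rng(seed)

    def _probs(self, obs):
        logits = self.theta[obs] - self.theta[obs].max()
        p = np.exp(logits)
        return p / p.sum()

    def start_macro_action(self, obs):
        # Sample a new macro-action and reset the running return.
        action = self.rng.choice(N_MACRO_ACTIONS, p=self._probs(obs))
        self.obs0, self.act0, self.ret, self.tau = obs, action, 0.0, 0
        return action

    def step_reward(self, r):
        # Reward collected while the macro-action is still executing.
        self.ret += (GAMMA ** self.tau) * r
        self.tau += 1

    def finish_macro_action(self, next_obs):
        # TD target bootstraps over the variable macro-action duration tau.
        target = self.ret + (GAMMA ** self.tau) * self.value[next_obs]
        td_error = target - self.value[self.obs0]
        self.value[self.obs0] += self.lr_v * td_error
        # Policy-gradient step for the macro-action that was actually chosen.
        grad = -self._probs(self.obs0)
        grad[self.act0] += 1.0
        self.theta[self.obs0] += self.lr_pi * td_error * grad

# Toy rollout: macro-actions last a random 1-3 primitive steps and the shared
# per-step reward is random noise, just to exercise the asynchronous updates.
rng = np.random.default_rng(1)
agents = [AsyncActorCritic(seed=i) for i in range(2)]
obs = [0, 0]
remaining = []
for agent, o in zip(agents, obs):
    agent.start_macro_action(o)
    remaining.append(int(rng.integers(1, 4)))

for t in range(500):
    r = float(rng.normal())               # dummy team reward for this step
    for i, agent in enumerate(agents):
        agent.step_reward(r)
        remaining[i] -= 1
        if remaining[i] == 0:             # only THIS agent's macro-action ended
            obs[i] = int(rng.integers(N_OBS))
            agent.finish_macro_action(obs[i])   # asynchronous update
            agent.start_macro_action(obs[i])
            remaining[i] = int(rng.integers(1, 4))
```

In the centralized-training variants mentioned in the abstract, the same termination-driven updates would presumably be driven by a centralized critic rather than each agent's own value table; the sketch above only covers the fully decentralized case.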
Related papers
- ProAgent: Building Proactive Cooperative Agents with Large Language Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state, and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easily integrated into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z)
- Asynchronous Decentralized Q-Learning: Two Timescale Analysis By Persistence [0.0]
Non-stationarity is a fundamental challenge in multi-agent reinforcement learning (MARL), where agents update their behaviour as they learn.
Many theoretical advances in MARL avoid the challenge of non-stationarity by coordinating the policy updates of agents in various ways.
Synchronization enables analysis of many MARL algorithms via multi-timescale methods, but such synchrony is infeasible in many decentralized applications.
arXiv Detail & Related papers (2023-08-07T01:32:09Z)
- Dealing With Non-stationarity in Decentralized Cooperative Multi-Agent Deep Reinforcement Learning via Multi-Timescale Learning [15.935860288840466]
Decentralized cooperative multi-agent deep reinforcement learning (MARL) can be a versatile learning framework.
One of the critical challenges in decentralized deep MARL is the non-stationarity of the learning environment when multiple agents are learning concurrently.
We propose a decentralized cooperative MARL algorithm based on multi-timescale learning.
arXiv Detail & Related papers (2023-02-06T14:10:53Z)
- Macro-Action-Based Multi-Agent/Robot Deep Reinforcement Learning under Partial Observability [4.111899441919164]
State-of-the-art multi-agent reinforcement learning (MARL) methods have provided promising solutions to a variety of complex problems.
We first propose a group of value-based RL approaches for MacDec-POMDPs.
We formulate a set of macro-action-based policy gradient algorithms under the three training paradigms.
arXiv Detail & Related papers (2022-09-20T21:13:51Z)
- Multi-agent Policy Optimization with Approximatively Synchronous Advantage Estimation [55.96893934962757]
In multi-agent systems, the policies of different agents need to be evaluated jointly.
In current methods, value functions or advantage functions use counter-factual joint actions which are evaluated asynchronously.
In this work, we propose the approximatively synchronous advantage estimation.
arXiv Detail & Related papers (2020-12-07T07:29:19Z)
- A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks [111.34055449929487]
We introduce the novel task FurnMove in which agents work together to move a piece of furniture through a living room to a goal.
Unlike existing tasks, FurnMove requires agents to coordinate at every timestep.
We identify two challenges when training agents to complete FurnMove; among them, existing decentralized action sampling procedures do not permit expressive joint action policies.
Using SYNC-policies and CORDIAL, our agents achieve a 58% completion rate on FurnMove, an impressive absolute gain of 25 percentage points over competitive decentralized baselines.
arXiv Detail & Related papers (2020-07-09T17:59:57Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible fully decentralized actor-critic MARL framework, which can handle large-scale general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
- Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)
- Multi-Agent Interactions Modeling with Correlated Policies [53.38338964628494]
In this paper, we cast the multi-agent interactions modeling problem into a multi-agent imitation learning framework.
We develop a Decentralized Adversarial Imitation Learning algorithm with Correlated policies (CoDAIL).
Various experiments demonstrate that CoDAIL can better regenerate complex interactions close to the demonstrators.
arXiv Detail & Related papers (2020-01-04T17:31:53Z)