Latent Interactive A2C for Improved RL in Open Many-Agent Systems
- URL: http://arxiv.org/abs/2305.05159v1
- Date: Tue, 9 May 2023 04:03:40 GMT
- Title: Latent Interactive A2C for Improved RL in Open Many-Agent Systems
- Authors: Keyang He, Prashant Doshi, Bikramjit Banerjee
- Abstract summary: Interactive advantage actor critic (IA2C) engages in decentralized training and decentralized execution.
We present the latent IA2C that utilizes an encoder-decoder architecture to learn a latent representation of the hidden state and other agents' actions.
Our experiments in two domains -- each populated by many agents -- reveal that the latent IA2C significantly improves sample efficiency by reducing variance and converging faster.
- Score: 12.41853254173419
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is a prevalence of multiagent reinforcement learning (MARL) methods that engage in centralized training. However, these methods involve obtaining various types of information from the other agents, which may not be feasible in competitive or adversarial settings. A recent method, the interactive advantage actor critic (IA2C), engages in decentralized training coupled with decentralized execution, aiming to predict the other agents' actions from possibly noisy observations. In this paper, we present the latent IA2C, which utilizes an encoder-decoder architecture to learn a latent representation of the hidden state and the other agents' actions. Our experiments in two domains, each populated by many agents, reveal that the latent IA2C significantly improves sample efficiency by reducing variance and converging faster. Additionally, we introduce open versions of these domains, where the agent population may change over time, and evaluate on these instances as well.
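To make the architecture concrete, below is a minimal sketch of a latent IA2C-style model, assuming a PyTorch implementation; the module shapes, head layout, and auxiliary reconstruction loss are illustrative assumptions rather than the authors' code.

```python
# Hedged sketch of a latent IA2C-style model (illustrative, not the authors' code).
# An encoder compresses the agent's possibly noisy observation into a latent state;
# decoder heads reconstruct the observation and predict the other agents' actions;
# the A2C actor and critic condition on the latent.
import torch
import torch.nn as nn

class LatentIA2C(nn.Module):
    def __init__(self, obs_dim, latent_dim, n_actions, n_others):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Decoder head 1: reconstruct the observation from the latent state.
        self.obs_decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, obs_dim),
        )
        # Decoder head 2: predict each other agent's action logits.
        self.action_decoder = nn.Linear(latent_dim, n_others * n_actions)
        self.n_others, self.n_actions = n_others, n_actions
        # Standard A2C heads, conditioned on the latent state.
        self.actor = nn.Linear(latent_dim, n_actions)
        self.critic = nn.Linear(latent_dim, 1)

    def forward(self, obs):
        z = self.encoder(obs)
        obs_recon = self.obs_decoder(z)
        other_logits = self.action_decoder(z).view(-1, self.n_others, self.n_actions)
        return z, obs_recon, other_logits, self.actor(z), self.critic(z)

model = LatentIA2C(obs_dim=16, latent_dim=32, n_actions=5, n_others=9)
obs = torch.randn(4, 16)                            # a batch of noisy observations
z, recon, other_logits, policy_logits, value = model(obs)
recon_loss = nn.functional.mse_loss(recon, obs)     # auxiliary encoder-decoder loss
```

The auxiliary reconstruction loss would be added to the usual A2C policy and value losses; how the terms are weighted is a training detail the abstract does not specify.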
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise rewards to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- MA2CL: Masked Attentive Contrastive Learning for Multi-Agent Reinforcement Learning [128.19212716007794]
We propose an effective framework called Multi-Agent Masked Attentive Contrastive Learning (MA2CL).
MA2CL encourages the learned representation to be both temporally and agent-level predictive by reconstructing masked agent observations in latent space (a minimal sketch follows this entry).
Our method significantly improves the performance and sample efficiency of different MARL algorithms and outperforms other methods in various vision-based and state-based scenarios.
arXiv Detail & Related papers (2023-06-03T05:32:19Z)
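As a rough illustration of the masked reconstruction idea above, the following sketch masks one agent's latent observation and reconstructs it from the remaining agents' latents via attention; the shared encoder, the attention layer, and the MSE stand-in for the paper's contrastive objective are all assumptions.

```python
# Hedged sketch of MA2CL-style masked latent reconstruction (not the authors' code).
import torch
import torch.nn as nn

n_agents, obs_dim, latent_dim = 4, 10, 32
encode = nn.Linear(obs_dim, latent_dim)              # shared observation encoder
attn = nn.MultiheadAttention(latent_dim, num_heads=4, batch_first=True)

obs = torch.randn(1, n_agents, obs_dim)              # one timestep, all agents
z = encode(obs)                                      # (1, n_agents, latent_dim)

masked = 2                                           # index of the masked agent
z_in = z.clone()
z_in[:, masked] = 0.0                                # zero out that agent's latent

# Reconstruct the masked latent from the other agents' latents; training this
# objective pushes representations to be agent-level predictive.
z_rec, _ = attn(z_in, z_in, z_in)
loss = nn.functional.mse_loss(z_rec[:, masked], z[:, masked].detach())
```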
- MADiff: Offline Multi-agent Learning with Diffusion Models [79.18130544233794]
Diffusion models (DMs) have recently achieved huge success in various scenarios, including offline reinforcement learning.
We propose MADiff, a novel generative framework to bring diffusion models to offline multi-agent learning.
Our experiments show the superior performance of MADiff compared to baseline algorithms in a wide range of multi-agent learning tasks.
arXiv Detail & Related papers (2023-05-27T02:14:09Z)
- Beyond Rewards: a Hierarchical Perspective on Offline Multiagent Behavioral Analysis [14.656957226255628]
We introduce a model-agnostic method for discovery of behavior clusters in multiagent domains.
Our framework makes no assumptions about agents' underlying learning algorithms, does not require access to their latent states or models, and can be trained using entirely offline observational data (a toy illustration follows this entry).
arXiv Detail & Related papers (2022-06-17T23:07:33Z)
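A toy illustration of the offline, model-agnostic setting described above; the choice of k-means over simple trajectory statistics is a stand-in assumption, not the paper's algorithm.

```python
# Hedged illustration: cluster agents by behavior using only logged
# observation trajectories, with no access to models or latent states.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Offline logs: (n_agents, timesteps, feature_dim).
trajectories = rng.normal(size=(20, 50, 4))
# Summarize each agent's behavior with per-feature means and variances.
features = np.concatenate(
    [trajectories.mean(axis=1), trajectories.var(axis=1)], axis=1
)
labels = KMeans(n_clusters=3, n_init=10).fit_predict(features)
print(labels)  # one behavior-cluster label per agent
```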
- Recursive Reasoning Graph for Multi-Agent Reinforcement Learning [44.890087638530524]
Multi-agent reinforcement learning (MARL) provides an efficient way to simultaneously learn policies for multiple agents interacting with each other.
Existing algorithms can suffer from an inability to accurately anticipate the influence of self-actions on other agents.
The proposed algorithm, referred to as the Recursive Reasoning Graph (R2G), shows state-of-the-art performance on multiple multi-agent particle and robotics games.
arXiv Detail & Related papers (2022-03-06T00:57:50Z)
- Explaining Reinforcement Learning Policies through Counterfactual Trajectories [147.7246109100945]
A human developer must validate that an RL agent will perform well at test-time.
Our method conveys how the agent performs under distribution shifts by showing the agent's behavior across a wider trajectory distribution.
In a user study, we demonstrate that our method enables users to score better than baseline methods on one of two agent validation tasks.
arXiv Detail & Related papers (2022-01-29T00:52:37Z)
- Effects of Smart Traffic Signal Control on Air Quality [0.0]
Multi-agent deep reinforcement learning (MARL) has been studied experimentally in traffic systems.
A recently developed multi-agent variant of the well-established advantage actor-critic (A2C) algorithm, called MA2C, exploits the promising idea of allowing some communication among the agents.
In this scheme, the agents share their strategies with neighboring agents, thereby stabilizing the learning process even as the agents grow in number and variety (a sketch of the idea follows this entry).
arXiv Detail & Related papers (2021-07-06T02:48:42Z)
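A hedged sketch of the neighborhood communication described above; the data structures and the 3-intersection corridor are assumptions for illustration, not the MA2C implementation.

```python
# Each traffic-signal agent augments its local observation with its
# neighbors' most recent policy distributions before acting.
import numpy as np

neighbors = {0: [1], 1: [0, 2], 2: [1]}        # a 3-intersection corridor
n_actions = 4
policies = {i: np.full(n_actions, 1.0 / n_actions) for i in neighbors}

def augmented_obs(agent, local_obs):
    """Concatenate the local observation with neighbors' last policy vectors."""
    shared = [policies[j] for j in neighbors[agent]]
    return np.concatenate([local_obs] + shared)

obs = augmented_obs(1, np.random.rand(8))      # agent 1 sees both neighbors
print(obs.shape)                               # (8 + 2 * 4,) = (16,)
```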
- SA-MATD3: Self-attention-based multi-agent continuous control method in cooperative environments [12.959163198988536]
Existing algorithms suffer from increasingly uneven learning across agents as the number of agents grows.
A new structure for the multi-agent actor-critic is proposed, in which a self-attention mechanism is applied in the critic network (sketched after this entry).
The proposed algorithm makes full use of the samples in the replay buffer to learn the behavior of a class of agents.
arXiv Detail & Related papers (2021-07-01T08:15:05Z)
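An illustrative sketch of a self-attention critic in this spirit; the embedding sizes and head layout are assumptions. Attending over all agents' observation-action pairs lets each agent's Q-value weight the other agents unevenly, instead of concatenating everything into one flat vector.

```python
# Hedged sketch of a self-attention critic (illustrative, not SA-MATD3's code).
import torch
import torch.nn as nn

n_agents, obs_dim, act_dim, d = 5, 12, 3, 64
embed = nn.Linear(obs_dim + act_dim, d)          # per-agent obs-action embedding
attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
q_head = nn.Linear(d, 1)

obs = torch.randn(1, n_agents, obs_dim)
act = torch.randn(1, n_agents, act_dim)
x = embed(torch.cat([obs, act], dim=-1))         # (1, n_agents, d)
h, attn_weights = attn(x, x, x)                  # attend across agents
q_values = q_head(h).squeeze(-1)                 # one Q-value per agent
```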
- Many Agent Reinforcement Learning Under Partial Observability [10.11960004698409]
We show that our instantiations can learn the optimal behavior in a broader class of agent networks than the mean-field method.
arXiv Detail & Related papers (2021-06-17T21:24:29Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible, fully decentralized actor-critic MARL framework that can handle large-scale, general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments while reducing information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
- Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic [54.2180984002807]
Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems.
We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works.
arXiv Detail & Related papers (2020-02-24T20:30:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.