Macro-Action-Based Multi-Agent/Robot Deep Reinforcement Learning under
Partial Observability
- URL: http://arxiv.org/abs/2209.10003v1
- Date: Tue, 20 Sep 2022 21:13:51 GMT
- Title: Macro-Action-Based Multi-Agent/Robot Deep Reinforcement Learning under
Partial Observability
- Authors: Yuchen Xiao
- Abstract summary: State-of-the-art multi-agent reinforcement learning (MARL) methods have provided promising solutions to a variety of complex problems.
We first propose a group of value-based RL approaches for MacDec-POMDPs.
We formulate a set of macro-action-based policy gradient algorithms under the three training paradigms.
- Score: 4.111899441919164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The state-of-the-art multi-agent reinforcement learning (MARL) methods have
provided promising solutions to a variety of complex problems. Yet, these
methods all assume that agents perform synchronized primitive-action executions,
and are therefore not genuinely scalable to long-horizon real-world
multi-agent/robot tasks that inherently require agents/robots to asynchronously
reason about high-level action selection at varying time durations. The
Macro-Action Decentralized Partially Observable Markov Decision Process
(MacDec-POMDP) is a general formalization for asynchronous decision-making
under uncertainty in fully cooperative multi-agent tasks. In this thesis, we
first propose a group of value-based RL approaches for MacDec-POMDPs, where
agents are allowed to perform asynchronous learning and decision-making with
macro-action-value functions in three paradigms: decentralized learning and
control, centralized learning and control, and centralized training for
decentralized execution (CTDE). Building on the above work, we formulate a set
of macro-action-based policy gradient algorithms under the three training
paradigms, where agents are allowed to directly optimize their parameterized
policies in an asynchronous manner. We evaluate our methods both in simulation
and on real robots over a variety of realistic domains. Empirical results
demonstrate the superiority of our approaches in large multi-agent problems and
validate the effectiveness of our algorithms for learning high-quality and
asynchronous solutions with macro-actions.
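The core technical device behind the value-based approaches above is learning macro-action-value functions from transitions that span entire macro-actions rather than single primitive steps. Below is a minimal sketch, in Python, of the macro-action-level TD target this implies: the reward is the discounted sum collected while a macro-action runs, and bootstrapping is discounted by gamma raised to the macro-action's duration. All names here (MacroTransition, macro_td_target, target_q) are illustrative assumptions, not code from the thesis.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class MacroTransition:
    history: object        # macro-action-observation history when the macro-action began
    macro_action: int      # index of the macro-action that was executed
    rewards: List[float]   # per-step rewards collected while the macro-action ran
    next_history: object   # history observed once the macro-action terminated
    done: bool             # whether the episode ended during this macro-action

def macro_td_target(tr: MacroTransition,
                    target_q: Callable[[object], List[float]],
                    gamma: float = 0.99) -> float:
    """TD target for one macro-action transition: accumulate the reward over the
    macro-action's duration tau and bootstrap with gamma**tau instead of gamma."""
    tau = len(tr.rewards)
    cumulative = sum(gamma ** t * r for t, r in enumerate(tr.rewards))
    if tr.done:
        return cumulative
    return cumulative + gamma ** tau * max(target_q(tr.next_history))

# Toy usage with a hypothetical 2-macro-action target network:
if __name__ == "__main__":
    toy_target_q = lambda history: [0.5, 1.0]   # stand-in for a target Q-network
    tr = MacroTransition(history=None, macro_action=1,
                         rewards=[0.0, 0.0, 1.0], next_history=None, done=False)
    print(macro_td_target(tr, toy_target_q))     # 0.99**2 * 1.0 + 0.99**3 * 1.0
```

Roughly speaking, the macro-action-based policy-gradient variants keep the same per-macro-action bookkeeping but use a critic evaluated at macro-action boundaries to form advantages instead of the max-bootstrap above.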
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- ROMA-iQSS: An Objective Alignment Approach via State-Based Value Learning and ROund-Robin Multi-Agent Scheduling [44.276285521929424]
We introduce a decentralized state-based value learning algorithm that enables agents to independently discover optimal states.
Our theoretical analysis shows that our approach leads decentralized agents to an optimal collective policy.
Empirical experiments further demonstrate that our method outperforms existing decentralized state-based and action-based value learning strategies.
arXiv Detail & Related papers (2024-04-05T09:39:47Z)
- Effective Multi-Agent Deep Reinforcement Learning Control with Relative Entropy Regularization [6.441951360534903]
Multi-Agent Continuous Dynamic Policy Gradient (MACDPP) was proposed to tackle the issues of limited capability and sample efficiency in various scenarios controlled by multiple agents.
It alleviates the inconsistency of multiple agents' policy updates by introducing relative entropy regularization into the Centralized Training with Decentralized Execution (CTDE) framework with an Actor-Critic (AC) structure.
arXiv Detail & Related papers (2023-09-26T07:38:19Z)
- Asynchronous Actor-Critic for Multi-Agent Reinforcement Learning [19.540926205375857]
Synchronizing decisions across multiple agents in realistic settings is problematic since it requires agents to wait for other agents to terminate and communicate about termination reliably.
We formulate a set of asynchronous multi-agent actor-critic methods that allow agents to directly optimize asynchronous policies in three standard training paradigms.
arXiv Detail & Related papers (2022-09-20T16:36:23Z)
- Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent RL [107.58821842920393]
We quantify agents' behavior differences and relate them to policy performance via Role Diversity.
We find that the error bound in MARL can be decomposed into three parts that have a strong relation to the role diversity.
The decomposed factors can significantly impact policy optimization along three popular directions.
arXiv Detail & Related papers (2022-06-01T04:58:52Z)
- Emergence of Theory of Mind Collaboration in Multiagent Systems [65.97255691640561]
We propose an adaptive training algorithm to develop effective collaboration between agents with ToM.
We evaluate our algorithm on two games, where it surpasses all previous decentralized-execution algorithms without modeling ToM.
arXiv Detail & Related papers (2021-09-30T23:28:00Z)
- Dif-MAML: Decentralized Multi-Agent Meta-Learning [54.39661018886268]
We propose a cooperative multi-agent meta-learning algorithm, referred to as Diffusion Multi-Agent MAML (Dif-MAML).
We show that the proposed strategy allows a collection of agents to attain agreement at a linear rate and to converge to a stationary point of the aggregate MAML objective.
Simulation results illustrate the theoretical findings and the superior performance relative to the traditional non-cooperative setting.
arXiv Detail & Related papers (2020-10-06T16:51:09Z)
- Macro-Action-Based Deep Multi-Agent Reinforcement Learning [17.73081797556005]
This paper proposes two Deep Q-Network (DQN) based methods for learning decentralized and centralized macro-action-value functions.
Evaluations on benchmark problems and a larger domain demonstrate the advantage of learning with macro-actions over primitive-actions.
arXiv Detail & Related papers (2020-04-18T15:46:38Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible fully decentralized actor-critic MARL framework, which can handle large-scale general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.