Developing cooperative policies for multi-stage tasks
- URL: http://arxiv.org/abs/2007.00203v1
- Date: Wed, 1 Jul 2020 03:32:14 GMT
- Title: Developing cooperative policies for multi-stage tasks
- Authors: Jordan Erskine, Chris Lehnert
- Abstract summary: This paper proposes the Cooperative Soft Actor Critic (CSAC) method of enabling consecutive reinforcement learning agents to cooperatively solve a long time horizon multi-stage task.
CSAC achieved a success rate at least 20% higher than the uncooperative policies and converged on a solution at least 4 times faster than the single agent.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes the Cooperative Soft Actor Critic (CSAC) method of
enabling consecutive reinforcement learning agents to cooperatively solve a
long time horizon multi-stage task. This method is achieved by modifying each
agent's policy to maximise both its own critic and the next agent's critic.
Cooperatively maximising each agent's critic allows each agent to take actions
that are beneficial for its task as well as subsequent tasks. Using this method
in a multi-room maze domain, the cooperative policies were able to outperform
both uncooperative policies as well as a single agent trained across the entire
domain. CSAC achieved a success rate of at least 20\% higher than the
uncooperative policies, and converged on a solution at least 4 times faster
than the single agent.
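The cooperative update described above lends itself to a short illustration. The following is a minimal sketch, not the authors' code: it assumes SAC-style reparameterised policies implemented in PyTorch, and the weighted sum used to combine the current and next agent's critics (the coop_weight parameter) and all names below are assumptions introduced here for illustration.

```python
# Illustrative sketch of a cooperative SAC-style actor update in which each
# agent's policy is trained against its own critic AND the next agent's critic.
# This is NOT the authors' implementation: the weighted combination
# (coop_weight) and all names below are assumptions made for illustration.
import torch


def cooperative_actor_loss(policy, q_current, q_next, states,
                           alpha=0.2, coop_weight=0.5):
    """Actor loss for one agent in a consecutive two-critic setup.

    policy(states)             -> (actions, log_probs), reparameterised sample
    q_current / q_next         -> Q(s, a) for this stage / the following stage
    coop_weight (hypothetical) -> trade-off between own and next-stage value
    """
    actions, log_probs = policy(states)
    q_own = q_current(states, actions)        # value for the agent's own stage
    q_following = q_next(states, actions)     # value for the subsequent stage
    combined_q = (1.0 - coop_weight) * q_own + coop_weight * q_following
    # SAC-style objective: maximise the soft value Q - alpha * log pi(a|s)
    return (alpha * log_probs - combined_q).mean()


if __name__ == "__main__":
    # Tiny stand-in networks, purely to make the sketch runnable.
    class GaussianPolicy(torch.nn.Module):
        def __init__(self, obs_dim=4, act_dim=2):
            super().__init__()
            self.mean = torch.nn.Linear(obs_dim, act_dim)
            self.log_std = torch.nn.Parameter(torch.zeros(act_dim))

        def forward(self, s):
            dist = torch.distributions.Normal(self.mean(s), self.log_std.exp())
            a = dist.rsample()
            return a, dist.log_prob(a).sum(-1, keepdim=True)

    class Critic(torch.nn.Module):
        def __init__(self, obs_dim=4, act_dim=2):
            super().__init__()
            self.net = torch.nn.Linear(obs_dim + act_dim, 1)

        def forward(self, s, a):
            return self.net(torch.cat([s, a], dim=-1))

    pi, q_now, q_next_stage = GaussianPolicy(), Critic(), Critic()
    loss = cooperative_actor_loss(pi, q_now, q_next_stage, torch.randn(8, 4))
    loss.backward()
    print("cooperative actor loss:", loss.item())
```

In this sketch, coop_weight = 0 recovers a standard uncooperative SAC actor update, while larger values push each agent towards actions that also score well under the next stage's critic.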
Related papers
- CaPo: Cooperative Plan Optimization for Efficient Embodied Multi-Agent Cooperation [98.11670473661587]
CaPo improves cooperation efficiency with two phases: 1) meta-plan generation, and 2) progress-adaptive meta-plan and execution.
Experimental results on the ThreeDWorld Multi-Agent Transport and Communicative Watch-And-Help tasks demonstrate that CaPo achieves a much higher task completion rate and efficiency than state-of-the-art methods.
arXiv Detail & Related papers (2024-11-07T13:08:04Z)
- Fully Decentralized Cooperative Multi-Agent Reinforcement Learning: A Survey [48.77342627610471]
Cooperative multi-agent reinforcement learning is a powerful tool to solve many real-world cooperative tasks.
It is challenging to derive algorithms that can converge to the optimal joint policy in a fully decentralized setting.
arXiv Detail & Related papers (2024-01-10T05:07:42Z)
- Policy Diversity for Cooperative Agents [8.689289576285095]
Multi-agent reinforcement learning aims to find the optimal team cooperative policy to complete a task.
There may exist multiple different ways of cooperating, and domain experts often need access to these alternative strategies.
Unfortunately, there is a general lack of effective policy diversity approaches specifically designed for the multi-agent domain.
arXiv Detail & Related papers (2023-08-28T05:23:16Z)
- Minimum Coverage Sets for Training Robust Ad Hoc Teamwork Agents [39.19326531319873]
Existing Ad Hoc Teamwork (AHT) methods address the challenge of collaborating with previously unseen teammates by training an agent against a population of diverse teammate policies.
We introduce the L-BRDiv algorithm, which generates a set of teammate policies that, when used for AHT training, encourage agents to emulate policies from the minimum coverage set (MCS).
We empirically demonstrate that L-BRDiv produces more robust AHT agents than state-of-the-art methods in a broader range of two-player cooperative problems.
arXiv Detail & Related papers (2023-08-18T14:45:22Z)
- Learning Reward Machines in Cooperative Multi-Agent Tasks [75.79805204646428]
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL).
It combines cooperative task decomposition with the learning of reward machines (RMs) that encode the structure of the sub-tasks; a minimal RM sketch is given after this list.
The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments.
arXiv Detail & Related papers (2023-03-24T15:12:28Z)
- Multi-agent Deep Covering Skill Discovery [50.812414209206054]
We propose Multi-agent Deep Covering Option Discovery, which constructs multi-agent options by minimizing the expected cover time of the agents' joint state space.
Also, we propose a novel framework to adopt the multi-agent options in the MARL process.
We show that the proposed algorithm can effectively capture agent interactions with the attention mechanism, successfully identify multi-agent options, and significantly outperform prior works that use single-agent options or no options.
arXiv Detail & Related papers (2022-10-07T00:40:59Z)
- Developing cooperative policies for multi-stage reinforcement learning tasks [0.0]
Many hierarchical reinforcement learning algorithms utilise a series of independent skills as a basis to solve tasks at a higher level of reasoning.
This paper proposes the Cooperative Consecutive Policies (CCP) method of enabling consecutive agents to cooperatively solve long time horizon multi-stage tasks.
arXiv Detail & Related papers (2022-05-11T01:31:04Z)
- Influencing Long-Term Behavior in Multiagent Reinforcement Learning [59.98329270954098]
We propose a principled framework for considering the limiting policies of other agents as time approaches infinity.
Specifically, we develop a new optimization objective that maximizes each agent's average reward by directly accounting for the impact of its behavior on the limiting set of policies that other agents will take on.
Thanks to our farsighted evaluation, we demonstrate better long-term performance than state-of-the-art baselines in various domains.
arXiv Detail & Related papers (2022-03-07T17:32:35Z)
- Coordinated Proximal Policy Optimization [28.780862892562308]
Coordinated Proximal Policy Optimization (CoPPO) is an algorithm that extends the original Proximal Policy Optimization (PPO) to the multi-agent setting.
We prove the monotonicity of policy improvement when optimizing a theoretically-grounded joint objective.
We then argue that this objective allows CoPPO to achieve dynamic credit assignment among agents, thereby alleviating the high-variance issue during concurrent updates of agent policies.
arXiv Detail & Related papers (2021-11-07T11:14:19Z)
- HAVEN: Hierarchical Cooperative Multi-Agent Reinforcement Learning with Dual Coordination Mechanism [17.993973801986677]
Multi-agent reinforcement learning often suffers from the exponentially larger action space caused by a large number of agents.
We propose HAVEN, a novel value decomposition framework based on hierarchical reinforcement learning for fully cooperative multi-agent problems.
arXiv Detail & Related papers (2021-10-14T10:43:47Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
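For readers unfamiliar with the reward machines mentioned in the "Learning Reward Machines in Cooperative Multi-Agent Tasks" entry above, the following is a minimal sketch of the data structure only: a finite-state machine whose transitions fire on high-level events and emit rewards, thereby encoding a multi-stage task. The two-stage "reach the door, then reach the goal" task, its event labels, and the reward values are invented for illustration and are not taken from that paper, which is concerned with learning such machines.

```python
# Minimal illustration of a reward machine (RM): a finite-state machine whose
# transitions fire on high-level events and emit rewards, encoding a
# multi-stage task. The task, labels, and rewards below are invented examples.


class RewardMachine:
    def __init__(self, transitions, initial_state):
        # transitions: {(state, event): (next_state, reward)}
        self.transitions = transitions
        self.state = initial_state

    def step(self, event):
        """Advance the machine on a high-level event; return the emitted reward."""
        if (self.state, event) in self.transitions:
            self.state, reward = self.transitions[(self.state, event)]
            return reward
        return 0.0  # the event is irrelevant in the current stage


# Stage u0: reach the door (event "at_door"); stage u1: reach the goal.
rm = RewardMachine(
    transitions={
        ("u0", "at_door"): ("u1", 0.5),
        ("u1", "at_goal"): ("done", 1.0),
    },
    initial_state="u0",
)

for event in ["none", "at_door", "none", "at_goal"]:
    print(event, "-> reward", rm.step(event), "| state:", rm.state)
```

In the usual setup, a reinforcement learning agent observes the machine's current state (the active sub-task) alongside the environment state and uses the emitted reward as its learning signal.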
This list is automatically generated from the titles and abstracts of the papers on this site.