Curriculum Learning for Relative Overgeneralization
- URL: http://arxiv.org/abs/2212.02733v2
- Date: Mon, 15 May 2023 05:06:17 GMT
- Title: Curriculum Learning for Relative Overgeneralization
- Authors: Lin Shi and Bei Peng
- Abstract summary: In multi-agent reinforcement learning (MARL), many popular methods are susceptible to a critical multi-agent pathology known as relative overgeneralization (RO).
RO arises when the optimal joint action's utility falls below that of a sub-optimal joint action in cooperative tasks.
We propose curriculum learning for relative overgeneralization (CURO) to better overcome RO.
- Score: 10.30259249058635
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In multi-agent reinforcement learning (MARL), many popular methods, such as
VDN and QMIX, are susceptible to a critical multi-agent pathology known as
relative overgeneralization (RO), which arises when the optimal joint action's
utility falls below that of a sub-optimal joint action in cooperative tasks. RO
can cause the agents to get stuck in local optima or fail to solve
cooperative tasks that require significant coordination between agents within a
given timestep. Recent value-based MARL algorithms such as QPLEX and WQMIX can
overcome RO to some extent. However, our experimental results show that they
can still fail to solve cooperative tasks that exhibit strong RO. In this work,
we propose a novel approach called curriculum learning for relative
overgeneralization (CURO) to better overcome RO. To solve a target task that
exhibits strong RO, in CURO, we first fine-tune the reward function of the
target task to generate source tasks that are tailored to the current ability
of the learning agent and train the agent on these source tasks first. Then, to
effectively transfer the knowledge acquired in one task to the next, we use a
transfer learning method that combines value function transfer with buffer
transfer, which enables more efficient exploration in the target task. We
demonstrate that, when applied to QMIX, CURO overcomes the severe RO problem and
significantly improves performance, yielding state-of-the-art results in a
variety of cooperative multi-agent tasks, including the challenging StarCraft
II micromanagement benchmarks.
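As a concrete illustration of the pathology the abstract targets, the climbing game is a standard toy example of RO. The sketch below is not code from the paper: it uses that well-known payoff matrix to show how marginalizing over a uniformly exploring partner ranks the safe action above the optimal one, and then outlines, with hypothetical helper names, the curriculum-plus-transfer loop the abstract describes.

```python
import numpy as np

# Climbing game payoff matrix (Claus & Boutilier, 1998): rows are agent 1's
# actions, columns are agent 2's. The optimal joint action is (a, a) with
# payoff 11, but miscoordination around it is heavily penalized.
PAYOFF = np.array([
    [ 11, -30,   0],   # agent 1 plays a
    [-30,   7,   6],   # agent 1 plays b
    [  0,   0,   5],   # agent 1 plays c
])

# If agent 2 explores uniformly, agent 1's expected utility for each of its
# own actions marginalizes over the partner's behaviour:
expected_utility = PAYOFF.mean(axis=1)
print(dict(zip("abc", expected_utility.round(2))))
# a: -6.33, b: -5.67, c: 1.67 -- action c looks best even though (a, a) is
# the optimal joint action, which is exactly the RO pathology described above.


# A minimal sketch of the curriculum idea from the abstract (hypothetical
# helper names; the paper specifies the actual reward shaping and transfer).
def curo_sketch(target_task, make_source_task, train_qmix, num_stages=3):
    """Train on progressively harder reward-shaped source tasks, carrying
    both the value-function parameters and the replay buffer forward."""
    params, buffer = None, []
    for stage in range(num_stages):
        source = make_source_task(target_task, stage)  # shaped reward, weaker RO
        params, buffer = train_qmix(source, init_params=params, init_buffer=buffer)
    # Finish on the original target task with the transferred knowledge.
    return train_qmix(target_task, init_params=params, init_buffer=buffer)
```

Value-factorized learners such as VDN and QMIX can latch onto exactly these distorted marginal utilities, which is why the abstract frames RO as a failure on tasks that require tight coordination within a single timestep.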
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Multi-Agent Reinforcement Learning with a Hierarchy of Reward Machines [5.600971575680638]
We study cooperative Multi-Agent Reinforcement Learning (MARL) problems using Reward Machines (RMs).
We present Multi-Agent Reinforcement Learning with a hierarchy of RMs (MAHRM) that is capable of dealing with more complex scenarios.
Experimental results in three cooperative MARL domains show that MAHRM outperforms other MARL methods using the same prior knowledge of high-level events.
arXiv Detail & Related papers (2024-03-08T06:38:22Z)
- Joint Intrinsic Motivation for Coordinated Exploration in Multi-Agent Deep Reinforcement Learning [0.0]
We propose an approach for rewarding strategies where agents collectively exhibit novel behaviors.
JIM rewards joint trajectories based on a centralized measure of novelty designed to function in continuous environments.
Results show that joint exploration is crucial for solving tasks where the optimal strategy requires a high level of coordination.
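The JIM entry above describes a centralized novelty measure over joint behaviour in continuous spaces. A generic k-nearest-neighbour bonus over concatenated joint observations is one minimal way to realize such a signal; the following is a hedged sketch of that generic idea, not necessarily JIM's exact measure.

```python
import numpy as np

def joint_novelty_bonus(joint_obs, memory, k=5):
    """Generic centralized novelty signal for continuous spaces: the mean
    distance from the concatenated joint observation to its k nearest
    neighbours in a memory of previously visited joint observations."""
    if not memory:
        return 1.0                                   # everything is novel at first
    dists = np.linalg.norm(np.asarray(memory) - joint_obs, axis=1)
    k = min(k, len(dists))
    return float(np.sort(dists)[:k].mean())

# Usage: reward the team as a whole for jointly reaching unfamiliar configurations.
memory = []
for step in range(3):
    joint_obs = np.random.uniform(size=4)            # concatenation of both agents' observations
    bonus = joint_novelty_bonus(joint_obs, memory)
    memory.append(joint_obs)
```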
arXiv Detail & Related papers (2024-02-06T13:02:00Z)
- Learning Reward Machines in Cooperative Multi-Agent Tasks [75.79805204646428]
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL).
It combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks.
The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments.
arXiv Detail & Related papers (2023-03-24T15:12:28Z)
- Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks [56.63855534940827]
This work introduces a novel objective function to learn an action translator among training tasks.
We theoretically verify that the value of the transferred policy with the action translator can be close to the value of the source policy.
We propose to combine the action translator with context-based meta-RL algorithms for better data collection and more efficient exploration during meta-training.
arXiv Detail & Related papers (2022-07-19T04:58:06Z)
- LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning [122.47938710284784]
We propose a novel framework for learning dynamic subtask assignment (LDSA) in cooperative MARL.
To reasonably assign agents to different subtasks, we propose an ability-based subtask selection strategy.
We show that LDSA learns reasonable and effective subtask assignment for better collaboration.
arXiv Detail & Related papers (2022-05-05T10:46:16Z)
- Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
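The entry above names two ingredients: a softer value backup and a penalty on inflated joint action-values. Below is a minimal sketch of a generic softmax operator over joint action-values; the paper's exact operator, baseline, and regularizer are not reproduced here.

```python
import numpy as np

def softmax_value(q, beta=1.0):
    """Softmax operator over joint action-values: beta -> 0 recovers the mean,
    beta -> inf recovers the hard max, so a moderate beta tempers the
    overestimation that a hard-max target amplifies."""
    w = np.exp(beta * (q - q.max()))      # numerically stable softmax weights
    return float((w / w.sum()) @ q)

# One joint action is wildly overestimated; smaller beta discounts it more.
q_next = np.array([1.0, 2.0, 50.0, 3.0])
for beta in (0.01, 0.1, 1.0):
    print(beta, round(softmax_value(q_next, beta), 2))
# A regularizer of the kind described above can additionally penalize joint
# action-values that drift far from such a softened baseline.
```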
arXiv Detail & Related papers (2021-03-22T14:18:39Z)
- UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
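The linear decomposition mentioned above is, in its generic successor-feature form (a general concept, not UneVEn's full architecture), a split of rewards into features and task weights, so one learned set of successor features can be re-weighted to evaluate a family of related tasks.

```python
import numpy as np

# Generic successor-feature (SF) decomposition: with reward features phi and
# task weights w, r(s, a) = phi(s, a) . w, and the successor features
# psi(s, a) = E[sum_t gamma^t phi(s_t, a_t)] give Q(s, a; w) = psi(s, a) . w.
rng = np.random.default_rng(0)
num_actions, feat_dim = 4, 3

psi = rng.normal(size=(num_actions, feat_dim))   # toy psi(s, .) for a fixed state
w_task_a = np.array([1.0, 0.0, 0.0])             # two related task weightings
w_task_b = np.array([0.5, 0.5, 0.0])

q_task_a = psi @ w_task_a                        # Q(s, .; w_a) without retraining
q_task_b = psi @ w_task_b
print(int(q_task_a.argmax()), int(q_task_b.argmax()))   # greedy action per task
```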
arXiv Detail & Related papers (2020-10-06T19:08:47Z)
- Reward Machines for Cooperative Multi-Agent Reinforcement Learning [30.84689303706561]
In cooperative multi-agent reinforcement learning, a collection of agents learns to interact in a shared environment to achieve a common goal.
We propose the use of reward machines (RM) -- Mealy machines used as structured representations of reward functions -- to encode the team's task.
The proposed novel interpretation of RMs in the multi-agent setting explicitly encodes required teammate interdependencies, allowing the team-level task to be decomposed into sub-tasks for individual agents.
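A reward machine of the kind described above is a Mealy machine over high-level events. A tiny, hypothetical two-agent example (illustrative task and event names, not taken from the paper) can be encoded as a transition table mapping (machine state, event) to (next state, reward):

```python
# Toy reward machine for a hypothetical two-agent task: agent 1 must press a
# button ("b1") before agent 2 can pass a door ("d2").
# States: u0 = start, u1 = button pressed, u2 = task done (terminal).
RM = {
    ("u0", "b1"): ("u1", 0.0),   # pressing the button makes progress, no reward yet
    ("u1", "d2"): ("u2", 1.0),   # door after button: task reward
    ("u0", "d2"): ("u0", 0.0),   # door before button: nothing happens
}

def rm_step(u, event):
    """Advance the reward machine on a high-level event; unknown events self-loop."""
    return RM.get((u, event), (u, 0.0))

u, total = "u0", 0.0
for event in ["d2", "b1", "d2"]:     # an example trace of labelled events
    u, r = rm_step(u, event)
    total += r
print(u, total)                      # 'u2' 1.0 -- the machine state tracks the
                                     # non-Markovian team task, which can then be
                                     # decomposed into per-agent sub-tasks.
```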
arXiv Detail & Related papers (2020-07-03T23:08:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.