Curriculum Learning for Relative Overgeneralization
- URL: http://arxiv.org/abs/2212.02733v2
- Date: Mon, 15 May 2023 05:06:17 GMT
- Title: Curriculum Learning for Relative Overgeneralization
- Authors: Lin Shi and Bei Peng
- Abstract summary: In multi-agent reinforcement learning (MARL), many popular methods are susceptible to a critical multi-agent pathology known as relative overgeneralization (RO)
RO arises when the optimal joint action's utility falls below that of a sub-optimal joint action in cooperative tasks.
We propose curriculum learning for relative overgeneralization (CURO) to better overcome RO.
- Score: 10.30259249058635
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In multi-agent reinforcement learning (MARL), many popular methods, such as
VDN and QMIX, are susceptible to a critical multi-agent pathology known as
relative overgeneralization (RO), which arises when the optimal joint action's
utility falls below that of a sub-optimal joint action in cooperative tasks. RO
can cause the agents to get stuck into local optima or fail to solve
cooperative tasks that require significant coordination between agents within a
given timestep. Recent value-based MARL algorithms such as QPLEX and WQMIX can
overcome RO to some extent. However, our experimental results show that they
can still fail to solve cooperative tasks that exhibit strong RO. In this work,
we propose a novel approach called curriculum learning for relative
overgeneralization (CURO) to better overcome RO. To solve a target task that
exhibits strong RO, in CURO, we first fine-tune the reward function of the
target task to generate source tasks that are tailored to the current ability
of the learning agent and train the agent on these source tasks first. Then, to
effectively transfer the knowledge acquired in one task to the next, we use a
transfer learning method that combines value function transfer with buffer
transfer, which enables more efficient exploration in the target task. We
demonstrate that, when applied to QMIX, CURO overcomes severe RO problem and
significantly improves performance, yielding state-of-the-art results in a
variety of cooperative multi-agent tasks, including the challenging StarCraft
II micromanagement benchmarks.
Related papers
- Multi-Agent Reinforcement Learning with a Hierarchy of Reward Machines [5.600971575680638]
We study the cooperative Multi-Agent Reinforcement Learning (MARL) problems using Reward Machines (RMs)
We present Multi-Agent Reinforcement Learning with a hierarchy of RMs (MAHRM) that is capable of dealing with more complex scenarios.
Experimental results in three cooperative MARL domains show that MAHRM outperforms other MARL methods using the same prior knowledge of high-level events.
arXiv Detail & Related papers (2024-03-08T06:38:22Z) - Joint Intrinsic Motivation for Coordinated Exploration in Multi-Agent
Deep Reinforcement Learning [0.0]
We propose an approach for rewarding strategies where agents collectively exhibit novel behaviors.
Jim rewards joint trajectories based on a centralized measure of novelty designed to function in continuous environments.
Results show that joint exploration is crucial for solving tasks where the optimal strategy requires a high level of coordination.
arXiv Detail & Related papers (2024-02-06T13:02:00Z) - Provable Benefits of Multi-task RL under Non-Markovian Decision Making
Processes [56.714690083118406]
In multi-task reinforcement learning (RL) under Markov decision processes (MDPs), the presence of shared latent structures has been shown to yield significant benefits to the sample efficiency compared to single-task RL.
We investigate whether such a benefit can extend to more general sequential decision making problems, such as partially observable MDPs (POMDPs) and more general predictive state representations (PSRs)
We propose a provably efficient algorithm UMT-PSR for finding near-optimal policies for all PSRs, and demonstrate that the advantage of multi-task learning manifests if the joint model class of PSR
arXiv Detail & Related papers (2023-10-20T14:50:28Z) - Learning Reward Machines in Cooperative Multi-Agent Tasks [75.79805204646428]
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL)
It combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks.
The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments.
arXiv Detail & Related papers (2023-03-24T15:12:28Z) - ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning [59.08197876733052]
Auxiliary-Task Learning (ATL) aims to improve the performance of the target task by leveraging the knowledge obtained from related tasks.
Sometimes, learning multiple tasks simultaneously results in lower accuracy than learning only the target task, known as negative transfer.
ForkMerge is a novel approach that periodically forks the model into multiple branches, automatically searches the varying task weights.
arXiv Detail & Related papers (2023-01-30T02:27:02Z) - LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent
Reinforcement Learning [122.47938710284784]
We propose a novel framework for learning dynamic subtask assignment (LDSA) in cooperative MARL.
To reasonably assign agents to different subtasks, we propose an ability-based subtask selection strategy.
We show that LDSA learns reasonable and effective subtask assignment for better collaboration.
arXiv Detail & Related papers (2022-05-05T10:46:16Z) - Softmax with Regularization: Better Value Estimation in Multi-Agent
Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z) - UneVEn: Universal Value Exploration for Multi-Agent Reinforcement
Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn)
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
arXiv Detail & Related papers (2020-10-06T19:08:47Z) - Reward Machines for Cooperative Multi-Agent Reinforcement Learning [30.84689303706561]
In cooperative multi-agent reinforcement learning, a collection of agents learns to interact in a shared environment to achieve a common goal.
We propose the use of reward machines (RM) -- Mealy machines used as structured representations of reward functions -- to encode the team's task.
The proposed novel interpretation of RMs in the multi-agent setting explicitly encodes required teammate interdependencies, allowing the team-level task to be decomposed into sub-tasks for individual agents.
arXiv Detail & Related papers (2020-07-03T23:08:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.