CCL: Collaborative Curriculum Learning for Sparse-Reward Multi-Agent Reinforcement Learning via Co-evolutionary Task Evolution
- URL: http://arxiv.org/abs/2505.07854v1
- Date: Thu, 08 May 2025 04:23:47 GMT
- Title: CCL: Collaborative Curriculum Learning for Sparse-Reward Multi-Agent Reinforcement Learning via Co-evolutionary Task Evolution
- Authors: Yufei Lin, Chengwei Ye, Huanzhen Zhang, Kangsheng Wang, Linuo Xu, Shuyan Liu, Zeyu Zhang
- Abstract summary: Sparse reward environments pose significant challenges in reinforcement learning, especially within multi-agent systems. We propose Collaborative Curriculum Learning (CCL), a novel curriculum learning framework that addresses this by (1) refining intermediate tasks for individual agents, (2) using a variational evolutionary algorithm to generate informative subtasks, and (3) co-evolving agents with their environment to enhance training stability.
- Score: 4.0873807995771
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sparse reward environments pose significant challenges in reinforcement learning, especially within multi-agent systems (MAS), where feedback is delayed and shared across agents, leading to suboptimal learning. We propose Collaborative Curriculum Learning (CCL), a novel curriculum learning framework that addresses this by (1) refining intermediate tasks for individual agents, (2) using a variational evolutionary algorithm to generate informative subtasks, and (3) co-evolving agents with their environment to enhance training stability. Experiments on five cooperative tasks in the MPE and Hide-and-Seek environments show that CCL outperforms existing methods in sparse reward settings.
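To make the co-evolutionary curriculum idea concrete, below is a minimal Python sketch of an evolutionary subtask-selection loop. It assumes hypothetical `train` and `evaluate` callables and tasks parameterized by a list of floats, and uses Gaussian mutation as a stand-in for the variational evolutionary operator; it illustrates the general pattern only, not CCL's actual algorithm.

```python
"""Minimal sketch of a co-evolutionary curriculum loop (illustrative only).

Assumptions (not from the paper): tasks are parameterized by a list of floats,
evaluate(agents, task) returns a success rate in [0, 1], and train(agents, task)
updates the agents in place on that task.
"""
import random

POP_SIZE = 8          # candidate subtasks kept per generation
TARGET_SUCCESS = 0.5  # subtasks near this success rate are treated as most informative
GENERATIONS = 20


def mutate(task, sigma=0.1):
    """Gaussian perturbation, a simple stand-in for the variational evolutionary step."""
    return [p + random.gauss(0.0, sigma) for p in task]


def informativeness(success_rate):
    """Score a subtask by how close the current agents' success rate is to the target band."""
    return -abs(success_rate - TARGET_SUCCESS)


def coevolve(agents, base_task, train, evaluate):
    """Alternate between re-ranking subtasks with the current agents and training on the best ones."""
    population = [mutate(base_task) for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        # Co-evolution: subtasks are re-scored against the *current* agents each generation.
        ranked = sorted(population, key=lambda t: informativeness(evaluate(agents, t)), reverse=True)
        elites = ranked[: POP_SIZE // 2]
        for task in elites:          # train on the most informative subtasks
            train(agents, task)
        # Refill the population by mutating the surviving subtasks.
        population = elites + [mutate(random.choice(elites)) for _ in range(POP_SIZE - len(elites))]
    return agents
```

The target-success-rate heuristic is one common way to keep generated subtasks neither trivially easy nor impossibly hard; the paper's variational operator and per-agent task refinement would replace the mutation and scoring steps used here.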
Related papers
- Omni-Thinker: Scaling Cross-Domain Generalization in LLMs via Multi-Task RL with Hybrid Rewards [50.21528417884747]
We introduce Omni-Thinker, a unified reinforcement learning framework that enhances the performance of large language models (LLMs) across diverse tasks. Our approach enables consistent optimization across task types and scales RL-based training to subjective domains. Experimental results across four domains reveal that curriculum learning improves performance by 5.2% over joint training and 9.1% over model merging.
arXiv Detail & Related papers (2025-07-20T01:50:16Z)
- Curriculum Learning With Counterfactual Group Relative Policy Advantage For Multi-Agent Reinforcement Learning [15.539607264374242]
Multi-agent reinforcement learning (MARL) has achieved strong performance in cooperative adversarial tasks. We propose a dynamic curriculum learning framework that employs a self-adaptive difficulty adjustment mechanism. Our method improves both training stability and final performance, achieving competitive results against state-of-the-art methods.
arXiv Detail & Related papers (2025-06-09T08:38:18Z)
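As a concrete illustration of what a self-adaptive difficulty adjustment mechanism can look like, the sketch below tracks a rolling success rate and nudges a scalar difficulty parameter so tasks stay near the team's edge of competence; the thresholds and the scalar-difficulty abstraction are assumptions for illustration, not the counterfactual group relative policy advantage from the paper.

```python
"""Generic self-adaptive difficulty scheduler (illustrative, not the paper's method)."""
from collections import deque


class DifficultyScheduler:
    def __init__(self, low=0.0, high=1.0, step=0.05, window=50):
        self.low, self.high, self.step = low, high, step
        self.difficulty = low
        self.recent = deque(maxlen=window)  # rolling window of episode outcomes

    def update(self, episode_succeeded: bool) -> float:
        """Record an episode outcome and return the adjusted difficulty."""
        self.recent.append(1.0 if episode_succeeded else 0.0)
        success_rate = sum(self.recent) / len(self.recent)
        if success_rate > 0.7:    # tasks are too easy: raise difficulty
            self.difficulty = min(self.high, self.difficulty + self.step)
        elif success_rate < 0.3:  # tasks are too hard: back off
            self.difficulty = max(self.low, self.difficulty - self.step)
        return self.difficulty
```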
- Cross-Task Experiential Learning on LLM-based Multi-Agent Collaboration [63.90193684394165]
We introduce multi-agent cross-task experiential learning (MAEL), a novel framework that endows LLM-driven agents with explicit cross-task learning and experience accumulation. During the experiential learning phase, we quantify the quality of each step in the task-solving workflow and store the resulting rewards. During inference, agents retrieve high-reward, task-relevant experiences as few-shot examples to enhance the effectiveness of each reasoning step.
arXiv Detail & Related papers (2025-05-29T07:24:37Z)
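A minimal sketch of the store-and-retrieve pattern described above: task-solving steps are saved with a reward score, and the highest-scoring, task-relevant ones are later returned as few-shot examples. The keyword-overlap relevance check and data layout are assumptions for illustration, not MAEL's actual retrieval mechanism.

```python
"""Toy cross-task experience store with reward-based retrieval (illustrative only)."""
from dataclasses import dataclass, field
from typing import List


@dataclass
class Experience:
    task: str       # description of the task the step came from
    step: str       # the reasoning/action step itself
    reward: float   # quality score assigned during experiential learning


@dataclass
class ExperienceStore:
    entries: List[Experience] = field(default_factory=list)

    def add(self, task: str, step: str, reward: float) -> None:
        self.entries.append(Experience(task, step, reward))

    def retrieve(self, query: str, k: int = 3) -> List[Experience]:
        """Return the k highest-reward experiences whose task shares words with the query."""
        query_words = set(query.lower().split())
        relevant = [e for e in self.entries if query_words & set(e.task.lower().split())]
        return sorted(relevant, key=lambda e: e.reward, reverse=True)[:k]
```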
- Multi-Agent Collaboration via Evolving Orchestration [61.93162413517026]
Large language models (LLMs) have achieved remarkable results across diverse downstream tasks, but their monolithic nature restricts scalability and efficiency in complex problem-solving. We propose a puppeteer-style paradigm for LLM-based multi-agent collaboration, where a central orchestrator dynamically directs agents in response to evolving task states. Experiments on closed- and open-domain scenarios show that this method achieves superior performance with reduced computational costs.
arXiv Detail & Related papers (2025-05-26T07:02:17Z)
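The puppeteer-style loop can be sketched as a central dispatcher that, at each step, picks the agent best suited to the current task state; the `score`/`act` agent interface below is a hypothetical stand-in for the paper's learned orchestration policy.

```python
"""Toy centralized orchestration loop (illustrative; the real orchestrator is learned)."""


def orchestrate(agents, state, is_done, max_steps=20):
    """agents: dict mapping a name to an object exposing score(state) and act(state)."""
    for _ in range(max_steps):
        if is_done(state):
            break
        # Rank agents by their fit for the current state and dispatch the best one.
        chosen = max(agents, key=lambda name: agents[name].score(state))
        state = agents[chosen].act(state)
    return state
```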
- LERO: LLM-driven Evolutionary framework with Hybrid Rewards and Enhanced Observation for Multi-Agent Reinforcement Learning [4.343021413805699]
Multi-agent reinforcement learning (MARL) faces two critical bottlenecks distinct from single-agent RL. We propose LERO, a framework integrating large language models (LLMs) with evolutionary optimization to address these MARL-specific challenges.
arXiv Detail & Related papers (2025-03-25T06:28:42Z)
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise rewards to optimize the agent's reinforcement learning process. We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Learning Reward Machines in Cooperative Multi-Agent Tasks [75.79805204646428]
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL).
It combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks.
The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments.
arXiv Detail & Related papers (2023-03-24T15:12:28Z)
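A reward machine can be illustrated as a small finite-state automaton over high-level events: once its state is tracked alongside the environment state, a non-Markovian reward ("reward the door only after the key was picked up") becomes Markovian. The single-agent toy below is a generic illustration, not the paper's cooperative decomposition or learning procedure.

```python
"""Minimal reward machine: a finite-state automaton over events that emits rewards."""


class RewardMachine:
    def __init__(self, transitions, initial_state=0):
        # transitions: maps (rm_state, event) -> (next_rm_state, reward)
        self.transitions = transitions
        self.state = initial_state

    def step(self, event):
        """Advance on an observed event; unknown events leave the state unchanged."""
        next_state, reward = self.transitions.get((self.state, event), (self.state, 0.0))
        self.state = next_state
        return reward


# Hypothetical task: reward opening the door only after the key has been collected.
rm = RewardMachine({
    (0, "got_key"):   (1, 0.1),
    (1, "open_door"): (2, 1.0),
})
assert rm.step("open_door") == 0.0  # wrong order: no reward
assert rm.step("got_key") == 0.1
assert rm.step("open_door") == 1.0

- Towards Skilled Population Curriculum for Multi-Agent Reinforcement Learning [42.540853953923495]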
We introduce a novel automatic curriculum learning framework, Skilled Population Curriculum (SPC), which adapts curriculum learning to multi-agent coordination.
Specifically, we endow the student with population-invariant communication and a hierarchical skill set, allowing it to learn cooperation and behavior skills from distinct tasks with varying numbers of agents.
We also analyze the inherent non-stationarity of this multi-agent automatic curriculum teaching problem and provide a corresponding regret bound.
arXiv Detail & Related papers (2023-02-07T12:30:52Z)
- Non-local Policy Optimization via Diversity-regularized Collaborative Exploration [45.997521480637836]
We propose a novel non-local policy optimization framework called Diversity-regularized Collaborative Exploration (DiCE).
DiCE utilizes a group of heterogeneous agents to explore the environment simultaneously and share the collected experiences.
We implement the framework in both on-policy and off-policy settings, and the experimental results show that DiCE can achieve substantial improvement over the baselines.
arXiv Detail & Related papers (2020-06-14T03:31:11Z)
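The diversity-regularized idea can be sketched as an explicit bonus added to each team member's objective, rewarding policies for behaving differently from the team average while they share collected experience; the mean-squared action-disagreement measure below is an assumption for illustration, not DiCE's actual regularizer.

```python
"""Toy diversity bonus for a team of exploring policies (illustrative only)."""
import numpy as np


def diversity_bonus(actions_per_policy):
    """Mean squared disagreement between each policy's action and the team-average action."""
    acts = np.asarray(actions_per_policy, dtype=float)  # shape: (n_policies, action_dim)
    return float(np.mean((acts - acts.mean(axis=0)) ** 2))


def regularized_returns(env_returns, actions_per_policy, alpha=0.1):
    """Each policy's objective: its environment return plus a shared diversity bonus."""
    bonus = diversity_bonus(actions_per_policy)
    return [r + alpha * bonus for r in env_returns]


# Example: three policies acting on the same state; more disagreement yields a larger bonus.
print(regularized_returns([1.0, 0.5, 0.2], [[0.0, 1.0], [0.2, 0.8], [1.0, 0.0]]))
```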
This list is automatically generated from the titles and abstracts of the papers on this site.