Beyond Joint Demonstrations: Personalized Expert Guidance for Efficient Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2403.08936v2
- Date: Thu, 21 Nov 2024 21:31:36 GMT
- Title: Beyond Joint Demonstrations: Personalized Expert Guidance for Efficient Multi-Agent Reinforcement Learning
- Authors: Peihong Yu, Manav Mishra, Alec Koppel, Carl Busart, Priya Narayan, Dinesh Manocha, Amrit Bedi, Pratap Tokekar
- Abstract summary: We introduce a novel concept of personalized expert demonstrations, tailored for each individual agent or, more broadly, each individual type of agent within a heterogeneous team.
These demonstrations pertain solely to single-agent behavior, i.e., how each agent can achieve its personal goals, without encompassing any cooperative elements.
We propose an approach that selectively utilizes personalized expert demonstrations as guidance and allows agents to learn to cooperate.
- Abstract: Multi-Agent Reinforcement Learning (MARL) algorithms face the challenge of efficient exploration due to the exponential increase in the size of the joint state-action space. While demonstration-guided learning has proven beneficial in single-agent settings, its direct applicability to MARL is hindered by the practical difficulty of obtaining joint expert demonstrations. In this work, we introduce a novel concept of personalized expert demonstrations, tailored for each individual agent or, more broadly, each individual type of agent within a heterogeneous team. These demonstrations pertain solely to single-agent behaviors and how each agent can achieve personal goals, without encompassing any cooperative elements; naively imitating them will therefore not achieve cooperation, due to potential conflicts between agents. To address this, we propose an approach that selectively utilizes personalized expert demonstrations as guidance and allows agents to learn to cooperate, namely personalized expert-guided MARL (PegMARL). This algorithm utilizes two discriminators: the first provides incentives based on the alignment of individual agent behavior with demonstrations, and the second regulates incentives based on whether the behaviors lead to the desired outcome. We evaluate PegMARL using personalized demonstrations in both discrete and continuous environments. The results demonstrate that PegMARL learns near-optimal policies even when provided with suboptimal demonstrations and outperforms state-of-the-art MARL algorithms in solving coordinated tasks. We also showcase PegMARL's capability of leveraging joint demonstrations in the StarCraft scenario and converging effectively even with demonstrations from non-co-trained policies.
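To make the two-discriminator mechanism concrete, here is a minimal sketch of how the shaping described in the abstract could be wired up in a GAIL-style setup; all names, network sizes, and the exact gating rule are assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Generic binary discriminator; used for both roles below."""
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, x):
        return self.net(x)

def shaped_reward(env_reward, obs_i, act_i, next_joint_obs,
                  d_personal, d_outcome, coef=0.1):
    """Blend the environment reward with demonstration guidance.

    d_personal: scores how well agent i's (obs, action) pair matches its
                personalized single-agent demonstrations.
    d_outcome:  scores whether the resulting joint state looks like progress
                toward the cooperative goal, and gates the first signal.
    """
    sa = torch.cat([obs_i, act_i], dim=-1)
    alignment = torch.log(d_personal(sa) + 1e-8)   # imitation incentive
    gate = d_outcome(next_joint_obs)               # suppresses conflicting imitation
    return env_reward + coef * gate * alignment
```

The key point is that the outcome discriminator gates the imitation signal, so personalized guidance is discounted wherever following it would conflict with the cooperative objective.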
Related papers
- Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement [50.481380478458945]
The Iterative Step-Level Process Refinement (IPR) framework provides detailed step-by-step guidance to enhance agent training.
Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines.
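As a rough illustration of step-level (rather than outcome-level) credit assignment, a hedged sketch; the scoring function and step representation are placeholders, not IPR's actual procedure:

```python
# Score each intermediate step of an agent trajectory instead of only the
# final outcome. match_fn and the step encoding are hypothetical.
def step_level_rewards(agent_steps, expert_steps, match_fn):
    rewards = []
    for i, step in enumerate(agent_steps):
        ref = expert_steps[i] if i < len(expert_steps) else None
        rewards.append(match_fn(step, ref) if ref is not None else 0.0)
    return rewards

# Example: reward exact matches of tool calls, step by step.
match = lambda s, r: 1.0 if s == r else 0.0
print(step_level_rewards(["search(x)", "read(y)"],
                         ["search(x)", "read(z)"], match))  # [1.0, 0.0]
```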
arXiv Detail & Related papers (2024-06-17T03:29:13Z)
- A Generalized Apprenticeship Learning Framework for Modeling Heterogeneous Student Pedagogical Strategies [8.137664701386198]
We propose EM-EDM, a general apprenticeship learning (AL) framework based on expectation-maximization (EM), to induce effective pedagogical policies from given optimal or near-optimal demonstrations.
We compare the effectiveness of the policies induced by our proposed EM-EDM against four AL-based baselines and two policies induced by DRL.
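A hedged sketch of the kind of EM loop such a framework implies, alternating between soft-assigning demonstrations to latent strategies and refitting one policy per strategy; the interfaces are hypothetical:

```python
import numpy as np

def em_policies(demos, policies, fit, loglik, n_iters=10):
    """demos: list of trajectories; policies: initial per-strategy models.
    fit(demos, weights) refits a policy on weighted demos; loglik(policy,
    demo) returns the log-likelihood of a demo under a policy."""
    for _ in range(n_iters):
        # E-step: soft-assign each demonstration to the strategy that
        # explains it best.
        ll = np.array([[loglik(p, d) for p in policies] for d in demos])
        resp = np.exp(ll - ll.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: refit one policy per strategy on its weighted demos.
        policies = [fit(demos, resp[:, k]) for k in range(len(policies))]
    return policies, resp
```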
arXiv Detail & Related papers (2024-06-04T16:14:55Z)
- Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations [60.241144377865716]
We consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set.
We propose a method that evaluates and imitates demonstrations at the sub-demonstration level, encoding action primitives of varying quality into different skills.
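An illustrative sketch of sub-demonstration-level filtering under these assumptions (the segmentation and quality models are placeholders, not the paper's components):

```python
def filter_segments(trajectories, segment_fn, quality_fn, threshold=0.5):
    """Split each noisy demonstration into primitives, score each segment
    with a quality model (e.g., trained on the small clean set), and keep
    only segments worth imitating."""
    keep = []
    for traj in trajectories:
        for seg in segment_fn(traj):           # sub-demonstration pieces
            if quality_fn(seg) >= threshold:   # keep only useful primitives
                keep.append(seg)
    return keep
```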
arXiv Detail & Related papers (2023-06-13T17:24:37Z)
- MA2CL: Masked Attentive Contrastive Learning for Multi-Agent Reinforcement Learning [128.19212716007794]
We propose an effective framework called Multi-Agent Masked Attentive Contrastive Learning (MA2CL).
MA2CL encourages learning representation to be both temporal and agent-level predictive by reconstructing the masked agent observation in latent space.
Our method significantly improves the performance and sample efficiency of different MARL algorithms and outperforms other methods in various vision-based and state-based scenarios.
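A heavily reduced sketch of what a masked, attention-based contrastive objective of this kind can look like; architecture details and the exact loss are assumptions on my part:

```python
import torch
import torch.nn.functional as F

def masked_contrastive_loss(encoder, attn, obs_tokens, mask_idx, temp=0.1):
    """obs_tokens: (batch, n_agents, obs_dim) per-agent observations.
    encoder maps them to (batch, n_agents, d); attn is e.g.
    nn.MultiheadAttention(d, n_heads, batch_first=True)."""
    z = encoder(obs_tokens)                        # per-agent latents
    target = z[:, mask_idx].detach()               # true latent of masked agent
    z_masked = z.clone()
    z_masked[:, mask_idx] = 0.0                    # mask that agent in latent space
    recon, _ = attn(z_masked, z_masked, z_masked)  # reconstruct from teammates
    pred = recon[:, mask_idx]                      # (batch, d)
    logits = pred @ target.T / temp                # InfoNCE over the batch
    labels = torch.arange(pred.size(0), device=pred.device)
    return F.cross_entropy(logits, labels)
```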
arXiv Detail & Related papers (2023-06-03T05:32:19Z)
- Unlabeled Imperfect Demonstrations in Adversarial Imitation Learning [48.595574101874575]
In the real world, expert demonstrations are more likely to be imperfect.
A positive-unlabeled adversarial imitation learning algorithm is developed.
The agent policy is optimized to fool the discriminator and produce trajectories similar to the optimal expert demonstrations.
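One standard way to train a discriminator in a positive-unlabeled setting is a non-negative PU risk estimator; whether the paper uses exactly this form is an assumption:

```python
import torch
import torch.nn.functional as F

def pu_discriminator_loss(d_pos, d_unl, prior=0.5):
    """d_pos: discriminator outputs on data treated as positive (e.g., a
    trusted subset); d_unl: outputs on the unlabeled, possibly imperfect
    demos; prior: assumed fraction of optimal behavior in the unlabeled set."""
    # Positive risk: trusted demonstrations should look like expert data.
    r_pos = F.binary_cross_entropy(d_pos, torch.ones_like(d_pos))
    # Negative risk on unlabeled data, corrected by the positive prior.
    r_unl = F.binary_cross_entropy(d_unl, torch.zeros_like(d_unl))
    r_pos_as_neg = F.binary_cross_entropy(d_pos, torch.zeros_like(d_pos))
    r_neg = r_unl - prior * r_pos_as_neg
    return prior * r_pos + torch.clamp(r_neg, min=0.0)  # non-negative correction
```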
arXiv Detail & Related papers (2023-02-13T11:26:44Z)
- ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency [65.28061634546577]
Multi-agent reinforcement learning (MARL) suffers from the non-stationarity problem.
In this paper, we propose bidirectional action-dependent Q-learning (ACE).
ACE outperforms the state-of-the-art algorithms on Google Research Football and StarCraft Multi-Agent Challenge.
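A toy sketch of the action-dependency idea, in which each agent conditions on the actions already chosen by its predecessors so the joint decision becomes sequential; the bidirectional training details are omitted and the network interface is hypothetical:

```python
import torch

def sequential_actions(q_nets, obs):
    """q_nets: one Q-network per agent, where agent i's network takes its
    observation concatenated with the i actions chosen before it. Turning
    the joint choice into a sequence sidesteps the simultaneous-move
    non-stationarity."""
    chosen = []
    for i, q_net in enumerate(q_nets):
        prev = torch.tensor(chosen, dtype=torch.float32)
        q = q_net(torch.cat([obs[i], prev]))  # Q-values given predecessors
        chosen.append(int(q.argmax()))
    return chosen
```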
arXiv Detail & Related papers (2022-11-29T10:22:55Z)
- Sample-Efficient Multi-Agent Reinforcement Learning with Demonstrations for Flocking Control [6.398557794102739]
Flocking control is a significant problem in multi-agent systems such as unmanned aerial vehicles and autonomous underwater vehicles.
In contrast to traditional methods, multi-agent reinforcement learning (MARL) solves the problem of flocking control more flexibly.
We propose a novel method, Pretraining with Demonstrations for MARL (PwD-MARL), which utilizes non-expert demonstrations collected in advance with traditional methods to pretrain agents.
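A minimal sketch of such a demonstration warm start, assuming a behavior-cloning-style objective; PwD-MARL's actual pretraining loss may differ:

```python
import torch
import torch.nn.functional as F

def pretrain_with_demos(policy, demo_loader, optimizer, epochs=5):
    """Warm-start an agent's policy on (obs, action) pairs collected in
    advance with a traditional (non-expert) flocking controller."""
    for _ in range(epochs):
        for obs, act in demo_loader:
            logits = policy(obs)
            loss = F.cross_entropy(logits, act)  # imitate recorded actions
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return policy
```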
arXiv Detail & Related papers (2022-09-17T15:24:37Z)
- Automatic Curricula via Expert Demonstrations [6.651864489482536]
We propose Automatic Curricula via Expert Demonstrations (ACED) as a reinforcement learning (RL) approach.
ACED extracts curricula from expert demonstration trajectories by dividing demonstrations into sections and initializing training episodes to states sampled from different sections of demonstrations.
We show that a combination of ACED with behavior cloning allows pick-and-place tasks to be learned with as few as 1 demonstration and block stacking tasks to be learned with 20 demonstrations.
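A small sketch of curriculum resets from a demonstration, as described above; the section-scheduling rule here is a simple placeholder:

```python
import random

def curriculum_reset_state(demo_states, n_sections, difficulty):
    """demo_states: states along an expert trajectory.
    difficulty in [0, 1]: 0 resets near the end of the demo (easy,
    close to the goal), 1 resets near the beginning (hard)."""
    sec_len = max(1, len(demo_states) // n_sections)
    sections = [demo_states[i:i + sec_len]
                for i in range(0, len(demo_states), sec_len)]
    idx = int((1.0 - difficulty) * (len(sections) - 1))
    return random.choice(sections[idx])
```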
arXiv Detail & Related papers (2021-06-16T22:21:09Z)
- Celebrating Diversity in Shared Multi-Agent Reinforcement Learning [20.901606233349177]
Deep multi-agent reinforcement learning has shown promise in solving complex cooperative tasks.
In this paper, we aim to introduce diversity in both optimization and representation of shared multi-agent reinforcement learning.
Our method achieves state-of-the-art performance on Google Research Football and super hard StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-06-04T00:55:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.