Variational Offline Multi-agent Skill Discovery
- URL: http://arxiv.org/abs/2405.16386v2
- Date: Tue, 15 Oct 2024 04:08:33 GMT
- Title: Variational Offline Multi-agent Skill Discovery
- Authors: Jiayu Chen, Bhargav Ganguly, Tian Lan, Vaneet Aggarwal
- Abstract summary: We propose two novel auto-encoder schemes to simultaneously capture subgroup- and temporal-level abstractions and form multi-agent skills.
Our method can be applied to offline multi-task data, and the discovered subgroup skills can be transferred across relevant tasks without retraining.
- Score: 43.869625428099425
- Abstract: Skills are effective temporal abstractions established for sequential decision making, which enable efficient hierarchical learning for long-horizon tasks and facilitate multi-task learning through their transferability. Despite extensive research, gaps remain in multi-agent scenarios, particularly in automatically extracting subgroup coordination patterns in a multi-agent task. To address this, we propose two novel auto-encoder schemes, VO-MASD-3D and VO-MASD-Hier, that simultaneously capture subgroup- and temporal-level abstractions and form multi-agent skills, making them the first to solve the aforementioned challenge. An essential algorithmic component of these schemes is a dynamic grouping function that automatically detects latent subgroups based on agent interactions in a task. Our method can be applied to offline multi-task data, and the discovered subgroup skills can be transferred across relevant tasks without retraining. Empirical evaluations on StarCraft tasks indicate that our approach significantly outperforms existing hierarchical multi-agent reinforcement learning (MARL) methods. Moreover, skills discovered using our method can effectively reduce the learning difficulty in MARL scenarios with delayed and sparse reward signals.
Related papers
- Multi-Agent Transfer Learning via Temporal Contrastive Learning [8.487274986507922]
This paper introduces a novel transfer learning framework for deep multi-agent reinforcement learning.
The approach automatically combines goal-conditioned policies with temporal contrastive learning to discover meaningful sub-goals.
arXiv Detail & Related papers (2024-06-03T14:42:14Z)
- Enabling Multi-Agent Transfer Reinforcement Learning via Scenario Independent Representation [0.7366405857677227]
Multi-Agent Reinforcement Learning (MARL) algorithms are widely adopted in tackling complex tasks that require collaboration and competition among agents.
We introduce a novel framework that enables transfer learning for MARL through unifying various state spaces into fixed-size inputs.
We show significant enhancements in multi-agent learning performance using maneuvering skills learned from other scenarios compared to agents learning from scratch.
arXiv Detail & Related papers (2024-02-13T02:48:18Z)
- Inverse Factorized Q-Learning for Cooperative Multi-agent Imitation Learning [13.060023718506917]
Imitation learning (IL) is the problem of learning to mimic expert behaviors from demonstrations in cooperative multi-agent systems.
We introduce a novel multi-agent IL algorithm designed to address these challenges.
Our approach enables centralized learning by leveraging mixing networks to aggregate decentralized Q functions.
arXiv Detail & Related papers (2023-10-10T17:11:20Z)
- Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning [56.26889258704261]
We propose a novel "disentangled" decision-making method, Semantically Aligned task decomposition in MARL (SAMA).
SAMA prompts pretrained language models with chain-of-thought that can suggest potential goals, provide suitable goal decomposition and subgoal allocation as well as self-reflection-based replanning.
SAMA demonstrates considerable advantages in sample efficiency compared to state-of-the-art ASG methods.
arXiv Detail & Related papers (2023-05-18T10:37:54Z)
- Learning Complex Teamwork Tasks Using a Given Sub-task Decomposition [11.998708550268978]
We propose an approach which uses an expert-provided decomposition of a task into simpler multi-agent sub-tasks.
In each sub-task, a subset of the entire team is trained to acquire sub-task-specific policies.
The sub-teams are then merged and transferred to the target task, where their policies are collectively fine-tuned to solve the more complex target task.
arXiv Detail & Related papers (2023-02-09T21:24:56Z)
- LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning [122.47938710284784]
We propose a novel framework for learning dynamic subtask assignment (LDSA) in cooperative MARL.
To reasonably assign agents to different subtasks, we propose an ability-based subtask selection strategy.
We show that LDSA learns reasonable and effective subtask assignment for better collaboration.
arXiv Detail & Related papers (2022-05-05T10:46:16Z)
- Skill-based Meta-Reinforcement Learning [65.31995608339962]
We devise a method that enables meta-learning on long-horizon, sparse-reward tasks.
Our core idea is to leverage prior experience extracted from offline datasets during meta-learning.
arXiv Detail & Related papers (2022-04-25T17:58:19Z)
- Multi-Task Learning with Sequence-Conditioned Transporter Networks [67.57293592529517]
We aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling.
First, we propose a new benchmark suite aimed at compositional tasks, MultiRavens, which allows defining custom task combinations.
Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling.
arXiv Detail & Related papers (2021-09-15T21:19:11Z)
- Multi-task Over-the-Air Federated Learning: A Non-Orthogonal Transmission Approach [52.85647632037537]
We propose a multi-task over-the-air federated learning (MOAFL) framework, where multiple learning tasks share edge devices for data collection and learning models under the coordination of an edge server (ES).
Both the convergence analysis and numerical results demonstrate that the MOAFL framework can significantly reduce the uplink bandwidth consumption of multiple tasks without causing substantial learning performance degradation.
arXiv Detail & Related papers (2021-06-27T13:09:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.