Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2305.10865v2
- Date: Sat, 30 Sep 2023 08:27:28 GMT
- Title: Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning
- Authors: Wenhao Li, Dan Qiao, Baoxiang Wang, Xiangfeng Wang, Bo Jin and Hongyuan Zha
- Abstract summary: We propose a novel "disentangled" decision-making method, Semantically Aligned task decomposition in MARL (SAMA).
SAMA prompts pretrained language models with chain-of-thought to suggest potential goals, provide suitable goal decomposition and subgoal allocation, and perform self-reflection-based replanning.
SAMA demonstrates considerable advantages in sample efficiency compared to state-of-the-art ASG methods.
- Score: 56.26889258704261
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The difficulty of appropriately assigning credit is particularly heightened in cooperative MARL with sparse reward, due to the concurrent time and structural scales involved. Automatic subgoal generation (ASG) has recently emerged as a viable MARL approach, inspired by the use of subgoals in intrinsically motivated reinforcement learning. However, end-to-end learning of complex task planning from sparse rewards without prior knowledge undoubtedly requires massive training samples. Moreover, the diversity-promoting nature of existing ASG methods can lead to the "over-representation" of subgoals, generating numerous spurious subgoals of limited relevance to the actual task reward and thus decreasing the sample efficiency of the algorithm. To address this problem, and inspired by disentangled representation learning, we propose a novel "disentangled" decision-making method, Semantically Aligned task decomposition in MARL (SAMA). SAMA prompts pretrained language models with chain-of-thought to suggest potential goals, provide suitable goal decomposition and subgoal allocation, and perform self-reflection-based replanning. Additionally, SAMA incorporates language-grounded RL to train each agent's subgoal-conditioned policy. SAMA demonstrates considerable advantages in sample efficiency compared to state-of-the-art ASG methods, as evidenced by its performance on two challenging sparse-reward tasks, Overcooked and MiniRTS.
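The pipeline the abstract describes (goal proposal, goal decomposition and subgoal allocation via chain-of-thought prompting, subgoal-conditioned agent policies, and self-reflection-based replanning) can be sketched in a few lines. The sketch below is an illustration only, not the authors' released code: the `llm.complete` client, the environment interface, and the policy objects are all assumed placeholders.

```python
# Minimal sketch of the SAMA loop described in the abstract. Every name
# here (the `llm` client, the env API, the policy objects) is a
# hypothetical placeholder, not the authors' actual implementation.

def parse_subgoals(text: str, n_agents: int) -> list[str]:
    """Naively read one subgoal per line; pad with 'idle' if needed."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return (lines + ["idle"] * n_agents)[:n_agents]

def sama_episode(env, llm, policies, max_replans=3):
    """One episode: the LLM plans in language, agents act, the LLM replans on failure."""
    state_text = env.describe_state()  # language-grounded state description

    # 1. Chain-of-thought prompting: propose a goal, then decompose it
    #    and allocate one subgoal per agent.
    goal = llm.complete(
        f"State: {state_text}\nThink step by step, then suggest a goal.")
    plan = llm.complete(
        f"Goal: {goal}\nDecompose it into {len(policies)} subgoals, "
        f"one per agent, one per line.")
    subgoals = parse_subgoals(plan, len(policies))

    for _ in range(max_replans):
        # 2. Each agent follows its subgoal-conditioned policy,
        #    trained separately with language-grounded RL.
        done, info = False, {}
        while not done:
            actions = [pi.act(env.observe(i), subgoals[i])
                       for i, pi in enumerate(policies)]
            done, info = env.step(actions)
        if info.get("task_solved"):
            return True
        # 3. Self-reflection-based replanning: feed the failure back in.
        reflection = llm.complete(
            f"Subgoals {subgoals} did not solve the task "
            f"(feedback: {info}). Reflect on what went wrong and output "
            f"{len(policies)} revised subgoals, one per line.")
        subgoals = parse_subgoals(reflection, len(policies))
    return False
```

In SAMA proper, the per-agent policies are trained with language-grounded RL so that each agent can interpret and pursue its natural-language subgoal; the sketch treats them as already-trained black boxes.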
Related papers
- Sample-Efficient Reinforcement Learning with Temporal Logic Objectives: Leveraging the Task Specification to Guide Exploration [13.053013407015628]
This paper addresses the problem of learning optimal control policies for systems with uncertain dynamics.
We propose an accelerated RL algorithm that can learn control policies significantly faster than competitive approaches.
arXiv Detail & Related papers (2024-10-16T00:53:41Z)
- Directed Exploration in Reinforcement Learning from Linear Temporal Logic [59.707408697394534]
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning.
We show that the synthesized reward signal remains fundamentally sparse, making exploration challenging.
We show how better exploration can be achieved by further leveraging the specification and casting its corresponding Limit Deterministic Büchi Automaton (LDBA) as a Markov reward process.
arXiv Detail & Related papers (2024-08-18T14:25:44Z)
- Gradual Divergence for Seamless Adaptation: A Novel Domain Incremental Learning Method [19.751735234229972]
Domain incremental learning (DIL) poses a significant challenge in real-world scenarios.
Mitigating representation drift, which refers to the phenomenon of learned representations undergoing changes as the model adapts to new tasks, can help alleviate catastrophic forgetting.
We propose a novel DIL method named DARE, featuring a three-stage training process: Divergence, Adaptation, and REfinement.
arXiv Detail & Related papers (2024-06-23T22:05:52Z)
- World Models with Hints of Large Language Models for Goal Achieving [56.91610333715712]
Reinforcement learning struggles in the face of long-horizon tasks and sparse goals.
Inspired by human cognition, we propose a new multi-modal model-based RL approach named Dreaming with Large Language Models (DLLM). DLLM integrates the proposed hinting subgoals into the model rollouts to encourage goal discovery and reaching in challenging tasks.
arXiv Detail & Related papers (2024-06-11T15:49:08Z)
- Variational Offline Multi-agent Skill Discovery [43.869625428099425]
We propose two novel auto-encoder schemes to simultaneously capture subgroup- and temporal-level abstractions and form multi-agent skills.
Our method can be applied to offline multi-task data, and the discovered subgroup skills can be transferred across relevant tasks without retraining.
arXiv Detail & Related papers (2024-05-26T00:24:46Z)
- Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z)
- Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning [23.45043290237396]
MoSS is a context-based meta-reinforcement learning algorithm built on self-supervised task representation learning.
On MuJoCo and Meta-World benchmarks, MoSS outperforms prior methods in terms of performance, sample efficiency (3-50x faster), adaptation efficiency, and generalization.
arXiv Detail & Related papers (2023-04-29T15:46:19Z)
- Automaton-Guided Curriculum Generation for Reinforcement Learning Agents [14.20447398253189]
Automaton-guided Curriculum Learning (AGCL) is a novel method for automatically generating curricula for the target task in the form of Directed Acyclic Graphs (DAGs).
AGCL encodes the specification in the form of a deterministic finite automaton (DFA), and then uses the DFA along with the Object-Oriented MDP representation to generate a curriculum as a DAG.
Experiments in gridworld and physics-based simulated robotics domains show that the curricula produced by AGCL achieve improved time-to-threshold performance.
arXiv Detail & Related papers (2023-04-11T15:14:31Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experiment results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)