Semantically Aligned Task Decomposition in Multi-Agent Reinforcement
Learning
- URL: http://arxiv.org/abs/2305.10865v2
- Date: Sat, 30 Sep 2023 08:27:28 GMT
- Title: Semantically Aligned Task Decomposition in Multi-Agent Reinforcement
Learning
- Authors: Wenhao Li, Dan Qiao, Baoxiang Wang, Xiangfeng Wang, Bo Jin and
Hongyuan Zha
- Abstract summary: We propose a novel "disentangled" decision-making method, Semantically Aligned task decomposition in MARL (SAMA).
SAMA prompts pretrained language models with chain-of-thought to suggest potential goals, provide suitable goal decomposition and subgoal allocation, and perform self-reflection-based replanning.
SAMA demonstrates considerable advantages in sample efficiency compared to state-of-the-art ASG methods.
- Score: 56.26889258704261
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The difficulty of appropriately assigning credit is particularly heightened
in cooperative MARL with sparse reward, due to the concurrent time and
structural scales involved. Automatic subgoal generation (ASG) has recently
emerged as a viable MARL approach inspired by utilizing subgoals in
intrinsically motivated reinforcement learning. However, end-to-end learning of
complex task planning from sparse rewards without prior knowledge undoubtedly
requires massive training samples. Moreover, the diversity-promoting nature of
existing ASG methods can lead to the "over-representation" of subgoals,
generating numerous spurious subgoals of limited relevance to the actual task
reward and thus decreasing the sample efficiency of the algorithm. To address
this problem and inspired by the disentangled representation learning, we
propose a novel "disentangled" decision-making method, Semantically Aligned
task decomposition in MARL (SAMA), that prompts pretrained language models with
chain-of-thought that can suggest potential goals, provide suitable goal
decomposition and subgoal allocation as well as self-reflection-based
replanning. Additionally, SAMA incorporates language-grounded RL to train each
agent's subgoal-conditioned policy. SAMA demonstrates considerable advantages
in sample efficiency compared to state-of-the-art ASG methods, as evidenced by
its performance on two challenging sparse-reward tasks, Overcooked and MiniRTS.
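The abstract outlines a plan-act-reflect loop; below is a minimal sketch of how such a loop could look in code. The `llm` stub, the prompt wording, and the `env`/policy interfaces are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a SAMA-style plan/execute/reflect loop. The `llm` stub,
# prompt wording, and the `env`/policy interfaces are illustrative
# assumptions, not the authors' implementation.

def llm(prompt: str) -> str:
    """Stand-in for a call to a pretrained language model."""
    raise NotImplementedError

def propose_subgoals(task_desc: str, state_desc: str, n_agents: int) -> list[str]:
    # Chain-of-thought prompt: propose a goal, decompose it, and
    # allocate one subgoal per agent in a parseable format.
    prompt = (
        f"Task: {task_desc}\nCurrent state: {state_desc}\n"
        f"Think step by step: propose an overall goal, decompose it, and "
        f"assign one subgoal to each of the {n_agents} agents.\n"
        f"Answer with exactly one subgoal per line."
    )
    return llm(prompt).strip().splitlines()[:n_agents]

def reflect_and_replan(task_desc, state_desc, subgoals, failure):
    # Self-reflection-based replanning after a failed attempt.
    prompt = (
        f"Task: {task_desc}\nState: {state_desc}\n"
        f"Assigned subgoals: {subgoals}\nOutcome: {failure}\n"
        f"Reflect on why the plan failed, then give a revised subgoal "
        f"per agent, one per line."
    )
    return llm(prompt).strip().splitlines()[:len(subgoals)]

def run_episode(env, policies, task_desc, max_replans=3):
    subgoals = propose_subgoals(task_desc, env.describe(), len(policies))
    for _ in range(max_replans):
        # Each agent acts with its language-grounded,
        # subgoal-conditioned policy.
        ok, failure = env.rollout(
            [p.condition_on(g) for p, g in zip(policies, subgoals)])
        if ok:
            return True
        subgoals = reflect_and_replan(task_desc, env.describe(),
                                      subgoals, failure)
    return False
```

The key separation is that the language model handles semantic planning while ordinary RL handles low-level control, which is where the claimed sample-efficiency gains over end-to-end ASG come from.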
Related papers
- Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.
However, they still struggle with problems requiring multi-step decision-making and environmental feedback.
We propose a framework that can automatically learn a reward model from the environment without human annotations.
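As a rough illustration of the idea summarized above (reward learning without human annotations), the sketch below fits a linear Bradley-Terry reward model to automatically generated trajectory preferences; the feature representation, the linear model, and the source of the automatic labels are all assumptions, not the paper's exact design.

```python
import numpy as np

# Sketch: fit a linear reward model r(s) = w . phi(s) from automatically
# generated trajectory preferences (no human labels).

def bradley_terry_grad(w, feats_pos, feats_neg):
    # P(pos preferred) = sigmoid(r(pos) - r(neg)); gradient of -log P.
    d = feats_pos - feats_neg
    p = 1.0 / (1.0 + np.exp(-(d @ w)))
    return -(1.0 - p)[:, None] * d  # per-pair gradient w.r.t. w

def fit_reward_model(pairs, dim, lr=0.1, steps=500):
    # `pairs` = list of (features_of_better_traj, features_of_worse_traj),
    # produced by an automatic labeler, e.g. an LLM judging outcomes.
    w = np.zeros(dim)
    pos = np.array([p for p, _ in pairs])
    neg = np.array([n for _, n in pairs])
    for _ in range(steps):
        w -= lr * bradley_terry_grad(w, pos, neg).mean(axis=0)
    return w
```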
arXiv Detail & Related papers (2025-02-17T18:49:25Z)
- Beyond Any-Shot Adaptation: Predicting Optimization Outcome for Robustness Gains without Extra Pay [46.92143725900031]
We present Model Predictive Task Sampling (MPTS) to establish connections between the task space and adaptation risk landscape.
MPTS characterizes the task episodic information with a generative model and directly predicts task-specific adaptation risk values from posterior inference.
MPTS can be seamlessly integrated into zero-shot, few-shot, and many-shot learning paradigms.
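A loose sketch of risk-predictive task sampling in this spirit follows: a simple Bayesian linear model stands in for the generative model, mapping task parameters to observed adaptation risk, and its posterior predictions steer which tasks are sampled next. The model class and acquisition rule are assumptions, not the paper's exact machinery.

```python
import numpy as np

# Sketch: Bayesian linear regression from task parameters to observed
# adaptation risk (e.g. post-adaptation loss), used to pick the next tasks.

class RiskPredictor:
    def __init__(self, dim, noise=0.1, prior_var=1.0):
        self.A = np.eye(dim) / prior_var  # posterior precision
        self.b = np.zeros(dim)
        self.noise = noise

    def update(self, task_vec, risk):
        self.A += np.outer(task_vec, task_vec) / self.noise**2
        self.b += task_vec * risk / self.noise**2

    def predict(self, task_vec):
        cov = np.linalg.inv(self.A)
        mu = task_vec @ (cov @ self.b)
        var = task_vec @ cov @ task_vec + self.noise**2
        return mu, var  # posterior mean/variance of adaptation risk

def sample_next_tasks(pred, candidates, k=8, kappa=1.0):
    # Prefer tasks with high predicted risk plus an uncertainty bonus,
    # concentrating training where adaptation is expected to fail.
    scores = [mu + kappa * var**0.5
              for mu, var in (pred.predict(c) for c in candidates)]
    return [candidates[i] for i in np.argsort(scores)[-k:]]
```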
arXiv Detail & Related papers (2025-01-19T13:14:53Z)
- Adaptive Reward Design for Reinforcement Learning in Complex Robotic Tasks [2.3031174164121127]
We propose a suite of reward functions that incentivize an RL agent to make measurable progress on tasks specified by formulas.
We develop an adaptive reward shaping approach that dynamically updates these reward functions during the learning process.
Experimental results on a range of RL-based robotic tasks demonstrate that the proposed approach is compatible with various RL algorithms.
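To make the idea concrete, here is a minimal sketch of a shaping reward that tracks measurable progress through a sequence of subgoals (as a temporal-logic formula might induce) and is updated during learning; the progress bonus and the annealing rule are illustrative assumptions.

```python
# Sketch: adaptive shaping over a subgoal sequence derived from a
# temporal-logic task, e.g. "eventually g1, then g2, ...".

class AdaptiveShaper:
    def __init__(self, subgoal_tests, beta=10.0):
        self.tests = subgoal_tests  # list of predicates: state -> bool
        self.beta = beta            # shaping scale, shrunk over training
        self.stage = 0

    def reset(self):
        self.stage = 0

    def reward(self, state, env_reward):
        shaped = 0.0
        # Measurable progress: advancing to the next subgoal earns a bonus.
        while self.stage < len(self.tests) and self.tests[self.stage](state):
            self.stage += 1
            shaped += self.beta
        return env_reward + shaped

    def end_of_episode(self, solved):
        # Dynamic update: once the agent solves the task, shrink the
        # shaping term so the original sparse objective dominates.
        if solved:
            self.beta *= 0.9
```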
arXiv Detail & Related papers (2024-12-14T18:04:18Z) - Sample-Efficient Reinforcement Learning with Temporal Logic Objectives: Leveraging the Task Specification to Guide Exploration [13.053013407015628]
This paper addresses the problem of learning optimal control policies for systems with uncertain dynamics.
We propose an accelerated RL algorithm that can learn control policies significantly faster than competitive approaches.
arXiv Detail & Related papers (2024-10-16T00:53:41Z) - Directed Exploration in Reinforcement Learning from Linear Temporal Logic [59.707408697394534]
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning.
We show that the synthesized reward signal remains fundamentally sparse, making exploration challenging.
We show how better exploration can be achieved by further leveraging the specification and casting its corresponding Limit Deterministic Büchi Automaton (LDBA) as a Markov reward process.
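The construction underlying this is the product of the environment with the specification automaton; a minimal sketch follows, with the automaton interface (`label`, `delta`, `accepting`) assumed for illustration rather than taken from the paper.

```python
# Sketch: product of an environment with a (limit-deterministic) Büchi
# automaton tracking an LTL specification. Reward comes from accepting
# transitions; a small bonus for automaton progress directs exploration.

class ProductEnv:
    def __init__(self, env, automaton):
        self.env, self.aut = env, automaton

    def reset(self):
        s = self.env.reset()
        self.q = self.aut.initial_state
        return (s, self.q)

    def step(self, action):
        s, _, done, info = self.env.step(action)  # env reward discarded
        label = self.aut.label(s)                 # propositions true in s
        q_next = self.aut.delta(self.q, label)    # automaton transition
        r = 1.0 if self.aut.accepting(self.q, label) else 0.0
        bonus = 0.1 if q_next != self.q else 0.0  # densify sparse signal
        self.q = q_next
        return (s, self.q), r + bonus, done, info
```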
arXiv Detail & Related papers (2024-08-18T14:25:44Z) - World Models with Hints of Large Language Models for Goal Achieving [56.91610333715712]
Reinforcement learning struggles in the face of long-horizon tasks and sparse goals.
Inspired by human cognition, we propose a new multi-modal model-based RL approach named Dreaming with Large Language Models (DLLM). DLLM integrates the proposed hinting subgoals into the model rollouts to encourage goal discovery and reaching in challenging tasks.
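A minimal sketch of hint-guided imagination in this spirit: an LLM proposes subgoals in language, and imagined rollouts from a learned world model earn a one-time bonus when they reach a hinted subgoal. The `llm_hints`, `matches`, and world-model interfaces are assumptions for illustration.

```python
# Sketch: LLM-hinted model rollouts. Hypothetical interfaces:
# llm_hints(task) -> list of subgoal strings,
# world_model.step(s, a) -> (next_state, predicted_reward),
# world_model.matches(s, hint) -> bool.

def imagine_with_hints(world_model, policy, start, task_desc, llm_hints,
                       horizon=15):
    hints = llm_hints(task_desc)      # e.g. ["pick up key", "open door"]
    reached = set()
    states, rewards = [start], []
    s = start
    for _ in range(horizon):
        a = policy(s)
        s, r_model = world_model.step(s, a)  # imagined transition
        bonus = 0.0
        for i, h in enumerate(hints):
            if i not in reached and world_model.matches(s, h):
                reached.add(i)               # reward each hint only once
                bonus += 1.0
        states.append(s)
        rewards.append(r_model + bonus)
    return states, rewards                   # used to train the policy
```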
arXiv Detail & Related papers (2024-06-11T15:49:08Z) - Variational Offline Multi-agent Skill Discovery [43.869625428099425]
We propose two novel auto-encoder schemes to simultaneously capture subgroup- and temporal-level abstractions and form multi-agent skills.
Our method can be applied to offline multi-task data, and the discovered subgroup skills can be transferred across relevant tasks without retraining.
arXiv Detail & Related papers (2024-05-26T00:24:46Z) - Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z) - Meta-Reinforcement Learning Based on Self-Supervised Task Representation
Learning [23.45043290237396]
MoSS is a context-based meta-reinforcement learning algorithm built on self-supervised task representation learning.
On MuJoCo and Meta-World benchmarks, MoSS outperforms prior methods in terms of performance, sample efficiency (3-50x faster), adaptation efficiency, and generalization.
arXiv Detail & Related papers (2023-04-29T15:46:19Z) - MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven
Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
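The conditional-NML quantity this summary refers to can be sketched directly: refit a success classifier with the query point labeled each possible way, then normalize the resulting likelihoods. The logistic model and data handling below are illustrative; values near 0.5 flag uncertainty, which is what connects the method to count-based exploration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Sketch: conditional normalized maximum likelihood (NML) for a binary
# success classifier. Assumes `y` already contains both classes.

def cnml_success_prob(X, y, x_query):
    likelihoods = []
    for label in (0, 1):
        Xa = np.vstack([X, x_query])     # add query with hypothesized label
        ya = np.append(y, label)
        clf = LogisticRegression(max_iter=1000).fit(Xa, ya)
        # Likelihood the refit model assigns to that hypothesized label.
        likelihoods.append(clf.predict_proba(x_query.reshape(1, -1))[0, label])
    # Normalize over the two labelings; the result can serve directly as
    # an uncertainty-aware reward signal.
    return likelihoods[1] / (likelihoods[0] + likelihoods[1])
```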
arXiv Detail & Related papers (2021-07-15T08:19:57Z)