Multi-Level Compositional Reasoning for Interactive Instruction Following
- URL: http://arxiv.org/abs/2308.09387v2
- Date: Wed, 13 Mar 2024 02:37:47 GMT
- Title: Multi-Level Compositional Reasoning for Interactive Instruction Following
- Authors: Suvaansh Bhambri, Byeonghwi Kim, Jonghyun Choi
- Abstract summary: Multi-level Compositional Reasoning Agent (MCR-Agent)
At the highest level, we infer a sequence of human-interpretable subgoals to be executed based on language instructions by a high-level policy composition controller.
At the middle level, a master policy discriminatively controls the agent by alternating between a navigation policy and various independent interaction policies.
At the lowest level, we infer manipulation actions with the corresponding object masks using the appropriate interaction policy.
- Score: 24.581542880280203
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Robotic agents performing domestic chores from natural language directives must master the complex job of navigating an environment and interacting with the objects in it. The tasks given to the agents are often composite and thus challenging, as completing them requires reasoning about multiple subtasks, e.g., bringing a cup of coffee. To address this challenge, we propose to divide and conquer: break the task into multiple subgoals and attend to them individually for better navigation and interaction. We call this the Multi-level Compositional Reasoning Agent (MCR-Agent). Specifically, we learn a three-level action policy. At the highest level, a high-level policy composition controller infers a sequence of human-interpretable subgoals to be executed based on the language instructions. At the middle level, a master policy discriminatively controls the agent by alternating between a navigation policy and various independent interaction policies. Finally, at the lowest level, we infer manipulation actions with the corresponding object masks using the appropriate interaction policy. Our approach not only generates human-interpretable subgoals but also achieves a 2.03% absolute gain over comparable state-of-the-art methods on the efficiency metric (PLWSR on the unseen set) without using rule-based planning or a semantic spatial memory.
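The three-level control flow described in the abstract can be sketched as follows. This is a minimal illustration of the idea, not the authors' implementation: all class names, method names, and the hard-coded subgoal plan are hypothetical placeholders for what would be learned models in MCR-Agent.

```python
# Hedged sketch of a three-level compositional policy: a high-level
# controller plans subgoals, a master policy routes each subgoal to
# navigation or interaction, and the lowest level emits actions.
from dataclasses import dataclass

@dataclass
class Subgoal:
    kind: str    # e.g. "Navigate", or an interaction such as "Pickup"
    target: str  # human-interpretable object/location label

class PolicyCompositionController:
    """Highest level: maps a language instruction to a subgoal sequence."""
    def plan(self, instruction: str) -> list:
        # A trained model would infer this; hard-coded for illustration.
        return [Subgoal("Navigate", "mug"), Subgoal("Pickup", "mug")]

class MasterPolicy:
    """Middle level: alternates between navigation and interaction."""
    def select(self, subgoal: Subgoal) -> str:
        return "navigation" if subgoal.kind == "Navigate" else "interaction"

class NavigationPolicy:
    """Handles movement toward the subgoal's target."""
    def act(self, subgoal: Subgoal) -> str:
        return f"MoveAhead toward {subgoal.target}"

class InteractionPolicy:
    """Lowest level: predicts a manipulation action plus an object mask."""
    def act(self, subgoal: Subgoal):
        return (f"{subgoal.kind} {subgoal.target}", f"mask({subgoal.target})")

def run_episode(instruction: str) -> list:
    controller = PolicyCompositionController()
    master, nav, interact = MasterPolicy(), NavigationPolicy(), InteractionPolicy()
    trace = []
    for sg in controller.plan(instruction):
        if master.select(sg) == "navigation":
            trace.append(nav.act(sg))
        else:
            trace.append(interact.act(sg))
    return trace

print(run_episode("bring me the mug"))
```

The routing decision in `MasterPolicy.select` stands in for the discriminative control described above; in the paper this choice, like the subgoal plan itself, is inferred from language and observations rather than from the subgoal label alone.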
Related papers
- Human-Object Interaction from Human-Level Instructions [16.70362477046958]
We present the first complete system that can synthesize object motion, full-body motion, and finger motion simultaneously from human-level instructions.
Our experiments demonstrate the effectiveness of our high-level planner in generating plausible target layouts and our low-level motion generator in synthesizing realistic interactions for diverse objects.
arXiv Detail & Related papers (2024-06-25T17:46:28Z)
- Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation [52.930183136111864]
We propose using scorable negotiation to evaluate Large Language Models (LLMs)
To reach an agreement, agents must have strong arithmetic, inference, exploration, and planning capabilities.
We provide procedures to create new games and increase games' difficulty to have an evolving benchmark.
arXiv Detail & Related papers (2023-09-29T13:33:06Z)
- LEMMA: Learning Language-Conditioned Multi-Robot Manipulation [21.75163634731677]
LanguagE-Conditioned Multi-robot MAnipulation (LEMMA)
LEMMA features 8 types of procedurally generated tasks with varying degrees of complexity.
For each task, we provide 800 expert demonstrations and human instructions for training and evaluations.
arXiv Detail & Related papers (2023-08-02T04:37:07Z)
- Entity Divider with Language Grounding in Multi-Agent Reinforcement Learning [28.619845209653274]
We investigate the use of natural language to drive the generalization of policies in multi-agent settings.
We propose a novel framework for language grounding in multi-agent reinforcement learning, the entity divider (EnDi).
EnDi enables agents to independently learn subgoal division at the entity level and act in the environment based on the associated entities.
arXiv Detail & Related papers (2022-10-25T11:53:52Z)
- ALMA: Hierarchical Learning for Composite Multi-Agent Tasks [21.556661319375255]
We introduce ALMA, a general learning method for taking advantage of structured tasks.
ALMA simultaneously learns a high-level subtask allocation policy and low-level agent policies.
We demonstrate that ALMA learns sophisticated coordination behavior in a number of challenging environments.
arXiv Detail & Related papers (2022-05-27T19:12:23Z)
- Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z)
- LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning [122.47938710284784]
We propose a novel framework for learning dynamic subtask assignment (LDSA) in cooperative MARL.
To reasonably assign agents to different subtasks, we propose an ability-based subtask selection strategy.
We show that LDSA learns reasonable and effective subtask assignment for better collaboration.
arXiv Detail & Related papers (2022-05-05T10:46:16Z)
- A Persistent Spatial Semantic Representation for High-level Natural Language Instruction Execution [54.385344986265714]
We propose a persistent spatial semantic representation method to bridge the gap between language and robot actions.
We evaluate our approach on the ALFRED benchmark and achieve state-of-the-art results, despite completely avoiding the commonly used step-by-step instructions.
arXiv Detail & Related papers (2021-07-12T17:47:19Z)
- Towards Coordinated Robot Motions: End-to-End Learning of Motion Policies on Transform Trees [63.31965375413414]
We propose to solve multi-task problems through learning structured policies from human demonstrations.
Our structured policy is inspired by RMPflow, a framework for combining subtask policies on different spaces.
We derive an end-to-end learning objective function that is suitable for the multi-task problem.
arXiv Detail & Related papers (2020-12-24T22:46:22Z)
- RODE: Learning Roles to Decompose Multi-Agent Tasks [69.56458960841165]
Role-based learning holds the promise of achieving scalable multi-agent learning by decomposing complex tasks using roles.
We propose to first decompose joint action spaces into restricted role action spaces by clustering actions according to their effects on the environment and other agents.
By virtue of these advances, our method outperforms the current state-of-the-art MARL algorithms on 10 of the 14 scenarios that comprise the challenging StarCraft II micromanagement benchmark.
arXiv Detail & Related papers (2020-10-04T09:20:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.