Prompter: Utilizing Large Language Model Prompting for a Data Efficient
Embodied Instruction Following
- URL: http://arxiv.org/abs/2211.03267v2
- Date: Tue, 12 Mar 2024 09:01:54 GMT
- Authors: Yuki Inoue and Hiroki Ohashi
- Abstract summary: Embodied Instruction Following studies how autonomous mobile manipulation robots should be controlled to accomplish long-horizon tasks.
We show that embedding the physical constraints of the deployed robots into the module design is highly effective.
Our design also allows the same modular system to work across robots of different configurations with minimal modifications.
- Score: 4.532517021515834
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embodied Instruction Following (EIF) studies how autonomous mobile
manipulation robots should be controlled to accomplish long-horizon tasks
described by natural language instructions. While much research on EIF is
conducted in simulators, the ultimate goal of the field is to deploy the agents
in real life. This is one reason recent methods have moved away from training
models end-to-end toward modular approaches, which do not require costly expert
operation data. However, because modular ideas are still new to EIF, the search
for modules effective in the EIF task is far from settled. In this paper, we
propose to extend the modular
design using knowledge obtained from two external sources. First, we show that
embedding the physical constraints of the deployed robots into the module
design is highly effective. Our design also allows the same modular system to
work across robots of different configurations with minimal modifications.
Second, we show that the landmark-based object search, previously implemented
by a trained model requiring a dedicated set of data, can be replaced by an
implementation that prompts pretrained large language models for
landmark-object relationships, eliminating the need for collecting dedicated
training data. Our proposed Prompter achieves 41.53% and 45.32% on the ALFRED
benchmark with high-level instructions only and step-by-step instructions,
respectively, significantly outperforming the previous state of the art by
5.46% and 9.91%.
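The abstract's second idea, replacing a trained landmark-search model with LLM prompting for landmark-object relationships, can be sketched roughly as follows. This is a minimal hypothetical illustration, not the paper's actual prompt or implementation: the landmark list, prompt wording, and the `fake_llm` stub (standing in for a real pretrained LLM call) are all assumptions.

```python
# Hypothetical sketch: ask an LLM where a target object is likely found,
# instead of training a dedicated landmark-object relationship model.

LANDMARKS = ["CounterTop", "Fridge", "Sink", "DiningTable"]

def build_prompt(target: str) -> str:
    """Build a question asking the LLM which landmark the target is near."""
    options = ", ".join(LANDMARKS)
    return (f"Question: In a household, near which of the following locations "
            f"is a {target} most likely found? Options: {options}.\nAnswer:")

def parse_landmark(completion: str) -> str:
    """Pick the first known landmark mentioned in the LLM's completion."""
    for landmark in LANDMARKS:
        if landmark.lower() in completion.lower():
            return landmark
    return LANDMARKS[0]  # fall back to a default search location

def fake_llm(prompt: str) -> str:
    """Stub standing in for a pretrained LLM; a real system would call one."""
    return "A mug is most likely found near the CounterTop."

# The agent would then navigate toward the returned landmark to search.
landmark = parse_landmark(fake_llm(build_prompt("Mug")))
print(landmark)  # CounterTop
```

Because the LLM already encodes commonsense object-landmark co-occurrence, no dedicated training data is needed; only the prompt template and a parser for the completion.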
Related papers
- AnyBipe: An End-to-End Framework for Training and Deploying Bipedal Robots Guided by Large Language Models [6.637952061378054]
This paper introduces an end-to-end framework for training and deploying reinforcement learning policies for robots.
The framework consists of three interconnected modules: an LLM-guided reward function design module, an RL training module leveraging prior work, and a sim-to-real homomorphic evaluation module.
We detail the construction of these modules, their advantages over traditional approaches, and demonstrate the framework's capability to autonomously develop and refine controlling strategies for bipedal robot locomotion.
arXiv Detail & Related papers (2024-09-13T15:15:45Z)
- Details Make a Difference: Object State-Sensitive Neurorobotic Task Planning [15.03025428687218]
The state of an object reflects its current status or condition and is important for a robot's task planning and manipulation.
Recently, pre-trained Large Language Models (LLMs) and Vision-Language Models (VLMs) have shown impressive capabilities in generating plans.
We introduce an Object State-Sensitive Agent (OSSA), a task-planning agent empowered by pre-trained neural networks.
arXiv Detail & Related papers (2024-06-14T12:52:42Z)
- Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction.
We reformulate the task to be entity-centric, enabling the use of diverse metrics.
We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
arXiv Detail & Related papers (2024-02-06T22:15:09Z)
- Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning [50.47568731994238]
A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL).
This paper presents a general framework model for integrating and learning structured reasoning into AI agents' policies.
arXiv Detail & Related papers (2023-12-22T17:57:57Z)
- ModuleFormer: Modularity Emerges from Mixture-of-Experts [60.6148988099284]
This paper proposes a new neural network architecture, ModuleFormer, to improve the efficiency and flexibility of large language models.
Unlike previous SMoE-based modular language models, ModuleFormer can induce modularity from uncurated data.
arXiv Detail & Related papers (2023-06-07T17:59:57Z)
- Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model [63.66204449776262]
Instruct2Act is a framework that maps multi-modal instructions to sequential actions for robotic manipulation tasks.
Our approach is adjustable and flexible in accommodating various instruction modalities and input types.
Our zero-shot method outperformed many state-of-the-art learning-based policies in several tasks.
arXiv Detail & Related papers (2023-05-18T17:59:49Z)
- Modular Deep Learning [120.36599591042908]
Transfer learning has recently become the dominant paradigm of machine learning.
It remains unclear how to develop models that specialise towards multiple tasks without incurring negative interference.
Modular deep learning has emerged as a promising solution to these challenges.
arXiv Detail & Related papers (2023-02-22T18:11:25Z)
- Modular Framework for Visuomotor Language Grounding [57.93906820466519]
Natural language instruction following tasks serve as a valuable test-bed for grounded language and robotics research.
We propose the structuring of language, acting, and visual tasks into separate modules that can be trained independently.
arXiv Detail & Related papers (2021-09-05T20:11:53Z)
- Self-training Improves Pre-training for Few-shot Learning in Task-oriented Dialog Systems [47.937191088981436]
Large-scale pre-trained language models have shown promising results for few-shot learning in task-oriented dialog (ToD).
We propose a self-training approach that iteratively labels the most confident unlabeled data to train a stronger Student model.
We conduct experiments and present analyses on four downstream tasks in ToD, including intent classification, dialog state tracking, dialog act prediction, and response selection.
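The confidence-based self-training loop summarized above can be sketched with a toy classifier. This is a hypothetical illustration of the general pseudo-labeling pattern, not the paper's setup: the nearest-centroid model, softmax confidence, threshold, and 1-D toy data are all assumptions.

```python
# Self-training sketch: iteratively pseudo-label the most confident
# unlabeled points and retrain a stronger model on the enlarged set.
import math

def fit_centroids(X, y):
    """Mean of each class's points: a trivial nearest-centroid model."""
    return [sum(x for x, c in zip(X, y) if c == k) /
            max(1, sum(1 for c in y if c == k)) for k in (0, 1)]

def predict_proba(centroids, x):
    """Softmax over negative distances to the two class centroids."""
    scores = [-abs(x - c) for c in centroids]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def self_train(X_lab, y_lab, X_unlab, threshold=0.7, rounds=3):
    """Label confident unlabeled data each round, then retrain."""
    X, y, pool = list(X_lab), list(y_lab), list(X_unlab)
    centroids = fit_centroids(X, y)
    for _ in range(rounds):
        confident, rest = [], []
        for x in pool:
            p = predict_proba(centroids, x)
            (confident if max(p) >= threshold else rest).append(x)
        if not confident:
            break  # nothing left above the confidence threshold
        for x in confident:
            p = predict_proba(centroids, x)
            X.append(x)
            y.append(p.index(max(p)))  # pseudo-label = argmax class
        pool = rest
        centroids = fit_centroids(X, y)  # retrain the stronger "student"
    return centroids

# Two labeled points, three unlabeled; the loop absorbs the unlabeled ones.
centroids = self_train([0.0, 10.0], [0, 1], [1.0, 2.0, 9.0])
```

In a ToD system the classifier would be a pre-trained language model and the points would be utterances, but the absorb-the-confident-then-retrain loop is the same.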
arXiv Detail & Related papers (2021-08-28T07:22:06Z)
- A Data Efficient End-To-End Spoken Language Understanding Architecture [22.823732899634518]
We introduce a data efficient system which is trained end-to-end, with no additional, pre-trained external module.
The proposed model achieves a reasonable size and competitive results with respect to state-of-the-art while using a small training dataset.
arXiv Detail & Related papers (2020-02-14T10:24:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.