Prompter: Utilizing Large Language Model Prompting for a Data Efficient Embodied Instruction Following
- URL: http://arxiv.org/abs/2211.03267v2
- Date: Tue, 12 Mar 2024 09:01:54 GMT
- Title: Prompter: Utilizing Large Language Model Prompting for a Data Efficient Embodied Instruction Following
- Authors: Yuki Inoue and Hiroki Ohashi
- Abstract summary: Embodied Instruction Following studies how autonomous mobile manipulation robots should be controlled to accomplish long-horizon tasks.
We show that embedding the physical constraints of the deployed robots into the module design is highly effective.
Our design also allows the same modular system to work across robots of different configurations with minimal modifications.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embodied Instruction Following (EIF) studies how autonomous mobile
manipulation robots should be controlled to accomplish long-horizon tasks
described by natural language instructions. While much research on EIF is
conducted in simulators, the ultimate goal of the field is to deploy the agents
in real life. This is one of the reasons why recent methods have moved away
from end-to-end training toward modular approaches, which do not require
costly expert operation data. However, since modular design is still new to
EIF, the search for modules effective in the EIF task is far from complete.
In this paper, we propose to extend the modular
design using knowledge obtained from two external sources. First, we show that
embedding the physical constraints of the deployed robots into the module
design is highly effective. Our design also allows the same modular system to
work across robots of different configurations with minimal modifications.
Second, we show that the landmark-based object search, previously implemented
by a trained model requiring a dedicated set of data, can be replaced by an
implementation that prompts pretrained large language models for
landmark-object relationships, eliminating the need for collecting dedicated
training data. Our proposed Prompter achieves 41.53% and 45.32% on the ALFRED
benchmark with high-level instructions only and step-by-step instructions,
respectively, significantly outperforming the previous state of the art by
5.46% and 9.91%.
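The landmark-based object search described above can be sketched as follows. This is a minimal illustration of the general idea, not the paper's implementation: the prompt template, the `rank_landmarks` helper, and the toy co-occurrence scorer are all assumptions standing in for a pretrained language model's likelihood scores.

```python
# Sketch: ranking candidate landmarks for an object search by scoring
# natural-language prompts. The template and scorer below are illustrative
# assumptions, not the exact prompts or model used by Prompter.

def build_prompt(target: str, landmark: str) -> str:
    """A fill-in style sentence relating a target object to a landmark."""
    return f"A likely place to find a {target} is near the {landmark}."

def rank_landmarks(target, landmarks, score_fn):
    """Rank candidate landmarks by score_fn(prompt), where score_fn is a
    stand-in for a pretrained LM's (pseudo-)log-likelihood of the sentence.
    No dedicated training data is needed; only the frozen LM's scores."""
    scored = [(lm, score_fn(build_prompt(target, lm))) for lm in landmarks]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy scorer standing in for an LLM: a tiny hand-written co-occurrence table.
CO_OCCURRENCE = {
    ("mug", "coffee machine"): 0.9,
    ("mug", "cabinet"): 0.7,
    ("mug", "sofa"): 0.1,
}

def toy_score(prompt: str) -> float:
    """Return the table score for the (target, landmark) pair in the prompt."""
    for (tgt, lm), s in CO_OCCURRENCE.items():
        if tgt in prompt and lm in prompt:
            return s
    return 0.0

ranking = rank_landmarks("mug", ["sofa", "coffee machine", "cabinet"], toy_score)
print(ranking[0][0])  # most promising landmark to navigate toward first
```

In the paper's setting, `score_fn` would query a pretrained language model, so the landmark-object relationship model requires no task-specific training data.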
Related papers
- Details Make a Difference: Object State-Sensitive Neurorobotic Task Planning [15.03025428687218]
Object State-Sensitive Agent (OSSA) is a task-planning agent empowered by pre-trained neural networks.
We propose two methods for OSSA: (i) a modular model consisting of a pre-trained vision processing module and a large language model (LLM), and (ii) a monolithic model consisting only of a vision-language model (VLM).
Our results show that both methods can be used for object state-sensitive tasks, but the monolithic approach outperforms the modular approach.
arXiv Detail & Related papers (2024-06-14T12:52:42Z) - Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities [59.02391344178202]
Vision foundation models (VFMs) serve as potent building blocks for a wide range of AI applications.
The scarcity of comprehensive training data, the need for multi-sensor integration, and the diverse task-specific architectures pose significant obstacles to the development of VFMs.
This paper delves into the critical challenge of forging VFMs tailored specifically for autonomous driving, while also outlining future directions.
arXiv Detail & Related papers (2024-01-16T01:57:24Z) - Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning [50.47568731994238]
A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL).
This paper presents a general framework for integrating and learning structured reasoning into AI agents' policies.
arXiv Detail & Related papers (2023-12-22T17:57:57Z) - ModuleFormer: Modularity Emerges from Mixture-of-Experts [60.6148988099284]
This paper proposes a new neural network architecture, ModuleFormer, to improve the efficiency and flexibility of large language models.
Unlike the previous SMoE-based modular language model, ModuleFormer can induce modularity from uncurated data.
arXiv Detail & Related papers (2023-06-07T17:59:57Z) - Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model [63.66204449776262]
Instruct2Act is a framework that maps multi-modal instructions to sequential actions for robotic manipulation tasks.
Our approach is adjustable and flexible in accommodating various instruction modalities and input types.
Our zero-shot method outperformed many state-of-the-art learning-based policies in several tasks.
arXiv Detail & Related papers (2023-05-18T17:59:49Z) - Modular Deep Learning [120.36599591042908]
Transfer learning has recently become the dominant paradigm of machine learning.
It remains unclear how to develop models that specialise towards multiple tasks without incurring negative interference.
Modular deep learning has emerged as a promising solution to these challenges.
arXiv Detail & Related papers (2023-02-22T18:11:25Z) - Modular Framework for Visuomotor Language Grounding [57.93906820466519]
Natural language instruction following tasks serve as a valuable test-bed for grounded language and robotics research.
We propose the structuring of language, acting, and visual tasks into separate modules that can be trained independently.
arXiv Detail & Related papers (2021-09-05T20:11:53Z) - Self-training Improves Pre-training for Few-shot Learning in Task-oriented Dialog Systems [47.937191088981436]
Large-scale pre-trained language models have shown promising results for few-shot learning in task-oriented dialog (ToD) systems.
We propose a self-training approach that iteratively labels the most confident unlabeled data to train a stronger Student model.
We conduct experiments and present analyses on four downstream tasks in ToD, including intent classification, dialog state tracking, dialog act prediction, and response selection.
arXiv Detail & Related papers (2021-08-28T07:22:06Z) - A Data Efficient End-To-End Spoken Language Understanding Architecture [22.823732899634518]
We introduce a data efficient system which is trained end-to-end, with no additional, pre-trained external module.
The proposed model achieves a reasonable size and competitive results with respect to state-of-the-art while using a small training dataset.
arXiv Detail & Related papers (2020-02-14T10:24:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.