In-Context Learning Enables Robot Action Prediction in LLMs
- URL: http://arxiv.org/abs/2410.12782v1
- Date: Wed, 16 Oct 2024 17:56:49 GMT
- Title: In-Context Learning Enables Robot Action Prediction in LLMs
- Authors: Yida Yin, Zekai Wang, Yuvan Sharma, Dantong Niu, Trevor Darrell, Roei Herzig
- Abstract summary: We introduce RoboPrompt, a framework that enables off-the-shelf text-only Large Language Models (LLMs) to directly predict robot actions.
Our approach first heuristically identifies keyframes that capture important moments from an episode.
We extract end-effector actions from these keyframes, as well as the estimated initial object poses, and convert both into textual descriptions.
This enables an LLM to directly predict robot actions at test time.
- Abstract: Recently, Large Language Models (LLMs) have achieved remarkable success using in-context learning (ICL) in the language domain. However, leveraging the ICL capabilities within LLMs to directly predict robot actions remains largely unexplored. In this paper, we introduce RoboPrompt, a framework that enables off-the-shelf text-only LLMs to directly predict robot actions through ICL without training. Our approach first heuristically identifies keyframes that capture important moments from an episode. Next, we extract end-effector actions from these keyframes as well as the estimated initial object poses, and both are converted into textual descriptions. Finally, we construct a structured template to form ICL demonstrations from these textual descriptions and a task instruction. This enables an LLM to directly predict robot actions at test time. Through extensive experiments and analysis, RoboPrompt shows stronger performance over zero-shot and ICL baselines in simulated and real-world settings.
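The pipeline described in the abstract (keyframe actions and initial object poses rendered as text, then assembled into ICL demonstrations with a task instruction) can be sketched roughly as follows. The field names, number formatting, and template wording below are illustrative assumptions, not the paper's exact format:

```python
def pose_to_text(name, pose):
    # Render an object pose (x, y, z, roll, pitch, yaw) as a text line.
    return f"{name}: " + " ".join(f"{v:.2f}" for v in pose)

def action_to_text(step, action):
    # Render a keyframe end-effector action (x, y, z, roll, pitch, yaw, gripper).
    return f"step {step}: " + " ".join(f"{v:.2f}" for v in action)

def build_icl_prompt(instruction, demos, test_poses):
    """Assemble demonstration episodes plus the test scene into one ICL prompt."""
    lines = [f"Task: {instruction}"]
    for i, demo in enumerate(demos):
        lines.append(f"Demonstration {i + 1}:")
        lines += [pose_to_text(n, p) for n, p in demo["object_poses"].items()]
        lines += [action_to_text(t, a) for t, a in enumerate(demo["keyframe_actions"])]
    lines.append("Test scene:")
    lines += [pose_to_text(n, p) for n, p in test_poses.items()]
    lines.append("Predict the actions:")
    return "\n".join(lines)
```

The resulting string would be sent to a text-only LLM, whose completion is parsed back into numeric actions; the key point is that no model training is involved, only prompt construction.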
Related papers
- ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning [74.58666091522198]
We present a framework for intuitive robot programming by non-experts.
We leverage natural language prompts and contextual information from the Robot Operating System (ROS).
Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface.
arXiv Detail & Related papers (2024-06-28T08:28:38Z)
- CLMASP: Coupling Large Language Models with Answer Set Programming for Robotic Task Planning [9.544073786800706]
Large Language Models (LLMs) possess extensive foundational knowledge and moderate reasoning abilities.
It is challenging to ground an LLM-generated plan so that it is executable on a specified robot with certain restrictions.
This paper introduces CLMASP, an approach that couples LLMs with Answer Set Programming (ASP) to overcome the limitations.
arXiv Detail & Related papers (2024-06-05T15:21:44Z)
- The Strong Pull of Prior Knowledge in Large Language Models and Its Impact on Emotion Recognition [74.04775677110179]
In-context Learning (ICL) has emerged as a powerful paradigm for performing natural language tasks with Large Language Models (LLMs).
We show that LLMs have strong yet inconsistent priors in emotion recognition that ossify their predictions.
Our results suggest that caution is needed when using ICL with larger LLMs for affect-centered tasks outside their pre-training domain.
arXiv Detail & Related papers (2024-03-25T19:07:32Z)
- Object-Centric Instruction Augmentation for Robotic Manipulation [29.491990994901666]
We introduce the Object-Centric Instruction Augmentation (OCI) framework to augment highly semantic and information-dense language instructions with position cues.
We utilize a Multi-modal Large Language Model (MLLM) to weave knowledge of object locations into natural language instruction.
We demonstrate that robotic manipulator imitation policies trained with our enhanced instructions outperform those relying solely on traditional language instructions.
arXiv Detail & Related papers (2024-01-05T13:54:45Z)
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
- Exploring In-Context Learning of Textless Speech Language Model for Speech Classification Tasks [98.5311231450689]
In-context learning (ICL) has played an essential role in utilizing large language models (LLMs).
This study is the first work to explore ICL for speech classification tasks with a textless speech LM.
arXiv Detail & Related papers (2023-10-19T05:31:45Z)
- Language Models as Zero-Shot Trajectory Generators [10.572264780575564]
Large Language Models (LLMs) have recently shown promise as high-level planners for robots.
It is often assumed that LLMs do not possess sufficient knowledge to be used for the low-level trajectories themselves.
This work investigates if an LLM can directly predict a dense sequence of end-effector poses for manipulation tasks.
arXiv Detail & Related papers (2023-10-17T21:57:36Z)
- ProgPrompt: Generating Situated Robot Task Plans using Large Language Models [68.57918965060787]
Large language models (LLMs) can be used to score potential next actions during task planning.
We present a programmatic LLM prompt structure that enables plan generation that functions across situated environments.
arXiv Detail & Related papers (2022-09-22T20:29:49Z)
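The programmatic prompt idea behind ProgPrompt can be illustrated with a small sketch: the scene and example plans are rendered as Python-like source text, and the LLM is asked to complete a new function body. The primitive action names, object list, and template layout below are assumptions for illustration, not ProgPrompt's actual format:

```python
def build_progprompt(objects, available_actions, example_plans, task_name):
    """Render the scene and example plans as Python-like source text that
    an LLM is asked to continue with a plan for `task_name`."""
    # Declare the action primitives as comments so the LLM knows its vocabulary.
    lines = [f"# available action: {a}" for a in available_actions]
    # Declare the objects present in the scene as a Python list literal.
    lines.append(f"objects = {objects!r}")
    # Include worked example plans as complete function definitions.
    for plan in example_plans:
        lines.append("")
        lines.append(plan)
    # End with an open function stub for the LLM to complete.
    lines.append("")
    lines.append(f"def {task_name}():")
    return "\n".join(lines)
```

Expressing the prompt as code lets the LLM's completion double as an executable plan, with action calls that can be checked against the declared primitives.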
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.