Prompt a Robot to Walk with Large Language Models
- URL: http://arxiv.org/abs/2309.09969v3
- Date: Tue, 15 Oct 2024 17:04:20 GMT
- Title: Prompt a Robot to Walk with Large Language Models
- Authors: Yen-Jen Wang, Bike Zhang, Jianyu Chen, Koushil Sreenath
- Abstract summary: Large language models (LLMs) pre-trained on vast internet-scale data have showcased remarkable capabilities across diverse domains.
We introduce a novel paradigm in which we use few-shot prompts collected from the physical environment.
Experiments across various robots and environments validate that our method can effectively prompt a robot to walk.
- Score: 18.214609570837403
- Abstract: Large language models (LLMs) pre-trained on vast internet-scale data have showcased remarkable capabilities across diverse domains. Recently, there has been escalating interest in deploying LLMs for robotics, aiming to harness the power of foundation models in real-world settings. However, this approach faces significant challenges, particularly in grounding these models in the physical world and in generating dynamic robot motions. To address these issues, we introduce a novel paradigm in which we use few-shot prompts collected from the physical environment, enabling the LLM to autoregressively generate low-level control commands for robots without task-specific fine-tuning. Experiments across various robots and environments validate that our method can effectively prompt a robot to walk. We thus illustrate how LLMs can proficiently function as low-level feedback controllers for dynamic motion control even in high-dimensional robotic systems. The project website and source code can be found at: https://prompt2walk.github.io/ .
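The loop the abstract describes (few-shot observation/action examples collected from the environment, with the LLM queried at every control step for low-level joint targets) can be pictured with a short sketch. The code below is a minimal illustration under stated assumptions, not the paper's implementation: `call_llm`, `read_joint_observation`, and `apply_joint_targets` are hypothetical placeholders, and the prompt wording, number formatting, and history handling are illustrative choices.

```python
from typing import List, Tuple

# Hypothetical placeholders: swap in a real LLM client and a robot/simulator API.
def call_llm(prompt: str) -> str:
    """Send the prompt to an LLM and return its text completion."""
    raise NotImplementedError("plug in your LLM client here")

def read_joint_observation() -> List[float]:
    """Return the robot's current proprioceptive state (e.g., joint angles)."""
    raise NotImplementedError("plug in your robot or simulator here")

def apply_joint_targets(targets: List[float]) -> None:
    """Send low-level joint position targets to the robot."""
    raise NotImplementedError("plug in your robot or simulator here")

def format_pair(obs: List[float], act: List[float]) -> str:
    """Render one (observation, action) example as plain text for the prompt."""
    obs_txt = " ".join(f"{x:.2f}" for x in obs)
    act_txt = " ".join(f"{x:.2f}" for x in act)
    return f"observation: {obs_txt}\naction: {act_txt}\n"

def control_step(few_shot_pairs: List[Tuple[List[float], List[float]]],
                 history: List[Tuple[List[float], List[float]]],
                 num_joints: int) -> List[float]:
    """One control step: build a text prompt from the collected examples plus
    the recent observation/action history, then ask the LLM for the next action."""
    obs = read_joint_observation()
    prompt = (
        "You are a low-level walking controller. Given observations, output the "
        "next joint position targets as space-separated numbers.\n\n"
        + "".join(format_pair(o, a) for o, a in few_shot_pairs)
        + "".join(format_pair(o, a) for o, a in history)
        + "observation: " + " ".join(f"{x:.2f}" for x in obs) + "\naction:"
    )
    reply = call_llm(prompt)
    targets = [float(tok) for tok in reply.split()[:num_joints]]
    apply_joint_targets(targets)
    history.append((obs, targets))
    return targets
```

A caller would gather a handful of (observation, action) pairs from the environment, for example from an existing controller, and then invoke `control_step` at each control tick; the function appends its result to `history` so later queries condition on recent behavior.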
Related papers
- $π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge.
We evaluate our model on its ability to perform tasks zero-shot after pre-training, to follow language instructions from people, and to acquire new skills via fine-tuning.
arXiv Detail & Related papers (2024-10-31T17:22:30Z)
- Solving Robotics Problems in Zero-Shot with Vision-Language Models [0.0]
We introduce Wonderful Team, a multi-agent Vision Large Language Model (VLLM) framework designed to solve robotics problems in a zero-shot regime.
In our context, zero-shot means that for a novel environment, we provide a VLLM with an image of the robot's surroundings and a task description.
Our system showcases the ability to handle diverse tasks such as manipulation, goal-reaching, and visual reasoning -- all in a zero-shot manner.
arXiv Detail & Related papers (2024-07-26T21:18:57Z)
- Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models [81.55156507635286]
Legged robots are physically capable of navigating a diverse set of environments and overcoming a wide range of obstructions.
Current learning methods often struggle with generalization to the long tail of unexpected situations without heavy human supervision.
We propose a system, VLM-Predictive Control (VLM-PC), combining two key components that we find to be crucial for eliciting on-the-fly, adaptive behavior selection.
arXiv Detail & Related papers (2024-07-02T21:00:30Z)
- Enhancing the LLM-Based Robot Manipulation Through Human-Robot Collaboration [4.2460673279562755]
Large Language Models (LLMs) are gaining popularity in the field of robotics.
This paper proposes a novel approach to enhance the performance of LLM-based autonomous manipulation through Human-Robot Collaboration (HRC).
The approach involves using a prompted GPT-4 language model to decompose high-level language commands into sequences of motions that can be executed by the robot.
arXiv Detail & Related papers (2024-06-20T08:23:49Z)
- LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning [50.99807031490589]
We introduce LLARVA, a model trained with a novel instruction tuning method to unify a range of robotic learning tasks, scenarios, and environments.
We generate 8.5M image-visual trace pairs from the Open X-Embodiment dataset in order to pre-train our model.
Experiments demonstrate that LLARVA performs well compared to several contemporary baselines.
arXiv Detail & Related papers (2024-06-17T17:55:29Z)
- RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation [77.41969287400977]
This paper presents RobotScript, a platform for a deployable robot manipulation pipeline powered by code generation.
We also present a benchmark for code generation for robot manipulation tasks specified in free-form natural language.
We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z)
- InCoRo: In-Context Learning for Robotics Control with Feedback Loops [4.702566749969133]
InCoRo is a system that uses a classical robotic feedback loop composed of an LLM controller, a scene understanding unit, and a robot.
We highlight the generalization capabilities of our system and show that InCoRo surpasses the prior art in terms of the success rate.
This research paves the way towards building reliable, efficient, intelligent autonomous systems that adapt to dynamic environments.
arXiv Detail & Related papers (2024-02-07T19:01:11Z)
- AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents [109.3804962220498]
AutoRT is a system to scale up the deployment of operational robots in completely unseen scenarios with minimal human supervision.
We demonstrate AutoRT proposing instructions to over 20 robots across multiple buildings and collecting 77k real robot episodes via both teleoperation and autonomous robot policies.
We experimentally show that such "in-the-wild" data collected by AutoRT is significantly more diverse, and that AutoRT's use of LLMs allows instruction-following data-collection robots to align with human preferences.
arXiv Detail & Related papers (2024-01-23T18:45:54Z)
- WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model [92.90127398282209]
This paper investigates the potential of integrating recent Large Language Models (LLMs) with an existing visual grounding and robotic grasping system.
We introduce WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration.
We deploy this LLM-empowered system on the physical robot to provide a more user-friendly interface for the instruction-guided grasping task.
arXiv Detail & Related papers (2023-08-30T11:35:21Z)
- Language to Rewards for Robotic Skill Synthesis [37.21434094015743]
We introduce a new paradigm that harnesses large language models (LLMs) to define reward parameters that can be optimized to accomplish a variety of robotic tasks.
Using reward as the intermediate interface generated by LLMs, we can effectively bridge the gap between high-level language instructions or corrections and low-level robot actions (a minimal sketch of this reward-as-interface idea appears after this list).
arXiv Detail & Related papers (2023-06-14T17:27:10Z)
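In contrast to the low-level-control setting of the main paper, the Language to Rewards entry above uses reward parameters as the interface between language and motion. Below is a minimal sketch of that reward-as-interface idea under stated assumptions: `call_llm` is a hypothetical placeholder for an LLM client, and the reward term names are illustrative rather than taken from the paper.

```python
import json

# Hypothetical placeholder: replace with a real LLM client call.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

# Illustrative reward terms; a real controller would expose its own set.
REWARD_TERMS = ["forward_velocity", "torso_height", "energy_penalty"]

def reward_weights_from_instruction(instruction: str) -> dict:
    """Ask the LLM to translate a natural-language instruction into weights
    for a fixed set of reward terms; the weights are then handed to whatever
    optimizer or controller actually produces the low-level motion."""
    prompt = (
        "Translate the instruction into reward weights.\n"
        f"Instruction: {instruction}\n"
        "Respond with a JSON object whose keys are exactly "
        f"{REWARD_TERMS} and whose values are floats."
    )
    weights = json.loads(call_llm(prompt))
    return {term: float(weights[term]) for term in REWARD_TERMS}
```

The LLM never emits joint commands here; a downstream optimizer or controller consumes the returned weights and produces the actual motion, which is the bridging role described in that entry.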