WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model
- URL: http://arxiv.org/abs/2308.15962v2
- Date: Thu, 31 Aug 2023 13:51:56 GMT
- Title: WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model
- Authors: Tianyu Wang, Yifan Li, Haitao Lin, Xiangyang Xue, Yanwei Fu
- Abstract summary: This paper investigates the potential of integrating recent Large Language Models (LLMs) with an existing visual grounding and robotic grasping system.
We introduce WALL-E (Embodied Robotic WAiter Load Lifting with Large Language Model) as an example of this integration.
We deploy this LLM-empowered system on a physical robot to provide a more user-friendly interface for the instruction-guided grasping task.
- Score: 92.90127398282209
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Enabling robots to understand language instructions and react accordingly based on visual perception has been a long-standing goal in the robotics research community. Achieving this goal requires cutting-edge advances in natural language processing, computer vision, and robotics engineering. This paper therefore investigates the potential of integrating recent Large Language Models (LLMs) with an existing visual grounding and robotic grasping system to enhance the effectiveness of human-robot interaction. We introduce WALL-E (Embodied Robotic WAiter Load Lifting with Large Language Model) as an example of this integration. The system uses ChatGPT to summarize the user's preferred object into a target instruction through multi-round interactive dialogue. The target instruction is then forwarded to a visual grounding system for object pose and size estimation, after which the robot grasps the object accordingly. We deploy this LLM-empowered system on a physical robot to provide a more user-friendly interface for the instruction-guided grasping task. Experimental results in various real-world scenarios demonstrate the feasibility and efficacy of the proposed framework. See the project website at:
https://star-uu-wang.github.io/WALL-E/
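The abstract outlines a three-stage pipeline: an LLM condenses a multi-round dialogue into a target instruction, a visual grounding module estimates the object's pose and size, and the robot then executes the grasp. The sketch below is a minimal illustration of that flow under assumed interfaces; `chat_llm`, `VisualGrounder`, and `RobotArm` are hypothetical placeholders, not the authors' actual code or APIs.

```python
# Minimal sketch of a WALL-E-style pipeline as described in the abstract.
# chat_llm, VisualGrounder, and RobotArm are hypothetical stand-ins; the
# paper uses ChatGPT plus its own visual grounding and grasping stack.
from dataclasses import dataclass


@dataclass
class Grasp:
    position: tuple  # estimated (x, y, z) object centre
    size: tuple      # estimated (w, h, d) bounding box


def chat_llm(history: list[str]) -> str:
    """Placeholder for an LLM call that returns the next assistant turn."""
    raise NotImplementedError


def summarize_target(user_turns: list[str]) -> str:
    """Stage 1: fold a multi-round dialogue into one target instruction."""
    history = ["system: Summarize the object the user wants as a short phrase."]
    for turn in user_turns:
        history.append(f"user: {turn}")
        history.append(f"assistant: {chat_llm(history)}")
    # Treat the final assistant turn as the distilled target instruction.
    return history[-1].removeprefix("assistant: ")


class VisualGrounder:
    """Placeholder for the visual grounding and pose/size estimation system."""
    def locate(self, instruction: str, rgbd_image) -> Grasp:
        raise NotImplementedError


class RobotArm:
    """Placeholder for the physical grasping interface."""
    def grasp(self, target: Grasp) -> bool:
        raise NotImplementedError


def waiter_pipeline(user_turns, rgbd_image, grounder: VisualGrounder, arm: RobotArm) -> bool:
    instruction = summarize_target(user_turns)         # stage 1: LLM dialogue
    target = grounder.locate(instruction, rgbd_image)  # stage 2: grounding
    return arm.grasp(target)                           # stage 3: grasp
```

In the real system, `chat_llm` would wrap a ChatGPT call and `VisualGrounder` would wrap the paper's grounding and pose/size estimation stack; the stubs here only fix the data flow between the three stages.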
Related papers
- $π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge.
We evaluate our model on its ability to perform tasks zero-shot after pre-training, to follow language instructions from people, and to acquire new skills via fine-tuning.
arXiv Detail & Related papers (2024-10-31T17:22:30Z)
- Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models [53.22792173053473]
We introduce an interactive robotic manipulation framework called Polaris.
Polaris integrates perception and interaction by utilizing GPT-4 alongside grounded vision models.
We propose a novel Synthetic-to-Real (Syn2Real) pose estimation pipeline.
arXiv Detail & Related papers (2024-08-15T06:40:38Z)
- RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation [77.41969287400977]
This paper presents RobotScript, a platform for a deployable robot manipulation pipeline powered by code generation.
We also present a benchmark for code generation for robot manipulation tasks specified in free-form natural language.
We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z)
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
- Large Language Models for Robotics: A Survey [40.76581696885846]
Large language models (LLMs) possess the ability to process and generate natural language, facilitating efficient interaction and collaboration with robots.
This review aims to summarize the applications of LLMs in robotics, delving into their impact and contributions to key areas such as robot control, perception, decision-making, and path planning.
arXiv Detail & Related papers (2023-11-13T10:46:35Z)
- Prompt a Robot to Walk with Large Language Models [18.214609570837403]
Large language models (LLMs) pre-trained on vast internet-scale data have showcased remarkable capabilities across diverse domains.
We introduce a novel paradigm in which we use few-shot prompts collected from the physical environment.
Experiments across various robots and environments validate that our method can effectively prompt a robot to walk.
arXiv Detail & Related papers (2023-09-18T17:50:17Z)
- PROGrasp: Pragmatic Human-Robot Communication for Object Grasping [22.182690439449278]
Interactive Object Grasping (IOG) is the task of identifying and grasping the desired object via human-robot natural language interaction.
Inspired by pragmatics, we introduce a new IOG task, Pragmatic-IOG, and the corresponding dataset, Intention-oriented Multi-modal Dialogue (IM-Dial).
PROGrasp performs Pragmatic-IOG by incorporating modules for visual grounding, question asking, object grasping and, most importantly, answer interpretation for pragmatic inference.
arXiv Detail & Related papers (2023-09-14T14:45:47Z)
- Incremental Learning of Humanoid Robot Behavior from Natural Interaction and Large Language Models [23.945922720555146]
We propose a system to achieve incremental learning of complex behavior from natural interaction.
We integrate the system into the robot cognitive architecture of the humanoid robot ARMAR-6.
arXiv Detail & Related papers (2023-09-08T13:29:05Z)
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control [140.48218261864153]
We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control.
Our approach leads to performant robotic policies and enables RT-2 to obtain a range of emergent capabilities from Internet-scale training.
arXiv Detail & Related papers (2023-07-28T21:18:02Z)