LLM as A Robotic Brain: Unifying Egocentric Memory and Control
- URL: http://arxiv.org/abs/2304.09349v4
- Date: Mon, 12 Jun 2023 14:07:42 GMT
- Title: LLM as A Robotic Brain: Unifying Egocentric Memory and Control
- Authors: Jinjie Mai, Jun Chen, Bing Li, Guocheng Qian, Mohamed Elhoseiny,
Bernard Ghanem
- Abstract summary: Embodied AI focuses on the study and development of intelligent systems that possess a physical or virtual embodiment (i.e. robots).
Memory and control are the two essential parts of an embodied system and usually require separate frameworks to model each of them.
We propose a novel framework called LLM-Brain: using Large-scale Language Model as a robotic brain to unify egocentric memory and control.
- Score: 77.0899374628474
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embodied AI focuses on the study and development of intelligent systems that
possess a physical or virtual embodiment (i.e. robots) and are able to
dynamically interact with their environment. Memory and control are the two
essential parts of an embodied system and usually require separate frameworks
to model each of them. In this paper, we propose a novel and generalizable
framework called LLM-Brain: using Large-scale Language Model as a robotic brain
to unify egocentric memory and control. The LLM-Brain framework integrates
multiple multimodal language models for robotic tasks, utilizing a zero-shot
learning approach. All components within LLM-Brain communicate using natural
language in closed-loop multi-round dialogues that encompass perception,
planning, control, and memory. The core of the system is an embodied LLM to
maintain egocentric memory and control the robot. We demonstrate LLM-Brain by
examining two downstream tasks: active exploration and embodied question
answering. The active exploration tasks require the robot to extensively
explore an unknown environment within a limited number of actions. Meanwhile,
the embodied question answering tasks necessitate that the robot answers
questions based on observations acquired during prior explorations.
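As a rough illustration of the framework described above, the following sketch shows how an embodied LLM could close the loop between perception, planning, control, and memory through a multi-round dialogue, with the dialogue history itself serving as the egocentric memory. It is a minimal sketch under assumed interfaces: describe_observation, chat_llm, and the Gym-style env are hypothetical placeholders, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of a closed-loop, multi-round dialogue
# in which an embodied LLM unifies egocentric memory and control.
from typing import Dict, List

MAX_STEPS = 30  # action budget for active exploration (value chosen arbitrarily)


def describe_observation(rgb_image) -> str:
    """Hypothetical captioning/VQA model that turns an egocentric view into text."""
    raise NotImplementedError


def chat_llm(messages: List[Dict[str, str]]) -> str:
    """Hypothetical call to a large language model acting as the robotic brain."""
    raise NotImplementedError


def explore(env) -> List[Dict[str, str]]:
    """Active exploration: the dialogue history doubles as egocentric memory."""
    messages = [{"role": "system",
                 "content": "You control a robot. Reply with exactly one action: "
                            "move_forward, turn_left, turn_right, or stop."}]
    obs = env.reset()  # assumes a Gym-style environment
    for _ in range(MAX_STEPS):
        # Perception -> natural language
        messages.append({"role": "user",
                         "content": f"You see: {describe_observation(obs)}"})
        # Planning and control: the LLM chooses the next action
        action = chat_llm(messages).strip()
        messages.append({"role": "assistant", "content": action})
        if action == "stop":
            break
        obs, _, done, _ = env.step(action)
        if done:
            break
    return messages  # egocentric memory, reusable for embodied question answering


def embodied_qa(memory: List[Dict[str, str]], question: str) -> str:
    """Answer a question using only the memory accumulated during exploration."""
    return chat_llm(memory + [{"role": "user", "content": question}])
```

Because the exploration transcript is retained verbatim, embodied question answering reduces to appending the question to the same conversation, which mirrors the unification of memory and control described in the abstract.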
Related papers
- $π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge.
We evaluate our model in terms of its ability to perform tasks in zero shot after pre-training, follow language instructions from people, and its ability to acquire new skills via fine-tuning.
arXiv Detail & Related papers (2024-10-31T17:22:30Z)
- Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models [81.55156507635286]
Legged robots are physically capable of navigating diverse environments and overcoming a wide range of obstructions.
Current learning methods often struggle with generalization to the long tail of unexpected situations without heavy human supervision.
We propose a system, VLM-Predictive Control (VLM-PC), combining two key components that we find to be crucial for eliciting on-the-fly, adaptive behavior selection.
arXiv Detail & Related papers (2024-07-02T21:00:30Z)
- MMRo: Are Multimodal LLMs Eligible as the Brain for In-Home Robotics? [33.573056018368504]
This study introduces MMRo, the first benchmark for evaluating Multimodal LLMs for Robotics.
We identify four essential capabilities that MLLMs must possess to qualify as the robot's central processing unit: perception, task planning, visual reasoning, and safety measurement.
Our findings indicate that no single model excels in all areas, suggesting that current MLLMs are not yet trustworthy enough to serve as the cognitive core for robots.
arXiv Detail & Related papers (2024-06-28T07:09:06Z)
- Imperative Learning: A Self-supervised Neural-Symbolic Learning Framework for Robot Autonomy [31.818923556912495]
We introduce a new self-supervised neural-symbolic (NeSy) computational framework, imperative learning (IL), for robot autonomy.
We formulate IL as a special bilevel optimization (BLO), which enables reciprocal learning over its three modules.
We show that IL can significantly enhance robot autonomy capabilities and we anticipate that it will catalyze further research across diverse domains.
arXiv Detail & Related papers (2024-06-23T12:02:17Z)
- Large Language Models for Robotics: Opportunities, Challenges, and Perspectives [46.57277568357048]
Large language models (LLMs) have undergone significant expansion and have been increasingly integrated across various domains.
For embodied tasks, where robots interact with complex environments, text-only LLMs often face challenges due to a lack of compatibility with robotic visual perception.
We propose a framework that utilizes multimodal GPT-4V to enhance embodied task planning through the combination of natural language instructions and robot visual perceptions.
arXiv Detail & Related papers (2024-01-09T03:22:16Z)
- Large Language Models for Robotics: A Survey [40.76581696885846]
Large language models (LLMs) possess the ability to process and generate natural language, facilitating efficient interaction and collaboration with robots.
This review aims to summarize the applications of LLMs in robotics, delving into their impact and contributions to key areas such as robot control, perception, decision-making, and path planning.
arXiv Detail & Related papers (2023-11-13T10:46:35Z)
- WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model [92.90127398282209]
This paper investigates the potential of integrating the most recent Large Language Models (LLMs) with an existing visual grounding and robotic grasping system.
We introduce WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration.
We deploy this LLM-empowered system on the physical robot to provide a more user-friendly interface for the instruction-guided grasping task.
arXiv Detail & Related papers (2023-08-30T11:35:21Z)
- Chat with the Environment: Interactive Multimodal Perception Using Large Language Models [19.623070762485494]
Large Language Models (LLMs) have shown remarkable reasoning ability in few-shot robotic planning.
Our study demonstrates that LLMs can provide high-level planning and reasoning skills and control interactive robot behavior in a multimodal environment.
arXiv Detail & Related papers (2023-03-14T23:01:27Z)
- Inner Monologue: Embodied Reasoning through Planning with Language Models [81.07216635735571]
Large Language Models (LLMs) can be applied to domains beyond natural language processing.
When planning in embodied environments, LLMs need to consider not just which skills to perform, but also how and when to perform them.
We propose that by leveraging environment feedback, LLMs are able to form an inner monologue that allows them to more richly process and plan in robotic control scenarios.
arXiv Detail & Related papers (2022-07-12T15:20:48Z)
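To make the inner-monologue idea from the last entry concrete, here is a small illustrative sketch of how textual environment feedback (scene descriptions and success detection) might be folded back into the LLM's prompt between skill executions. The functions llm_complete, execute_skill, and describe_scene are assumptions for illustration, not the paper's actual system.

```python
# Illustrative sketch (assumed interfaces, not the paper's implementation) of
# closed-loop planning with textual environment feedback, in the spirit of
# Inner Monologue.

def llm_complete(prompt: str) -> str:
    """Hypothetical text-completion call to a large language model."""
    raise NotImplementedError


def execute_skill(robot, skill: str) -> bool:
    """Hypothetical low-level skill executor; returns whether the skill succeeded."""
    raise NotImplementedError


def describe_scene(robot) -> str:
    """Hypothetical scene describer (e.g. an object detector) producing text feedback."""
    raise NotImplementedError


def run_task(robot, instruction: str, max_steps: int = 20) -> str:
    # The growing prompt acts as the robot's "inner monologue".
    monologue = f"Task: {instruction}\n"
    for _ in range(max_steps):
        monologue += f"Scene: {describe_scene(robot)}\n"
        skill = llm_complete(monologue + "Robot action:").strip()
        monologue += f"Robot action: {skill}\n"
        if skill == "done":
            break
        # Success/failure feedback lets the LLM retry or re-plan on the next round.
        monologue += f"Success: {execute_skill(robot, skill)}\n"
    return monologue
```

Feeding scene and success/failure feedback back into the prompt is what distinguishes this closed-loop use of the LLM from open-loop plan generation.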