A Survey on Robotics with Foundation Models: toward Embodied AI
- URL: http://arxiv.org/abs/2402.02385v1
- Date: Sun, 4 Feb 2024 07:55:01 GMT
- Title: A Survey on Robotics with Foundation Models: toward Embodied AI
- Authors: Zhiyuan Xu, Kun Wu, Junjie Wen, Jinming Li, Ning Liu, Zhengping Che,
Jian Tang
- Abstract summary: Recent advances in computer vision, natural language processing, and multi-modality learning have shown that foundation models exhibit superhuman capabilities on specific tasks.
This survey aims to provide a comprehensive and up-to-date overview of foundation models in robotics, focusing on autonomous manipulation and encompassing high-level planning and low-level control.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While the exploration of embodied AI has spanned multiple decades, it
remains a persistent challenge to endow agents with human-level intelligence,
including perception, learning, reasoning, decision-making, control, and
generalization capabilities, so that they can perform general-purpose tasks in
open, unstructured, and dynamic environments. Recent advances in computer
vision, natural language processing, and multi-modality learning have shown
that foundation models exhibit superhuman capabilities on specific tasks.
They not only provide a solid cornerstone for integrating basic modules into
embodied AI systems but also shed light on how to scale up robot learning from
a methodological perspective. This survey aims to provide a comprehensive and
up-to-date overview of foundation models in robotics, focusing on autonomous
manipulation and encompassing high-level planning and low-level control.
Moreover, we showcase their commonly used datasets, simulators, and benchmarks.
Importantly, we emphasize the critical challenges intrinsic to this field and
delineate potential avenues for future research, contributing to advancing the
frontier of academic and industrial discourse.
Related papers
- $π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge.
We evaluate our model on its ability to perform tasks zero-shot after pre-training, to follow language instructions from people, and to acquire new skills via fine-tuning.
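To make the idea concrete, below is a minimal, hypothetical sketch of conditional flow matching for action generation on top of VLM features; the module names, dimensions, and interfaces are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: a flow-matching action head conditioned on
# (hypothetical) VLM features. Not the pi_0 architecture.
import torch
import torch.nn as nn

class FlowMatchingActionHead(nn.Module):
    def __init__(self, vlm_dim=512, action_dim=7, hidden=256):
        super().__init__()
        # Predicts a velocity field from VLM features, the noisy action,
        # and the flow time t in [0, 1].
        self.net = nn.Sequential(
            nn.Linear(vlm_dim + action_dim + 1, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, vlm_feat, noisy_action, t):
        return self.net(torch.cat([vlm_feat, noisy_action, t], dim=-1))

def flow_matching_loss(head, vlm_feat, action):
    # Linear path between Gaussian noise and the expert action; the
    # regression target is the constant velocity (action - noise).
    noise = torch.randn_like(action)
    t = torch.rand(action.shape[0], 1)
    noisy = (1 - t) * noise + t * action
    target_velocity = action - noise
    return ((head(vlm_feat, noisy, t) - target_velocity) ** 2).mean()

@torch.no_grad()
def sample_action(head, vlm_feat, steps=10, action_dim=7):
    # Euler integration of the learned velocity field from noise to an action.
    a = torch.randn(vlm_feat.shape[0], action_dim)
    for i in range(steps):
        t = torch.full((vlm_feat.shape[0], 1), i / steps)
        a = a + head(vlm_feat, a, t) / steps
    return a
```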
arXiv Detail & Related papers (2024-10-31T17:22:30Z)
- Grounding Robot Policies with Visuomotor Language Guidance [15.774237279917594]
We propose an agent-based framework for grounding robot policies to the current context.
The proposed framework is composed of a set of conversational agents designed for specific roles.
We demonstrate that our approach can effectively guide manipulation policies to achieve significantly higher success rates.
arXiv Detail & Related papers (2024-10-09T02:00:37Z)
- A Survey on Vision-Language-Action Models for Embodied AI [71.16123093739932]
Vision-language-action models (VLAs) have become a foundational element in robot learning.
Various methods have been proposed to enhance traits such as versatility, dexterity, and generalizability.
VLAs serve as high-level task planners capable of decomposing long-horizon tasks into executable subtasks.
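As a toy illustration of this planner role, the following hypothetical sketch prompts a language model to decompose a long-horizon instruction into subtask strings; the prompt format and the query_model callable are assumptions for illustration, not an interface from the survey.

```python
# Illustrative sketch: a (vision-)language model used as a high-level
# planner that decomposes a long-horizon task into executable subtasks.
from typing import Callable, List

PLAN_PROMPT = (
    "Decompose the task into short, executable robot subtasks, one per line.\n"
    "Task: {task}\nSubtasks:"
)

def plan_subtasks(task: str, query_model: Callable[[str], str]) -> List[str]:
    # Query the model for a plan and parse one subtask per non-empty line.
    response = query_model(PLAN_PROMPT.format(task=task))
    return [line.strip("- ").strip() for line in response.splitlines() if line.strip()]

if __name__ == "__main__":
    # Stubbed model standing in for a real VLA/LLM backend.
    fake_model = lambda prompt: "- pick up the cup\n- place it under the tap\n- turn on the tap"
    print(plan_subtasks("fill the cup with water", fake_model))
```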
arXiv Detail & Related papers (2024-05-23T01:43:54Z)
- Position Paper: Agent AI Towards a Holistic Intelligence [53.35971598180146]
We emphasize developing Agent AI -- an embodied system that integrates large foundation models into agent actions.
In this paper, we propose the Agent Foundation Model, a novel large action model for achieving embodied intelligent behavior.
arXiv Detail & Related papers (2024-02-28T16:09:56Z)
- Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis [82.59451639072073]
General-purpose robots operate seamlessly in any environment, with any object, and utilize various skills to complete diverse tasks.
As a community, we have been constraining most robotic systems by designing them for specific tasks, training them on specific datasets, and deploying them within specific environments.
Motivated by the impressive open-set performance and content generation capabilities of web-scale, large-capacity pre-trained models, we devote this survey to exploring how foundation models can be applied to general-purpose robotics.
arXiv Detail & Related papers (2023-12-14T10:02:55Z)
- The Future of Fundamental Science Led by Generative Closed-Loop Artificial Intelligence [67.70415658080121]
Recent advances in machine learning and AI are disrupting technological innovation, product development, and society as a whole.
AI has contributed less to fundamental science, in part because the large, high-quality datasets needed for scientific practice and model discovery are more difficult to access.
Here we explore and investigate aspects of an AI-driven, automated, closed-loop approach to scientific discovery.
arXiv Detail & Related papers (2023-07-09T21:16:56Z)
- Towards Generalist Robots: A Promising Paradigm via Generative Simulation [18.704506851738365]
This document serves as a position paper that outlines the authors' vision for a potential pathway towards generalist robots.
The authors believe the proposed paradigm is a feasible path towards accomplishing the long-standing goal of robotics research.
arXiv Detail & Related papers (2023-05-17T02:53:58Z)
- WenLan 2.0: Make AI Imagine via a Multimodal Foundation Model [74.4875156387271]
We develop a novel foundation model pre-trained on large-scale multimodal (visual and textual) data.
We show that state-of-the-art results can be obtained on a wide range of downstream tasks.
arXiv Detail & Related papers (2021-10-27T12:25:21Z)
- Towards open and expandable cognitive AI architectures for large-scale multi-agent human-robot collaborative learning [5.478764356647437]
A novel cognitive architecture for multi-agent LfD robotic learning is introduced, aiming to enable the reliable deployment of open, scalable, and expandable robotic systems.
The conceptualization relies on employing multiple AI-empowered cognitive processes that operate at the edge nodes of a network of robotic platforms.
The applicability of the proposed framework is explained using an example of a real-world industrial case study.
arXiv Detail & Related papers (2020-12-15T09:49:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.