A Survey on Robotics with Foundation Models: toward Embodied AI
- URL: http://arxiv.org/abs/2402.02385v1
- Date: Sun, 4 Feb 2024 07:55:01 GMT
- Title: A Survey on Robotics with Foundation Models: toward Embodied AI
- Authors: Zhiyuan Xu, Kun Wu, Junjie Wen, Jinming Li, Ning Liu, Zhengping Che,
Jian Tang
- Abstract summary: Recent advances in computer vision, natural language processing, and multi-modality learning have shown that foundation models have superhuman capabilities for specific tasks.
This survey aims to provide a comprehensive and up-to-date overview of foundation models in robotics, focusing on autonomous manipulation and encompassing high-level planning and low-level control.
- Score: 30.999414445286757
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While the exploration for embodied AI has spanned multiple decades, it
remains a persistent challenge to endow agents with human-level intelligence,
including perception, learning, reasoning, decision-making, control, and
generalization capabilities, so that they can perform general-purpose tasks in
open, unstructured, and dynamic environments. Recent advances in computer
vision, natural language processing, and multi-modality learning have shown
that foundation models have superhuman capabilities for specific tasks.
They not only provide a solid cornerstone for integrating basic modules into
embodied AI systems but also shed light on how to scale up robot learning from
a methodological perspective. This survey aims to provide a comprehensive and
up-to-date overview of foundation models in robotics, focusing on autonomous
manipulation and encompassing high-level planning and low-level control.
Moreover, we showcase their commonly used datasets, simulators, and benchmarks.
Importantly, we emphasize the critical challenges intrinsic to this field and
delineate potential avenues for future research, contributing to advancing the
frontier of academic and industrial discourse.
Related papers
- Foundation Models for Autonomous Robots in Unstructured Environments [15.517532442044962]
The study systematically reviews the application of foundation models in two fields: robotics and unstructured environments.
Findings showed that the linguistic capabilities of LLMs have been utilized more than their other features to improve perception in human-robot interactions.
LLMs have seen more applications in project management and safety in construction, and in natural hazard detection in disaster management.
arXiv Detail & Related papers (2024-07-19T13:26:52Z) - A Survey on Vision-Language-Action Models for Embodied AI [71.16123093739932]
Vision-language-action models (VLAs) have become a foundational element in robot learning.
Various methods have been proposed to enhance traits such as versatility, dexterity, and generalizability.
VLAs serve as high-level task planners capable of decomposing long-horizon tasks into executable subtasks.
arXiv Detail & Related papers (2024-05-23T01:43:54Z) - Position Paper: Agent AI Towards a Holistic Intelligence [53.35971598180146]
We emphasize developing Agent AI -- an embodied system that integrates large foundation models into agent actions.
In this paper, we propose a novel large action model to achieve embodied intelligent behavior, the Agent Foundation Model.
arXiv Detail & Related papers (2024-02-28T16:09:56Z) - Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis [73.89558418030418]
Most existing robotic systems have been designed for specific tasks, trained on specific datasets, and deployed within specific environments.
Motivated by the impressive open-set performance and content generation capabilities of web-scale, large-capacity pre-trained models, we devote this survey to exploring how foundation models can be applied to robotics.
arXiv Detail & Related papers (2023-12-14T10:02:55Z) - The Future of Fundamental Science Led by Generative Closed-Loop Artificial Intelligence [67.70415658080121]
Recent advances in machine learning and AI are disrupting technological innovation, product development, and society as a whole.
AI has contributed less to fundamental science, in part because large, high-quality datasets for scientific practice and model discovery are more difficult to access.
Here we explore and investigate aspects of an AI-driven, automated, closed-loop approach to scientific discovery.
arXiv Detail & Related papers (2023-07-09T21:16:56Z) - Towards Generalist Robots: A Promising Paradigm via Generative Simulation [18.704506851738365]
This document serves as a position paper that outlines the authors' vision for a potential pathway towards generalist robots.
The authors believe the proposed paradigm is a feasible path towards accomplishing the long-standing goal of robotics research.
arXiv Detail & Related papers (2023-05-17T02:53:58Z) - World Models and Predictive Coding for Cognitive and Developmental Robotics: Frontiers and Challenges [51.92834011423463]
We focus on the two concepts of world models and predictive coding.
In neuroscience, predictive coding proposes that the brain continuously predicts its inputs and adapts to model its own dynamics and control behavior in its environment.
arXiv Detail & Related papers (2023-01-14T06:38:14Z) - WenLan 2.0: Make AI Imagine via a Multimodal Foundation Model [74.4875156387271]
We develop a novel foundation model pre-trained on large-scale multimodal (visual and textual) data.
We show that state-of-the-art results can be obtained on a wide range of downstream tasks.
arXiv Detail & Related papers (2021-10-27T12:25:21Z) - From SLAM to Situational Awareness: Challenges and Survey [0.0]
The capability of a mobile robot to efficiently and safely perform complex missions is limited by its knowledge of the environment.
Advanced reasoning, decision-making, and execution skills enable an intelligent agent to act autonomously in unknown environments.
This paper investigates each aspect of Situational Awareness, surveying the state-of-the-art robotics algorithms that cover them.
arXiv Detail & Related papers (2021-10-01T09:00:34Z) - Towards open and expandable cognitive AI architectures for large-scale multi-agent human-robot collaborative learning [5.478764356647437]
A novel cognitive architecture for multi-agent LfD robotic learning is introduced, aiming to enable the reliable deployment of open, scalable, and expandable robotic systems.
The conceptualization relies on employing multiple AI-empowered cognitive processes that operate at the edge nodes of a network of robotic platforms.
The applicability of the proposed framework is explained using an example of a real-world industrial case study.
arXiv Detail & Related papers (2020-12-15T09:49:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.