A Survey on Robotics with Foundation Models: toward Embodied AI
- URL: http://arxiv.org/abs/2402.02385v1
- Date: Sun, 4 Feb 2024 07:55:01 GMT
- Title: A Survey on Robotics with Foundation Models: toward Embodied AI
- Authors: Zhiyuan Xu, Kun Wu, Junjie Wen, Jinming Li, Ning Liu, Zhengping Che,
Jian Tang
- Abstract summary: Recent advances in computer vision, natural language processing, and multi-modality learning have shown that foundation models have superhuman capabilities for specific tasks.
This survey aims to provide a comprehensive and up-to-date overview of foundation models in robotics, focusing on autonomous manipulation and encompassing high-level planning and low-level control.
- Score: 30.999414445286757
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While the exploration for embodied AI has spanned multiple decades, it
remains a persistent challenge to endow agents with human-level intelligence,
including perception, learning, reasoning, decision-making, control, and
generalization capabilities, so that they can perform general-purpose tasks in
open, unstructured, and dynamic environments. Recent advances in computer
vision, natural language processing, and multi-modality learning have shown
that foundation models have superhuman capabilities for specific tasks.
They not only provide a solid cornerstone for integrating basic modules into
embodied AI systems but also shed light on how to scale up robot learning from
a methodological perspective. This survey aims to provide a comprehensive and
up-to-date overview of foundation models in robotics, focusing on autonomous
manipulation and encompassing high-level planning and low-level control.
Moreover, we showcase their commonly used datasets, simulators, and benchmarks.
Importantly, we emphasize the critical challenges intrinsic to this field and
delineate potential avenues for future research, contributing to advancing the
frontier of academic and industrial discourse.
Related papers
- Foundation Models for Autonomous Robots in Unstructured Environments [15.517532442044962]
The study systematically reviews the application of foundation models in two fields: robotics and unstructured environments.
Findings showed that the linguistic capabilities of LLMs have been utilized more than their other features to improve perception in human-robot interactions.
LLMs have seen more applications in project management and safety in construction, and in natural hazard detection in disaster management.
arXiv Detail & Related papers (2024-07-19T13:26:52Z) - A Survey on Vision-Language-Action Models for Embodied AI [71.16123093739932]
Vision-language-action models (VLAs) have become a foundational element in robot learning.
Various methods have been proposed to enhance traits such as versatility, dexterity, and generalizability.
VLAs serve as high-level task planners capable of decomposing long-horizon tasks into executable subtasks.
arXiv Detail & Related papers (2024-05-23T01:43:54Z) - Position Paper: Agent AI Towards a Holistic Intelligence [53.35971598180146]
We emphasize developing Agent AI -- an embodied system that integrates large foundation models into agent actions.
In this paper, we propose a novel large action model to achieve embodied intelligent behavior, the Agent Foundation Model.
arXiv Detail & Related papers (2024-02-28T16:09:56Z) - Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis [73.89558418030418]
Most existing robotic systems have been designed for specific tasks, trained on specific datasets, and deployed within specific environments.
Motivated by the impressive open-set performance and content generation capabilities of web-scale, large-capacity pre-trained models, we devote this survey to exploring how foundation models can be applied to robotics.
arXiv Detail & Related papers (2023-12-14T10:02:55Z) - The Future of Fundamental Science Led by Generative Closed-Loop Artificial Intelligence [67.70415658080121]
Recent advances in machine learning and AI are disrupting technological innovation, product development, and society as a whole.
AI has contributed less to fundamental science, in part because large, high-quality datasets for scientific practice and model discovery are more difficult to access.
Here we explore and investigate aspects of an AI-driven, automated, closed-loop approach to scientific discovery.
arXiv Detail & Related papers (2023-07-09T21:16:56Z) - Towards Generalist Robots: A Promising Paradigm via Generative Simulation [18.704506851738365]
This document serves as a position paper that outlines the authors' vision for a potential pathway towards generalist robots.
The authors believe the proposed paradigm is a feasible path towards accomplishing the long-standing goal of robotics research.
arXiv Detail & Related papers (2023-05-17T02:53:58Z) - World Models and Predictive Coding for Cognitive and Developmental Robotics: Frontiers and Challenges [51.92834011423463]
We focus on the two concepts of world models and predictive coding.
In neuroscience, predictive coding proposes that the brain continuously predicts its inputs and adapts to model its own dynamics and control behavior in its environment.
arXiv Detail & Related papers (2023-01-14T06:38:14Z) - WenLan 2.0: Make AI Imagine via a Multimodal Foundation Model [74.4875156387271]
We develop a novel foundation model pre-trained on large-scale multimodal (visual and textual) data.
We show that state-of-the-art results can be obtained on a wide range of downstream tasks.
arXiv Detail & Related papers (2021-10-27T12:25:21Z) - From SLAM to Situational Awareness: Challenges and Survey [0.0]
The capability of a mobile robot to efficiently and safely perform complex missions is limited by its knowledge of the environment.
Advanced reasoning, decision-making, and execution skills enable an intelligent agent to act autonomously in unknown environments.
This paper investigates each aspect of Situational Awareness, surveying the state-of-the-art robotics algorithms that cover them.
arXiv Detail & Related papers (2021-10-01T09:00:34Z) - Towards open and expandable cognitive AI architectures for large-scale multi-agent human-robot collaborative learning [5.478764356647437]
A novel cognitive architecture for multi-agent LfD robotic learning is introduced, aiming to enable the reliable deployment of open, scalable, and expandable robotic systems.
The conceptualization relies on employing multiple AI-empowered cognitive processes that operate at the edge nodes of a network of robotic platforms.
The applicability of the proposed framework is explained using an example of a real-world industrial case study.
arXiv Detail & Related papers (2020-12-15T09:49:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.