Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis
- URL: http://arxiv.org/abs/2312.08782v3
- Date: Tue, 01 Oct 2024 08:54:53 GMT
- Title: Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis
- Authors: Yafei Hu, Quanting Xie, Vidhi Jain, Jonathan Francis, Jay Patrikar, Nikhil Keetha, Seungchan Kim, Yaqi Xie, Tianyi Zhang, Hao-Shu Fang, Shibo Zhao, Shayegan Omidshafiei, Dong-Ki Kim, Ali-akbar Agha-mohammadi, Katia Sycara, Matthew Johnson-Roberson, Dhruv Batra, Xiaolong Wang, Sebastian Scherer, Chen Wang, Zsolt Kira, Fei Xia, Yonatan Bisk
- Abstract summary: A general-purpose robot would operate seamlessly in any environment, with any object, and utilize various skills to complete diverse tasks.
As a community, we have been constraining most robotic systems by designing them for specific tasks, training them on specific datasets, and deploying them within specific environments.
Motivated by the impressive open-set performance and content generation capabilities of web-scale, large-capacity pre-trained models, we devote this survey to exploring how foundation models can be applied to general-purpose robotics.
- Score: 82.59451639072073
- Abstract: Building general-purpose robots that operate seamlessly in any environment, with any object, and that utilize various skills to complete diverse tasks has been a long-standing goal in Artificial Intelligence. However, as a community, we have been constraining most robotic systems by designing them for specific tasks, training them on specific datasets, and deploying them within specific environments. These systems require extensively labeled data and task-specific models. When deployed in real-world scenarios, such systems face several generalization issues and struggle to remain robust to distribution shifts. Motivated by the impressive open-set performance and content-generation capabilities of web-scale, large-capacity pre-trained models (i.e., foundation models) in research fields such as Natural Language Processing (NLP) and Computer Vision (CV), we devote this survey to exploring (i) how these existing foundation models from NLP and CV can be applied to the field of general-purpose robotics, and (ii) what a robotics-specific foundation model would look like. We begin by providing a generalized formulation of how foundation models are used in robotics and the fundamental barriers to making generalist robots universally applicable. Next, we establish a taxonomy to discuss current work that leverages existing foundation models for robotics and develops ones catered to robotics. Finally, we discuss key challenges and promising future directions in using foundation models to enable general-purpose robotic systems. We encourage readers to view our living GitHub repository of resources, including papers reviewed in this survey, as well as related projects and repositories for developing foundation models for robotics.
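The survey's generalized formulation treats a foundation model as one interchangeable module inside the robot's perception-planning-control loop. Below is a minimal Python sketch of that framing; every class, method, and plan string is a hypothetical placeholder for illustration, not an API defined by the survey.

```python
# Sketch: a pre-trained foundation model acts as a high-level planner,
# and a conventional low-level controller executes its plan steps.
# All names here are hypothetical placeholders.

from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    image: bytes          # raw camera frame
    instruction: str      # natural-language task description

class FoundationModelPlanner:
    """Wraps a pre-trained VLM/LLM as a high-level planner (stubbed)."""
    def plan(self, obs: Observation) -> List[str]:
        # In practice this would query a VLM with the image and
        # instruction; here we return a fixed symbolic plan.
        return ["move_to(object)", "grasp(object)", "lift()"]

class LowLevelController:
    """Task-specific skill library that executes symbolic steps (stubbed)."""
    def execute(self, step: str) -> bool:
        print(f"executing {step}")
        return True

def run_episode(planner: FoundationModelPlanner,
                controller: LowLevelController,
                obs: Observation) -> bool:
    # Generalized formulation: policy = controller composed with the
    # foundation-model planner.
    return all(controller.execute(step) for step in planner.plan(obs))

if __name__ == "__main__":
    obs = Observation(image=b"", instruction="pick up the red mug")
    run_episode(FoundationModelPlanner(), LowLevelController(), obs)
```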
Related papers
- $π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge.
We evaluate the model on its ability to perform tasks zero-shot after pre-training, follow language instructions from people, and acquire new skills via fine-tuning.
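To make the flow-matching idea concrete, here is a minimal Python sketch of a conditional flow-matching action head of the kind described above: a velocity network is trained on interpolants between Gaussian noise and expert action chunks, then integrated at inference time. The plain MLP, the dimensions, and the Euler integrator are illustrative assumptions, not the $π_0$ implementation.

```python
# Sketch of conditional flow matching for action generation, assuming
# torch is available. ACT_DIM and CTX_DIM are illustrative sizes for
# the action vector and the conditioning VLM features.

import torch
import torch.nn as nn

ACT_DIM, CTX_DIM = 7, 64

class FlowHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ACT_DIM + CTX_DIM + 1, 128), nn.SiLU(),
            nn.Linear(128, ACT_DIM),
        )

    def forward(self, a_t, t, ctx):
        # Predict the velocity at interpolation time t.
        return self.net(torch.cat([a_t, t, ctx], dim=-1))

def flow_matching_loss(model, actions, ctx):
    # Linear path a_t = (1 - t) * noise + t * actions, whose target
    # velocity is (actions - noise).
    noise = torch.randn_like(actions)
    t = torch.rand(actions.shape[0], 1)
    a_t = (1 - t) * noise + t * actions
    target = actions - noise
    return ((model(a_t, t, ctx) - target) ** 2).mean()

@torch.no_grad()
def sample(model, ctx, steps=10):
    # Euler-integrate the learned velocity field from noise to an action.
    a = torch.randn(ctx.shape[0], ACT_DIM)
    for i in range(steps):
        t = torch.full((ctx.shape[0], 1), i / steps)
        a = a + model(a, t, ctx) / steps
    return a
```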
arXiv Detail & Related papers (2024-10-31T17:22:30Z)
- Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments [26.66666135624716]
We present Robot Utility Models (RUMs), a framework for training and deploying zero-shot robot policies.
RUMs can generalize to new environments without any finetuning.
We train five utility models for opening cabinet doors, opening drawers, picking up napkins, picking up paper bags, and reorienting fallen objects.
arXiv Detail & Related papers (2024-09-09T17:59:50Z)
- Real-World Robot Applications of Foundation Models: A Review [25.53250085363019]
Recent developments in foundation models, like Large Language Models (LLMs) and Vision-Language Models (VLMs), facilitate flexible application across different tasks and modalities.
This paper provides an overview of the practical application of foundation models in real-world robotics.
arXiv Detail & Related papers (2024-02-08T15:19:50Z)
- A Survey on Robotics with Foundation Models: toward Embodied AI [30.999414445286757]
Recent advances in computer vision, natural language processing, and multi-modality learning have shown that foundation models can achieve superhuman performance on specific tasks.
This survey aims to provide a comprehensive and up-to-date overview of foundation models in robotics, focusing on autonomous manipulation and encompassing high-level planning and low-level control.
arXiv Detail & Related papers (2024-02-04T07:55:01Z)
- AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents [109.3804962220498]
AutoRT is a system to scale up the deployment of operational robots in completely unseen scenarios with minimal human supervision.
We demonstrate AutoRT proposing instructions to over 20 robots across multiple buildings and collecting 77k real robot episodes via both teleoperation and autonomous robot policies.
We experimentally show that such "in-the-wild" data collected by AutoRT is significantly more diverse, and that AutoRT's use of LLMs enables instruction-following data-collection robots that can align with human preferences.
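A hedged sketch of the orchestration loop this entry describes: an LLM proposes candidate instructions for each robot's scene, a critic filters them against rules, and surviving tasks are dispatched for data collection. `llm_propose`, `critic_ok`, and the rule format are hypothetical stand-ins, not AutoRT's actual interfaces.

```python
# Sketch: LLM-driven task proposal with a rule-based critic filter.
# Both model calls are stubbed with placeholders.

from typing import List

def llm_propose(scene_description: str, n: int = 5) -> List[str]:
    # Placeholder for an LLM call that suggests tasks visible in the scene.
    return [f"candidate task {i} for: {scene_description}" for i in range(n)]

def critic_ok(instruction: str, rules: List[str]) -> bool:
    # Placeholder constitution-style filter: reject instructions that
    # mention any forbidden keyword.
    return not any(rule in instruction.lower() for rule in rules)

def orchestrate(scenes: List[str], rules: List[str]) -> List[str]:
    episodes = []
    for scene in scenes:
        for task in llm_propose(scene):
            if critic_ok(task, rules):
                episodes.append(task)  # would dispatch to a robot here
    return episodes

print(orchestrate(["kitchen counter with mugs"], rules=["knife", "human"]))
```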
arXiv Detail & Related papers (2024-01-23T18:45:54Z)
- RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation [68.70755196744533]
RoboGen is a generative robotic agent that automatically learns diverse robotic skills at scale via generative simulation.
Our work attempts to extract the extensive and versatile knowledge embedded in large-scale models and transfer it to the field of robotics.
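The pipeline this entry describes can be pictured as propose-generate-learn. The following Python sketch shows that shape with stub functions; every component here is a hypothetical placeholder for the generative models and the learning inner loop, not RoboGen's code.

```python
# Sketch: a generative-simulation loop that proposes a task, synthesizes
# a scene and reward for it, and trains a skill. All stubs are
# placeholders for large generative models.

import random

def propose_task() -> str:
    return random.choice(["open the microwave", "stack two blocks"])

def generate_scene(task: str) -> dict:
    # Stand-in for asset retrieval / scene synthesis conditioned on the task.
    return {"task": task, "objects": ["table", task.split()[-1]]}

def generate_reward(task: str):
    # Stand-in for model-written reward code; here, a constant stub.
    return lambda state: 1.0 if state.get("done") else 0.0

def train_skill(scene: dict, reward) -> float:
    # Placeholder for an RL / trajectory-optimization inner loop.
    return reward({"done": True})

for _ in range(2):
    task = propose_task()
    print(task, "->", train_skill(generate_scene(task), generate_reward(task)))
```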
arXiv Detail & Related papers (2023-11-02T17:59:21Z)
- Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation masks generated by internet-scale foundation models.
Our approach perceives object pose effectively and robustly and enables sample-efficient generalization.
Demos can be found in our submitted video; more comprehensive ones are available via link1 or link2.
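One way to read the perception step described above: a segmentation mask plus an aligned depth image is enough to recover a rough object position by pinhole deprojection. The sketch below assumes the mask has already been produced by a foundation model and that camera intrinsics are known; it is an illustrative reconstruction, not the paper's code.

```python
# Sketch: deproject masked pixels through a pinhole camera model to get
# a coarse object position in the camera frame.

import numpy as np

def object_position_from_mask(mask: np.ndarray, depth: np.ndarray,
                              fx: float, fy: float, cx: float, cy: float):
    """Average 3D point of masked pixels in the camera frame (meters)."""
    v, u = np.nonzero(mask)            # pixel rows/cols inside the mask
    z = depth[v, u]
    valid = z > 0                      # drop missing depth readings
    u, v, z = u[valid], v[valid], z[valid]
    x = (u - cx) * z / fx              # pinhole deprojection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1).mean(axis=0)

# Usage with a toy mask and constant depth:
mask = np.zeros((480, 640), bool); mask[200:240, 300:340] = True
depth = np.full((480, 640), 0.8)
print(object_position_from_mask(mask, depth, fx=600, fy=600, cx=320, cy=240))
```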
arXiv Detail & Related papers (2023-06-09T07:22:12Z)
- Towards Generalist Robots: A Promising Paradigm via Generative Simulation [18.704506851738365]
This document serves as a position paper that outlines the authors' vision for a potential pathway towards generalist robots.
The authors believe the proposed paradigm is a feasible path towards accomplishing the long-standing goal of robotics research.
arXiv Detail & Related papers (2023-05-17T02:53:58Z)
- RT-1: Robotics Transformer for Real-World Control at Scale [98.09428483862165]
We present a model class, dubbed Robotics Transformer, that exhibits promising scalable properties.
We verify our conclusions in a study of how different model classes generalize as a function of data size, model size, and data diversity, based on a large-scale data collection from real robots performing real-world tasks.
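Robotics Transformer-style policies are commonly described as emitting actions as discrete tokens, with each continuous action dimension binned into a fixed vocabulary (256 bins in RT-1's case). A minimal sketch of that discretization follows; the action ranges are illustrative assumptions.

```python
# Sketch: per-dimension action binning so a transformer can output
# actions as tokens, and the inverse mapping back to bin centers.

import numpy as np

N_BINS = 256

def discretize(action: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Map each action dimension to an integer token in [0, N_BINS)."""
    frac = (action - low) / (high - low)
    return np.clip((frac * N_BINS).astype(int), 0, N_BINS - 1)

def undiscretize(tokens: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Recover the bin-center continuous action from tokens."""
    return low + (tokens + 0.5) / N_BINS * (high - low)

low, high = np.array([-1.0] * 7), np.array([1.0] * 7)
a = np.array([0.3, -0.7, 0.0, 0.9, -0.2, 0.5, -1.0])
print(undiscretize(discretize(a, low, high), low, high))
```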
arXiv Detail & Related papers (2022-12-13T18:55:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.