Real-World Robot Applications of Foundation Models: A Review
- URL: http://arxiv.org/abs/2402.05741v1
- Date: Thu, 8 Feb 2024 15:19:50 GMT
- Title: Real-World Robot Applications of Foundation Models: A Review
- Authors: Kento Kawaharazuka, Tatsuya Matsushima, Andrew Gambardella, Jiaxian
Guo, Chris Paxton, Andy Zeng
- Abstract summary: Recent developments in foundation models, like Large Language Models (LLMs) and Vision-Language Models (VLMs), facilitate flexible application across different tasks and modalities.
This paper provides an overview of the practical application of foundation models in real-world robotics.
- Score: 27.054641862377807
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent developments in foundation models, like Large Language Models (LLMs)
and Vision-Language Models (VLMs), trained on extensive data, facilitate
flexible application across different tasks and modalities. Their impact spans
various fields, including healthcare, education, and robotics. This paper
provides an overview of the practical application of foundation models in
real-world robotics, with a primary emphasis on the replacement of specific
components within existing robot systems. The summary encompasses the
perspective of input-output relationships in foundation models, as well as
their role in perception, motion planning, and control within the field of
robotics. This paper concludes with a discussion of future challenges and
implications for practical robot applications.
Related papers
- A Survey on Robotics with Foundation Models: toward Embodied AI [30.999414445286757]
Recent advances in computer vision, natural language processing, and multi-modality learning have shown that the foundation models have superhuman capabilities for specific tasks.
This survey aims to provide a comprehensive and up-to-date overview of foundation models in robotics, focusing on autonomous manipulation and encompassing high-level planning and low-level control.
arXiv Detail & Related papers (2024-02-04T07:55:01Z) - AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents [109.3804962220498]
AutoRT is a system to scale up the deployment of operational robots in completely unseen scenarios with minimal human supervision.
We demonstrate AutoRT proposing instructions to over 20 robots across multiple buildings and collecting 77k real robot episodes via both teleoperation and autonomous robot policies.
We experimentally show that such "in-the-wild" data collected by AutoRT is significantly more diverse, and that AutoRT's use of LLMs allows for instruction following data collection robots that can align to human preferences.
arXiv Detail & Related papers (2024-01-23T18:45:54Z) - Toward General-Purpose Robots via Foundation Models: A Survey and
Meta-Analysis [73.89558418030418]
Most existing robotic systems have been designed for specific tasks, trained on specific datasets, and deployed within specific environments.
Motivated by the impressive open-set performance and content generation capabilities of web-scale, large-capacity pre-trained models, we devote this survey to exploring how foundation models can be applied to robotics.
arXiv Detail & Related papers (2023-12-14T10:02:55Z) - Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems to see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
The models learned to bridge the gap between such modalities coupled with large-scale training data facilitate contextual reasoning, generalization, and prompt capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene or manipulating the robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z) - Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation mask generated by internet-scale foundation models.
Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.
Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z) - Foundation Models for Decision Making: Problems, Methods, and
Opportunities [124.79381732197649]
Foundation models pretrained on diverse data at scale have demonstrated extraordinary capabilities in a wide range of vision and language tasks.
New paradigms are emerging for training foundation models to interact with other agents and perform long-term reasoning.
Research at the intersection of foundation models and decision making holds tremendous promise for creating powerful new systems.
arXiv Detail & Related papers (2023-03-07T18:44:07Z) - RT-1: Robotics Transformer for Real-World Control at Scale [98.09428483862165]
We present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties.
We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks.
arXiv Detail & Related papers (2022-12-13T18:55:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.