Foundation Models for Decision Making: Problems, Methods, and
Opportunities
- URL: http://arxiv.org/abs/2303.04129v1
- Date: Tue, 7 Mar 2023 18:44:07 GMT
- Title: Foundation Models for Decision Making: Problems, Methods, and
Opportunities
- Authors: Sherry Yang, Ofir Nachum, Yilun Du, Jason Wei, Pieter Abbeel, Dale
Schuurmans
- Abstract summary: Foundation models pretrained on diverse data at scale have demonstrated extraordinary capabilities in a wide range of vision and language tasks.
New paradigms are emerging for training foundation models to interact with other agents and perform long-term reasoning.
Research at the intersection of foundation models and decision making holds tremendous promise for creating powerful new systems.
- Score: 124.79381732197649
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models pretrained on diverse data at scale have demonstrated
extraordinary capabilities in a wide range of vision and language tasks. When
such models are deployed in real world environments, they inevitably interface
with other entities and agents. For example, language models are often used to
interact with human beings through dialogue, and visual perception models are
used to autonomously navigate neighborhood streets. In response to these
developments, new paradigms are emerging for training foundation models to
interact with other agents and perform long-term reasoning. These paradigms
leverage the existence of ever-larger datasets curated for multimodal,
multitask, and generalist interaction. Research at the intersection of
foundation models and decision making holds tremendous promise for creating
powerful new systems that can interact effectively across a diverse range of
applications such as dialogue, autonomous driving, healthcare, education, and
robotics. In this manuscript, we examine the scope of foundation models for
decision making, and provide conceptual tools and technical background for
understanding the problem space and exploring new research directions. We
review recent approaches that ground foundation models in practical decision
making applications through a variety of methods such as prompting, conditional
generative modeling, planning, optimal control, and reinforcement learning, and
discuss common challenges and open problems in the field.
Related papers
- HEMM: Holistic Evaluation of Multimodal Foundation Models [91.60364024897653]
Multimodal foundation models can holistically process text alongside images, video, audio, and other sensory modalities.
It is challenging to characterize and study progress in multimodal foundation models, given the range of possible modeling decisions, tasks, and domains.
arXiv Detail & Related papers (2024-07-03T18:00:48Z) - An Interactive Agent Foundation Model [49.77861810045509]
We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents.
Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction.
We demonstrate the performance of our framework across three separate domains -- Robotics, Gaming AI, and Healthcare.
arXiv Detail & Related papers (2024-02-08T18:58:02Z) - Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives [56.2139730920855]
We present a systematic analysis of MM-VUFMs specifically designed for road scenes.
Our objective is to provide a comprehensive overview of common practices, referring to task-specific models, unified multi-modal models, unified multi-task models, and foundation model prompting techniques.
We provide insights into key challenges and future trends, such as closed-loop driving systems, interpretability, embodied driving agents, and world models.
arXiv Detail & Related papers (2024-02-05T12:47:09Z) - Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems to see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
The models learned to bridge the gap between such modalities coupled with large-scale training data facilitate contextual reasoning, generalization, and prompt capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene or manipulating the robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z) - Foundation models in brief: A historical, socio-technical focus [2.5991265608180396]
Foundation models can be disruptive for future AI development by scaling up deep learning.
Models achieve state-of-the-art performance on a variety of tasks in domains such as natural language processing and computer vision.
arXiv Detail & Related papers (2022-12-17T22:11:33Z) - DIME: Fine-grained Interpretations of Multimodal Models via Disentangled
Local Explanations [119.1953397679783]
We focus on advancing the state-of-the-art in interpreting multimodal models.
Our proposed approach, DIME, enables accurate and fine-grained analysis of multimodal models.
arXiv Detail & Related papers (2022-03-03T20:52:47Z) - Neurosymbolic AI for Situated Language Understanding [13.249453757295083]
We argue that computational situated grounding provides a solution to some of these learning challenges.
Our model reincorporates some ideas of classic AI into a framework of neurosymbolic intelligence.
We discuss how situated grounding provides diverse data and multiple levels of modeling for a variety of AI learning challenges.
arXiv Detail & Related papers (2020-12-05T05:03:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.