Foundation Model Sherpas: Guiding Foundation Models through Knowledge and Reasoning
- URL: http://arxiv.org/abs/2402.01602v1
- Date: Fri, 2 Feb 2024 18:00:35 GMT
- Title: Foundation Model Sherpas: Guiding Foundation Models through Knowledge and Reasoning
- Authors: Debarun Bhattacharjya, Junkyu Lee, Don Joven Agravante, Balaji Ganesan, Radu Marinescu
- Abstract summary: Foundation models (FMs) have revolutionized the field of AI by showing remarkable performance in various tasks.
However, FMs exhibit numerous limitations that prevent their broader adoption in many real-world systems.
We propose a conceptual framework that encapsulates different modes by which agents could interact with FMs.
- Score: 23.763256908202496
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Foundation models (FMs) such as large language models have revolutionized the field of AI by showing remarkable performance in various tasks. However, they exhibit numerous limitations that prevent their broader adoption in many real-world systems, which often require a higher bar for trustworthiness and usability. Since FMs are trained using loss functions aimed at reconstructing the training corpus in a self-supervised manner, there is no guarantee that the model's output aligns with users' preferences for a specific task at hand. In this survey paper, we propose a conceptual framework that encapsulates different modes by which agents could interact with FMs and guide them suitably for a set of tasks, particularly through knowledge augmentation and reasoning. Our framework elucidates agent role categories such as updating the underlying FM, assisting with prompting the FM, and evaluating the FM output. We also categorize several state-of-the-art approaches into agent interaction protocols, highlighting the nature and extent of involvement of the various agent roles. The proposed framework provides guidance for future directions to further realize the power of FMs in practical AI systems.
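As a reading aid, here is a minimal Python sketch of the three agent role categories named in the abstract (updating the FM, assisting with prompting, and evaluating output). All class and function names are illustrative assumptions, not an API from the paper.

```python
from typing import Protocol


class FoundationModel(Protocol):
    """Anything that maps a prompt to text; stands in for an LLM here."""

    def generate(self, prompt: str) -> str: ...


class PromptingAgent:
    """Assists with prompting: augments the user's task with external knowledge."""

    def build_prompt(self, task: str, knowledge: list[str]) -> str:
        context = "\n".join(knowledge)
        return f"Context:\n{context}\n\nTask: {task}"


class EvaluatorAgent:
    """Evaluates FM output against task constraints before it is accepted."""

    def accept(self, output: str) -> bool:
        return len(output.strip()) > 0  # placeholder acceptance check


class UpdaterAgent:
    """Updates the underlying FM itself, e.g. via (parameter-efficient) fine-tuning."""

    def update(self, fm: FoundationModel, corpus: list[str]) -> FoundationModel:
        raise NotImplementedError  # a training loop would go here


def guided_generate(fm: FoundationModel, task: str, knowledge: list[str]) -> str:
    """One simple interaction protocol: prompt-assist, generate, then evaluate."""
    prompter, evaluator = PromptingAgent(), EvaluatorAgent()
    output = fm.generate(prompter.build_prompt(task, knowledge))
    if not evaluator.accept(output):
        raise ValueError("output rejected by the evaluator agent")
    return output
```

In this reading, an agent interaction protocol is simply a particular way of wiring these roles around the FM; the survey's taxonomy covers far richer combinations than this single pipeline.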
Related papers
- Specialized Foundation Models Struggle to Beat Supervised Baselines [60.23386520331143]
We look at three modalities -- genomics, satellite imaging, and time series -- with multiple recent FMs and compare them to a standard supervised learning workflow.
We find that it is consistently possible to train simple supervised models that match or even outperform the latest foundation models.
arXiv Detail & Related papers (2024-11-05T04:10:59Z)
- Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning [29.33199582163445]
Vision Foundation Models (VFMs) have demonstrated outstanding performance on numerous downstream tasks.
Due to their inherent representation biases, VFMs exhibit advantages and disadvantages across distinct vision tasks.
We propose a novel and versatile "Swiss Army Knife" (SAK) solution, which adaptively distills knowledge from a committee of VFMs to enhance multi-task learning.
arXiv Detail & Related papers (2024-10-18T17:32:39Z)
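The SAK entry above centers on distilling from a committee of VFM teachers with adaptive weighting. A hedged sketch of that general pattern (multi-teacher feature distillation with learnable committee weights) follows; the function and the cosine-based objective are illustrative assumptions, not the paper's actual loss.

```python
import torch
import torch.nn.functional as F


def committee_distillation_loss(
    student_feats: torch.Tensor,        # (batch, dim) student representation
    teacher_feats: list[torch.Tensor],  # one (batch, dim) tensor per VFM teacher
    teacher_logits: torch.Tensor,       # (num_teachers,) learnable weighting scores
) -> torch.Tensor:
    """Distill from a committee of teachers with adaptive (softmax) weights.

    Illustrative only: a cosine feature-matching loss per teacher, combined via
    learnable weights so the student leans on whichever teacher's biases help.
    """
    weights = torch.softmax(teacher_logits, dim=0)
    per_teacher = torch.stack([
        1.0 - F.cosine_similarity(student_feats, t, dim=-1).mean()
        for t in teacher_feats
    ])
    return (weights * per_teacher).sum()
```

The weights can be trained jointly with the student, so each downstream task effectively selects the teachers whose representation biases suit it.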
- Large Model for Small Data: Foundation Model for Cross-Modal RF Human Activity Recognition [7.351361666395708]
We introduce FM-Fi, a cross-modal framework engineered to translate the knowledge of vision-based FMs for enhancing RF-based HAR systems.
FM-Fi involves a novel cross-modal contrastive knowledge distillation mechanism, enabling an RF encoder to inherit the interpretative power of FMs.
It also employs the intrinsic capabilities of FM and RF to remove extraneous features for better alignment between the two modalities.
arXiv Detail & Related papers (2024-10-13T03:43:59Z)
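FM-Fi's cross-modal contrastive knowledge distillation, as summarized above, pairs an RF encoder (student) with a frozen vision FM (teacher). A generic InfoNCE-style sketch of such an objective is below; FM-Fi's exact loss and its feature-removal step differ, so treat this as an assumption-laden illustration.

```python
import torch
import torch.nn.functional as F


def cross_modal_infonce(
    rf_emb: torch.Tensor,      # (batch, dim) from the RF encoder (student)
    vision_emb: torch.Tensor,  # (batch, dim) from a frozen vision FM (teacher)
    temperature: float = 0.07,
) -> torch.Tensor:
    """Contrastive distillation: each RF embedding is pulled toward the vision-FM
    embedding of the same scene and pushed away from embeddings of other scenes."""
    rf = F.normalize(rf_emb, dim=-1)
    vis = F.normalize(vision_emb, dim=-1)
    logits = rf @ vis.t() / temperature                    # (batch, batch) similarities
    targets = torch.arange(rf.size(0), device=rf.device)   # diagonal = positive pairs
    return F.cross_entropy(logits, targets)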
- On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards [11.99718417371013]
This research focuses on understanding how these FM leaderboards operate in real-world scenarios ("leaderboard operations").
We identify 5 unique workflow patterns and develop a domain model that outlines the essential components and their interaction within FM leaderboards.
We then identify 8 unique types of leaderboard smells in LBOps.
arXiv Detail & Related papers (2024-07-04T17:12:00Z)
- On the Evaluation of Speech Foundation Models for Spoken Language Understanding [87.52911510306011]
The Spoken Language Understanding Evaluation (SLUE) suite of benchmark tasks was recently introduced to address the need for open resources and benchmarking.
The benchmark has demonstrated preliminary success in using pre-trained speech foundation models (SFM) for these SLU tasks.
We ask: which SFMs offer the most benefits for these complex SLU tasks, and what is the most effective approach for incorporating these SFMs?
arXiv Detail & Related papers (2024-06-14T14:37:52Z)
- MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders.
We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z)
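The MMA-DFER entry describes adapting SSL-pre-trained disjoint unimodal encoders for multimodal DFER. A minimal sketch of one such adaptation pattern (frozen audio and video encoders plus a small trainable fusion head) follows; the module names, frozen-encoder choice, and fusion design are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn


class LateFusionDFER(nn.Module):
    """Illustrative adaptation of two frozen SSL-pre-trained unimodal encoders
    (audio, video) with a trainable fusion head; not MMA-DFER's exact design."""

    def __init__(self, audio_encoder: nn.Module, video_encoder: nn.Module,
                 dim: int, num_classes: int):
        super().__init__()
        self.audio_encoder = audio_encoder.eval()
        self.video_encoder = video_encoder.eval()
        for p in list(self.audio_encoder.parameters()) + list(self.video_encoder.parameters()):
            p.requires_grad = False  # keep the pre-trained encoders frozen
        self.head = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU(),
                                  nn.Linear(dim, num_classes))

    def forward(self, audio: torch.Tensor, video: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            a = self.audio_encoder(audio)  # (batch, dim) audio features
            v = self.video_encoder(video)  # (batch, dim) video features
        return self.head(torch.cat([a, v], dim=-1))
```

Only the fusion head is trained, which is what makes reusing strong unimodal SSL encoders attractive when labeled multimodal data is scarce.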
- Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives [56.2139730920855]
We present a systematic analysis of MM-VUFMs specifically designed for road scenes.
Our objective is to provide a comprehensive overview of common practices, covering task-specific models, unified multi-modal models, unified multi-task models, and foundation model prompting techniques.
We provide insights into key challenges and future trends, such as closed-loop driving systems, interpretability, embodied driving agents, and world models.
arXiv Detail & Related papers (2024-02-05T12:47:09Z)
- A Survey on Efficient Federated Learning Methods for Foundation Model Training [62.473245910234304]
Federated Learning (FL) has become an established technique to facilitate privacy-preserving collaborative training across a multitude of clients.
In the wake of Foundation Models (FMs), however, the reality is different for many deep learning applications.
We discuss the benefits and drawbacks of parameter-efficient fine-tuning (PEFT) for FL applications.
arXiv Detail & Related papers (2024-01-09T10:22:23Z)
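The survey above weighs parameter-efficient fine-tuning (PEFT) for federated learning. A common pattern in that setting trades full-model updates for adapter-only communication; the sketch below shows FedAvg restricted to adapter weights. The function name and weighting scheme are generic assumptions, not a specific method from the survey.

```python
import torch


def fedavg_adapters(
    client_adapters: list[dict[str, torch.Tensor]],  # each client's PEFT (e.g. LoRA) weights
    num_examples: list[int],                         # per-client dataset sizes
) -> dict[str, torch.Tensor]:
    """FedAvg restricted to adapter parameters: the frozen FM backbone never
    leaves the clients, so only a small set of weights is communicated."""
    total = sum(num_examples)
    return {
        key: sum(n / total * adapters[key]
                 for adapters, n in zip(client_adapters, num_examples))
        for key in client_adapters[0]
    }
```

Because adapters are typically orders of magnitude smaller than the FM backbone, this pattern cuts both communication cost and client-side memory, which is the core efficiency argument the survey examines.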
- Learn From Model Beyond Fine-Tuning: A Survey [78.80920533793595]
Learn From Model (LFM) focuses on the research, modification, and design of foundation models (FM) based on the model interface.
The study of LFM techniques can be broadly categorized into five major areas: model tuning, model distillation, model reuse, meta learning and model editing.
This paper gives a comprehensive review of the current methods based on FM from the perspective of LFM.
arXiv Detail & Related papers (2023-10-12T10:20:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences arising from its use.