Foundation models in brief: A historical, socio-technical focus
- URL: http://arxiv.org/abs/2212.08967v1
- Date: Sat, 17 Dec 2022 22:11:33 GMT
- Title: Foundation models in brief: A historical, socio-technical focus
- Authors: Johannes Schneider
- Abstract summary: Foundation models can be disruptive for future AI development by scaling up deep learning.
These models achieve state-of-the-art performance on a variety of tasks in domains such as natural language processing and computer vision.
- Score: 2.5991265608180396
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models can be disruptive for future AI development by scaling up
deep learning in terms of model size and the breadth and size of training data.
These models achieve state-of-the-art performance (often through further
adaptation) on a variety of tasks in domains such as natural language
processing and computer vision. Foundation models exhibit a novel emergent
behavior: in-context learning enables users to provide a query and a few
examples from which a model derives an answer without being trained on such
queries. Additionally, homogenization of models might replace a myriad of
task-specific models with fewer very large models controlled by a few
corporations, leading to a shift in power and control over AI. This paper
provides a short introduction to foundation models. It contributes by crafting
a crisp distinction between foundation models and prior deep learning models,
providing a history of machine learning leading to foundation models,
elaborating on socio-technical aspects, i.e., organizational issues and
end-user interaction, and discussing future research.
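As a concrete illustration of the in-context learning described in the abstract, the minimal Python sketch below builds a few-shot prompt. The sentiment-classification task and the example pairs are hypothetical, and the model call itself is omitted: completing the final line of the prompt is exactly what a foundation model would do, without any parameter update.

```python
# Minimal sketch of in-context (few-shot) learning: the "training signal"
# lives entirely in the prompt. Task and examples below are hypothetical.
examples = [
    ("The movie was a masterpiece.", "positive"),
    ("I want my money back.", "negative"),
]
query = "The plot dragged, but the acting was superb."

# A few demonstrations followed by the unanswered query.
prompt = "\n".join(f"Review: {x}\nSentiment: {y}" for x, y in examples)
prompt += f"\nReview: {query}\nSentiment:"

print(prompt)
# A foundation model completes the final line with a label (e.g. "positive"),
# inferring the task from the examples alone, with no gradient updates.
```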
Related papers
- Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities [89.40778301238642]
Model merging is an efficient technique for empowering models in the machine learning community.
However, there is a significant gap in the literature regarding a systematic and thorough review of these techniques.
arXiv Detail & Related papers (2024-08-14T16:58:48Z)
- Learning-based Models for Vulnerability Detection: An Extensive Study [3.1317409221921144]
We extensively and comprehensively investigate two types of state-of-the-art learning-based approaches.
We experimentally demonstrate the superiority of sequence-based models and the limited abilities of both graph-based models.
arXiv Detail & Related papers (2024-08-14T13:01:30Z)
- Deep Generative Models in Robotics: A Survey on Learning from Multimodal Demonstrations [52.11801730860999]
In recent years, the robot learning community has shown increasing interest in using deep generative models to capture the complexity of large datasets.
We present the different types of models that the community has explored, such as energy-based models, diffusion models, action value maps, or generative adversarial networks.
We also present the different types of applications in which deep generative models have been used, from grasp generation to trajectory generation or cost learning.
arXiv Detail & Related papers (2024-08-08T11:34:31Z)
- Model Lakes [22.717104096113637]
Given a set of deep learning models, it can be hard to find models appropriate to a task.
Inspired by research on data lakes, we introduce and define the concept of model lakes.
We discuss fundamental research challenges in the management of large models.
arXiv Detail & Related papers (2024-03-04T18:55:50Z)
- Matching Pairs: Attributing Fine-Tuned Models to their Pre-Trained Large Language Models [11.57282859281814]
We consider different knowledge levels and attribution strategies, and find that we can correctly trace back 8 out of the 10 fine-tuned models with our best method.
arXiv Detail & Related papers (2023-06-15T17:42:48Z)
- Foundation Models for Natural Language Processing -- Pre-trained Language Models Integrating Media [0.0]
Foundation Models are pre-trained language models for Natural Language Processing.
They can be applied to a wide range of different media and problem domains, ranging from image and video processing to robot control learning.
This book provides a comprehensive overview of the state of the art in research and applications of Foundation Models.
arXiv Detail & Related papers (2023-02-16T20:42:04Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models, but the training data behind individual fine-tuned models is often unavailable.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space (a minimal weight-averaging sketch follows this list).
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data.
Instead, you are given access to a set of expert models and their predictions, alongside some limited information about the dataset used to train them.
arXiv Detail & Related papers (2022-10-11T10:20:31Z)
- Model Reprogramming: Resource-Efficient Cross-Domain Machine Learning [65.268245109828]
In data-rich domains such as vision, language, and speech, deep learning prevails to deliver high-performance task-specific models.
Deep learning in resource-limited domains still faces multiple challenges, including (i) limited data, (ii) constrained model development cost, and (iii) a lack of adequate pre-trained models for effective finetuning.
Model reprogramming enables resource-efficient cross-domain machine learning by repurposing a well-developed pre-trained model from a source domain to solve tasks in a target domain without model finetuning (a minimal sketch of this idea follows the list).
arXiv Detail & Related papers (2022-02-22T02:33:54Z)
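As referenced in the model reprogramming entry above, the following minimal PyTorch sketch shows the general idea: a frozen source model is reused for a new task by training only an additive input perturbation and fixing a source-to-target label mapping. The toy architecture, shapes, and mapping here are all assumptions for illustration, not the method of any specific paper.

```python
import torch
import torch.nn as nn

# Minimal sketch of model reprogramming (toy source model, shapes, and
# label mapping are all hypothetical).
source_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
for p in source_model.parameters():
    p.requires_grad = False          # the source model is never finetuned

delta = nn.Parameter(torch.zeros(1, 1, 28, 28))  # trainable input perturbation

# Fixed many-to-one mapping from 10 source classes to 2 target classes.
label_map = torch.tensor([0, 0, 1, 1, 0, 1, 0, 1, 0, 1])
M = torch.zeros(10, 2)
M[torch.arange(10), label_map] = 1.0

def reprogrammed_forward(x):
    probs = source_model(x + delta).softmax(dim=-1)  # reuse the frozen model
    return (probs @ M + 1e-9).log()  # aggregate into target-class log-probs

optimizer = torch.optim.Adam([delta], lr=1e-2)  # only delta is trained
x = torch.randn(8, 1, 28, 28)                   # toy target-domain batch
y = torch.randint(0, 2, (8,))
loss = nn.NLLLoss()(reprogrammed_forward(x), y)
loss.backward()
optimizer.step()
```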
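As referenced in the model merging and dataless knowledge fusion entries above, the sketch below shows the simplest form of merging models in parameter space: uniform weight averaging of identically structured models. The papers above propose more sophisticated schemes; this is only the baseline idea, with hypothetical toy models.

```python
import torch
import torch.nn as nn

# Minimal sketch of parameter-space model merging via uniform weight
# averaging (a baseline, not the specific method of the papers above).
def average_weights(models):
    """Return a state dict averaging the weights of same-architecture models."""
    state_dicts = [m.state_dict() for m in models]
    return {
        name: torch.stack([sd[name].float() for sd in state_dicts]).mean(dim=0)
        for name in state_dicts[0]
    }

# Two hypothetical finetuned variants of the same architecture.
model_a, model_b = nn.Linear(16, 4), nn.Linear(16, 4)

merged = nn.Linear(16, 4)
merged.load_state_dict(average_weights([model_a, model_b]))
```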
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information (including all of the above) and is not responsible for any consequences of its use.