Foundation Models for Natural Language Processing -- Pre-trained
Language Models Integrating Media
- URL: http://arxiv.org/abs/2302.08575v1
- Date: Thu, 16 Feb 2023 20:42:04 GMT
- Title: Foundation Models for Natural Language Processing -- Pre-trained
Language Models Integrating Media
- Authors: Gerhard Paaß and Sven Giesselbach
- Abstract summary: Foundation Models are pre-trained language models for Natural Language Processing.
They can be applied to a wide range of different media and problem domains, ranging from image and video processing to robot control learning.
This book provides a comprehensive overview of the state of the art in research and applications of Foundation Models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This open access book provides a comprehensive overview of the state of the
art in research and applications of Foundation Models and is intended for
readers familiar with basic Natural Language Processing (NLP) concepts. Over
the recent years, a revolutionary new paradigm has been developed for training
models for NLP. These models are first pre-trained on large collections of text
documents to acquire general syntactic knowledge and semantic information.
Then, they are fine-tuned for specific tasks, which they can often solve with
superhuman accuracy. When the models are large enough, they can be instructed
by prompts to solve new tasks without any fine-tuning. Moreover, they can be
applied to a wide range of different media and problem domains, ranging from
image and video processing to robot control learning. Because they provide a
blueprint for solving many tasks in artificial intelligence, they have been
called Foundation Models. After a brief introduction to basic NLP models, the
main pre-trained language models (BERT, GPT, and sequence-to-sequence
Transformers) are described, as well as the concepts of self-attention and
context-sensitive embeddings. Then, different approaches to improving these models are discussed,
such as expanding the pre-training criteria, increasing the length of input
texts, or including extra knowledge. An overview of the best-performing models
for about twenty application areas is then presented, e.g., question answering,
translation, story generation, dialog systems, generating images from text,
etc. For each application area, the strengths and weaknesses of current models
are discussed, and an outlook on further developments is given. In addition,
links are provided to freely available program code. A concluding chapter
summarizes the economic opportunities, mitigation of risks, and potential
developments of AI.
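The pre-training/fine-tuning workflow summarized above can be sketched in a few lines. The following is a minimal illustration, not code from the book, assuming the Hugging Face transformers and PyTorch libraries and the public bert-base-uncased checkpoint: a pre-trained encoder is reused and only a small task head is trained for a downstream classification task.
```python
# Minimal sketch of the pre-train / fine-tune paradigm described in the abstract.
# Assumptions (not from the book): `transformers` and `torch` are installed and
# the public `bert-base-uncased` checkpoint is used.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Pre-trained encoder weights are reused; only a small classification head is new.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# One hypothetical labeled example; real fine-tuning iterates over a dataset.
batch = tokenizer(["The plot was gripping from start to finish."],
                  return_tensors="pt")
labels = torch.tensor([1])  # 1 = positive sentiment (illustrative label)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
loss = model(**batch, labels=labels).loss  # cross-entropy on the task head
loss.backward()
optimizer.step()
print(f"one fine-tuning step done, loss = {loss.item():.3f}")
```
For sufficiently large models, the abstract notes that prompting can replace fine-tuning entirely; a second sketch after the related-papers list below illustrates that prompt-based setting.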
Related papers
- Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems that see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
Models learned to bridge the gap between such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompt capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, holding interactive dialogues by asking questions about an image or video scene, or manipulating a robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z)
- A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models [43.35892536887404]
Prompt engineering involves augmenting a large pre-trained model with task-specific hints, known as prompts, to adapt the model to new tasks.
This paper aims to provide a comprehensive survey of cutting-edge research in prompt engineering on three types of vision-language models.
arXiv Detail & Related papers (2023-07-24T17:58:06Z)
- GPT4Image: Can Large Pre-trained Models Help Vision Models on Perception Tasks? [51.22096780511165]
We present a new learning paradigm in which the knowledge extracted from large pre-trained models is utilized to help models such as CNNs and ViTs learn enhanced representations.
We feed detailed descriptions into a pre-trained encoder to extract text embeddings with rich semantic information that encodes the content of images.
arXiv Detail & Related papers (2023-06-01T14:02:45Z)
- An Overview on Language Models: Recent Developments and Outlook [32.528770408502396]
Conventional language models (CLMs) aim to predict the probability of linguistic sequences in a causal manner.
Pre-trained language models (PLMs) cover broader concepts and can be used in both causal sequential modeling and fine-tuning for downstream applications.
arXiv Detail & Related papers (2023-03-10T07:55:00Z)
- Foundation models in brief: A historical, socio-technical focus [2.5991265608180396]
Foundation models can be disruptive for future AI development by scaling up deep learning.
Models achieve state-of-the-art performance on a variety of tasks in domains such as natural language processing and computer vision.
arXiv Detail & Related papers (2022-12-17T22:11:33Z)
- An Overview on Controllable Text Generation via Variational Auto-Encoders [15.97186478109836]
Recent advances in neural-based generative modeling have reignited the hopes of having computer systems capable of conversing with humans.
Latent variable models (LVM) such as variational auto-encoders (VAEs) are designed to characterize the distributional pattern of textual data.
This overview gives an introduction to existing generation schemes, problems associated with text variational auto-encoders, and a review of several applications of controllable generation.
arXiv Detail & Related papers (2022-11-15T07:36:11Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- A Survey of Knowledge Enhanced Pre-trained Models [28.160826399552462]
We refer to pre-trained language models with knowledge injection as knowledge-enhanced pre-trained language models (KEPLMs).
These models demonstrate deep understanding and logical reasoning and introduce interpretability.
arXiv Detail & Related papers (2021-10-01T08:51:58Z)
- Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing [78.8500633981247]
This paper surveys and organizes research works in a new paradigm in natural language processing, which we dub "prompt-based learning".
Unlike traditional supervised learning, which trains a model to take in an input x and predict an output y as P(y|x), prompt-based learning is based on language models that model the probability of text directly (see the sketch after this list).
arXiv Detail & Related papers (2021-07-28T18:09:46Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z)
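To make the contrast drawn in the Pre-train, Prompt, and Predict entry above concrete: instead of training a head to output P(y|x), the task is recast as a cloze-style prompt that a pre-trained language model completes. The sketch below is a minimal illustration, assuming the Hugging Face transformers library and the public bert-base-uncased checkpoint; the template and label words are hypothetical choices, not taken from any of the listed papers.
```python
# Illustrative sketch of prompt-based (cloze-style) classification with a
# masked language model; template and label words are hypothetical choices.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

review = "The plot was gripping from start to finish."
# Instead of predicting P(y|x) with a task-specific head, the task is recast as
# filling a masked slot in a natural-language template.
prompt = f"{review} Overall, the movie was {tokenizer.mask_token}."
inputs = tokenizer(prompt, return_tensors="pt")
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]  # scores over the vocabulary

# Compare scores of label words at the masked position; no weights are updated.
label_words = {"good": "positive", "bad": "negative"}
scores = {w: logits[tokenizer.convert_tokens_to_ids(w)].item() for w in label_words}
predicted = label_words[max(scores, key=scores.get)]
print(predicted, scores)
```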
This list is automatically generated from the titles and abstracts of the papers on this site.