AMMUS : A Survey of Transformer-based Pretrained Models in Natural
Language Processing
- URL: http://arxiv.org/abs/2108.05542v1
- Date: Thu, 12 Aug 2021 05:32:18 GMT
- Title: AMMUS : A Survey of Transformer-based Pretrained Models in Natural
Language Processing
- Authors: Katikapalli Subramanyam Kalyan, Ajit Rajasekharan, Sivanesan Sangeetha
- Abstract summary: Transformer-based pretrained language models (T-PTLMs) have achieved great success in almost every NLP task.
Transformer-based PTLMs learn universal language representations from large volumes of text data using self-supervised learning.
These models provide useful background knowledge for downstream tasks, which avoids training downstream models from scratch.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based pretrained language models (T-PTLMs) have achieved great
success in almost every NLP task. The evolution of these models started with
GPT and BERT. These models are built on top of transformers, self-supervised
learning, and transfer learning. Transformer-based PTLMs learn universal
language representations from large volumes of text data using self-supervised
learning and transfer this knowledge to downstream tasks. These models provide
useful background knowledge for downstream tasks, which avoids training
downstream models from scratch. In this comprehensive survey paper,
we initially give a brief overview of self-supervised learning. Next, we
explain various core concepts like pretraining, pretraining methods,
pretraining tasks, embeddings and downstream adaptation methods. Next, we
present a new taxonomy of T-PTLMs and then give a brief overview of various
benchmarks, both intrinsic and extrinsic. We present a summary of
various useful libraries to work with T-PTLMs. Finally, we highlight some of
the future research directions which will further improve these models. We
strongly believe that this comprehensive survey paper will serve as a good
reference to learn the core concepts as well as to stay updated with the recent
happenings in T-PTLMs.
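To make the downstream-adaptation idea in the abstract concrete, here is a minimal fine-tuning sketch, assuming the Hugging Face `transformers` and `torch` packages and the public "bert-base-uncased" checkpoint; the two-sentence sentiment "dataset", labels, and hyperparameters are purely illustrative and not taken from the survey.

```python
# Minimal sketch of downstream adaptation: instead of training from scratch,
# a pretrained transformer encoder is loaded and lightly fine-tuned on the
# target task with a freshly initialized classification head.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # new head; encoder body reuses pretrained weights
)

texts = ["the movie was wonderful", "a dull and lifeless film"]  # illustrative data
labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few gradient steps stand in for a full fine-tuning run
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

model.eval()
with torch.no_grad():
    preds = model(**batch).logits.argmax(dim=-1)
print(preds)  # predicted class ids for the two illustrative sentences
```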
Related papers
- Continual Learning for Large Language Models: A Survey [95.79977915131145]
Large language models (LLMs) are not amenable to frequent re-training, due to high training costs arising from their massive scale.
This paper surveys recent works on continual learning for LLMs.
arXiv Detail & Related papers (2024-02-02T12:34:09Z) - PASTA: Pretrained Action-State Transformer Agents [10.654719072766495]
Self-supervised learning has brought about a revolutionary paradigm shift in various computing domains.
Recent approaches involve pre-training transformer models on vast amounts of unlabeled data.
In reinforcement learning, researchers have recently adapted these approaches, developing models pre-trained on expert trajectories.
arXiv Detail & Related papers (2023-07-20T15:09:06Z) - GPT4Image: Can Large Pre-trained Models Help Vision Models on Perception
Tasks? [51.22096780511165]
We present a new learning paradigm in which the knowledge extracted from large pre-trained models is utilized to help models like CNNs and ViTs learn enhanced representations.
We feed detailed descriptions into a pre-trained encoder to extract text embeddings with rich semantic information that encodes the content of images.
arXiv Detail & Related papers (2023-06-01T14:02:45Z) - Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey [66.18478838828231]
Large multi-modal pre-trained models have drawn increasing attention in recent years.
This paper introduces the background of multi-modal pre-training by reviewing conventional deep pre-training work in natural language processing, computer vision, and speech.
Then, we introduce the task definition, key challenges, and advantages of multi-modal pre-training models (MM-PTMs), and discuss MM-PTMs with a focus on data, objectives, network architectures, and knowledge-enhanced pre-training.
arXiv Detail & Related papers (2023-02-20T15:34:03Z) - Foundation Models for Natural Language Processing -- Pre-trained
Language Models Integrating Media [0.0]
Foundation Models are pre-trained language models for Natural Language Processing.
They can be applied to a wide range of different media and problem domains, ranging from image and video processing to robot control learning.
This book provides a comprehensive overview of the state of the art in research and applications of Foundation Models.
arXiv Detail & Related papers (2023-02-16T20:42:04Z) - Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z) - A Survey of Vision-Language Pre-Trained Models [41.323956143107644]
Pre-trained models have advanced at a breakneck pace in recent years.
How to adapt pre-training to the field of Vision-and-Language learning and improve performance on downstream tasks has become a focus of multimodal learning.
arXiv Detail & Related papers (2022-02-18T15:15:46Z) - Recent Advances in Natural Language Processing via Large Pre-Trained
Language Models: A Survey [67.82942975834924]
Large, pre-trained language models such as BERT have drastically changed the Natural Language Processing (NLP) field.
We present a survey of recent work that uses these large language models to solve NLP tasks via pre-training then fine-tuning, prompting, or text generation approaches.
arXiv Detail & Related papers (2021-11-01T20:08:05Z) - Lifelong Pretraining: Continually Adapting Language Models to Emerging
Corpora [31.136334214818305]
We study a lifelong language model pretraining challenge where a PTLM is continually updated so as to adapt to emerging data.
Over a domain-incremental research paper stream and a chronologically ordered tweet stream, we incrementally pretrain a PTLM with different continual learning algorithms.
Our experiments show continual learning algorithms improve knowledge preservation, with logit distillation being the most effective approach.
arXiv Detail & Related papers (2021-10-16T09:59:33Z) - Exploring the Limits of Transfer Learning with a Unified Text-to-Text
Transformer [64.22926988297685]
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format (see the sketch after this list).
arXiv Detail & Related papers (2019-10-23T17:37:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.