AMMUS : A Survey of Transformer-based Pretrained Models in Natural
Language Processing
- URL: http://arxiv.org/abs/2108.05542v1
- Date: Thu, 12 Aug 2021 05:32:18 GMT
- Title: AMMUS : A Survey of Transformer-based Pretrained Models in Natural
Language Processing
- Authors: Katikapalli Subramanyam Kalyan, Ajit Rajasekharan, Sivanesan Sangeetha
- Abstract summary: Transformer-based pretrained language models (T-PTLMs) have achieved great success in almost every NLP task.
Transformer-based PTLMs learn universal language representations from large volumes of text data using self-supervised learning.
These models provide useful background knowledge for downstream tasks, which avoids training downstream models from scratch.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based pretrained language models (T-PTLMs) have achieved great
success in almost every NLP task. The evolution of these models started with
GPT and BERT. These models are built on top of transformers, self-supervised
learning, and transfer learning. Transformer-based PTLMs learn universal
language representations from large volumes of text data using self-supervised
learning and transfer this knowledge to downstream tasks. These models provide
useful background knowledge for downstream tasks, which avoids training
downstream models from scratch. In this comprehensive survey paper,
we initially give a brief overview of self-supervised learning. Next, we
explain various core concepts like pretraining, pretraining methods,
pretraining tasks, embeddings and downstream adaptation methods. Next, we
present a new taxonomy of T-PTLMs and then give a brief overview of various
benchmarks, both intrinsic and extrinsic. We present a summary of
various useful libraries to work with T-PTLMs. Finally, we highlight some of
the future research directions which will further improve these models. We
strongly believe that this comprehensive survey paper will serve as a good
reference to learn the core concepts as well as to stay updated with the recent
happenings in T-PTLMs.
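To make the downstream-adaptation idea in the abstract concrete, here is a minimal fine-tuning sketch, assuming the Hugging Face `transformers` and `torch` packages and the public "bert-base-uncased" checkpoint; the two-sentence sentiment "dataset", labels, and hyperparameters are purely illustrative and not taken from the survey.

```python
# Minimal sketch of downstream adaptation: instead of training from scratch,
# a pretrained transformer encoder is loaded and lightly fine-tuned on the
# target task with a freshly initialized classification head.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # new head; encoder body reuses pretrained weights
)

texts = ["the movie was wonderful", "a dull and lifeless film"]  # illustrative data
labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few gradient steps stand in for a full fine-tuning run
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

model.eval()
with torch.no_grad():
    preds = model(**batch).logits.argmax(dim=-1)
print(preds)  # predicted class ids for the two illustrative sentences
```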
Related papers
- Continual Learning for Large Language Models: A Survey [95.79977915131145]
Large language models (LLMs) are not amenable to frequent re-training, due to high training costs arising from their massive scale.
This paper surveys recent works on continual learning for LLMs.
arXiv Detail & Related papers (2024-02-02T12:34:09Z) - PASTA: Pretrained Action-State Transformer Agents [10.654719072766495]
Self-supervised learning has brought about a revolutionary paradigm shift in various computing domains.
Recent approaches involve pre-training transformer models on vast amounts of unlabeled data.
In reinforcement learning, researchers have recently adapted these approaches, developing models pre-trained on expert trajectories.
arXiv Detail & Related papers (2023-07-20T15:09:06Z) - GPT4Image: Can Large Pre-trained Models Help Vision Models on Perception
Tasks? [51.22096780511165]
We present a new learning paradigm in which the knowledge extracted from large pre-trained models is utilized to help models like CNNs and ViTs learn enhanced representations.
We feed detailed descriptions into a pre-trained encoder to extract text embeddings with rich semantic information that encodes the content of images.
arXiv Detail & Related papers (2023-06-01T14:02:45Z) - Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey [66.18478838828231]
Large multi-modal pre-trained models have drawn increasing attention in recent years.
This paper introduces the background of multi-modal pre-training by reviewing conventional deep pre-training work in natural language processing, computer vision, and speech.
Then, we introduce the task definition, key challenges, and advantages of multi-modal pre-training models (MM-PTMs), and discuss MM-PTMs with a focus on data, objectives, network architectures, and knowledge-enhanced pre-training.
arXiv Detail & Related papers (2023-02-20T15:34:03Z) - Foundation Models for Natural Language Processing -- Pre-trained
Language Models Integrating Media [0.0]
Foundation Models are pre-trained language models for Natural Language Processing.
They can be applied to a wide range of different media and problem domains, ranging from image and video processing to robot control learning.
This book provides a comprehensive overview of the state of the art in research and applications of Foundation Models.
arXiv Detail & Related papers (2023-02-16T20:42:04Z) - Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z) - A Survey of Vision-Language Pre-Trained Models [41.323956143107644]
Pre-trained models have advanced at a breakneck pace in recent years.
How to adapt pre-training to the field of Vision-and-Language learning and improve performance on downstream tasks has become a focus of multimodal learning.
arXiv Detail & Related papers (2022-02-18T15:15:46Z) - Recent Advances in Natural Language Processing via Large Pre-Trained
Language Models: A Survey [67.82942975834924]
Large, pre-trained language models such as BERT have drastically changed the Natural Language Processing (NLP) field.
We present a survey of recent work that uses these large language models to solve NLP tasks via pre-training then fine-tuning, prompting, or text generation approaches.
arXiv Detail & Related papers (2021-11-01T20:08:05Z) - Lifelong Pretraining: Continually Adapting Language Models to Emerging
Corpora [31.136334214818305]
We study a lifelong language model pretraining challenge where a PTLM is continually updated so as to adapt to emerging data.
Over a domain-incremental research paper stream and a chronologically ordered tweet stream, we incrementally pretrain a PTLM with different continual learning algorithms.
Our experiments show continual learning algorithms improve knowledge preservation, with logit distillation being the most effective approach.
arXiv Detail & Related papers (2021-10-16T09:59:33Z) - Exploring the Limits of Transfer Learning with a Unified Text-to-Text
Transformer [64.22926988297685]
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format (see the sketch after this list).
arXiv Detail & Related papers (2019-10-23T17:37:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.