Compression of Deep Learning Models for Text: A Survey
- URL: http://arxiv.org/abs/2008.05221v4
- Date: Sun, 13 Jun 2021 17:47:28 GMT
- Title: Compression of Deep Learning Models for Text: A Survey
- Authors: Manish Gupta, Puneet Agrawal
- Abstract summary: In recent years, the fields of natural language processing (NLP) and information retrieval (IR) have made tremendous progress.
Deep learning models like Recurrent Neural Networks (RNNs), Gated Recurrent Units (GRUs) and Long Short-Term Memory (LSTM) networks, and Transformer [120] based models like Bidirectional Encoder Representations from Transformers (BERT) [24], Generative Pre-training Transformer (GPT-2) [94], Multi-task Deep Neural Network (MT-DNN) [73], Extra-Long Network (XLNet) [134], and Text-to-text Transfer Transformer (T5) [95] have driven this progress.
- Score: 6.532867867011488
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, the fields of natural language processing (NLP) and
information retrieval (IR) have made tremendous progress thanks to deep learning models like Recurrent Neural Networks (RNNs), Gated Recurrent Units (GRUs) and Long Short-Term Memory (LSTM) networks, and Transformer [120] based models like Bidirectional Encoder Representations from Transformers (BERT) [24], Generative Pre-training Transformer (GPT-2) [94], Multi-task Deep Neural Network (MT-DNN) [73], Extra-Long Network (XLNet) [134], Text-to-text Transfer Transformer (T5) [95], T-NLG [98] and GShard [63]. But these models are humongous in size. On the other hand, real world applications demand small model size, low response times and low computational power wattage. In this survey, we discuss six different types of methods (Pruning, Quantization, Knowledge Distillation, Parameter Sharing, Tensor Decomposition, and Sub-quadratic Transformer based methods) for compression of such models to enable their deployment in real industry NLP projects. Given the critical need of building applications with efficient and small models, and the large amount of recently published work in this area, we believe that this survey organizes the plethora of work done by the 'deep learning for NLP' community in the past few years and presents it as a coherent story.
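As a rough illustration of two of the six method families named above (not code from the survey itself), the sketch below applies magnitude pruning and dynamic quantization to a toy PyTorch model; the layer sizes and the 30% sparsity level are arbitrary assumptions.

```python
# Illustrative sketch only: magnitude pruning and dynamic quantization
# applied to a toy PyTorch model (sizes and sparsity are assumptions).
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(768, 3072), nn.ReLU(), nn.Linear(3072, 768))

# Pruning: zero out the 30% smallest-magnitude weights in each linear layer.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the zeros into the weight tensor

# Quantization: store linear-layer weights in int8 for cheaper CPU inference.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 768)
print(quantized(x).shape)  # torch.Size([1, 768])
```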
Related papers
- A Survey on Transformer Compression [84.18094368700379]
Transformers play a vital role in natural language processing (NLP) and computer vision (CV).
Model compression methods reduce the memory and computational cost of Transformer models.
This survey provides a comprehensive review of recent compression methods, with a specific focus on their application to Transformer-based models.
arXiv Detail & Related papers (2024-02-05T12:16:28Z) - Repeat After Me: Transformers are Better than State Space Models at Copying [53.47717661441142]
We show that while generalized state space models are promising in terms of inference-time efficiency, they are limited compared to transformer models on tasks that require copying from the input context.
arXiv Detail & Related papers (2024-02-01T21:44:11Z) - Enhancing Actuarial Non-Life Pricing Models via Transformers [0.0]
We build on the foundation laid out by the combined actuarial neural network as well as the localGLMnet and enhance those models via the feature tokenizer transformer.
The paper shows that the new methods can achieve better results than the benchmark models while preserving certain generalized linear model advantages.
arXiv Detail & Related papers (2023-11-10T12:06:23Z) - The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
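To make the compute-matched protocol concrete, here is a small back-of-the-envelope sketch; the budget and throughput figures are made up for illustration and are not taken from the paper. A fixed accelerator-hour budget plus each model's measured throughput determines how many training tokens it is allowed.

```python
# Back-of-the-envelope illustration of compute-matched comparisons.
# Budget and throughput numbers are illustrative assumptions only.
budget_hours = 24.0
throughput_tokens_per_sec = {"transformer_baseline": 45_000, "fast_lstm": 450_000}

for name, tps in throughput_tokens_per_sec.items():
    tokens = tps * budget_hours * 3600  # tokens each model can train on in the budget
    print(f"{name}: {tokens / 1e9:.1f}B training tokens in {budget_hours}h")
```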
arXiv Detail & Related papers (2023-09-20T10:31:17Z) - A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks [60.38369406877899]
Transformer is a deep neural network that employs a self-attention mechanism to comprehend the contextual relationships within sequential data.
Transformer models excel in handling long dependencies between input sequence elements and enable parallel processing.
Our survey encompasses the identification of the top five application domains for transformer-based models.
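For reference, a minimal single-head scaled dot-product self-attention layer looks like the sketch below; the dimensions are illustrative assumptions and this is not code from the surveyed paper.

```python
# Minimal single-head scaled dot-product self-attention (illustrative only).
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))  # (batch, seq, seq)
        weights = scores.softmax(dim=-1)  # every position attends to every position
        return weights @ v

attn = SelfAttention(64)
print(attn(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```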
arXiv Detail & Related papers (2023-06-11T23:13:51Z) - Learning to Grow Pretrained Models for Efficient Transformer Training [72.20676008625641]
We learn to grow pretrained transformers by learning a linear map from the parameters of the smaller model to initialize the larger model.
Experiments across both language and vision transformers demonstrate that our learned Linear Growth Operator (LiGO) can save up to 50% computational cost of training from scratch.
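A simplified reading of the linear-growth idea (not the paper's actual LiGO implementation) is sketched below: a wider layer is initialized by applying learned linear expansion maps to a narrower pretrained weight matrix. The dimensions and initialization scale are assumptions for illustration.

```python
# Simplified sketch of linear growth: initialize a wider layer from a narrower
# pretrained one via learned linear expansion maps (not the LiGO code itself).
import torch
import torch.nn as nn

d_small, d_large = 256, 512
w_small = torch.randn(d_small, d_small)  # stand-in for a pretrained small-layer weight

# Learnable expansion matrices map the small parameter tensor to the large one.
row_expand = nn.Parameter(torch.randn(d_large, d_small) * 0.02)
col_expand = nn.Parameter(torch.randn(d_large, d_small) * 0.02)

w_large_init = row_expand @ w_small @ col_expand.T  # (d_large, d_large) initialization
print(w_large_init.shape)  # torch.Size([512, 512])
```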
arXiv Detail & Related papers (2023-03-02T05:21:18Z) - Application of Deep Learning in Generating Structured Radiology Reports: A Transformer-Based Technique [0.4549831511476247]
Natural language processing techniques can facilitate automatic information extraction and transformation of free-text formats to structured data.
Deep learning (DL)-based models have been adapted for NLP experiments with promising results.
In this study, we propose a transformer-based fine-grained named entity recognition architecture for clinical information extraction.
arXiv Detail & Related papers (2022-09-25T08:03:15Z) - N-Grammer: Augmenting Transformers with latent n-grams [35.39961549040385]
We propose a simple yet effective modification to the Transformer architecture inspired by the literature in statistical language modeling, by augmenting the model with n-grams that are constructed from a discrete latent representation of the text sequence.
We evaluate our model, the N-Grammer, on language modeling on the C4 data-set as well as text classification on the SuperGLUE data-set, and find that it outperforms several strong baselines such as the Transformer and the Primer.
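A rough sketch of the latent n-gram idea follows; the cluster count, hashing scheme, and tensor sizes are illustrative assumptions rather than the paper's exact design.

```python
# Rough sketch of latent n-gram augmentation: form bigram IDs from discretized
# token representations, look up learned bigram embeddings, and add them to the
# sequence representation. Sizes and hashing are illustrative assumptions.
import torch
import torch.nn as nn

vocab_clusters, ngram_vocab, d_model = 1024, 100_000, 64
embed = nn.Embedding(ngram_vocab, d_model)

cluster_ids = torch.randint(vocab_clusters, (2, 10))  # discrete latent ID per token
bigram_ids = (cluster_ids[:, :-1] * vocab_clusters + cluster_ids[:, 1:]) % ngram_vocab
bigram_ids = torch.cat([torch.zeros(2, 1, dtype=torch.long), bigram_ids], dim=1)

hidden = torch.randn(2, 10, d_model)        # stand-in for a Transformer layer output
augmented = hidden + embed(bigram_ids)       # n-gram-augmented representation
print(augmented.shape)  # torch.Size([2, 10, 64])
```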
arXiv Detail & Related papers (2022-07-13T17:18:02Z) - Transformers: "The End of History" for NLP? [17.36054090232896]
We shed light on some important theoretical limitations of pre-trained BERT-style models.
We show that addressing these limitations can yield sizable improvements over vanilla RoBERTa and XLNet.
We offer a more general discussion on desiderata for future additions to the Transformer architecture.
arXiv Detail & Related papers (2021-04-09T08:29:42Z) - Compressing Large-Scale Transformer-Based Models: A Case Study on BERT [41.04066537294312]
Pre-trained Transformer-based models have achieved state-of-the-art performance for various Natural Language Processing (NLP) tasks.
These models often have billions of parameters, and, thus, are too resource-hungry and computation-intensive to suit low-capability devices or applications.
One potential remedy for this is model compression, which has attracted a lot of research attention.
arXiv Detail & Related papers (2020-02-27T09:20:31Z) - MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers [117.67424061746247]
We present a simple and effective approach to compress large Transformer based pre-trained models.
We propose distilling the self-attention module of the last Transformer layer of the teacher, which is effective and flexible for the student.
Experimental results demonstrate that our monolingual model outperforms state-of-the-art baselines across different student model parameter sizes.
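A hedged, partial sketch of self-attention distillation in this spirit: the student is trained to match the teacher's last-layer attention distributions with a KL-divergence loss (MiniLM additionally matches value relations, omitted here). Tensors are random stand-ins for real attention logits.

```python
# Partial sketch of self-attention distillation: KL divergence between the
# teacher's and student's last-layer attention distributions (illustrative only).
import torch
import torch.nn.functional as F

batch, heads, seq = 2, 12, 16
teacher_logits = torch.randn(batch, heads, seq, seq)  # stand-in teacher attention scores
student_logits = torch.randn(batch, heads, seq, seq)  # stand-in student attention scores

loss = F.kl_div(
    F.log_softmax(student_logits, dim=-1),
    F.softmax(teacher_logits, dim=-1),
    reduction="batchmean",
)
print(loss.item())
```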
arXiv Detail & Related papers (2020-02-25T15:21:10Z)