Compression of Deep Learning Models for Text: A Survey
- URL: http://arxiv.org/abs/2008.05221v4
- Date: Sun, 13 Jun 2021 17:47:28 GMT
- Title: Compression of Deep Learning Models for Text: A Survey
- Authors: Manish Gupta, Puneet Agrawal
- Abstract summary: In recent years, the fields of natural language processing (NLP) and information retrieval (IR) have made tremendous progress.
Deep learning models like Recurrent Neural Networks (RNNs), Gated Recurrent Units (GRUs) and Long Short-Term Memory (LSTM) networks, and Transformer [120] based models like Bidirectional Encoder Representations from Transformers (BERT) [24], Generative Pre-training Transformer (GPT-2) [94], Multi-task Deep Neural Network (MT-DNN) [73], Extra-Long Network (XLNet) [134], and Text-to-text Transfer Transformer (T5) [95] have driven this progress.
- Score: 6.532867867011488
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, the fields of natural language processing (NLP) and
information retrieval (IR) have made tremendous progress thanks to deep learning models like Recurrent Neural Networks (RNNs), Gated Recurrent Units (GRUs) and Long Short-Term Memory (LSTM) networks, and Transformer [120] based models like Bidirectional Encoder Representations from Transformers (BERT) [24], Generative Pre-training Transformer (GPT-2) [94], Multi-task Deep Neural Network (MT-DNN) [73], Extra-Long Network (XLNet) [134], Text-to-text Transfer Transformer (T5) [95], T-NLG [98] and GShard [63]. But these models are humongous in size. On the other hand, real world applications demand small model size, low response times and low computational power wattage. In this survey, we discuss six different types of methods (Pruning, Quantization, Knowledge Distillation, Parameter Sharing, Tensor Decomposition, and Sub-quadratic Transformer based methods) for compression of such models to enable their deployment in real industry NLP projects. Given the critical need of building applications with efficient and small models, and the large amount of recently published work in this area, we believe that this survey organizes the plethora of work done by the 'deep learning for NLP' community in the past few years and presents it as a coherent story.
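As a rough illustration of two of the six method families named above (not code from the survey itself), the sketch below applies magnitude pruning and dynamic quantization to a toy PyTorch model; the layer sizes and the 30% sparsity level are arbitrary assumptions.

```python
# Illustrative sketch only: magnitude pruning and dynamic quantization
# applied to a toy PyTorch model (sizes and sparsity are assumptions).
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(768, 3072), nn.ReLU(), nn.Linear(3072, 768))

# Pruning: zero out the 30% smallest-magnitude weights in each linear layer.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the zeros into the weight tensor

# Quantization: store linear-layer weights in int8 for cheaper CPU inference.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 768)
print(quantized(x).shape)  # torch.Size([1, 768])
```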
Related papers
- A Survey on Transformer Compression [84.18094368700379]
Transformers play a vital role in natural language processing (NLP) and computer vision (CV).
Model compression methods reduce the memory and computational cost of Transformer models.
This survey provides a comprehensive review of recent compression methods, with a specific focus on their application to Transformer-based models.
arXiv Detail & Related papers (2024-02-05T12:16:28Z) - Repeat After Me: Transformers are Better than State Space Models at Copying [53.47717661441142]
We show that while generalized state space models are promising in terms of inference-time efficiency, they are limited compared to transformer models on tasks that require copying from the input context.
arXiv Detail & Related papers (2024-02-01T21:44:11Z) - Enhancing Actuarial Non-Life Pricing Models via Transformers [0.0]
We build on the foundation laid out by the combined actuarial neural network as well as the localGLMnet and enhance those models via the feature tokenizer transformer.
The paper shows that the new methods can achieve better results than the benchmark models while preserving certain generalized linear model advantages.
arXiv Detail & Related papers (2023-11-10T12:06:23Z) - The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
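To make the compute-matched protocol concrete, here is a small back-of-the-envelope sketch; the budget and throughput figures are made up for illustration and are not taken from the paper. A fixed accelerator-hour budget plus each model's measured throughput determines how many training tokens it is allowed.

```python
# Back-of-the-envelope illustration of compute-matched comparisons.
# Budget and throughput numbers are illustrative assumptions only.
budget_hours = 24.0
throughput_tokens_per_sec = {"transformer_baseline": 45_000, "fast_lstm": 450_000}

for name, tps in throughput_tokens_per_sec.items():
    tokens = tps * budget_hours * 3600  # tokens each model can train on in the budget
    print(f"{name}: {tokens / 1e9:.1f}B training tokens in {budget_hours}h")
```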
arXiv Detail & Related papers (2023-09-20T10:31:17Z) - A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks [60.38369406877899]
Transformer is a deep neural network that employs a self-attention mechanism to comprehend the contextual relationships within sequential data.
Transformer models excel in handling long dependencies between input sequence elements and enable parallel processing.
Our survey encompasses the identification of the top five application domains for transformer-based models.
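For reference, a minimal single-head scaled dot-product self-attention layer looks like the sketch below; the dimensions are illustrative assumptions and this is not code from the surveyed paper.

```python
# Minimal single-head scaled dot-product self-attention (illustrative only).
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))  # (batch, seq, seq)
        weights = scores.softmax(dim=-1)  # every position attends to every position
        return weights @ v

attn = SelfAttention(64)
print(attn(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```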
arXiv Detail & Related papers (2023-06-11T23:13:51Z) - Learning to Grow Pretrained Models for Efficient Transformer Training [72.20676008625641]
We learn to grow pretrained transformers by learning a linear map from the parameters of the smaller model to initialize the larger model.
Experiments across both language and vision transformers demonstrate that our learned Linear Growth Operator (LiGO) can save up to 50% computational cost of training from scratch.
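A simplified reading of the linear-growth idea (not the paper's actual LiGO implementation) is sketched below: a wider layer is initialized by applying learned linear expansion maps to a narrower pretrained weight matrix. The dimensions and initialization scale are assumptions for illustration.

```python
# Simplified sketch of linear growth: initialize a wider layer from a narrower
# pretrained one via learned linear expansion maps (not the LiGO code itself).
import torch
import torch.nn as nn

d_small, d_large = 256, 512
w_small = torch.randn(d_small, d_small)  # stand-in for a pretrained small-layer weight

# Learnable expansion matrices map the small parameter tensor to the large one.
row_expand = nn.Parameter(torch.randn(d_large, d_small) * 0.02)
col_expand = nn.Parameter(torch.randn(d_large, d_small) * 0.02)

w_large_init = row_expand @ w_small @ col_expand.T  # (d_large, d_large) initialization
print(w_large_init.shape)  # torch.Size([512, 512])
```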
arXiv Detail & Related papers (2023-03-02T05:21:18Z) - Application of Deep Learning in Generating Structured Radiology Reports: A Transformer-Based Technique [0.4549831511476247]
Natural language processing techniques can facilitate automatic information extraction and transformation of free-text formats to structured data.
Deep learning (DL)-based models have been adapted for NLP experiments with promising results.
In this study, we propose a transformer-based fine-grained named entity recognition architecture for clinical information extraction.
arXiv Detail & Related papers (2022-09-25T08:03:15Z) - N-Grammer: Augmenting Transformers with latent n-grams [35.39961549040385]
We propose a simple yet effective modification to the Transformer architecture inspired by the literature in statistical language modeling, by augmenting the model with n-grams that are constructed from a discrete latent representation of the text sequence.
We evaluate our model, the N-Grammer, on language modeling on the C4 data-set as well as text classification on the SuperGLUE data-set, and find that it outperforms several strong baselines such as the Transformer and the Primer.
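A rough sketch of the latent n-gram idea follows; the cluster count, hashing scheme, and tensor sizes are illustrative assumptions rather than the paper's exact design.

```python
# Rough sketch of latent n-gram augmentation: form bigram IDs from discretized
# token representations, look up learned bigram embeddings, and add them to the
# sequence representation. Sizes and hashing are illustrative assumptions.
import torch
import torch.nn as nn

vocab_clusters, ngram_vocab, d_model = 1024, 100_000, 64
embed = nn.Embedding(ngram_vocab, d_model)

cluster_ids = torch.randint(vocab_clusters, (2, 10))  # discrete latent ID per token
bigram_ids = (cluster_ids[:, :-1] * vocab_clusters + cluster_ids[:, 1:]) % ngram_vocab
bigram_ids = torch.cat([torch.zeros(2, 1, dtype=torch.long), bigram_ids], dim=1)

hidden = torch.randn(2, 10, d_model)        # stand-in for a Transformer layer output
augmented = hidden + embed(bigram_ids)       # n-gram-augmented representation
print(augmented.shape)  # torch.Size([2, 10, 64])
```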
arXiv Detail & Related papers (2022-07-13T17:18:02Z) - Transformers: "The End of History" for NLP? [17.36054090232896]
We shed light on some important theoretical limitations of pre-trained BERT-style models.
We show that addressing these limitations can yield sizable improvements over vanilla RoBERTa and XLNet.
We offer a more general discussion on desiderata for future additions to the Transformer architecture.
arXiv Detail & Related papers (2021-04-09T08:29:42Z) - Compressing Large-Scale Transformer-Based Models: A Case Study on BERT [41.04066537294312]
Pre-trained Transformer-based models have achieved state-of-the-art performance for various Natural Language Processing (NLP) tasks.
These models often have billions of parameters, and, thus, are too resource-hungry and computation-intensive to suit low-capability devices or applications.
One potential remedy for this is model compression, which has attracted a lot of research attention.
arXiv Detail & Related papers (2020-02-27T09:20:31Z) - MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers [117.67424061746247]
We present a simple and effective approach to compress large Transformer based pre-trained models.
We propose distilling the self-attention module of the last Transformer layer of the teacher, which is effective and flexible for the student.
Experimental results demonstrate that our monolingual model outperforms state-of-the-art baselines across different student model parameter sizes.
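A hedged, partial sketch of self-attention distillation in this spirit: the student is trained to match the teacher's last-layer attention distributions with a KL-divergence loss (MiniLM additionally matches value relations, omitted here). Tensors are random stand-ins for real attention logits.

```python
# Partial sketch of self-attention distillation: KL divergence between the
# teacher's and student's last-layer attention distributions (illustrative only).
import torch
import torch.nn.functional as F

batch, heads, seq = 2, 12, 16
teacher_logits = torch.randn(batch, heads, seq, seq)  # stand-in teacher attention scores
student_logits = torch.randn(batch, heads, seq, seq)  # stand-in student attention scores

loss = F.kl_div(
    F.log_softmax(student_logits, dim=-1),
    F.softmax(teacher_logits, dim=-1),
    reduction="batchmean",
)
print(loss.item())
```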
arXiv Detail & Related papers (2020-02-25T15:21:10Z)