Exploring Low-Cost Transformer Model Compression for Large-Scale
Commercial Reply Suggestions
- URL: http://arxiv.org/abs/2111.13999v1
- Date: Sat, 27 Nov 2021 22:42:06 GMT
- Title: Exploring Low-Cost Transformer Model Compression for Large-Scale
Commercial Reply Suggestions
- Authors: Vaishnavi Shrivastava, Radhika Gaonkar, Shashank Gupta, Abhishek Jha
- Abstract summary: Fine-tuning pre-trained language models improves the quality of commercial reply suggestion systems.
We explore low-cost model compression techniques like Layer Dropping and Layer Freezing.
We demonstrate the efficacy of these techniques in large-data scenarios, reducing training time for a commercial email reply suggestion system by 42%.
- Score: 3.3953799543764522
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-tuning pre-trained language models improves the quality of commercial
reply suggestion systems, but at the cost of unsustainable training times.
Popular training time reduction approaches are resource intensive, thus we
explore low-cost model compression techniques like Layer Dropping and Layer
Freezing. We demonstrate the efficacy of these techniques in large-data
scenarios, reducing training time for a commercial email reply suggestion
system by 42% without affecting model relevance or user engagement. We
further study the robustness of these techniques to pre-trained
model and dataset size ablation, and share several insights and recommendations
for commercial applications.
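As a rough illustration of the two techniques named in the abstract, the sketch below models Layer Freezing and Layer Dropping as bookkeeping over per-layer parameter counts. The function name, the per-layer size, and the freeze/drop split are illustrative assumptions, not details from the paper (which may, e.g., drop alternate layers rather than top layers):

```python
# Hypothetical sketch: how freezing and dropping layers shrink the set of
# parameters touched per fine-tuning step. Not the paper's implementation.

def trainable_params(layer_sizes, n_frozen=0, n_dropped=0):
    """Return the number of parameters updated per training step.

    layer_sizes -- parameter count of each encoder layer, bottom first
    n_frozen    -- Layer Freezing: keep the bottom n layers fixed
    n_dropped   -- Layer Dropping: remove the top n layers entirely
    """
    kept = layer_sizes[:len(layer_sizes) - n_dropped]  # dropped layers vanish
    return sum(kept[n_frozen:])                        # frozen layers get no updates

# A 12-layer encoder with ~7M parameters per layer (BERT-base-like scale).
layers = [7_000_000] * 12

full = trainable_params(layers)                            # fine-tune everything
cheap = trainable_params(layers, n_frozen=4, n_dropped=2)  # freeze 4, drop 2

print(f"full fine-tuning updates {full:,} params")
print(f"freeze-4 + drop-2 updates {cheap:,} params ({1 - cheap / full:.0%} fewer)")
```

Both techniques are "low-cost" in the sense shown here: they require no distillation or auxiliary training, only skipping work that full fine-tuning would do.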
Related papers
- Dynamic Sparse Learning: A Novel Paradigm for Efficient Recommendation [20.851925464903804]
This paper introduces a novel learning paradigm, Dynamic Sparse Learning, tailored for recommendation models.
DSL innovatively trains a lightweight sparse model from scratch, periodically evaluating and dynamically adjusting each weight's significance.
Our experimental results underline DSL's effectiveness, significantly reducing training and inference costs while delivering comparable recommendation performance.
arXiv Detail & Related papers (2024-02-05T10:16:20Z)
- Model Compression Techniques in Biometrics Applications: A Survey [5.452293986561535]
Deep learning algorithms have extensively empowered humanity's task automatization capacity.
The huge improvement in the performance of these models is highly correlated with their increasing level of complexity.
This led to the development of compression techniques that drastically reduce the computational and memory costs of deep learning models without significant performance degradation.
arXiv Detail & Related papers (2024-01-18T17:06:21Z)
- Preparing Lessons for Progressive Training on Language Models [75.88952808979087]
The rapid progress of Transformers in artificial intelligence has come at the cost of increased resource consumption and greenhouse gas emissions.
We propose Apollo, which prepares lessons for expanding operations by layer functionality during training of low layers.
Experiments demonstrate that Apollo achieves state-of-the-art acceleration ratios, even rivaling methods using pretrained models.
arXiv Detail & Related papers (2024-01-17T13:04:14Z)
- Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression [64.07696663255155]
Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks.
However, the massive size of these models poses huge challenges for their deployment in real-world applications.
We introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT) which effectively transfers the knowledge of LLMs to extremely small-scale models.
arXiv Detail & Related papers (2023-10-24T07:58:20Z)
- Reusing Pretrained Models by Multi-linear Operators for Efficient Training [65.64075958382034]
Training large models from scratch usually costs a substantial amount of resources.
Recent studies such as bert2BERT and LiGO have reused small pretrained models to initialize a large model.
We propose a method that linearly correlates each weight of the target model to all the weights of the pretrained model.
arXiv Detail & Related papers (2023-10-16T06:16:47Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST), a recently proposed and highly effective technique for distributed training of large models.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Re-thinking Data Availability Attacks Against Deep Neural Networks [53.64624167867274]
In this paper, we re-examine the concept of unlearnable examples and discern that the existing robust error-minimizing noise presents an inaccurate optimization objective.
We introduce a novel optimization paradigm that yields improved protection results with reduced computational time requirements.
arXiv Detail & Related papers (2023-05-18T04:03:51Z)
- Improving Sample Efficiency of Deep Learning Models in Electricity Market [0.41998444721319217]
We propose a general framework, namely Knowledge-Augmented Training (KAT), to improve sample efficiency.
We also introduce a novel data augmentation technique to generate synthetic data, which are later processed by an improved training strategy.
Modern learning theories demonstrate the effectiveness of our method in terms of effective prediction error feedback, a reliable loss function, and rich gradient noise.
arXiv Detail & Related papers (2022-10-11T16:35:13Z)
- Alternate Model Growth and Pruning for Efficient Training of Recommendation Systems [7.415129876303651]
Model pruning is an effective technique to reduce computation overhead for deep neural networks by removing redundant parameters.
Modern recommendation systems are still thirsty for model capacity due to the demand for handling big data.
We propose a dynamic training scheme, namely alternate model growth and pruning, to alternately construct and prune weights in the course of training.
arXiv Detail & Related papers (2021-05-04T03:14:30Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
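The extrapolation idea in the last entry above can be sketched on a toy 1-D quadratic. This is a generic extrapolated-gradient update under assumed step and extrapolation coefficients, not necessarily the exact scheme evaluated in that paper:

```python
# Minimal sketch: SGD that evaluates the gradient at an extrapolated point
# rather than at the current iterate. Coefficients are illustrative.

def grad(x):
    return 2.0 * (x - 3.0)  # gradient of f(x) = (x - 3)^2, minimum at x = 3

def sgd_extrapolated(x0, lr=0.1, beta=0.5, steps=50):
    x_prev, x = x0, x0
    for _ in range(steps):
        # look ahead along the previous displacement before taking the step
        x_ext = x + beta * (x - x_prev)
        x_prev, x = x, x - lr * grad(x_ext)
    return x

print(sgd_extrapolated(10.0))  # converges toward the minimum at 3.0
```

Setting `beta=0.0` recovers plain SGD, which makes the extrapolation term easy to ablate on such toy problems.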
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.