Alternate Model Growth and Pruning for Efficient Training of
Recommendation Systems
- URL: http://arxiv.org/abs/2105.01064v1
- Date: Tue, 4 May 2021 03:14:30 GMT
- Title: Alternate Model Growth and Pruning for Efficient Training of
Recommendation Systems
- Authors: Xiaocong Du, Bhargav Bhushanam, Jiecao Yu, Dhruv Choudhary, Tianxiang
Gao, Sherman Wong, Louis Feng, Jongsoo Park, Yu Cao, Arun Kejariwal
- Abstract summary: Model pruning is an effective technique to reduce computation overhead for deep neural networks by removing redundant parameters.
Modern recommendation systems are still thirsty for model capacity due to the demand for handling big data.
We propose a dynamic training scheme, namely alternate model growth and pruning, to alternately construct and prune weights over the course of training.
- Score: 7.415129876303651
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning recommendation systems at scale have provided remarkable gains
through increasing model capacity (i.e. wider and deeper neural networks), but
this comes at significant training and infrastructure cost. Model pruning is
an effective technique to reduce computation overhead for deep neural networks
by removing redundant parameters. However, modern recommendation systems are
still thirsty for model capacity due to the demand for handling big data. Thus,
pruning a recommendation model at scale results in a smaller model capacity and
consequently lower accuracy. To reduce computation cost without sacrificing
model capacity, we propose a dynamic training scheme, namely alternate model
growth and pruning, to alternately construct and prune weights in the course
of training. Our method leverages structured sparsification to reduce
computational cost without hurting the model capacity at the end of offline
training so that a full-size model is available in the recurring training stage
to learn new data in real-time. To the best of our knowledge, this is the first
work to provide in-depth experiments and discussion of applying structural
dynamics to recommendation systems at scale to reduce training cost. The
proposed method is validated with an open-source deep-learning recommendation
model (DLRM) and state-of-the-art industrial-scale production models.
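As a rough illustration of the alternating schedule described in the abstract, the sketch below switches between a pruning phase, which imposes structured row sparsity on a toy PyTorch layer, and a growth phase, in which the pruned rows re-enter training. The toy MLP, the phase lengths, the 0.5 keep ratio, and the row-norm pruning criterion are illustrative assumptions, not details taken from the paper; in the actual method the compute savings come from skipping the pruned structures rather than merely zeroing them.

```python
# Minimal sketch (assumptions, not the authors' code) of alternating
# structured pruning and growth phases during training.
import torch
import torch.nn as nn

def prune_rows(layer: nn.Linear, keep_ratio: float) -> torch.Tensor:
    """Structured pruning: zero whole output rows with the smallest L2 norms."""
    with torch.no_grad():
        row_norms = layer.weight.norm(dim=1)
        k = max(1, int(keep_ratio * layer.out_features))
        keep = torch.topk(row_norms, k).indices
        mask = torch.zeros(layer.out_features, dtype=torch.bool)
        mask[keep] = True
        layer.weight[~mask] = 0.0
        layer.bias[~mask] = 0.0
    return mask

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()
PRUNE_STEPS, GROW_STEPS, CYCLES = 200, 200, 3   # assumed schedule

def train_steps(n, mask=None):
    for _ in range(n):
        x, y = torch.randn(32, 64), torch.randn(32, 1)   # stand-in data
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
        if mask is not None:                    # keep pruned rows at zero
            with torch.no_grad():
                model[0].weight[~mask] = 0.0
                model[0].bias[~mask] = 0.0

for cycle in range(CYCLES):
    # Pruning phase: train a structurally sparse version of the layer.
    mask = prune_rows(model[0], keep_ratio=0.5)
    train_steps(PRUNE_STEPS, mask)
    # Growth phase: lift the mask so the pruned rows re-enter training.
    train_steps(GROW_STEPS)
# Training ends with a growth phase, so a full-size model is available for
# the recurring training stage on fresh data.
```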
Related papers
- AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies [36.645912291368546]
We present AquilaMoE, a cutting-edge bilingual 8*16B Mixture of Experts (MoE) language model comprising 8 experts with 16 billion parameters each.
This approach optimizes performance while minimizing data requirements through a two-stage training process.
We successfully trained a 16B model and subsequently the 8*16B AquilaMoE model, demonstrating significant improvements in performance and training efficiency.
arXiv Detail & Related papers (2024-08-13T02:07:00Z)
- Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment [69.33930972652594]
We propose a novel structural pruning approach to jointly learn the weights and structurally prune architectures of CNN models.
The core element of our method is a Reinforcement Learning (RL) agent whose actions determine the pruning ratios of the CNN model's layers.
We conduct the joint training and pruning by iteratively training the model's weights and the agent's policy.
arXiv Detail & Related papers (2024-03-28T15:22:29Z)
- Dynamic Sparse Learning: A Novel Paradigm for Efficient Recommendation [20.851925464903804]
This paper introduces a novel learning paradigm, Dynamic Sparse Learning, tailored for recommendation models.
DSL trains a lightweight sparse model from scratch, periodically evaluating and dynamically adjusting each weight's significance.
Our experimental results underline DSL's effectiveness, significantly reducing training and inference costs while delivering comparable recommendation performance.
arXiv Detail & Related papers (2024-02-05T10:16:20Z)
- Reusing Pretrained Models by Multi-linear Operators for Efficient Training [65.64075958382034]
Training large models from scratch usually costs a substantial amount of resources.
Recent studies such as bert2BERT and LiGO have reused small pretrained models to initialize a large model.
We propose a method that linearly correlates each weight of the target model to all the weights of the pretrained model.
arXiv Detail & Related papers (2023-10-16T06:16:47Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST), a recently proposed and highly effective technique for reducing the cost of distributed training.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Dimensionality Reduced Training by Pruning and Freezing Parts of a Deep Neural Network, a Survey [69.3939291118954]
State-of-the-art deep learning models have a parameter count that reaches into the billions. Training, storing and transferring such models is energy and time consuming, thus costly.
Model compression lowers storage and transfer costs, and can further make training more efficient by decreasing the number of computations in the forward and/or backward pass.
This work is a survey on methods which reduce the number of trained weights in deep learning models throughout the training.
arXiv Detail & Related papers (2022-05-17T05:37:08Z)
- Exploring Low-Cost Transformer Model Compression for Large-Scale Commercial Reply Suggestions [3.3953799543764522]
Fine-tuning pre-trained language models improves the quality of commercial reply suggestion systems.
We explore low-cost model compression techniques like Layer Dropping and Layer Freezing.
We demonstrate the efficacy of these techniques in large-data scenarios, reducing the training time of a commercial email reply suggestion system by 42%.
arXiv Detail & Related papers (2021-11-27T22:42:06Z)
- Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight-parameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
arXiv Detail & Related papers (2021-10-01T10:03:57Z)
- Incremental Learning for Personalized Recommender Systems [8.020546404087922]
We present an incremental learning solution that provides both training efficiency and model quality.
The solution is deployed at LinkedIn and is directly applicable to industrial-scale recommender systems.
arXiv Detail & Related papers (2021-08-13T04:21:21Z)
- Dynamic Model Pruning with Feedback [64.019079257231]
We propose a novel model compression method that generates a sparse trained model without additional overhead.
We evaluate our method on CIFAR-10 and ImageNet, and show that the obtained sparse models can reach the state-of-the-art performance of dense models.
arXiv Detail & Related papers (2020-06-12T15:07:08Z)
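To make the idea in the last entry concrete, here is a minimal sketch of a dynamic magnitude-pruning loop with feedback: gradients are computed through a masked (sparse) forward pass, but the optimizer updates the dense weights, and the mask is refreshed periodically so pruned weights can recover. The toy linear model, sparsity level, refresh interval, and plain SGD are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch (assumptions, not the paper's code) of dynamic pruning with
# feedback: masked forward pass, dense weight updates, periodic mask refresh.
import torch
import torch.nn as nn

model = nn.Linear(64, 10)                 # stand-in model
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
SPARSITY, MASK_EVERY = 0.8, 50            # assumed hyperparameters

def magnitude_mask(w: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Keep the largest-magnitude weights; zero out the rest."""
    k = int((1.0 - sparsity) * w.numel())
    threshold = w.abs().flatten().kthvalue(w.numel() - k).values
    return (w.abs() > threshold).float()

mask = torch.ones_like(model.weight)
for step in range(1000):
    if step % MASK_EVERY == 0:            # periodically refresh the mask
        mask = magnitude_mask(model.weight.detach(), SPARSITY)
    x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))   # stand-in data
    # Forward/backward through the masked (sparse) weights ...
    logits = nn.functional.linear(x, model.weight * mask, model.bias)
    loss = loss_fn(logits, y)
    opt.zero_grad()
    loss.backward()                       # ... but the update is applied to the
    opt.step()                            # dense weights (the feedback step),
                                          # so pruned weights can later recover.
```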