Adaptive Dense-to-Sparse Paradigm for Pruning Online Recommendation System with Non-Stationary Data
- URL: http://arxiv.org/abs/2010.08655v2
- Date: Wed, 21 Oct 2020 04:01:32 GMT
- Title: Adaptive Dense-to-Sparse Paradigm for Pruning Online Recommendation System with Non-Stationary Data
- Authors: Mao Ye, Dhruv Choudhary, Jiecao Yu, Ellie Wen, Zeliang Chen, Jiyan Yang, Jongsoo Park, Qiang Liu, Arun Kejariwal
- Abstract summary: Pruning is an effective technique that reduces both memory and compute demand for model inference.
This work presents an adaptive dense-to-sparse paradigm equipped with a novel pruning algorithm for pruning a large-scale recommendation system with non-stationary data distribution.
- Score: 13.080986170257782
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large scale deep learning provides a tremendous opportunity to improve the
quality of content recommendation systems by employing both wider and deeper
models, but this comes at great infrastructural cost and carbon footprint in
modern data centers. Pruning is an effective technique that reduces both memory
and compute demand for model inference. However, pruning for online
recommendation systems is challenging due to the continuous data distribution
shift (a.k.a. non-stationary data). Although incremental training on the full
model is able to adapt to the non-stationary data, directly applying it to the
pruned model leads to accuracy loss. This is because the sparsity pattern after
pruning requires adjustment to learn new patterns. To the best of our
knowledge, this is the first work to provide in-depth analysis and discussion
of applying pruning to online recommendation systems with non-stationary data
distribution. Overall, this work makes the following contributions: 1) We
present an adaptive dense-to-sparse paradigm equipped with a novel pruning
algorithm for pruning a large-scale recommendation system with non-stationary
data distribution; 2) We design the pruning algorithm to automatically learn
the sparsity across layers and avoid repeated hand-tuning, which is critical
for pruning the heterogeneous architectures of recommendation systems trained
with non-stationary data.
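
The sketch below illustrates the dense-to-sparse idea on a streaming workload: the model keeps training incrementally on new batches while being pruned against a single global magnitude threshold, so that per-layer sparsity is learned rather than hand-tuned. The linear sparsity ramp, Adagrad optimizer, and binary click-prediction loss are illustrative assumptions, not the paper's exact algorithm.

```python
# Minimal sketch (illustrative assumptions, not the paper's exact algorithm):
# incremental training on a non-stationary stream while pruning against one
# GLOBAL magnitude threshold, so per-layer sparsity emerges automatically.
import torch
import torch.nn as nn

def global_magnitude_masks(model, sparsity):
    """One global threshold over all prunable weights -> per-layer masks."""
    all_w = torch.cat([p.detach().abs().flatten()
                       for p in model.parameters() if p.dim() > 1])
    k = max(int(sparsity * all_w.numel()), 1)
    threshold = all_w.kthvalue(k).values
    return {name: (p.detach().abs() > threshold).float()
            for name, p in model.named_parameters() if p.dim() > 1}

def incremental_dense_to_sparse(model, stream, target_sparsity=0.9,
                                warmup=1_000, ramp=10_000, lr=0.01):
    """Train on a stream of (features, labels) batches, ramping sparsity up."""
    opt = torch.optim.Adagrad(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()              # binary click-prediction setup
    for step, (x, y) in enumerate(stream):
        # Dense warm-up, then a linear ramp up to the target sparsity.
        sparsity = target_sparsity * min(max(step - warmup, 0) / ramp, 1.0)
        opt.zero_grad()
        loss_fn(model(x).squeeze(-1), y).backward()
        opt.step()
        if sparsity > 0:
            # Recomputing the masks every step lets the sparsity pattern
            # keep adjusting as the data distribution shifts.
            masks = global_magnitude_masks(model, sparsity)
            with torch.no_grad():
                for name, p in model.named_parameters():
                    if name in masks:
                        p.mul_(masks[name])
```

In this kind of global scheme, over-parameterized layers tend to end up sparser than sensitive ones, which is one way to avoid the per-layer hand-tuning the abstract mentions for heterogeneous recommendation architectures.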
Related papers
- TCGU: Data-centric Graph Unlearning based on Transferable Condensation [36.670771080732486]
Transferable Condensation Graph Unlearning (TCGU) is a data-centric solution to zero-glance graph unlearning.
We show that TCGU achieves superior performance to existing GU methods in terms of model utility, unlearning efficiency, and unlearning efficacy.
arXiv Detail & Related papers (2024-10-09T02:14:40Z)
- Differential Privacy Regularization: Protecting Training Data Through Loss Function Regularization [49.1574468325115]
Training machine learning models based on neural networks requires large datasets, which may contain sensitive information.
Differentially private SGD (DP-SGD) requires modifying the standard stochastic gradient descent (SGD) algorithm for training new models.
A novel regularization strategy is proposed to achieve the same goal in a more efficient manner.
arXiv Detail & Related papers (2024-09-25T17:59:32Z)
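
For context, the DP-SGD modification contrasted here is per-example gradient clipping followed by calibrated Gaussian noise. The sketch below shows that step in isolation; the clip norm, noise multiplier, learning rate, and omission of privacy-budget accounting are illustrative simplifications.

```python
# Sketch of one DP-SGD step: per-example gradient clipping + Gaussian noise.
# Hyperparameters are illustrative and privacy-budget accounting is omitted.
import torch

def dpsgd_step(model, loss_fn, xb, yb, lr=0.1, clip_norm=1.0, noise_mult=1.0):
    accum = [torch.zeros_like(p) for p in model.parameters()]
    for x, y in zip(xb, yb):                      # per-example gradients
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        norm = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))
        scale = torch.clamp(clip_norm / (norm + 1e-12), max=1.0)
        for a, p in zip(accum, model.parameters()):
            a += p.grad * scale                   # clip each example's gradient
    with torch.no_grad():
        for a, p in zip(accum, model.parameters()):
            a += torch.randn_like(a) * noise_mult * clip_norm  # calibrated noise
            p -= lr * a / len(xb)                 # average and take the SGD step
```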
- DRoP: Distributionally Robust Pruning [11.930434318557156]
We conduct the first systematic study of the impact of data pruning on classification bias of trained models.
We propose DRoP, a distributionally robust approach to pruning and empirically demonstrate its performance on standard computer vision benchmarks.
arXiv Detail & Related papers (2024-04-08T14:55:35Z)
- Kalman Filter for Online Classification of Non-Stationary Data [101.26838049872651]
In Online Continual Learning (OCL), a learning system receives a stream of data and sequentially performs prediction and training steps.
We introduce a probabilistic Bayesian online learning model by using a neural representation and a state space model over the linear predictor weights.
In experiments on multi-class classification, we demonstrate the predictive ability of the model and its flexibility to capture non-stationarity.
arXiv Detail & Related papers (2023-06-14T11:41:42Z)
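
A minimal sketch of the state-space idea, assuming a Kalman filter over drifting linear predictor weights on top of fixed features and a Gaussian (regression-style) observation model for simplicity; the paper targets classification, which requires further approximations.

```python
# Sketch: Kalman filter over drifting linear predictor weights w_t applied to
# fixed features phi(x). The Gaussian observation model is a simplification.
import numpy as np

class KalmanLinearPredictor:
    def __init__(self, dim, drift_var=1e-3, obs_var=0.1):
        self.mu = np.zeros(dim)           # posterior mean of the weights
        self.P = np.eye(dim)              # posterior covariance of the weights
        self.Q = drift_var * np.eye(dim)  # random-walk transition noise
        self.R = obs_var                  # observation noise variance

    def predict(self, phi):
        # Predict step: weights drift, so uncertainty grows (non-stationarity).
        self.P = self.P + self.Q
        return float(phi @ self.mu)

    def update(self, phi, y):
        # Update step for a scalar observation y = phi . w + noise.
        S = float(phi @ self.P @ phi) + self.R   # innovation variance
        K = self.P @ phi / S                     # Kalman gain
        self.mu = self.mu + K * (y - phi @ self.mu)
        self.P = self.P - np.outer(K, phi @ self.P)
```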
- CLIP: Train Faster with Less Data [3.2575001434344286]
Deep learning models require an enormous amount of data for training.
Recently, there has been a shift in machine learning from model-centric to data-centric approaches.
We propose CLIP, i.e., Curriculum Learning with Iterative data Pruning.
arXiv Detail & Related papers (2022-12-02T21:29:48Z)
- Learning to Optimize Permutation Flow Shop Scheduling via Graph-based Imitation Learning [70.65666982566655]
Permutation flow shop scheduling (PFSS) is widely used in manufacturing systems.
We propose to train the model via expert-driven imitation learning, which accelerates convergence and makes it more stable and accurate.
Our model's network parameters are reduced to only 37% of the baseline's, and the average gap between our solutions and the expert solutions decreases from 6.8% to 1.3%.
arXiv Detail & Related papers (2022-10-31T09:46:26Z)
- Contextual Squeeze-and-Excitation for Efficient Few-Shot Image Classification [57.36281142038042]
We present a new adaptive block called Contextual Squeeze-and-Excitation (CaSE) that adjusts a pretrained neural network on a new task to significantly improve performance.
We also present a new training protocol based on Coordinate-Descent called UpperCaSE that exploits meta-trained CaSE blocks and fine-tuning routines for efficient adaptation.
arXiv Detail & Related papers (2022-06-20T15:25:08Z)
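
For reference, the sketch below shows the plain squeeze-and-excitation pattern that CaSE builds on: pool spatially, run a small bottleneck MLP, and rescale channels with the resulting gates. CaSE derives its gates from a task's context set rather than from the single input, so this is only the baseline pattern, not the CaSE block itself.

```python
# Sketch of a plain squeeze-and-excitation gate (the pattern CaSE extends).
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                          # per-channel gates in (0, 1)
        )

    def forward(self, x):                          # x: (batch, channels, H, W)
        s = x.mean(dim=(2, 3))                     # "squeeze": global average pool
        g = self.mlp(s)                            # "excitation": channel gates
        return x * g.unsqueeze(-1).unsqueeze(-1)   # rescale the feature maps
```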
- How Well Do Sparse Imagenet Models Transfer? [75.98123173154605]
Transfer learning is a classic paradigm by which models pretrained on large "upstream" datasets are adapted to yield good results on "downstream" datasets.
In this work, we perform an in-depth investigation of this phenomenon in the context of convolutional neural networks (CNNs) trained on the ImageNet dataset.
We show that sparse models can match or even outperform the transfer performance of dense models, even at high sparsities.
arXiv Detail & Related papers (2021-11-26T11:58:51Z)
- Incremental Learning for Personalized Recommender Systems [8.020546404087922]
We present an incremental learning solution that provides both training efficiency and model quality.
The solution is deployed at LinkedIn and is directly applicable to industrial-scale recommender systems.
arXiv Detail & Related papers (2021-08-13T04:21:21Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the list (including all information) and is not responsible for any consequences.