A Generic Network Compression Framework for Sequential Recommender
Systems
- URL: http://arxiv.org/abs/2004.13139v5
- Date: Tue, 26 May 2020 06:25:41 GMT
- Title: A Generic Network Compression Framework for Sequential Recommender
Systems
- Authors: Yang Sun, Fajie Yuan, Min Yang, Guoao Wei, Zhou Zhao, and Duo Liu
- Abstract summary: Sequential recommender systems (SRS) have become the key technology in capturing user's dynamic interests and generating high-quality recommendations.
We propose a compressed sequential recommendation framework, termed as CpRec, where two generic model shrinking techniques are employed.
By the extensive ablation studies, we demonstrate that the proposed CpRec can achieve up to 4$\sim$8 times compression rates in real-world SRS datasets.
- Score: 71.81962915192022
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequential recommender systems (SRS) have become the key technology in
capturing user's dynamic interests and generating high-quality recommendations.
Current state-of-the-art sequential recommender models are typically based on a
sandwich-structured deep neural network, where one or more middle (hidden)
layers are placed between the input embedding layer and output softmax layer.
In general, these models require a large number of parameters (such as using a
large embedding dimension or a deep network architecture) to obtain their
optimal performance. Despite the effectiveness, at some point, further
increasing model size may be harder for model deployment in resource-constrained
devices, resulting in longer response time and larger memory footprint. To
resolve the issues, we propose a compressed sequential recommendation
framework, termed as CpRec, where two generic model shrinking techniques are
employed. Specifically, we first propose a block-wise adaptive decomposition to
approximate the input and softmax matrices by exploiting the fact that items in
SRS obey a long-tailed distribution. To reduce the parameters of the middle
layers, we introduce three layer-wise parameter sharing schemes. We instantiate
CpRec using deep convolutional neural network with dilated kernels given
consideration to both recommendation accuracy and efficiency. By the extensive
ablation studies, we demonstrate that the proposed CpRec can achieve up to
4$\sim$8 times compression rates in real-world SRS datasets. Meanwhile, CpRec
is faster during training/inference, and in most cases outperforms its
uncompressed counterpart.
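
As a rough illustration of the block-wise adaptive decomposition idea in the abstract, the Python sketch below stores frequent items at full embedding width and rarer items at a smaller width, with a projection back to the full width. It is a minimal sketch under stated assumptions, not the authors' implementation: the class name `BlockwiseEmbedding`, the block sizes, the `shrink` factor, and the rule that block j uses dimension `dim // shrink**j` are all hypothetical choices made for this example. Item ids are assumed to be sorted by popularity, so id 0 is the most frequent item.

```python
import numpy as np


class BlockwiseEmbedding:
    """Hypothetical block-wise embedding table for a long-tailed item catalogue."""

    def __init__(self, block_sizes, dim, shrink=2, seed=0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        # offsets[j] is the first item id that belongs to block j.
        self.offsets = np.concatenate(([0], np.cumsum(block_sizes)))
        self.tables = []
        self.projections = []
        for j, n_items in enumerate(block_sizes):
            # Assumption: each later (rarer) block gets a narrower embedding.
            d_j = max(1, dim // shrink**j)
            self.tables.append(rng.normal(scale=0.1, size=(n_items, d_j)))
            # Blocks narrower than dim need a projection back to dim.
            proj = None if d_j == dim else rng.normal(scale=0.1, size=(d_j, dim))
            self.projections.append(proj)

    def lookup(self, item_ids):
        """Return one dim-sized vector per item id (1-D sequence of ids)."""
        item_ids = np.asarray(item_ids)
        out = np.empty((item_ids.size, self.dim))
        # Map every id to the index of the block that contains it.
        blocks = np.searchsorted(self.offsets, item_ids, side="right") - 1
        for j, (table, proj) in enumerate(zip(self.tables, self.projections)):
            mask = blocks == j
            if not mask.any():
                continue
            vecs = table[item_ids[mask] - self.offsets[j]]
            out[mask] = vecs if proj is None else vecs @ proj
        return out

    def num_parameters(self):
        """Count table entries plus projection entries."""
        total = sum(t.size for t in self.tables)
        total += sum(p.size for p in self.projections if p is not None)
        return total


# Example: 100,000 items split into head, middle, and tail blocks.
emb = BlockwiseEmbedding(block_sizes=[1000, 9000, 90000], dim=64, shrink=4)
vectors = emb.lookup([0, 1500, 99999])
full_table_parameters = 100_000 * 64
print(vectors.shape, full_table_parameters / emb.num_parameters())
```

The printed ratio compares a single full-width table against the block-wise layout for these made-up sizes only; it says nothing about the compression rates reported in the paper, which also depend on the softmax layer and the shared middle layers.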
Related papers
- Adaptable Embeddings Network (AEN) [49.1574468325115]
We introduce Adaptable Embeddings Networks (AEN), a novel dual-encoder architecture using Kernel Density Estimation (KDE).
AEN allows for runtime adaptation of classification criteria without retraining and is non-autoregressive.
The architecture's ability to preprocess and cache condition embeddings makes it ideal for edge computing applications and real-time monitoring systems.
arXiv Detail & Related papers (2024-11-21T02:15:52Z)
- Retraining-free Model Quantization via One-Shot Weight-Coupling Learning [41.299675080384]
Mixed-precision quantization (MPQ) is advocated to compress the model effectively by allocating heterogeneous bit-width for layers.
MPQ is typically organized into a searching-retraining two-stage process.
In this paper, we devise a one-shot training-searching paradigm for mixed-precision model compression.
arXiv Detail & Related papers (2024-01-03T05:26:57Z)
- Stochastic Configuration Machines: FPGA Implementation [4.57421617811378]
Stochastic configuration networks (SCNs) are a prime choice in industrial applications due to their merits and feasibility for data modelling.
This paper aims to implement SCM models on a field programmable gate array (FPGA) and introduce binary-coded inputs to improve learning performance.
arXiv Detail & Related papers (2023-10-30T02:04:20Z)
- Dynamic Embedding Size Search with Minimum Regret for Streaming Recommender System [39.78277554870799]
We show that setting an identical and static embedding size is sub-optimal in terms of recommendation performance and memory cost.
We propose a method to minimize the embedding size selection regret on both user and item sides in a non-stationary manner.
arXiv Detail & Related papers (2023-08-15T13:27:18Z)
- Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on matrix product operator (MPO).
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers for reducing the model size.
arXiv Detail & Related papers (2023-03-27T02:34:09Z)
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly initialized network at each iteration and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
- Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise and pointwise convolution) in efficient architectures.
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
arXiv Detail & Related papers (2022-10-23T18:37:22Z)
- Sparsity-guided Network Design for Frame Interpolation [39.828644638174225]
We present a compression-driven network design for frame-based algorithms.
We leverage model pruning through sparsity-inducing optimization to greatly reduce the model size.
We achieve a considerable performance gain with a quarter of the size of the original AdaCoF.
arXiv Detail & Related papers (2022-09-09T23:13:25Z)
- Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially for resource limited devices.
Previous unstructured or structured weight pruning methods can hardly truly accelerate inference.
We propose a generalized weight unification framework at a hardware compatible micro-structured level to achieve high amount of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
- A Variational Information Bottleneck Based Method to Compress Sequential Networks for Human Action Recognition [9.414818018857316]
We propose a method to effectively compress Recurrent Neural Networks (RNNs) used for Human Action Recognition (HAR).
We use a Variational Information Bottleneck (VIB) theory-based pruning approach to limit the information flow through the sequential cells of RNNs to a small subset.
We combine our pruning method with a specific group-lasso regularization technique that significantly improves compression.
It is shown that our method achieves over 70 times greater compression than the nearest competitor with comparable accuracy for the task of action recognition on UCF11.
arXiv Detail & Related papers (2020-10-03T12:41:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.