Learnable Embedding Sizes for Recommender Systems
- URL: http://arxiv.org/abs/2101.07577v2
- Date: Thu, 11 Mar 2021 10:38:59 GMT
- Title: Learnable Embedding Sizes for Recommender Systems
- Authors: Siyi Liu, Chen Gao, Yihong Chen, Depeng Jin, Yong Li
- Abstract summary: We propose PEP (short for Plug-in Embedding Pruning) to reduce the size of the embedding table while avoiding a drop in recommendation accuracy.
PEP achieves strong recommendation performance while reducing parameters by 97-99%.
PEP brings only an additional 20-30% time cost compared with base models.
- Score: 34.98757041815557
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embedding-based representation learning is commonly used in deep learning recommendation models to map raw sparse features to dense vectors. The traditional embedding approach, which assigns a uniform size to all features, has two issues. First, the huge number of features inevitably leads to a gigantic embedding table with a high memory cost. Second, it is likely to cause over-fitting for features that do not require large representation capacity. Existing works that try to address the problem either cause a significant drop in recommendation performance or suffer from prohibitive training time costs. In this paper, we propose a novel approach, named PEP (short for Plug-in Embedding Pruning), to reduce the size of the embedding table while avoiding a drop in recommendation accuracy. PEP prunes embedding parameters, with the pruning threshold(s) adaptively learned from data. We can therefore automatically obtain a mixed-dimension embedding scheme by pruning redundant parameters for each feature. PEP is a general framework that can be plugged into various base recommendation models. Extensive experiments demonstrate that it efficiently cuts down embedding parameters and boosts the base model's performance. Specifically, it achieves strong recommendation performance while reducing parameters by 97-99%. As for computation cost, PEP brings only an additional 20-30% time cost compared with base models. Code is available at https://github.com/ssui-liu/learnable-embed-sizes-for-RecSys.
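The mechanism suggested by the abstract can be illustrated with a short PyTorch sketch: each embedding entry is soft-thresholded by a learnable value, so training itself decides how many dimensions each feature keeps. This is a minimal reading under assumptions (per-feature threshold granularity, sigmoid reparameterization, initialization), not the authors' exact implementation; see the linked repository for that.

```python
# Minimal sketch of learnable-threshold embedding pruning in the spirit of PEP.
# The per-feature threshold granularity and its initialization are assumptions.
import torch
import torch.nn as nn

class PrunedEmbedding(nn.Module):
    def __init__(self, num_features: int, dim: int, init_s: float = -15.0):
        super().__init__()
        self.emb = nn.Embedding(num_features, dim)
        # One learnable threshold per feature; sigmoid keeps it in (0, 1).
        self.s = nn.Parameter(torch.full((num_features, 1), init_s))

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        v = self.emb(ids)                 # (batch, dim)
        g = torch.sigmoid(self.s[ids])    # (batch, 1) soft threshold
        # Soft thresholding: entries with |v| <= g become exactly zero, the
        # rest shrink toward zero; gradients flow to both v and s.
        return torch.sign(v) * torch.relu(v.abs() - g)

emb = PrunedEmbedding(num_features=1000, dim=16)
out = emb(torch.tensor([1, 2, 3]))
print((out == 0).float().mean())          # fraction of exactly-zero entries
```

Entries driven to zero can then be stored sparsely, which is where the mixed-dimension embedding scheme and the 97-99% parameter reduction come from.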
Related papers
- Sparser Training for On-Device Recommendation Systems [50.74019319100728]
We propose SparseRec, a lightweight embedding method based on Dynamic Sparse Training (DST).
It avoids dense gradients during backpropagation by sampling a subset of important vectors.
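As a purely hypothetical illustration of that idea (the summary does not say how the important vectors are selected, so the sampling rule below is invented):

```python
# Hypothetical sketch of sparse gradients for embeddings: only a sampled
# subset of the looked-up rows receives gradient. The random sampling rule
# is a stand-in for SparseRec's actual importance criterion.
import torch
import torch.nn as nn

emb = nn.Embedding(10_000, 32)
ids = torch.randint(0, 10_000, (256,))

keep = torch.rand(ids.shape[0]) < 0.5     # stand-in importance sampling
vecs = emb(ids)
# Detach non-sampled rows so backprop only updates the sampled subset.
vecs = torch.where(keep.unsqueeze(1), vecs, vecs.detach())

vecs.sum().backward()
# Only rows of emb.weight hit by sampled ids receive non-zero gradient.
```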
arXiv Detail & Related papers (2024-11-19T03:48:48Z)
- Expanding Sparse Tuning for Low Memory Usage [103.43560327427647]
We propose a method named SNELL (Sparse tuning with kerNELized LoRA) for sparse tuning with low memory usage.
To achieve low memory usage, SNELL decomposes the tunable matrix for sparsification into two learnable low-rank matrices.
A competition-based sparsification mechanism is further proposed to avoid the storage of tunable weight indexes.
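A hedged sketch of the two mechanisms named above: the tunable update is stored as two low-rank factors, and sparsity comes from an on-the-fly top-k magnitude "competition", so no index mask is ever stored. The kernelization that gives SNELL its name is omitted, and all shapes and the density level are assumptions.

```python
# Sketch: low-rank factors + competition-based (top-k magnitude) sparsity.
import torch
import torch.nn as nn

d_out, d_in, rank, density = 256, 256, 8, 0.05
A = nn.Parameter(torch.randn(d_out, rank) * 0.01)   # learnable factor
B = nn.Parameter(torch.randn(rank, d_in) * 0.01)    # learnable factor

def sparse_update() -> torch.Tensor:
    delta = A @ B                          # dense low-rank update
    k = int(density * delta.numel())
    # Competition: only the k largest-magnitude entries survive; the mask
    # is recomputed from delta on each call, so no indexes are stored.
    thresh = delta.abs().flatten().kthvalue(delta.numel() - k).values
    return delta * (delta.abs() > thresh)

W_pretrained = torch.randn(d_out, d_in)    # frozen backbone weight
W_effective = W_pretrained + sparse_update()
```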
arXiv Detail & Related papers (2024-11-04T04:58:20Z)
- Scalable Dynamic Embedding Size Search for Streaming Recommendation [54.28404337601801]
Real-world recommender systems often operate in streaming recommendation scenarios.
The number of users and items continues to grow, leading to substantial storage resource consumption.
We propose SCALL, a method that learns lightweight embeddings for streaming recommendation and can adaptively adjust the embedding sizes of users and items.
arXiv Detail & Related papers (2024-07-22T06:37:24Z)
- CRISP: Hybrid Structured Sparsity for Class-aware Model Pruning [4.775684973625185]
Machine learning pipelines often train a universal model to achieve accuracy across a broad range of classes, yet individual users typically rely on only a subset of them.
This disparity provides an opportunity to enhance computational efficiency by tailoring models to focus on user-specific classes.
We propose CRISP, a novel pruning framework that combines fine-grained N:M structured sparsity and coarse-grained block sparsity.
Our pruning strategy is guided by a gradient-based class-aware saliency score, allowing us to retain weights crucial for user-specific classes.
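The fine-grained half of this scheme, N:M structured sparsity, is easy to illustrate: in every group of M consecutive weights, only the N largest entries are kept. The toy mask below ranks by magnitude rather than CRISP's class-aware saliency score, and omits the coarse block sparsity.

```python
# Toy N:M structured sparsity (here 2:4): in each group of 4 consecutive
# weights, keep the 2 largest-magnitude entries and zero the rest.
import torch

def nm_prune(w: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    groups = w.reshape(-1, m)                     # groups of m weights
    idx = groups.abs().topk(n, dim=1).indices     # winners in each group
    mask = torch.zeros_like(groups)
    mask.scatter_(1, idx, 1.0)                    # 1 for kept entries
    return (groups * mask).reshape(w.shape)

w = torch.randn(8, 8)
print(nm_prune(w))   # exactly 2 nonzeros in every group of 4
```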
arXiv Detail & Related papers (2023-11-24T04:16:32Z)
- Frustratingly Simple Memory Efficiency for Pre-trained Language Models via Dynamic Embedding Pruning [42.652021176354644]
The memory footprint of pre-trained language models (PLMs) can hinder deployment in memory-constrained settings; in downstream tasks, a significant proportion of the vocabulary often goes unused.
We propose a simple yet effective approach that leverages this finding to minimize the memory footprint of the embedding matrix.
We show that this approach provides substantial reductions in memory usage across a wide range of models and tasks.
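A rough sketch of that observation: a downstream corpus touches only a slice of the vocabulary, so unused embedding rows can be dropped and token ids remapped. The remapping scheme below is an assumption for illustration, not the paper's procedure.

```python
# Drop embedding rows for vocabulary items never seen in the task corpus.
import torch
import torch.nn as nn

vocab_size, dim = 50_000, 768
full_emb = nn.Embedding(vocab_size, dim)           # pretrained table

corpus_ids = torch.randint(0, 2_000, (10_000,))    # ids seen in task data
used = torch.unique(corpus_ids)                    # surviving rows

# Keep only the used rows; build an old-id -> new-id lookup for inference.
small_emb = nn.Embedding.from_pretrained(full_emb.weight[used].clone())
remap = torch.full((vocab_size,), -1, dtype=torch.long)
remap[used] = torch.arange(used.numel())

vectors = small_emb(remap[corpus_ids[:8]])         # same vectors, smaller table
```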
arXiv Detail & Related papers (2023-09-15T19:00:00Z)
- Parameter-Efficient Sparsity for Large Language Models Fine-Tuning [63.321205487234074]
We propose a Parameter-efficient Sparse Training (PST) method to reduce the number of trainable parameters during sparse-aware training.
Experiments with diverse networks (i.e., BERT, RoBERTa and GPT-2) demonstrate PST performs on par or better than previous sparsity methods.
arXiv Detail & Related papers (2022-05-23T02:43:45Z)
- Binary Code based Hash Embedding for Web-scale Applications [12.851057275052506]
Deep learning models are widely adopted in web-scale applications such as recommender systems and online advertising.
In these applications, embedding learning of categorical features is crucial to the success of deep learning models.
We propose a binary code based hash embedding method that allows the size of the embedding table to be reduced at arbitrary scale without compromising much performance.
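One plausible reading of such a scheme, with block size and aggregation chosen for illustration: the id is hashed to a fixed-width binary code, the code is split into blocks, each block indexes a small shared table, and the per-block embeddings are summed, so table size no longer scales with the number of ids.

```python
# Sketch of a binary-code hash embedding; the trivial hash and the summing
# aggregation are assumptions for illustration.
import torch
import torch.nn as nn

BITS, BLOCK = 24, 8                       # 24-bit code, cut into 8-bit blocks
NUM_BLOCKS = BITS // BLOCK

class BinaryCodeHashEmbedding(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # One small table per block position: 2**BLOCK rows instead of |ids|.
        self.tables = nn.ModuleList(
            nn.Embedding(2 ** BLOCK, dim) for _ in range(NUM_BLOCKS)
        )

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        code = ids % (2 ** BITS)          # stand-in for a real hash function
        out = 0
        for b, table in enumerate(self.tables):
            block = (code >> (b * BLOCK)) & (2 ** BLOCK - 1)
            out = out + table(block)      # sum the per-block embeddings
        return out

emb = BinaryCodeHashEmbedding(dim=16)
print(emb(torch.tensor([12345, 67890])).shape)   # torch.Size([2, 16])
```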
arXiv Detail & Related papers (2021-08-24T11:51:15Z)
- Learning Effective and Efficient Embedding via an Adaptively-Masked Twins-based Layer [15.403616481651383]
We propose an Adaptively-Masked Twins-based Layer (AMTL) placed after the standard embedding layer.
AMTL generates a mask vector to mask the undesired dimensions for each embedding vector.
The mask vector brings flexibility in selecting the dimensions and the proposed layer can be easily added to either untrained or trained DLRMs.
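A loose sketch of the masking idea: a small network placed after the standard embedding layer predicts a per-embedding soft mask over dimensions. The "twins" training trick is not modeled here, and the mask network architecture is an assumption.

```python
# Sketch: a learned soft mask over embedding dimensions, applied after the
# standard embedding layer.
import torch
import torch.nn as nn

class AdaptiveMaskLayer(nn.Module):
    def __init__(self, dim: int, hidden: int = 32):
        super().__init__()
        self.mask_net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, dim), nn.Sigmoid(),  # per-dimension soft mask
        )

    def forward(self, e: torch.Tensor) -> torch.Tensor:
        return e * self.mask_net(e)                # mask undesired dimensions

emb = nn.Embedding(1000, 16)
mask_layer = AdaptiveMaskLayer(16)                 # plugs in after embedding
out = mask_layer(emb(torch.tensor([4, 2])))        # (2, 16) masked vectors
```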
arXiv Detail & Related papers (2021-08-24T11:50:49Z)
- A Generic Network Compression Framework for Sequential Recommender Systems [71.81962915192022]
Sequential recommender systems (SRS) have become a key technology for capturing users' dynamic interests and generating high-quality recommendations.
We propose a compressed sequential recommendation framework, termed CpRec, in which two generic model shrinking techniques are employed.
Through extensive ablation studies, we demonstrate that CpRec achieves up to 4-8 times compression rates on real-world SRS datasets.
arXiv Detail & Related papers (2020-04-21T08:40:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.