Learnable Embedding Sizes for Recommender Systems
- URL: http://arxiv.org/abs/2101.07577v2
- Date: Thu, 11 Mar 2021 10:38:59 GMT
- Title: Learnable Embedding Sizes for Recommender Systems
- Authors: Siyi Liu, Chen Gao, Yihong Chen, Depeng Jin, Yong Li
- Abstract summary: We propose PEP (short for Plug-in Embedding Pruning) to reduce the size of the embedding table while avoiding a drop in recommendation accuracy.
PEP achieves strong recommendation performance while reducing parameters by 97-99%.
PEP brings only an additional 20-30% time cost compared with base models.
- Score: 34.98757041815557
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embedding-based representation learning is commonly used in deep learning recommendation models to map raw sparse features to dense vectors. The traditional embedding approach, which assigns a uniform size to all features, has two issues. First, the huge number of features inevitably leads to a gigantic embedding table with a high memory cost. Second, it is likely to cause over-fitting for features that do not require large representation capacity. Existing works that try to address the problem either cause a significant drop in recommendation performance or suffer from prohibitive training time costs. In this paper, we propose a novel approach, named PEP (short for Plug-in Embedding Pruning), to reduce the size of the embedding table while avoiding a drop in recommendation accuracy. PEP prunes embedding parameters, with the pruning threshold(s) adaptively learned from data. We can therefore automatically obtain a mixed-dimension embedding scheme by pruning redundant parameters for each feature. PEP is a general framework that can be plugged into various base recommendation models. Extensive experiments demonstrate that it efficiently cuts down embedding parameters and boosts the base model's performance. Specifically, it achieves strong recommendation performance while reducing parameters by 97-99%. As for computation cost, PEP brings only an additional 20-30% time cost compared with base models. Code is available at https://github.com/ssui-liu/learnable-embed-sizes-for-RecSys.
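The mechanism suggested by the abstract can be illustrated with a short PyTorch sketch: each embedding entry is soft-thresholded by a learnable value, so training itself decides how many dimensions each feature keeps. This is a minimal reading under assumptions (per-feature threshold granularity, sigmoid reparameterization, initialization), not the authors' exact implementation; see the linked repository for that.

```python
# Minimal sketch of learnable-threshold embedding pruning in the spirit of PEP.
# The per-feature threshold granularity and its initialization are assumptions.
import torch
import torch.nn as nn

class PrunedEmbedding(nn.Module):
    def __init__(self, num_features: int, dim: int, init_s: float = -15.0):
        super().__init__()
        self.emb = nn.Embedding(num_features, dim)
        # One learnable threshold per feature; sigmoid keeps it in (0, 1).
        self.s = nn.Parameter(torch.full((num_features, 1), init_s))

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        v = self.emb(ids)                 # (batch, dim)
        g = torch.sigmoid(self.s[ids])    # (batch, 1) soft threshold
        # Soft thresholding: entries with |v| <= g become exactly zero, the
        # rest shrink toward zero; gradients flow to both v and s.
        return torch.sign(v) * torch.relu(v.abs() - g)

emb = PrunedEmbedding(num_features=1000, dim=16)
out = emb(torch.tensor([1, 2, 3]))
print((out == 0).float().mean())          # fraction of exactly-zero entries
```

Entries driven to zero can then be stored sparsely, which is where the mixed-dimension embedding scheme and the 97-99% parameter reduction come from.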
Related papers
- Sparser Training for On-Device Recommendation Systems [50.74019319100728]
We propose SparseRec, a lightweight embedding method based on Dynamic Sparse Training (DST).
It avoids dense gradients during backpropagation by sampling a subset of important vectors.
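As a purely hypothetical illustration of that idea (the summary does not say how the important vectors are selected, so the sampling rule below is invented):

```python
# Hypothetical sketch of sparse gradients for embeddings: only a sampled
# subset of the looked-up rows receives gradient. The random sampling rule
# is a stand-in for SparseRec's actual importance criterion.
import torch
import torch.nn as nn

emb = nn.Embedding(10_000, 32)
ids = torch.randint(0, 10_000, (256,))

keep = torch.rand(ids.shape[0]) < 0.5     # stand-in importance sampling
vecs = emb(ids)
# Detach non-sampled rows so backprop only updates the sampled subset.
vecs = torch.where(keep.unsqueeze(1), vecs, vecs.detach())

vecs.sum().backward()
# Only rows of emb.weight hit by sampled ids receive non-zero gradient.
```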
arXiv Detail & Related papers (2024-11-19T03:48:48Z)
- Expanding Sparse Tuning for Low Memory Usage [103.43560327427647]
We propose a method named SNELL (Sparse tuning with kerNELized LoRA) for sparse tuning with low memory usage.
To achieve low memory usage, SNELL decomposes the tunable matrix for sparsification into two learnable low-rank matrices.
A competition-based sparsification mechanism is further proposed to avoid the storage of tunable weight indexes.
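A hedged sketch of the two mechanisms named above: the tunable update is stored as two low-rank factors, and sparsity comes from an on-the-fly top-k magnitude "competition", so no index mask is ever stored. The kernelization that gives SNELL its name is omitted, and all shapes and the density level are assumptions.

```python
# Sketch: low-rank factors + competition-based (top-k magnitude) sparsity.
import torch
import torch.nn as nn

d_out, d_in, rank, density = 256, 256, 8, 0.05
A = nn.Parameter(torch.randn(d_out, rank) * 0.01)   # learnable factor
B = nn.Parameter(torch.randn(rank, d_in) * 0.01)    # learnable factor

def sparse_update() -> torch.Tensor:
    delta = A @ B                          # dense low-rank update
    k = int(density * delta.numel())
    # Competition: only the k largest-magnitude entries survive; the mask
    # is recomputed from delta on each call, so no indexes are stored.
    thresh = delta.abs().flatten().kthvalue(delta.numel() - k).values
    return delta * (delta.abs() > thresh)

W_pretrained = torch.randn(d_out, d_in)    # frozen backbone weight
W_effective = W_pretrained + sparse_update()
```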
arXiv Detail & Related papers (2024-11-04T04:58:20Z)
- Scalable Dynamic Embedding Size Search for Streaming Recommendation [54.28404337601801]
Real-world recommender systems often operate in streaming recommendation scenarios.
The number of users and items continues to grow, leading to substantial storage resource consumption.
We propose SCALL, a method that learns lightweight embeddings for streaming recommendation and can adaptively adjust the embedding sizes of users and items.
arXiv Detail & Related papers (2024-07-22T06:37:24Z)
- CRISP: Hybrid Structured Sparsity for Class-aware Model Pruning [4.775684973625185]
Machine learning pipelines often train a universal model to achieve accuracy across a broad range of classes, yet individual users typically rely on only a subset of them.
This disparity provides an opportunity to enhance computational efficiency by tailoring models to focus on user-specific classes.
We propose CRISP, a novel pruning framework that combines fine-grained N:M structured sparsity and coarse-grained block sparsity.
Our pruning strategy is guided by a gradient-based class-aware saliency score, allowing us to retain weights crucial for user-specific classes.
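The fine-grained half of this scheme, N:M structured sparsity, is easy to illustrate: in every group of M consecutive weights, only the N largest entries are kept. The toy mask below ranks by magnitude rather than CRISP's class-aware saliency score, and omits the coarse block sparsity.

```python
# Toy N:M structured sparsity (here 2:4): in each group of 4 consecutive
# weights, keep the 2 largest-magnitude entries and zero the rest.
import torch

def nm_prune(w: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    groups = w.reshape(-1, m)                     # groups of m weights
    idx = groups.abs().topk(n, dim=1).indices     # winners in each group
    mask = torch.zeros_like(groups)
    mask.scatter_(1, idx, 1.0)                    # 1 for kept entries
    return (groups * mask).reshape(w.shape)

w = torch.randn(8, 8)
print(nm_prune(w))   # exactly 2 nonzeros in every group of 4
```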
arXiv Detail & Related papers (2023-11-24T04:16:32Z)
- Frustratingly Simple Memory Efficiency for Pre-trained Language Models via Dynamic Embedding Pruning [42.652021176354644]
The memory footprint of pre-trained language models (PLMs) can hinder deployment in memory-constrained settings; in downstream tasks, a significant proportion of the vocabulary often goes unused.
We propose a simple yet effective approach that leverages this finding to minimize the memory footprint of the embedding matrix.
We show that this approach provides substantial reductions in memory usage across a wide range of models and tasks.
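A rough sketch of that observation: a downstream corpus touches only a slice of the vocabulary, so unused embedding rows can be dropped and token ids remapped. The remapping scheme below is an assumption for illustration, not the paper's procedure.

```python
# Drop embedding rows for vocabulary items never seen in the task corpus.
import torch
import torch.nn as nn

vocab_size, dim = 50_000, 768
full_emb = nn.Embedding(vocab_size, dim)           # pretrained table

corpus_ids = torch.randint(0, 2_000, (10_000,))    # ids seen in task data
used = torch.unique(corpus_ids)                    # surviving rows

# Keep only the used rows; build an old-id -> new-id lookup for inference.
small_emb = nn.Embedding.from_pretrained(full_emb.weight[used].clone())
remap = torch.full((vocab_size,), -1, dtype=torch.long)
remap[used] = torch.arange(used.numel())

vectors = small_emb(remap[corpus_ids[:8]])         # same vectors, smaller table
```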
arXiv Detail & Related papers (2023-09-15T19:00:00Z)
- Parameter-Efficient Sparsity for Large Language Models Fine-Tuning [63.321205487234074]
We propose a Parameter-efficient Sparse Training (PST) method to reduce the number of trainable parameters during sparse-aware training.
Experiments with diverse networks (i.e., BERT, RoBERTa and GPT-2) demonstrate PST performs on par or better than previous sparsity methods.
arXiv Detail & Related papers (2022-05-23T02:43:45Z)
- Binary Code based Hash Embedding for Web-scale Applications [12.851057275052506]
Deep learning models are widely adopted in web-scale applications such as recommender systems and online advertising.
In these applications, embedding learning of categorical features is crucial to the success of deep learning models.
We propose a binary code based hash embedding method that allows the size of the embedding table to be reduced at arbitrary scale without compromising much performance.
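One plausible reading of such a scheme, with block size and aggregation chosen for illustration: the id is hashed to a fixed-width binary code, the code is split into blocks, each block indexes a small shared table, and the per-block embeddings are summed, so table size no longer scales with the number of ids.

```python
# Sketch of a binary-code hash embedding; the trivial hash and the summing
# aggregation are assumptions for illustration.
import torch
import torch.nn as nn

BITS, BLOCK = 24, 8                       # 24-bit code, cut into 8-bit blocks
NUM_BLOCKS = BITS // BLOCK

class BinaryCodeHashEmbedding(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # One small table per block position: 2**BLOCK rows instead of |ids|.
        self.tables = nn.ModuleList(
            nn.Embedding(2 ** BLOCK, dim) for _ in range(NUM_BLOCKS)
        )

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        code = ids % (2 ** BITS)          # stand-in for a real hash function
        out = 0
        for b, table in enumerate(self.tables):
            block = (code >> (b * BLOCK)) & (2 ** BLOCK - 1)
            out = out + table(block)      # sum the per-block embeddings
        return out

emb = BinaryCodeHashEmbedding(dim=16)
print(emb(torch.tensor([12345, 67890])).shape)   # torch.Size([2, 16])
```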
arXiv Detail & Related papers (2021-08-24T11:51:15Z)
- Learning Effective and Efficient Embedding via an Adaptively-Masked Twins-based Layer [15.403616481651383]
We propose an Adaptively-Masked Twins-based Layer (AMTL) placed after the standard embedding layer.
AMTL generates a mask vector to mask the undesired dimensions for each embedding vector.
The mask vector brings flexibility in selecting the dimensions and the proposed layer can be easily added to either untrained or trained DLRMs.
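A loose sketch of the masking idea: a small network placed after the standard embedding layer predicts a per-embedding soft mask over dimensions. The "twins" training trick is not modeled here, and the mask network architecture is an assumption.

```python
# Sketch: a learned soft mask over embedding dimensions, applied after the
# standard embedding layer.
import torch
import torch.nn as nn

class AdaptiveMaskLayer(nn.Module):
    def __init__(self, dim: int, hidden: int = 32):
        super().__init__()
        self.mask_net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, dim), nn.Sigmoid(),  # per-dimension soft mask
        )

    def forward(self, e: torch.Tensor) -> torch.Tensor:
        return e * self.mask_net(e)                # mask undesired dimensions

emb = nn.Embedding(1000, 16)
mask_layer = AdaptiveMaskLayer(16)                 # plugs in after embedding
out = mask_layer(emb(torch.tensor([4, 2])))        # (2, 16) masked vectors
```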
arXiv Detail & Related papers (2021-08-24T11:50:49Z)
- A Generic Network Compression Framework for Sequential Recommender Systems [71.81962915192022]
Sequential recommender systems (SRS) have become a key technology for capturing users' dynamic interests and generating high-quality recommendations.
We propose a compressed sequential recommendation framework, termed CpRec, in which two generic model shrinking techniques are employed.
Through extensive ablation studies, we demonstrate that CpRec achieves up to 4-8 times compression rates on real-world SRS datasets.
arXiv Detail & Related papers (2020-04-21T08:40:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.