Related papers: The Future is Sparse: Embedding Compression for Scalable Retrieval in Recommender Systems

URL: http://arxiv.org/abs/2505.11388v1
Date: Fri, 16 May 2025 15:51:52 GMT
Title: The Future is Sparse: Embedding Compression for Scalable Retrieval in Recommender Systems
Authors: Petr Kasalický, Martin Spišák, Vojtěch Vančura, Daniel Bohuněk, Rodrigo Alves, Pavel Kordík
Abstract summary: We describe a lightweight, learnable embedding compression technique that projects dense embeddings into a high-dimensional, sparsely activated space. Our results demonstrate that leveraging sparsity is a promising approach for improving the efficiency of large-scale recommenders.
Score: 3.034710104407876
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Industry-scale recommender systems face a core challenge: representing entities with high cardinality, such as users or items, using dense embeddings that must be accessible during both training and inference. However, as embedding sizes grow, memory constraints make storage and access increasingly difficult. We describe a lightweight, learnable embedding compression technique that projects dense embeddings into a high-dimensional, sparsely activated space. Designed for retrieval tasks, our method reduces memory requirements while preserving retrieval performance, enabling scalable deployment under strict resource constraints. Our results demonstrate that leveraging sparsity is a promising approach for improving the efficiency of large-scale recommenders. We release our code at https://github.com/recombee/CompresSAE.
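To make the idea of a "high-dimensional, sparsely activated space" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the random (unlearned) projection matrix, the top-k selection rule, and all sizes are assumptions chosen for illustration only. The paper describes a learnable projection; see the linked repository for the actual method.

```python
import random


def sparse_project(dense, weights, k):
    """Project `dense` to len(weights) dimensions, keep the k largest-magnitude entries.

    Returns a list of (index, value) pairs sorted by index (a sparse vector).
    Hypothetical helper for illustration; not taken from the paper's code.
    """
    # Linear projection: one activation per row of `weights`.
    activations = [sum(w * x for w, x in zip(row, dense)) for row in weights]
    # Indices of the k activations with the largest absolute value.
    top = sorted(range(len(activations)),
                 key=lambda i: abs(activations[i]), reverse=True)[:k]
    return [(i, activations[i]) for i in sorted(top)]


def sparse_dot(a, b):
    """Dot product of two sparse vectors given as (index, value) lists."""
    b_map = dict(b)
    return sum(v * b_map[i] for i, v in a if i in b_map)


rng = random.Random(0)
dense_dim, sparse_dim, k = 16, 64, 4  # illustrative sizes, not from the paper

# Assumption: a random matrix stands in for the learned projection.
weights = [[rng.gauss(0, 1) for _ in range(dense_dim)] for _ in range(sparse_dim)]
item = [rng.gauss(0, 1) for _ in range(dense_dim)]

code = sparse_project(item, weights, k)
print(len(code), "non-zero entries out of", sparse_dim)
```

Only the k (index, value) pairs need to be stored per entity, and similarity between two entities can be computed over their shared indices with `sparse_dot`.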

Related papers

A Universal Framework for Compressing Embeddings in CTR Prediction [68.27582084015044]
We introduce a Model-agnostic Embedding Compression (MEC) framework that compresses embedding tables by quantizing pre-trained embeddings. Our approach consists of two stages: first, we apply popularity-weighted regularization to balance code distribution between high- and low-frequency features. Experiments on three datasets reveal that our method reduces memory usage by over 50x while maintaining or improving recommendation performance.
arXiv Detail & Related papers (2025-02-21T10:12:34Z)
Embedding Compression in Recommender Systems: A Survey [44.949824174769]
We introduce deep learning recommendation models and the basic concept of embedding compression in recommender systems. We systematically organize existing approaches into three categories, namely low-precision, mixed-dimension, and weight-sharing.
arXiv Detail & Related papers (2024-08-05T08:30:16Z)
Scalable Dynamic Embedding Size Search for Streaming Recommendation [54.28404337601801]
Real-world recommender systems often operate in streaming recommendation scenarios. The number of users and items continues to grow, leading to substantial storage resource consumption. We learn Lightweight Embeddings for streaming recommendation, called SCALL, which can adaptively adjust the embedding sizes of users/items.
arXiv Detail & Related papers (2024-07-22T06:37:24Z)
Anti-Collapse Loss for Deep Metric Learning Based on Coding Rate Metric [99.19559537966538]
Deep metric learning (DML) aims to learn a discriminative high-dimensional embedding space for downstream tasks like classification, clustering, and retrieval. To maintain the structure of embedding space and avoid feature collapse, we propose a novel loss function called Anti-Collapse Loss. Comprehensive experiments on benchmark datasets demonstrate that our proposed method outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2024-07-03T13:44:20Z)
A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems [67.52782366565658]
State-of-the-art recommender systems (RSs) depend on categorical features, which are encoded by embedding vectors, resulting in excessively large embedding tables. Despite the prosperity of lightweight embedding-based RSs, a wide diversity is seen in evaluation protocols. This study investigates various LERS' performance, efficiency, and cross-task transferability via a thorough benchmarking process.
arXiv Detail & Related papers (2024-06-25T07:45:00Z)
Embedding Compression for Efficient Re-Identification [0.0]
ReID algorithms aim to map new observations of an object to previously recorded instances. We benchmark quantization-aware training along with three different dimension reduction methods. We find that ReID embeddings can be compressed by up to 96x with minimal drop in performance.
arXiv Detail & Related papers (2024-05-23T15:57:11Z)
Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs [61.40047491337793]
We present Hierarchical cOntext MERging (HOMER), a new training-free scheme designed to overcome the limitations of large language models. HOMER uses a divide-and-conquer algorithm, dividing long inputs into manageable chunks. A token reduction technique precedes each merging, ensuring memory usage efficiency.
arXiv Detail & Related papers (2024-04-16T06:34:08Z)
Dynamic Embedding Size Search with Minimum Regret for Streaming Recommender System [39.78277554870799]
We show that setting an identical and static embedding size is sub-optimal in terms of recommendation performance and memory cost. We propose a method to minimize the embedding size selection regret on both user and item sides in a non-stationary manner.
arXiv Detail & Related papers (2023-08-15T13:27:18Z)
A Generic Network Compression Framework for Sequential Recommender Systems [71.81962915192022]
Sequential recommender systems (SRS) have become the key technology in capturing users' dynamic interests and generating high-quality recommendations. We propose a compressed sequential recommendation framework, termed CpRec, where two generic model shrinking techniques are employed. Through extensive ablation studies, we demonstrate that the proposed CpRec can achieve compression rates of 4 to 8 times on real-world SRS datasets.
arXiv Detail & Related papers (2020-04-21T08:40:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.