Mem-Rec: Memory Efficient Recommendation System using Alternative
Representation
- URL: http://arxiv.org/abs/2305.07205v2
- Date: Mon, 15 May 2023 01:50:51 GMT
- Title: Mem-Rec: Memory Efficient Recommendation System using Alternative
Representation
- Authors: Gopi Krishna Jha, Anthony Thomas, Nilesh Jain, Sameh Gobriel, Tajana
Rosing, Ravi Iyer
- Abstract summary: MEM-REC is a novel alternative representation approach for embedding tables.
We show that MEM-REC can not only maintain the recommendation quality but can also improve the embedding latency.
- Score: 6.542635536704625
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deep learning-based recommendation systems (e.g., DLRMs) are widely used AI
models to provide high-quality personalized recommendations. Training data used
for modern recommendation systems commonly includes categorical features taking
on tens-of-millions of possible distinct values. These categorical tokens are
typically assigned learned vector representations that are stored in large
embedding tables on the order of hundreds of gigabytes. Storing and accessing
these tables represents a substantial burden in commercial deployments. Our work
proposes MEM-REC, a novel alternative representation approach for embedding
tables. MEM-REC leverages bloom filters and hashing methods to encode
categorical features using two cache-friendly embedding tables. The first table
(token embedding) contains raw embeddings (i.e., learned vector representations),
and the second, much smaller table (weight embedding) contains weights that
scale these raw embeddings to give each data point better discriminative
capability. We provide a detailed architecture, design, and analysis of
MEM-REC, addressing its trade-offs in accuracy and computation requirements
in comparison with state-of-the-art techniques. We show that
MEM-REC can not only maintain the recommendation quality and significantly
reduce the memory footprint for commercial scale recommendation models but can
also improve the embedding latency. In particular, based on our results,
MEM-REC compresses the MLPerf CriteoTB benchmark DLRM model size by 2900x and
performs up to 3.4x faster embeddings while achieving the same AUC as that of
the full uncompressed model.
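The two-table encoding described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the table sizes, the SHA-256-based hash construction, and the combination rule (summing K hashed token-table rows, then scaling by one hashed per-token weight) are assumptions made for illustration; in MEM-REC both tables are learned during training.

```python
import hashlib

# Hypothetical placeholder sizes -- not the paper's actual dimensions.
NUM_TOKEN_ROWS = 1024    # rows in the (larger) token-embedding table
NUM_WEIGHT_ROWS = 64     # rows in the (much smaller) weight table
DIM = 4                  # embedding dimension
K = 3                    # number of hash functions (bloom-filter style)

# Toy table contents; in practice both tables hold learned parameters.
token_table = [[((r * DIM + d) % 7) * 0.1 for d in range(DIM)]
               for r in range(NUM_TOKEN_ROWS)]
weight_table = [0.5 + 0.01 * r for r in range(NUM_WEIGHT_ROWS)]

def _hash(token: str, seed: int, mod: int) -> int:
    """Deterministically hash a categorical token into [0, mod)."""
    digest = hashlib.sha256(f"{seed}:{token}".encode()).hexdigest()
    return int(digest, 16) % mod

def encode(token: str) -> list:
    """Encode a token with two cache-friendly lookups: sum K hashed rows
    of the token table, then scale by a hashed per-token weight."""
    rows = [_hash(token, seed, NUM_TOKEN_ROWS) for seed in range(K)]
    raw = [sum(token_table[r][d] for r in rows) for d in range(DIM)]
    w = weight_table[_hash(token, K, NUM_WEIGHT_ROWS)]
    return [w * x for x in raw]

vec = encode("user_12345")
print(len(vec))  # prints 4 (the embedding dimension)
```

Because every token maps into a fixed, small pair of tables, memory stays bounded regardless of vocabulary size; the per-token weight is what restores discriminative capability after the collisions introduced by sharing rows across tokens.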
Related papers
- DQRM: Deep Quantized Recommendation Models [34.73674946187648]
Large-scale recommendation models are the dominant workload for many large Internet companies.
The size of these 1TB+ tables imposes a severe memory bottleneck for the training and inference of recommendation models.
We propose a novel recommendation framework that is small, powerful, and efficient to run and train, based on the state-of-the-art Deep Learning Recommendation Model (DLRM)
arXiv Detail & Related papers (2024-10-26T02:33:52Z)
- TableRAG: Million-Token Table Understanding with Language Models [53.039560091592215]
TableRAG is a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding.
TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs.
Our results demonstrate that TableRAG achieves the highest retrieval quality, leading to new state-of-the-art performance on large-scale table understanding.
arXiv Detail & Related papers (2024-10-07T04:15:02Z)
- Scalable Dynamic Embedding Size Search for Streaming Recommendation [54.28404337601801]
Real-world recommender systems often operate in streaming recommendation scenarios.
The number of users and items continues to grow, leading to substantial storage resource consumption.
We propose SCALL, which learns lightweight embeddings for streaming recommendation and can adaptively adjust the embedding sizes of users and items.
arXiv Detail & Related papers (2024-07-22T06:37:24Z)
- CORM: Cache Optimization with Recent Message for Large Language Model Inference [57.109354287786154]
We introduce an innovative method for optimizing the KV cache, which considerably minimizes its memory footprint.
CORM, a KV cache eviction policy, dynamically retains essential key-value pairs for inference without the need for model fine-tuning.
Our validation shows that CORM reduces the inference memory usage of KV cache by up to 70% with negligible performance degradation across six tasks in LongBench.
arXiv Detail & Related papers (2024-04-24T16:11:54Z)
- Fine-Grained Embedding Dimension Optimization During Training for Recommender Systems [17.602059421895856]
FIITED is a system to automatically reduce the memory footprint via FIne-grained In-Training Embedding Dimension pruning.
We show that FIITED can reduce DLRM embedding size by more than 65% while preserving model quality.
On public datasets, FIITED can reduce the size of embedding tables by 2.1x to 800x with negligible accuracy drop.
arXiv Detail & Related papers (2024-01-09T08:04:11Z)
- Continual Referring Expression Comprehension via Dual Modular Memorization [133.46886428655426]
Referring Expression Comprehension (REC) aims to localize an image region corresponding to an object described by a natural-language expression.
Existing REC algorithms make a strong assumption that training data feeding into a model are given upfront, which degrades its practicality for real-world scenarios.
In this paper, we propose Continual Referring Expression (CREC), a new setting for REC, where a model is learning on a stream of incoming tasks.
In order to continuously improve the model on sequential tasks without forgetting previously learned knowledge and without repeatedly re-training from scratch, we propose an effective baseline method named Dual Modular Memorization.
arXiv Detail & Related papers (2023-11-25T02:58:51Z)
- Dynamic Embedding Size Search with Minimum Regret for Streaming Recommender System [39.78277554870799]
We show that setting an identical and static embedding size is sub-optimal in terms of recommendation performance and memory cost.
We propose a method to minimize the embedding size selection regret on both user and item sides in a non-stationary manner.
arXiv Detail & Related papers (2023-08-15T13:27:18Z)
- A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental Learning [56.450090618578]
Class-Incremental Learning (CIL) aims to continually learn new classes while keeping the model within a limited memory budget.
We show that, when the model size is counted in the total budget and methods are compared at aligned memory sizes, saving models does not consistently work.
We propose a simple yet effective baseline, denoted as MEMO for Memory-efficient Expandable MOdel.
arXiv Detail & Related papers (2022-05-26T08:24:01Z)
- Semantically Constrained Memory Allocation (SCMA) for Embedding in Efficient Recommendation Systems [27.419109620575313]
A key challenge for deep learning models is to work with millions of categorical classes or tokens.
We propose a novel formulation of memory shared embedding, where memory is shared in proportion to the overlap in semantic information.
We demonstrate a significant reduction in the memory footprint while maintaining performance.
arXiv Detail & Related papers (2021-02-24T19:55:49Z)
- A Generic Network Compression Framework for Sequential Recommender Systems [71.81962915192022]
Sequential recommender systems (SRS) have become the key technology in capturing user's dynamic interests and generating high-quality recommendations.
We propose a compressed sequential recommendation framework, termed as CpRec, where two generic model shrinking techniques are employed.
Through extensive ablation studies, we demonstrate that the proposed CpRec can achieve compression rates of 4x to 8x on real-world SRS datasets.
arXiv Detail & Related papers (2020-04-21T08:40:55Z)
- Rich-Item Recommendations for Rich-Users: Exploiting Dynamic and Static Side Information [20.176329366180934]
We study the problem of recommendation system where the users and items to be recommended are rich data structures with multiple entity types.
We provide a general formulation for the problem that captures the complexities of modern real-world recommendations.
We present two real-world case studies of our formulation and the MEDRES architecture.
arXiv Detail & Related papers (2020-01-28T17:53:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.