Fine-Grained Embedding Dimension Optimization During Training for
Recommender Systems
- URL: http://arxiv.org/abs/2401.04408v1
- Date: Tue, 9 Jan 2024 08:04:11 GMT
- Title: Fine-Grained Embedding Dimension Optimization During Training for
Recommender Systems
- Authors: Qinyi Luo, Penghan Wang, Wei Zhang, Fan Lai, Jiachen Mao, Xiaohan Wei,
Jun Song, Wei-Yu Tsai, Shuai Yang, Yuxi Hu and Xuehai Qian
- Abstract summary: This paper proposes FIne-grained In-Training Embedding Dimension optimization (FIITED), which adjusts the dimension of each individual embedding vector continuously during training.
Experiments on two industry models show that FIITED is able to reduce the size of embeddings by more than 65% while maintaining the trained model's quality.
- Score: 18.125952266473533
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Huge embedding tables in modern Deep Learning Recommender Models (DLRM)
require prohibitively large memory during training and inference. Aiming to
reduce the memory footprint of training, this paper proposes FIne-grained
In-Training Embedding Dimension optimization (FIITED). Given the observation
that embedding vectors are not equally important, FIITED adjusts the dimension
of each individual embedding vector continuously during training, assigning
longer dimensions to more important embeddings while adapting to dynamic
changes in data. A novel embedding storage system based on virtually-hashed
physically-indexed hash tables is designed to efficiently implement the
embedding dimension adjustment and effectively enable memory saving.
Experiments on two industry models show that FIITED is able to reduce the size
of embeddings by more than 65% while maintaining the trained model's quality,
saving significantly more memory than a state-of-the-art in-training embedding
pruning method. On public click-through rate prediction datasets, FIITED is
able to prune up to 93.75%-99.75% of embeddings without significant accuracy loss.
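A minimal sketch of the storage idea described above, assuming a chunk-based layout: each embedding vector owns a variable number of fixed-size physical chunks behind a virtually-hashed index, so its effective dimension can be shrunk or grown while training continues. This is an illustration of the general technique, not the paper's implementation; all names (ChunkedEmbeddingStore, chunk_dim, resize) are assumptions.

```python
# Sketch of a chunk-based, virtually-hashed / physically-indexed embedding store.
# Each vector owns a variable number of fixed-size chunks, so its effective
# dimension can change during training. Illustrative only, not FIITED's code.
import numpy as np


class ChunkedEmbeddingStore:
    def __init__(self, num_slots, full_dim, chunk_dim, seed=0):
        assert full_dim % chunk_dim == 0
        self.num_slots = num_slots
        self.chunk_dim = chunk_dim
        self.max_chunks = full_dim // chunk_dim
        rng = np.random.default_rng(seed)
        # Physical chunk pool plus a free list of reclaimed chunk indices.
        self.pool = rng.normal(0.0, 0.01,
                               (num_slots * self.max_chunks, chunk_dim)).astype(np.float32)
        self.free = list(range(num_slots * self.max_chunks))
        # Virtual table: hashed feature id -> list of physical chunk indices.
        self.table = {}

    def _slot(self, feature_id):
        return hash(feature_id) % self.num_slots  # virtual hashing step

    def lookup(self, feature_id):
        """Return a full_dim vector, zero-padding any pruned tail chunks."""
        slot = self._slot(feature_id)
        if slot not in self.table:  # first access: allocate the full dimension
            self.table[slot] = [self.free.pop() for _ in range(self.max_chunks)]
        chunks = self.table[slot]
        vec = np.concatenate([self.pool[c] for c in chunks])
        pad = (self.max_chunks - len(chunks)) * self.chunk_dim
        return np.pad(vec, (0, pad))

    def resize(self, feature_id, new_dim):
        """Shrink or grow one vector by releasing or acquiring trailing chunks."""
        target = max(1, min(self.max_chunks, new_dim // self.chunk_dim))
        chunks = self.table.setdefault(self._slot(feature_id), [])
        while len(chunks) > target:                 # shrink: recycle trailing chunks
            self.free.append(chunks.pop())
        while len(chunks) < target and self.free:   # grow if free memory exists
            chunks.append(self.free.pop())


store = ChunkedEmbeddingStore(num_slots=1000, full_dim=64, chunk_dim=16)
v = store.lookup(42)        # 64-dim vector
store.resize(42, 16)        # deemed less important: keep only one chunk
v_small = store.lookup(42)  # still 64-dim, but the tail is zero-padded
```

In the paper, dimension decisions are driven by per-vector importance signals gathered during training; here resize is left as an external call so any such policy could plug in.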
Related papers
- Hollowed Net for On-Device Personalization of Text-to-Image Diffusion Models [51.3915762595891]
This paper presents an efficient LoRA-based personalization approach for on-device subject-driven generation.
Our method, termed Hollowed Net, enhances memory efficiency during fine-tuning by modifying the architecture of a diffusion U-Net.
arXiv Detail & Related papers (2024-11-02T08:42:48Z)
- Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning [81.0108753452546]
We propose Dynamic Reversible Dual-Residual Networks, or Dr$^2$Net, to finetune a pretrained model with substantially reduced memory consumption.
Dr$^2$Net contains two types of residual connections, one maintaining the residual structure in the pretrained models, and the other making the network reversible.
We show that Dr$^2$Net can reach comparable performance to conventional finetuning but with significantly less memory usage.
arXiv Detail & Related papers (2024-01-08T18:59:31Z)
- Sparsity-Preserving Differentially Private Training of Large Embedding Models [67.29926605156788]
DP-SGD is a training algorithm that combines differential privacy with gradient descent.
Applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency.
We present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during private training of large embedding models.
arXiv Detail & Related papers (2023-11-14T17:59:51Z)
- Enhancing Cross-Category Learning in Recommendation Systems with Multi-Layer Embedding Training [2.4862527485819186]
Multi-layer embedding training (MLET) trains embeddings through a factorization of the embedding layer, with an inner dimension higher than the target embedding dimension (see the sketch after this list).
MLET consistently produces better models, especially for rare items.
At constant model quality, MLET allows the embedding dimension, and hence the model size, to be reduced by up to 16x, and by 5.8x on average.
arXiv Detail & Related papers (2023-09-27T09:32:10Z)
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators, called WTA-CRS, for matrix multiplication with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
- MTrainS: Improving DLRM training efficiency using heterogeneous memories [5.195887979684162]
In Deep Learning Recommendation Models (DLRM), sparse features capturing categorical inputs through embedding tables are the major contributors to model size and require high memory bandwidth.
In this paper, we study the bandwidth requirement and locality of embedding tables in real-world deployed models.
We then design MTrainS, which hierarchically leverages heterogeneous memory for DLRM, including byte- and block-addressable Storage Class Memory.
arXiv Detail & Related papers (2023-04-19T06:06:06Z)
- Adaptive Cross Batch Normalization for Metric Learning [75.91093210956116]
Metric learning is a fundamental problem in computer vision.
We show that it is equally important to ensure that the accumulated embeddings are up to date.
In particular, it is necessary to circumvent the representational drift between the accumulated embeddings and the feature embeddings at the current training iteration.
arXiv Detail & Related papers (2023-03-30T03:22:52Z)
- HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework [17.114812060566766]
We propose HET, a new system framework that significantly improves the scalability of huge embedding model training.
HET achieves up to 88% embedding communication reductions and up to 20.68x performance speedup over the state-of-the-art baselines.
arXiv Detail & Related papers (2021-12-14T08:18:10Z)
- OSOA: One-Shot Online Adaptation of Deep Generative Models for Lossless Compression [49.10945855716001]
We propose a novel setting that starts from a pretrained deep generative model and compresses the data batches while adapting the model with a dynamical system for only one epoch.
Experimental results show that vanilla OSOA can save significant time versus training bespoke models and space versus using one model for all targets.
arXiv Detail & Related papers (2021-11-02T15:18:25Z)
- Mixed-Precision Embedding Using a Cache [3.0298877977523144]
We propose a novel change to embedding tables using a cache memory architecture, in which the majority of rows in an embedding table are trained in low precision (a rough sketch appears after this list).
For an open source deep learning recommendation model (DLRM) running with CriteoKaggle dataset, we achieve 3x memory reduction with INT8 precision embedding tables and full-precision cache.
For an industrial scale model and dataset, we achieve even higher >7x memory reduction with INT4 precision and cache size 1% of embedding tables.
arXiv Detail & Related papers (2020-10-21T20:49:54Z)
- Training with Multi-Layer Embeddings for Model Reduction [0.9046327456472286]
We introduce a multi-layer embedding training architecture that trains embeddings via a sequence of linear layers.
We show that it allows reducing the embedding dimension d by 4-8X, with a corresponding improvement in memory footprint, at a given model accuracy.
arXiv Detail & Related papers (2020-06-10T02:47:40Z)
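The two MLET-related entries above (Enhancing Cross-Category Learning in Recommendation Systems with Multi-Layer Embedding Training, and Training with Multi-Layer Embeddings for Model Reduction) share the same core idea: train the embedding layer as a factorization with an inner dimension k larger than the target dimension d, then fold the factors into a single d-dimensional table. The sketch below, assuming a plain two-factor product in PyTorch, is illustrative only; FactorizedEmbedding, inner_dim and fold are made-up names, not either paper's API.

```python
# Sketch of factorized (multi-layer) embedding training: train a k-dimensional
# embedding followed by a k -> d projection (k > d), then fold them into a
# single num_items x d table for serving. Illustrative, not the papers' code.
import torch
import torch.nn as nn


class FactorizedEmbedding(nn.Module):
    def __init__(self, num_items, target_dim, inner_dim):
        super().__init__()
        assert inner_dim > target_dim, "inner dimension is chosen larger than the target"
        self.inner = nn.Embedding(num_items, inner_dim)            # trained at dimension k
        self.proj = nn.Linear(inner_dim, target_dim, bias=False)   # k -> d projection

    def forward(self, ids):
        # During training the lookup goes through the k-dimensional factor.
        return self.proj(self.inner(ids))

    def fold(self):
        """Collapse the factorization into a plain d-dimensional embedding table."""
        with torch.no_grad():
            table = self.inner.weight @ self.proj.weight.t()   # (num_items, d)
        return nn.Embedding.from_pretrained(table, freeze=False)


emb = FactorizedEmbedding(num_items=10_000, target_dim=16, inner_dim=64)
out = emb(torch.tensor([1, 5, 9]))   # (3, 16) embeddings during training
served = emb.fold()                  # standard 10_000 x 16 table for inference
```

The Mixed-Precision Embedding Using a Cache entry can be illustrated in a similar spirit: keep most rows in a quantized table and hold actively trained rows in a small full-precision cache. The sketch below uses simple per-row int8 quantization and a dictionary as the cache; it is an assumption-laden illustration, not the paper's architecture, and every name in it is hypothetical.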
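```python
# Sketch of a mixed-precision embedding table: rows are stored as int8 with a
# per-row scale, and a small dictionary acts as a full-precision cache for rows
# that are being actively trained. Illustrative only.
import numpy as np


class MixedPrecisionEmbedding:
    def __init__(self, num_rows, dim, cache_size=64, seed=0):
        rng = np.random.default_rng(seed)
        weights = rng.normal(0.0, 0.01, (num_rows, dim)).astype(np.float32)
        self.scale = np.abs(weights).max(axis=1, keepdims=True) / 127.0 + 1e-12
        self.q = np.round(weights / self.scale).astype(np.int8)  # low-precision store
        self.cache = {}            # row id -> float32 row (full precision)
        self.cache_size = cache_size

    def lookup(self, row):
        """Fetch a row, promoting it into the full-precision cache."""
        if row not in self.cache:
            if len(self.cache) >= self.cache_size:
                self._evict()
            self.cache[row] = self.q[row].astype(np.float32) * self.scale[row]
        return self.cache[row]

    def update(self, row, grad, lr=0.01):
        """Apply a gradient step in full precision on the cached copy."""
        self.cache[row] = self.lookup(row) - lr * grad

    def _evict(self):
        """Write back the oldest cached row as int8 and drop it from the cache."""
        row, value = next(iter(self.cache.items()))
        self.scale[row] = np.abs(value).max() / 127.0 + 1e-12
        self.q[row] = np.round(value / self.scale[row]).astype(np.int8)
        del self.cache[row]


emb = MixedPrecisionEmbedding(num_rows=100_000, dim=32)
row = emb.lookup(7)                                     # float32 row via the cache
emb.update(7, grad=np.ones(32, dtype=np.float32))       # full-precision update
```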
This list is automatically generated from the titles and abstracts of the papers on this site.