Fine-Grained Embedding Dimension Optimization During Training for
Recommender Systems
- URL: http://arxiv.org/abs/2401.04408v1
- Date: Tue, 9 Jan 2024 08:04:11 GMT
- Title: Fine-Grained Embedding Dimension Optimization During Training for
Recommender Systems
- Authors: Qinyi Luo, Penghan Wang, Wei Zhang, Fan Lai, Jiachen Mao, Xiaohan Wei,
Jun Song, Wei-Yu Tsai, Shuai Yang, Yuxi Hu and Xuehai Qian
- Abstract summary: This paper proposes FIne-grained In-Training Embedding Dimension optimization (FIITED), which adjusts the dimension of each individual embedding vector continuously during training.
Experiments on two industry models show that FIITED is able to reduce the size of embeddings by more than 65% while maintaining the trained model's quality.
- Score: 18.125952266473533
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Huge embedding tables in modern Deep Learning Recommender Models (DLRM)
require prohibitively large memory during training and inference. Aiming to
reduce the memory footprint of training, this paper proposes FIne-grained
In-Training Embedding Dimension optimization (FIITED). Given the observation
that embedding vectors are not equally important, FIITED adjusts the dimension
of each individual embedding vector continuously during training, assigning
longer dimensions to more important embeddings while adapting to dynamic
changes in data. A novel embedding storage system based on virtually-hashed
physically-indexed hash tables is designed to efficiently implement the
embedding dimension adjustment and effectively enable memory saving.
Experiments on two industry models show that FIITED is able to reduce the size
of embeddings by more than 65% while maintaining the trained model's quality,
saving significantly more memory than a state-of-the-art in-training embedding
pruning method. On public click-through rate prediction datasets, FIITED is
able to prune up to 93.75%-99.75% of embeddings without significant accuracy loss.
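A minimal sketch of the storage idea described above, assuming a chunk-based layout: each embedding vector owns a variable number of fixed-size physical chunks behind a virtually-hashed index, so its effective dimension can be shrunk or grown while training continues. This is an illustration of the general technique, not the paper's implementation; all names (ChunkedEmbeddingStore, chunk_dim, resize) are assumptions.

```python
# Sketch of a chunk-based, virtually-hashed / physically-indexed embedding store.
# Each vector owns a variable number of fixed-size chunks, so its effective
# dimension can change during training. Illustrative only, not FIITED's code.
import numpy as np


class ChunkedEmbeddingStore:
    def __init__(self, num_slots, full_dim, chunk_dim, seed=0):
        assert full_dim % chunk_dim == 0
        self.num_slots = num_slots
        self.chunk_dim = chunk_dim
        self.max_chunks = full_dim // chunk_dim
        rng = np.random.default_rng(seed)
        # Physical chunk pool plus a free list of reclaimed chunk indices.
        self.pool = rng.normal(0.0, 0.01,
                               (num_slots * self.max_chunks, chunk_dim)).astype(np.float32)
        self.free = list(range(num_slots * self.max_chunks))
        # Virtual table: hashed feature id -> list of physical chunk indices.
        self.table = {}

    def _slot(self, feature_id):
        return hash(feature_id) % self.num_slots  # virtual hashing step

    def lookup(self, feature_id):
        """Return a full_dim vector, zero-padding any pruned tail chunks."""
        slot = self._slot(feature_id)
        if slot not in self.table:  # first access: allocate the full dimension
            self.table[slot] = [self.free.pop() for _ in range(self.max_chunks)]
        chunks = self.table[slot]
        vec = np.concatenate([self.pool[c] for c in chunks])
        pad = (self.max_chunks - len(chunks)) * self.chunk_dim
        return np.pad(vec, (0, pad))

    def resize(self, feature_id, new_dim):
        """Shrink or grow one vector by releasing or acquiring trailing chunks."""
        target = max(1, min(self.max_chunks, new_dim // self.chunk_dim))
        chunks = self.table.setdefault(self._slot(feature_id), [])
        while len(chunks) > target:                 # shrink: recycle trailing chunks
            self.free.append(chunks.pop())
        while len(chunks) < target and self.free:   # grow if free memory exists
            chunks.append(self.free.pop())


store = ChunkedEmbeddingStore(num_slots=1000, full_dim=64, chunk_dim=16)
v = store.lookup(42)        # 64-dim vector
store.resize(42, 16)        # deemed less important: keep only one chunk
v_small = store.lookup(42)  # still 64-dim, but the tail is zero-padded
```

In the paper, dimension decisions are driven by per-vector importance signals gathered during training; here resize is left as an external call so any such policy could plug in.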
Related papers
- Hollowed Net for On-Device Personalization of Text-to-Image Diffusion Models [51.3915762595891]
This paper presents an efficient LoRA-based personalization approach for on-device subject-driven generation.
Our method, termed Hollowed Net, enhances memory efficiency during fine-tuning by modifying the architecture of a diffusion U-Net.
arXiv Detail & Related papers (2024-11-02T08:42:48Z)
- Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning [81.0108753452546]
We propose Dynamic Reversible Dual-Residual Networks, or Dr$^2$Net, to finetune a pretrained model with substantially reduced memory consumption.
Dr$^2$Net contains two types of residual connections, one maintaining the residual structure in the pretrained models, and the other making the network reversible.
We show that Dr$^2$Net can reach comparable performance to conventional finetuning but with significantly less memory usage.
arXiv Detail & Related papers (2024-01-08T18:59:31Z)
- Sparsity-Preserving Differentially Private Training of Large Embedding Models [67.29926605156788]
DP-SGD is a training algorithm that combines differential privacy with gradient descent.
Applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency.
We present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during private training of large embedding models.
arXiv Detail & Related papers (2023-11-14T17:59:51Z)
- Enhancing Cross-Category Learning in Recommendation Systems with Multi-Layer Embedding Training [2.4862527485819186]
Multi-layer embedding training (MLET) trains embeddings through a factorization of the embedding layer, with an inner dimension higher than the target embedding dimension (see the sketch after this list).
MLET consistently produces better models, especially for rare items.
At constant model quality, MLET allows the embedding dimension, and hence the model size, to be reduced by up to 16x, and by 5.8x on average.
arXiv Detail & Related papers (2023-09-27T09:32:10Z)
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators, called WTA-CRS, for matrix multiplication with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
- MTrainS: Improving DLRM training efficiency using heterogeneous memories [5.195887979684162]
In Deep Learning Recommendation Models (DLRM), sparse features capturing categorical inputs through embedding tables are the major contributors to model size and require high memory bandwidth.
In this paper, we study the bandwidth requirement and locality of embedding tables in real-world deployed models.
We then design MTrainS, which hierarchically leverages heterogeneous memory for DLRM, including byte- and block-addressable Storage Class Memory.
arXiv Detail & Related papers (2023-04-19T06:06:06Z)
- Adaptive Cross Batch Normalization for Metric Learning [75.91093210956116]
Metric learning is a fundamental problem in computer vision.
We show that it is equally important to ensure that the accumulated embeddings are up to date.
In particular, it is necessary to circumvent the representational drift between the accumulated embeddings and the feature embeddings at the current training iteration.
arXiv Detail & Related papers (2023-03-30T03:22:52Z)
- HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework [17.114812060566766]
We propose HET, a new system framework that significantly improves the scalability of huge embedding model training.
HET achieves up to 88% embedding communication reductions and up to 20.68x performance speedup over the state-of-the-art baselines.
arXiv Detail & Related papers (2021-12-14T08:18:10Z)
- OSOA: One-Shot Online Adaptation of Deep Generative Models for Lossless Compression [49.10945855716001]
We propose a novel setting that starts from a pretrained deep generative model and compresses the data batches while adapting the model with a dynamical system for only one epoch.
Experimental results show that vanilla OSOA can save significant time versus training bespoke models and space versus using one model for all targets.
arXiv Detail & Related papers (2021-11-02T15:18:25Z)
- Mixed-Precision Embedding Using a Cache [3.0298877977523144]
We propose a novel change to embedding tables using a cache memory architecture, in which the majority of rows in an embedding table are trained in low precision (a rough sketch appears after this list).
For an open source deep learning recommendation model (DLRM) running with CriteoKaggle dataset, we achieve 3x memory reduction with INT8 precision embedding tables and full-precision cache.
For an industrial scale model and dataset, we achieve even higher >7x memory reduction with INT4 precision and cache size 1% of embedding tables.
arXiv Detail & Related papers (2020-10-21T20:49:54Z)
- Training with Multi-Layer Embeddings for Model Reduction [0.9046327456472286]
We introduce a multi-layer embedding training architecture that trains embeddings via a sequence of linear layers.
We show that it allows reducing the embedding dimension d by 4-8X, with a corresponding improvement in memory footprint, at a given model accuracy.
arXiv Detail & Related papers (2020-06-10T02:47:40Z)
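The two MLET-related entries above (Enhancing Cross-Category Learning in Recommendation Systems with Multi-Layer Embedding Training, and Training with Multi-Layer Embeddings for Model Reduction) share the same core idea: train the embedding layer as a factorization with an inner dimension k larger than the target dimension d, then fold the factors into a single d-dimensional table. The sketch below, assuming a plain two-factor product in PyTorch, is illustrative only; FactorizedEmbedding, inner_dim and fold are made-up names, not either paper's API.

```python
# Sketch of factorized (multi-layer) embedding training: train a k-dimensional
# embedding followed by a k -> d projection (k > d), then fold them into a
# single num_items x d table for serving. Illustrative, not the papers' code.
import torch
import torch.nn as nn


class FactorizedEmbedding(nn.Module):
    def __init__(self, num_items, target_dim, inner_dim):
        super().__init__()
        assert inner_dim > target_dim, "inner dimension is chosen larger than the target"
        self.inner = nn.Embedding(num_items, inner_dim)            # trained at dimension k
        self.proj = nn.Linear(inner_dim, target_dim, bias=False)   # k -> d projection

    def forward(self, ids):
        # During training the lookup goes through the k-dimensional factor.
        return self.proj(self.inner(ids))

    def fold(self):
        """Collapse the factorization into a plain d-dimensional embedding table."""
        with torch.no_grad():
            table = self.inner.weight @ self.proj.weight.t()   # (num_items, d)
        return nn.Embedding.from_pretrained(table, freeze=False)


emb = FactorizedEmbedding(num_items=10_000, target_dim=16, inner_dim=64)
out = emb(torch.tensor([1, 5, 9]))   # (3, 16) embeddings during training
served = emb.fold()                  # standard 10_000 x 16 table for inference
```

The Mixed-Precision Embedding Using a Cache entry can be illustrated in a similar spirit: keep most rows in a quantized table and hold actively trained rows in a small full-precision cache. The sketch below uses simple per-row int8 quantization and a dictionary as the cache; it is an assumption-laden illustration, not the paper's architecture, and every name in it is hypothetical.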
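```python
# Sketch of a mixed-precision embedding table: rows are stored as int8 with a
# per-row scale, and a small dictionary acts as a full-precision cache for rows
# that are being actively trained. Illustrative only.
import numpy as np


class MixedPrecisionEmbedding:
    def __init__(self, num_rows, dim, cache_size=64, seed=0):
        rng = np.random.default_rng(seed)
        weights = rng.normal(0.0, 0.01, (num_rows, dim)).astype(np.float32)
        self.scale = np.abs(weights).max(axis=1, keepdims=True) / 127.0 + 1e-12
        self.q = np.round(weights / self.scale).astype(np.int8)  # low-precision store
        self.cache = {}            # row id -> float32 row (full precision)
        self.cache_size = cache_size

    def lookup(self, row):
        """Fetch a row, promoting it into the full-precision cache."""
        if row not in self.cache:
            if len(self.cache) >= self.cache_size:
                self._evict()
            self.cache[row] = self.q[row].astype(np.float32) * self.scale[row]
        return self.cache[row]

    def update(self, row, grad, lr=0.01):
        """Apply a gradient step in full precision on the cached copy."""
        self.cache[row] = self.lookup(row) - lr * grad

    def _evict(self):
        """Write back the oldest cached row as int8 and drop it from the cache."""
        row, value = next(iter(self.cache.items()))
        self.scale[row] = np.abs(value).max() / 127.0 + 1e-12
        self.q[row] = np.round(value / self.scale[row]).astype(np.int8)
        del self.cache[row]


emb = MixedPrecisionEmbedding(num_rows=100_000, dim=32)
row = emb.lookup(7)                                     # float32 row via the cache
emb.update(7, grad=np.ones(32, dtype=np.float32))       # full-precision update
```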
This list is automatically generated from the titles and abstracts of the papers on this site.