Adaptive Low-Precision Training for Embeddings in Click-Through Rate
Prediction
- URL: http://arxiv.org/abs/2212.05735v1
- Date: Mon, 12 Dec 2022 07:19:14 GMT
- Title: Adaptive Low-Precision Training for Embeddings in Click-Through Rate
Prediction
- Authors: Shiwei Li, Huifeng Guo, Lu Hou, Wei Zhang, Xing Tang, Ruiming Tang,
Rui Zhang, Ruixuan Li
- Abstract summary: Embedding tables are usually huge in click-through rate (CTR) prediction models.
We formulate a novel quantization training paradigm to compress the embeddings from the training stage, termed low-precision training.
For the first time in CTR models, we successfully train 8-bit embeddings without sacrificing prediction accuracy.
- Score: 36.605153166169224
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embedding tables are usually huge in click-through rate (CTR) prediction
models. To train and deploy the CTR models efficiently and economically, it is
necessary to compress their embedding tables at the training stage. To this
end, we formulate a novel quantization training paradigm to compress the
embeddings from the training stage, termed low-precision training (LPT). We also
provide a theoretical analysis of its convergence. The results show that
stochastic weight quantization has a faster convergence rate and a smaller
convergence error than deterministic weight quantization in LPT. Further, to
reduce the accuracy degradation, we propose adaptive low-precision training
(ALPT) that learns the step size (i.e., the quantization resolution) through
gradient descent. Experiments on two real-world datasets confirm our analysis
and show that ALPT can significantly improve the prediction accuracy,
especially at extremely low bit widths. For the first time in CTR models, we
successfully train 8-bit embeddings without sacrificing prediction accuracy.
The code of ALPT is publicly available.
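As a rough illustration of the idea (not the released ALPT code), the sketch below shows an embedding table with stochastic uniform quantization and a step size learned by gradient descent through a straight-through estimator. The class name, initialization constants, and the use of on-the-fly (fake) quantization instead of a low-bit stored table are all simplifying assumptions.

```python
# Minimal sketch in the spirit of ALPT; names and details are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class StochasticQuantEmbedding(nn.Module):
    def __init__(self, num_embeddings: int, embedding_dim: int, n_bits: int = 8):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(num_embeddings, embedding_dim))
        self.step_size = nn.Parameter(torch.tensor(0.01))  # learnable quantization resolution
        self.q_max = 2 ** (n_bits - 1) - 1                  # e.g. 127 for 8-bit signed values

    def quantize(self, w: torch.Tensor) -> torch.Tensor:
        scaled = w / self.step_size
        floor = torch.floor(scaled)
        # Stochastic rounding: round up with probability equal to the fractional part.
        rounded = floor + torch.bernoulli((scaled - floor).detach().clamp(0.0, 1.0))
        q = torch.clamp(rounded, -self.q_max - 1, self.q_max)
        # Straight-through estimator: the forward pass uses the quantized grid value,
        # while the backward pass treats rounding as identity, so gradients reach
        # both the embedding weights and the step size.
        return (scaled + (q - scaled).detach()) * self.step_size

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        return F.embedding(ids, self.quantize(self.weight))
```

The memory saving described in the paper comes from storing the table in low precision during training; the sketch above only mimics the quantized forward and backward passes.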
Related papers
- Scaling Laws for Precision [73.24325358259753]
We devise "precision-aware" scaling laws for both training and inference.
For inference, we find that the degradation introduced by post-training quantization increases as models are trained on more data.
For training, our scaling laws allow us to predict the loss of a model with different parts in different precisions.
arXiv Detail & Related papers (2024-11-07T00:10:10Z)
- OAC: Output-adaptive Calibration for Accurate Post-training Quantization [30.115888331426515]
Post-training Quantization (PTQ) techniques have been developed to compress Large Language Models (LLMs).
Most PTQ approaches formulate the quantization error based on a calibrated layer-wise $\ell_2$ loss.
We propose Output-adaptive Calibration (OAC) to incorporate the model output in the calibration process.
arXiv Detail & Related papers (2024-05-23T20:01:17Z)
- Learning Sample Difficulty from Pre-trained Models for Reliable Prediction [55.77136037458667]
We propose to utilize large-scale pre-trained models to guide downstream model training with sample difficulty-aware entropy regularization.
We simultaneously improve accuracy and uncertainty calibration across challenging benchmarks.
arXiv Detail & Related papers (2023-04-20T07:29:23Z)
- Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accuracy of predictions.
Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error for both in and out-domain scenarios.
arXiv Detail & Related papers (2023-03-25T08:56:21Z)
- SQuAT: Sharpness- and Quantization-Aware Training for BERT [43.049102196902844]
We propose sharpness- and quantization-aware training (SQuAT)
Our method can consistently outperform state-of-the-art quantized BERT models under 2, 3, and 4-bit settings by 1%.
Our experiments on empirical measurement of sharpness also suggest that our method would lead to flatter minima compared to other quantization methods.
arXiv Detail & Related papers (2022-10-13T16:52:19Z)
- CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU [14.764217935910988]
A click-through rate (CTR) prediction task is to predict whether a user will click on the recommended item.
One approach to increase the training speed is to apply large batch training.
We develop adaptive Column-wise Clipping (CowClip) to stabilize the training process in a large batch size setting (a simplified sketch of per-feature adaptive clipping appears after this list).
arXiv Detail & Related papers (2022-04-13T08:17:15Z)
- Quantization-Guided Training for Compact TinyML Models [8.266286436571887]
We propose a Quantization Guided Training (QGT) method to guide DNN training towards optimized low-bit-precision targets.
QGT uses customized regularization to encourage weight values towards a distribution that maximizes accuracy while reducing quantization errors (see the regularizer sketch after this list).
arXiv Detail & Related papers (2021-03-10T18:06:05Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- TRP: Trained Rank Pruning for Efficient Deep Neural Networks [69.06699632822514]
We propose Trained Rank Pruning (TRP), which alternates between low rank approximation and training.
A nuclear-norm regularization, optimized by sub-gradient descent, is used to further promote low rank in TRP.
The TRP-trained network inherently has a low-rank structure and can be approximated with negligible performance loss (a simplified sketch of the alternating scheme appears after this list).
arXiv Detail & Related papers (2020-04-30T03:37:36Z)
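For the CowClip entry above, the following is a loose sketch of per-feature adaptive gradient clipping for an embedding table. The threshold rule (a fixed ratio of each feature's current embedding norm, with a small floor) and all names are simplifications, not the paper's exact scheme.

```python
# Hypothetical sketch of adaptive, per-feature gradient clipping for embeddings.
import torch


def adaptive_columnwise_clip(embedding_weight: torch.Tensor,
                             embedding_grad: torch.Tensor,
                             ratio: float = 0.01,
                             floor: float = 1e-5) -> torch.Tensor:
    """Clip each feature's embedding gradient to a threshold that adapts to the
    magnitude of that feature's current embedding vector."""
    grad_norm = embedding_grad.norm(dim=1, keepdim=True)            # per-feature gradient norm
    weight_norm = embedding_weight.norm(dim=1, keepdim=True)        # per-feature weight norm
    threshold = torch.clamp(ratio * weight_norm, min=floor)         # adaptive clipping threshold
    scale = torch.clamp(threshold / (grad_norm + 1e-12), max=1.0)   # shrink only oversized gradients
    return embedding_grad * scale
```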
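For the QGT entry, this is one plausible form of a quantization-guided regularizer under a uniform signed grid; the regularizer actually used in the paper may differ.

```python
# Hypothetical regularizer that pulls weights toward the nearest low-bit grid point.
import torch


def quantization_regularizer(weights: torch.Tensor, step: float, n_bits: int = 4) -> torch.Tensor:
    q_max = 2 ** (n_bits - 1) - 1
    # Nearest representable value on a symmetric uniform grid.
    nearest = torch.clamp(torch.round(weights / step), -q_max - 1, q_max) * step
    # Penalizing the gap nudges the weight distribution toward quantization-friendly values.
    return ((weights - nearest) ** 2).mean()


# Hypothetical usage during training:
# loss = task_loss + lam * quantization_regularizer(layer.weight, step=0.05)
```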
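For the TRP entry, the sketch below shows the two ingredients the summary mentions, in simplified form: a truncated-SVD rank reduction and a nuclear-norm sub-gradient step. The schedule for alternating these with ordinary training is omitted, and the helper names are hypothetical.

```python
# Simplified sketch of rank truncation plus a nuclear-norm sub-gradient step.
import torch


def truncate_rank(weight: torch.Tensor, rank: int) -> torch.Tensor:
    """Replace a weight matrix with its best rank-`rank` approximation (truncated SVD)."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    return U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank, :]


def nuclear_norm_subgradient(weight: torch.Tensor) -> torch.Tensor:
    """U @ Vh is a sub-gradient of the nuclear norm ||W||_* at W."""
    U, _, Vh = torch.linalg.svd(weight, full_matrices=False)
    return U @ Vh


def trp_step(weight: torch.Tensor, rank: int, lam: float, lr: float) -> torch.Tensor:
    """One low-rank projection followed by a small nuclear-norm sub-gradient step."""
    with torch.no_grad():
        w = truncate_rank(weight, rank)
        w = w - lr * lam * nuclear_norm_subgradient(w)
    return w
```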