SEAM: Searching Transferable Mixed-Precision Quantization Policy through
Large Margin Regularization
- URL: http://arxiv.org/abs/2302.06845v2
- Date: Wed, 23 Aug 2023 03:56:24 GMT
- Title: SEAM: Searching Transferable Mixed-Precision Quantization Policy through
Large Margin Regularization
- Authors: Chen Tang, Kai Ouyang, Zenghao Chai, Yunpeng Bai, Yuan Meng, Zhi Wang,
Wenwu Zhu
- Abstract summary: Mixed-precision quantization (MPQ) suffers from the time-consuming process of searching the optimal bit-width allocation for each layer.
This paper proposes a novel method for efficiently searching for effective MPQ policies using a small proxy dataset.
- Score: 50.04951511146338
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Mixed-precision quantization (MPQ) suffers from the time-consuming process of
searching the optimal bit-width allocation (i.e., the policy) for each layer,
especially when using large-scale datasets such as ILSVRC-2012. This limits the
practicality of MPQ in real-world deployment scenarios. To address this issue,
this paper proposes a novel method for efficiently searching for effective MPQ
policies using a small proxy dataset instead of the large-scale dataset used
for training the model. By deviating from the established norm of employing a
consistent dataset for both the model training and MPQ policy search stages,
our approach yields a substantial enhancement in the efficiency of MPQ
exploration. Nonetheless, using discrepant datasets poses challenges in
searching for a transferable MPQ policy. Driven by the observation that the
quantization noise of a sub-optimal policy exerts a detrimental influence on
the discriminability of feature representations, manifesting as diminished
class margins and ambiguous decision boundaries, our method aims to identify
policies that uphold the discriminative nature of feature representations,
i.e., intra-class compactness and inter-class separation. Because this
property is general and dataset-independent, we can search for the MPQ policy
on a rather small-scale proxy dataset, and the resulting policy can then be
directly used to quantize a model trained on the large-scale dataset. Our
method offers several advantages, including high proxy-data utilization, no
excessive hyper-parameter tuning, and high search efficiency. We search for
high-quality MPQ policies with a proxy dataset that has only 4% of the data
scale of the large-scale target dataset, achieving the same accuracy as
searching directly on the latter and improving MPQ search efficiency by up to
300 times.
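As a concrete illustration of the search criterion the abstract describes, the following is a minimal sketch that ranks candidate bit-width policies by a Fisher-ratio-style discriminability score on the proxy set. This is an illustrative criterion, not SEAM's actual large-margin regularizer, and quantize_model and extract_features are hypothetical helpers standing in for a real quantization backend and feature extractor.

```python
import numpy as np

def discriminability_score(feats: np.ndarray, labels: np.ndarray) -> float:
    """Inter-class separation over intra-class compactness: a simple
    proxy for the class margins the abstract refers to."""
    classes = np.unique(labels)
    centroids = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    global_mean = feats.mean(axis=0)
    # Intra-class scatter: mean squared distance of samples to their centroid.
    intra = np.mean([
        np.sum((feats[labels == c] - centroids[i]) ** 2, axis=1).mean()
        for i, c in enumerate(classes)
    ])
    # Inter-class scatter: mean squared distance of centroids to the global mean.
    inter = np.mean(np.sum((centroids - global_mean) ** 2, axis=1))
    return float(inter / (intra + 1e-12))

def search_policy(model, candidate_policies, proxy_x, proxy_y):
    """Pick the per-layer bit-width policy whose quantized features stay
    most discriminative on the small proxy dataset."""
    best_policy, best_score = None, -np.inf
    for policy in candidate_policies:
        q_model = quantize_model(model, policy)      # hypothetical helper
        feats = extract_features(q_model, proxy_x)   # hypothetical helper
        score = discriminability_score(feats, proxy_y)
        if score > best_score:
            best_policy, best_score = policy, score
    return best_policy
```

Because such a score depends only on feature geometry rather than on which dataset produced it, a policy found on the proxy set can plausibly transfer to the model trained on the full dataset, which is the intuition the abstract relies on.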
Related papers
- Data Selection via Optimal Control for Language Models [134.67665351539725]
This work investigates the selection of high-quality pre-training data from massive corpora to enhance LMs' capabilities for downstream usage.
We introduce PMP-based Data Selection (PDS), a framework that approximates optimal data selection by solving the PMP conditions.
The benefits of PDS extend to 400B models trained on 10T tokens, as evidenced by the extrapolation of the test loss curves according to the Scaling Laws.
arXiv Detail & Related papers (2024-10-09T17:06:57Z)
- Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data, and a minimal number of available labeled data points are then assigned to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
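The BMU-based assignment this summary describes can be sketched as follows using the minisom library; the grid size, iteration count, and grid-nearest label lookup are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from minisom import MiniSom  # pip install minisom

def fit_minimally_supervised_som(x_unlabeled, x_labeled, y_labeled,
                                 grid=(10, 10), iters=5000):
    # Unsupervised stage: train the SOM on unlabeled data only.
    som = MiniSom(grid[0], grid[1], x_unlabeled.shape[1],
                  sigma=1.0, learning_rate=0.5)
    som.train_random(x_unlabeled, iters)
    # Minimal supervision: pin each labeled point's class to its BMU.
    bmu_labels = {}
    for x, y in zip(x_labeled, y_labeled):
        bmu_labels[som.winner(x)] = y
    return som, bmu_labels

def predict(som, bmu_labels, x):
    """Label a sample by the labeled BMU nearest (on the map grid) to its BMU."""
    i, j = som.winner(x)
    nearest = min(bmu_labels,
                  key=lambda u: (u[0] - i) ** 2 + (u[1] - j) ** 2)
    return bmu_labels[nearest]
```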
- MeaeQ: Mount Model Extraction Attacks with Efficient Queries [6.1106195466129485]
We study model extraction attacks in natural language processing (NLP).
We propose MeaeQ, a straightforward yet effective method to address these issues.
MeaeQ achieves higher functional similarity to the victim model than baselines while requiring fewer queries.
arXiv Detail & Related papers (2023-10-21T16:07:16Z)
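A generic version of this low-budget extraction setting is sketched below; the k-means-based query selection and logistic-regression clone are assumed placeholders, not MeaeQ's actual strategy.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def extract_model(victim, pool: np.ndarray, budget: int):
    """Train a clone of `victim` using only `budget` queries."""
    # Pick diverse queries: the pool points closest to k-means centers.
    km = KMeans(n_clusters=budget, n_init=10).fit(pool)
    idx = [int(np.argmin(((pool - c) ** 2).sum(axis=1)))
           for c in km.cluster_centers_]
    queries = pool[idx]
    pseudo = victim.predict(queries)  # victim's outputs used as labels
    return LogisticRegression(max_iter=1000).fit(queries, pseudo)
```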
- An Improved Data Augmentation Scheme for Model Predictive Control Policy Approximation [0.0]
A sensitivity-based data augmentation framework for MPC policy approximation was previously proposed.
The error due to augmenting the training data set with inexact samples was shown to increase with the size of the neighborhood.
This paper presents an improved data augmentation scheme based on predictor-corrector steps that enforces a user-defined level of accuracy.
arXiv Detail & Related papers (2023-03-09T22:16:47Z)
- SDQ: Stochastic Differentiable Quantization with Mixed Precision [46.232003346732064]
We present a novel Stochastic Differentiable Quantization (SDQ) method that can automatically learn the MPQ strategy.
After the optimal MPQ strategy is acquired, we train our network with entropy-aware bin regularization and knowledge distillation.
SDQ outperforms all state-of-the-art mixed- or single-precision quantization methods with a lower bitwidth.
arXiv Detail & Related papers (2022-06-09T12:38:18Z)
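The general mechanism behind differentiable mixed-precision search can be sketched as below; this is an illustrative softmax-mixture formulation with a straight-through estimator, not SDQ's actual stochastic formulation.

```python
import torch
import torch.nn as nn

def fake_quant(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform fake-quantization with a straight-through gradient."""
    scale = x.detach().abs().max() / (2 ** (bits - 1) - 1) + 1e-12
    q = torch.round(x / scale).clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return x + (q * scale - x).detach()  # forward: quantized; backward: identity

class MixedPrecisionWeight(nn.Module):
    """Learnable mixture over candidate bit-widths for one layer's weight."""
    def __init__(self, weight: torch.Tensor, bit_choices=(2, 4, 8)):
        super().__init__()
        self.weight = nn.Parameter(weight)
        self.bit_choices = bit_choices
        self.logits = nn.Parameter(torch.zeros(len(bit_choices)))

    def forward(self) -> torch.Tensor:
        probs = torch.softmax(self.logits, dim=0)
        # The mixture makes the bit-width choice differentiable; after
        # training, the argmax bit-width becomes the layer's policy.
        return sum(p * fake_quant(self.weight, b)
                   for p, b in zip(probs, self.bit_choices))
```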
- Generalizable Mixed-Precision Quantization via Attribution Rank Preservation [90.26603048354575]
We propose a generalizable mixed-precision quantization (GMPQ) method for efficient inference.
Our method obtains a competitive accuracy-complexity trade-off compared with state-of-the-art mixed-precision networks.
arXiv Detail & Related papers (2021-08-05T16:41:57Z)
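One way to read "attribution rank preservation" is as a check that quantization keeps the rank order of input attributions; the sketch below is that interpretation only, with the attribution maps assumed to be precomputed.

```python
import numpy as np
from scipy.stats import spearmanr

def attribution_rank_agreement(attr_fp: np.ndarray, attr_q: np.ndarray) -> float:
    """Mean Spearman rank correlation between per-sample attribution maps
    of the full-precision (attr_fp) and quantized (attr_q) models."""
    rhos = [spearmanr(a.ravel(), b.ravel()).correlation
            for a, b in zip(attr_fp, attr_q)]
    return float(np.mean(rhos))
```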
- Noise-Resistant Deep Metric Learning with Probabilistic Instance Filtering [59.286567680389766]
Noisy labels are commonly found in real-world data, which cause performance degradation of deep neural networks.
We propose the Probabilistic Ranking-based Instance Selection with Memory (PRISM) approach for DML.
PRISM calculates the probability of a label being clean, and filters out potentially noisy samples.
arXiv Detail & Related papers (2021-08-03T12:15:25Z)
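The clean-probability filtering idea can be sketched as follows; the prototype-based probability is an assumed simplification, not PRISM's exact memory-based formulation.

```python
import numpy as np

def clean_probability(feat, label, prototypes):
    """P(label is clean) as the softmax similarity of an L2-normalized
    feature to its own class prototype, against all class prototypes."""
    sims = prototypes @ feat           # cosine similarities, shape (C,)
    exp = np.exp(sims - sims.max())    # numerically stable softmax
    return exp[label] / exp.sum()

def filter_batch(feats, labels, prototypes, threshold=0.5):
    """Drop samples whose label is unlikely to be clean."""
    keep = [i for i, (f, y) in enumerate(zip(feats, labels))
            if clean_probability(f, y, prototypes) >= threshold]
    return feats[keep], labels[keep]
```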
- Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient [62.24615324523435]
This paper provides a statistical analysis of high-dimensional batch Reinforcement Learning (RL) using sparse linear function approximation.
When there is a large number of candidate features, our result sheds light on the fact that sparsity-aware methods can make batch RL more sample efficient.
arXiv Detail & Related papers (2020-11-08T16:48:02Z)
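A toy version of the sparsity-aware step is shown below, assuming Lasso feature selection for a linear value function on batch data; the paper's actual estimator and analysis are more involved.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_value_features(phi: np.ndarray, returns: np.ndarray, alpha=0.1):
    """Select a sparse feature subset for a linear value function from a
    fixed batch of (state features, observed return) pairs."""
    lasso = Lasso(alpha=alpha).fit(phi, returns)
    selected = np.flatnonzero(lasso.coef_)  # indices of retained features
    return selected, lasso
```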
- DeepSampling: Selectivity Estimation with Predicted Error and Response Time [7.23389716633927]
This paper proposes DeepSampling, a deep-learning-based model that predicts the accuracy of a sample-based AQP algorithm.
DeepSampling is the first system that provides a reliable tool for existing spatial databases to control the accuracy of AQP.
arXiv Detail & Related papers (2020-08-16T03:23:01Z)
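The error-prediction idea can be sketched with an assumed interface: regress the observed AQP error on the sampling rate (plus query features), then invert the fitted model to pick the cheapest rate that meets a target accuracy. This is not DeepSampling's actual architecture.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_error_model(sample_rates, query_feats, observed_errors):
    """Learn error ~ f(sampling rate, query features) from past runs."""
    X = np.column_stack([sample_rates, query_feats])
    return MLPRegressor(hidden_layer_sizes=(32, 32),
                        max_iter=2000).fit(X, observed_errors)

def pick_rate(model, query_feat, target_error,
              rates=np.linspace(0.01, 1.0, 100)):
    """Smallest sampling rate whose predicted error meets the target."""
    X = np.column_stack([rates, np.tile(query_feat, (len(rates), 1))])
    ok = rates[model.predict(X) <= target_error]
    return float(ok.min()) if ok.size else 1.0
```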