SEAM: Searching Transferable Mixed-Precision Quantization Policy through
Large Margin Regularization
- URL: http://arxiv.org/abs/2302.06845v2
- Date: Wed, 23 Aug 2023 03:56:24 GMT
- Title: SEAM: Searching Transferable Mixed-Precision Quantization Policy through
Large Margin Regularization
- Authors: Chen Tang, Kai Ouyang, Zenghao Chai, Yunpeng Bai, Yuan Meng, Zhi Wang,
Wenwu Zhu
- Abstract summary: Mixed-precision quantization (MPQ) suffers from the time-consuming process of searching the optimal bit-width allocation for each layer.
This paper proposes a novel method for efficiently searching for effective MPQ policies using a small proxy dataset.
- Score: 50.04951511146338
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Mixed-precision quantization (MPQ) suffers from the time-consuming process of
searching the optimal bit-width allocation (i.e., the policy) for each layer,
especially when using large-scale datasets such as ILSVRC-2012. This limits the
practicality of MPQ in real-world deployment scenarios. To address this issue,
this paper proposes a novel method for efficiently searching for effective MPQ
policies using a small proxy dataset instead of the large-scale dataset used
for training the model. Deviating from the established norm of employing the
same dataset for both the model training and MPQ policy search stages, our
approach substantially improves the efficiency of MPQ
exploration. Nonetheless, using discrepant datasets poses challenges in
searching for a transferable MPQ policy. Driven by the observation that
quantization noise of sub-optimal policy exerts a detrimental influence on the
discriminability of feature representations -- manifesting as diminished class
margins and ambiguous decision boundaries -- our method aims to identify
policies that uphold the discriminative nature of feature representations,
i.e., intra-class compactness and inter-class separation. This general and
dataset-independent property allows us to search for the MPQ policy on a rather
small-scale proxy dataset, after which the policy can be directly applied to
quantize the model trained on a large-scale dataset. Our method offers several
advantages, including high proxy data utilization, no excessive hyper-parameter
tuning, and high search efficiency. We search for high-quality MPQ policies on a
proxy dataset with only 4% of the data scale of the
large-scale target dataset, achieving the same accuracy as searching directly
on the latter while improving MPQ search efficiency by up to 300 times.
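The abstract's core criterion, that a good policy keeps features compact within each class and well separated across classes, can be sketched as a dataset-agnostic scoring function. The snippet below is a minimal illustration under assumed definitions, not the paper's exact objective; `fake_quantize` and `discriminability_score` are hypothetical helper names.

```python
import numpy as np

def fake_quantize(x, bits):
    """Uniform symmetric fake-quantization of a tensor to the given bit-width."""
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

def discriminability_score(features, labels):
    """Ratio of inter-class separation to intra-class compactness.

    Higher is better: class centroids lie far apart while samples stay
    tight around their own centroid. A candidate MPQ policy whose
    quantization noise preserves this score on a small proxy dataset
    would, by the paper's argument, also transfer to the full dataset.
    """
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    global_centroid = features.mean(axis=0)
    # intra-class compactness: mean distance of samples to their class centroid
    intra = np.mean([
        np.linalg.norm(features[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(classes)
    ])
    # inter-class separation: mean distance of centroids to the global centroid
    inter = np.mean(np.linalg.norm(centroids - global_centroid, axis=1))
    return inter / (intra + 1e-12)
```

In a search loop, one would extract features from the quantized network under each candidate bit-width policy and rank policies by this score, keeping those that least degrade the class margins.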
Related papers
- IMLE Policy: Fast and Sample Efficient Visuomotor Policy Learning via Implicit Maximum Likelihood Estimation [3.7584322469996896]
IMLE Policy is a novel behaviour cloning approach based on Implicit Maximum Likelihood Estimation (IMLE)
It excels in low-data regimes, effectively learning from minimal demonstrations and requiring 38% less data on average to match the performance of baseline methods in learning complex multi-modal behaviours.
We validate our approach across diverse manipulation tasks in simulated and real-world environments, showcasing its ability to capture complex behaviours under data constraints.
arXiv Detail & Related papers (2025-02-17T23:22:49Z) - Data Selection via Optimal Control for Language Models [134.67665351539725]
This work investigates the selection of high-quality pre-training data from massive corpora to enhance LMs' capabilities for downstream usage.
We introduce PMP-based Data Selection (PDS), a framework that approximates optimal data selection by solving the PMP conditions.
The benefits of PDS extend to 400B models trained on 10T tokens, as evidenced by the extrapolation of the test loss curves according to the Scaling Laws.
arXiv Detail & Related papers (2024-10-09T17:06:57Z) - Minimally Supervised Learning using Topological Projections in
Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs)
Our proposed method first trains SOMs on unlabeled data and then a minimal number of available labeled data points are assigned to key best matching units (BMU)
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z) - MeaeQ: Mount Model Extraction Attacks with Efficient Queries [6.1106195466129485]
We study model extraction attacks in natural language processing (NLP)
We propose MeaeQ, a straightforward yet effective method to address these issues.
MeaeQ achieves higher functional similarity to the victim model than baselines while requiring fewer queries.
arXiv Detail & Related papers (2023-10-21T16:07:16Z) - SDQ: Stochastic Differentiable Quantization with Mixed Precision [46.232003346732064]
We present a novel Stochastic Differentiable Quantization (SDQ) method that can automatically learn the MPQ strategy.
After the optimal MPQ strategy is acquired, we train our network with entropy-aware bin regularization and knowledge distillation.
SDQ outperforms state-of-the-art mixed- and single-precision quantization methods with a lower bitwidth.
arXiv Detail & Related papers (2022-06-09T12:38:18Z) - Generalizable Mixed-Precision Quantization via Attribution Rank
Preservation [90.26603048354575]
We propose a generalizable mixed-precision quantization (GMPQ) method for efficient inference.
Our method obtains competitive accuracy-complexity trade-off compared with the state-of-the-art mixed-precision networks.
arXiv Detail & Related papers (2021-08-05T16:41:57Z) - Noise-Resistant Deep Metric Learning with Probabilistic Instance
Filtering [59.286567680389766]
Noisy labels are commonly found in real-world data, which cause performance degradation of deep neural networks.
We propose Probabilistic Ranking-based Instance Selection with Memory (PRISM) approach for DML.
PRISM calculates the probability of a label being clean, and filters out potentially noisy samples.
arXiv Detail & Related papers (2021-08-03T12:15:25Z) - Sparse Feature Selection Makes Batch Reinforcement Learning More Sample
Efficient [62.24615324523435]
This paper provides a statistical analysis of high-dimensional batch Reinforcement Learning (RL) using sparse linear function approximation.
When there is a large number of candidate features, our result sheds light on the fact that sparsity-aware methods can make batch RL more sample efficient.
arXiv Detail & Related papers (2020-11-08T16:48:02Z) - DeepSampling: Selectivity Estimation with Predicted Error and Response
Time [7.23389716633927]
This paper proposes DeepSampling, a deep-learning-based model that predicts the accuracy of a sample-based AQP algorithm.
DeepSampling is the first system that provides a reliable tool for existing spatial databases to control the accuracy of AQP.
arXiv Detail & Related papers (2020-08-16T03:23:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.