Search What You Want: Barrier Panelty NAS for Mixed Precision
Quantization
- URL: http://arxiv.org/abs/2007.10026v1
- Date: Mon, 20 Jul 2020 12:00:48 GMT
- Title: Search What You Want: Barrier Panelty NAS for Mixed Precision
Quantization
- Authors: Haibao Yu, Qi Han, Jianbo Li, Jianping Shi, Guangliang Cheng, Bin Fan
- Abstract summary: We propose a novel Barrier Penalty based NAS (BP-NAS) for mixed precision quantization.
BP-NAS sets new state of the arts on both classification (Cifar-10, ImageNet) and detection (COCO)
- Score: 51.26579110596767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Emergent hardwares can support mixed precision CNN models inference that
assign different bitwidths for different layers. Learning to find an optimal
mixed precision model that can preserve accuracy and satisfy the specific
constraints on model size and computation is extremely challenge due to the
difficult in training a mixed precision model and the huge space of all
possible bit quantizations. In this paper, we propose a novel soft Barrier
Penalty based NAS (BP-NAS) for mixed precision quantization, which ensures all
the searched models are inside the valid domain defined by the complexity
constraint, thus could return an optimal model under the given constraint by
conducting search only one time. The proposed soft Barrier Penalty is
differentiable and can impose very large losses to those models outside the
valid domain while almost no punishment for models inside the valid domain,
thus constraining the search only in the feasible domain. In addition, a
differentiable Prob-1 regularizer is proposed to ensure learning with NAS is
reasonable. A distribution reshaping training strategy is also used to make
training more stable. BP-NAS sets new state of the arts on both classification
(Cifar-10, ImageNet) and detection (COCO), surpassing all the efficient mixed
precision methods designed manually and automatically. Particularly, BP-NAS
achieves higher mAP (up to 2.7\% mAP improvement) together with lower bit
computation cost compared with the existing best mixed precision model on COCO
detection.
Related papers
- Gradient-free variational learning with conditional mixture networks [39.827869318925494]
Conditional mixture networks (CMNs) are suitable for fast, gradient-free inference and can solve complex classification tasks.
We validate this approach by training two-layer CMNs on standard benchmarks from the UCI repository.
Our method, CAVI-CMN, achieves competitive and often superior predictive accuracy compared to maximum likelihood estimation (MLE) with backpropagation.
arXiv Detail & Related papers (2024-08-29T10:43:55Z) - Towards Continual Learning Desiderata via HSIC-Bottleneck
Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy performance, even with absolute superiority of zero exemplar buffer and 1.02x the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z) - Augmenting Hessians with Inter-Layer Dependencies for Mixed-Precision
Post-Training Quantization [7.392278887917975]
We propose a mixed-precision post training quantization approach that assigns different numerical precisions to tensors in a network based on their specific needs.
Our experiments demonstrate latency reductions compared to a 16-bit baseline of $25.48%$, $21.69%$, and $33.28%$ respectively.
arXiv Detail & Related papers (2023-06-08T02:18:58Z) - Mixed Precision Post Training Quantization of Neural Networks with
Sensitivity Guided Search [7.392278887917975]
Mixed-precision quantization allows different tensors to be quantized to varying levels of numerical precision.
We evaluate our method for computer vision and natural language processing and demonstrate latency reductions of up to 27.59% and 34.31%.
arXiv Detail & Related papers (2023-02-02T19:30:00Z) - Model Architecture Adaption for Bayesian Neural Networks [9.978961706999833]
We show a novel network architecture search (NAS) that optimize BNNs for both accuracy and uncertainty.
In our experiments, the searched models show comparable uncertainty ability and accuracy compared to the state-of-the-art (deep ensemble)
arXiv Detail & Related papers (2022-02-09T10:58:50Z) - Automatic Mixed-Precision Quantization Search of BERT [62.65905462141319]
Pre-trained language models such as BERT have shown remarkable effectiveness in various natural language processing tasks.
These models usually contain millions of parameters, which prevents them from practical deployment on resource-constrained devices.
We propose an automatic mixed-precision quantization framework designed for BERT that can simultaneously conduct quantization and pruning in a subgroup-wise level.
arXiv Detail & Related papers (2021-12-30T06:32:47Z) - NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural
Architecture Search [100.71365025972258]
We propose NAS-BERT, an efficient method for BERT compression.
NAS-BERT trains a big supernet on a search space and outputs multiple compressed models with adaptive sizes and latency.
Experiments on GLUE and SQuAD benchmark datasets demonstrate that NAS-BERT can find lightweight models with better accuracy than previous approaches.
arXiv Detail & Related papers (2021-05-30T07:20:27Z) - Effective and Fast: A Novel Sequential Single Path Search for
Mixed-Precision Quantization [45.22093693422085]
Mixed-precision quantization model can match different quantization bit-precisions according to the sensitivity of different layers to achieve great performance.
It is a difficult problem to quickly determine the quantization bit-precision of each layer in deep neural networks according to some constraints.
We propose a novel sequential single path search (SSPS) method for mixed-precision quantization.
arXiv Detail & Related papers (2021-03-04T09:15:08Z) - Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out of distribution data points at test time with a single forward pass.
We scale training in these with a novel loss function and centroid updating scheme and match the accuracy of softmax models.
arXiv Detail & Related papers (2020-03-04T12:27:36Z) - Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by
Enabling Input-Adaptive Inference [119.19779637025444]
Deep networks were recently suggested to face the odds between accuracy (on clean natural images) and robustness (on adversarially perturbed images)
This paper studies multi-exit networks associated with input-adaptive inference, showing their strong promise in achieving a "sweet point" in cooptimizing model accuracy, robustness and efficiency.
arXiv Detail & Related papers (2020-02-24T00:40:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.