Bit-Mixer: Mixed-precision networks with runtime bit-width selection
- URL: http://arxiv.org/abs/2103.17267v1
- Date: Wed, 31 Mar 2021 17:58:47 GMT
- Title: Bit-Mixer: Mixed-precision networks with runtime bit-width selection
- Authors: Adrian Bulat and Georgios Tzimiropoulos
- Abstract summary: Bit-Mixer is the first method to train a meta-quantized network where, at test time, any layer can change its bit-width without affecting the overall network's ability to perform highly accurate inference.
We show that our method can result in mixed precision networks that exhibit the desirable flexibility properties for on-device deployment without compromising accuracy.
- Score: 72.32693989093558
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mixed-precision networks allow for a variable bit-width quantization for
every layer in the network. A major limitation of existing work is that the
bit-width for each layer must be predefined during training time. This allows
little flexibility if the characteristics of the device on which the network is
deployed change during runtime. In this work, we propose Bit-Mixer, the very
first method to train a meta-quantized network where, during test time, any layer
can change its bit-width without affecting the overall network's ability to
perform highly accurate inference. To this end, we make two key contributions: (a)
Transitional Batch-Norms, and (b) a 3-stage optimization process which is shown
capable of training such a network. We show that our method can result in mixed
precision networks that exhibit the desirable flexibility properties for
on-device deployment without compromising accuracy. Code will be made
available.
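As a rough illustration of the idea (not the authors' implementation), the sketch below shows a convolutional layer whose weights and activations are fake-quantized to a bit-width chosen at call time, with one BatchNorm kept per bit-width as a simplified stand-in for Transitional Batch-Norms. The class names, the candidate bit-widths, and the quantizer are assumptions for the example.

```python
# Minimal sketch (not the paper's code): a conv layer whose precision is picked
# at call time, with a separate BatchNorm per candidate bit-width.
import torch
import torch.nn as nn
import torch.nn.functional as F


def fake_quantize(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric uniform fake-quantization with a straight-through estimator."""
    if bits >= 32:
        return x
    scale = x.detach().abs().max().clamp(min=1e-8) / (2 ** (bits - 1) - 1)
    q = torch.round(x / scale).clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * scale
    return x + (q - x).detach()  # straight-through gradient


class SwitchableQuantConv(nn.Module):
    def __init__(self, cin, cout, bit_widths=(2, 3, 4)):
        super().__init__()
        self.conv = nn.Conv2d(cin, cout, 3, padding=1, bias=False)
        # One BatchNorm per bit-width (simplified stand-in for Transitional Batch-Norms).
        self.bns = nn.ModuleDict({str(b): nn.BatchNorm2d(cout) for b in bit_widths})

    def forward(self, x, bits: int):
        w = fake_quantize(self.conv.weight, bits)
        x = fake_quantize(x, bits)
        return F.relu(self.bns[str(bits)](F.conv2d(x, w, padding=1)))


layer = SwitchableQuantConv(16, 32)
x = torch.randn(1, 16, 8, 8)
y2, y4 = layer(x, bits=2), layer(x, bits=4)  # bit-width chosen at runtime
```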
Related papers
- Dual-Path Adversarial Lifting for Domain Shift Correction in Online Test-time Adaptation [59.18151483767509]
We introduce a dual-path token lifting for domain shift correction in test time adaptation.
We then perform dual-path lifting with interleaved token prediction and update between the path of domain shift tokens and the path of class tokens.
Experimental results on the benchmark datasets demonstrate that our proposed method significantly improves the online fully test-time domain adaptation performance.
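The summary above is brief, so the heavily simplified sketch below only illustrates the general idea of a learnable domain-shift token that is updated at test time while the class token is used for prediction. The entropy objective, the encoder, and all names are assumptions, not the paper's method.

```python
# Illustrative sketch only: a ViT-style encoder with an extra "domain-shift"
# token; at test time only that token is adapted with an assumed entropy loss.
import torch
import torch.nn as nn


class TokenLiftedEncoder(nn.Module):
    def __init__(self, dim=64, num_classes=10, depth=2):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.domain_token = nn.Parameter(torch.zeros(1, 1, dim))  # absorbs domain shift
        block = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, patch_tokens):  # patch_tokens: (B, N, dim)
        b = patch_tokens.size(0)
        tokens = torch.cat([self.cls_token.expand(b, -1, -1),
                            self.domain_token.expand(b, -1, -1),
                            patch_tokens], dim=1)
        return self.head(self.encoder(tokens)[:, 0])  # predict from the class token


model = TokenLiftedEncoder()
opt = torch.optim.SGD([model.domain_token], lr=1e-3)  # only the domain token adapts
x = torch.randn(4, 16, 64)
for _ in range(2):  # interleave: update the domain token, then predict
    probs = model(x).softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
    opt.zero_grad()
    entropy.backward()
    opt.step()
logits = model(x)
```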
arXiv Detail & Related papers (2024-08-26T02:33:47Z) - QBitOpt: Fast and Accurate Bitwidth Reallocation during Training [19.491778184055118]
Quantizing neural networks is one of the most effective methods for achieving efficient inference on mobile and embedded devices.
We propose QBitOpt, a novel algorithm for updating bitwidths during quantization-aware training.
We evaluate QBitOpt on ImageNet and confirm that we outperform existing fixed and mixed-precision methods under average bitwidth constraints.
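A toy illustration of the underlying allocation problem (not QBitOpt's actual optimizer): given per-layer sensitivity estimates gathered during training, hand out extra bits to the most sensitive layers until an average-bitwidth budget is exhausted. The sensitivity values and the greedy rule are assumptions.

```python
# Toy sketch: greedy bit allocation under an average-bitwidth constraint.
def allocate_bitwidths(sensitivity, avg_budget=4.0, b_min=2, b_max=8):
    n = len(sensitivity)
    bits = [b_min] * n
    spare = int(round(avg_budget * n)) - b_min * n  # extra bits we may distribute
    while spare > 0:
        # Assumed benefit model: quantization error shrinks ~4x per extra bit.
        gains = [sensitivity[i] * (4.0 ** -(bits[i] - b_min)) if bits[i] < b_max else 0.0
                 for i in range(n)]
        best = max(range(n), key=lambda i: gains[i])
        if gains[best] <= 0.0:
            break
        bits[best] += 1
        spare -= 1
    return bits


# Example: four layers, the first being the most quantization-sensitive.
print(allocate_bitwidths([5.0, 1.0, 0.5, 0.2], avg_budget=4.0))
```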
arXiv Detail & Related papers (2023-07-10T13:01:08Z) - A Practical Mixed Precision Algorithm for Post-Training Quantization [15.391257986051249]
Mixed-precision quantization is a promising solution to find a better performance-efficiency trade-off than homogeneous quantization.
We present a simple post-training mixed precision algorithm that only requires a small unlabeled calibration dataset.
We show that we can find mixed precision networks that provide a better trade-off between accuracy and efficiency than their homogeneous bit-width equivalents.
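The general recipe can be pictured as follows (a sketch, not the paper's exact algorithm): quantize one layer at a time and measure how much the network output on a small unlabeled calibration batch changes; layers with larger scores would then be kept at higher bit-widths. The quantizer and the example model are illustrative assumptions.

```python
# Sketch: per-layer sensitivity measured on a small calibration batch.
import torch


def _quant(w, bits):  # assumed symmetric uniform fake-quantizer
    s = w.abs().max().clamp(min=1e-8) / (2 ** (bits - 1) - 1)
    return (w / s).round().clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * s


@torch.no_grad()
def layer_sensitivities(model, calib_x, bits=4):
    ref = model(calib_x)
    scores = {}
    for name, module in model.named_modules():
        if isinstance(module, torch.nn.Conv2d):
            saved = module.weight.clone()
            module.weight.copy_(_quant(module.weight, bits))   # quantize this layer only
            scores[name] = (model(calib_x) - ref).pow(2).mean().item()
            module.weight.copy_(saved)                          # restore full precision
    return scores  # higher score = more sensitive = deserves more bits


model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU(),
                            torch.nn.Conv2d(8, 8, 3))
print(layer_sensitivities(model, torch.randn(2, 3, 16, 16)))
```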
arXiv Detail & Related papers (2023-02-10T17:47:54Z) - Channel-wise Mixed-precision Assignment for DNN Inference on Constrained
Edge Nodes [22.40937602825472]
State-of-the-art mixed-precision methods work layer-wise, i.e., they use different bit-widths for the weight and activation tensors of each network layer.
We propose a novel NAS that selects the bit-width of each weight tensor channel independently.
Our networks reduce the memory and energy for inference by up to 63% and 27% respectively.
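For illustration only, the sketch below quantizes each output channel of a convolutional weight with its own bit-width; the NAS that chooses those bit-widths is not shown, and the symmetric uniform quantizer is an assumption.

```python
# Sketch: per-output-channel mixed-precision quantization of a conv weight.
import torch


def quantize_per_channel(weight: torch.Tensor, bits_per_channel):
    """weight: (C_out, C_in, kH, kW); bits_per_channel: one int per output channel."""
    out = torch.empty_like(weight)
    for c, bits in enumerate(bits_per_channel):
        w = weight[c]
        scale = w.abs().max().clamp(min=1e-8) / (2 ** (bits - 1) - 1)
        out[c] = (w / scale).round().clamp(-(2 ** (bits - 1)),
                                           2 ** (bits - 1) - 1) * scale
    return out


w = torch.randn(4, 8, 3, 3)
wq = quantize_per_channel(w, bits_per_channel=[2, 4, 8, 4])  # mixed per-channel precision
```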
arXiv Detail & Related papers (2022-06-17T15:51:49Z) - Arbitrary Bit-width Network: A Joint Layer-Wise Quantization and
Adaptive Inference Approach [38.03309300383544]
We propose feeding different data samples through varying quantization schemes to achieve data-dependent dynamic inference at a fine-grained, layer-wise level.
We present the Arbitrary Bit-width Network (ABN), where the bit-widths of a single deep network can change at runtime for different data samples, with a layer-wise granularity.
On ImageNet classification, we achieve a 1.1% top-1 accuracy improvement while saving 36.2% of BitOps.
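The runtime selection component can be pictured as follows (a sketch under assumptions, not the ABN implementation): a lightweight gate looks at each sample and picks one bit-width per layer from a candidate set; a switchable quantized backbone, elided here, would then run at that per-sample configuration. The gate architecture and candidate set are made up for the example.

```python
# Sketch of per-sample, per-layer bit-width selection (the backbone is elided).
import torch
import torch.nn as nn

CANDIDATE_BITS = (2, 4, 8)
NUM_LAYERS = 6


class BitWidthGate(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(3, NUM_LAYERS * len(CANDIDATE_BITS)),
        )

    def forward(self, x):  # x: (B, 3, H, W) -> per-sample, per-layer bit choices
        logits = self.net(x).view(-1, NUM_LAYERS, len(CANDIDATE_BITS))
        choice = logits.argmax(dim=-1)  # (B, NUM_LAYERS)
        return [[CANDIDATE_BITS[i] for i in row] for row in choice.tolist()]


gate = BitWidthGate()
bits = gate(torch.randn(2, 3, 32, 32))  # e.g. [[4, 2, 8, ...], [2, 2, 4, ...]]
```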
arXiv Detail & Related papers (2022-04-21T09:36:43Z) - All at Once Network Quantization via Collaborative Knowledge Transfer [56.95849086170461]
We develop a novel collaborative knowledge transfer approach for efficiently training the all-at-once quantization network.
Specifically, we propose an adaptive selection strategy to choose a high-precision "teacher" for transferring knowledge to the low-precision student.
To effectively transfer knowledge, we develop a dynamic block swapping method by randomly replacing the blocks in the lower-precision student network with the corresponding blocks in the higher-precision teacher network.
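The block-swapping idea can be sketched as below (an illustration, not the paper's code): during a training step, each block of the low-precision student is replaced, with some probability, by the matching block of a higher-precision teacher. The block structure and swap probability are assumptions.

```python
# Sketch: random block swapping between a student and a higher-precision teacher.
import random
import torch
import torch.nn as nn


def mixed_forward(student_blocks, teacher_blocks, x, swap_prob=0.5):
    """student_blocks / teacher_blocks: lists of modules with matching I/O shapes."""
    for s_blk, t_blk in zip(student_blocks, teacher_blocks):
        blk = t_blk if random.random() < swap_prob else s_blk
        x = blk(x)
    return x


def make_blocks():  # stand-ins for the low/high-precision networks
    return [nn.Sequential(nn.Linear(16, 16), nn.ReLU()) for _ in range(3)]


student, teacher = make_blocks(), make_blocks()
out = mixed_forward(student, teacher, torch.randn(4, 16))
```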
arXiv Detail & Related papers (2021-03-02T03:09:03Z) - Enabling certification of verification-agnostic networks via
memory-efficient semidefinite programming [97.40955121478716]
We propose a first-order dual SDP algorithm that requires memory only linear in the total number of network activations.
We significantly improve L-infinity verified robust accuracy from 1% to 88% and from 6% to 40%, respectively.
We also demonstrate tight verification of a quadratic stability specification for the decoder of a variational autoencoder.
arXiv Detail & Related papers (2020-10-22T12:32:29Z) - Rethinking Differentiable Search for Mixed-Precision Neural Networks [83.55785779504868]
Low-precision networks with weights and activations quantized to low bit-width are widely used to accelerate inference on edge devices.
Current solutions are uniform, using an identical bit-width for all filters.
This fails to account for the different sensitivities of different filters and is suboptimal.
Mixed-precision networks address this problem, by tuning the bit-width to individual filter requirements.
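One common way to make such a search differentiable, shown below as an assumed stand-in rather than the paper's exact formulation, is to keep learnable logits over candidate bit-widths and use the softmax-weighted mix of the quantized weights, so the bit-width choice receives gradients. Note the paper tunes bit-widths per filter, while this sketch does so per tensor for brevity.

```python
# Sketch: DARTS-style differentiable selection over candidate bit-widths.
import torch
import torch.nn as nn


def _quant(w, bits):  # assumed symmetric uniform fake-quantizer, straight-through
    s = w.abs().max().clamp(min=1e-8) / (2 ** (bits - 1) - 1)
    q = (w / s).round().clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * s
    return w + (q - w).detach()


class SearchableQuantLinear(nn.Module):
    def __init__(self, cin, cout, candidates=(2, 4, 8)):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(cout, cin) * 0.05)
        self.candidates = candidates
        self.alpha = nn.Parameter(torch.zeros(len(candidates)))  # search logits

    def forward(self, x):
        probs = self.alpha.softmax(dim=0)
        w = sum(p * _quant(self.weight, b) for p, b in zip(probs, self.candidates))
        return x @ w.t()


layer = SearchableQuantLinear(8, 4)
y = layer(torch.randn(2, 8))  # after search, keep argmax(alpha) as the final bit-width
```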
arXiv Detail & Related papers (2020-04-13T07:02:23Z) - Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z) - Switchable Precision Neural Networks [35.2752928147013]
Switchable Precision Neural Networks (SP-Nets) are proposed to train a shared network capable of operating at multiple quantization levels.
At runtime, the network can adjust its precision on the fly according to instant memory, latency, power consumption and accuracy demands.
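The runtime policy can be as simple as the following sketch; the shared multi-precision network itself is assumed to exist (e.g. along the lines of the Bit-Mixer sketch above), and the per-precision memory costs are made-up numbers.

```python
# Toy sketch: pick the highest precision whose estimated footprint fits the
# currently available device memory.
def pick_precision(available_mem_mb, cost_mb=None):
    cost_mb = cost_mb or {8: 24.0, 4: 12.0, 2: 6.0}  # assumed per-precision footprints
    for bits in sorted(cost_mb, reverse=True):       # prefer the highest precision
        if cost_mb[bits] <= available_mem_mb:
            return bits
    return min(cost_mb)                              # fall back to the lowest


print(pick_precision(available_mem_mb=15.0))  # -> 4 under these assumed costs
```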
arXiv Detail & Related papers (2020-02-07T14:43:44Z)