Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural
Networks
- URL: http://arxiv.org/abs/2112.15139v2
- Date: Mon, 3 Jan 2022 04:40:10 GMT
- Title: Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural
Networks
- Authors: Runpei Dong, Zhanhong Tan, Mengdi Wu, Linfeng Zhang, Kaisheng Ma
- Abstract summary: Quantized neural networks typically require smaller memory footprints and lower computation complexity, which is crucial for efficient deployment.
We present an adaptive-mapping quantization method to learn an optimal latent sub-distribution that is inherent within models.
Experiments on image classification and object detection over various modern architectures demonstrate the effectiveness, generalization property, and transferability of the proposed method.
- Score: 10.278350434623107
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quantized neural networks typically require smaller memory footprints and
lower computation complexity, which is crucial for efficient deployment.
However, quantization inevitably leads to a distribution divergence from the
original network, which generally degrades the performance. To tackle this
issue, massive efforts have been made, but most existing approaches lack
statistical considerations and depend on several manual configurations. In this
paper, we present an adaptive-mapping quantization method to learn an optimal
latent sub-distribution that is inherent within models and smoothly
approximated with a concrete Gaussian Mixture (GM). In particular, the network
weights are projected in compliance with the GM-approximated sub-distribution.
This sub-distribution evolves along with the weight update in a co-tuning
schema guided by the direct task-objective optimization. Sufficient experiments
on image classification and object detection over various modern architectures
demonstrate the effectiveness, generalization property, and transferability of
the proposed method. Besides, an efficient deployment flow for the mobile CPU
is developed, achieving up to 7.46$\times$ inference acceleration on an
octa-core ARM CPU. Codes are publicly released at
https://github.com/RunpeiDong/DGMS.
Related papers
- Post-Training Quantization for Re-parameterization via Coarse & Fine
Weight Splitting [13.270381125055275]
We propose a coarse & fine weight splitting (CFWS) method to reduce quantization error of weight.
We develop an improved KL metric to determine optimal quantization scales for activation.
For example, the quantized RepVGG-A1 model exhibits a mere 0.3% accuracy loss.
arXiv Detail & Related papers (2023-12-17T02:31:20Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution iteration to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly network at each and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z) - AMED: Automatic Mixed-Precision Quantization for Edge Devices [3.5223695602582614]
Quantized neural networks are well known for reducing the latency, power consumption, and model size without significant harm to the performance.
Mixed-precision quantization offers better utilization of customized hardware that supports arithmetic operations at different bitwidths.
arXiv Detail & Related papers (2022-05-30T21:23:22Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the emphexit point, emphexit point, and emphcompressing bits by soft policy iterations.
Based on the latency and accuracy aware reward design, such an computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - ECQ$^{\text{x}}$: Explainability-Driven Quantization for Low-Bit and
Sparse DNNs [13.446502051609036]
We develop and describe a novel quantization paradigm for deep neural networks (DNNs)
Our method leverages concepts of explainable AI (XAI) and concepts of information theory.
The ultimate goal is to preserve the most relevant weights in quantization clusters of highest information content.
arXiv Detail & Related papers (2021-09-09T12:57:06Z) - Multi-objective Evolutionary Approach for Efficient Kernel Size and
Shape for CNN [12.697368516837718]
State-of-the-art development in CNN topology, such as VGGNet and ResNet, have become increasingly accurate.
These networks are computationally expensive involving billions of arithmetic operations and parameters.
This paper considers optimising the computational resource consumption by reducing the size and number of kernels in convolutional layers.
arXiv Detail & Related papers (2021-06-28T14:47:29Z) - Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, which shows better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z) - Attentive Gaussian processes for probabilistic time-series generation [4.94950858749529]
We propose a computationally efficient attention-based network combined with the Gaussian process regression to generate real-valued sequence.
We develop a block-wise training algorithm to allow mini-batch training of the network while the GP is trained using full-batch.
The algorithm has been proved to converge and shows comparable, if not better, quality of the found solution.
arXiv Detail & Related papers (2021-02-10T01:19:15Z) - AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation.
Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z) - Evolving Normalization-Activation Layers [100.82879448303805]
We develop efficient rejection protocols to quickly filter out candidate layers that do not work well.
Our method leads to the discovery of EvoNorms, a set of new normalization-activation layers with novel, and sometimes surprising structures.
Our experiments show that EvoNorms work well on image classification models including ResNets, MobileNets and EfficientNets.
arXiv Detail & Related papers (2020-04-06T19:52:48Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.