SPIQ: Data-Free Per-Channel Static Input Quantization
- URL: http://arxiv.org/abs/2203.14642v1
- Date: Mon, 28 Mar 2022 10:59:18 GMT
- Title: SPIQ: Data-Free Per-Channel Static Input Quantization
- Authors: Edouard Yvinec and Arnaud Dapogny and Matthieu Cord and Kevin Bailly
- Abstract summary: Methods for efficient inference have drawn growing attention in the machine learning community.
In this work, we argue that static input quantization can reach the accuracy levels of dynamic methods by means of a per-channel input quantization scheme.
We show through a thorough empirical evaluation on multiple computer vision problems that the proposed method, dubbed SPIQ, achieves accuracies rivalling dynamic approaches with static-level inference speed.
- Score: 37.82255888371488
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Computationally expensive neural networks are ubiquitous in computer vision
and solutions for efficient inference have drawn growing attention in the
machine learning community. Examples of such solutions include quantization,
i.e. converting the processed values (weights and inputs) from floating point
into integers, e.g. int8 or int4. Concurrently, the rise of privacy concerns
motivated the study of less invasive acceleration methods, such as data-free
quantization of pre-trained model weights and activations. Previous approaches
either exploit statistical information to deduce scalar ranges and scaling
factors for the activations in a static manner, or dynamically adapt this range
on-the-fly for each input of each layer (also referred to as activations): the
latter is generally more accurate, at the expense of significantly slower
inference. In this work, we argue that static input quantization can reach the
accuracy levels of dynamic methods by means of a per-channel input quantization
scheme that allows one to more finely preserve cross-channel dynamics. We show
through a thorough empirical evaluation on multiple computer vision problems
(e.g. ImageNet classification, Pascal VOC object detection as well as
CityScapes semantic segmentation) that the proposed method, dubbed SPIQ,
achieves accuracies rivalling dynamic approaches with static-level inference
speed, significantly outperforming state-of-the-art quantization methods on
every benchmark.
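To make the distinction between per-tensor static, dynamic, and per-channel static input quantization concrete, below is a minimal NumPy sketch. It is not the authors' implementation: the function names, the symmetric int8 scheme, and the use of calibration batches are assumptions made only to keep the example self-contained (SPIQ itself is data-free and derives its per-channel ranges from the pre-trained model's statistics rather than from data).

```python
import numpy as np

def quantize(x, scale):
    # Symmetric int8 quantization: scale, round, and clip to the int8 range.
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def per_tensor_static_scale(batches):
    # Static per-tensor input quantization: a single scale for the whole
    # activation tensor, fixed offline.
    return max(float(np.abs(b).max()) for b in batches) / 127.0

def per_channel_static_scales(batches, channel_axis=1):
    # Static per-channel input quantization (the idea behind SPIQ): one scale
    # per input channel, fixed offline, so inference needs no range estimation.
    per_batch_max = [
        np.abs(b).max(axis=tuple(i for i in range(b.ndim) if i != channel_axis))
        for b in batches
    ]
    return np.stack(per_batch_max).max(axis=0) / 127.0

def dynamic_scale(x):
    # Dynamic input quantization: the scale is recomputed from every input at
    # inference time, which is more accurate but slows inference down.
    return float(np.abs(x).max()) / 127.0

# Illustrative usage on random tensors standing in for layer inputs.
batches = [np.random.randn(8, 64, 14, 14) for _ in range(4)]
x = np.random.randn(8, 64, 14, 14)

x_q_per_tensor = quantize(x, per_tensor_static_scale(batches))
x_q_per_channel = quantize(x, per_channel_static_scales(batches).reshape(1, -1, 1, 1))
x_q_dynamic = quantize(x, dynamic_scale(x))
```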
Related papers
- Heterogeneous quantization regularizes spiking neural network activity [0.0]
We present a data-blind neuromorphic signal conditioning strategy whereby analog data are normalized and quantized into spike phase representations.
We extend this mechanism by adding a data-aware calibration step whereby the range and density of the quantization weights adapt to accumulated input statistics.
arXiv Detail & Related papers (2024-09-27T02:25:44Z)
- Memory-Augmented Quantum Reservoir Computing [0.0]
We present a hybrid quantum-classical approach that implements memory through classical post-processing of quantum measurements.
We tested our model on two physical platforms: a fully connected Ising model and a Rydberg atom array.
arXiv Detail & Related papers (2024-09-15T22:44:09Z)
- Compact Multi-Threshold Quantum Information Driven Ansatz For Strongly Interactive Lattice Spin Models [0.0]
We introduce a systematic procedure for ansatz building based on approximate Quantum Mutual Information (QMI).
Our approach generates a layered-structured ansatz, where each layer's qubit pairs are selected based on their QMI values, resulting in more efficient state preparation and optimization routines.
Our results show that the Multi-QIDA method reduces the computational complexity while maintaining high precision, making it a promising tool for quantum simulations in lattice spin models.
arXiv Detail & Related papers (2024-08-05T17:07:08Z)
- Weight Re-Mapping for Variational Quantum Algorithms [54.854986762287126]
We introduce the concept of weight re-mapping for variational quantum circuits (VQCs).
We employ seven distinct weight re-mapping functions to assess their impact on eight classification datasets.
Our results indicate that weight re-mapping can enhance the convergence speed of the VQC.
arXiv Detail & Related papers (2023-06-09T09:42:21Z)
- On Robust Numerical Solver for ODE via Self-Attention Mechanism [82.95493796476767]
We explore training efficient and robust AI-enhanced numerical solvers with a small data size by mitigating intrinsic noise disturbances.
We first analyze the ability of the self-attention mechanism to regulate noise in supervised learning and then propose a simple-yet-effective numerical solver, Attr, which introduces an additive self-attention mechanism to the numerical solution of differential equations.
arXiv Detail & Related papers (2023-02-05T01:39:21Z)
- BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z)
- A Comprehensive Survey on Model Quantization for Deep Neural Networks in Image Classification [0.0]
A promising approach is quantization, in which the full-precision values are stored in low bit-width precision.
We present a comprehensive survey of quantization concepts and methods, with a focus on image classification.
We explain the replacement of floating-point operations with low-cost bitwise operations in a quantized DNN and the sensitivity of different layers in quantization.
arXiv Detail & Related papers (2022-05-14T15:08:32Z)
- Adaptive Discrete Communication Bottlenecks with Dynamic Vector Quantization [76.68866368409216]
We propose learning to dynamically select discretization tightness conditioned on inputs.
We show that dynamically varying tightness in communication bottlenecks can improve model performance on visual reasoning and reinforcement learning tasks.
arXiv Detail & Related papers (2022-02-02T23:54:26Z)
- In-Hindsight Quantization Range Estimation for Quantized Training [5.65658124285176]
We propose a simple alternative to dynamic quantization, in-hindsight range estimation, that uses the quantization ranges estimated on previous iterations to quantize the present.
Our approach enables fast static quantization of gradients and activations while requiring only minimal hardware support from the neural network accelerator.
It is intended as a drop-in replacement for estimating quantization ranges and can be used in conjunction with other advances in quantized training (a minimal sketch of the mechanism is given at the end of this page).
arXiv Detail & Related papers (2021-05-10T10:25:28Z)
- Where Should We Begin? A Low-Level Exploration of Weight Initialization Impact on Quantized Behaviour of Deep Neural Networks [93.4221402881609]
We present an in-depth, fine-grained ablation study of the effect of different weights initialization on the final distributions of weights and activations of different CNN architectures.
To our best knowledge, we are the first to perform such a low-level, in-depth quantitative analysis of weights initialization and its effect on quantized behaviour.
arXiv Detail & Related papers (2020-11-30T06:54:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.
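To illustrate the in-hindsight range estimation entry above, here is a minimal sketch under assumed details (the class name, symmetric fake-quantization, and the running-range update are illustrative, not the paper's code): the current tensor is quantized with the range recorded on the previous iteration, so no data-dependent pass is needed before quantizing.

```python
import numpy as np

class InHindsightQuantizer:
    """Illustrative sketch: quantize with the range observed on the previous
    iteration, then record the current range for the next one."""

    def __init__(self, init_range=1.0, num_bits=8):
        self.range = init_range
        self.qmax = 2 ** (num_bits - 1) - 1  # 127 for int8

    def __call__(self, x):
        scale = self.range / self.qmax                 # stale, hence static, scale
        x_q = np.clip(np.round(x / scale), -self.qmax - 1, self.qmax)
        self.range = float(np.abs(x).max())            # range reused on the next iteration
        return x_q * scale                             # dequantized (fake-quant) output

# Usage: quantize a stream of activation tensors during training.
quantizer = InHindsightQuantizer()
for _ in range(3):
    activations = np.random.randn(32, 128)
    y = quantizer(activations)
```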