Do All MobileNets Quantize Poorly? Gaining Insights into the Effect of
Quantization on Depthwise Separable Convolutional Networks Through the Eyes
of Multi-scale Distributional Dynamics
- URL: http://arxiv.org/abs/2104.11849v1
- Date: Sat, 24 Apr 2021 01:28:29 GMT
- Title: Do All MobileNets Quantize Poorly? Gaining Insights into the Effect of
Quantization on Depthwise Separable Convolutional Networks Through the Eyes
of Multi-scale Distributional Dynamics
- Authors: Stone Yun and Alexander Wong
- Abstract summary: MobileNets are the go-to family of deep convolutional neural networks (CNN) for mobile.
They often have significant accuracy degradation under post-training quantization.
We study the multi-scale distributional dynamics of MobileNet-V1, a set of smaller DWSCNNs, and regular CNNs.
- Score: 93.4221402881609
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As the "Mobile AI" revolution continues to grow, so does the need to
understand the behaviour of edge-deployed deep neural networks. In particular,
MobileNets are the go-to family of deep convolutional neural networks (CNN) for
mobile. However, they often have significant accuracy degradation under
post-training quantization. While studies have introduced quantization-aware
training and other methods to tackle this challenge, there is limited
understanding of why MobileNets (and potentially depthwise-separable CNNs
(DWSCNN) in general) quantize so poorly compared to other CNN architectures.
Motivated to gain deeper insights into this phenomenon, we take a different
strategy and study the multi-scale distributional dynamics of MobileNet-V1, a
set of smaller DWSCNNs, and regular CNNs. Specifically, we investigate the
impact of quantization on the weight and activation distributional dynamics as
information propagates from layer to layer, as well as overall changes in
distributional dynamics at the network level. This fine-grained analysis
revealed significant dynamic range fluctuations and a "distributional mismatch"
between channelwise and layerwise distributions in DWSCNNs that lead to
increasing quantization degradation and distributional shift during information
propagation. Furthermore, analysis of the activation quantization errors shows
that quantization error accumulates more in DWSCNNs than in regular CNNs. The
hope is that such insights can lead to innovative strategies for reducing these
distributional dynamics changes and improving post-training quantization for
mobile.
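To make the channelwise vs. layerwise "distributional mismatch" concrete, here is a minimal, hypothetical NumPy sketch (not code from the paper): it quantizes a depthwise-conv-like weight tensor once with a single per-tensor (layerwise) scale and once with per-channel scales, then compares the round-off error. When a few wide-range channels dominate the layerwise scale, narrow-range channels are quantized coarsely, which is one plausible reading of the dynamic range fluctuations described above.

```python
# Hypothetical illustration (not the paper's code): per-tensor vs. per-channel
# symmetric 8-bit quantization error on a depthwise-conv-like weight tensor.
import numpy as np

rng = np.random.default_rng(0)

# Simulate depthwise weights: 32 channels of shape 3x3x1 whose standard
# deviations span two orders of magnitude, mimicking large channel-to-channel
# dynamic range fluctuations.
channel_scales = 10.0 ** rng.uniform(-2, 0, size=32)
weights = rng.normal(0.0, 1.0, size=(32, 3, 3, 1)) * channel_scales[:, None, None, None]

def quantize_symmetric(x, max_abs, num_bits=8):
    """Uniform symmetric quantization of x over the clipping range [-max_abs, max_abs]."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = max_abs / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale  # dequantized values

# Layerwise: one scale shared by the whole weight tensor.
per_tensor = quantize_symmetric(weights, np.abs(weights).max())

# Channelwise: one scale per channel.
per_channel = np.stack([quantize_symmetric(w, np.abs(w).max()) for w in weights])

def channel_mse(original, quantized):
    return ((original - quantized) ** 2).reshape(original.shape[0], -1).mean(axis=1)

mse_layerwise = channel_mse(weights, per_tensor)
mse_channelwise = channel_mse(weights, per_channel)

print("mean per-channel MSE, layerwise scale  :", mse_layerwise.mean())
print("mean per-channel MSE, channelwise scale:", mse_channelwise.mean())
print("worst-channel error ratio (layerwise / channelwise):",
      (mse_layerwise / np.maximum(mse_channelwise, 1e-20)).max())
```

In this toy setting the layerwise scheme's error is dominated by the widest-range channels; the paper's actual analysis tracks such effects layer by layer on trained MobileNet-V1, smaller DWSCNNs, and regular CNNs.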
Related papers
- RMP-Loss: Regularizing Membrane Potential Distribution for Spiking
Neural Networks [26.003193122060697]
Spiking Neural Networks (SNNs) as one of the biology-inspired models have received much attention recently.
We propose a regularizing membrane potential loss (RMP-Loss) to adjust the membrane potential distribution, which is directly related to quantization error, to a range close to the spikes.
arXiv Detail & Related papers (2023-08-13T14:59:27Z) - Solving Large-scale Spatial Problems with Convolutional Neural Networks [88.31876586547848]
We employ transfer learning to improve training efficiency for large-scale spatial problems.
We propose that a convolutional neural network (CNN) can be trained on small windows of signals, but evaluated on arbitrarily large signals with little to no performance degradation.
arXiv Detail & Related papers (2023-06-14T01:24:42Z) - Centered Self-Attention Layers [89.21791761168032]
The self-attention mechanism in transformers and the message-passing mechanism in graph neural networks are repeatedly applied.
We show that this application inevitably leads to oversmoothing, i.e., to similar representations at the deeper layers.
We present a correction term to the aggregating operator of these mechanisms.
arXiv Detail & Related papers (2023-06-02T15:19:08Z) - Dynamics of Finite Width Kernel and Prediction Fluctuations in Mean
Field Neural Networks [47.73646927060476]
We analyze the dynamics of finite width effects in wide but finite feature learning neural networks.
Our results are non-perturbative in the strength of feature learning.
arXiv Detail & Related papers (2023-04-06T23:11:49Z) - Momentum Diminishes the Effect of Spectral Bias in Physics-Informed
Neural Networks [72.09574528342732]
Physics-informed neural network (PINN) algorithms have shown promising results in solving a wide range of problems involving partial differential equations (PDEs).
They often fail to converge to desirable solutions when the target function contains high-frequency features, due to a phenomenon known as spectral bias.
In the present work, we exploit neural tangent kernels (NTKs) to investigate the training dynamics of PINNs evolving under stochastic gradient descent with momentum (SGDM).
arXiv Detail & Related papers (2022-06-29T19:03:10Z) - Extended critical regimes of deep neural networks [0.0]
We show that heavy-tailed weights enable the emergence of an extended critical regime without fine-tuning parameters.
In this extended critical regime, DNNs exhibit rich and complex propagation dynamics across layers.
We provide a theoretical guide for the design of efficient neural architectures.
arXiv Detail & Related papers (2022-03-24T10:15:50Z) - Characterizing Learning Dynamics of Deep Neural Networks via Complex
Networks [1.0869257688521987]
Complex Network Theory (CNT) represents Deep Neural Networks (DNNs) as directed weighted graphs to study them as dynamical systems.
We introduce metrics for nodes/neurons and layers, namely Nodes Strength and Layers Fluctuation.
Our framework distills trends in the learning dynamics and separates low from high accurate networks.
arXiv Detail & Related papers (2021-10-06T10:03:32Z) - Quantized convolutional neural networks through the lens of partial
differential equations [6.88204255655161]
Quantization of Convolutional Neural Networks (CNNs) is a common approach to ease the computational burden involved in the deployment of CNNs.
In this work, we explore ways to improve quantized CNNs using a PDE-based perspective and analysis.
arXiv Detail & Related papers (2021-08-31T22:18:52Z) - Where Should We Begin? A Low-Level Exploration of Weight Initialization
Impact on Quantized Behaviour of Deep Neural Networks [93.4221402881609]
We present an in-depth, fine-grained ablation study of the effect of different weight initializations on the final distributions of weights and activations of different CNN architectures.
To the best of our knowledge, we are the first to perform such a low-level, in-depth quantitative analysis of weight initialization and its effect on quantized behaviour.
arXiv Detail & Related papers (2020-11-30T06:54:28Z) - Curriculum By Smoothing [52.08553521577014]
Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation.
We propose an elegant curriculum-based scheme that smooths the feature embedding of a CNN using anti-aliasing or low-pass filters.
As the amount of information in the feature maps increases during training, the network is able to progressively learn better representations of the data.
arXiv Detail & Related papers (2020-03-03T07:27:44Z)
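As a rough illustration of the curriculum-by-smoothing idea summarized in the last entry (a sketch under assumptions, not the authors' implementation), feature maps can be blurred with a Gaussian low-pass filter whose width is annealed toward zero over training, so progressively more high-frequency information is allowed through.

```python
# Hypothetical sketch of curriculum-by-smoothing (not the authors' code):
# low-pass filter CNN feature maps with a Gaussian whose width is annealed
# toward zero as training progresses.
import numpy as np
from scipy.ndimage import gaussian_filter

def smoothing_sigma(epoch, total_epochs, sigma_start=1.0):
    """Linearly anneal the blur strength from sigma_start down to 0."""
    return sigma_start * max(0.0, 1.0 - epoch / total_epochs)

def smooth_feature_maps(features, sigma):
    """Blur a (channels, height, width) activation tensor along the spatial axes only."""
    if sigma <= 0.0:
        return features
    return gaussian_filter(features, sigma=(0.0, sigma, sigma))

# Toy usage: early epochs see heavily smoothed activations, later epochs
# see (nearly) unfiltered ones.
features = np.random.default_rng(1).normal(size=(16, 32, 32))
for epoch in (0, 5, 9):
    sigma = smoothing_sigma(epoch, total_epochs=10)
    smoothed = smooth_feature_maps(features, sigma)
    print(f"epoch {epoch}: sigma={sigma:.2f}, "
          f"variance retained={np.var(smoothed) / np.var(features):.3f}")
```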
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.