Fractional Skipping: Towards Finer-Grained Dynamic CNN Inference
- URL: http://arxiv.org/abs/2001.00705v1
- Date: Fri, 3 Jan 2020 03:12:17 GMT
- Title: Fractional Skipping: Towards Finer-Grained Dynamic CNN Inference
- Authors: Jianghao Shen, Yonggan Fu, Yue Wang, Pengfei Xu, Zhangyang Wang,
Yingyan Lin
- Abstract summary: We propose a Dynamic Fractional Skipping (DFS) framework for deep networks.
DFS hypothesizes layer-wise quantization (to different bitwidths) as intermediate "soft" choices to be made between fully utilizing and skipping a layer.
It exploits a layer's expressive power during input-adaptive inference, enabling finer-grained accuracy-computational cost trade-offs.
- Score: 82.96877371742532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While increasingly deep networks are still in general desired for achieving
state-of-the-art performance, for many specific inputs a simpler network might
already suffice. Existing works exploited this observation by learning to skip
convolutional layers in an input-dependent manner. However, we argue their
binary decision scheme, i.e., either fully executing or completely bypassing
one layer for a specific input, can be enhanced by introducing finer-grained,
"softer" decisions. We therefore propose a Dynamic Fractional Skipping (DFS)
framework. The core idea of DFS is to hypothesize layer-wise quantization (to
different bitwidths) as intermediate "soft" choices to be made between fully
utilizing and skipping a layer. For each input, DFS dynamically assigns a
bitwidth to both weights and activations of each layer, where fully executing
and skipping could be viewed as two "extremes" (i.e., full bitwidth and zero
bitwidth). In this way, DFS can "fractionally" exploit a layer's expressive
power during input-adaptive inference, enabling finer-grained
accuracy-computational cost trade-offs. It presents a unified view to link
input-adaptive layer skipping and input-adaptive hybrid quantization. Extensive
experimental results demonstrate the superior tradeoff between computational
cost and model expressive power (accuracy) achieved by DFS. More visualizations
also indicate a smooth and consistent transition in the DFS behaviors,
especially the learned choices between layer skipping and different
quantizations when the total computational budgets vary, validating our
hypothesis that layer quantization could be viewed as intermediate variants of
layer skipping. Our source code and supplementary material are available at
https://github.com/Torment123/DFS.
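To make the idea of skipping and full execution as two "extremes" of one per-input decision more concrete, below is a minimal, hypothetical PyTorch-style sketch of a residual block whose gate picks a bitwidth from {0, 4, 8, 32} for each sample: 0 stands for skipping the block, 32 for full-precision execution, and the intermediate bitwidths fake-quantize both weights and activations. The gate design, the bitwidth menu, and the quantizer here are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Minimal sketch of a "fractional skipping" residual block (assumptions, not the
# paper's exact code): a per-input gate selects a bitwidth; 0 bits = skip the
# block, 32 bits = full execution, intermediate bits = quantized execution.
import torch
import torch.nn as nn
import torch.nn.functional as F


def quantize(x, bits):
    """Uniform symmetric fake-quantization of a tensor to `bits` bits (illustrative)."""
    scale = x.abs().max().clamp(min=1e-8)
    levels = 2 ** (bits - 1) - 1
    return torch.round(x / scale * levels) / levels * scale


class FractionalSkipBlock(nn.Module):
    """Residual block whose compute cost is chosen per input sample.

    bit_choices[0] == 0   -> skip the block (identity shortcut only)
    bit_choices[-1] == 32 -> full-precision execution
    anything in between   -> quantized weights and activations
    """

    def __init__(self, channels, bit_choices=(0, 4, 8, 32)):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.bit_choices = bit_choices
        # Tiny gate: globally pooled features -> logits over bitwidth choices.
        self.gate = nn.Linear(channels, len(bit_choices))

    def forward(self, x):
        # Hard per-sample choice of bitwidth (argmax here; see note below).
        logits = self.gate(F.adaptive_avg_pool2d(x, 1).flatten(1))
        choice = logits.argmax(dim=1)

        outs = []
        for i in range(x.size(0)):
            bits = self.bit_choices[choice[i].item()]
            xi = x[i : i + 1]
            if bits == 0:                      # "skip" extreme: shortcut only
                outs.append(xi)
            elif bits >= 32:                   # "full execution" extreme
                outs.append(xi + self.bn(self.conv(xi)))
            else:                              # fractional: quantized execution
                w = quantize(self.conv.weight, bits)
                a = quantize(xi, bits)
                outs.append(xi + self.bn(F.conv2d(a, w, padding=1)))
        return torch.cat(outs, dim=0)


if __name__ == "__main__":
    block = FractionalSkipBlock(channels=16)
    x = torch.randn(4, 16, 32, 32)
    print(block(x).shape)  # torch.Size([4, 16, 32, 32])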
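```

Note that the hard argmax gate in this sketch is not differentiable; end-to-end training of such input-adaptive gates typically relies on a differentiable relaxation (e.g., Gumbel-softmax) or reinforcement learning, as in prior dynamic-inference work.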
Related papers
- Dispatch-Aware Deep Neural Network for Optimal Transmission Switching: Toward Real-Time and Feasibility Guaranteed Operation [3.3894236476098185]
We propose a dispatch-aware deep neural network (DA-DNN) that accelerates DC-OTS without relying on pre-solved labels. DA-DNN predicts line states and passes them through a differentiable DC-OPF layer. It produces a provably feasible topology and dispatch pair in the same time as solving the DC-OPF.
arXiv Detail & Related papers (2025-07-23T04:39:29Z)
- Scale Equalization for Multi-Level Feature Fusion [8.541075075344438]
We find that multi-level features from parallel branches are on different scales.
The scale disequilibrium is a universal and unwanted flaw that is detrimental to gradient descent.
We propose injecting scale equalizers to achieve scale equilibrium across multi-level features after bilinear upsampling.
arXiv Detail & Related papers (2024-02-02T05:25:51Z)
- Kernel function impact on convolutional neural networks [10.98068123467568]
We study the usage of kernel functions at different layers in a convolutional neural network.
We show how one can effectively leverage kernel functions by introducing more distortion-aware pooling layers.
We propose Kernelized Dense Layers (KDL), which replace fully-connected layers.
arXiv Detail & Related papers (2023-02-20T19:57:01Z)
- Learnable Polyphase Sampling for Shift Invariant and Equivariant Convolutional Networks [120.78155051439076]
LPS can be trained end-to-end from data and generalizes existing handcrafted downsampling layers.
We evaluate LPS on image classification and semantic segmentation.
arXiv Detail & Related papers (2022-10-14T17:59:55Z)
- Scale Attention for Learning Deep Face Representation: A Study Against Visual Scale Variation [69.45176408639483]
We reform the conv layer by resorting to the scale-space theory.
We build a novel style named SCale AttentioN Conv Neural Network (SCAN-CNN).
As a single-shot scheme, the inference is more efficient than multi-shot fusion.
arXiv Detail & Related papers (2022-09-19T06:35:04Z)
- Arbitrary Bit-width Network: A Joint Layer-Wise Quantization and Adaptive Inference Approach [38.03309300383544]
We propose to feed different data samples with varying quantization schemes to achieve a data-dependent dynamic inference, at a fine-grained layer level.
We present the Arbitrary Bit-width Network (ABN), where the bit-widths of a single deep network can change at runtime for different data samples, with a layer-wise granularity.
On ImageNet classification, we achieve a 1.1% top-1 accuracy improvement while saving 36.2% of BitOps.
arXiv Detail & Related papers (2022-04-21T09:36:43Z)
- Learning strides in convolutional neural networks [34.20666933112202]
This work introduces DiffStride, the first downsampling layer with learnable strides.
Experiments on audio and image classification show the generality and effectiveness of our solution.
arXiv Detail & Related papers (2022-02-03T16:03:36Z)
- Fire Together Wire Together: A Dynamic Pruning Approach with Self-Supervised Mask Prediction [12.86325214182021]
Dynamic model pruning is a recent direction that allows for the inference of a different sub-network for each input sample during deployment.
Current dynamic methods rely on learning continuous channel gating through regularization with a sparsity-inducing loss.
We show experiments on several neural architectures, such as VGG, ResNet, and MobileNet on CIFAR and ImageNet.
arXiv Detail & Related papers (2021-10-15T17:39:53Z)
- Dynamic Probabilistic Pruning: A general framework for hardware-constrained pruning at different granularities [80.06422693778141]
We propose a flexible new pruning mechanism that facilitates pruning at different granularities (weights, kernels, filters/feature maps).
We refer to this algorithm as Dynamic Probabilistic Pruning (DPP).
We show that DPP achieves competitive compression rates and classification accuracy when pruning common deep learning models trained on different benchmark datasets for image classification.
arXiv Detail & Related papers (2021-05-26T17:01:52Z)
- DHP: Differentiable Meta Pruning via HyperNetworks [158.69345612783198]
This paper introduces a differentiable pruning method via hypernetworks for automatic network pruning.
Latent vectors control the output channels of the convolutional layers in the backbone network and act as a handle for the pruning of the layers.
Experiments are conducted on various networks for image classification, single image super-resolution, and denoising.
arXiv Detail & Related papers (2020-03-30T17:59:18Z)
- Embedding Propagation: Smoother Manifold for Few-Shot Classification [131.81692677836202]
We propose to use embedding propagation as an unsupervised non-parametric regularizer for manifold smoothing in few-shot classification.
We empirically show that embedding propagation yields a smoother embedding manifold.
We show that embedding propagation consistently improves the accuracy of the models in multiple semi-supervised learning scenarios by up to 16 percentage points.
arXiv Detail & Related papers (2020-03-09T13:51:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.