Exploring Gradient Flow Based Saliency for DNN Model Compression
- URL: http://arxiv.org/abs/2110.12477v1
- Date: Sun, 24 Oct 2021 16:09:40 GMT
- Title: Exploring Gradient Flow Based Saliency for DNN Model Compression
- Authors: Xinyu Liu, Baopu Li, Zhen Chen, Yixuan Yuan
- Abstract summary: Model pruning aims to reduce the deep neural network (DNN) model size or computational overhead.
Traditional model pruning methods that evaluate channel significance for a DNN pay too much attention to the local analysis of each channel.
In this paper, we propose a new model pruning method from the perspective of gradient flow.
- Score: 21.993801817422572
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model pruning aims to reduce the deep neural network (DNN) model size or
computational overhead. Traditional model pruning methods, such as l1 pruning, evaluate
channel significance but pay too much attention to the local analysis of each channel:
they rely on the magnitude of the entire feature while ignoring its relevance to the
batch normalization (BN) and ReLU layers that follow each convolutional operation. To
overcome these problems, we propose a new model pruning method from the perspective of
gradient flow. Specifically, we first theoretically analyze a channel's influence based
on a Taylor expansion that integrates the effects of the BN layer and the ReLU
activation function. Then, the first-order Taylor polynomial of the scaling parameter
and the shifting parameter in the BN layer is used to effectively indicate the
significance of a channel in a DNN.
Comprehensive experiments on both image classification and image denoising
tasks demonstrate the superiority of the proposed novel theory and scheme. Code
is available at https://github.com/CityU-AIM-Group/GFBS.
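As an illustration of the idea, below is a minimal sketch of a gradient-flow based channel saliency score in PyTorch, assuming a model built from Conv-BN-ReLU blocks. The per-channel score |gamma * dL/dgamma + beta * dL/dbeta| paraphrases the abstract's first-order Taylor terms of the BN scaling and shifting parameters; it is an assumption-laden sketch, not the official GFBS implementation linked above.
```python
# Sketch: per-channel saliency from the first-order Taylor terms of the BN
# scale (gamma) and shift (beta), as described in the abstract. Illustrative
# only; the official GFBS code may normalize and combine scores differently.
import torch
import torch.nn as nn

def bn_channel_saliency(model: nn.Module, loss: torch.Tensor) -> dict:
    """Return a per-BN-layer tensor of channel saliency scores."""
    loss.backward()  # populate .grad on each BN weight (gamma) and bias (beta)
    scores = {}
    for name, m in model.named_modules():
        if isinstance(m, nn.BatchNorm2d):
            gamma, beta = m.weight, m.bias
            # Removing channel c zeroes gamma_c and beta_c, so the induced loss
            # change is approximated by |gamma_c*dL/dgamma_c + beta_c*dL/dbeta_c|.
            scores[name] = (gamma * gamma.grad + beta * beta.grad).abs().detach()
    return scores

# Toy Conv-BN-ReLU network and a single batch to drive the gradients.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
)
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
loss = nn.CrossEntropyLoss()(model(x), y)
saliency = bn_channel_saliency(model, loss)
# Channels with the smallest scores are the natural pruning candidates.
prune_candidates = {name: s.argsort()[:4].tolist() for name, s in saliency.items()}
```
In practice the scores would be accumulated over many batches before ranking channels; consult the repository above for how the authors combine this with the ReLU analysis.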
Related papers
- On the Initialization of Graph Neural Networks [10.153841274798829]
We analyze the variance of forward and backward propagation across Graph Neural Networks layers.
We propose a new method for Variance Instability Reduction within GNN Optimization (Virgo).
We conduct comprehensive experiments on 15 datasets to show that Virgo can lead to superior model performance.
arXiv Detail & Related papers (2023-12-05T09:55:49Z)
- Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning [23.47570704524471]
We consider optimisation of large and shallow neural networks via gradient flow, where the output of each hidden node is scaled by some positive parameter.
We prove that, for large neural networks, with high probability, gradient flow converges to a global minimum AND can learn features, unlike in the NTK regime.
arXiv Detail & Related papers (2023-02-02T10:40:06Z)
- Interpretations Steered Network Pruning via Amortized Inferred Saliency Maps [85.49020931411825]
Convolutional Neural Networks (CNNs) compression is crucial to deploying these models in edge devices with limited resources.
We propose to address the channel pruning problem from a novel perspective by leveraging the interpretations of a model to steer the pruning process.
We tackle this challenge by introducing a selector model that predicts real-time smooth saliency masks for pruned models.
arXiv Detail & Related papers (2022-09-07T01:12:11Z)
- DDPG-Driven Deep-Unfolding with Adaptive Depth for Channel Estimation with Sparse Bayesian Learning [23.158142411929322]
We first develop a framework of deep deterministic policy gradient (DDPG)-driven deep-unfolding with adaptive depth for different inputs.
Specifically, the framework is employed to deal with the channel estimation problem in massive multiple-input multiple-output systems.
arXiv Detail & Related papers (2022-01-20T22:35:42Z)
- SkipNode: On Alleviating Performance Degradation for Deep Graph Convolutional Networks [84.30721808557871]
We conduct theoretical and experimental analysis to explore the fundamental causes of performance degradation in deep GCNs.
We propose a simple yet effective plug-and-play module, SkipNode, to overcome the performance degradation of deep GCNs.
arXiv Detail & Related papers (2021-12-22T02:18:31Z)
- Channel-Wise Early Stopping without a Validation Set via NNK Polytope Interpolation [36.479195100553085]
Convolutional neural networks (ConvNets) comprise high-dimensional feature spaces formed by the aggregation of multiple channels.
We present channel-wise DeepNNK, a novel generalization estimate based on non-negative kernel regression (NNK) graphs.
arXiv Detail & Related papers (2021-07-27T17:33:30Z)
- Overcoming Catastrophic Forgetting in Graph Neural Networks [50.900153089330175]
Catastrophic forgetting refers to the tendency of a neural network to "forget" previously learned knowledge upon learning new tasks.
We propose a novel scheme dedicated to overcoming this problem and hence strengthening continual learning in graph neural networks (GNNs).
At the heart of our approach is a generic module, termed topology-aware weight preserving (TWP).
arXiv Detail & Related papers (2020-12-10T22:30:25Z)
- AutoPruning for Deep Neural Network with Dynamic Channel Masking [28.018077874687343]
We propose a learning-based auto pruning algorithm for deep neural networks.
A two-objective problem that targets both the weights and the best channels for each layer is first formulated.
An alternative optimization approach is then proposed to derive the optimal channel numbers and weights simultaneously.
arXiv Detail & Related papers (2020-10-22T20:12:46Z)
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.