Optimizing Gradient-driven Criteria in Network Sparsity: Gradient is All
You Need
- URL: http://arxiv.org/abs/2201.12826v1
- Date: Sun, 30 Jan 2022 14:15:49 GMT
- Title: Optimizing Gradient-driven Criteria in Network Sparsity: Gradient is All
You Need
- Authors: Yuxin Zhang, Mingbao Lin, Mengzhao Chen, Zihan Xu, Fei Chao, Yunhan
Shen, Ke Li, Yongjian Wu, Rongrong Ji
- Abstract summary: Gradient-driven sparsity is used to reduce network complexity.
The assumption of weight independence is contrary to the fact that weights are mutually influenced.
We propose to further optimize gradient-driven sparsity (OptG) by solving this independence paradox.
- Score: 74.58939318994746
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Network sparsity has gained popularity mostly due to its capability to reduce
network complexity. Extensive studies have explored gradient-driven sparsity.
Typically, these methods are built on the premise of weight independence, which,
however, is contrary to the fact that weights are mutually influenced. Thus,
their performance leaves room for improvement. In this paper, we propose to
further optimize gradient-driven sparsity (OptG) by solving this independence
paradox. Our motive comes from the recent advances on supermask training which
shows that sparse subnetworks can be located in a randomly initialized network
by simply updating mask values without modifying any weight. We prove that
supermask training amounts to accumulating the weight gradients and can partly solve
the independence paradox. Consequently, OptG integrates supermask training into
gradient-driven sparsity, and a specialized mask optimizer is designed to solve
the independence paradox. Experiments show that OptG surpasses many
existing state-of-the-art competitors. Our code is available at
\url{https://github.com/zyxxmu/OptG}.
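The supermask idea the abstract builds on, training a per-weight score while the weights themselves stay frozen, can be illustrated with a minimal PyTorch sketch. This uses a straight-through top-k mask; the class names and the sparsity level are illustrative, and this is plain supermask training rather than the OptG mask optimizer itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMask(torch.autograd.Function):
    """Keep the top-k scores as a binary mask; pass gradients straight through."""

    @staticmethod
    def forward(ctx, scores, sparsity):
        k = int((1.0 - sparsity) * scores.numel())   # number of weights to keep
        mask = torch.zeros_like(scores)
        keep = torch.topk(scores.flatten(), k).indices
        mask.view(-1)[keep] = 1.0
        return mask

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: the gradient w.r.t. the mask flows to the
        # scores, so score updates accumulate the masked weight gradients over steps.
        return grad_output, None


class SupermaskLinear(nn.Linear):
    """Linear layer with frozen weights; only the mask scores are trained."""

    def __init__(self, in_features, out_features, sparsity=0.9):
        super().__init__(in_features, out_features)
        self.weight.requires_grad = False            # weights are never updated
        if self.bias is not None:
            self.bias.requires_grad = False
        self.scores = nn.Parameter(torch.rand_like(self.weight))
        self.sparsity = sparsity

    def forward(self, x):
        mask = TopKMask.apply(self.scores, self.sparsity)
        return F.linear(x, self.weight * mask, self.bias)
```

Training only `scores` with a standard optimizer locates a sparse subnetwork inside the frozen weights; according to the abstract, OptG's contribution is a specialized mask optimizer layered on top of this kind of scheme.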
Related papers
- Leaky ReLUs That Differ in Forward and Backward Pass Facilitate Activation Maximization in Deep Neural Networks [0.022344294014777957]
Activation maximization (AM) strives to generate optimal input, revealing features that trigger high responses in trained deep neural networks.
We show that AM fails to produce optimal input for simple functions containing ReLUs or Leaky ReLUs.
We propose a solution based on using Leaky ReLUs with a high negative slope in the backward pass while keeping the original, usually zero, slope in the forward pass.
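The idea of using one slope in the forward pass and a larger negative slope in the backward pass can be sketched as a custom autograd function; this is a minimal illustration, and the slope values chosen here are assumptions rather than the paper's settings.

```python
import torch


class AsymmetricLeakyReLU(torch.autograd.Function):
    """Leaky ReLU with different negative slopes in forward and backward passes."""

    @staticmethod
    def forward(ctx, x, forward_slope=0.0, backward_slope=0.3):
        ctx.save_for_backward(x)
        ctx.backward_slope = backward_slope
        # Ordinary (Leaky) ReLU behaviour in the forward pass.
        return torch.where(x >= 0, x, forward_slope * x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Use a larger negative slope when propagating gradients, so activation
        # maximization still receives a signal through "dead" units.
        grad_x = torch.where(x >= 0, grad_output, ctx.backward_slope * grad_output)
        return grad_x, None, None
```

Calling `AsymmetricLeakyReLU.apply(x)` behaves like a standard (Leaky) ReLU in the forward pass but backpropagates a non-zero slope for negative inputs.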
arXiv Detail & Related papers (2024-10-22T12:38:39Z)
- MaxQ: Multi-Axis Query for N:M Sparsity Network [16.033223841268747]
MaxQ achieves consistent improvements across diverse CNN architectures in various computer vision tasks.
Experiments show that MaxQ can achieve 74.6% top-1 accuracy on ImageNet and improve over the state-of-the-art by more than 2.8%.
arXiv Detail & Related papers (2023-12-12T08:28:29Z)
- ELSA: Partial Weight Freezing for Overhead-Free Sparse Network Deployment [95.04504362111314]
We present ELSA, a practical solution for creating deep networks that can easily be deployed at different levels of sparsity.
The core idea is to embed one or more sparse networks within a single dense network as a proper subset of the weights.
At prediction time, any sparse model can be extracted simply by zeroing out weights according to a predefined mask.
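The extraction step described above amounts to an elementwise multiplication of the dense weights with per-layer binary masks. A hedged sketch follows; the `masks` mapping and the helper name are illustrative, not ELSA's actual API.

```python
import copy
import torch


def extract_sparse_model(dense_model, masks):
    """Return a copy of `dense_model` with weights zeroed according to `masks`.

    `masks` maps parameter names to binary tensors of the same shape,
    e.g. {"fc.weight": some 0/1 tensor}.
    """
    sparse_model = copy.deepcopy(dense_model)
    with torch.no_grad():
        for name, param in sparse_model.named_parameters():
            if name in masks:
                param.mul_(masks[name])   # keep only the embedded sparse subnetwork
    return sparse_model
```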
arXiv Detail & Related papers (2023-12-11T22:44:05Z)
- Parameter-Efficient Masking Networks [61.43995077575439]
Advanced network designs often contain a large number of repetitive structures (e.g., the Transformer).
In this study, we are the first to investigate the representational potential of fixed random weights with limited unique values by learning masks.
This leads to a new paradigm for model compression that reduces model size.
arXiv Detail & Related papers (2022-10-13T03:39:03Z)
- Signing the Supermask: Keep, Hide, Invert [0.9475039534437331]
We present a novel approach that either drops a neural network's initial weights or inverts their respective sign.
We achieve a pruning rate of up to 99%, while still matching or exceeding the performance of various baseline and previous models.
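The keep / hide / invert scheme can be viewed as a ternary mask over frozen initial weights. A tiny sketch of that view is shown here; the random mask is a placeholder for the selection the paper actually learns.

```python
import torch

torch.manual_seed(0)
w_init = torch.randn(4, 4)                          # frozen initial weights

# Ternary mask: +1 keeps a weight, 0 hides (prunes) it, -1 inverts its sign.
mask = torch.randint(-1, 2, w_init.shape).float()   # placeholder; learned in the paper

w_effective = w_init * mask                         # network actually used at inference
print((mask == 0).float().mean())                   # fraction of pruned weights
```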
arXiv Detail & Related papers (2022-01-31T17:17:37Z)
- Automatic Sparse Connectivity Learning for Neural Networks [4.875787559251317]
Well-designed sparse neural networks have the potential to significantly reduce FLOPs and computational resources.
In this work, we propose a new automatic pruning method, Sparse Connectivity Learning (SCL).
Deep learning models trained by SCL outperform the SOTA human-designed and automatic pruning methods in sparsity, accuracy, and FLOPs reduction.
arXiv Detail & Related papers (2022-01-13T15:12:48Z)
- Joint inference and input optimization in equilibrium networks [68.63726855991052]
A deep equilibrium model is a class of models that forgoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer.
We show that there is a natural synergy between these two settings, namely network inference and input optimization.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient based meta-learning.
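The fixed-point computation behind a deep equilibrium model can be sketched as plain iteration of a single layer until convergence. The layer form and shapes below are illustrative assumptions; real deep equilibrium models use root-finding solvers and implicit differentiation.

```python
import torch


def deq_forward(x, W, U, b, tol=1e-5, max_iter=100):
    """Iterate z <- tanh(z W^T + x U^T + b) to an approximate fixed point.

    Shapes: x (batch, d_in), U (d_hidden, d_in), W (d_hidden, d_hidden), b (d_hidden,).
    """
    z = torch.zeros(x.shape[0], W.shape[0])
    for _ in range(max_iter):
        z_next = torch.tanh(z @ W.T + x @ U.T + b)
        if (z_next - z).norm() < tol:     # stop once the iterate stops moving
            break
        z = z_next
    return z
```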
arXiv Detail & Related papers (2021-11-25T19:59:33Z)
- Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate models in resource-constrained environments.
In this paper, we are the first to study training an N:M fine-grained structured sparse network from scratch.
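N:M fine-grained structured sparsity keeps at most N non-zero weights in every group of M consecutive weights (e.g. 2:4). A hedged sketch of the magnitude-based projection that produces such a pattern is given below; the function name and the group layout along the input dimension are assumptions for illustration.

```python
import torch


def project_nm(weight, n=2, m=4):
    """Keep the n largest-magnitude entries in each consecutive group of m weights."""
    out_features, in_features = weight.shape
    assert in_features % m == 0, "input dimension must be divisible by m"
    groups = weight.reshape(out_features, in_features // m, m)
    # Indices of the n largest |w| within each group of m.
    idx = groups.abs().topk(n, dim=-1).indices
    mask = torch.zeros_like(groups).scatter_(-1, idx, 1.0)
    return (groups * mask).reshape(out_features, in_features)
```

For example, `project_nm(linear.weight.data)` returns a 2:4-sparse copy of a linear layer's weight matrix.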
arXiv Detail & Related papers (2021-02-08T05:55:47Z)
- DHP: Differentiable Meta Pruning via HyperNetworks [158.69345612783198]
This paper introduces a differentiable meta-pruning method that uses hypernetworks to prune networks automatically.
Latent vectors control the output channels of the convolutional layers in the backbone network and act as a handle for the pruning of the layers.
Experiments are conducted on various networks for image classification, single image super-resolution, and denoising.
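A rough sketch of the latent-vector idea: tie each output channel to one entry of a latent vector and let a tiny hypernetwork generate the convolution filters from it, so driving a latent entry to zero removes that channel. This is a simplified stand-in for DHP's actual hypernetwork design, and the module below is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HyperConv2d(nn.Module):
    """3x3 convolution whose filters are generated from a per-channel latent vector."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.latent = nn.Parameter(torch.ones(out_ch))          # one entry per output channel
        self.hyper = nn.Linear(1, in_ch * 3 * 3, bias=False)    # tiny hypernetwork
        self.in_ch, self.out_ch = in_ch, out_ch

    def forward(self, x):
        # Each output channel's 3x3 filter is generated from its latent entry, so
        # sparsifying the latent (e.g. with an L1 penalty) zeroes out whole filters.
        w = self.hyper(self.latent.unsqueeze(1))                # (out_ch, in_ch * 9)
        w = w.view(self.out_ch, self.in_ch, 3, 3)
        return F.conv2d(x, w, padding=1)
```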
arXiv Detail & Related papers (2020-03-30T17:59:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.