WoodFisher: Efficient Second-Order Approximation for Neural Network
Compression
- URL: http://arxiv.org/abs/2004.14340v5
- Date: Wed, 25 Nov 2020 17:31:09 GMT
- Title: WoodFisher: Efficient Second-Order Approximation for Neural Network
Compression
- Authors: Sidak Pal Singh, Dan Alistarh
- Abstract summary: We develop a method to compute a faithful and efficient estimate of the inverse Hessian.
Our main application is to neural network compression.
We show how our method can be extended to take into account first-order information.
- Score: 35.45199662813043
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Second-order information, in the form of Hessian- or Inverse-Hessian-vector
products, is a fundamental tool for solving optimization problems. Recently,
there has been significant interest in utilizing this information in the
context of deep neural networks; however, relatively little is known about the
quality of existing approximations in this context. Our work examines this
question, identifies issues with existing approaches, and proposes a method
called WoodFisher to compute a faithful and efficient estimate of the inverse
Hessian.
Our main application is to neural network compression, where we build on the
classic Optimal Brain Damage/Surgeon framework. We demonstrate that WoodFisher
significantly outperforms popular state-of-the-art methods for one-shot
pruning. Further, even when iterative, gradual pruning is considered, our
method results in a gain in test accuracy over the state-of-the-art approaches,
for pruning popular neural networks (like ResNet-50, MobileNetV1) trained on
standard image classification datasets such as ImageNet ILSVRC. We examine how
our method can be extended to take into account first-order information, as
well as illustrate its ability to automatically set layer-wise pruning
thresholds and perform compression in the limited-data regime. The code is
available at the following link, https://github.com/IST-DASLab/WoodFisher.
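WoodFisher's two ingredients are visible in its name: the empirical Fisher as a proxy for the Hessian, and the Woodbury (Sherman-Morrison) identity to assemble its inverse from rank-one gradient updates, which then feeds the classic OBD/OBS saliency and weight update. The sketch below is a minimal dense illustration of that pipeline for a handful of parameters; the function names (`woodfisher_inverse`, `obs_step`) and the toy usage are ours, not the repository's API, and the paper and released code go considerably further (block-wise approximations for scale, gradual pruning schedules, automatic layer-wise thresholds, the first-order extension mentioned above).

```python
import numpy as np

def woodfisher_inverse(grads, damp=1e-4):
    """Inverse of the damped empirical Fisher
        F = damp * I + (1/N) * sum_i g_i g_i^T
    built from N rank-one Sherman-Morrison updates, so F itself is
    never formed or explicitly inverted.
    grads: (N, d) array of per-sample gradients."""
    N, d = grads.shape
    F_inv = np.eye(d) / damp                  # inverse of the damping term
    for g in grads:
        Fg = F_inv @ g                        # O(d^2) work per gradient
        F_inv -= np.outer(Fg, Fg) / (N + g @ Fg)
    return F_inv

def obs_step(w, F_inv):
    """One Optimal Brain Surgeon step: remove the weight with the
    smallest saliency  w_q^2 / (2 [F^{-1}]_{qq})  and update the
    remaining weights to compensate for its removal."""
    diag = np.diag(F_inv)
    q = int(np.argmin(w ** 2 / (2.0 * diag)))
    w = w - (w[q] / diag[q]) * F_inv[:, q]    # compensating update
    w[q] = 0.0                                # exact zero for numerical safety
    return w, q

# Toy usage: 5 parameters, 32 per-sample gradients (all numbers arbitrary).
rng = np.random.default_rng(0)
grads = rng.normal(size=(32, 5))
w = rng.normal(size=5)
w_pruned, removed = obs_step(w, woodfisher_inverse(grads))
```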
Related papers
- PREMAP: A Unifying PREiMage APproximation Framework for Neural Networks [30.701422594374456]
We present a framework for preimage abstraction that produces under- and over-approximations of any polyhedral output set.
We evaluate our method on a range of tasks, demonstrating significant improvement in efficiency and scalability to high-input-dimensional image classification tasks.
arXiv Detail & Related papers (2024-08-17T17:24:47Z)
- Concurrent Training and Layer Pruning of Deep Neural Networks [0.0]
We propose an algorithm capable of identifying and eliminating irrelevant layers of a neural network during the early stages of training.
We employ a structure with residual connections around nonlinear network sections, which allows information to keep flowing through the network once a nonlinear section is pruned.
arXiv Detail & Related papers (2024-06-06T23:19:57Z)
- Towards Efficient Verification of Quantized Neural Networks [9.352320240912109]
Quantization replaces floating point arithmetic with integer arithmetic in deep neural network models.
We show how efficiency can be improved by utilizing gradient-based search methods and also bound-propagation techniques.
arXiv Detail & Related papers (2023-12-20T00:43:13Z)
- Deep Multi-Threshold Spiking-UNet for Image Processing [51.88730892920031]
This paper introduces the novel concept of Spiking-UNet for image processing, which combines the power of Spiking Neural Networks (SNNs) with the U-Net architecture.
To achieve an efficient Spiking-UNet, we face two primary challenges: ensuring high-fidelity information propagation through the network via spikes and formulating an effective training strategy.
Experimental results show that, on image segmentation and denoising, our Spiking-UNet achieves comparable performance to its non-spiking counterpart.
arXiv Detail & Related papers (2023-07-20T16:00:19Z)
- The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on backpropagation (BP).
Unlike FF, our framework directly outputs label distributions at each cascaded block and therefore does not require generating additional negative samples.
In our framework each block can be trained independently (sketched below), so it can easily be deployed in parallel acceleration systems.
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
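Read at face value, the CaFo entry above describes blocks that each carry their own predictor head, emit a label distribution, and are trained with a purely local loss, so no error signal is backpropagated across blocks. The sketch below is our own illustration of that idea under those assumptions (the class and function names are ours, and the paper's actual blocks are convolutional with a different predictor design); the `detach()` between blocks is what keeps training block-local.

```python
import torch
import torch.nn as nn

class CascadedBlock(nn.Module):
    """One cascaded block with its own classifier head."""
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.head = nn.Linear(hidden_dim, num_classes)  # per-block label logits

    def forward(self, x):
        feats = self.body(x)
        return feats, self.head(feats)

def cafo_train_step(blocks, optimizers, x, y):
    """Each block is updated only from its own prediction; detach()
    stops gradients from reaching earlier blocks, so no end-to-end BP."""
    loss_fn = nn.CrossEntropyLoss()
    for block, opt in zip(blocks, optimizers):
        feats, logits = block(x)
        loss = loss_fn(logits, y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        x = feats.detach()        # features flow forward, gradients do not
    return float(loss)

# Hypothetical usage:
# blocks = [CascadedBlock(784, 256, 10), CascadedBlock(256, 256, 10)]
# optims = [torch.optim.SGD(b.parameters(), lr=0.01) for b in blocks]
# cafo_train_step(blocks, optims, images.view(-1, 784), labels)
```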
- Towards Better Out-of-Distribution Generalization of Neural Algorithmic Reasoning Tasks [51.8723187709964]
We study the OOD generalization of neural algorithmic reasoning tasks.
The goal is to learn an algorithm from input-output pairs using deep neural networks.
arXiv Detail & Related papers (2022-11-01T18:33:20Z)
- Mixed-Privacy Forgetting in Deep Networks [114.3840147070712]
We show that the influence of a subset of the training samples can be removed from the weights of a network trained on large-scale image classification tasks.
Inspired by real-world applications of forgetting techniques, we introduce a novel notion of forgetting in a mixed-privacy setting.
We show that our method allows forgetting without having to trade off the model accuracy.
arXiv Detail & Related papers (2020-12-24T19:34:56Z)
- Neural Pruning via Growing Regularization [82.9322109208353]
We extend regularization to tackle two central problems of pruning: pruning schedule and weight importance scoring.
Specifically, we propose an L2 regularization variant with rising penalty factors (sketched below) and show it can bring significant accuracy gains.
The proposed algorithms are easy to implement and scalable to large datasets and networks in both structured and unstructured pruning.
arXiv Detail & Related papers (2020-12-16T20:16:28Z)
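The growing-regularization entry above hinges on a single mechanism: an L2 penalty whose coefficient rises as training proceeds, so that weights slated for removal are driven toward zero before they are actually cut. The toy sketch below is our own illustration of that mechanism only; the function name, the mask-based weight selection, and the linear schedule are assumptions, and how the paper chooses which weights to penalize and how it schedules the growth is its actual contribution.

```python
import torch

def growing_l2_penalty(params, prune_masks, step, delta=1e-4, every=100):
    """Extra L2 term whose coefficient rises during training.
    prune_masks[i] is a 0/1 tensor marking the weights targeted for
    pruning; the penalty factor grows by `delta` every `every` steps."""
    coeff = delta * (step // every)           # rising penalty factor
    penalty = 0.0
    for p, mask in zip(params, prune_masks):
        penalty = penalty + coeff * (mask * p).pow(2).sum()
    return penalty

# Inside a training loop (names are illustrative):
#   loss = task_loss + growing_l2_penalty(list(model.parameters()),
#                                         masks, global_step)
#   loss.backward(); optimizer.step()
```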
- AutoPruning for Deep Neural Network with Dynamic Channel Masking [28.018077874687343]
We propose a learning-based automatic pruning algorithm for deep neural networks.
A two-objective problem that seeks both the weights and the best channels for each layer is first formulated.
An alternating optimization approach is then proposed to derive the optimal channel numbers and weights simultaneously.
arXiv Detail & Related papers (2020-10-22T20:12:46Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based optimization combined with nonconvexity renders learning susceptible to initialization.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)