Neuron-based Pruning of Deep Neural Networks with Better Generalization
using Kronecker Factored Curvature Approximation
- URL: http://arxiv.org/abs/2111.08577v1
- Date: Tue, 16 Nov 2021 15:55:59 GMT
- Title: Neuron-based Pruning of Deep Neural Networks with Better Generalization
using Kronecker Factored Curvature Approximation
- Authors: Abdolghani Ebrahimi, Diego Klabjan
- Abstract summary: The proposed algorithm directs the parameters of the compressed model toward a flatter solution by exploring the spectral radius of the Hessian.
Our results show that it improves the state-of-the-art results on neuron compression.
The method is able to achieve very small networks with small accuracy degradation across different neural network models.
- Score: 18.224344440110862
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing methods of pruning deep neural networks focus on removing
unnecessary parameters of the trained network and fine-tuning the model
afterwards to find a good solution that recovers the initial performance of the
trained model. Unlike other works, our method pays special attention to the
quality of the solution in the compressed model and to inference computation time,
which it addresses by pruning neurons. The proposed algorithm directs the parameters of the
compressed model toward a flatter solution by exploring the spectral radius of
the Hessian, which results in better generalization on unseen data. Moreover, the
method does not start from a pre-trained network; it performs training and
pruning simultaneously. Our results show that it improves on the state-of-the-art
results for neuron compression. The method is able to achieve very small
networks with small accuracy degradation across different neural network
models.
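To make the curvature idea concrete, here is a minimal sketch, not the paper's algorithm, of how a Kronecker-factored (K-FAC-style) approximation can supply both a flatness proxy and a neuron-pruning signal. It assumes a layer's Hessian/Fisher block is approximated as A ⊗ G, with A the covariance of the layer's inputs and G the covariance of gradients at its pre-activations, so the spectral radius is simply the product of the largest eigenvalues of the two factors. The helper names (kronecker_factors, spectral_radius, neuron_saliency) and the curvature-weighted saliency rule are hypothetical choices for illustration and may differ from the paper's criterion.

```python
import numpy as np

# Sketch: K-FAC-style curvature for one linear layer.
# The layer's Hessian/Fisher block is approximated as H ≈ A ⊗ G, where
#   A = E[a a^T]  (covariance of layer inputs)
#   G = E[g g^T]  (covariance of gradients w.r.t. pre-activations).
# Since eig(A ⊗ G) = {λ_i(A) · μ_j(G)}, the spectral radius of the block is
# λ_max(A) · μ_max(G), which is cheap to compute from the small factors.

def kronecker_factors(acts, grads):
    """acts: (batch, fan_in) layer inputs; grads: (batch, fan_out) pre-activation grads."""
    n = acts.shape[0]
    A = acts.T @ acts / n   # (fan_in, fan_in) input covariance
    G = grads.T @ grads / n # (fan_out, fan_out) gradient covariance
    return A, G

def spectral_radius(A, G):
    """Largest eigenvalue of A ⊗ G without forming the Kronecker product."""
    return np.linalg.eigvalsh(A)[-1] * np.linalg.eigvalsh(G)[-1]

def neuron_saliency(W, A, G):
    """Per-output-neuron importance: curvature-weighted quadratic form w_i^T A w_i
    scaled by the neuron's diagonal gradient curvature G_ii (an illustrative
    second-order heuristic, not necessarily the paper's exact rule)."""
    return np.diag(G) * np.einsum('ij,jk,ik->i', W, A, W)

# Toy usage: rank the 16 output neurons of a 32->16 layer from a fake mini-batch.
rng = np.random.default_rng(0)
acts = rng.normal(size=(128, 32))
grads = rng.normal(size=(128, 16))
W = rng.normal(size=(16, 32)) * 0.1

A, G = kronecker_factors(acts, grads)
print("spectral radius (flatness proxy):", spectral_radius(A, G))
order = np.argsort(neuron_saliency(W, A, G))  # smallest saliency pruned first
print("neurons ranked for pruning:", order[:5])
```

Per the abstract, such a curvature signal would be used inside the training loop itself, so that pruning decisions and the push toward a flat (small spectral radius) solution happen simultaneously rather than after training.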
Related papers
- Chebyshev Spectral Neural Networks for Solving Partial Differential Equations [0.0]
The study uses a feedforward neural network model and error backpropagation principles, utilizing automatic differentiation (AD) to compute the loss function.
The numerical efficiency and accuracy of the CSNN model are investigated through testing on elliptic partial differential equations, and it is compared with the well-known Physics-Informed Neural Network (PINN) method.
arXiv Detail & Related papers (2024-06-06T05:31:45Z)
- Improving Generalization of Deep Neural Networks by Optimum Shifting [33.092571599896814]
We propose a novel method called optimum shifting, which changes the parameters of a neural network from a sharp minimum to a flatter one.
Our method is based on the observation that when the input and output of a neural network are fixed, the matrix multiplications within the network can be treated as systems of under-determined linear equations.
arXiv Detail & Related papers (2024-05-23T02:31:55Z)
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
- Using Cooperative Game Theory to Prune Neural Networks [7.3959659158152355]
We show how solution concepts from cooperative game theory can be used to tackle the problem of pruning neural networks.
We introduce a method called Game Theory Assisted Pruning (GTAP), which reduces the neural network's size while preserving its predictive accuracy.
arXiv Detail & Related papers (2023-11-17T11:48:10Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- Acceleration techniques for optimization over trained neural network ensembles [1.0323063834827415]
We study optimization problems where the objective function is modeled through feedforward neural networks with rectified linear unit activation.
We present a mixed-integer linear program based on existing popular big-$M$ formulations for optimizing over a single neural network.
arXiv Detail & Related papers (2021-12-13T20:50:54Z)
- Non-Gradient Manifold Neural Network [79.44066256794187]
Deep neural networks (DNNs) generally take thousands of iterations to optimize via gradient descent.
We propose a novel manifold neural network based on non-gradient optimization.
arXiv Detail & Related papers (2021-06-15T06:39:13Z)
- Measurement error models: from nonparametric methods to deep neural networks [3.1798318618973362]
We propose an efficient neural network design for estimating measurement error models.
We use a fully connected feed-forward neural network to approximate the regression function $f(x)$.
We conduct an extensive numerical study to compare the neural network approach with classical nonparametric methods.
arXiv Detail & Related papers (2020-07-15T06:05:37Z)
- Compressive sensing with un-trained neural networks: Gradient descent finds the smoothest approximation [60.80172153614544]
Un-trained convolutional neural networks have emerged as highly successful tools for image recovery and restoration.
We show that an un-trained convolutional neural network can approximately reconstruct signals and images that are sufficiently structured, from a near minimal number of random measurements.
arXiv Detail & Related papers (2020-05-07T15:57:25Z)
- Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks [107.77595511218429]
In this paper, we investigate the empirical Rademacher complexity related to intermediate layers of deep neural networks.
We propose a feature distortion method (Disout) for addressing the aforementioned problem.
The superiority of the proposed feature map distortion for producing deep neural networks with higher testing performance is analyzed and demonstrated.
arXiv Detail & Related papers (2020-02-23T13:59:13Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based learning combined with nonconvexity renders learning susceptible to novel problems.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)