LOss-Based SensiTivity rEgulaRization: towards deep sparse neural
networks
- URL: http://arxiv.org/abs/2011.09905v1
- Date: Mon, 16 Nov 2020 18:55:34 GMT
- Title: LOss-Based SensiTivity rEgulaRization: towards deep sparse neural
networks
- Authors: Enzo Tartaglione, Andrea Bragagnolo, Attilio Fiandrotti and Marco
Grangetto
- Abstract summary: LOss-Based SensiTivity rEgulaRization is a method for training neural networks with a sparse topology.
Our method makes it possible to train a network from scratch, i.e. without preliminary learning or rewinding.
- Score: 15.373764014931792
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: LOBSTER (LOss-Based SensiTivity rEgulaRization) is a method for training
neural networks having a sparse topology. Let the sensitivity of a network
parameter be the variation of the loss function with respect to the variation
of the parameter. Parameters with low sensitivity, i.e. having little impact on
the loss when perturbed, are shrunk and then pruned to sparsify the network.
Our method makes it possible to train a network from scratch, i.e. without preliminary
learning or rewinding. Experiments on multiple architectures and datasets show
competitive compression ratios with minimal computational overhead.
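To make the mechanism concrete, here is a minimal PyTorch sketch of a sensitivity-driven update step. It is only an illustration under our own assumptions: sensitivity is approximated by the magnitude of the loss gradient, and lobster_step, lam and prune_thresh are hypothetical names and hyperparameters, not the paper's exact update rule.

```python
import torch

def lobster_step(model, loss, lr=0.1, lam=1e-4, prune_thresh=1e-3):
    """Sketch of a sensitivity-driven step: weights the loss barely
    reacts to are shrunk toward zero and pruned once small."""
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            # Insensitivity is high where the loss gradient is small.
            insensitivity = torch.clamp(1.0 - p.grad.abs(), min=0.0)
            # Gradient descent plus sensitivity-weighted shrinkage.
            p -= lr * p.grad + lam * insensitivity * p
            # Prune (zero out) weights shrunk below the threshold.
            p[p.abs() < prune_thresh] = 0.0
```

Calling this once per minibatch in place of a plain optimizer step shows how shrinkage and pruning can happen during training itself, with no pretraining or rewinding phase.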
Related papers
- Sensitivity-Based Layer Insertion for Residual and Feedforward Neural Networks [0.3831327965422187]
Training of neural networks requires tedious and often manual tuning of the network architecture.
We propose a systematic method to insert new layers during the training process, which eliminates the need to choose a fixed network size before training (a sketch of the insertion step follows this entry).
arXiv Detail & Related papers (2023-11-27T16:44:13Z)
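A minimal sketch of the insertion step, assuming an nn.Sequential model and identity initialization so the network's function is preserved at the moment of insertion; the paper's sensitivity-based criterion for where and when to insert is not reproduced here.

```python
import torch.nn as nn

def insert_layer(seq: nn.Sequential, index: int, width: int) -> nn.Sequential:
    """Grow a sequential network mid-training by inserting a linear
    layer initialized to the identity map."""
    new = nn.Linear(width, width)
    nn.init.eye_(new.weight)   # identity weights: output is unchanged
    nn.init.zeros_(new.bias)
    layers = list(seq.children())
    layers.insert(index, new)
    return nn.Sequential(*layers)
```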
- Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304]
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
arXiv Detail & Related papers (2023-03-07T18:59:38Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning (a sketch of the checkpoint-flattening step follows this entry).
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
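As a hedged illustration of what "training a generative model on the parameters" can mean in practice, the hypothetical helper below flattens one checkpoint into one sample vector; the paper's actual data pipeline may differ.

```python
import torch

def checkpoint_to_vector(state_dict: dict) -> torch.Tensor:
    """Flatten a model checkpoint into a single parameter vector,
    i.e. one training sample for a generative model over checkpoints."""
    return torch.cat([t.detach().flatten().float()
                      for t in state_dict.values()])
```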
- SAR Despeckling Using Overcomplete Convolutional Networks [53.99620005035804]
Despeckling is an important problem in remote sensing, as speckle degrades SAR images.
Recent studies show that convolutional neural networks (CNNs) outperform classical despeckling methods.
This study employs an overcomplete CNN architecture to focus on learning low-level features by restricting the receptive field (the structural idea is sketched after this entry).
We show that the proposed network improves despeckling performance compared to recent despeckling methods on synthetic and real SAR images.
arXiv Detail & Related papers (2022-05-31T15:55:37Z)
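The structural idea, sketched under our own assumptions (this is not the paper's architecture): an encoder that upsamples instead of downsampling is "overcomplete", and with small 3x3 kernels its receptive field stays narrow, biasing the network toward low-level features such as speckle.

```python
import torch.nn as nn

# Overcomplete encoder sketch: spatial size grows, so each output
# location sees a smaller fraction of the input image.
encoder = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Upsample(scale_factor=2, mode="nearest"),  # grow, don't shrink
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
)
```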
- Stochastic Neural Networks with Infinite Width are Deterministic [7.07065078444922]
We study stochastic neural networks, a main type of neural network in use.
We prove that as the width of an optimized neural network tends to infinity, its predictive variance on the training set decreases to zero.
arXiv Detail & Related papers (2022-01-30T04:52:31Z)
- $S^3$: Sign-Sparse-Shift Reparametrization for Effective Training of Low-bit Shift Networks [41.54155265996312]
Shift neural networks reduce complexity by removing expensive multiplication operations and quantizing continuous weights into low-bit discrete values.
Our proposed training method pushes the boundaries of shift neural networks and shows that 3-bit shift networks outperform their full-precision counterparts in top-1 accuracy on ImageNet (the quantization idea is sketched after this entry).
arXiv Detail & Related papers (2021-07-07T19:33:02Z)
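To show what a shift network computes with, here is a naive post-hoc quantization sketch: each weight becomes sign(w) * 2^p for an integer exponent p, so multiplications reduce to sign flips and bit-shifts. The exponent range is our assumption; the actual $S^3$ method instead reparameterizes weights into sign, sparsity and shift terms trained end-to-end, which this does not reproduce.

```python
import torch

def quantize_to_shift(w: torch.Tensor, bits: int = 3) -> torch.Tensor:
    """Map each weight to sign(w) * 2^p with integer exponent p."""
    sign = torch.sign(w)
    # Round log2|w| to the nearest integer; clamp guards against log(0).
    exp = torch.round(torch.log2(w.abs().clamp(min=1e-8)))
    # Keep exponents in a small range a low-bit code could address.
    exp = exp.clamp(min=-(2 ** (bits - 1)), max=0)
    return sign * torch.pow(2.0, exp)
```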
- Spline parameterization of neural network controls for deep learning [0.0]
We choose a fixed number of B-spline basis functions whose coefficients are the trainable parameters of the neural network.
We numerically show that the spline-based neural network increases the robustness of the learning problem with respect to hyperparameters (the parameterization is sketched after this entry).
arXiv Detail & Related papers (2021-02-27T19:35:45Z)
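A small NumPy/SciPy sketch of the parameterization idea: a handful of trainable B-spline coefficients generate one control value per layer, so the number of trainable parameters is decoupled from network depth. All names and sizes here are illustrative.

```python
import numpy as np
from scipy.interpolate import BSpline

degree, n_coeff, n_layers = 3, 8, 10
# Clamped knot vector: length must equal n_coeff + degree + 1.
knots = np.concatenate([np.zeros(degree),
                        np.linspace(0.0, 1.0, n_coeff - degree + 1),
                        np.ones(degree)])
coeffs = np.random.randn(n_coeff)   # these would be trained
spline = BSpline(knots, coeffs, degree)
# Evaluate the spline at each layer's depth to get its control value.
layer_controls = spline(np.linspace(0.0, 1.0, n_layers))
```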
- SeReNe: Sensitivity based Regularization of Neurons for Structured Sparsity in Neural Networks [13.60023740064471]
SeReNe is a method for learning sparse network topologies with structure.
We define the sensitivity of a neuron as the variation of the network output with respect to the variation of the activity of the neuron.
By including the neuron sensitivity in the cost function as a regularization term, we are able to prune neurons with low sensitivity (a per-neuron score is sketched after this entry).
arXiv Detail & Related papers (2021-02-07T10:53:30Z)
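One plausible per-neuron sensitivity score, sketched under our own assumptions: the batch-averaged magnitude of the network output's gradient with respect to a neuron's post-activation, where `activations` is assumed to have been captured during the forward pass (e.g. via a hook). The paper's exact definition and regularization term differ.

```python
import torch

def neuron_sensitivity(output: torch.Tensor,
                       activations: torch.Tensor) -> torch.Tensor:
    """One sensitivity score per neuron: how strongly the network
    output varies with that neuron's activation, averaged over the batch."""
    grads = torch.autograd.grad(output.sum(), activations,
                                retain_graph=True)[0]
    return grads.abs().mean(dim=0)
```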
- Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the gradient flow of the loss function.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z)
- Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks [107.77595511218429]
In this paper, we investigate the empirical Rademacher complexity related to intermediate layers of deep neural networks.
We propose a feature distortion method (Disout) for addressing the aforementioned problem.
We analyze and demonstrate that the proposed feature map distortion produces deep neural networks with higher test performance (a simplified variant is sketched after this entry).
arXiv Detail & Related papers (2020-02-23T13:59:13Z)
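A simplified variant in the spirit of feature map distortion, under our own assumptions (this sketch does not reproduce the paper's distortion, which is derived from the Rademacher complexity analysis): perturb a random subset of feature-map elements with noise instead of zeroing them as dropout does.

```python
import torch

def feature_distortion(x: torch.Tensor, p: float = 0.1,
                       alpha: float = 0.5, training: bool = True) -> torch.Tensor:
    """Add noise to a random fraction p of feature-map elements;
    alpha scales the noise relative to the feature statistics."""
    if not training:
        return x
    mask = (torch.rand_like(x) < p).float()
    noise = alpha * x.std() * torch.randn_like(x)
    return x + mask * noise
```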
This list is automatically generated from the titles and abstracts of the papers on this site.