Regularization-based Pruning of Irrelevant Weights in Deep Neural
Architectures
- URL: http://arxiv.org/abs/2204.04977v1
- Date: Mon, 11 Apr 2022 09:44:16 GMT
- Title: Regularization-based Pruning of Irrelevant Weights in Deep Neural
Architectures
- Authors: Giovanni Bonetta, Matteo Ribero and Rossella Cancelliere
- Abstract summary: We propose a method for learning sparse neural topologies via a regularization technique which identifies non-relevant weights and selectively shrinks their norm.
We tested the proposed technique on different image classification and natural language generation tasks, obtaining results on par with or better than competitors in terms of sparsity and metrics.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks exploiting millions of parameters are nowadays the norm
in deep learning applications. This is a potential issue because of the great
amount of computational resources needed for training, and of the possible loss
of generalization performance of overparametrized networks. In this paper we
propose a method for learning sparse neural topologies via a regularization
technique which identifies non-relevant weights and selectively shrinks their
norm, while performing a classic update for relevant ones. This technique,
which is an improvement of classical weight decay, is based on the definition
of a regularization term which can be added to any loss functional regardless
of its form, resulting in a unified general framework exploitable in many
different contexts. The actual elimination of parameters identified as
irrelevant is handled by an iterative pruning algorithm. We tested the proposed
technique on different image classification and natural language generation
tasks, obtaining results on par with or better than competitors in terms of
sparsity and task metrics, while achieving strong model compression.
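As a rough illustration of the approach outlined in the abstract, the following is a minimal sketch of selective weight decay combined with iterative magnitude pruning, assuming PyTorch; the relevance threshold `tau` and coefficient `lam` are hypothetical placeholders, and the paper's actual regularization term and pruning schedule may differ.

```python
import torch

def selective_weight_decay(model, lam=1e-4, tau=1e-2):
    """Penalty that shrinks only weights flagged as irrelevant.

    Weights whose magnitude falls below the hypothetical threshold `tau`
    are treated as irrelevant and have their squared norm penalized;
    relevant weights are driven only by the ordinary task loss.
    """
    penalty = 0.0
    for p in model.parameters():
        mask = (p.detach().abs() < tau).float()   # 1 for irrelevant weights
        penalty = penalty + (mask * p).pow(2).sum()
    return lam * penalty

def magnitude_prune(model, tau=1e-2):
    """Iterative pruning step: zero out weights already driven close to zero."""
    with torch.no_grad():
        for p in model.parameters():
            p.mul_((p.abs() >= tau).float())

# Training-loop sketch: the penalty can be added to any task loss.
# for x, y in loader:
#     loss = criterion(model(x), y) + selective_weight_decay(model)
#     loss.backward(); optimizer.step(); optimizer.zero_grad()
# magnitude_prune(model)   # applied periodically, then training continues
```

Because the mask is detached, relevant weights receive only the task-loss gradient while irrelevant ones are additionally pulled towards zero, which mirrors the selective behaviour the abstract describes.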
Related papers
- Convolutional Dictionary Learning by End-To-End Training of Iterative
Neural Networks [3.6280929178575994]
In this work, we construct an INN which can be used as a supervised and physics-informed online convolutional dictionary learning algorithm.
We show that the proposed INN improves over two conventional model-agnostic training methods and also yields competitive results compared to a deep INN.
arXiv Detail & Related papers (2022-06-09T12:15:38Z) - Learning Non-Vacuous Generalization Bounds from Optimization [8.294831479902658]
We present a simple yet non-vacuous generalization bound from the optimization perspective.
We achieve this goal by leveraging the fact that the hypothesis set accessed by gradient algorithms is essentially fractal-like.
Numerical studies demonstrate that our approach is able to yield plausible generalization guarantees for modern neural networks.
arXiv Detail & Related papers (2022-06-09T08:59:46Z) - Training Thinner and Deeper Neural Networks: Jumpstart Regularization [2.8348950186890467]
We use regularization to prevent neurons from dying or becoming linear.
In comparison to conventional training, we obtain neural networks that are thinner, deeper, and - most importantly - more parameter-efficient.
arXiv Detail & Related papers (2022-01-30T12:11:24Z) - Subquadratic Overparameterization for Shallow Neural Networks [60.721751363271146]
We provide an analytical framework that allows us to adopt standard neural training strategies.
We achieve the desiderata via the Polyak-Lojasiewicz condition, smoothness, and standard assumptions.
arXiv Detail & Related papers (2021-11-02T20:24:01Z) - Understanding the Generalization of Adam in Learning Neural Networks
with Proper Regularization [118.50301177912381]
We show that Adam can converge to different solutions of the objective with provably different errors, even with weight decay regularization.
We show that if the objective is convex and weight decay regularization is employed, any optimization algorithm, including Adam, will converge to the same solution.
arXiv Detail & Related papers (2021-08-25T17:58:21Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Analytically Tractable Inference in Deep Neural Networks [0.0]
The Tractable Approximate Gaussian Inference (TAGI) algorithm was shown to be a viable and scalable alternative to backpropagation for shallow fully-connected neural networks.
We demonstrate that TAGI matches or exceeds the performance of backpropagation for training classic deep neural network architectures.
arXiv Detail & Related papers (2021-03-09T14:51:34Z) - Compressive Sensing and Neural Networks from a Statistical Learning
Perspective [4.561032960211816]
We present a generalization error analysis for a class of neural networks suitable for sparse reconstruction from few linear measurements.
Under realistic conditions, the generalization error scales only logarithmically in the number of layers and at most linearly in the number of measurements.
arXiv Detail & Related papers (2020-10-29T15:05:43Z) - AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z) - Beyond Dropout: Feature Map Distortion to Regularize Deep Neural
Networks [107.77595511218429]
In this paper, we investigate the empirical Rademacher complexity related to intermediate layers of deep neural networks.
We propose a feature distortion method (Disout) for addressing the aforementioned problem.
The superiority of the proposed feature map distortion for producing deep neural networks with higher testing performance is analyzed and demonstrated.
arXiv Detail & Related papers (2020-02-23T13:59:13Z) - Distance-Based Regularisation of Deep Networks for Fine-Tuning [116.71288796019809]
We develop an algorithm that constrains a hypothesis class to a small sphere centred on the initial pre-trained weights.
Empirical evaluation shows that our algorithm works well, corroborating our theoretical results.
arXiv Detail & Related papers (2020-02-19T16:00:47Z)
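For the last entry above (Distance-Based Regularisation of Deep Networks for Fine-Tuning), the following is a minimal sketch of the underlying idea, projecting fine-tuned weights back onto an L2 ball centred on the pre-trained initialisation, assuming PyTorch and a single hypothetical global radius `radius`; the authors' actual algorithm may enforce the constraint differently.

```python
import torch

def make_ball_projection(model, radius=1.0):
    """Build a projection step that keeps the fine-tuned weights inside an
    L2 ball of the given (hypothetical) radius centred on the pre-trained
    weights captured at construction time."""
    anchor = {n: p.detach().clone() for n, p in model.named_parameters()}

    def project():
        with torch.no_grad():
            # Global Euclidean distance from the pre-trained initialisation.
            sq_dist = sum((p - anchor[n]).pow(2).sum()
                          for n, p in model.named_parameters())
            dist = sq_dist.sqrt()
            if dist > radius:
                scale = radius / dist
                for n, p in model.named_parameters():
                    p.copy_(anchor[n] + scale * (p - anchor[n]))

    return project

# Usage sketch: project after every optimizer step during fine-tuning.
# project = make_ball_projection(model, radius=1.0)
# loss.backward(); optimizer.step(); project()
```

Calling `project()` after each update keeps the fine-tuned model within the chosen distance of its pre-trained weights, which is the kind of hypothesis-class constraint the entry describes.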