Learning k-Level Sparse Neural Networks Using a New Generalized Weighted
Group Sparse Envelope Regularization
- URL: http://arxiv.org/abs/2212.12921v3
- Date: Tue, 3 Oct 2023 11:56:32 GMT
- Title: Learning k-Level Sparse Neural Networks Using a New Generalized Weighted
Group Sparse Envelope Regularization
- Authors: Yehonathan Refael and Iftach Arbel and Wasim Huleihel
- Abstract summary: We propose an efficient method to learn both unstructured and structured sparse neural networks during training.
We use a novel generalization of the sparse envelope function (SEF) as a regularizer, termed the weighted group sparse envelope function (WGSEF).
The method ensures a hardware-friendly structured sparsity of a deep neural network (DNN) to efficiently accelerate the DNN's evaluation.
- Score: 4.557963624437785
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an efficient method to learn both unstructured and structured
sparse neural networks during training, utilizing a novel generalization of the
sparse envelope function (SEF) used as a regularizer, termed {\itshape{weighted
group sparse envelope function}} (WGSEF). The WGSEF acts as a neuron group
selector, which is leveraged to induce structured sparsity. The method ensures
a hardware-friendly structured sparsity of a deep neural network (DNN) to
efficiently accelerate the DNN's evaluation. Notably, the method is adaptable,
letting any hardware specify group definitions, such as filters, channels,
filter shapes, layer depths, a single parameter (unstructured), etc. Owing to
the WGSEF's properties, the proposed method allows one to pre-define the sparsity
level that will be achieved at training convergence, while maintaining
negligible network accuracy degradation or even improvement in the case of
redundant parameters. We introduce an efficient technique to calculate the
exact value of the WGSEF along with its proximal operator in a worst-case
complexity of $O(n)$, where $n$ is the total number of group variables. In
addition, we propose a proximal-gradient-based optimization method to train the
model, that is, the non-convex minimization of the sum of the neural network
loss and the WGSEF. Finally, we conduct an experiment and illustrate the
efficiency of our proposed technique in terms of the compression ratio,
accuracy, and inference latency.
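As a rough illustration of the training procedure described above, the sketch below alternates a gradient step on the network loss with a proximal step on the parameters, i.e. roughly w_{t+1} = prox_{lambda*WGSEF}(w_t - eta * grad L(w_t)). The proximal step shown here is a simplified, unweighted group selector (keep the k groups with the largest norm, zero the rest), written as the hypothetical helper group_prox_topk; it is a stand-in for, not a reproduction of, the paper's exact O(n) WGSEF proximal operator.

# Sketch of proximal-gradient training with a simplified group-sparsity prox.
# NOTE: group_prox_topk is a hypothetical stand-in, NOT the paper's exact
# WGSEF proximal operator; it merely keeps the k largest groups by norm.
import torch

def group_prox_topk(weight: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k output-channel groups with the largest L2 norm, zero the rest."""
    flat = weight.view(weight.shape[0], -1)      # one group per output channel
    norms = flat.norm(dim=1)
    keep = torch.topk(norms, min(k, norms.numel())).indices
    mask = torch.zeros_like(norms)
    mask[keep] = 1.0
    return (flat * mask.unsqueeze(1)).view_as(weight)

def train_step(model, loss_fn, x, y, lr: float, k: int):
    """One proximal-gradient step: gradient step on the loss, then group prox."""
    loss = loss_fn(model(x), y)
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            p -= lr * p.grad              # gradient step on the data loss
            if p.dim() >= 2:              # apply the group selector to weight tensors
                p.copy_(group_prox_topk(p, k))
            p.grad.zero_()
    return loss.item()

In the paper, the regularizer additionally supports per-group weights and guarantees that the pre-defined sparsity level is reached at convergence, rather than being hard-enforced at every step as in this sketch.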
Related papers
- Improving Generalization of Deep Neural Networks by Optimum Shifting [33.092571599896814]
We propose a novel method called optimum shifting, which changes the parameters of a neural network from a sharp minimum to a flatter one.
Our method is based on the observation that when the input and output of a neural network are fixed, the matrix multiplications within the network can be treated as systems of under-determined linear equations.
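A toy illustration of that observation (my own sketch, not code from the paper): with the layer's inputs and outputs held fixed, the constraint W X = Y is an under-determined linear system in the weights, so W can be shifted along the null space of X^T without changing the layer's outputs.

# Toy check: shifting a weight matrix along the null space of the (fixed) inputs
# leaves the layer outputs unchanged. Illustrative only; not the paper's method.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))      # layer weights: 4 outputs, 6 inputs
X = rng.normal(size=(6, 3))      # a fixed batch of 3 inputs (fewer samples than inputs)
Y = W @ X                        # fixed layer outputs

# Any direction D with D @ X = 0 keeps the outputs fixed. Build one from the
# null space of X^T (rows of D must be orthogonal to the columns of X).
_, _, Vt = np.linalg.svd(X.T)    # X.T is 3x6, so its null space is 3-dimensional
null_dir = Vt[-1]                # unit vector with null_dir @ X approx. 0
D = np.outer(rng.normal(size=4), null_dir)

W_shifted = W + 0.5 * D          # move the weights within the solution set
print(np.allclose(W_shifted @ X, Y))   # True: outputs are unchanged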
arXiv Detail & Related papers (2024-05-23T02:31:55Z)
- Regularized Gauss-Newton for Optimizing Overparameterized Neural Networks [2.0072624123275533]
The generalized Gauss-Newton (GGN) optimization method incorporates curvature estimates into its solution steps.
This work studies a GGN method for optimizing a two-layer neural network with explicit regularization.
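For context, a generic regularized Gauss-Newton update for a least-squares objective looks as follows; this is a textbook sketch assuming damping-style regularization, and does not reproduce the specific GGN variant the paper analyzes for two-layer networks.

# Generic regularized Gauss-Newton step for a least-squares loss 0.5*||r(theta)||^2.
import numpy as np

def gauss_newton_step(theta, residual_fn, jacobian_fn, damping=1e-2):
    """theta_new = theta - (J^T J + damping*I)^{-1} J^T r(theta)."""
    r = residual_fn(theta)            # residual vector, shape (m,)
    J = jacobian_fn(theta)            # Jacobian of r w.r.t. theta, shape (m, n)
    H = J.T @ J + damping * np.eye(theta.size)   # regularized curvature estimate
    return theta - np.linalg.solve(H, J.T @ r)

# Tiny usage example: fit y = a*x + b.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0
residual = lambda th: th[0] * x + th[1] - y
jacobian = lambda th: np.stack([x, np.ones_like(x)], axis=1)
theta = np.zeros(2)
for _ in range(5):
    theta = gauss_newton_step(theta, residual, jacobian)
print(theta)   # close to [2.0, 1.0]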
arXiv Detail & Related papers (2024-04-23T10:02:22Z)
- Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss as the number of learning epochs increases.
We show that the threshold on the number of training samples increases with the increase in the network width.
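For reference, the soft-thresholding in the title is the proximal operator of the L1 norm, which unfolded ISTA layers apply after a learned linear step. A minimal sketch of the classical (non-smooth) version, not the smooth surrogate analyzed in the paper:

# Standard soft-thresholding and one ISTA iteration for min 0.5*||Ax - b||^2 + lam*||x||_1.
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau*||.||_1: shrink each entry toward zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista_step(x, A, b, lam, step):
    """One ISTA iteration: gradient step on the quadratic term, then shrinkage."""
    grad = A.T @ (A @ x - b)
    return soft_threshold(x - step * grad, step * lam)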
arXiv Detail & Related papers (2023-09-12T13:03:47Z)
- Orthogonal Stochastic Configuration Networks with Adaptive Construction Parameter for Data Analytics [6.940097162264939]
Randomness makes SCNs more likely to generate approximately linearly correlated nodes that are redundant and of low quality.
In light of the fundamental machine-learning principle that a model with fewer parameters generalizes better, this paper proposes orthogonal SCNs, termed OSCN, to filter out low-quality hidden nodes and reduce the network structure.
arXiv Detail & Related papers (2022-05-26T07:07:26Z)
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
- Non-Gradient Manifold Neural Network [79.44066256794187]
A deep neural network (DNN) generally takes thousands of iterations to optimize via gradient descent.
We propose a novel manifold neural network based on non-gradient optimization.
arXiv Detail & Related papers (2021-06-15T06:39:13Z)
- Feature Flow Regularization: Improving Structured Sparsity in Deep Neural Networks [12.541769091896624]
Pruning is a model compression method that removes redundant parameters from deep neural networks (DNNs).
We propose a simple and effective regularization strategy from a new perspective of the evolution of features, which we call feature flow regularization (FFR).
Experiments with VGGNets, ResNets on CIFAR-10/100, and Tiny ImageNet datasets demonstrate that FFR can significantly improve both unstructured and structured sparsity.
arXiv Detail & Related papers (2021-06-05T15:00:50Z)
- Learning to Solve the AC-OPF using Sensitivity-Informed Deep Neural Networks [52.32646357164739]
We propose a deep neural network (DNN) to learn the solutions of the AC optimal power flow (AC-OPF) problem.
The proposed sensitivity-informed DNN (SIDNN) is compatible with a broad range of OPF schemes.
It can be seamlessly integrated in other learning-to-OPF schemes.
arXiv Detail & Related papers (2021-03-27T00:45:23Z)
- Self-grouping Convolutional Neural Networks [30.732298624941738]
We propose a novel method of designing self-grouping convolutional neural networks, called SG-CNN.
For each filter, we first evaluate the importance values of its input channels to obtain an importance vector.
Using the resulting data-dependent centroids, we prune the less important connections, which implicitly minimizes the accuracy loss caused by pruning.
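One plausible reading of that procedure, written as a hedged sketch with hypothetical helper names rather than the authors' implementation: score each filter's input channels, cluster the per-filter importance vectors to obtain centroids, and zero the connections whose cluster-level importance is smallest.

# Hedged sketch of a self-grouping-style pruning step for one conv layer.
# Hypothetical interpretation of the blurb above, not the authors' code.
import numpy as np
from sklearn.cluster import KMeans

def self_group_prune(weight: np.ndarray, n_groups: int = 4, keep_ratio: float = 0.5):
    """weight: (out_channels, in_channels, kh, kw). Returns a pruned copy."""
    out_c, in_c = weight.shape[0], weight.shape[1]
    # Importance vector of each filter: mean |weight| over each input channel.
    importance = np.abs(weight).mean(axis=(2, 3))          # (out_c, in_c)
    # Cluster filters by their importance vectors to get group centroids.
    labels = KMeans(n_clusters=n_groups, n_init=10).fit_predict(importance)
    pruned = weight.copy()
    for g in range(n_groups):
        members = np.where(labels == g)[0]
        if members.size == 0:
            continue
        centroid = importance[members].mean(axis=0)        # (in_c,)
        n_keep = max(1, int(keep_ratio * in_c))
        drop = np.argsort(centroid)[:in_c - n_keep]        # least important channels
        pruned[np.ix_(members, drop)] = 0.0                # prune them for the whole group
    return pruned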
arXiv Detail & Related papers (2020-09-29T06:24:32Z)
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based learning combined with nonconvexity renders learning susceptible to poor initialization.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)