Learning k-Level Sparse Neural Networks Using a New Generalized Weighted
Group Sparse Envelope Regularization
- URL: http://arxiv.org/abs/2212.12921v3
- Date: Tue, 3 Oct 2023 11:56:32 GMT
- Title: Learning k-Level Sparse Neural Networks Using a New Generalized Weighted
Group Sparse Envelope Regularization
- Authors: Yehonathan Refael and Iftach Arbel and Wasim Huleihel
- Abstract summary: We propose an efficient method to learn both unstructured and structured sparse neural networks during training.
We use a novel generalization of the sparse envelope function (SEF) as a regularizer, termed the weighted group sparse envelope function (WGSEF).
The method ensures a hardware-friendly structured sparsity of a deep neural network (DNN) to efficiently accelerate the DNN's evaluation.
- Score: 4.557963624437785
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an efficient method to learn both unstructured and structured
sparse neural networks during training, utilizing a novel generalization of the
sparse envelope function (SEF) used as a regularizer, termed {\itshape{weighted
group sparse envelope function}} (WGSEF). The WGSEF acts as a neuron group
selector, which is leveraged to induce structured sparsity. The method ensures
a hardware-friendly structured sparsity of a deep neural network (DNN) to
efficiently accelerate the DNN's evaluation. Notably, the method is adaptable,
letting any hardware specify group definitions, such as filters, channels,
filter shapes, layer depths, a single parameter (unstructured), etc. Owing to
the WGSEF's properties, the proposed method allows one to pre-define the sparsity
level that will be achieved at training convergence, while maintaining
negligible network accuracy degradation or even improvement in the case of
redundant parameters. We introduce an efficient technique to calculate the
exact value of the WGSEF along with its proximal operator in a worst-case
complexity of $O(n)$, where $n$ is the total number of group variables. In
addition, we propose a proximal-gradient-based optimization method to train the
model, that is, the non-convex minimization of the sum of the neural network
loss and the WGSEF. Finally, we conduct an experiment and illustrate the
efficiency of our proposed technique in terms of the compression ratio,
accuracy, and inference latency.
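To make the training procedure concrete, below is a minimal, hypothetical sketch of one proximal-gradient update combined with a group-sparsity step. The hard top-k group projection used here is a simplified stand-in for the exact WGSEF proximal operator (which the paper computes in O(n) worst-case time); the function names, the PyTorch framework, and the group representation are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a gradient step on the task loss
# followed by a group-sparsity step. The hard top-k group projection below
# is a simplified stand-in for the exact WGSEF proximal operator.
import torch


def topk_group_prox(weights, groups, k):
    """Keep the k groups with the largest L2 norm and zero out the rest.

    weights: 1-D tensor holding all parameters of one layer.
    groups:  list of LongTensors, each indexing one group (filter, channel, ...).
    k:       target number of non-zero groups (the pre-defined sparsity level).
    """
    norms = torch.stack([weights[g].norm() for g in groups])
    keep = torch.topk(norms, min(k, len(groups))).indices
    mask = torch.zeros_like(weights)
    for i in keep:
        mask[groups[i]] = 1.0
    return weights * mask


def proximal_gradient_step(weights, grad, lr, groups, k):
    """Gradient step on the network loss, then the group-sparsity prox."""
    with torch.no_grad():
        return topk_group_prox(weights - lr * grad, groups, k)


# Toy usage: 8 parameters split into 4 groups of 2, keeping k = 2 groups.
w = torch.randn(8, requires_grad=True)
loss = (w ** 2).sum()          # placeholder for the actual network loss
loss.backward()
groups = [torch.tensor([0, 1]), torch.tensor([2, 3]),
          torch.tensor([4, 5]), torch.tensor([6, 7])]
w_new = proximal_gradient_step(w.detach(), w.grad, lr=0.1, groups=groups, k=2)
```

The overall loop structure (gradient step on the loss, then a sparsity-inducing proximal step) matches the method described in the abstract; the paper's WGSEF prox is weighted and computed exactly rather than via this hard selection.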
Related papers
- FFGAF-SNN: The Forward-Forward Based Gradient Approximation Free Training Framework for Spiking Neural Networks [7.310627646090302]
Spiking Neural Networks (SNNs) offer a biologically plausible framework for energy-efficient neuromorphic computing. It is challenging to train SNNs efficiently due to their non-differentiability. We propose a Forward-Forward (FF) based gradient approximation-free training framework for Spiking Neural Networks.
arXiv Detail & Related papers (2025-07-31T15:22:23Z) - Reconstructing Deep Neural Networks: Unleashing the Optimization Potential of Natural Gradient Descent [12.00557940490703]
We propose a novel optimization method for training deep neural networks called structured natural gradient descent (SNGD)
Our proposed method has the potential to significantly improve the scalability and efficiency of NGD in deep learning applications.
arXiv Detail & Related papers (2024-12-10T11:57:47Z) - Complexity-Aware Training of Deep Neural Networks for Optimal Structure Discovery [0.0]
We propose a novel algorithm for combined unit/filter and layer pruning of deep neural networks that functions during training and without requiring a pre-trained network to apply.
Our algorithm optimally trades off learning accuracy and pruning levels while balancing layer vs. unit/filter pruning and computational vs. parameter complexity using only three user-defined parameters.
arXiv Detail & Related papers (2024-11-14T02:00:22Z) - Improving Generalization of Deep Neural Networks by Optimum Shifting [33.092571599896814]
We propose a novel method called optimum shifting, which changes the parameters of a neural network from a sharp minimum to a flatter one.
Our method is based on the observation that when the input and output of a neural network are fixed, the matrix multiplications within the network can be treated as systems of under-determined linear equations.
arXiv Detail & Related papers (2024-05-23T02:31:55Z) - Fixing the NTK: From Neural Network Linearizations to Exact Convex
Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
arXiv Detail & Related papers (2023-09-26T17:42:52Z) - Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth
Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss with the increase in the number of learning epochs.
We show that the threshold on the number of training samples increases with the increase in the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z) - Leveraging power grid topology in machine learning assisted optimal
power flow [0.5076419064097734]
Machine learning assisted optimal power flow (OPF) aims to reduce the computational complexity of non-linear and non-convex constrained power flow problems.
We assess the performance of a variety of FCNN, CNN and GNN models for two fundamental approaches to machine learning assisted OPF.
For several synthetic grids with interconnected utilities, we show that locality properties between feature and target variables are scarce.
arXiv Detail & Related papers (2021-10-01T10:39:53Z) - Efficient Micro-Structured Weight Unification and Pruning for Neural
Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially for resource limited devices.
Previous unstructured or structured weight pruning methods rarely deliver true inference acceleration.
We propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high degree of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z) - Learning to Solve the AC-OPF using Sensitivity-Informed Deep Neural
Networks [52.32646357164739]
We propose a sensitivity-informed deep neural network (SIDNN) to learn the solutions of the AC optimal power flow (AC-OPF)
The proposed SIDNN is compatible with a broad range of OPF schemes.
It can be seamlessly integrated in other learning-to-OPF schemes.
arXiv Detail & Related papers (2021-03-27T00:45:23Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs)
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - Resource Allocation via Graph Neural Networks in Free Space Optical
Fronthaul Networks [119.81868223344173]
This paper investigates the optimal resource allocation in free space optical (FSO) fronthaul networks.
We consider the graph neural network (GNN) for the policy parameterization to exploit the FSO network structure.
The primal-dual learning algorithm is developed to train the GNN in a model-free manner, where the knowledge of system models is not required.
arXiv Detail & Related papers (2020-06-26T14:20:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.