Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth
Soft-Thresholding
- URL: http://arxiv.org/abs/2309.06195v1
- Date: Tue, 12 Sep 2023 13:03:47 GMT
- Title: Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth
Soft-Thresholding
- Authors: Shaik Basheeruddin Shah, Pradyumna Pradhan, Wei Pu, Ramunaidu Randhi,
Miguel R. D. Rodrigues, Yonina C. Eldar
- Abstract summary: We study optimization guarantees, i.e., achieving near-zero training loss as the number of learning epochs grows.
We show that the threshold on the number of training samples increases with the network width.
- Score: 57.71603937699949
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Solving linear inverse problems plays a crucial role in numerous
applications. Algorithm unfolding based, model-aware data-driven approaches
have gained significant attention for effectively addressing these problems.
Learned iterative soft-thresholding algorithm (LISTA) and alternating direction
method of multipliers compressive sensing network (ADMM-CSNet) are two widely
used such approaches, based on ISTA and ADMM algorithms, respectively. In this
work, we study optimization guarantees, i.e., achieving near-zero training loss
as the number of learning epochs grows, for finite-layer unfolded networks such
as LISTA and ADMM-CSNet with smooth soft-thresholding in an over-parameterized
(OP) regime. We achieve this by leveraging a modified version of the
Polyak-Lojasiewicz condition, denoted PL$^*$. Satisfying the
PL$^*$ condition within a specific region of the loss landscape ensures the
existence of a global minimum and exponential convergence from initialization
using gradient descent based methods. Hence, we provide conditions, in terms of
the network width and the number of training samples, on these unfolded
networks for the PL$^*$ condition to hold. We achieve this by deriving the
Hessian spectral norm of these networks. Additionally, we show that the
threshold on the number of training samples increases with the network width.
Furthermore, we compare the threshold on training samples of
unfolded networks with that of a standard fully-connected feed-forward network
(FFNN) with smooth soft-thresholding non-linearity. We prove that unfolded
networks have a higher threshold value than the FFNN. Consequently, one can
expect a lower expected error for unfolded networks than for the FFNN.
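To make the analyzed architecture concrete, here is a minimal NumPy sketch (not the authors' code) of a finite-layer unfolded ISTA network in the LISTA style with a smooth soft-thresholding non-linearity. The softplus-based surrogate of soft-thresholding, the ISTA-derived choice of the weight matrices and threshold, and all numerical values are illustrative assumptions; in LISTA these quantities are instead learned by gradient descent, and the paper's PL$^*$ condition, which requires $\|\nabla L(w)\|^2 \ge \mu L(w)$ for some $\mu > 0$ on a region around the initialization and thereby gives exponential decay of the training loss, concerns that training process.

```python
# Minimal sketch of an unfolded ISTA (LISTA-style) network with a smooth
# soft-thresholding activation. Illustrative only: the exact smooth activation,
# parameterization, and initialization in the paper may differ.
import numpy as np

def smooth_soft_threshold(x, lam, beta=10.0):
    """Softplus-based smooth surrogate of sign(x) * max(|x| - lam, 0);
    it approaches the non-smooth soft-thresholding operator as beta -> infinity."""
    return (np.logaddexp(0.0, beta * (x - lam))
            - np.logaddexp(0.0, -beta * (x + lam))) / beta

class UnfoldedISTA:
    """Finitely many unfolded iterations x_{k+1} = sigma(W1 @ y + W2 @ x_k),
    with W1, W2 and the threshold initialized from ISTA applied to y = A x."""
    def __init__(self, A, lam=0.1, num_layers=10, beta=10.0):
        n = A.shape[1]
        step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1/L with L = ||A||_2^2
        self.W1 = step * A.T                        # acts on the measurement y
        self.W2 = np.eye(n) - step * (A.T @ A)      # acts on the current iterate
        self.lam = step * lam                       # effective threshold
        self.num_layers, self.beta = num_layers, beta

    def forward(self, y):
        x = np.zeros(self.W2.shape[0])
        for _ in range(self.num_layers):            # finite number of layers
            x = smooth_soft_threshold(self.W1 @ y + self.W2 @ x,
                                      self.lam, self.beta)
        return x

# Toy usage: reconstruct a sparse vector from noisy linear measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 100)) / np.sqrt(30)
x_true = np.zeros(100)
x_true[rng.choice(100, 5, replace=False)] = 1.0
y = A @ x_true + 0.01 * rng.standard_normal(30)
net = UnfoldedISTA(A)
print("reconstruction error:", np.linalg.norm(net.forward(y) - x_true))
```

The same skeleton, with W1, W2 and the threshold made learnable per layer and trained by gradient descent on (y, x) pairs, is the setting in which the paper's conditions on the network width and the number of training samples for the PL$^*$ property are derived.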
Related papers
- Concurrent Training and Layer Pruning of Deep Neural Networks [0.0]
We propose an algorithm capable of identifying and eliminating irrelevant layers of a neural network during the early stages of training.
We employ a structure with residual connections around nonlinear network sections, allowing information to continue flowing through the network once a nonlinear section is pruned.
arXiv Detail & Related papers (2024-06-06T23:19:57Z)
- Adaptive Multilevel Neural Networks for Parametric PDEs with Error Estimation [0.0]
A neural network architecture is presented to solve high-dimensional parameter-dependent partial differential equations (pPDEs).
It is constructed to map parameters of the model data to corresponding finite element solutions.
It outputs a coarse grid solution and a series of corrections, as produced in an adaptive finite element method (AFEM).
arXiv Detail & Related papers (2024-03-19T11:34:40Z)
- Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
arXiv Detail & Related papers (2023-09-26T17:42:52Z)
- Learning k-Level Structured Sparse Neural Networks Using Group Envelope Regularization [4.0554893636822]
We introduce a novel approach to deploy large-scale Deep Neural Networks on constrained resources.
The method speeds up inference and aims to reduce memory demand and power consumption.
arXiv Detail & Related papers (2022-12-25T15:40:05Z)
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
- Algorithms for Efficiently Learning Low-Rank Neural Networks [12.916132936159713]
We study algorithms for learning low-rank neural networks.
We present a provably efficient algorithm which learns an optimal low-rank approximation to a single-hidden-layer ReLU network.
We propose a novel low-rank framework for training low-rank $\textit{deep}$ networks.
arXiv Detail & Related papers (2022-02-02T01:08:29Z)
- Efficient power allocation using graph neural networks and deep algorithm unfolding [40.78748956518785]
We study the problem of optimal power allocation in a single-hop ad hoc wireless network.
We propose a hybrid neural architecture inspired by the unfolding of the weighted minimum mean squared error (WMMSE) algorithm.
We show that UWMMSE achieves robustness comparable to that of WMMSE while significantly reducing the computational complexity.
arXiv Detail & Related papers (2020-11-18T05:28:24Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires far fewer communication rounds while retaining its theoretical guarantees.
Experiments on several datasets demonstrate the effectiveness of our method and corroborate the theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based training combined with the nonconvexity of the underlying optimization problem renders learning susceptible to initialization.
We propose fusing neighboring layers of deeper networks that are trained from random initializations.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)