Precise asymptotics of reweighted least-squares algorithms for linear diagonal networks
- URL: http://arxiv.org/abs/2406.02769v1
- Date: Tue, 4 Jun 2024 20:37:17 GMT
- Title: Precise asymptotics of reweighted least-squares algorithms for linear diagonal networks
- Authors: Chiraag Kaushik, Justin Romberg, Vidya Muthukumar
- Abstract summary: We provide a unified asymptotic analysis for a family of algorithms that encompasses IRLS, the recently proposed lin-RFM algorithm, and alternating minimization on linear diagonal neural networks.
We show that, with an appropriately chosen reweighting policy, the algorithm can achieve favorable performance in only a handful of iterations.
We also show that leveraging group-sparse structure in the reweighting scheme provably improves test error compared to coordinate-wise reweighting.
- Score: 15.074950361970194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The classical iteratively reweighted least-squares (IRLS) algorithm aims to recover an unknown signal from linear measurements by performing a sequence of weighted least squares problems, where the weights are recursively updated at each step. Varieties of this algorithm have been shown to achieve favorable empirical performance and theoretical guarantees for sparse recovery and $\ell_p$-norm minimization. Recently, some preliminary connections have also been made between IRLS and certain types of non-convex linear neural network architectures that are observed to exploit low-dimensional structure in high-dimensional linear models. In this work, we provide a unified asymptotic analysis for a family of algorithms that encompasses IRLS, the recently proposed lin-RFM algorithm (which was motivated by feature learning in neural networks), and the alternating minimization algorithm on linear diagonal neural networks. Our analysis operates in a "batched" setting with i.i.d. Gaussian covariates and shows that, with appropriately chosen reweighting policy, the algorithm can achieve favorable performance in only a handful of iterations. We also extend our results to the case of group-sparse recovery and show that leveraging this structure in the reweighting scheme provably improves test error compared to coordinate-wise reweighting.
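A minimal sketch of the IRLS template described in the abstract, to make the "weighted least-squares solve + recursive weight update" loop concrete. It assumes a noiseless, underdetermined sparse-recovery setting and uses ε-smoothed magnitude weights as an illustrative reweighting policy; the paper analyzes a broader family of policies (including lin-RFM-style and group-wise updates), so this is not the exact algorithm studied there.
```python
# Illustrative IRLS sketch for sparse recovery from linear measurements y = X w*.
# Assumption: underdetermined system (n < d); np.linalg.lstsq returns the
# minimum-norm solution, so each step solves a weighted least-squares problem.
import numpy as np

def irls(X, y, n_iters=10, eps=1e-3):
    n, d = X.shape
    weights = np.ones(d)                      # per-coordinate reweighting
    w = np.zeros(d)
    for _ in range(n_iters):
        # Weighted least-squares step: substitute w = D v with D = diag(sqrt(weights));
        # the minimum-norm v minimizes sum_i w_i^2 / weights_i among solutions fitting y.
        D = np.sqrt(weights)
        v, *_ = np.linalg.lstsq(X * D, y, rcond=None)
        w = D * v
        # Recursive weight update (epsilon-smoothed magnitudes approximate the l1 bias);
        # a group-sparse variant would share ||w_g|| + eps across coordinates in group g.
        weights = np.abs(w) + eps
    return w

# Example usage on a synthetic sparse problem with i.i.d. Gaussian covariates.
rng = np.random.default_rng(0)
n, d, k = 50, 200, 5
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:k] = rng.standard_normal(k)
w_hat = irls(X, X @ w_true)
print(np.linalg.norm(w_hat - w_true))
```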
Related papers
- Robust empirical risk minimization via Newton's method [9.797319790710711]
A new variant of Newton's method for empirical risk minimization is studied.
The gradient and Hessian of the objective function are replaced by robust estimators.
An algorithm for obtaining robust Newton directions based on the conjugate gradient method is also proposed.
arXiv Detail & Related papers (2023-01-30T18:54:54Z) - Implicit Regularization for Group Sparsity [33.487964460794764]
We show that gradient descent over the squared regression loss, without any explicit regularization, biases towards solutions with a group sparsity structure.
We analyze the gradient dynamics of the corresponding regression problem in the general noise setting and obtain minimax-optimal error rates.
In the degenerate case of size-one groups, our approach gives rise to a new algorithm for sparse linear regression.
arXiv Detail & Related papers (2023-01-29T20:54:03Z) - A Recursively Recurrent Neural Network (R2N2) Architecture for Learning
Iterative Algorithms [64.3064050603721]
We generalize the Runge-Kutta neural network to a recursively recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms.
We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields similar iterations to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta solvers for ordinary differential equations.
arXiv Detail & Related papers (2022-11-22T16:30:33Z) - An Inexact Augmented Lagrangian Algorithm for Training Leaky ReLU Neural
Network with Group Sparsity [13.27709100571336]
Leaky ReLU networks with a group sparsity regularization term have been widely used in recent years.
We note that existing approaches lack a way to deterministically compute a stationary point of this model.
We propose an inexact augmented Lagrangian algorithm for solving the new model.
arXiv Detail & Related papers (2022-05-11T11:53:15Z) - Robust lEarned Shrinkage-Thresholding (REST): Robust unrolling for
sparse recovery [87.28082715343896]
We consider deep neural networks for solving inverse problems that are robust to forward model mis-specifications.
We design a new robust deep neural network architecture by applying algorithm unfolding techniques to a robust version of the underlying recovery problem.
The proposed REST network is shown to outperform state-of-the-art model-based and data-driven algorithms in both compressive sensing and radar imaging problems.
arXiv Detail & Related papers (2021-10-20T06:15:45Z) - Unfolding Projection-free SDP Relaxation of Binary Graph Classifier via
GDPA Linearization [59.87663954467815]
Algorithm unfolding creates an interpretable and parsimonious neural network architecture by implementing each iteration of a model-based algorithm as a neural layer.
In this paper, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we unroll a projection-free algorithm for the semi-definite programming relaxation (SDR) of a binary graph classifier.
Experimental results show that our unrolled network outperformed pure model-based graph classifiers, and achieved comparable performance to pure data-driven networks while using far fewer parameters.
arXiv Detail & Related papers (2021-09-10T07:01:15Z) - Revisiting Recursive Least Squares for Training Deep Neural Networks [10.44340837533087]
Recursive least squares (RLS) algorithms were once widely used for training small-scale neural networks, due to their fast convergence.
Previous RLS algorithms are unsuitable for training deep neural networks (DNNs), since they have high computational complexity and too many preconditions.
We propose three novel RLS optimization algorithms for training feedforward neural networks, convolutional neural networks and recurrent neural networks.
arXiv Detail & Related papers (2021-09-07T17:43:51Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded in terms of the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - Investigating the Scalability and Biological Plausibility of the
Activation Relaxation Algorithm [62.997667081978825]
The Activation Relaxation (AR) algorithm provides a simple and robust approach for approximating the backpropagation of error algorithm.
We show that the algorithm can be further simplified and made more biologically plausible by introducing a learnable set of backwards weights.
We also investigate whether another biologically implausible assumption of the original AR algorithm -- the frozen feedforward pass -- can be relaxed without damaging performance.
arXiv Detail & Related papers (2020-10-13T08:02:38Z) - Activation Relaxation: A Local Dynamical Approximation to
Backpropagation in the Brain [62.997667081978825]
Activation Relaxation (AR) is motivated by constructing the backpropagation gradient as the equilibrium point of a dynamical system.
Our algorithm converges rapidly and robustly to the correct backpropagation gradients, requires only a single type of computational unit, and can operate on arbitrary computation graphs.
arXiv Detail & Related papers (2020-09-11T11:56:34Z) - Interpretable Deep Recurrent Neural Networks via Unfolding Reweighted
$\ell_1$-$\ell_1$ Minimization: Architecture Design and Generalization
Analysis [19.706363403596196]
This paper develops a novel deep recurrent neural network (coined reweighted-RNN) by unfolding a reweighted $\ell_1$-$\ell_1$ minimization algorithm.
To the best of our knowledge, this is the first deep unfolding method that explores reweighted minimization.
The experimental results on the moving MNIST dataset demonstrate that the proposed deep reweighted-RNN significantly outperforms existing RNN models.
arXiv Detail & Related papers (2020-03-18T17:02:10Z)