Algorithms for Efficiently Learning Low-Rank Neural Networks
- URL: http://arxiv.org/abs/2202.00834v2
- Date: Thu, 3 Feb 2022 16:27:08 GMT
- Title: Algorithms for Efficiently Learning Low-Rank Neural Networks
- Authors: Kiran Vodrahalli and Rakesh Shivanna and Maheswaran Sathiamoorthy and
Sagar Jain and Ed H. Chi
- Abstract summary: We study algorithms for learning low-rank neural networks.
We present a provably efficient algorithm which learns an optimal low-rank approximation to a single-hidden-layer ReLU network.
We propose a novel low-rank initialization framework for training low-rank $\textit{deep}$ networks.
- Score: 12.916132936159713
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We study algorithms for learning low-rank neural networks -- networks where
the weight parameters are re-parameterized by products of two low-rank
matrices. First, we present a provably efficient algorithm which learns an
optimal low-rank approximation to a single-hidden-layer ReLU network up to
additive error $\epsilon$ with probability $\ge 1 - \delta$, given access to
noiseless samples with Gaussian marginals in polynomial time and samples. Thus,
we provide the first example of an algorithm which can efficiently learn a
neural network up to additive error without assuming the ground truth is
realizable. To solve this problem, we introduce an efficient SVD-based
$\textit{Nonlinear Kernel Projection}$ algorithm for solving a nonlinear
low-rank approximation problem over Gaussian space. Inspired by the efficiency
of our algorithm, we propose a novel low-rank initialization framework for
training low-rank $\textit{deep}$ networks, and prove that for ReLU networks,
the gap between our method and existing schemes widens as the desired rank of
the approximating weights decreases, or as the dimension of the inputs
increases (the latter point holds when network width is superlinear in
dimension). Finally, we validate our theory by training ResNet and EfficientNet
models on ImageNet.
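To make the re-parameterization concrete, the sketch below shows a linear layer whose weight $W \in \mathbb{R}^{m \times n}$ is factored as a product of two rank-$r$ matrices, with the factors initialized from a truncated SVD of a dense weight. This is an illustrative example only, not the paper's Nonlinear Kernel Projection algorithm or its initialization scheme; the class name `LowRankLinear`, the shapes, and the choice of `rank` are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class LowRankLinear(nn.Module):
    """Linear layer with weight W (m x n) re-parameterized as U @ V of rank r."""

    def __init__(self, full_weight: torch.Tensor, rank: int):
        super().__init__()
        # Truncated SVD of the dense weight: full_weight ~= U_r diag(s_r) V_r^T.
        U, S, Vh = torch.linalg.svd(full_weight, full_matrices=False)
        sqrt_s = S[:rank].sqrt()
        # Split the singular values between the two factors so that
        # self.U @ self.V equals the rank-`rank` truncation of the dense weight.
        self.U = nn.Parameter(U[:, :rank] * sqrt_s)            # shape: m x r
        self.V = nn.Parameter(sqrt_s[:, None] * Vh[:rank, :])  # shape: r x n

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n) -> (batch, m); only the two low-rank factors are trained.
        return x @ self.V.T @ self.U.T


# Hypothetical usage: replace a dense 512 x 1024 weight by a rank-64 factorization.
W_dense = torch.randn(512, 1024)
layer = LowRankLinear(W_dense, rank=64)
out = layer(torch.randn(8, 1024))  # out.shape == (8, 512)
```

Splitting the singular values symmetrically between the two factors is one common convention; by the Eckart-Young theorem the product `U @ V` is then the best rank-`r` approximation of the original weight in Frobenius norm.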
Related papers
- Bayes-optimal learning of an extensive-width neural network from quadratically many samples [28.315491743569897]
We consider the problem of learning a target function corresponding to a single hidden layer neural network.
We consider the limit where the input dimension and the network width are proportionally large.
arXiv Detail & Related papers (2024-08-07T12:41:56Z) - LinSATNet: The Positive Linear Satisfiability Neural Networks [116.65291739666303]
This paper studies how to introduce the popular positive linear satisfiability to neural networks.
We propose the first differentiable satisfiability layer based on an extension of the classic Sinkhorn algorithm for jointly encoding multiple sets of marginal distributions.
arXiv Detail & Related papers (2024-07-18T22:05:21Z) - Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth
Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss as the number of learning epochs increases.
We show that the threshold on the number of training samples increases with the increase in the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z) - Unfolded proximal neural networks for robust image Gaussian denoising [7.018591019975253]
We propose a unified framework to build PNNs for the Gaussian denoising task, based on both the dual-FB and the primal-dual Chambolle-Pock algorithms.
We also show that accelerated versions of these algorithms enable skip connections in the associated NN layers.
arXiv Detail & Related papers (2023-08-06T15:32:16Z) - On Model Compression for Neural Networks: Framework, Algorithm, and Convergence Guarantee [21.818773423324235]
This paper focuses on two model compression techniques: low-rank approximation and weight pruning.
In this paper, a holistic framework is proposed for model compression from a novel perspective of nonconvex optimization.
arXiv Detail & Related papers (2023-03-13T02:14:42Z) - Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z) - Robust Training and Verification of Implicit Neural Networks: A
Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z) - Efficiently Learning Any One Hidden Layer ReLU Network From Queries [27.428198343906352]
We give the first polynomial-time algorithm for learning arbitrary one-hidden-layer ReLU networks, given black-box query access to the network.
Ours is the first algorithm with fully polynomial-time guarantees of efficiency even for worst-case networks.
arXiv Detail & Related papers (2021-11-08T18:59:40Z) - Unfolding Projection-free SDP Relaxation of Binary Graph Classifier via
GDPA Linearization [59.87663954467815]
Algorithm unfolding creates an interpretable and parsimonious neural network architecture by implementing each iteration of a model-based algorithm as a neural layer.
In this paper, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we unroll a projection-free algorithm for the semi-definite programming relaxation (SDR) of a binary graph classifier.
Experimental results show that our unrolled network outperformed pure model-based graph classifiers, and achieved comparable performance to pure data-driven networks but using far fewer parameters.
arXiv Detail & Related papers (2021-09-10T07:01:15Z) - Revisiting Recursive Least Squares for Training Deep Neural Networks [10.44340837533087]
Recursive least squares (RLS) algorithms were once widely used for training small-scale neural networks, due to their fast convergence.
Previous RLS algorithms are unsuitable for training deep neural networks (DNNs), since they have high computational complexity and too many preconditions.
We propose three novel RLS optimization algorithms for training feedforward neural networks, convolutional neural networks and recurrent neural networks.
arXiv Detail & Related papers (2021-09-07T17:43:51Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization for large-scale data with a deep neural network as the predictive model.
Our algorithm requires far fewer communication rounds than the naive parallel baseline while still achieving a linear speedup in theory.
Experiments on several benchmark datasets demonstrate the effectiveness of our algorithm and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)