Asymmetric matrix sensing by gradient descent with small random
initialization
- URL: http://arxiv.org/abs/2309.01796v1
- Date: Mon, 4 Sep 2023 20:23:35 GMT
- Title: Asymmetric matrix sensing by gradient descent with small random
initialization
- Authors: Johan S. Wind
- Abstract summary: We study the problem of reconstructing a low-rank matrix from a few linear measurements.
Our key contribution is introducing a continuous gradient flow equation that we call the $texted gradient flow$.
- Score: 0.8611782340880084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study matrix sensing, which is the problem of reconstructing a low-rank
matrix from a few linear measurements. It can be formulated as an
overparameterized regression problem, which can be solved by factorized
gradient descent when starting from a small random initialization.
Linear neural networks, and in particular matrix sensing by factorized
gradient descent, serve as prototypical models of non-convex problems in modern
machine learning, where complex phenomena can be disentangled and studied in
detail. Much research has been devoted to studying special cases of asymmetric
matrix sensing, such as asymmetric matrix factorization and symmetric positive
semi-definite matrix sensing.
Our key contribution is introducing a continuous differential equation that
we call the $\textit{perturbed gradient flow}$. We prove that the perturbed
gradient flow converges quickly to the true target matrix whenever the
perturbation is sufficiently bounded. The dynamics of gradient descent for
matrix sensing can be reduced to this formulation, yielding a novel proof of
asymmetric matrix sensing with factorized gradient descent. Compared to
directly analyzing the dynamics of gradient descent, the continuous formulation
allows bounding key quantities by considering their derivatives, often
simplifying the proofs. We believe the general proof technique may prove useful
in other settings as well.
Related papers
- Preconditioned Gradient Descent for Over-Parameterized Nonconvex Matrix Factorization [19.32160757444549]
In practical instances of nonspecified matrix factorization, the rank of the true solutionrstar$ is often unknown, so the rankr$ of the model can be singular as $r>rstar$.
We propose an inexpensive suber for matrix sensing variant non matrix factorization that restores the convergence factor back to linear, even in agnosticized case.
Our numerical experiments find that PrecGD works equally well in restoring the convergence of other variants non matrix factorization.
arXiv Detail & Related papers (2025-04-13T20:06:49Z) - Two-Point Deterministic Equivalence for Stochastic Gradient Dynamics in Linear Models [76.52307406752556]
We derive a novel deterministic equivalence for the two-point function of a random resolvent.
We give a unified derivation of the performance of a wide variety of high-dimensional trained linear models with gradient descent.
arXiv Detail & Related papers (2025-02-07T16:45:40Z) - Entrywise error bounds for low-rank approximations of kernel matrices [55.524284152242096]
We derive entrywise error bounds for low-rank approximations of kernel matrices obtained using the truncated eigen-decomposition.
A key technical innovation is a delocalisation result for the eigenvectors of the kernel matrix corresponding to small eigenvalues.
We validate our theory with an empirical study of a collection of synthetic and real-world datasets.
arXiv Detail & Related papers (2024-05-23T12:26:25Z) - Provably Accelerating Ill-Conditioned Low-rank Estimation via Scaled
Gradient Descent, Even with Overparameterization [48.65416821017865]
This chapter introduces a new algorithmic approach, dubbed scaled gradient (ScaledGD)
It converges linearly at a constant rate independent of the condition number of the low-rank object.
It maintains the low periteration cost of gradient descent for a variety of tasks.
arXiv Detail & Related papers (2023-10-09T21:16:57Z) - Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue.
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
arXiv Detail & Related papers (2023-09-07T16:55:50Z) - The Inductive Bias of Flatness Regularization for Deep Matrix
Factorization [58.851514333119255]
This work takes the first step toward understanding the inductive bias of the minimum trace of the Hessian solutions in deep linear networks.
We show that for all depth greater than one, with the standard Isometry Property (RIP) on the measurements, minimizing the trace of Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters.
arXiv Detail & Related papers (2023-06-22T23:14:57Z) - Neural incomplete factorization: learning preconditioners for the conjugate gradient method [2.899792823251184]
We develop a data-driven approach to accelerate the generation of effective preconditioners.
We replace the typically hand-engineered preconditioners by the output of graph neural networks.
Our method generates an incomplete factorization of the matrix and is, therefore, referred to as neural incomplete factorization (NeuralIF)
arXiv Detail & Related papers (2023-05-25T11:45:46Z) - Implicit Balancing and Regularization: Generalization and Convergence
Guarantees for Overparameterized Asymmetric Matrix Sensing [28.77440901439686]
A series of recent papers have begun to generalize this role for non-random Positive Semi-Defin (PSD) matrix sensing problems.
In this paper, we show that the trajectory of the gradient descent from small random measurements moves towards solutions that are both globally well.
arXiv Detail & Related papers (2023-03-24T19:05:52Z) - Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with ReLU activations.
For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that leakyally, gradient flow produces a neural network with rank at most two.
For gradient descent, provided the random variance is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training.
arXiv Detail & Related papers (2022-10-13T15:09:54Z) - Algorithmic Regularization in Model-free Overparametrized Asymmetric
Matrix Factorization [16.325663190517773]
We consider the asymmetric factorization problem under a natural non formulation with arbitrary overparamatrization.
We produce the best low-rank approximation to the observed matrix.
arXiv Detail & Related papers (2022-03-06T00:07:53Z) - Memory-Efficient Backpropagation through Large Linear Layers [107.20037639738433]
In modern neural networks like Transformers, linear layers require significant memory to store activations during backward pass.
This study proposes a memory reduction approach to perform backpropagation through linear layers.
arXiv Detail & Related papers (2022-01-31T13:02:41Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Small random initialization is akin to spectral learning: Optimization
and generalization guarantees for overparameterized low-rank matrix
reconstruction [35.585697639325105]
In this paper we show that small random initialization are not fully understood.
We reconstruct a gradient from a small randomrank matrix and find solutions akin to an optimal gradient from a low randomrank matrix.
arXiv Detail & Related papers (2021-06-28T22:52:39Z) - Implicit Regularization of Sub-Gradient Method in Robust Matrix
Recovery: Don't be Afraid of Outliers [6.320141734801679]
We show that a simple sub-gradient method converges to the true low-rank solution efficiently.
We also build upon a new notion of restricted isometry property, called sign-RIP, to prove the robustness of the method.
arXiv Detail & Related papers (2021-02-05T02:52:00Z) - Exact Linear Convergence Rate Analysis for Low-Rank Symmetric Matrix
Completion via Gradient Descent [22.851500417035947]
Factorization-based gradient descent is a scalable and efficient algorithm for solving the factorrank matrix completion.
We show that gradient descent enjoys fast convergence to estimate a solution of the global nature problem.
arXiv Detail & Related papers (2021-02-04T03:41:54Z) - Beyond Procrustes: Balancing-Free Gradient Descent for Asymmetric
Low-Rank Matrix Sensing [36.96922859748537]
Low-rank matrix estimation plays a central role in various applications across science and engineering.
Existing approaches rely on adding a metric regularization term to balance the scale of the two matrix factors.
In this paper, we provide a theoretical justification for the performance in recovering a low-rank matrix from a small number of linear measurements.
arXiv Detail & Related papers (2021-01-13T15:03:52Z) - Robust Low-rank Matrix Completion via an Alternating Manifold Proximal
Gradient Continuation Method [47.80060761046752]
Robust low-rank matrix completion (RMC) has been studied extensively for computer vision, signal processing and machine learning applications.
This problem aims to decompose a partially observed matrix into the superposition of a low-rank matrix and a sparse matrix, where the sparse matrix captures the grossly corrupted entries of the matrix.
A widely used approach to tackle RMC is to consider a convex formulation, which minimizes the nuclear norm of the low-rank matrix (to promote low-rankness) and the l1 norm of the sparse matrix (to promote sparsity).
In this paper, motivated by some recent works on low-
arXiv Detail & Related papers (2020-08-18T04:46:22Z) - Understanding Implicit Regularization in Over-Parameterized Single Index
Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z) - Accelerating Ill-Conditioned Low-Rank Matrix Estimation via Scaled
Gradient Descent [34.0533596121548]
Low-rank matrix estimation converges convex problem that finds numerous applications in signal processing, machine learning and imaging science.
We show that ScaledGD achieves the best of the best in terms of the number of the low-rank matrix.
Our analysis is also applicable to general loss that are similar to low-rank gradient descent.
arXiv Detail & Related papers (2020-05-18T17:17:16Z) - Optimal Iterative Sketching with the Subsampled Randomized Hadamard
Transform [64.90148466525754]
We study the performance of iterative sketching for least-squares problems.
We show that the convergence rate for Haar and randomized Hadamard matrices are identical, andally improve upon random projections.
These techniques may be applied to other algorithms that employ randomized dimension reduction.
arXiv Detail & Related papers (2020-02-03T16:17:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.