Rank-1 Matrix Completion with Gradient Descent and Small Random
Initialization
- URL: http://arxiv.org/abs/2212.09396v1
- Date: Mon, 19 Dec 2022 12:05:37 GMT
- Title: Rank-1 Matrix Completion with Gradient Descent and Small Random
Initialization
- Authors: Daesung Kim and Hye Won Chung
- Abstract summary: We prove that GD with small random initialization converges to the ground truth for rank-1 symmetric matrix completion, and show that the implicit regularization effect of GD plays a critical role in the analysis by preventing any entry of the iterate from becoming much larger than the others.
- Score: 15.127728811011245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The nonconvex formulation of matrix completion problem has received
significant attention in recent years due to its affordable complexity compared
to the convex formulation. Gradient descent (GD) is the simplest yet efficient
baseline algorithm for solving nonconvex optimization problems. The success of
GD has been witnessed in many different problems in both theory and practice
when it is combined with random initialization. However, previous works on
matrix completion require either careful initialization or regularizers to
prove the convergence of GD. In this work, we study the rank-1 symmetric matrix
completion and prove that GD converges to the ground truth when small random
initialization is used. We show that within a logarithmic number of iterations, the
trajectory enters the region where local convergence occurs. We provide an
upper bound on the initialization size that is sufficient to guarantee the
convergence and show that a larger initialization can be used as more samples
are available. We observe that the implicit regularization effect of GD plays a
critical role in the analysis and, along the entire trajectory, prevents any
entry from becoming much larger than the others.
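As a concrete illustration of the setting above, here is a minimal sketch (not the authors' code) of gradient descent with small random initialization on a rank-1 symmetric matrix completion instance. The dimension, sampling probability, step size, and initialization scale are illustrative assumptions; the ratio printed in the loop tracks how balanced the entries of the iterate stay, echoing the implicit regularization effect noted in the abstract.
```python
import numpy as np

# Minimal sketch of rank-1 symmetric matrix completion via gradient descent
# with small random initialization (illustrative parameters, not the paper's).
rng = np.random.default_rng(0)
n, p = 200, 0.3                          # dimension and observation probability (assumed)
x = rng.standard_normal(n)
x /= np.linalg.norm(x)                   # ground-truth unit factor
M = np.outer(x, x)                       # rank-1 symmetric ground truth
mask = rng.random((n, n)) < p
mask = np.triu(mask) | np.triu(mask).T   # symmetric observation pattern

alpha = 1e-6                             # small random initialization scale
u = alpha * rng.standard_normal(n)
eta = 0.5 / p                            # step size (heuristic choice)

for t in range(2000):
    # Gradient of 0.25 * || P_Omega(u u^T - M) ||_F^2 with respect to u
    residual = mask * (np.outer(u, u) - M)
    grad = residual @ u
    u -= eta * grad

    if t % 500 == 0:
        err = min(np.linalg.norm(u - x), np.linalg.norm(u + x))
        balance = np.max(np.abs(u)) / (np.linalg.norm(u) / np.sqrt(n))
        print(f"iter {t}: factor error {err:.3e}, max-entry ratio {balance:.2f}")
```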
Related papers
- Symmetric Matrix Completion with ReLU Sampling [15.095194065320987]
We study the problem of symmetric positive semi-definite low-rank matrix completion (MC) with entry-dependent sampling.
In particular, we consider rectified linear unit (ReLU) sampling, where only the positive entries of the matrix are observed.
arXiv Detail & Related papers (2024-06-09T15:14:53Z)
- Alternating minimization for generalized rank one matrix sensing: Sharp predictions from a random initialization [10.516962652888989]
We develop a technique for estimating the factors of a rank-one matrix from i.i.d. random measurements.
We establish sharp convergence guarantees, with exact recovery in a single step.
Our analysis also exposes several other properties of this problem.
arXiv Detail & Related papers (2022-07-20T05:31:05Z)
- Randomly Initialized Alternating Least Squares: Fast Convergence for Matrix Sensing [9.264464791978362]
We show that Alternating Least Squares (ALS) converges to the true solution with $\varepsilon$-accuracy in $O(\log n + \log(1/\varepsilon))$ iterations.
Key to our proof is the observation that the trajectory of the ALS iterates only depends very mildly on certain entries of the random measurement matrices.
arXiv Detail & Related papers (2022-04-25T08:55:38Z)
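Below is a minimal sketch of randomly initialized ALS on a rank-one matrix sensing instance with i.i.d. Gaussian measurement matrices; the dimensions, sample size, and iteration count are illustrative assumptions rather than the paper's exact setting.
```python
import numpy as np

# Minimal sketch of alternating least squares (ALS) for rank-1 matrix sensing
# from a random initialization (illustrative sizes; not the paper's model).
rng = np.random.default_rng(1)
n, m = 30, 400                                   # matrix dimension and number of measurements (assumed)
u_star = rng.standard_normal(n)
v_star = rng.standard_normal(n)
A = rng.standard_normal((m, n, n))               # i.i.d. Gaussian measurement matrices
y = np.einsum('kij,i,j->k', A, u_star, v_star)   # y_k = u*^T A_k v*

u = rng.standard_normal(n)                       # random initialization
v = rng.standard_normal(n)
for it in range(10):
    # Fix v, solve the linear least-squares problem in u: y_k ~= (A_k v)^T u
    Bu = np.einsum('kij,j->ki', A, v)
    u = np.linalg.lstsq(Bu, y, rcond=None)[0]
    # Fix u, solve for v: y_k ~= (A_k^T u)^T v
    Bv = np.einsum('kij,i->kj', A, u)
    v = np.linalg.lstsq(Bv, y, rcond=None)[0]
    err = np.linalg.norm(np.outer(u, v) - np.outer(u_star, v_star))
    print(f"ALS iteration {it}: reconstruction error {err:.3e}")
```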
- Distributed stochastic optimization with large delays [59.95552973784946]
One of the most widely used methods for solving large-scale optimization problems is distributed asynchronous stochastic gradient descent (DASGD).
We show that DASGD converges to a global optimum under suitable assumptions on the delays.
arXiv Detail & Related papers (2021-07-06T21:59:49Z)
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We highlight a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD and that of ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
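For reference, here is a minimal sketch of constant-stepsize SGD with tail averaging on a synthetic linear regression problem, compared against the ordinary least squares solution; the step size, noise level, and averaging window are illustrative assumptions.
```python
import numpy as np

# Minimal sketch of constant-stepsize SGD (with tail averaging) for linear
# regression, compared against the ordinary least squares (OLS) solution.
rng = np.random.default_rng(2)
d, n = 20, 5000
w_star = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ w_star + 0.1 * rng.standard_normal(n)

gamma = 0.01                       # constant step size (assumed)
w = np.zeros(d)
tail_sum = np.zeros(d)
tail_start = n // 2                # average only the second half of the iterates
for i in range(n):
    # One-sample stochastic gradient of 0.5 * (x_i^T w - y_i)^2
    g = (X[i] @ w - y[i]) * X[i]
    w -= gamma * g
    if i >= tail_start:
        tail_sum += w
w_sgd = tail_sum / (n - tail_start)

w_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("tail-averaged SGD error:", np.linalg.norm(w_sgd - w_star))
print("OLS error:              ", np.linalg.norm(w_ols - w_star))
```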
- Fast Global Convergence for Low-rank Matrix Recovery via Riemannian Gradient Descent with Random Initialization [3.0263650731957275]
We analyze the global behavior for a class of low-rank matrix recovery problems on the Riemannian manifold.
We reveal a previously unknown geometric property of the low-rank matrix manifold, which is the existence of spurious critical points for the simple least squares function.
Our convergence guarantee is nearly optimal and almost dimension-free, which fully explains the numerical observations.
arXiv Detail & Related papers (2020-12-31T06:40:43Z)
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
- Provably Convergent Working Set Algorithm for Non-Convex Regularized Regression [0.0]
This paper proposes a working set algorithm for non-convex regularized regression with convergence guarantees.
Our results demonstrate high computational gain compared to solving the full problem, for both block-coordinate and gradient solvers.
arXiv Detail & Related papers (2020-06-24T07:40:31Z)
- Effective Dimension Adaptive Sketching Methods for Faster Regularized Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT).
arXiv Detail & Related papers (2020-06-10T15:00:09Z)
- Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems.
We focus on two classical embeddings, namely, Gaussian projections and subsampled Hadamard transforms.
Our resulting algorithm yields the best complexity known for solving least-squares problems with no condition number dependence.
arXiv Detail & Related papers (2020-02-21T17:45:32Z)
- Optimal Iterative Sketching with the Subsampled Randomized Hadamard Transform [64.90148466525754]
We study the performance of iterative sketching for least-squares problems.
We show that the convergence rates for Haar and randomized Hadamard matrices are identical, and improve upon random projections.
These techniques may be applied to other algorithms that employ randomized dimension reduction.
arXiv Detail & Related papers (2020-02-03T16:17:50Z)
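To make the iterative sketching idea concrete, here is a minimal sketch of iterative (Hessian) sketching for an overdetermined least-squares problem, using a Gaussian embedding as a simple stand-in for the SRHT and Haar embeddings studied above; all problem sizes and the sketch dimension are illustrative assumptions.
```python
import numpy as np

# Minimal sketch of iterative Hessian sketching for the least-squares problem
# min_x ||A x - b||_2, with a fresh random embedding drawn at each iteration.
rng = np.random.default_rng(3)
n, d, m = 4000, 50, 400            # rows, columns, sketch size (assumed)
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
b = A @ x_star + 0.01 * rng.standard_normal(n)

x = np.zeros(d)
for t in range(10):
    S = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian embedding (SRHT could be substituted)
    SA = S @ A
    # Newton-like step using the sketched Hessian (SA)^T (SA) ~= A^T A
    g = A.T @ (b - A @ x)
    step = np.linalg.solve(SA.T @ SA, g)
    x += step
    res = np.linalg.norm(A @ x - b)
    print(f"sketch iteration {t}: residual {res:.6f}")
```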