Rank-1 Matrix Completion with Gradient Descent and Small Random
Initialization
- URL: http://arxiv.org/abs/2212.09396v1
- Date: Mon, 19 Dec 2022 12:05:37 GMT
- Title: Rank-1 Matrix Completion with Gradient Descent and Small Random
Initialization
- Authors: Daesung Kim and Hye Won Chung
- Abstract summary: We prove that gradient descent with small random initialization converges to the ground truth for rank-1 symmetric matrix completion.
We observe that the implicit regularization effect of GD plays a critical role in the analysis, preventing any entry from becoming much larger than the others along the entire trajectory.
- Score: 15.127728811011245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The nonconvex formulation of the matrix completion problem has received
significant attention in recent years due to its affordable complexity compared
to the convex formulation. Gradient descent (GD) is the simplest yet efficient
baseline algorithm for solving nonconvex optimization problems. The success of
GD has been witnessed in many different problems in both theory and practice
when it is combined with random initialization. However, previous works on
matrix completion require either careful initialization or regularizers to
prove the convergence of GD. In this work, we study the rank-1 symmetric matrix
completion problem and prove that GD converges to the ground truth when a small random
initialization is used. We show that within a logarithmic number of iterations, the
trajectory enters the region where local convergence occurs. We provide an
upper bound on the initialization size that is sufficient to guarantee the
convergence and show that a larger initialization can be used as more samples
are available. We observe that the implicit regularization effect of GD plays a
critical role in the analysis, and for the entire trajectory, it prevents each
entry from becoming much larger than the others.
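As an illustration of this setup, here is a minimal numpy sketch of gradient descent on the factorized objective $\tfrac{1}{2}\|P_\Omega(xx^\top - M)\|_F^2$ with a small random initialization; the dimension, sampling rate, step size, initialization scale, and iteration count are illustrative assumptions, not the constants analyzed in the paper.

```python
# Hedged sketch: GD for rank-1 symmetric matrix completion from a small random init.
# All problem sizes and hyperparameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n, p = 200, 0.3                          # dimension and observation probability
u_star = rng.standard_normal(n)
u_star /= np.linalg.norm(u_star)
M = np.outer(u_star, u_star)             # rank-1 symmetric ground truth

upper = np.triu(rng.random((n, n)) < p)
mask = upper | upper.T                   # symmetric Bernoulli(p) observation pattern

alpha = 1e-6                             # small random initialization scale
x = alpha * rng.standard_normal(n)
eta = 0.1 / p                            # step size (illustrative)

for t in range(2000):
    R = mask * (np.outer(x, x) - M)      # residual restricted to observed entries
    x = x - eta * (2.0 * R @ x)          # gradient of (1/2)*||P_Omega(x x^T - M)||_F^2

err = min(np.linalg.norm(x - u_star), np.linalg.norm(x + u_star))
print(f"recovery error up to sign: {err:.2e}")
```

The same loop can be rerun with a larger initialization scale or a denser sampling pattern to probe, informally, how the admissible initialization size grows with the number of observed entries.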
Related papers
- Riemannian Optimization for Non-convex Euclidean Distance Geometry with Global Recovery Guarantees [6.422262171968397]
Two algorithms are proposed to solve the Euclidean Distance Geometry problem.
The first algorithm converges linearly to the true solution.
The second algorithm demonstrates strong numerical performance on both synthetic and real data.
arXiv Detail & Related papers (2024-10-08T21:19:22Z) - A Sample Efficient Alternating Minimization-based Algorithm For Robust Phase Retrieval [56.67706781191521]
In this work, we present a robust phase retrieval problem where the task is to recover an unknown signal.
Our proposed method avoids the need for a computationally expensive spectral initialization, using a simple gradient step that is robust to outliers.
arXiv Detail & Related papers (2024-09-07T06:37:23Z) - Symmetric Matrix Completion with ReLU Sampling [15.095194065320987]
We study the problem of symmetric positive semi-definite low-rank matrix completion (MC) with entry-dependent sampling.
In particular, we consider rectified linear unit (ReLU) sampling, where only positive entries are observed.
arXiv Detail & Related papers (2024-06-09T15:14:53Z) - Alternating minimization for generalized rank one matrix sensing: Sharp predictions from a random initialization [5.900674344455754]
We present a technique for estimating properties of a rank-one matrix from i.i.d. random rank-one measurements.
We show sharp convergence guarantees, with exact recovery in a single step.
Our analysis also exposes several other properties of this problem.
arXiv Detail & Related papers (2022-07-20T05:31:05Z) - Randomly Initialized Alternating Least Squares: Fast Convergence for
Matrix Sensing [9.264464791978362]
We show that Alternating Least Squares (ALS) converges to the true solution with $\varepsilon$-accuracy in $O(\log n + \log(1/\varepsilon))$ iterations.
Key to our proof is the observation that the trajectory of the ALS iterates only depends very mildly on certain entries of the random measurement matrices.
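As a rough illustration of that result, the sketch below runs randomly initialized ALS for rank-one matrix sensing with Gaussian measurements; the problem sizes, measurement model, and iteration count are illustrative assumptions rather than the paper's exact setting.

```python
# Hedged sketch: randomly initialized alternating least squares for rank-1 matrix sensing.
import numpy as np

rng = np.random.default_rng(1)

n, m = 30, 600                                   # factor dimension and number of measurements
u_star, v_star = rng.standard_normal(n), rng.standard_normal(n)
A = rng.standard_normal((m, n, n))               # Gaussian sensing matrices A_k
y = np.einsum('i,kij,j->k', u_star, A, v_star)   # y_k = u*^T A_k v*

u, v = rng.standard_normal(n), rng.standard_normal(n)   # random init, no spectral step

for _ in range(30):
    Bu = np.einsum('kij,j->ki', A, v)            # rows A_k v: least squares in u with v fixed
    u = np.linalg.lstsq(Bu, y, rcond=None)[0]
    Bv = np.einsum('i,kij->kj', u, A)            # rows A_k^T u: least squares in v with u fixed
    v = np.linalg.lstsq(Bv, y, rcond=None)[0]

rel_err = (np.linalg.norm(np.outer(u, v) - np.outer(u_star, v_star))
           / np.linalg.norm(np.outer(u_star, v_star)))
print(f"relative error of the recovered rank-1 matrix: {rel_err:.2e}")
```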
arXiv Detail & Related papers (2022-04-25T08:55:38Z) - Distributed stochastic optimization with large delays [59.95552973784946]
One of the most widely used methods for solving large-scale optimization problems is distributed asynchronous stochastic gradient descent (DASGD).
We show that DASGD converges to a global optimum under mild assumptions on the delays.
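As a rough, single-process illustration of the effect of delays, the sketch below applies stochastic gradients computed at stale iterates on a least-squares problem; the fixed delay, step size, and data model are illustrative assumptions and do not reproduce the distributed asynchronous setting of the paper.

```python
# Hedged sketch: SGD with delayed (stale) gradients, simulating asynchrony in one process.
from collections import deque
import numpy as np

rng = np.random.default_rng(2)

d, tau, eta = 10, 5, 0.01                 # dimension, fixed gradient delay, step size
A = rng.standard_normal((200, d))
b = A @ rng.standard_normal(d)            # noiseless targets, so the optimum is exact

def stoch_grad(w):
    i = rng.integers(len(A))              # sample one row uniformly at random
    return (A[i] @ w - b[i]) * A[i]

x = np.zeros(d)
history = deque([x.copy()] * tau, maxlen=tau)   # iterates from tau steps ago
for _ in range(5000):
    stale = history[0]                    # the "worker" read this iterate tau steps ago
    history.append(x.copy())
    x = x - eta * stoch_grad(stale)       # apply the delayed gradient

x_opt = np.linalg.lstsq(A, b, rcond=None)[0]
print("distance to the optimum:", np.linalg.norm(x - x_opt))
```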
arXiv Detail & Related papers (2021-07-06T21:59:49Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD and that of ordinary least squares.
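As a rough illustration of that setting, the sketch below runs constant-stepsize SGD with tail-averaging on a synthetic linear regression problem and compares it with ordinary least squares; the step size, averaging window, and data model are illustrative assumptions.

```python
# Hedged sketch: constant-stepsize SGD with tail-averaging vs. ordinary least squares.
import numpy as np

rng = np.random.default_rng(3)

n, d, eta = 5000, 20, 0.01                # samples, dimension, constant step size
w_star = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ w_star + 0.5 * rng.standard_normal(n)

w = np.zeros(d)
iterates = []
for i in range(n):                        # single pass over the data
    g = (X[i] @ w - y[i]) * X[i]          # stochastic gradient of the squared loss
    w = w - eta * g
    iterates.append(w.copy())
w_sgd = np.mean(iterates[n // 2:], axis=0)   # average the second half of the iterates

w_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("SGD tail-average error:", np.linalg.norm(w_sgd - w_star))
print("OLS error:             ", np.linalg.norm(w_ols - w_star))
```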
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Fast Global Convergence for Low-rank Matrix Recovery via Riemannian
Gradient Descent with Random Initialization [3.0263650731957275]
We analyze the global behavior for a class of low-rank matrix recovery problems on the Riemannian manifold.
We reveal a previously unknown geometric property of the low-rank matrix manifold, which is the existence of spurious critical points for the simple least squares function.
Our convergence guarantee is nearly optimal and almost dimension-free, which fully explains the numerical observations.
arXiv Detail & Related papers (2020-12-31T06:40:43Z) - ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel algorithm, referred to as Recursive One-Over-T SGD (ROOT-SGD), based on an easily implementable recursive averaging of past stochastic gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense.
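As a rough illustration of recursively averaging past stochastic gradients, here is a hedged sketch on a least-squares problem; the particular recursion, weights, and step size below are assumptions and may differ from the exact ROOT-SGD update analyzed in the paper.

```python
# Hedged sketch: a recursive average of past stochastic gradients (ROOT-SGD-like idea).
import numpy as np

rng = np.random.default_rng(4)

n, d, eta = 1000, 10, 0.05
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

def stoch_grad(w, i):
    return (A[i] @ w - b[i]) * A[i]       # gradient of one squared-loss term

x_prev = np.zeros(d)
v = stoch_grad(x_prev, rng.integers(n))   # initialize the recursive estimator
x = x_prev - eta * v

for t in range(2, 10001):
    i = rng.integers(n)
    # Reuse the same sample at the current and previous iterate, with a 1/t-style weight.
    v = stoch_grad(x, i) + (1 - 1 / t) * (v - stoch_grad(x_prev, i))
    x_prev, x = x, x - eta * v

x_opt = np.linalg.lstsq(A, b, rcond=None)[0]
print("distance to the least-squares optimum:", np.linalg.norm(x - x_opt))
```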
arXiv Detail & Related papers (2020-08-28T14:46:56Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point per step.
Our results are expressed in the form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Optimal Iterative Sketching with the Subsampled Randomized Hadamard
Transform [64.90148466525754]
We study the performance of iterative sketching for least-squares problems.
We show that the convergence rates for Haar and randomized Hadamard matrices are identical, and asymptotically improve upon random projections.
These techniques may be applied to other algorithms that employ randomized dimension reduction.
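As a rough illustration, the sketch below performs iterative sketching for a least-squares problem using a subsampled randomized Hadamard transform, refreshing the sketch each round; the update rule (an iterative-Hessian-sketch-style step), sketch size, and iteration count are illustrative assumptions rather than the exact scheme analyzed in the paper.

```python
# Hedged sketch: iterative sketching for least squares with an SRHT sketch.
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(5)

n, d, m = 1024, 20, 200                    # n must be a power of two for the Hadamard matrix
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]   # exact solution, for reference only

H = hadamard(n) / np.sqrt(n)               # orthonormal Hadamard matrix
x = np.zeros(d)
for t in range(10):
    signs = rng.choice([-1.0, 1.0], size=n)
    rows = rng.choice(n, size=m, replace=False)
    SA = np.sqrt(n / m) * (H[rows] @ (signs[:, None] * A))   # SRHT sketch of A
    r = b - A @ x                                            # current residual
    x = x + np.linalg.solve(SA.T @ SA, A.T @ r)              # sketched Newton-type step
    print(f"round {t}: distance to the exact solution = {np.linalg.norm(x - x_ls):.2e}")
```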
arXiv Detail & Related papers (2020-02-03T16:17:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.