Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably
- URL: http://arxiv.org/abs/2202.03535v1
- Date: Mon, 7 Feb 2022 21:53:51 GMT
- Title: Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably
- Authors: Tianyi Liu, Yan Li, Enlu Zhou and Tuo Zhao
- Abstract summary: We parameterize the rank one matrix $Y^*$ by $XX^\top$, where $X\in R^{d\times d}$.
We then show that under mild conditions, the estimator, obtained by the randomly perturbed gradient descent algorithm using the square loss function, attains a mean square error of $O(\sigma^2/d)$.
In contrast, the estimator obtained by gradient descent without random perturbation only attains a mean square error of $O(\sigma^2)$.
- Score: 42.427869499882206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the role of noise in optimization algorithms for learning
over-parameterized models. Specifically, we consider the recovery of a rank one
matrix $Y^*\in R^{d\times d}$ from a noisy observation $Y$ using an
over-parameterization model. We parameterize the rank one matrix $Y^*$ by
$XX^\top$, where $X\in R^{d\times d}$. We then show that under mild conditions,
the estimator, obtained by the randomly perturbed gradient descent algorithm
using the square loss function, attains a mean square error of $O(\sigma^2/d)$,
where $\sigma^2$ is the variance of the observational noise. In contrast, the
estimator obtained by gradient descent without random perturbation only attains
a mean square error of $O(\sigma^2)$. Our result partially justifies the
implicit regularization effect of noise when learning over-parameterized
models, and provides new understanding of training over-parameterized neural
networks.
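For concreteness, here is a minimal NumPy sketch (not code from the paper) of the setup the abstract describes: a noisy rank-one observation, the over-parameterized factorization $XX^\top$ with $X\in R^{d\times d}$, and gradient descent on the square loss with and without i.i.d. Gaussian perturbations of the iterates. The dimension, step size, perturbation scale, and iteration count are illustrative assumptions, not the quantities analyzed in the paper, and the run is not tuned to reproduce the $O(\sigma^2/d)$ versus $O(\sigma^2)$ rates; it only makes the two algorithms concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 50          # dimension (illustrative choice)
sigma = 0.01    # entrywise observation-noise level (illustrative choice)

# Rank-one ground truth Y* = x* x*^T and its noisy observation Y = Y* + E.
x_star = rng.normal(size=(d, 1))
x_star /= np.linalg.norm(x_star)
Y_star = x_star @ x_star.T
Y = Y_star + sigma * rng.normal(size=(d, d))

def square_loss_grad(X, Y):
    """Gradient of f(X) = ||X X^T - Y||_F^2 / 4 (Y need not be symmetric)."""
    R = X @ X.T - Y
    return 0.5 * (R + R.T) @ X

def run_gd(Y, steps=2000, eta=0.1, perturb_std=0.0, seed=1):
    """Gradient descent on the over-parameterized factor X in R^{d x d}.

    With perturb_std > 0, an i.i.d. Gaussian perturbation is added to the
    iterate at every step (the randomly perturbed variant); with 0 this is
    plain gradient descent.
    """
    rng_local = np.random.default_rng(seed)
    X = 1e-3 * rng_local.normal(size=(d, d))  # small random initialization
    for _ in range(steps):
        X = X - eta * square_loss_grad(X, Y)
        if perturb_std > 0:
            X = X + perturb_std * rng_local.normal(size=(d, d))
    return X

def mse(X, Y_star):
    """Mean square error of the estimator X X^T against the noiseless Y*."""
    return np.mean((X @ X.T - Y_star) ** 2)

X_plain = run_gd(Y, perturb_std=0.0)
X_perturbed = run_gd(Y, perturb_std=1e-3)
print("plain GD MSE:    ", mse(X_plain, Y_star))
print("perturbed GD MSE:", mse(X_perturbed, Y_star))
```

The script simply reports both mean square errors; the paper's result concerns how these errors scale with $d$ and $\sigma^2$, not the outcome of any particular finite run.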
Related papers
- Efficient Over-parameterized Matrix Sensing from Noisy Measurements via Alternating Preconditioned Gradient Descent [17.73720530889677]
Preconditioning methods have been proposed to accelerate the convergence of the matrix sensing problem.
We propose an alternating preconditioned gradient descent (APGD) algorithm, which alternately updates the two factor parameters.
We theoretically prove that APGD achieves near-optimal convergence at a linear rate, starting from an arbitrary random initialization. (A hedged sketch of a generic preconditioned alternating update appears after this list.)
arXiv Detail & Related papers (2025-02-01T15:44:39Z) - Entangled Mean Estimation in High-Dimensions [36.97113089188035]
We study the task of high-dimensional entangled mean estimation in the subset-of-signals model.
We show that the optimal error (up to polylogarithmic factors) is $f(\alpha,N) + \sqrt{D/(\alpha N)}$, where the term $f(\alpha,N)$ is the error of the one-dimensional problem and the second term is the sub-Gaussian error rate.
arXiv Detail & Related papers (2025-01-09T18:31:35Z) - Variable Selection in Convex Piecewise Linear Regression [5.366354612549172]
This paper presents Sparse Gradient Descent (SpGD) as a solution for variable selection in convex piecewise linear regression.
A non-asymptotic local convergence analysis is provided for SpGD under sub-Gaussian noise.
arXiv Detail & Related papers (2024-11-04T16:19:09Z) - Inverting the Leverage Score Gradient: An Efficient Approximate Newton Method [10.742859956268655]
This paper aims to recover the intrinsic model parameters given the leverage scores gradient.
We specifically scrutinize the inversion of the leverage score gradient, denoted as $g(x)$.
arXiv Detail & Related papers (2024-08-21T01:39:42Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Estimating the minimizer and the minimum value of a regression function
under passive design [72.85024381807466]
We propose a new method for estimating the minimizer $\boldsymbol{x}^*$ and the minimum value $f^*$ of a smooth and strongly convex regression function $f$.
We derive non-asymptotic upper bounds for the quadratic risk and optimization error of $\boldsymbol{z}_n$, and for the risk of estimating $f^*$.
arXiv Detail & Related papers (2022-11-29T18:38:40Z) - Computationally Efficient Horizon-Free Reinforcement Learning for Linear
Mixture MDPs [111.75736569611159]
We propose the first computationally efficient horizon-free algorithm for linear mixture MDPs.
Our algorithm adapts a weighted least squares estimator for the unknown transition dynamics.
This also improves upon the best-known algorithms in this setting when the $\sigma_k^2$'s are known.
arXiv Detail & Related papers (2022-05-23T17:59:18Z) - Randomized Exploration for Reinforcement Learning with General Value
Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.
We complement the theory with an empirical evaluation across known difficult exploration tasks.
arXiv Detail & Related papers (2021-06-15T02:23:07Z) - Truncated Linear Regression in High Dimensions [26.41623833920794]
In truncated linear regression, we observe samples $(A_i, y_i)_i$ whose dependent variable equals $y_i = A_i^{\rm T} \cdot x^* + \eta_i$, where $x^*$ is some fixed unknown vector of interest.
The goal is to recover $x^*$ under some favorable conditions on the $A_i$'s and the noise distribution.
We prove that there exists a computationally and statistically efficient method for recovering $k$-sparse $n$-dimensional vectors $x^*$ from $m$ truncated samples.
arXiv Detail & Related papers (2020-07-29T00:31:34Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is sub-Gaussian.
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
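As referenced in the APGD entry above, the following is a generic sketch of a preconditioned alternating update for noisy low-rank matrix recovery. It is not the exact algorithm from the cited paper (which addresses matrix sensing with measurement operators); it only illustrates the idea of alternately updating two factors $U$, $V$ with gradients preconditioned by $(V^\top V + \lambda I)^{-1}$ and $(U^\top U + \lambda I)^{-1}$. All dimensions and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

d, r, sigma = 50, 3, 0.05                 # illustrative sizes and noise level
U_star = rng.normal(size=(d, r))
V_star = rng.normal(size=(d, r))
M_star = U_star @ V_star.T                # rank-r ground truth
M = M_star + sigma * rng.normal(size=(d, d))  # noisy observation

eta, lam, steps = 0.5, 1e-6, 300
U = 1e-1 * rng.normal(size=(d, r))
V = 1e-1 * rng.normal(size=(d, r))

for _ in range(steps):
    # Update U with the residual gradient preconditioned by (V^T V + lam I)^{-1}.
    R = U @ V.T - M
    U = U - eta * R @ V @ np.linalg.inv(V.T @ V + lam * np.eye(r))
    # Then update V with the gradient preconditioned by (U^T U + lam I)^{-1}.
    R = U @ V.T - M
    V = V - eta * R.T @ U @ np.linalg.inv(U.T @ U + lam * np.eye(r))

print("relative recovery error:",
      np.linalg.norm(U @ V.T - M_star) / np.linalg.norm(M_star))
```

With step size 1 this alternation reduces to damped alternating least squares; the preconditioner is what removes the dependence on the conditioning of the current factors.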