Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably
- URL: http://arxiv.org/abs/2202.03535v1
- Date: Mon, 7 Feb 2022 21:53:51 GMT
- Title: Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably
- Authors: Tianyi Liu, Yan Li, Enlu Zhou and Tuo Zhao
- Abstract summary: We parameterize the rank one matrix $Y^*$ by $XX^\top$, where $X\in R^{d\times d}$.
We then show that under mild conditions, the estimator, obtained by the randomly perturbed gradient descent algorithm using the square loss function, attains a mean square error of $O(\sigma^2/d)$.
In contrast, the estimator obtained by gradient descent without random perturbation only attains a mean square error of $O(\sigma^2)$.
- Score: 42.427869499882206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the role of noise in optimization algorithms for learning
over-parameterized models. Specifically, we consider the recovery of a rank one
matrix $Y^*\in R^{d\times d}$ from a noisy observation $Y$ using an
over-parameterization model. We parameterize the rank one matrix $Y^*$ by
$XX^\top$, where $X\in R^{d\times d}$. We then show that under mild conditions,
the estimator, obtained by the randomly perturbed gradient descent algorithm
using the square loss function, attains a mean square error of $O(\sigma^2/d)$,
where $\sigma^2$ is the variance of the observational noise. In contrast, the
estimator obtained by gradient descent without random perturbation only attains
a mean square error of $O(\sigma^2)$. Our result partially justifies the
implicit regularization effect of noise when learning over-parameterized
models, and provides new understanding of training over-parameterized neural
networks.
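The abstract describes running (randomly perturbed) gradient descent on the square loss over the over-parameterization $XX^\top$. Below is a minimal NumPy sketch of that setup, comparing plain gradient descent with a randomly perturbed variant; it is not the authors' code, and the noise level, step size, perturbation scale, and iteration count are illustrative assumptions rather than values from the paper.

```python
import numpy as np

# Minimal sketch (assumed hyperparameters): recover a rank-one matrix
# Y* = x x^T from a noisy observation Y = Y* + sigma * noise, using the
# over-parameterization X X^T with X in R^{d x d} and the square loss
#   f(X) = (1/4) * ||X X^T - Y||_F^2 .

rng = np.random.default_rng(0)
d = 50
sigma = 0.1        # observational noise level (assumed)
eta = 0.01         # step size (assumed)
nu = 0.01          # scale of the random perturbation (assumed)
n_iters = 3000

x_true = rng.standard_normal(d)
x_true /= np.linalg.norm(x_true)
Y_star = np.outer(x_true, x_true)                   # rank-one ground truth
Y = Y_star + sigma * rng.standard_normal((d, d))    # noisy observation

def grad(X):
    """Gradient of f(X) = ||X X^T - Y||_F^2 / 4 with respect to X."""
    return (X @ X.T - 0.5 * (Y + Y.T)) @ X

def run(perturbed):
    X = 0.1 * rng.standard_normal((d, d))           # over-parameterized init
    for _ in range(n_iters):
        X = X - eta * grad(X)
        if perturbed:
            X += nu * rng.standard_normal((d, d))   # random perturbation step
    # mean square error against the noiseless rank-one target Y*
    return np.mean((X @ X.T - Y_star) ** 2)

print("gradient descent MSE:          ", run(perturbed=False))
print("perturbed gradient descent MSE:", run(perturbed=True))
```

In line with the stated bounds, the perturbed run should tend toward an error of order $\sigma^2/d$ while plain gradient descent stalls at order $\sigma^2$, although this toy script is only a qualitative illustration of the effect.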
Related papers
- Variable Selection in Convex Piecewise Linear Regression [5.366354612549172]
This paper presents Sparse Gradient Descent (SpGD) as a solution for variable selection in convex piecewise linear regression.
A non-asymptotic local convergence analysis is provided for SpGD under sub-Gaussian noise.
arXiv Detail & Related papers (2024-11-04T16:19:09Z) - Inverting the Leverage Score Gradient: An Efficient Approximate Newton Method [10.742859956268655]
This paper aims to recover the intrinsic model parameters given the leverage scores gradient.
We specifically scrutinize the inversion of the leverage score gradient, denoted as $g(x)$.
arXiv Detail & Related papers (2024-08-21T01:39:42Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Mirror Natural Evolution Strategies [10.495496415022064]
We focus on the theory of zeroth-order optimization, which uses first-order and second-order information approximated from zeroth-order queries.
We show that the estimated covariance matrix of $\texttt{MiNES}$ converges to the inverse of the Hessian matrix of the objective function.
arXiv Detail & Related papers (2023-08-01T11:45:24Z) - Estimating the minimizer and the minimum value of a regression function
under passive design [72.85024381807466]
We propose a new method for estimating the minimizer $\boldsymbol{x}^*$ and the minimum value $f^*$ of a smooth and strongly convex regression function $f$.
We derive non-asymptotic upper bounds for the quadratic risk and optimization error of $\boldsymbol{z}_n$, and for the risk of estimating $f^*$.
arXiv Detail & Related papers (2022-11-29T18:38:40Z) - (Nearly) Optimal Private Linear Regression via Adaptive Clipping [22.639650869444395]
We study the problem of differentially private linear regression where each data point is sampled from a fixed sub-Gaussian style distribution.
We propose and analyze a one-pass mini-batch gradient descent method (DP-AMBSSGD) where points in each iteration are sampled without replacement.
arXiv Detail & Related papers (2022-07-11T08:04:46Z) - Computationally Efficient Horizon-Free Reinforcement Learning for Linear
Mixture MDPs [111.75736569611159]
We propose the first computationally efficient horizon-free algorithm for linear mixture MDPs.
Our algorithm adapts a weighted least squares estimator for the unknown transition dynamics.
This also improves upon the best-known algorithms in this setting when the $\sigma_k^2$'s are known.
arXiv Detail & Related papers (2022-05-23T17:59:18Z) - Randomized Exploration for Reinforcement Learning with General Value
Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.
We complement the theory with an empirical evaluation across known difficult exploration tasks.
arXiv Detail & Related papers (2021-06-15T02:23:07Z) - Truncated Linear Regression in High Dimensions [26.41623833920794]
In truncated linear regression, we observe samples $(A_i, y_i)_i$ whose dependent variable equals $y_i = A_i^{\rm T} \cdot x^* + \eta_i$, where $x^*$ is some fixed unknown vector of interest.
The goal is to recover $x^*$ under some favorable conditions on the $A_i$'s and on the noise distribution.
We prove that there exists a computationally and statistically efficient method for recovering $k$-sparse $n$-dimensional vectors $x^*$ from $m$ truncated samples.
arXiv Detail & Related papers (2020-07-29T00:31:34Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z)