On the computational and statistical complexity of over-parameterized
matrix sensing
- URL: http://arxiv.org/abs/2102.02756v1
- Date: Wed, 27 Jan 2021 04:23:49 GMT
- Title: On the computational and statistical complexity of over-parameterized
matrix sensing
- Authors: Jiacheng Zhuo, Jeongyeol Kwon, Nhat Ho, Constantine Caramanis
- Abstract summary: We consider solving the low-rank matrix sensing problem with the Factorized Gradient Descent (FGD) method.
By decomposing the factorized matrix $\mathbf{F}$ into separate column spaces, we show that $\|\mathbf{F}_t \mathbf{F}_t^\top - \mathbf{X}^*\|_F^2$ converges to a statistical error.
- Score: 30.785670369640872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider solving the low-rank matrix sensing problem with the
Factorized Gradient Descent (FGD) method when the true rank is unknown and
over-specified, which we refer to as over-parameterized matrix sensing. If the
ground-truth signal $\mathbf{X}^* \in \mathbb{R}^{d \times d}$ is of rank $r$,
but we try to recover it using $\mathbf{F} \mathbf{F}^\top$ where $\mathbf{F}
\in \mathbb{R}^{d \times k}$ and $k > r$, the existing statistical analysis
falls short, due to a flat local curvature of the loss function around the
global minima. By decomposing the factorized matrix $\mathbf{F}$ into separate
column spaces to capture the effect of the extra ranks, we show that
$\|\mathbf{F}_t \mathbf{F}_t^\top - \mathbf{X}^*\|_{F}^2$ converges to a
statistical error of $\tilde{\mathcal{O}}(k d \sigma^2 / n)$ after
$\tilde{\mathcal{O}}(\frac{\sigma_{r}}{\sigma}\sqrt{\frac{n}{d}})$ iterations,
where $\mathbf{F}_t$ is the output of FGD after $t$ iterations, $\sigma^2$ is
the variance of the observation noise, $\sigma_{r}$ is the $r$-th largest
eigenvalue of $\mathbf{X}^*$, and $n$ is the number of samples. Our results
therefore offer a comprehensive picture of the statistical and computational
complexity of FGD for the over-parameterized matrix sensing problem.
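To make the procedure concrete, below is a minimal numpy sketch of FGD on the over-parameterized matrix sensing loss $\frac{1}{2n}\sum_i (\langle \mathbf{A}_i, \mathbf{F}\mathbf{F}^\top\rangle - y_i)^2$. The problem sizes, initialization scale, and step size are illustrative assumptions, not the paper's prescribed choices.

```python
import numpy as np

# Illustrative sketch of Factorized Gradient Descent (FGD) for
# over-parameterized matrix sensing: recover a rank-r matrix X* from n noisy
# linear measurements y_i = <A_i, X*> + eps_i using the factorization F F^T
# with F in R^{d x k} and k > r (the over-specified rank).

rng = np.random.default_rng(0)
d, r, k, n = 20, 2, 5, 2000            # k > r: rank is over-specified
sigma = 0.1                            # observation-noise standard deviation

# Ground truth X* = U U^T of rank r.
U = rng.normal(size=(d, r))
X_star = U @ U.T

# Gaussian sensing matrices and noisy measurements.
A = rng.normal(size=(n, d, d))
y = np.einsum('nij,ij->n', A, X_star) + sigma * rng.normal(size=n)

# FGD on the empirical loss L(F) = (1/2n) * sum_i (<A_i, F F^T> - y_i)^2.
F = 0.1 * rng.normal(size=(d, k))      # small random init (an assumption)
eta = 0.5 / np.linalg.norm(X_star, 2)  # step ~ 1/sigma_1(X*) (an assumption)
for t in range(1000):
    resid = np.einsum('nij,ij->n', A, F @ F.T) - y  # per-measurement residuals
    G = np.einsum('n,nij->ij', resid, A) / n        # gradient in the X variable
    F -= eta * (G + G.T) @ F                        # chain rule through F F^T

print("||F F^T - X*||_F^2 =", np.linalg.norm(F @ F.T - X_star, 'fro') ** 2)
```

The extra $k - r$ directions flatten the landscape around the solution and slow the later iterations, which is exactly the sublinear iteration complexity the abstract quantifies.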
Related papers
- In-depth Analysis of Low-rank Matrix Factorisation in a Federated Setting [21.002519159190538]
We analyze a distributed algorithm to compute a low-rank matrix factorization on $N$ clients.
We obtain a global $\mathbf{V} \in \mathbb{R}^{d \times r}$ common to all clients and a local $\mathbf{U}^i \in \mathbb{R}^{n_i \times r}$ for each client $i$.
arXiv Detail & Related papers (2024-09-13T12:28:42Z) - Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture, setting it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that, in the worst case, exponential dependence on the number of heads $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Optimal Estimator for Linear Regression with Shuffled Labels [17.99906229036223]
This paper considers the task of linear regression with shuffled labels.
Here $\mathbf{Y} \in \mathbb{R}^{n \times m}$, $\mathbf{\Pi} \in \mathbb{R}^{n \times p}$, $\mathbf{B} \in \mathbb{R}^{p \times m}$, and $\mathbf{W} \in \mathbb{R}^{n \times m}$ denote the observations, permutation, coefficients, and noise, respectively.
arXiv Detail & Related papers (2023-10-02T16:44:47Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study functions of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations $\sigma$.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{OPT} + \epsilon$ with high probability.
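For intuition, here is a toy gradient-descent sketch for a single ReLU neuron in the noiseless, realizable case. It illustrates the update being analyzed but does not reproduce the paper's adversarial-label-noise guarantee; the activation choice, step size, and initialization are assumptions.

```python
import numpy as np

# Toy sketch: gradient descent on the squared loss for a single ReLU neuron
# x -> max(0, w . x), with clean labels generated by a target vector w*.
# The paper's setting additionally allows adversarial label noise; this code
# only illustrates the basic gradient-descent update.

rng = np.random.default_rng(1)
d, n = 10, 5000
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)

X = rng.normal(size=(n, d))          # Gaussian examples
y = np.maximum(X @ w_star, 0.0)      # labels from the target neuron

w = 0.1 * rng.normal(size=d)         # random init (an assumption)
eta = 0.5                            # step size (an assumption)
for _ in range(300):
    z = X @ w
    resid = np.maximum(z, 0.0) - y
    grad = (resid * (z > 0.0)) @ X / n  # grad of (1/2n) sum (relu(z_i)-y_i)^2
    w -= eta * grad

print("||w - w*|| =", np.linalg.norm(w - w_star))
```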
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Beyond Independent Measurements: General Compressed Sensing with GNN
Application [4.924126492174801]
We consider the problem of recovering a structured signal $\mathbf{x} \in \mathbb{R}^n$ from noisy cone observations.
We show that the effective rank of $\mathbf{B}$ may be used as a surrogate for the number of measurements.
arXiv Detail & Related papers (2021-10-30T20:35:56Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random-features technique, called Ternary Random Feature (TRF).
The computation of the proposed random features requires no multiplications and a factor of $b$ fewer bits of storage compared to classical random features.
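As a rough illustration, here is what a ternary random-feature map can look like: a projection matrix with entries in $\{-1, 0, +1\}$, so applying it needs only additions and subtractions. The sparsity level, scaling, and sign nonlinearity below are assumptions, not necessarily the paper's exact TRF construction.

```python
import numpy as np

# Sketch of a ternary random-feature map: the projection matrix W has entries
# in {-1, 0, +1}, so W @ x needs no real multiplications (emulated here with a
# plain matmul for clarity) and each entry fits in 2 bits of storage.

rng = np.random.default_rng(2)
d, D = 64, 512                        # input dimension, number of features
p = 0.5                               # fraction of nonzero weights (assumption)
W = rng.choice([-1.0, 0.0, 1.0], size=(D, d), p=[p / 2, 1 - p, p / 2])

def ternary_features(x: np.ndarray) -> np.ndarray:
    """Map x to D random features using a sign nonlinearity (an assumption)."""
    return np.sign(W @ x) / np.sqrt(D)

x = rng.normal(size=d)
print(ternary_features(x)[:8])
```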
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Spectral properties of sample covariance matrices arising from random
matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $\operatorname{tr}(AR(z))$, for $R(z) = (\frac{1}{n}XX^T - zI_p)^{-1}$ and $A \in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\|A\|_* / \sqrt{n})$.
Here, we show that $\|\mathbb{E}[R(z)] - \tilde{R}(z)\|_F$ …
arXiv Detail & Related papers (2021-09-06T14:21:43Z) - Global Convergence of Gradient Descent for Asymmetric Low-Rank Matrix
Factorization [49.090785356633695]
We study the asymmetric low-rank factorization problem: $\min_{\mathbf{U} \in \mathbb{R}^{m \times d},\, \mathbf{V} \in \mathbb{R}^{n \times d}} \frac{1}{2} \|\mathbf{U} \mathbf{V}^\top - \mathbf{\Sigma}\|_F^2$ over the factors $\mathbf{U}$ and $\mathbf{V}$.
arXiv Detail & Related papers (2021-06-27T17:25:24Z) - Non-Parametric Estimation of Manifolds from Noisy Data [1.0152838128195467]
We consider the problem of estimating a $d$-dimensional sub-manifold of $\mathbb{R}^D$ from a finite set of noisy samples.
We show that the estimation yields convergence rates of $n^{-\frac{k}{2k+d}}$ for the point estimation and $n^{-\frac{k-1}{2k+d}}$ for the estimation of the tangent space.
arXiv Detail & Related papers (2021-05-11T02:29:33Z) - On the Optimal Weighted $\ell_2$ Regularization in Overparameterized
Linear Regression [23.467801864841526]
We consider the linear model $\mathbf{y} = \mathbf{X} \boldsymbol{\beta}_\star + \boldsymbol{\epsilon}$ with $\mathbf{X} \in \mathbb{R}^{n \times p}$ in the overparameterized regime $p > n$.
We provide an exact characterization of the prediction risk $\mathbb{E}(y - \mathbf{x}^\top \hat{\boldsymbol{\beta}}_\lambda)^2$ in the proportional limit $p/n$ …
arXiv Detail & Related papers (2020-06-10T12:38:43Z) - The Average-Case Time Complexity of Certifying the Restricted Isometry
Property [66.65353643599899]
In compressed sensing, the restricted isometry property (RIP) on $M \times N$ sensing matrices guarantees efficient reconstruction of sparse vectors.
We investigate the exact average-case time complexity of certifying the RIP for $M \times N$ matrices with i.i.d. $\mathcal{N}(0, 1/M)$ entries.
arXiv Detail & Related papers (2020-05-22T16:55:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.