On the computational and statistical complexity of over-parameterized
matrix sensing
- URL: http://arxiv.org/abs/2102.02756v1
- Date: Wed, 27 Jan 2021 04:23:49 GMT
- Title: On the computational and statistical complexity of over-parameterized
matrix sensing
- Authors: Jiacheng Zhuo, Jeongyeol Kwon, Nhat Ho, Constantine Caramanis
- Abstract summary: We consider solving the low-rank matrix sensing problem with the Factorized Gradient Descent (FGD) method.
By decomposing the factorized matrix $\mathbf{F}$ into separate column spaces, we show that $\|\mathbf{F}_t \mathbf{F}_t^\top - \mathbf{X}^*\|_F^2$ converges to a statistical error.
- Score: 30.785670369640872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider solving the low-rank matrix sensing problem with the
Factorized Gradient Descent (FGD) method when the true rank is unknown and
over-specified, which we refer to as over-parameterized matrix sensing. If the
ground-truth signal $\mathbf{X}^* \in \mathbb{R}^{d \times d}$ is of rank $r$,
but we try to recover it using $\mathbf{F} \mathbf{F}^\top$ where $\mathbf{F}
\in \mathbb{R}^{d \times k}$ and $k > r$, the existing statistical analysis
falls short, due to a flat local curvature of the loss function around the
global minima. By decomposing the factorized matrix $\mathbf{F}$ into separate
column spaces to capture the effect of the extra ranks, we show that
$\|\mathbf{F}_t \mathbf{F}_t^\top - \mathbf{X}^*\|_{F}^2$ converges to a
statistical error of $\tilde{\mathcal{O}}(k d \sigma^2 / n)$ after
$\tilde{\mathcal{O}}(\frac{\sigma_{r}}{\sigma}\sqrt{\frac{n}{d}})$ iterations,
where $\mathbf{F}_t$ is the output of FGD after $t$ iterations, $\sigma^2$ is
the variance of the observation noise, $\sigma_{r}$ is the $r$-th largest
eigenvalue of $\mathbf{X}^*$, and $n$ is the number of samples. Our results
therefore offer a comprehensive picture of the statistical and computational
complexity of FGD for the over-parameterized matrix sensing problem.
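To make the procedure concrete, below is a minimal numpy sketch of FGD on the over-parameterized matrix sensing loss $\frac{1}{2n}\sum_i (\langle \mathbf{A}_i, \mathbf{F}\mathbf{F}^\top\rangle - y_i)^2$. The problem sizes, initialization scale, and step size are illustrative assumptions, not the paper's prescribed choices.

```python
import numpy as np

# Illustrative sketch of Factorized Gradient Descent (FGD) for
# over-parameterized matrix sensing: recover a rank-r matrix X* from n noisy
# linear measurements y_i = <A_i, X*> + eps_i using the factorization F F^T
# with F in R^{d x k} and k > r (the over-specified rank).

rng = np.random.default_rng(0)
d, r, k, n = 20, 2, 5, 2000            # k > r: rank is over-specified
sigma = 0.1                            # observation-noise standard deviation

# Ground truth X* = U U^T of rank r.
U = rng.normal(size=(d, r))
X_star = U @ U.T

# Gaussian sensing matrices and noisy measurements.
A = rng.normal(size=(n, d, d))
y = np.einsum('nij,ij->n', A, X_star) + sigma * rng.normal(size=n)

# FGD on the empirical loss L(F) = (1/2n) * sum_i (<A_i, F F^T> - y_i)^2.
F = 0.1 * rng.normal(size=(d, k))      # small random init (an assumption)
eta = 0.5 / np.linalg.norm(X_star, 2)  # step ~ 1/sigma_1(X*) (an assumption)
for t in range(1000):
    resid = np.einsum('nij,ij->n', A, F @ F.T) - y  # per-measurement residuals
    G = np.einsum('n,nij->ij', resid, A) / n        # gradient in the X variable
    F -= eta * (G + G.T) @ F                        # chain rule through F F^T

print("||F F^T - X*||_F^2 =", np.linalg.norm(F @ F.T - X_star, 'fro') ** 2)
```

The extra $k - r$ directions flatten the landscape around the solution and slow the later iterations, which is exactly the sublinear iteration complexity the abstract quantifies.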
Related papers
- In-depth Analysis of Low-rank Matrix Factorisation in a Federated Setting [21.002519159190538]
We analyze a distributed algorithm to compute a low-rank matrix factorization on $N$ clients.
We obtain a global $\mathbf{V} \in \mathbb{R}^{d \times r}$ common to all clients and a local $\mathbf{U}^i \in \mathbb{R}^{n_i \times r}$ for each client $i$.
arXiv Detail & Related papers (2024-09-13T12:28:42Z) - Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture, setting it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that, in the worst case, exponential dependence on the number of heads $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Optimal Estimator for Linear Regression with Shuffled Labels [17.99906229036223]
This paper considers the task of linear regression with shuffled labels.
Here $\mathbf{Y} \in \mathbb{R}^{n \times m}$, $\mathbf{\Pi} \in \mathbb{R}^{n \times p}$, $\mathbf{B} \in \mathbb{R}^{p \times m}$, and $\mathbf{W} \in \mathbb{R}^{n \times m}$ denote the observations, permutation, coefficients, and noise, respectively.
arXiv Detail & Related papers (2023-10-02T16:44:47Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study functions of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations $\sigma$.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{OPT} + \epsilon$ with high probability.
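For intuition, here is a toy gradient-descent sketch for a single ReLU neuron in the noiseless, realizable case. It illustrates the update being analyzed but does not reproduce the paper's adversarial-label-noise guarantee; the activation choice, step size, and initialization are assumptions.

```python
import numpy as np

# Toy sketch: gradient descent on the squared loss for a single ReLU neuron
# x -> max(0, w . x), with clean labels generated by a target vector w*.
# The paper's setting additionally allows adversarial label noise; this code
# only illustrates the basic gradient-descent update.

rng = np.random.default_rng(1)
d, n = 10, 5000
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)

X = rng.normal(size=(n, d))          # Gaussian examples
y = np.maximum(X @ w_star, 0.0)      # labels from the target neuron

w = 0.1 * rng.normal(size=d)         # random init (an assumption)
eta = 0.5                            # step size (an assumption)
for _ in range(300):
    z = X @ w
    resid = np.maximum(z, 0.0) - y
    grad = (resid * (z > 0.0)) @ X / n  # grad of (1/2n) sum (relu(z_i)-y_i)^2
    w -= eta * grad

print("||w - w*|| =", np.linalg.norm(w - w_star))
```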
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Beyond Independent Measurements: General Compressed Sensing with GNN
Application [4.924126492174801]
We consider the problem of recovering a structured signal $\mathbf{x} \in \mathbb{R}^n$ from noisy cone observations.
We show that the effective rank of $\mathbf{B}$ may be used as a surrogate for the number of measurements.
arXiv Detail & Related papers (2021-10-30T20:35:56Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random-features technique, called Ternary Random Feature (TRF).
The computation of the proposed random features requires no multiplications and a factor of $b$ fewer bits of storage compared to classical random features.
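As a rough illustration, here is what a ternary random-feature map can look like: a projection matrix with entries in $\{-1, 0, +1\}$, so applying it needs only additions and subtractions. The sparsity level, scaling, and sign nonlinearity below are assumptions, not necessarily the paper's exact TRF construction.

```python
import numpy as np

# Sketch of a ternary random-feature map: the projection matrix W has entries
# in {-1, 0, +1}, so W @ x needs no real multiplications (emulated here with a
# plain matmul for clarity) and each entry fits in 2 bits of storage.

rng = np.random.default_rng(2)
d, D = 64, 512                        # input dimension, number of features
p = 0.5                               # fraction of nonzero weights (assumption)
W = rng.choice([-1.0, 0.0, 1.0], size=(D, d), p=[p / 2, 1 - p, p / 2])

def ternary_features(x: np.ndarray) -> np.ndarray:
    """Map x to D random features using a sign nonlinearity (an assumption)."""
    return np.sign(W @ x) / np.sqrt(D)

x = rng.normal(size=d)
print(ternary_features(x)[:8])
```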
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Spectral properties of sample covariance matrices arising from random
matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $\operatorname{tr}(AR(z))$, for $R(z) = (\frac{1}{n}XX^T - zI_p)^{-1}$ and $A \in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\|A\|_* / \sqrt{n})$.
Here, we show that $\|\mathbb{E}[R(z)] - \tilde{R}(z)\|_F$ …
arXiv Detail & Related papers (2021-09-06T14:21:43Z) - Global Convergence of Gradient Descent for Asymmetric Low-Rank Matrix
Factorization [49.090785356633695]
We study the asymmetric low-rank factorization problem: $\min_{\mathbf{U} \in \mathbb{R}^{m \times d},\, \mathbf{V} \in \mathbb{R}^{n \times d}} \frac{1}{2} \|\mathbf{U} \mathbf{V}^\top - \mathbf{\Sigma}\|_F^2$ over the factors $\mathbf{U}$ and $\mathbf{V}$.
arXiv Detail & Related papers (2021-06-27T17:25:24Z) - Non-Parametric Estimation of Manifolds from Noisy Data [1.0152838128195467]
We consider the problem of estimating a $d$-dimensional sub-manifold of $\mathbb{R}^D$ from a finite set of noisy samples.
We show that the estimation yields convergence rates of $n^{-\frac{k}{2k+d}}$ for the point estimation and $n^{-\frac{k-1}{2k+d}}$ for the estimation of the tangent space.
arXiv Detail & Related papers (2021-05-11T02:29:33Z) - On the Optimal Weighted $\ell_2$ Regularization in Overparameterized
Linear Regression [23.467801864841526]
We consider the linear model $\mathbf{y} = \mathbf{X} \boldsymbol{\beta}_\star + \boldsymbol{\epsilon}$ with $\mathbf{X} \in \mathbb{R}^{n \times p}$ in the overparameterized regime $p > n$.
We provide an exact characterization of the prediction risk $\mathbb{E}(y - \mathbf{x}^\top \hat{\boldsymbol{\beta}}_\lambda)^2$ in the proportional limit $p/n$ …
arXiv Detail & Related papers (2020-06-10T12:38:43Z) - The Average-Case Time Complexity of Certifying the Restricted Isometry
Property [66.65353643599899]
In compressed sensing, the restricted isometry property (RIP) on $M \times N$ sensing matrices guarantees efficient reconstruction of sparse vectors.
We investigate the exact average-case time complexity of certifying the RIP for $M \times N$ matrices with i.i.d. $\mathcal{N}(0, 1/M)$ entries.
arXiv Detail & Related papers (2020-05-22T16:55:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.