A matrix concentration inequality for products
- URL: http://arxiv.org/abs/2008.05104v2
- Date: Tue, 18 Aug 2020 22:16:20 GMT
- Title: A matrix concentration inequality for products
- Authors: Sina Baghal
- Abstract summary: We show that for small enough positive $\alpha$, $Z_n$ satisfies the concentration inequality $\mathbb{P}\left(\left\Vert Z_n-\mathbb{E}\left[Z_n\right]\right\Vert \geq t\right) \leq 2d^2\cdot\exp\left(\frac{-t^2}{\alpha \sigma^2}\right)$ for all $t \geq 0$.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a non-asymptotic concentration inequality for the random matrix
product \begin{equation}\label{eq:Zn} Z_n = \left(I_d-\alpha
X_n\right)\left(I_d-\alpha X_{n-1}\right)\cdots \left(I_d-\alpha X_1\right),
\end{equation} where $\left\{X_k \right\}_{k=1}^{+\infty}$ is a sequence of
bounded independent random positive semidefinite matrices with common
expectation $\mathbb{E}\left[X_k\right]=\Sigma$. Under these assumptions, we
show that, for small enough positive $\alpha$, $Z_n$ satisfies the
concentration inequality \begin{equation}\label{eq:CTbound}
\mathbb{P}\left(\left\Vert Z_n-\mathbb{E}\left[Z_n\right]\right\Vert \geq
t\right) \leq 2d^2\cdot\exp\left(\frac{-t^2}{\alpha \sigma^2} \right) \quad
\text{for all } t\geq 0, \end{equation} where $\sigma^2$ denotes a variance
parameter.
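The theorem is stated abstractly, but it is easy to probe numerically. The sketch below (a minimal illustration, not code from the paper) draws bounded i.i.d. PSD matrices $X_k$, forms the product $Z_n$, and records the spectral-norm deviation from $\mathbb{E}[Z_n] = (I_d - \alpha\Sigma)^n$, an identity that follows from independence of the factors; the dimension $d$, horizon $n$, step size $\alpha$, and the sampling model for $X_k$ are all illustrative assumptions.

```python
# Monte Carlo sketch (illustrative, not from the paper) of the concentration
# of Z_n = (I - a*X_n)(I - a*X_{n-1})...(I - a*X_1) for i.i.d. bounded PSD X_k.
import numpy as np

rng = np.random.default_rng(0)
d, n, alpha, trials = 5, 200, 0.01, 2000   # illustrative choices

def sample_X():
    # Bounded PSD sample: (V V^T)/d with V entries uniform on [0, 1],
    # so ||X|| <= d; elementary moments give E[X] = 0.25*ones + (1/12)*I.
    V = rng.uniform(0.0, 1.0, size=(d, d))
    return (V @ V.T) / d

def sample_Zn():
    Z = np.eye(d)
    for _ in range(n):
        Z = (np.eye(d) - alpha * sample_X()) @ Z
    return Z

# Independence gives E[Z_n] = (I - alpha*Sigma)^n exactly.
Sigma = 0.25 * np.ones((d, d)) + (1.0 / 12.0) * np.eye(d)
EZ = np.linalg.matrix_power(np.eye(d) - alpha * Sigma, n)

devs = np.array([np.linalg.norm(sample_Zn() - EZ, 2) for _ in range(trials)])
print("median / max spectral-norm deviation:", np.median(devs), devs.max())
```

Comparing the empirical tail of these deviations against $2d^2\exp(-t^2/(\alpha\sigma^2))$ for a sweep of $t$ gives a rough numerical handle on the variance parameter $\sigma^2$.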
Related papers
- Efficient Continual Finite-Sum Minimization [52.5238287567572]
We propose a key twist on finite-sum minimization, dubbed continual finite-sum minimization.
Our approach significantly improves upon the $\mathcal{O}(n/\epsilon)$ first-order oracle calls (FOs) that $\mathrm{StochasticGradientDescent}$ requires.
We also prove that there is no natural first-order method with $\mathcal{O}\left(n/\epsilon^{\alpha}\right)$ gradient complexity for $\alpha < 1/4$, establishing that the first-order complexity of our method is nearly tight.
arXiv Detail & Related papers (2024-06-07T08:26:31Z)
- Exact objectives of random linear programs and mean widths of random polyhedrons [0.0]
We consider random linear programs (rlps) as a subclass of random optimization problems (rops).
Our particular focus is on appropriate linear objectives which connect the rlps to the mean widths of random polyhedrons/polytopes.
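A hedged sketch of that connection: when a polytope is given as a convex hull, the support function $h_K(u) = \max_i \langle u, x_i \rangle$ is a linear objective maximized over the polytope, and averaging $h_K(u) + h_K(-u)$ over random unit directions estimates the mean width. The Gaussian vertex model and all sizes below are assumptions for illustration, not the paper's setup.

```python
# Monte Carlo estimate (illustrative) of the mean width of a random polytope
# K = conv(x_1, ..., x_m); the support function is a linear objective over K.
import numpy as np

rng = np.random.default_rng(0)
d, m, n_dirs = 10, 50, 5000
P = rng.normal(size=(m, d))                      # random polytope vertices

U = rng.normal(size=(n_dirs, d))
U /= np.linalg.norm(U, axis=1, keepdims=True)    # uniform directions on sphere
# Width along u is h_K(u) + h_K(-u), with h_K(u) = max_i <u, x_i>.
widths = (P @ U.T).max(axis=0) + (P @ (-U.T)).max(axis=0)
print("estimated mean width:", widths.mean())
```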
arXiv Detail & Related papers (2024-03-06T11:51:52Z)
- Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture, setting it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that, in the worst case, exponential dependence on the number of heads $m$ is unavoidable.
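For reference, a minimal numpy forward pass of a textbook multi-head attention layer is sketched below; the exact parameterization studied in the paper may differ, and all shapes here are illustrative assumptions.

```python
# Textbook multi-head attention forward pass in numpy (illustrative sketch;
# not necessarily the exact parameterization analyzed in the paper).
import numpy as np

def softmax(A, axis=-1):
    A = A - A.max(axis=axis, keepdims=True)      # numerical stability
    E = np.exp(A)
    return E / E.sum(axis=axis, keepdims=True)

def multi_head_attention(X, WQ, WK, WV, WO):
    # X: (seq, d_model); WQ/WK/WV: m per-head projections (d_model, d_head);
    # WO: (m * d_head, d_model) output projection.
    heads = []
    for Wq, Wk, Wv in zip(WQ, WK, WV):
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        heads.append(softmax(Q @ K.T / np.sqrt(K.shape[1])) @ V)
    return np.concatenate(heads, axis=-1) @ WO

rng = np.random.default_rng(0)
seq, d_model, m, d_head = 8, 16, 4, 4
WQ, WK, WV = ([rng.normal(size=(d_model, d_head)) for _ in range(m)]
              for _ in range(3))
WO = rng.normal(size=(m * d_head, d_model))
print(multi_head_attention(rng.normal(size=(seq, d_model)), WQ, WK, WV, WO).shape)
```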
arXiv Detail & Related papers (2024-02-06T15:39:09Z)
- Convergence Rates of Stochastic Zeroth-order Gradient Descent for Łojasiewicz Functions [6.137707924685666]
We prove convergence rates of stochastic zeroth-order gradient descent (SZGD) algorithms for Łojasiewicz functions.
Our results show that $\left\{ f(\mathbf{x}_t) - f(\mathbf{x}_\infty) \right\}_{t \in \mathbb{N}}$ can converge faster than $\left\{ \left\| \mathbf{x}_t - \mathbf{x}_\infty \right\| \right\}_{t \in \mathbb{N}}$.
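The core primitive is easy to sketch: replace the gradient with a two-point finite-difference estimate along a random direction. The step size, smoothing radius, and test function below are illustrative assumptions, not the schedules analyzed in the paper.

```python
# Stochastic zeroth-order gradient descent with a two-point gradient
# estimate (illustrative sketch; not the paper's exact step-size schedule).
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.sum(x ** 2)        # simple test function (minimum value 0)

def szgd(x, steps=500, eta=0.05, mu=1e-4):
    for _ in range(steps):
        u = rng.normal(size=x.shape)                        # random direction
        g = (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u  # ~ (grad f . u) u
        x = x - eta * g
    return x

x_final = szgd(rng.normal(size=5))
print(f(x_final))                # should be close to the minimum value 0
```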
arXiv Detail & Related papers (2022-10-31T00:53:17Z)
- Matrix concentration inequalities and efficiency of random universal sets of quantum gates [0.0]
For a random set $\mathcal{S} \subset U(d)$ of quantum gates we provide bounds on the probability that $\mathcal{S}$ forms a $\delta$-approximate $t$-design.
We show that for $\mathcal{S}$ drawn from an exact $t$-design, the probability that it forms a $\delta$-approximate $t$-design satisfies the inequality $\mathbb{P}\left(\delta \geq x\right) \leq 2D_t \dots$
arXiv Detail & Related papers (2022-02-10T23:44:09Z)
- On the Self-Penalization Phenomenon in Feature Selection [69.16452769334367]
We describe an implicit sparsity-inducing mechanism based on minimization over a family of kernels.
As an application, we use this sparsity-inducing mechanism to build algorithms that are consistent for feature selection.
arXiv Detail & Related papers (2021-10-12T09:36:41Z)
- Spectral properties of sample covariance matrices arising from random matrices with independent, non-identically distributed columns [50.053491972003656]
It was previously shown that the functionals $\operatorname{tr}(AR(z))$, for $R(z) = \left(\frac{1}{n}XX^T - zI_p\right)^{-1}$ and $A \in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\left\| A \right\|_* / \sqrt{n})$.
Here, we show that $\left\| \mathbb{E}[R(z)] - \tilde{R}(z) \right\|_F \dots$
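The $O(\left\| A \right\|_* / \sqrt{n})$ scaling can be eyeballed numerically. The sketch below uses $A = I_p$, $z = -1$, and i.i.d. Gaussian columns, a special case chosen only for illustration (the paper's setting allows non-identically distributed columns).

```python
# Monte Carlo check (illustrative special case) that the standard deviation
# of tr(A R(z)), with R(z) = (X X^T / n - z I_p)^{-1}, shrinks like 1/sqrt(n).
import numpy as np

rng = np.random.default_rng(0)
p, z, trials = 30, -1.0, 400

def tr_resolvent(n):
    X = rng.normal(size=(p, n))                       # i.i.d. Gaussian columns
    R = np.linalg.inv(X @ X.T / n - z * np.eye(p))    # resolvent at z = -1
    return np.trace(R)                                # tr(A R(z)) with A = I_p

for n in (200, 800):
    vals = [tr_resolvent(n) for _ in range(trials)]
    print(n, np.std(vals))   # quadrupling n should roughly halve the std
```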
arXiv Detail & Related papers (2021-09-06T14:21:43Z)
- Optimal Mean Estimation without a Variance [103.26777953032537]
We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist.
We design an estimator which attains the smallest possible confidence interval as a function of $n$, $d$, and $\delta$.
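For context, a standard baseline in this regime is median-of-means, which already tolerates infinite variance; the paper's estimator is a different, optimal construction. The heavy-tailed test distribution and block count below are illustrative assumptions.

```python
# Median-of-means baseline for heavy-tailed mean estimation (illustrative;
# this is NOT the optimal estimator constructed in the paper).
import numpy as np

rng = np.random.default_rng(0)

def median_of_means(x, k):
    # Split the sample into k blocks, average each, take the median of means.
    blocks = np.array_split(rng.permutation(x), k)
    return np.median([b.mean() for b in blocks])

# Pareto sample with tail index 1.5: finite mean (= 3), infinite variance.
x = rng.pareto(1.5, size=100_000) + 1.0
print(median_of_means(x, k=30), x.mean())
```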
arXiv Detail & Related papers (2020-11-24T22:39:21Z)
- On the Regularization Effect of Stochastic Gradient Descent applied to Least Squares [0.0]
We study the behavior of stochastic gradient descent applied to $\left\| Ax - b \right\|_2^2 \rightarrow \min$ for invertible $A \in \mathbb{R}^{n \times n}$.
We show that there is an explicit constant $c_A$ depending (mildly) on $A$ such that $\mathbb{E}\left\| Ax_{k+1} - b \right\|_2^2 \leq \dots$
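A minimal sketch of the flavor of method studied here is randomized Kaczmarz, i.e. SGD on $\left\| Ax - b \right\|_2^2$ that samples one row per step; the matrix model, step rule, and iteration count are illustrative assumptions.

```python
# Randomized-Kaczmarz-style SGD for ||Ax - b||_2^2 -> min (illustrative
# sketch; the paper's exact sampling scheme and constants may differ).
import numpy as np

rng = np.random.default_rng(0)
n = 50
G = rng.normal(size=(n, n)) / np.sqrt(n)
A = G @ G.T + np.eye(n)                          # symmetric PD, so invertible
x_true = rng.normal(size=n)
b = A @ x_true

x = np.zeros(n)
for _ in range(20_000):
    i = rng.integers(n)                          # sample a random row
    r = A[i] @ x - b[i]                          # residual on that row
    x -= (r / (A[i] @ A[i])) * A[i]              # project onto its hyperplane
print(np.linalg.norm(A @ x - b))                 # residual should be tiny
```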
arXiv Detail & Related papers (2020-07-27T03:01:09Z)
- Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and ReLUs under Gaussian Marginals [49.60752558064027]
We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals.
Our lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.
arXiv Detail & Related papers (2020-06-29T17:10:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.