A matrix concentration inequality for products
- URL: http://arxiv.org/abs/2008.05104v2
- Date: Tue, 18 Aug 2020 22:16:20 GMT
- Title: A matrix concentration inequality for products
- Authors: Sina Baghal
- Abstract summary: We show that for small enough positive $\alpha$, $Z_n$ satisfies the concentration inequality $\mathbb{P}\left(\left\Vert Z_n-\mathbb{E}\left[Z_n\right]\right\Vert \geq t\right) \leq 2d^2\cdot\exp\left(\frac{-t^2}{\alpha \sigma^2}\right)$ for all $t \geq 0$.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a non-asymptotic concentration inequality for the random matrix
product \begin{equation}\label{eq:Zn} Z_n = \left(I_d-\alpha
X_n\right)\left(I_d-\alpha X_{n-1}\right)\cdots \left(I_d-\alpha X_1\right),
\end{equation} where $\left\{X_k \right\}_{k=1}^{+\infty}$ is a sequence of
bounded independent random positive semidefinite matrices with common
expectation $\mathbb{E}\left[X_k\right]=\Sigma$. Under these assumptions, we
show that, for small enough positive $\alpha$, $Z_n$ satisfies the
concentration inequality \begin{equation}\label{eq:CTbound}
\mathbb{P}\left(\left\Vert Z_n-\mathbb{E}\left[Z_n\right]\right\Vert \geq
t\right) \leq 2d^2\cdot\exp\left(\frac{-t^2}{\alpha \sigma^2} \right) \quad
\text{for all } t\geq 0, \end{equation} where $\sigma^2$ denotes a variance
parameter.
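The theorem is stated abstractly, but it is easy to probe numerically. The sketch below (a minimal illustration, not code from the paper) draws bounded i.i.d. PSD matrices $X_k$, forms the product $Z_n$, and records the spectral-norm deviation from $\mathbb{E}[Z_n] = (I_d - \alpha\Sigma)^n$, an identity that follows from independence of the factors; the dimension $d$, horizon $n$, step size $\alpha$, and the sampling model for $X_k$ are all illustrative assumptions.

```python
# Monte Carlo sketch (illustrative, not from the paper) of the concentration
# of Z_n = (I - a*X_n)(I - a*X_{n-1})...(I - a*X_1) for i.i.d. bounded PSD X_k.
import numpy as np

rng = np.random.default_rng(0)
d, n, alpha, trials = 5, 200, 0.01, 2000   # illustrative choices

def sample_X():
    # Bounded PSD sample: (V V^T)/d with V entries uniform on [0, 1],
    # so ||X|| <= d; elementary moments give E[X] = 0.25*ones + (1/12)*I.
    V = rng.uniform(0.0, 1.0, size=(d, d))
    return (V @ V.T) / d

def sample_Zn():
    Z = np.eye(d)
    for _ in range(n):
        Z = (np.eye(d) - alpha * sample_X()) @ Z
    return Z

# Independence gives E[Z_n] = (I - alpha*Sigma)^n exactly.
Sigma = 0.25 * np.ones((d, d)) + (1.0 / 12.0) * np.eye(d)
EZ = np.linalg.matrix_power(np.eye(d) - alpha * Sigma, n)

devs = np.array([np.linalg.norm(sample_Zn() - EZ, 2) for _ in range(trials)])
print("median / max spectral-norm deviation:", np.median(devs), devs.max())
```

Comparing the empirical tail of these deviations against $2d^2\exp(-t^2/(\alpha\sigma^2))$ for a sweep of $t$ gives a rough numerical handle on the variance parameter $\sigma^2$.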
Related papers
- Efficient Continual Finite-Sum Minimization [52.5238287567572]
We propose a key twist on finite-sum minimization, dubbed continual finite-sum minimization.
Our approach significantly improves upon the $\mathcal{O}(n/\epsilon)$ first-order oracle calls (FOs) that $\mathrm{StochasticGradientDescent}$ requires.
We also prove that there is no natural first-order method with $\mathcal{O}\left(n/\epsilon^{\alpha}\right)$ gradient complexity for $\alpha < 1/4$, establishing that the first-order complexity of our method is nearly tight.
arXiv Detail & Related papers (2024-06-07T08:26:31Z)
- Exact objectives of random linear programs and mean widths of random polyhedrons [0.0]
We consider random linear programs (rlps) as a subclass of random optimization problems (rops).
Our particular focus is on appropriate linear objectives which connect the rlps to the mean widths of random polyhedrons/polytopes.
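A hedged sketch of that connection: when a polytope is given as a convex hull, the support function $h_K(u) = \max_i \langle u, x_i \rangle$ is a linear objective maximized over the polytope, and averaging $h_K(u) + h_K(-u)$ over random unit directions estimates the mean width. The Gaussian vertex model and all sizes below are assumptions for illustration, not the paper's setup.

```python
# Monte Carlo estimate (illustrative) of the mean width of a random polytope
# K = conv(x_1, ..., x_m); the support function is a linear objective over K.
import numpy as np

rng = np.random.default_rng(0)
d, m, n_dirs = 10, 50, 5000
P = rng.normal(size=(m, d))                      # random polytope vertices

U = rng.normal(size=(n_dirs, d))
U /= np.linalg.norm(U, axis=1, keepdims=True)    # uniform directions on sphere
# Width along u is h_K(u) + h_K(-u), with h_K(u) = max_i <u, x_i>.
widths = (P @ U.T).max(axis=0) + (P @ (-U.T)).max(axis=0)
print("estimated mean width:", widths.mean())
```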
arXiv Detail & Related papers (2024-03-06T11:51:52Z)
- Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture, setting it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that, in the worst case, exponential dependence on the number of heads $m$ is unavoidable.
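For reference, a minimal numpy forward pass of a textbook multi-head attention layer is sketched below; the exact parameterization studied in the paper may differ, and all shapes here are illustrative assumptions.

```python
# Textbook multi-head attention forward pass in numpy (illustrative sketch;
# not necessarily the exact parameterization analyzed in the paper).
import numpy as np

def softmax(A, axis=-1):
    A = A - A.max(axis=axis, keepdims=True)      # numerical stability
    E = np.exp(A)
    return E / E.sum(axis=axis, keepdims=True)

def multi_head_attention(X, WQ, WK, WV, WO):
    # X: (seq, d_model); WQ/WK/WV: m per-head projections (d_model, d_head);
    # WO: (m * d_head, d_model) output projection.
    heads = []
    for Wq, Wk, Wv in zip(WQ, WK, WV):
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        heads.append(softmax(Q @ K.T / np.sqrt(K.shape[1])) @ V)
    return np.concatenate(heads, axis=-1) @ WO

rng = np.random.default_rng(0)
seq, d_model, m, d_head = 8, 16, 4, 4
WQ, WK, WV = ([rng.normal(size=(d_model, d_head)) for _ in range(m)]
              for _ in range(3))
WO = rng.normal(size=(m * d_head, d_model))
print(multi_head_attention(rng.normal(size=(seq, d_model)), WQ, WK, WV, WO).shape)
```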
arXiv Detail & Related papers (2024-02-06T15:39:09Z)
- Convergence Rates of Stochastic Zeroth-order Gradient Descent for Łojasiewicz Functions [6.137707924685666]
We prove convergence rates of stochastic zeroth-order gradient descent (SZGD) algorithms for Łojasiewicz functions.
Our results show that $\left\{ f(\mathbf{x}_t) - f(\mathbf{x}_\infty) \right\}_{t \in \mathbb{N}}$ can converge faster than $\left\{ \left\| \mathbf{x}_t - \mathbf{x}_\infty \right\| \right\}_{t \in \mathbb{N}}$.
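The core primitive is easy to sketch: replace the gradient with a two-point finite-difference estimate along a random direction. The step size, smoothing radius, and test function below are illustrative assumptions, not the schedules analyzed in the paper.

```python
# Stochastic zeroth-order gradient descent with a two-point gradient
# estimate (illustrative sketch; not the paper's exact step-size schedule).
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.sum(x ** 2)        # simple test function (minimum value 0)

def szgd(x, steps=500, eta=0.05, mu=1e-4):
    for _ in range(steps):
        u = rng.normal(size=x.shape)                        # random direction
        g = (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u  # ~ (grad f . u) u
        x = x - eta * g
    return x

x_final = szgd(rng.normal(size=5))
print(f(x_final))                # should be close to the minimum value 0
```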
arXiv Detail & Related papers (2022-10-31T00:53:17Z)
- Matrix concentration inequalities and efficiency of random universal sets of quantum gates [0.0]
For a random set $\mathcal{S} \subset U(d)$ of quantum gates we provide bounds on the probability that $\mathcal{S}$ forms a $\delta$-approximate $t$-design.
We show that for $\mathcal{S}$ drawn from an exact $t$-design, the probability that it forms a $\delta$-approximate $t$-design satisfies the inequality $\mathbb{P}\left(\delta \geq x\right) \leq 2D_t \dots$
arXiv Detail & Related papers (2022-02-10T23:44:09Z)
- On the Self-Penalization Phenomenon in Feature Selection [69.16452769334367]
We describe an implicit sparsity-inducing mechanism based on minimization over a family of kernels.
As an application, we use this sparsity-inducing mechanism to build algorithms that are consistent for feature selection.
arXiv Detail & Related papers (2021-10-12T09:36:41Z)
- Spectral properties of sample covariance matrices arising from random matrices with independent, non-identically distributed columns [50.053491972003656]
It was previously shown that the functionals $\operatorname{tr}(AR(z))$, for $R(z) = \left(\frac{1}{n}XX^T - zI_p\right)^{-1}$ and $A \in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\left\| A \right\|_* / \sqrt{n})$.
Here, we show that $\left\| \mathbb{E}[R(z)] - \tilde{R}(z) \right\|_F \dots$
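The $O(\left\| A \right\|_* / \sqrt{n})$ scaling can be eyeballed numerically. The sketch below uses $A = I_p$, $z = -1$, and i.i.d. Gaussian columns, a special case chosen only for illustration (the paper's setting allows non-identically distributed columns).

```python
# Monte Carlo check (illustrative special case) that the standard deviation
# of tr(A R(z)), with R(z) = (X X^T / n - z I_p)^{-1}, shrinks like 1/sqrt(n).
import numpy as np

rng = np.random.default_rng(0)
p, z, trials = 30, -1.0, 400

def tr_resolvent(n):
    X = rng.normal(size=(p, n))                       # i.i.d. Gaussian columns
    R = np.linalg.inv(X @ X.T / n - z * np.eye(p))    # resolvent at z = -1
    return np.trace(R)                                # tr(A R(z)) with A = I_p

for n in (200, 800):
    vals = [tr_resolvent(n) for _ in range(trials)]
    print(n, np.std(vals))   # quadrupling n should roughly halve the std
```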
arXiv Detail & Related papers (2021-09-06T14:21:43Z)
- Optimal Mean Estimation without a Variance [103.26777953032537]
We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist.
We design an estimator which attains the smallest possible confidence interval as a function of $n$, $d$, and $\delta$.
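For context, a standard baseline in this regime is median-of-means, which already tolerates infinite variance; the paper's estimator is a different, optimal construction. The heavy-tailed test distribution and block count below are illustrative assumptions.

```python
# Median-of-means baseline for heavy-tailed mean estimation (illustrative;
# this is NOT the optimal estimator constructed in the paper).
import numpy as np

rng = np.random.default_rng(0)

def median_of_means(x, k):
    # Split the sample into k blocks, average each, take the median of means.
    blocks = np.array_split(rng.permutation(x), k)
    return np.median([b.mean() for b in blocks])

# Pareto sample with tail index 1.5: finite mean (= 3), infinite variance.
x = rng.pareto(1.5, size=100_000) + 1.0
print(median_of_means(x, k=30), x.mean())
```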
arXiv Detail & Related papers (2020-11-24T22:39:21Z)
- On the Regularization Effect of Stochastic Gradient Descent applied to Least Squares [0.0]
We study the behavior of stochastic gradient descent applied to $\left\| Ax - b \right\|_2^2 \rightarrow \min$ for invertible $A \in \mathbb{R}^{n \times n}$.
We show that there is an explicit constant $c_A$ depending (mildly) on $A$ such that $\mathbb{E}\left\| Ax_{k+1} - b \right\|_2^2 \leq \dots$
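A minimal sketch of the flavor of method studied here is randomized Kaczmarz, i.e. SGD on $\left\| Ax - b \right\|_2^2$ that samples one row per step; the matrix model, step rule, and iteration count are illustrative assumptions.

```python
# Randomized-Kaczmarz-style SGD for ||Ax - b||_2^2 -> min (illustrative
# sketch; the paper's exact sampling scheme and constants may differ).
import numpy as np

rng = np.random.default_rng(0)
n = 50
G = rng.normal(size=(n, n)) / np.sqrt(n)
A = G @ G.T + np.eye(n)                          # symmetric PD, so invertible
x_true = rng.normal(size=n)
b = A @ x_true

x = np.zeros(n)
for _ in range(20_000):
    i = rng.integers(n)                          # sample a random row
    r = A[i] @ x - b[i]                          # residual on that row
    x -= (r / (A[i] @ A[i])) * A[i]              # project onto its hyperplane
print(np.linalg.norm(A @ x - b))                 # residual should be tiny
```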
arXiv Detail & Related papers (2020-07-27T03:01:09Z)
- Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and ReLUs under Gaussian Marginals [49.60752558064027]
We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals.
Our lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.
arXiv Detail & Related papers (2020-06-29T17:10:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.