On the robustness of the minimum $\ell_2$ interpolator
- URL: http://arxiv.org/abs/2003.05838v2
- Date: Tue, 5 Jan 2021 13:48:18 GMT
- Title: On the robustness of the minimum $\ell_2$ interpolator
- Authors: Geoffrey Chinot, Matthieu Lerasle
- Abstract summary: We analyse the interpolator with minimal $ell$-norm $hatbeta$ in a general high dimensional linear regression framework.
We prove that, with high probability, the prediction loss of this estimator is bounded from above by $(|beta*|2r_cn(Sigma)vee |xi|2)/n$, where $r_k(Sigma)sum_igeq klambda_i(Sigma)$ are the rests of the
- Score: 2.918940961856197
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We analyse the interpolator with minimal $\ell_2$-norm $\hat{\beta}$ in a
general high dimensional linear regression framework where $\mathbb Y=\mathbb
X\beta^*+\xi$ where $\mathbb X$ is a random $n\times p$ matrix with independent
$\mathcal N(0,\Sigma)$ rows and without assumption on the noise vector $\xi\in
\mathbb R^n$. We prove that, with high probability, the prediction loss of this
estimator is bounded from above by $(\|\beta^*\|^2_2r_{cn}(\Sigma)\vee
\|\xi\|^2)/n$, where $r_{k}(\Sigma)=\sum_{i\geq k}\lambda_i(\Sigma)$ are the
rests of the sum of eigenvalues of $\Sigma$. These bounds show a transition in
the rates. For high signal to noise ratios, the rates
$\|\beta^*\|^2_2r_{cn}(\Sigma)/n$ broadly improve the existing ones. For low
signal to noise ratio, we also provide lower bound holding with large
probability. Under assumptions on the sprectrum of $\Sigma$, this lower bound
is of order $\| \xi\|_2^2/n$, matching the upper bound. Consequently, in the
large noise regime, we are able to precisely track the prediction error with
large probability. This results give new insight when the interpolation can be
harmless in high dimensions.
Related papers
- Provably learning a multi-head attention layer [55.2904547651831]
Multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $mathbfx*$ rather than for all $mathbfx*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Statistically Optimal Robust Mean and Covariance Estimation for
Anisotropic Gaussians [3.5788754401889014]
In the strong $varepsilon$-contamination model we assume that the adversary replaced an $varepsilon$ fraction of vectors in the original Gaussian sample by any other vectors.
We construct an estimator $widehat Sigma of the cofrac matrix $Sigma that satisfies, with probability at least $1 - delta.
arXiv Detail & Related papers (2023-01-21T23:28:55Z) - Unique Games hardness of Quantum Max-Cut, and a conjectured
vector-valued Borell's inequality [6.621324975749854]
We show that the noise stability of a function $f:mathbbRn to -1, 1$ is the expected value of $f(boldsymbolx) cdot f(boldsymboly)$.
We conjecture that the expected value of $langle f(boldsymbolx), f(boldsymboly)rangle$ is minimized by the function $f(x) = x_leq k / Vert x_leq k /
arXiv Detail & Related papers (2021-11-01T20:45:42Z) - Nonasymptotic one-and two-sample tests in high dimension with unknown
covariance structure [0.0]
We consider the problem of testing if $mu is $eta-close to zero, i.e. $|mu| leq eta against $|mu| geq (eta + delta)$.
The aim of this paper is to obtain nonasymptotic upper and lower bounds on the minimal separation distancedelta$ such that we can control both the Type I and Type II errors at a given level.
arXiv Detail & Related papers (2021-09-01T06:22:53Z) - Sparse sketches with small inversion bias [79.77110958547695]
Inversion bias arises when averaging estimates of quantities that depend on the inverse covariance.
We develop a framework for analyzing inversion bias, based on our proposed concept of an $(epsilon,delta)$-unbiased estimator for random matrices.
We show that when the sketching matrix $S$ is dense and has i.i.d. sub-gaussian entries, the estimator $(epsilon,delta)$-unbiased for $(Atop A)-1$ with a sketch of size $m=O(d+sqrt d/
arXiv Detail & Related papers (2020-11-21T01:33:15Z) - Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and
ReLUs under Gaussian Marginals [49.60752558064027]
We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals.
Our lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.
arXiv Detail & Related papers (2020-06-29T17:10:10Z) - Efficient Statistics for Sparse Graphical Models from Truncated Samples [19.205541380535397]
We focus on two fundamental and classical problems: (i) inference of sparse Gaussian graphical models and (ii) support recovery of sparse linear models.
For sparse linear regression, suppose samples $(bf x,y)$ are generated where $y = bf xtopOmega* + mathcalN(0,1)$ and $(bf x, y)$ is seen only if $y$ belongs to a truncation set $S subseteq mathbbRd$.
arXiv Detail & Related papers (2020-06-17T09:21:00Z) - On the Optimal Weighted $\ell_2$ Regularization in Overparameterized
Linear Regression [23.467801864841526]
We consider the linear model $mathbfy = mathbfX mathbfbeta_star + mathbfepsilon$ with $mathbfXin mathbbRntimes p$ in the overparameterized regime $p>n$.
We provide an exact characterization of the prediction risk $mathbbE(y-mathbfxThatmathbfbeta_lambda)2$ in proportional limit $p/n
arXiv Detail & Related papers (2020-06-10T12:38:43Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(mathsfOPT1/2)+epsilon$.
For the ReLU activation, our population risk guarantee is $O(mathsfOPT1/2)+epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z) - The Average-Case Time Complexity of Certifying the Restricted Isometry
Property [66.65353643599899]
In compressed sensing, the restricted isometry property (RIP) on $M times N$ sensing matrices guarantees efficient reconstruction of sparse vectors.
We investigate the exact average-case time complexity of certifying the RIP property for $Mtimes N$ matrices with i.i.d. $mathcalN(0,1/M)$ entries.
arXiv Detail & Related papers (2020-05-22T16:55:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.