A spectral least-squares-type method for heavy-tailed corrupted regression with unknown covariance & heterogeneous noise
- URL: http://arxiv.org/abs/2209.02856v1
- Date: Tue, 6 Sep 2022 23:37:31 GMT
- Title: A spectral least-squares-type method for heavy-tailed corrupted regression with unknown covariance & heterogeneous noise
- Authors: Roberto I. Oliveira and Zoraida F. Rico and Philip Thompson
- Abstract summary: We revisit heavy-tailed corrupted least-squares linear regression, assuming a corrupted $n$-sized label-feature sample with at most $\epsilon n$ arbitrary outliers.
We propose a near-optimal computationally tractable estimator, based on the power method, assuming no knowledge of $(\Sigma,\Xi)$ or of the operator norm of $\Xi$.
- Score: 2.019622939313173
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We revisit heavy-tailed corrupted least-squares linear regression,
assuming we observe an $n$-sized label-feature sample in which at most
$\epsilon n$ points are arbitrary outliers. We wish to estimate a
$p$-dimensional parameter $b^*$ given such a sample of label-feature pairs
$(y,x)$ satisfying $y=\langle x,b^*\rangle+\xi$ with heavy-tailed $(x,\xi)$. We
only assume $x$ is $L^4$-$L^2$
hypercontractive with constant $L>0$ and has covariance matrix $\Sigma$ with
minimum eigenvalue $1/\mu^2>0$ and bounded condition number $\kappa>0$. The
noise $\xi$ can be arbitrarily dependent on $x$ and nonsymmetric as long as
$\xi x$ has finite covariance matrix $\Xi$. We propose a near-optimal
computationally tractable estimator, based on the power method, assuming no
knowledge of $(\Sigma,\Xi)$ or of the operator norm of $\Xi$. With probability at
least $1-\delta$, our proposed estimator attains the statistical rate
$\mu^2\Vert\Xi\Vert^{1/2}(\frac{p}{n}+\frac{\log(1/\delta)}{n}+\epsilon)^{1/2}$
and breakdown-point $\epsilon\lesssim\frac{1}{L^4\kappa^2}$, both optimal in
the $\ell_2$-norm, assuming the near-optimal minimum sample size
$L^4\kappa^2(p\log p + \log(1/\delta))\lesssim n$, up to a log factor. To the
best of our knowledge, this is the first computationally tractable algorithm
satisfying simultaneously all the mentioned properties. Our estimator is based
on a two-stage Multiplicative Weight Update algorithm. The first stage
estimates a descent direction $\hat v$ with respect to the (unknown)
pre-conditioned inner product $\langle\Sigma(\cdot),\cdot\rangle$. The second
stage estimates the descent direction $\Sigma\hat v$ with respect to the (known)
inner product $\langle\cdot,\cdot\rangle$, without knowing or estimating
$\Sigma$.
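
The abstract describes the estimator only at a high level. As a loose illustration of the sample-reweighting idea behind such robust estimators, here is a minimal sketch that multiplicatively downweights suspected outliers between weighted least-squares fits; it glosses over the paper's two-stage descent-direction estimation and is not the authors' algorithm. The step size `lr`, the round count, and the residual-based update rule are all illustrative choices.

```python
import numpy as np

def robust_ls_mwu(X, y, n_rounds=50, lr=0.5):
    """Sketch: multiplicative-weights reweighted least squares.

    Keeps one weight per sample; each round fits weighted least squares,
    then multiplicatively shrinks the weights of samples with large
    residuals, so an eps-fraction of arbitrary outliers is gradually
    suppressed without any knowledge of Sigma or Xi.
    """
    n, p = X.shape
    w = np.ones(n) / n
    b = np.zeros(p)
    for _ in range(n_rounds):
        Xw = X * w[:, None]
        # Weighted normal equations: (X^T W X) b = X^T W y.
        b = np.linalg.solve(Xw.T @ X + 1e-10 * np.eye(p), Xw.T @ y)
        r2 = (y - X @ b) ** 2
        # MWU step: exponential downweighting of large squared residuals.
        w *= np.exp(-lr * r2 / (r2.max() + 1e-12))
        w /= w.sum()
    return b

# Toy run: heavy-tailed design/noise plus 5% arbitrary label corruptions.
rng = np.random.default_rng(0)
n, p = 2000, 10
X = rng.standard_t(df=5, size=(n, p))
b_star = rng.standard_normal(p)
y = X @ b_star + rng.standard_t(df=5, size=n)
y[rng.choice(n, n // 20, replace=False)] += 50.0
print(np.linalg.norm(robust_ls_mwu(X, y) - b_star))
```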
Related papers
- Iterative thresholding for non-linear learning in the strong $\varepsilon$-contamination model [3.309767076331365]
We derive approximation bounds for learning single neuron models using thresholded gradient descent.
We also study the linear regression problem, where $\sigma(\mathbf{x}) = \mathbf{x}$.
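
The entry does not spell out the update rule; as a rough sketch of the iterative-thresholding idea in the linear case $\sigma(\mathbf{x}) = \mathbf{x}$ under strong $\varepsilon$-contamination (my illustration, not the paper's exact procedure), one can run gradient descent while trimming the $\varepsilon$-fraction of points with the largest residuals at each step:

```python
import numpy as np

def trimmed_gd(X, y, eps, steps=200, lr=0.1):
    """Gradient descent that drops the eps-fraction of largest residuals
    each step -- a sketch of iterative thresholding for the linear case."""
    n, d = X.shape
    b = np.zeros(d)
    keep = n - int(eps * n)
    for _ in range(steps):
        r = X @ b - y
        good = np.argsort(np.abs(r))[:keep]   # threshold out suspect points
        b -= lr * (X[good].T @ r[good]) / keep
    return b

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 5))
b_star = rng.standard_normal(5)
y = X @ b_star + 0.1 * rng.standard_normal(1000)
y[:50] = 100.0                                # 5% adversarial labels
print(np.linalg.norm(trimmed_gd(X, y, eps=0.05) - b_star))
```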
arXiv Detail & Related papers (2024-09-05T16:59:56Z)
- Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z)
- Distribution-Independent Regression for Generalized Linear Models with Oblivious Corruptions [49.69852011882769]
We show the first algorithms for the problem of regression for generalized linear models (GLMs) in the presence of additive oblivious noise.
We present an algorithm that tackles this problem in its most general distribution-independent setting.
This is the first algorithmic result for GLM regression with oblivious noise which can handle more than half the samples being arbitrarily corrupted.
arXiv Detail & Related papers (2023-09-20T21:41:59Z)
- Statistically Optimal Robust Mean and Covariance Estimation for Anisotropic Gaussians [3.5788754401889014]
In the strong $\varepsilon$-contamination model we assume that the adversary replaced an $\varepsilon$ fraction of vectors in the original Gaussian sample by any other vectors.
We construct an estimator $\widehat{\Sigma}$ of the covariance matrix $\Sigma$ that satisfies, with probability at least $1 - \delta$, a statistically optimal operator-norm error bound.
arXiv Detail & Related papers (2023-01-21T23:28:55Z)
- Active Sampling for Linear Regression Beyond the $\ell_2$ Norm [70.49273459706546]
We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector.
We show that this dependence on $d$ is optimal, up to logarithmic factors.
We also provide the first total sensitivity upper bound $O(d^{\max\{1,p/2\}} \log^2 n)$ for loss functions with at most degree $p$ growth.
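
For background on what "active sampling" buys here, the sketch below shows the standard $\ell_2$ leverage-score scheme that such results generalize: it queries only $k$ entries of the label vector. This is an illustration of mine, not the paper's algorithm for general $\ell_p$ losses.

```python
import numpy as np

def active_l2_regression(X, y_query, k, rng):
    """Sample k rows by leverage scores and query only those labels."""
    n, d = X.shape
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    lev = (U ** 2).sum(axis=1)            # leverage scores, summing to d
    prob = lev / lev.sum()
    idx = rng.choice(n, size=k, replace=True, p=prob)
    w = 1.0 / np.sqrt(k * prob[idx])      # importance-sampling rescaling
    Xs = X[idx] * w[:, None]
    ys = y_query(idx) * w                 # labels revealed only at idx
    return np.linalg.lstsq(Xs, ys, rcond=None)[0]

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 8))
b_star = rng.standard_normal(8)
y = X @ b_star + 0.1 * rng.standard_normal(5000)
b_hat = active_l2_regression(X, lambda i: y[i], k=200, rng=rng)
print(np.linalg.norm(b_hat - b_star))
```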
arXiv Detail & Related papers (2021-11-09T00:20:01Z)
- Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$ Geometry [69.24618367447101]
Up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private optimizer is $\sqrt{\log(d)/n} + \sqrt{d}/(\varepsilon n)$.
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + (\log(d)/(\varepsilon n))^{2/3}$.
arXiv Detail & Related papers (2021-03-02T06:53:44Z)
- Sparse sketches with small inversion bias [79.77110958547695]
Inversion bias arises when averaging estimates of quantities that depend on the inverse covariance.
We develop a framework for analyzing inversion bias, based on our proposed concept of an $(\epsilon,\delta)$-unbiased estimator for random matrices.
We show that when the sketching matrix $S$ is dense and has i.i.d. sub-gaussian entries, the estimator is $(\epsilon,\delta)$-unbiased for $(A^\top A)^{-1}$ with a sketch of size $m = O(d + \sqrt{d}/\epsilon)$.
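
A quick numerical illustration of inversion bias (a toy example of mine, not from the paper): averaging i.i.d. sketched estimates of $(A^\top A)^{-1}$ systematically overshoots the target, because matrix inversion does not commute with expectation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, m, trials = 200, 5, 40, 2000
A = rng.standard_normal((n, d))
target = np.linalg.inv(A.T @ A)

# Average inverse-covariance estimates over many i.i.d. Gaussian sketches S.
acc = np.zeros((d, d))
for _ in range(trials):
    S = rng.standard_normal((m, n)) / np.sqrt(m)   # sub-gaussian sketch
    SA = S @ A
    acc += np.linalg.inv(SA.T @ SA)
acc /= trials

# The averaged estimate overshoots (A^T A)^{-1}; that gap is inversion bias.
print(np.linalg.norm(acc - target) / np.linalg.norm(target))
```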
arXiv Detail & Related papers (2020-11-21T01:33:15Z)
- Robust Gaussian Covariance Estimation in Nearly-Matrix Multiplication Time [14.990725929840892]
We show an algorithm that runs in time $\widetilde{O}(T(N, d) \log \kappa / \mathrm{poly}(\varepsilon))$, where $T(N, d)$ is the time it takes to multiply a $d \times N$ matrix by its transpose.
Our runtime matches that of the fastest algorithm for covariance estimation without outliers, up to poly-logarithmic factors.
arXiv Detail & Related papers (2020-06-23T20:21:27Z)
- $\lambda$-Regularized A-Optimal Design and its Approximation by $\lambda$-Regularized Proportional Volume Sampling [1.256413718364189]
We study the $\lambda$-regularized $A$-optimal design problem and introduce the $\lambda$-regularized proportional volume sampling algorithm.
The problem is motivated from optimal design in ridge regression, where one tries to minimize the expected squared error of the ridge regression predictor from the true coefficient in the underlying linear model.
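
Concretely, one common form of the $\lambda$-regularized $A$-optimal criterion selects a size-$k$ subset $S$ of design rows minimizing $\mathrm{tr}\big((V_S^\top V_S + \lambda I)^{-1}\big)$. A brute-force version of that objective (illustrative only, feasible for tiny instances; the paper's contribution is a sampling algorithm that avoids this enumeration) looks like:

```python
import numpy as np
from itertools import combinations

def a_opt_objective(V, S, lam):
    """lambda-regularized A-optimal value: tr((V_S^T V_S + lam*I)^{-1})."""
    Vs = V[list(S)]
    d = V.shape[1]
    return np.trace(np.linalg.inv(Vs.T @ Vs + lam * np.eye(d)))

rng = np.random.default_rng(2)
V = rng.standard_normal((12, 3))
k, lam = 5, 0.1
# Exhaustive search over all C(12,5) subsets -- tractable only at toy scale.
best = min(combinations(range(12), k), key=lambda S: a_opt_objective(V, S, lam))
print(best, a_opt_objective(V, best, lam))
```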
arXiv Detail & Related papers (2020-06-19T15:17:57Z)
- Efficient Statistics for Sparse Graphical Models from Truncated Samples [19.205541380535397]
We focus on two fundamental and classical problems: (i) inference of sparse Gaussian graphical models and (ii) support recovery of sparse linear models.
For sparse linear regression, suppose samples $(\mathbf{x}, y)$ are generated where $y = \mathbf{x}^\top \Omega^* + \mathcal{N}(0,1)$ and $(\mathbf{x}, y)$ is seen only if $y$ belongs to a truncation set $S \subseteq \mathbb{R}^d$.
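
The truncation mechanism in this setting is easy to simulate. A minimal generator (my illustration, with an interval truncation set $S$ chosen for concreteness):

```python
import numpy as np

def truncated_samples(omega_star, n, S=(0.5, np.inf), rng=None):
    """Draw (x, y) with y = x^T omega* + N(0,1), keeping a pair only
    when y lands in the truncation set S (here an interval)."""
    rng = rng or np.random.default_rng()
    d = omega_star.shape[0]
    xs, ys = [], []
    while len(ys) < n:
        x = rng.standard_normal(d)
        y = x @ omega_star + rng.standard_normal()
        if S[0] <= y <= S[1]:                 # only truncated region observed
            xs.append(x)
            ys.append(y)
    return np.array(xs), np.array(ys)

omega = np.array([1.0, -0.5, 0.0, 0.0])       # sparse true coefficients
X, y = truncated_samples(omega, n=500)
# Naive least squares on truncated data is biased by the selection on y:
print(np.linalg.lstsq(X, y, rcond=None)[0])
```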
arXiv Detail & Related papers (2020-06-17T09:21:00Z)
- Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we show an algorithm with improved sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.