On the Power of Preconditioning in Sparse Linear Regression
- URL: http://arxiv.org/abs/2106.09207v1
- Date: Thu, 17 Jun 2021 02:12:01 GMT
- Title: On the Power of Preconditioning in Sparse Linear Regression
- Authors: Jonathan Kelner, Frederic Koehler, Raghu Meka, Dhruv Rohatgi
- Abstract summary: We show that a preconditioned Lasso can solve a large class of sparse linear regression problems nearly optimally.
For the first time, we construct random-design instances which are provably hard for an optimally preconditioned Lasso.
- Score: 24.140675945592704
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sparse linear regression is a fundamental problem in high-dimensional
statistics, but strikingly little is known about how to efficiently solve it
without restrictive conditions on the design matrix. We consider the
(correlated) random design setting, where the covariates are independently
drawn from a multivariate Gaussian $N(0,\Sigma)$ with $\Sigma : n \times n$,
and seek estimators $\hat{w}$ minimizing $(\hat{w}-w^*)^T\Sigma(\hat{w}-w^*)$,
where $w^*$ is the $k$-sparse ground truth. Information theoretically, one can
achieve strong error bounds with $O(k \log n)$ samples for arbitrary $\Sigma$
and $w^*$; however, no efficient algorithms are known to match these guarantees
even with $o(n)$ samples, without further assumptions on $\Sigma$ or $w^*$. As
far as hardness, computational lower bounds are only known with worst-case
design matrices. Random-design instances are known which are hard for the
Lasso, but these instances can generally be solved by Lasso after a simple
change-of-basis (i.e. preconditioning).
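As a concrete illustration of this setting, the following minimal sketch (editorial, not from the paper; the AR(1)-style covariance, sparsity level, sample size, noise level, and regularization strength are all arbitrary assumptions) draws covariates from a correlated Gaussian design, fits the plain Lasso, and measures the prediction error $(\hat{w}-w^*)^T\Sigma(\hat{w}-w^*)$.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Minimal sketch of the correlated random-design setting; every parameter
# choice below is an illustrative assumption, not a quantity from the paper.
rng = np.random.default_rng(0)
d, n, k = 200, 100, 5                      # ambient dimension, samples, sparsity

# An example correlated covariance Sigma (AR(1)-style).
idx = np.arange(d)
Sigma = 0.9 ** np.abs(idx[:, None] - idx[None, :])

# k-sparse ground truth w*.
w_star = np.zeros(d)
w_star[rng.choice(d, size=k, replace=False)] = rng.normal(size=k)

# Covariates drawn i.i.d. from N(0, Sigma); responses y = X w* + noise.
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
y = X @ w_star + 0.1 * rng.normal(size=n)

# Plain (unpreconditioned) Lasso estimate and its prediction error
# (w_hat - w*)^T Sigma (w_hat - w*).
w_hat = Lasso(alpha=0.05).fit(X, y).coef_
print("Lasso prediction error:", (w_hat - w_star) @ Sigma @ (w_hat - w_star))
```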
In this work, we give upper and lower bounds clarifying the power of
preconditioning in sparse linear regression. First, we show that the
preconditioned Lasso can solve a large class of sparse linear regression
problems nearly optimally: it succeeds whenever the dependency structure of the
covariates, in the sense of the Markov property, has low treewidth -- even if
$\Sigma$ is highly ill-conditioned. Second, we construct (for the first time)
random-design instances which are provably hard for an optimally preconditioned
Lasso. In fact, we complete our treewidth classification by proving that for
any treewidth-$t$ graph, there exists a Gaussian Markov Random Field on this
graph such that the preconditioned Lasso, with any choice of preconditioner,
requires $\Omega(t^{1/20})$ samples to recover $O(\log n)$-sparse signals when
covariates are drawn from this model.
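To make the notion of a preconditioned Lasso concrete, the sketch below (continuing the variables X, y, Sigma, and w_star from the example above) applies a change of basis S to the covariates, runs the Lasso in the new basis, and maps the solution back. The whitening choice $S = \Sigma^{-1/2}$ is only an illustrative assumption; the paper's positive result instead builds a preconditioner from the low-treewidth Markov structure of the covariates.

```python
import numpy as np
from sklearn.linear_model import Lasso

def preconditioned_lasso(X, y, S, alpha=0.05):
    """Generic preconditioned-Lasso template: run the Lasso on the
    re-based design X @ S and map the solution back via S. The choice
    of preconditioner S is left to the caller."""
    v_hat = Lasso(alpha=alpha).fit(X @ S, y).coef_
    return S @ v_hat

# One illustrative (assumed) preconditioner: whitening with S = Sigma^{-1/2},
# computed from the eigendecomposition of Sigma. Reuses X, y, Sigma, w_star
# from the previous sketch.
evals, evecs = np.linalg.eigh(Sigma)
S = evecs @ np.diag(evals ** -0.5) @ evecs.T

w_pre = preconditioned_lasso(X, y, S)
print("preconditioned Lasso prediction error:",
      (w_pre - w_star) @ Sigma @ (w_pre - w_star))
```

Note that in this template the Lasso is effectively asked to recover $S^{-1}w^*$ rather than $w^*$ itself, so the quality of the preconditioner matters; which covariance matrices admit a good choice of $S$ is exactly the question the treewidth classification above addresses.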
Related papers
- Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z) - Sparse Linear Regression and Lattice Problems [9.50127504736299]
We give evidence of average-case hardness of sparse linear regression (SLR) with respect to all efficient algorithms, assuming the worst-case hardness of lattice problems.
Specifically, we give an instance-by-instance reduction from a variant of the bounded distance decoding (BDD) problem on lattices to SLR.
arXiv Detail & Related papers (2024-02-22T15:45:27Z) - Computational-Statistical Gaps for Improper Learning in Sparse Linear Regression [4.396860522241307]
We show that an efficient learning algorithm for sparse linear regression can be used to solve sparse PCA problems with a negative spike.
We complement our reduction with low-degree and statistical query lower bounds for the sparse problems from which we reduce.
arXiv Detail & Related papers (2024-02-21T19:55:01Z) - Feature Adaptation for Sparse Linear Regression [20.923321050404827]
Sparse linear regression is a central problem in high-dimensional statistics.
We provide an algorithm that adapts to tolerate a small number of approximate dependencies among the covariates.
Our algorithm fits into a broader framework of feature adaptation for sparse linear regression.
arXiv Detail & Related papers (2023-05-26T12:53:13Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes in the fixed confidence setting under a generative model.
The lower bound, expressed as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Robust Testing in High-Dimensional Sparse Models [0.0]
We consider the problem of robustly testing the norm of a high-dimensional sparse signal vector under two different observation models.
We show that any algorithm that reliably tests the norm of the regression coefficient requires at least $n = \Omega\left(\min(s\log d, 1/\gamma^4)\right)$ samples.
arXiv Detail & Related papers (2022-05-16T07:47:22Z) - Sketching Algorithms and Lower Bounds for Ridge Regression [65.0720777731368]
We give a sketching-based iterative algorithm that computes $1+\varepsilon$ approximate solutions for the ridge regression problem.
We also show that this algorithm can be used to give faster algorithms for kernel ridge regression.
arXiv Detail & Related papers (2022-04-13T22:18:47Z) - Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z) - Statistical Query Lower Bounds for List-Decodable Linear Regression [55.06171096484622]
We study the problem of list-decodable linear regression, where an adversary can corrupt a majority of the examples.
Our main result is a Statistical Query (SQ) lower bound of $d^{\mathrm{poly}(1/\alpha)}$ for this problem.
arXiv Detail & Related papers (2021-06-17T17:45:21Z) - Estimating Stochastic Linear Combination of Non-linear Regressions Efficiently and Scalably [23.372021234032363]
We show that when the sub-sample sizes are large enough, the estimation errors are not sacrificed by too much.
To the best of our knowledge, this is the first work that provides theoretical guarantees for the stochastic linear combination of non-linear regressions model.
arXiv Detail & Related papers (2020-10-19T07:15:38Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number, and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is sub-Gaussian.
arXiv Detail & Related papers (2020-07-16T06:44:44Z)