Related papers: Lasso and Partially-Rotated Designs

URL: http://arxiv.org/abs/2505.11093v1
Date: Fri, 16 May 2025 10:25:08 GMT
Title: Lasso and Partially-Rotated Designs
Authors: Rares-Darius Buhai
Abstract summary: We introduce a new $\textit{semirandom}$ family of designs for which the RE constant with respect to the secret is bounded away from zero. Our results imply that Lasso achieves prediction error $O(k \log d / \lambda_{\min} n)$ with high probability.
Score: 2.28438857884398
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We consider the sparse linear regression model $\mathbf{y} = X \beta +\mathbf{w}$, where $X \in \mathbb{R}^{n \times d}$ is the design, $\beta \in \mathbb{R}^{d}$ is a $k$-sparse secret, and $\mathbf{w} \sim N(0, I_n)$ is the noise. Given input $X$ and $\mathbf{y}$, the goal is to estimate $\beta$. In this setting, the Lasso estimate achieves prediction error $O(k \log d / \gamma n)$, where $\gamma$ is the restricted eigenvalue (RE) constant of $X$ with respect to $\mathrm{support}(\beta)$. In this paper, we introduce a new $\textit{semirandom}$ family of designs -- which we call $\textit{partially-rotated}$ designs -- for which the RE constant with respect to the secret is bounded away from zero even when a subset of the design columns are arbitrarily correlated among themselves. As an example of such a design, suppose we start with some arbitrary $X$, and then apply a random rotation to the columns of $X$ indexed by $\mathrm{support}(\beta)$. Let $\lambda_{\min}$ be the smallest eigenvalue of $\frac{1}{n} X_{\mathrm{support}(\beta)}^\top X_{\mathrm{support}(\beta)}$, where $X_{\mathrm{support}(\beta)}$ is the restriction of $X$ to the columns indexed by $\mathrm{support}(\beta)$. In this setting, our results imply that Lasso achieves prediction error $O(k \log d / \lambda_{\min} n)$ with high probability. This prediction error bound is independent of the arbitrary columns of $X$ not indexed by $\mathrm{support}(\beta)$, and is as good as if all of these columns were perfectly well-conditioned. Technically, our proof reduces to showing that matrices with a certain deterministic property -- which we call $\textit{restricted normalized orthogonality}$ (RNO) -- lead to RE constants that are independent of a subset of the matrix columns. This property is similar but incomparable with the restricted orthogonality condition of [CT05].
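The example design in the abstract can be illustrated with a small NumPy sketch. It is not the paper's code: it assumes that "applying a random rotation to the columns indexed by the support" means left-multiplying those column vectors by a random $n \times n$ orthogonal matrix, and the helper names (`random_rotation`, `partially_rotate`, `lambda_min`) and the toy dimensions are hypothetical. Under that assumption the Gram matrix of the support columns, and hence $\lambda_{\min}$, is unchanged, while the remaining columns stay arbitrary.

```python
import numpy as np

def random_rotation(n, rng):
    """Sample a random n x n orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Rescale columns by the signs of diag(r) so the sample is Haar-distributed.
    return q * np.sign(np.diag(r))

def partially_rotate(X, support, rng):
    """Rotate only the columns of X indexed by `support` (assumed reading of the abstract)."""
    X_rot = X.copy()
    R = random_rotation(X.shape[0], rng)
    X_rot[:, support] = R @ X[:, support]
    return X_rot

def lambda_min(X, support):
    """Smallest eigenvalue of (1/n) X_S^T X_S for the support columns X_S."""
    X_s = X[:, support]
    return np.linalg.eigvalsh(X_s.T @ X_s / X.shape[0])[0]

rng = np.random.default_rng(0)
n, d, k = 200, 50, 5  # toy sizes, chosen only for illustration

# An "arbitrary" design: every column is a noisy copy of one shared vector,
# so all columns are strongly correlated with each other.
shared = rng.standard_normal((n, 1))
X = shared + 0.1 * rng.standard_normal((n, d))

support = np.arange(k)  # pretend the secret is supported on the first k columns
X_rot = partially_rotate(X, support, rng)

lam_before = lambda_min(X, support)
lam_after = lambda_min(X_rot, support)
print(lam_before, lam_after)
```

The two printed values should agree up to floating-point error, since an orthogonal transformation preserves inner products between the support columns.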

Related papers

Conditional regression for the Nonlinear Single-Variable Model [4.565636963872865]
We consider a model $F(X):=f(\Pi_\gamma):\mathbb{R}^d\to[0,\mathrm{len}_\gamma]$ where $\Pi_\gamma: [0,\mathrm{len}_\gamma]\to\mathbb{R}^d$ and $f:[0,\mathrm{len}_\gamma]\to\mathbb{R}^1$. We propose a nonparametric estimator, based on conditional regression, and show that it can achieve the one-dimensional optimal min-max rate.
arXiv Detail & Related papers (2024-11-14T18:53:51Z)
The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
We show that this problem has randomized communication complexity $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$. As an application, we obtain an $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$ space lower bound for any streaming algorithm with $k$ passes.
arXiv Detail & Related papers (2024-10-26T06:21:42Z)
Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch. We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work. We also show an $\Omega(k^{2/p}n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z)
Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models. In this work, we initiate the study of provably learning a multi-head attention layer from random examples. We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z)
A Nearly-Optimal Bound for Fast Regression with $\ell_\infty$ Guarantee [16.409210914237086]
Given a matrix $A\in \mathbb{R}^{n\times d}$ and a vector $b\in \mathbb{R}^n$, we consider the regression problem with $\ell_\infty$ guarantees. We show that in order to obtain such an $\ell_\infty$ guarantee for $\ell_2$ regression, one has to use sketching matrices that are dense. We also develop a novel analytical framework for $\ell_\infty$ guarantee regression that utilizes the Oblivious Coordinate-wise Embedding (OCE) property.
arXiv Detail & Related papers (2023-02-01T05:22:40Z)
Spectral properties of sample covariance matrices arising from random matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $\text{tr}(AR(z))$, for $R(z) = (\frac{1}{n}XX^T- zI_p)^{-1}$ and $A\in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\|A\|_* / \sqrt{n})$. Here, we show that $\|\mathbb{E}[R(z)] - \tilde{R}(z)\|_F
arXiv Detail & Related papers (2021-09-06T14:21:43Z)
The planted matching problem: Sharp threshold and infinite-order phase transition [25.41713098167692]
We study the problem of reconstructing a perfect matching $M^*$ hidden in a randomly weighted $n\times n$ bipartite graph. We show that if $\sqrt{d} B(\mathcal{P},\mathcal{Q}) \ge 1+\epsilon$ for an arbitrarily small constant $\epsilon>0$, the reconstruction error of any estimator is bounded away from $0$.
arXiv Detail & Related papers (2021-03-17T00:59:33Z)
Optimal Mean Estimation without a Variance [103.26777953032537]
We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist. We design an estimator which attains the smallest possible confidence interval as a function of $n,d,\delta$.
arXiv Detail & Related papers (2020-11-24T22:39:21Z)
Efficient Statistics for Sparse Graphical Models from Truncated Samples [19.205541380535397]
We focus on two fundamental and classical problems: (i) inference of sparse Gaussian graphical models and (ii) support recovery of sparse linear models. For sparse linear regression, suppose samples $({\bf x},y)$ are generated where $y = {\bf x}^\top\Omega^* + \mathcal{N}(0,1)$ and $({\bf x}, y)$ is seen only if $y$ belongs to a truncation set $S \subseteq \mathbb{R}^d$.
arXiv Detail & Related papers (2020-06-17T09:21:00Z)
On the Optimal Weighted $\ell_2$ Regularization in Overparameterized Linear Regression [23.467801864841526]
We consider the linear model $\mathbf{y} = \mathbf{X} \mathbf{\beta}_\star + \mathbf{\epsilon}$ with $\mathbf{X}\in \mathbb{R}^{n\times p}$ in the overparameterized regime $p>n$. We provide an exact characterization of the prediction risk $\mathbb{E}(y-\mathbf{x}^T\hat{\mathbf{\beta}}_\lambda)^2$ in proportional limit $p/n
arXiv Detail & Related papers (2020-06-10T12:38:43Z)
The Average-Case Time Complexity of Certifying the Restricted Isometry Property [66.65353643599899]
In compressed sensing, the restricted isometry property (RIP) on $M \times N$ sensing matrices guarantees efficient reconstruction of sparse vectors. We investigate the exact average-case time complexity of certifying the RIP property for $M\times N$ matrices with i.i.d. $\mathcal{N}(0,1/M)$ entries.
arXiv Detail & Related papers (2020-05-22T16:55:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.