Active Sampling for Linear Regression Beyond the $\ell_2$ Norm
- URL: http://arxiv.org/abs/2111.04888v1
- Date: Tue, 9 Nov 2021 00:20:01 GMT
- Title: Active Sampling for Linear Regression Beyond the $\ell_2$ Norm
- Authors: Cameron Musco, Christopher Musco, David P. Woodruff, Taisuke Yasuda
- Abstract summary: We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector.
We show that this dependence on $d$ is optimal, up to logarithmic factors.
We also provide the first total sensitivity upper bound of $O(d^{\max\{1,p/2\}}\log^2 n)$ for loss functions with at most degree $p$ growth.
- Score: 70.49273459706546
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study active sampling algorithms for linear regression, which aim to query
only a small number of entries of a target vector $b\in\mathbb{R}^n$ and output
a near minimizer to $\min_{x\in\mathbb{R}^d}\|Ax-b\|$, where $A\in\mathbb{R}^{n
\times d}$ is a design matrix and $\|\cdot\|$ is some loss function.
For $\ell_p$ norm regression for any $0<p<\infty$, we give an algorithm based
on Lewis weight sampling that outputs a $(1+\epsilon)$ approximate solution
using just $\tilde{O}(d^{\max\{1,p/2\}}/\mathrm{poly}(\epsilon))$ queries to
$b$. We show that this dependence on $d$ is optimal, up to logarithmic factors.
Our result resolves a recent open question of Chen and Derezi\'{n}ski, who gave
near optimal bounds for the $\ell_1$ norm, and suboptimal bounds for $\ell_p$
regression with $p\in(1,2)$.
We also provide the first total sensitivity upper bound of
$O(d^{\max\{1,p/2\}}\log^2 n)$ for loss functions with at most degree $p$
polynomial growth. This improves a recent result of Tukan, Maalouf, and
Feldman. By combining this with our techniques for the $\ell_p$ regression
result, we obtain an active regression algorithm making $\tilde
O(d^{1+\max\{1,p/2\}}/\mathrm{poly}(\epsilon))$ queries, answering another open
question of Chen and Derezi\'{n}ski. For the important special case of the
Huber loss, we further improve our bound to an active sample complexity of
$\tilde O(d^{(1+\sqrt2)/2}/\epsilon^c)$ and a non-active sample complexity of
$\tilde O(d^{4-2\sqrt 2}/\epsilon^c)$, improving a previous $d^4$ bound for
Huber regression due to Clarkson and Woodruff. Our sensitivity bounds have
further implications, improving a variety of previous results using sensitivity
sampling, including Orlicz norm subspace embeddings and robust subspace
approximation. Finally, our active sampling results give the first sublinear
time algorithms for Kronecker product regression under every $\ell_p$ norm.
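To make the sampling scheme behind the main result concrete, the following is a minimal Python sketch of active $\ell_p$ regression via Lewis weight sampling. It is an illustration under simplifying assumptions, not the paper's algorithm: the weights are computed with the standard fixed-point iteration (which converges for $p<4$), sampled rows are reweighted by $(1/p_i)^{1/p}$, and the sampled subproblem is solved with a generic IRLS heuristic; the function names, iteration counts, and the `query_b` callback are all illustrative.

```python
import numpy as np

def approx_lewis_weights(A, p, n_iter=30):
    """Approximate l_p Lewis weights via the fixed-point iteration
    w_i <- (a_i^T (A^T W^{1-2/p} A)^{-1} a_i)^{p/2}, which converges for p < 4."""
    n, d = A.shape
    w = np.full(n, d / n)                              # uniform starting point
    for _ in range(n_iter):
        scale = w ** (1.0 - 2.0 / p)                   # diagonal of W^{1-2/p}
        G = A.T @ (A * scale[:, None])                 # d x d reweighted Gram matrix
        Ginv = np.linalg.pinv(G)
        lev = np.einsum('ij,jk,ik->i', A, Ginv, A)     # a_i^T G^{-1} a_i
        w = np.clip(lev, 1e-12, None) ** (p / 2.0)
    return w

def active_lp_regression(A, query_b, p, m, seed=0):
    """Query roughly m entries of b, chosen by Lewis weight sampling, and return
    an approximate minimizer of ||Ax - b||_p (solved here by plain IRLS)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = approx_lewis_weights(A, p)
    probs = np.minimum(1.0, m * w / w.sum())           # inclusion probabilities
    idx = np.flatnonzero(rng.random(n) < probs)        # rows whose b_i we query
    s = (1.0 / probs[idx]) ** (1.0 / p)                # makes the sampled l_p^p loss unbiased
    A_s = A[idx] * s[:, None]
    b_s = np.array([query_b(i) for i in idx]) * s      # the only accesses to b
    x = np.linalg.lstsq(A_s, b_s, rcond=None)[0]       # l_2 warm start
    for _ in range(50):                                # IRLS refinement for l_p
        r = np.abs(A_s @ x - b_s) + 1e-9
        u = r ** ((p - 2.0) / 2.0)
        x = np.linalg.lstsq(A_s * u[:, None], b_s * u, rcond=None)[0]
    return x
```

For $p=2$ the fixed point reduces to the usual leverage scores; for general $p$ the reweighting by $(1/p_i)^{1/p}$ makes the sampled objective an unbiased estimate of $\|Ax-b\|_p^p$, which is what allows the number of queried entries of $b$ to scale with $d^{\max(1,p/2)}$ rather than $n$.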
Related papers
- Iterative thresholding for non-linear learning in the strong $\varepsilon$-contamination model [3.309767076331365]
We derive approximation bounds for learning single neuron models using thresholded descent.
We also study the linear regression problem, where $\sigma(\mathbf{x}) = \mathbf{x}$.
arXiv Detail & Related papers (2024-09-05T16:59:56Z)
- Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p}n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor. (A generic illustration of the linear-sketch approach to residual estimation appears after this list.)
arXiv Detail & Related papers (2024-08-16T02:33:07Z)
- Optimal bounds for $\ell_p$ sensitivity sampling via $\ell_2$ augmentation [56.403488578703865]
We show that by augmenting the $\ell_p$ sensitivities with $\ell_2$ sensitivities, we obtain better bounds on the optimal linear $\tilde O(\varepsilon^{-2}\mu^2 d)$ sampling complexity.
We also obtain an $\tilde O(\varepsilon^{-2}\mu^2 d)$ sensitivity sampling bound for logistic regression.
arXiv Detail & Related papers (2024-06-01T07:03:40Z)
- Sample-Efficient Linear Regression with Self-Selection Bias [7.605563562103568]
We consider the problem of linear regression with self-selection bias in the unknown-index setting.
We provide a novel and near-optimally sample-efficient (in terms of $k$) algorithm to recover $\mathbf{w}_1,\ldots,\mathbf{w}_k$.
Our algorithm succeeds under significantly relaxed noise assumptions, and therefore also succeeds in the related setting of max-linear regression.
arXiv Detail & Related papers (2024-02-22T02:20:24Z)
- Optimal Sketching Bounds for Sparse Linear Regression [116.30196615349226]
We study oblivious sketching for $k$-sparse linear regression under various loss functions such as an $\ell_p$ norm, or from a broad class of hinge-like loss functions.
We show that for sparse $\ell_2$ regression, there is an oblivious distribution over sketches with $\Theta(k\log(d/k)/\varepsilon^2)$ rows, which is tight up to a constant factor.
We also show that sketching $O(\mu^2 k\log(\mu n d/\varepsilon)/\varepsilon^2)$ ...
arXiv Detail & Related papers (2023-04-05T07:24:19Z)
- A spectral least-squares-type method for heavy-tailed corrupted regression with unknown covariance & heterogeneous noise [2.019622939313173]
We revisit heavy-tailed corrupted least-squares linear regression, assuming a corrupted $n$-sized label-feature sample with at most $\epsilon n$ arbitrary outliers.
We propose a near-optimal computationally tractable estimator, based on the power method, assuming no knowledge of $(\Sigma,\Xi)$ nor the operator norm of $\Xi$.
arXiv Detail & Related papers (2022-09-06T23:37:31Z)
- Low-Rank Approximation with $1/\epsilon^{1/3}$ Matrix-Vector Products [58.05771390012827]
We study iterative methods based on Krylov subspaces for low-rank approximation under any Schatten-$p$ norm.
Our main result is an algorithm that uses only $\tilde{O}(k/\sqrt{\epsilon})$ matrix-vector products.
arXiv Detail & Related papers (2022-02-10T16:10:41Z)
- Compressed Deep Networks: Goodbye SVD, Hello Robust Low-Rank Approximation [23.06440095688755]
A common technique for compressing a neural network is to compute the $k$-rank $\ell_2$ approximation $A_{k,2}$ of the matrix $A\in\mathbb{R}^{n\times d}$ that corresponds to a fully connected layer (or embedding layer).
Here, $d$ is the number of neurons in the layer, $n$ is the number of neurons in the next one, and $A_{k,2}$ can be stored in $O((n+d)k)$ memory instead of $O(nd)$.
arXiv Detail & Related papers (2020-09-11T20:21:42Z)
- Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, a discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we show an algorithm with improved sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z)
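As referenced in the residual-error estimation entry above, the basic linear-sketch idea behind that line of work can be illustrated in a few lines. This is a generic Gaussian (Johnson-Lindenstrauss) sketch under simplifying assumptions, not the algorithm of that paper, which obtains stronger sketch-size bounds; the names `residual_sketch` and `estimate_residual` are illustrative.

```python
import numpy as np

def residual_sketch(A, b, m, seed=1):
    """Precompute an m x d sketch SA and an m-vector Sb with a Gaussian sketch S."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    S = rng.normal(scale=1.0 / np.sqrt(m), size=(m, n))   # E||Sv||^2 = ||v||^2
    return S @ A, S @ b

def estimate_residual(SA, Sb, x):
    """Estimate ||Ax - b||_2 from the sketches alone, in O(m d) time per query."""
    return np.linalg.norm(SA @ x - Sb)

# Example: with m = O(d/eps^2) rows, S is a subspace embedding of span([A, b]),
# so the estimate is accurate even for an x computed from A and b.
rng = np.random.default_rng(2)
A = rng.normal(size=(20000, 10))
b = rng.normal(size=20000)
SA, Sb = residual_sketch(A, b, m=400)
x = np.linalg.lstsq(A, b, rcond=None)[0]
print(estimate_residual(SA, Sb, x), np.linalg.norm(A @ x - b))
```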