A Nearly-Optimal Bound for Fast Regression with $\ell_\infty$ Guarantee
- URL: http://arxiv.org/abs/2302.00248v1
- Date: Wed, 1 Feb 2023 05:22:40 GMT
- Title: A Nearly-Optimal Bound for Fast Regression with $\ell_\infty$ Guarantee
- Authors: Zhao Song and Mingquan Ye and Junze Yin and Lichen Zhang
- Abstract summary: Given a matrix $A\in \mathbb{R}^{n\times d}$ and a vector $b\in \mathbb{R}^n$, we consider the regression problem with $\ell_\infty$ guarantees.
We show that in order to obtain such an $\ell_\infty$ guarantee for $\ell_2$ regression, one has to use sketching matrices that are dense.
We also develop a novel analytical framework for $\ell_\infty$ guarantee regression that utilizes the Oblivious Coordinate-wise Embedding (OCE) property.
- Score: 16.409210914237086
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Given a matrix $A\in \mathbb{R}^{n\times d}$ and a vector $b\in
\mathbb{R}^n$, we consider the regression problem with $\ell_\infty$
guarantees: finding a vector $x'\in \mathbb{R}^d$ such that $ \|x'-x^*\|_\infty
\leq \frac{\epsilon}{\sqrt{d}}\cdot \|Ax^*-b\|_2\cdot \|A^\dagger\|$ where
$x^*=\arg\min_{x\in \mathbb{R}^d}\|Ax-b\|_2$. One popular approach for solving
such an $\ell_2$ regression problem is via sketching: pick a structured random
matrix $S\in \mathbb{R}^{m\times n}$ with $m\ll n$ for which $SA$ can be quickly
computed, then solve the ``sketched'' regression problem $\arg\min_{x\in
\mathbb{R}^d} \|SAx-Sb\|_2$. In this paper, we show that in order to obtain
such $\ell_\infty$ guarantee for $\ell_2$ regression, one has to use sketching
matrices that are dense. To the best of our knowledge, this is the first use
case in which dense sketching matrices are necessary. On the algorithmic side,
we prove that there exists a distribution of dense sketching matrices with
$m=\epsilon^{-2}d\log^3(n/\delta)$ such that solving the sketched regression
problem gives the $\ell_\infty$ guarantee, with probability at least
$1-\delta$. Moreover, the matrix $SA$ can be computed in time $O(nd\log n)$.
Our row count is nearly-optimal up to logarithmic factors, and significantly
improves the result in [Price, Song and Woodruff, ICALP'17], in which a
number of rows super-linear in $d$, namely $m=\Omega(\epsilon^{-2}d^{1+\gamma})$ with
$\gamma=\Theta(\sqrt{\frac{\log\log n}{\log d}})$, is required. We also develop
a novel analytical framework for $\ell_\infty$ guarantee regression that
utilizes the Oblivious Coordinate-wise Embedding (OCE) property introduced in
[Song and Yu, ICML'21]. Our analysis is arguably much simpler and more general
than [Price, Song and Woodruff, ICALP'17], and it extends to dense sketches for
tensor product of vectors.
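To make the sketch-and-solve recipe concrete, the snippet below is a minimal numerical sketch, not the paper's construction: it uses a plain dense Gaussian sketch as a stand-in for the dense sketching distribution described above, solves the sketched problem $\arg\min_x \|SAx-Sb\|_2$, and compares $\|x'-x^*\|_\infty$ with the stated bound $\frac{\epsilon}{\sqrt{d}}\cdot \|Ax^*-b\|_2\cdot \|A^\dagger\|$. The row count $m$ is an arbitrary demo choice, not the paper's $m=\epsilon^{-2}d\log^3(n/\delta)$.

```python
import numpy as np

# Minimal sketch-and-solve demo for least squares with an l_inf error check.
# Assumptions (not from the paper): a dense Gaussian sketch stands in for the
# dense sketching distribution analyzed in the paper, and the row count m is
# a demo choice rather than m = eps^{-2} d log^3(n/delta).

rng = np.random.default_rng(0)
n, d, eps = 4096, 32, 0.5

A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.1 * rng.standard_normal(n)      # noisy right-hand side

# Exact least-squares solution x* = argmin_x ||Ax - b||_2
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)

# Dense Gaussian sketch S in R^{m x n} with m << n
m = 3000
S = rng.standard_normal((m, n)) / np.sqrt(m)

# Sketched solution x' = argmin_x ||SAx - Sb||_2
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

# l_inf guarantee from the abstract:
#   ||x' - x*||_inf <= (eps / sqrt(d)) * ||Ax* - b||_2 * ||A^dagger||,
# where ||A^dagger|| = 1 / sigma_min(A) is the spectral norm of the pseudoinverse.
residual = np.linalg.norm(A @ x_star - b)
sigma_min = np.linalg.svd(A, compute_uv=False)[-1]
bound = eps / np.sqrt(d) * residual / sigma_min

print("||x' - x*||_inf :", np.abs(x_sketch - x_star).max())
print("l_inf bound     :", bound)
```

Note that a Gaussian sketch does not achieve the $O(nd\log n)$ time to compute $SA$ claimed for the paper's construction; it is used here only because it is dense and simple to generate.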
Related papers
- The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
We show that this problem has randomized communication complexity $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$.
As an application, we obtain an $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$ space lower bound for any streaming algorithm with $k$ passes.
arXiv Detail & Related papers (2024-10-26T06:21:42Z)
- Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p}n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z)
- Sample-Efficient Linear Regression with Self-Selection Bias [7.605563562103568]
We consider the problem of linear regression with self-selection bias in the unknown-index setting.
We provide a novel and near-optimally sample-efficient (in terms of $k$) algorithm to recover $\mathbf{w}_1,\ldots,\mathbf{w}_k$.
Our algorithm succeeds under significantly relaxed noise assumptions, and therefore also succeeds in the related setting of max-linear regression.
arXiv Detail & Related papers (2024-02-22T02:20:24Z)
- A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time [7.613259578185218]
We focus on giving a provable guarantee for the one-layer attention network objective function $L(X,Y)$.
In a multi-layer LLM network, the matrix $B \in \mathbb{R}^{n \times d^2}$ can be viewed as the output of a layer.
We provide an iterative algorithm to train the loss function $L(X,Y)$ up to $\epsilon$ error, with running time $\widetilde{O}(\mathcal{T}_{\mathrm{mat}}(n,d) + \cdots)$.
arXiv Detail & Related papers (2023-09-14T04:23:40Z)
- Randomized and Deterministic Attention Sparsification Algorithms for Over-parameterized Feature Dimension [18.57735939471469]
We consider the sparsification of the attention problem.
For any super-large feature dimension, we can reduce it down to a size nearly linear in the length of the sentence.
arXiv Detail & Related papers (2023-04-10T05:52:38Z)
- Optimal Sketching Bounds for Sparse Linear Regression [116.30196615349226]
We study oblivious sketching for $k$-sparse linear regression under various loss functions such as an $\ell_p$ norm, or from a broad class of hinge-like loss functions.
We show that for sparse $\ell_2$ regression, there is an oblivious distribution over sketches with $\Theta(k\log(d/k)/\varepsilon^2)$ rows, which is tight up to a constant factor.
We also show that sketching $O(\mu^2 k\log(\mu n d/\varepsilon)/\varepsilon^2)$ ...
arXiv Detail & Related papers (2023-04-05T07:24:19Z)
- Sketching Algorithms and Lower Bounds for Ridge Regression [65.0720777731368]
We give a sketching-based iterative algorithm that computes $(1+\varepsilon)$-approximate solutions for the ridge regression problem.
We also show that this algorithm can be used to give faster algorithms for kernel ridge regression.
arXiv Detail & Related papers (2022-04-13T22:18:47Z)
- Active Sampling for Linear Regression Beyond the $\ell_2$ Norm [70.49273459706546]
We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector.
We show that this dependence on $d$ is optimal, up to logarithmic factors.
We also provide the first total sensitivity upper bound $O(d^{\max\{1,p/2\}}\log^2 n)$ for loss functions with at most degree $p$ growth.
arXiv Detail & Related papers (2021-11-09T00:20:01Z)
- The Average-Case Time Complexity of Certifying the Restricted Isometry Property [66.65353643599899]
In compressed sensing, the restricted isometry property (RIP) on $M \times N$ sensing matrices guarantees efficient reconstruction of sparse vectors.
We investigate the exact average-case time complexity of certifying the RIP for $M\times N$ matrices with i.i.d. $\mathcal{N}(0,1/M)$ entries.
arXiv Detail & Related papers (2020-05-22T16:55:01Z)