Oblivious sketching for logistic regression
- URL: http://arxiv.org/abs/2107.06615v1
- Date: Wed, 14 Jul 2021 11:29:26 GMT
- Title: Oblivious sketching for logistic regression
- Authors: Alexander Munteanu, Simon Omlor, David Woodruff
- Abstract summary: We present the first data oblivious sketch for logistic regression.
Our sketches are fast, simple, easy to implement, and our experiments demonstrate their practicality.
- Score: 72.42202783677811
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: What guarantees are possible for solving logistic regression in one pass over
a data stream? To answer this question, we present the first data oblivious
sketch for logistic regression. Our sketch can be computed in input sparsity
time over a turnstile data stream and reduces the size of a $d$-dimensional
data set from $n$ to only $\operatorname{poly}(\mu d\log n)$ weighted points,
where $\mu$ is a useful parameter which captures the complexity of compressing
the data. Solving (weighted) logistic regression on the sketch gives an $O(\log
n)$-approximation to the original problem on the full data set. We also show
how to obtain an $O(1)$-approximation with slight modifications. Our sketches
are fast, simple, easy to implement, and our experiments demonstrate their
practicality.
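
As a concrete illustration, here is a minimal Python reconstruction of the sketch-and-solve recipe the abstract describes; it is not the paper's exact construction. Rows are subsampled at geometrically decreasing rates, hashed into buckets whose sums form the compressed data set (a linear map, hence turnstile-friendly), and weighted logistic regression is solved on the bucket sums. The number of levels, the bucket count, and the convention of folding labels into the rows are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import minimize

def oblivious_sketch(X, y, levels=8, buckets=64, seed=0):
    """Compress n labeled rows into weighted bucket sums.

    Hypothetical simplified variant, NOT the paper's exact parameters.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Z = -y[:, None] * X  # fold labels in: per-point loss is log(1 + exp(z_i @ beta))
    rows, weights = [], []
    for h in range(levels):
        keep = rng.random(n) < 2.0 ** (-h)        # level h keeps rows w.p. 2^-h
        idx = np.flatnonzero(keep)
        if idx.size == 0:
            continue
        bucket = rng.integers(0, buckets, size=idx.size)
        S = np.zeros((buckets, d))
        np.add.at(S, bucket, Z[idx])              # sum colliding rows per bucket
        nonzero = np.abs(S).sum(axis=1) > 0
        rows.append(S[nonzero])
        weights.append(np.full(int(nonzero.sum()), 2.0 ** h))  # reweight subsampled levels
    return np.vstack(rows), np.concatenate(weights)

def weighted_logreg(Z, w):
    # Solve weighted logistic regression on the sketched, weighted points.
    obj = lambda beta: np.sum(w * np.logaddexp(0.0, Z @ beta))
    return minimize(obj, np.zeros(Z.shape[1]), method="L-BFGS-B").x

# Usage: compress 100k points to a few hundred weighted rows, then solve.
rng = np.random.default_rng(1)
X = rng.normal(size=(100_000, 10))
beta_true = rng.normal(size=10)
y = np.where(rng.random(100_000) < 1.0 / (1.0 + np.exp(-X @ beta_true)), 1.0, -1.0)
Z_sk, w_sk = oblivious_sketch(X, y)
beta_hat = weighted_logreg(Z_sk, w_sk)
```

Because the sketch is a linear function of the data, insertions and deletions in a turnstile stream simply add or subtract rows from the appropriate buckets.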
Related papers
- Data subsampling for Poisson regression with pth-root-link [53.63838219437508]
We develop and analyze data subsampling techniques for Poisson regression.
In particular, we consider the Poisson generalized linear model with identity (ID) and square-root link functions.
arXiv Detail & Related papers (2024-10-30T10:09:05Z)
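
For reference, the two link functions named above fix the Poisson mean as $\mu_i = x_i^\top \beta$ (identity) or $\mu_i = (x_i^\top \beta)^2$ (square root). A minimal sketch of the resulting negative log-likelihood, with variable names chosen for the example:

```python
import numpy as np

def poisson_nll(beta, X, y, link="id"):
    """Poisson negative log-likelihood (up to the log y! constant)."""
    eta = X @ beta
    mu = eta if link == "id" else eta ** 2  # "sqrt" link: g(mu) = sqrt(mu)
    mu = np.clip(mu, 1e-12, None)           # keep the log well-defined
    return np.sum(mu - y * np.log(mu))
```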
- Turnstile $\ell_p$ leverage score sampling with applications [56.403488578703865]
We develop a novel algorithm for sampling rows $a_i$ of a matrix $A \in \mathbb{R}^{n \times d}$, with probability proportional to their $\ell_p$ norm, when $A$ is presented in a turnstile data stream.
Our algorithm not only returns the set of sampled row indices, it also returns slightly perturbed rows $\tilde{a}_i \approx a_i$ and approximates their sampling probabilities up to $\varepsilon$ relative error.
For logistic regression, our framework yields the first algorithm that achieves a ...
arXiv Detail & Related papers (2024-06-01T07:33:41Z)
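
The target distribution is easy to state even though the paper attains it over a turnstile stream. Below is an offline reference sampler only, assuming the common convention of probabilities proportional to $\|a_i\|_p^p$; $p$ and the sample count are illustrative parameters, and the streaming machinery (perturbed rows, approximate probabilities) is omitted:

```python
import numpy as np

def lp_row_sample(A, p=1, k=100, seed=0):
    """Sample k row indices of A with probability proportional to ||a_i||_p^p.

    I.i.d. offline reference; the paper achieves (approximately) this
    while A is only observed as a turnstile stream.
    """
    rng = np.random.default_rng(seed)
    probs = (np.abs(A) ** p).sum(axis=1)
    probs /= probs.sum()
    return rng.choice(A.shape[0], size=k, replace=True, p=probs)
```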
- Scaling Up Differentially Private LASSO Regularized Logistic Regression via Faster Frank-Wolfe Iterations [51.14495595270775]
We adapt the Frank-Wolfe algorithm for $L_1$-penalized linear regression to be aware of sparse inputs and to use them effectively.
Our results demonstrate that this procedure can reduce runtime by a factor of up to $2{,}200\times$, depending on the value of the privacy parameter $\epsilon$ and the sparsity of the dataset.
arXiv Detail & Related papers (2023-10-30T19:52:43Z)
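
As a rough illustration of the primitive being accelerated, here is a plain (non-private, dense) Frank-Wolfe loop for least squares over an $L_1$ ball. The paper's differential-privacy noise and sparse-input optimizations are omitted, and the radius and step schedule are assumptions:

```python
import numpy as np

def frank_wolfe_l1(X, y, radius=1.0, steps=200):
    """Frank-Wolfe for min ||X b - y||^2 over the L1 ball of given radius.

    Each iterate moves toward a single vertex +/- radius * e_j, so the
    solution after t steps has at most t nonzeros.
    """
    n, d = X.shape
    b = np.zeros(d)
    for t in range(steps):
        grad = 2.0 * X.T @ (X @ b - y)   # gradient of the squared loss
        j = np.argmax(np.abs(grad))      # linear minimizer over the L1 ball
        vertex = np.zeros(d)
        vertex[j] = -radius * np.sign(grad[j])
        gamma = 2.0 / (t + 2.0)          # standard FW step size
        b = (1 - gamma) * b + gamma * vertex
    return b
```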
- Almost Linear Constant-Factor Sketching for $\ell_1$ and Logistic Regression [74.28017932704704]
We improve upon previous oblivious sketching and turnstile streaming results for $\ell_1$ and logistic regression.
We also give a tradeoff that yields a $(1+\varepsilon)$-approximation in input sparsity time.
Our sketch can be extended to approximate a regularized version of logistic regression where the data-dependent regularizer corresponds to the variance of the individual logistic losses.
arXiv Detail & Related papers (2023-03-31T18:12:33Z)
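
To make the last sentence concrete, the regularized objective can be read as the mean logistic loss plus a multiple of the empirical variance of the per-point losses. A minimal sketch, with the weight $\lambda$ as an assumed knob:

```python
import numpy as np

def variance_regularized_logloss(beta, X, y, lam=1.0):
    """Mean logistic loss plus lam * variance of the per-point losses."""
    losses = np.logaddexp(0.0, -y * (X @ beta))  # per-point losses, y in {-1, +1}
    return losses.mean() + lam * losses.var()
```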
- Truncated Linear Regression in High Dimensions [26.41623833920794]
In truncated linear regression, we are given samples $(A_i, y_i)_i$ whose dependent variable equals $y_i = A_i^{\mathrm{T}} x^* + \eta_i$, where $x^*$ is some fixed unknown vector of interest and $\eta_i$ is noise; a sample is only revealed if $y_i$ falls in some survival set.
The goal is to recover $x^*$ under some favorable conditions on the $A_i$'s and the noise distribution.
We prove that there exists a computationally and statistically efficient method for recovering $k$-sparse $n$-dimensional vectors $x^*$ from $m$ truncated samples.
arXiv Detail & Related papers (2020-07-29T00:31:34Z)
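
A tiny generator clarifies the observation model, assuming Gaussian covariates and noise and the one-sided survival set $\{y > 0\}$ for concreteness (the paper allows more general sets):

```python
import numpy as np

def truncated_samples(x_star, m, seed=0):
    """Draw m samples from y = <a, x*> + noise, keeping only those with y > 0."""
    rng = np.random.default_rng(seed)
    A_rows, ys = [], []
    while len(ys) < m:
        a = rng.normal(size=x_star.size)
        y = a @ x_star + rng.normal()
        if y > 0:  # truncation: the sample is revealed only if y survives
            A_rows.append(a)
            ys.append(y)
    return np.array(A_rows), np.array(ys)
```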
- WOR and $p$'s: Sketches for $\ell_p$-Sampling Without Replacement [75.12782480740822]
We design novel composable sketches for WOR (without-replacement) $\ell_p$ sampling.
Our sketches have a size that grows only linearly with the sample size.
Our method is the first to provide WOR sampling in the important regime of $p>1$ and the first to handle signed updates for $p>0$.
arXiv Detail & Related papers (2020-07-14T00:19:27Z)
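
One classical route to WOR sampling with probabilities proportional to $|v_i|^p$, which composable bottom-$k$ sketches of this kind build on, is the exponential-clock trick: give key $i$ an arrival time $\mathrm{Exp}(1)/|v_i|^p$ and keep the $k$ earliest. A minimal non-streaming version (the paper's contribution is doing this composably and under signed updates):

```python
import numpy as np

def wor_lp_sample(v, p=2, k=10, seed=0):
    """Weighted sampling WITHOUT replacement, prob. proportional to |v_i|^p.

    Exponential-clock (bottom-k) trick: the k smallest arrival times
    Exp(1) / |v_i|^p form a WOR sample.
    """
    rng = np.random.default_rng(seed)
    w = np.abs(v) ** p
    with np.errstate(divide="ignore"):           # zero weight -> inf, never sampled
        times = rng.exponential(size=v.size) / w
    return np.argsort(times)[:k]
```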
- Online Robust Regression via SGD on the $\ell_1$ loss [19.087335681007477]
We consider the robust linear regression problem in the online setting, where we have access to the data in a streaming manner.
We show in this work that stochastic gradient descent on the $\ell_1$ loss converges to the true parameter vector at a $\tilde{O}(1/((1-\eta)^2 n))$ rate that is independent of the values of the contaminated measurements, where $\eta$ is the fraction of contaminated samples.
arXiv Detail & Related papers (2020-07-01T11:38:21Z)
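
The core update is a single subgradient step on the absolute loss per streamed point, so each (possibly corrupted) observation moves the iterate by a bounded amount. A minimal one-pass sketch, with the step-size schedule as an assumption:

```python
import numpy as np

def sgd_l1_regression(stream, d, lr=0.1):
    """One-pass SGD on the l1 loss |<x, beta> - y| over a stream of (x, y)."""
    beta = np.zeros(d)
    for t, (x, y) in enumerate(stream, start=1):
        residual = x @ beta - y
        beta -= (lr / np.sqrt(t)) * np.sign(residual) * x  # bounded subgradient step
    return beta
```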
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.