Oblivious sketching for logistic regression
- URL: http://arxiv.org/abs/2107.06615v1
- Date: Wed, 14 Jul 2021 11:29:26 GMT
- Title: Oblivious sketching for logistic regression
- Authors: Alexander Munteanu, Simon Omlor, David Woodruff
- Abstract summary: We present the first data oblivious sketch for logistic regression.
Our sketches are fast, simple, easy to implement, and our experiments demonstrate their practicality.
- Score: 72.42202783677811
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: What guarantees are possible for solving logistic regression in one pass over
a data stream? To answer this question, we present the first data oblivious
sketch for logistic regression. Our sketch can be computed in input sparsity
time over a turnstile data stream and reduces the size of a $d$-dimensional
data set from $n$ to only $\operatorname{poly}(\mu d\log n)$ weighted points,
where $\mu$ is a useful parameter which captures the complexity of compressing
the data. Solving (weighted) logistic regression on the sketch gives an $O(\log
n)$-approximation to the original problem on the full data set. We also show
how to obtain an $O(1)$-approximation with slight modifications. Our sketches
are fast, simple, easy to implement, and our experiments demonstrate their
practicality.
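
As a concrete illustration, here is a minimal Python reconstruction of the sketch-and-solve recipe the abstract describes; it is not the paper's exact construction. Rows are subsampled at geometrically decreasing rates, hashed into buckets whose sums form the compressed data set (a linear map, hence turnstile-friendly), and weighted logistic regression is solved on the bucket sums. The number of levels, the bucket count, and the convention of folding labels into the rows are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import minimize

def oblivious_sketch(X, y, levels=8, buckets=64, seed=0):
    """Compress n labeled rows into weighted bucket sums.

    Hypothetical simplified variant, NOT the paper's exact parameters.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Z = -y[:, None] * X  # fold labels in: per-point loss is log(1 + exp(z_i @ beta))
    rows, weights = [], []
    for h in range(levels):
        keep = rng.random(n) < 2.0 ** (-h)        # level h keeps rows w.p. 2^-h
        idx = np.flatnonzero(keep)
        if idx.size == 0:
            continue
        bucket = rng.integers(0, buckets, size=idx.size)
        S = np.zeros((buckets, d))
        np.add.at(S, bucket, Z[idx])              # sum colliding rows per bucket
        nonzero = np.abs(S).sum(axis=1) > 0
        rows.append(S[nonzero])
        weights.append(np.full(int(nonzero.sum()), 2.0 ** h))  # reweight subsampled levels
    return np.vstack(rows), np.concatenate(weights)

def weighted_logreg(Z, w):
    # Solve weighted logistic regression on the sketched, weighted points.
    obj = lambda beta: np.sum(w * np.logaddexp(0.0, Z @ beta))
    return minimize(obj, np.zeros(Z.shape[1]), method="L-BFGS-B").x

# Usage: compress 100k points to a few hundred weighted rows, then solve.
rng = np.random.default_rng(1)
X = rng.normal(size=(100_000, 10))
beta_true = rng.normal(size=10)
y = np.where(rng.random(100_000) < 1.0 / (1.0 + np.exp(-X @ beta_true)), 1.0, -1.0)
Z_sk, w_sk = oblivious_sketch(X, y)
beta_hat = weighted_logreg(Z_sk, w_sk)
```

Because the sketch is a linear function of the data, insertions and deletions in a turnstile stream simply add or subtract rows from the appropriate buckets.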
Related papers
- Data subsampling for Poisson regression with pth-root-link [53.63838219437508]
We develop and analyze data subsampling techniques for Poisson regression.
In particular, we consider the Poisson generalized linear model with identity (ID) and square-root link functions.
arXiv Detail & Related papers (2024-10-30T10:09:05Z)
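
For reference, the two link functions named above fix the Poisson mean as $\mu_i = x_i^\top \beta$ (identity) or $\mu_i = (x_i^\top \beta)^2$ (square root). A minimal sketch of the resulting negative log-likelihood, with variable names chosen for the example:

```python
import numpy as np

def poisson_nll(beta, X, y, link="id"):
    """Poisson negative log-likelihood (up to the log y! constant)."""
    eta = X @ beta
    mu = eta if link == "id" else eta ** 2  # "sqrt" link: g(mu) = sqrt(mu)
    mu = np.clip(mu, 1e-12, None)           # keep the log well-defined
    return np.sum(mu - y * np.log(mu))
```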
- Turnstile $\ell_p$ leverage score sampling with applications [56.403488578703865]
We develop a novel algorithm for sampling rows $a_i$ of a matrix $A \in \mathbb{R}^{n \times d}$, with probability proportional to their $\ell_p$ norm, when $A$ is presented in a turnstile data stream.
Our algorithm not only returns the set of sampled row indices, it also returns slightly perturbed rows $\tilde{a}_i \approx a_i$ and approximates their sampling probabilities up to $\varepsilon$ relative error.
For logistic regression, our framework yields the first algorithm that achieves a ...
arXiv Detail & Related papers (2024-06-01T07:33:41Z)
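
The target distribution is easy to state even though the paper attains it over a turnstile stream. Below is an offline reference sampler only, assuming the common convention of probabilities proportional to $\|a_i\|_p^p$; $p$ and the sample count are illustrative parameters, and the streaming machinery (perturbed rows, approximate probabilities) is omitted:

```python
import numpy as np

def lp_row_sample(A, p=1, k=100, seed=0):
    """Sample k row indices of A with probability proportional to ||a_i||_p^p.

    I.i.d. offline reference; the paper achieves (approximately) this
    while A is only observed as a turnstile stream.
    """
    rng = np.random.default_rng(seed)
    probs = (np.abs(A) ** p).sum(axis=1)
    probs /= probs.sum()
    return rng.choice(A.shape[0], size=k, replace=True, p=probs)
```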
- Scaling Up Differentially Private LASSO Regularized Logistic Regression via Faster Frank-Wolfe Iterations [51.14495595270775]
We adapt the Frank-Wolfe algorithm for $L_1$-penalized linear regression to be aware of sparse inputs and to use them effectively.
Our results demonstrate that this procedure can reduce runtime by a factor of up to $2{,}200\times$, depending on the value of the privacy parameter $\epsilon$ and the sparsity of the dataset.
arXiv Detail & Related papers (2023-10-30T19:52:43Z)
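
As a rough illustration of the primitive being accelerated, here is a plain (non-private, dense) Frank-Wolfe loop for least squares over an $L_1$ ball. The paper's differential-privacy noise and sparse-input optimizations are omitted, and the radius and step schedule are assumptions:

```python
import numpy as np

def frank_wolfe_l1(X, y, radius=1.0, steps=200):
    """Frank-Wolfe for min ||X b - y||^2 over the L1 ball of given radius.

    Each iterate moves toward a single vertex +/- radius * e_j, so the
    solution after t steps has at most t nonzeros.
    """
    n, d = X.shape
    b = np.zeros(d)
    for t in range(steps):
        grad = 2.0 * X.T @ (X @ b - y)   # gradient of the squared loss
        j = np.argmax(np.abs(grad))      # linear minimizer over the L1 ball
        vertex = np.zeros(d)
        vertex[j] = -radius * np.sign(grad[j])
        gamma = 2.0 / (t + 2.0)          # standard FW step size
        b = (1 - gamma) * b + gamma * vertex
    return b
```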
- Almost Linear Constant-Factor Sketching for $\ell_1$ and Logistic Regression [74.28017932704704]
We improve upon previous oblivious sketching and turnstile streaming results for $\ell_1$ and logistic regression.
We also give a tradeoff that yields a $(1+\varepsilon)$-approximation in input sparsity time.
Our sketch can be extended to approximate a regularized version of logistic regression where the data-dependent regularizer corresponds to the variance of the individual logistic losses.
arXiv Detail & Related papers (2023-03-31T18:12:33Z)
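
To make the last sentence concrete, the regularized objective can be read as the mean logistic loss plus a multiple of the empirical variance of the per-point losses. A minimal sketch, with the weight $\lambda$ as an assumed knob:

```python
import numpy as np

def variance_regularized_logloss(beta, X, y, lam=1.0):
    """Mean logistic loss plus lam * variance of the per-point losses."""
    losses = np.logaddexp(0.0, -y * (X @ beta))  # per-point losses, y in {-1, +1}
    return losses.mean() + lam * losses.var()
```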
- Truncated Linear Regression in High Dimensions [26.41623833920794]
In truncated linear regression, we are given samples $(A_i, y_i)_i$ whose dependent variable equals $y_i = A_i^{\mathrm{T}} x^* + \eta_i$, where $x^*$ is some fixed unknown vector of interest and $\eta_i$ is noise; a sample is only revealed if $y_i$ falls in some survival set.
The goal is to recover $x^*$ under some favorable conditions on the $A_i$'s and the noise distribution.
We prove that there exists a computationally and statistically efficient method for recovering $k$-sparse $n$-dimensional vectors $x^*$ from $m$ truncated samples.
arXiv Detail & Related papers (2020-07-29T00:31:34Z)
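
A tiny generator clarifies the observation model, assuming Gaussian covariates and noise and the one-sided survival set $\{y > 0\}$ for concreteness (the paper allows more general sets):

```python
import numpy as np

def truncated_samples(x_star, m, seed=0):
    """Draw m samples from y = <a, x*> + noise, keeping only those with y > 0."""
    rng = np.random.default_rng(seed)
    A_rows, ys = [], []
    while len(ys) < m:
        a = rng.normal(size=x_star.size)
        y = a @ x_star + rng.normal()
        if y > 0:  # truncation: the sample is revealed only if y survives
            A_rows.append(a)
            ys.append(y)
    return np.array(A_rows), np.array(ys)
```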
- WOR and $p$'s: Sketches for $\ell_p$-Sampling Without Replacement [75.12782480740822]
We design novel composable sketches for WOR (without-replacement) $\ell_p$ sampling.
Our sketches have a size that grows only linearly with the sample size.
Our method is the first to provide WOR sampling in the important regime of $p>1$ and the first to handle signed updates for $p>0$.
arXiv Detail & Related papers (2020-07-14T00:19:27Z)
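
One classical route to WOR sampling with probabilities proportional to $|v_i|^p$, which composable bottom-$k$ sketches of this kind build on, is the exponential-clock trick: give key $i$ an arrival time $\mathrm{Exp}(1)/|v_i|^p$ and keep the $k$ earliest. A minimal non-streaming version (the paper's contribution is doing this composably and under signed updates):

```python
import numpy as np

def wor_lp_sample(v, p=2, k=10, seed=0):
    """Weighted sampling WITHOUT replacement, prob. proportional to |v_i|^p.

    Exponential-clock (bottom-k) trick: the k smallest arrival times
    Exp(1) / |v_i|^p form a WOR sample.
    """
    rng = np.random.default_rng(seed)
    w = np.abs(v) ** p
    with np.errstate(divide="ignore"):           # zero weight -> inf, never sampled
        times = rng.exponential(size=v.size) / w
    return np.argsort(times)[:k]
```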
- Online Robust Regression via SGD on the $\ell_1$ loss [19.087335681007477]
We consider the robust linear regression problem in the online setting, where we have access to the data in a streaming manner.
We show in this work that stochastic gradient descent on the $\ell_1$ loss converges to the true parameter vector at a $\tilde{O}(1/((1-\eta)^2 n))$ rate that is independent of the values of the contaminated measurements, where $\eta$ is the fraction of contaminated samples.
arXiv Detail & Related papers (2020-07-01T11:38:21Z)
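
The core update is a single subgradient step on the absolute loss per streamed point, so each (possibly corrupted) observation moves the iterate by a bounded amount. A minimal one-pass sketch, with the step-size schedule as an assumption:

```python
import numpy as np

def sgd_l1_regression(stream, d, lr=0.1):
    """One-pass SGD on the l1 loss |<x, beta> - y| over a stream of (x, y)."""
    beta = np.zeros(d)
    for t, (x, y) in enumerate(stream, start=1):
        residual = x @ beta - y
        beta -= (lr / np.sqrt(t)) * np.sign(residual) * x  # bounded subgradient step
    return beta
```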
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.