Ridge Regression with Frequent Directions: Statistical and Optimization Perspectives
- URL: http://arxiv.org/abs/2011.03607v1
- Date: Fri, 6 Nov 2020 21:40:38 GMT
- Title: Ridge Regression with Frequent Directions: Statistical and Optimization Perspectives
- Authors: Charlie Dickens
- Abstract summary: We show that FD (Frequent Directions) can be used in the optimization setting through an iterative scheme which yields high-accuracy solutions.
This improves on randomized approaches, which must trade off computing a new sketch at every iteration against the speed of convergence.
- Score: 1.0152838128195465
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite its impressive theory and practical performance, Frequent Directions
(FD) has not been widely adopted for large-scale regression tasks.
Prior work has shown that randomized sketches (i) perform worse than FD at estimating the
covariance matrix of the data; and (ii) incur high error when estimating the bias and/or
variance of sketched ridge regression. We give the first constant-factor relative error
bounds on the bias and variance of sketched ridge regression using FD. We complement these
statistical results by showing that FD can be used in the optimization setting through an
iterative scheme which yields high-accuracy solutions. This improves on randomized
approaches, which must trade off computing a new sketch at every iteration against the
speed of convergence. In both settings, we also show that using
Robust Frequent Directions further enhances performance.
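For readers who have not seen the sketch before, the following is a minimal NumPy illustration of Frequent Directions and of the two uses the abstract describes: a one-shot sketched ridge solve (the statistical perspective) and an iterative refinement loop that uses the sketched Hessian as a preconditioner (the optimization perspective). The function names, the shrink-after-every-row simplification, and the plain preconditioned iteration are illustrative choices made here, not the paper's exact algorithms or guarantees.

```python
import numpy as np

def frequent_directions(A, ell):
    """Minimal Frequent Directions: an (ell x d) sketch B with B.T @ B ~= A.T @ A.
    For clarity we shrink after every inserted row; practical implementations
    buffer ~2*ell rows and shrink only when the buffer fills."""
    _, d = A.shape
    B = np.zeros((ell, d))
    for a in A:
        B[-1] = a                                     # the last row is always zero here
        _, s, Vt = np.linalg.svd(B, full_matrices=False)
        delta = s[-1] ** 2 if len(s) == ell else 0.0  # squared ell-th singular value
        s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        B = s[:, None] * Vt                           # shrinking re-zeroes the last row
        if B.shape[0] < ell:                          # only possible when d < ell
            B = np.vstack([B, np.zeros((ell - B.shape[0], d))])
    return B

def fd_ridge(A, y, gamma, B):
    """One-shot sketched ridge: solve (B.T B + gamma I) x = A.T y, using the
    Woodbury identity so that only an (ell x ell) system is factored."""
    Aty = A.T @ y
    G = B @ B.T + gamma * np.eye(B.shape[0])
    return (Aty - B.T @ np.linalg.solve(G, B @ Aty)) / gamma

def fd_ridge_iterative(A, y, gamma, B, iters=30):
    """Optimization view (illustrative): use (B.T B + gamma I) as a preconditioner
    for the exact ridge gradient.  The error contracts by roughly
    ||A.T A - B.T B|| / gamma per step, so the sketch error must be small
    relative to gamma (otherwise a damped step size is needed)."""
    G = B @ B.T + gamma * np.eye(B.shape[0])
    precond = lambda v: (v - B.T @ np.linalg.solve(G, B @ v)) / gamma
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = x - precond(A.T @ (A @ x - y) + gamma * x)
    return x

# Toy check on effectively low-rank data, where FD sketches shine.
rng = np.random.default_rng(0)
A = (rng.standard_normal((2000, 10)) @ rng.standard_normal((10, 100))
     + 1e-3 * rng.standard_normal((2000, 100)))
y = A @ rng.standard_normal(100) + 0.1 * rng.standard_normal(2000)
B = frequent_directions(A, ell=30)
x_exact = np.linalg.solve(A.T @ A + np.eye(100), A.T @ y)
print(np.linalg.norm(fd_ridge(A, y, 1.0, B) - x_exact),
      np.linalg.norm(fd_ridge_iterative(A, y, 1.0, B) - x_exact))
```

The iterative variant evaluates the exact gradient at every step and only reuses the sketch as a preconditioner, which is why a single sketch can be amortized over all iterations while still reaching high accuracy.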
Related papers
- Estimating Generalization Performance Along the Trajectory of Proximal SGD in Robust Regression [4.150180443030652]
We introduce estimators that precisely track the generalization error of the iterates along the trajectory of the iterative algorithm.
The results are illustrated through several examples, including Huber regression, pseudo-Huber regression, and their penalized variants with non-smooth regularizers.
arXiv Detail & Related papers (2024-10-03T16:13:42Z)
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and, in particular, do not rely on sample-size-dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z)
- Stagewise Boosting Distributional Regression [0.0]
We propose a stagewise boosting-type algorithm for distributional regression.
We extend it with a novel regularization method, correlation filtering, to provide additional stability.
Besides the advantage of processing large datasets, the nature of the approximations can lead to better results.
arXiv Detail & Related papers (2024-05-28T15:40:39Z)
- Distributed Sketching for Randomized Optimization: Exact Characterization, Concentration and Lower Bounds [54.51566432934556]
We consider distributed optimization methods for problems where forming the Hessian is computationally challenging.
We leverage randomized sketches for reducing the problem dimensions as well as preserving privacy and improving straggler resilience in asynchronous distributed systems.
arXiv Detail & Related papers (2022-03-18T05:49:13Z)
- On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds for RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z)
- High-dimensional regression with potential prior information on variable importance [0.0]
We propose a simple scheme that fits a sequence of models indicated by the prior ordering of the variables.
We show that when ridge regression is used, the computational cost of fitting all models is no more than that of a single ridge fit.
We describe a strategy for Lasso regression that makes use of previous fits to greatly speed up fitting the entire sequence of models.
arXiv Detail & Related papers (2021-09-23T10:34:37Z)
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central to preventing overfitting in practice.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
- Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator [93.05919133288161]
We show that the variance of the straight-through variant of the popular Gumbel-Softmax estimator can be reduced through Rao-Blackwellization.
This provably reduces the mean squared error.
We empirically demonstrate that this leads to variance reduction, faster convergence, and generally improved performance in two unsupervised latent variable models.
arXiv Detail & Related papers (2020-10-09T22:54:38Z)
- Gaussian MRF Covariance Modeling for Efficient Black-Box Adversarial Attacks [86.88061841975482]
We study the problem of generating adversarial examples in a black-box setting, where we only have access to a zeroth order oracle.
We use this setting to find fast one-step adversarial attacks, akin to a black-box version of the Fast Gradient Sign Method (FGSM).
We show that the method uses fewer queries and achieves higher attack success rates than the current state of the art.
arXiv Detail & Related papers (2020-10-08T18:36:51Z)
- Variance Regularization for Accelerating Stochastic Optimization [14.545770519120898]
We propose a universal principle which reduces the accumulation of random error by exploiting statistical information hidden in mini-batch gradients.
This is achieved by regularizing the learning rate according to the mini-batch variance (a toy illustration of this idea follows this entry).
arXiv Detail & Related papers (2020-08-13T15:34:01Z)
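The entry above gives only the high-level idea. As a purely illustrative sketch, the snippet below damps the SGD step size when the per-example gradients within a mini-batch disagree (high variance); the specific damping rule and the function name are guesses made here for illustration, not the method proposed in that paper.

```python
import numpy as np

def sgd_variance_damped(grad_fn, x0, data, base_lr=0.1, batch_size=32,
                        epochs=20, eps=1e-12):
    """SGD whose step size shrinks when the mini-batch gradient variance is
    large relative to the squared mean gradient (hypothetical damping rule)."""
    rng = np.random.default_rng(0)
    x = x0.astype(float).copy()
    n = len(data)
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), max(n // batch_size, 1)):
            per_example = np.stack([grad_fn(x, data[i]) for i in idx])  # (batch, dim)
            g = per_example.mean(axis=0)
            noise = per_example.var(axis=0).sum()         # total gradient variance
            lr = base_lr / (1.0 + noise / (g @ g + eps))  # damp noisy steps
            x -= lr * g
    return x

# Toy usage: 1-D least squares; the estimate drifts toward the true slope 3.0.
rng = np.random.default_rng(1)
data = [(a, 3.0 * a + 0.1 * rng.standard_normal()) for a in rng.standard_normal(256)]
grad = lambda w, d: np.array([2.0 * (w[0] * d[0] - d[1]) * d[0]])
print(sgd_variance_damped(grad, np.zeros(1), data))
```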
This list is automatically generated from the titles and abstracts of the papers on this site.