Stochastic Variance-Reduced Newton: Accelerating Finite-Sum Minimization
with Large Batches
- URL: http://arxiv.org/abs/2206.02702v1
- Date: Mon, 6 Jun 2022 16:00:18 GMT
- Title: Stochastic Variance-Reduced Newton: Accelerating Finite-Sum Minimization
with Large Batches
- Authors: Michał Dereziński
- Abstract summary: We propose a finite-sum minimization algorithm which enjoys all the benefits of second-order methods.
We show that SVRN can accelerate many stochastic second-order methods (such as Subsampled Newton) as well as iterative least squares solvers (such as Iterative Hessian Sketch).
- Score: 22.925108493465363
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stochastic variance reduction has proven effective at accelerating
first-order algorithms for solving convex finite-sum optimization tasks such as
empirical risk minimization. Incorporating additional second-order information
has proven helpful in further improving the performance of these first-order
methods. However, comparatively little is known about the benefits of using
variance reduction to accelerate popular stochastic second-order methods such
as Subsampled Newton. To address this, we propose Stochastic Variance-Reduced
Newton (SVRN), a finite-sum minimization algorithm which enjoys all the
benefits of second-order methods: simple unit step size, easily parallelizable
large-batch operations, and fast local convergence, while at the same time
taking advantage of variance reduction to achieve improved convergence rates
(per data pass) for smooth and strongly convex problems. We show that SVRN can
accelerate many stochastic second-order methods (such as Subsampled Newton) as
well as iterative least squares solvers (such as Iterative Hessian Sketch), and
it compares favorably to popular first-order methods with variance reduction.
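To make the description above concrete, here is a minimal sketch of an SVRN-style iteration, assuming SVRG-style variance-reduced gradient estimates combined with a subsampled Hessian and a unit step size; the interfaces (grad_i, hess_i), batch sizes, and snapshot schedule are illustrative assumptions, not the paper's exact algorithm.

```python
# Hedged sketch of a Stochastic Variance-Reduced Newton (SVRN)-style iteration:
# an SVRG-style variance-reduced gradient is combined with a subsampled Hessian
# and a unit step size. Function names, batch sizes, and the snapshot schedule
# are illustrative assumptions, not the paper's exact algorithm.
import numpy as np

def svrn_sketch(grad_i, hess_i, w0, n, outer_iters=10, inner_iters=5,
                grad_batch=1024, hess_batch=1024, seed=0):
    """grad_i(w, idx) / hess_i(w, idx): average gradient / Hessian of the loss
    components indexed by idx, evaluated at w (user-supplied)."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    all_idx = np.arange(n)
    for _ in range(outer_iters):
        w_snap = w.copy()
        full_grad = grad_i(w_snap, all_idx)            # full gradient at the snapshot
        H = hess_i(w_snap, rng.choice(n, hess_batch))  # subsampled Hessian (large batch)
        for _ in range(inner_iters):
            S = rng.choice(n, grad_batch)
            # SVRG-style control variate: subsampled gradient corrected by the snapshot
            g = grad_i(w, S) - grad_i(w_snap, S) + full_grad
            w = w - np.linalg.solve(H, g)              # Newton-type update, unit step size
    return w
```

For example, for a least squares objective with data (A, b), grad_i and hess_i could return A[idx].T @ (A[idx] @ w - b[idx]) / len(idx) and A[idx].T @ A[idx] / len(idx), respectively.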
Related papers
- Improving Stochastic Cubic Newton with Momentum [37.1630298053787]
We show that momentum provably stabilizes the variance of the stochastic estimates.
Using our globalization technique, we establish global convergence guarantees.
We also analyze convex stochastic Newton methods with momentum.
arXiv Detail & Related papers (2024-10-25T15:49:16Z)
- Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics (ARC) have proven to have very appealing theoretical properties.
We show that TR and ARC methods can simultaneously accommodate inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z)
- Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithmic subroutine that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy.
arXiv Detail & Related papers (2023-01-30T15:46:39Z)
- Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence [69.65563161962245]
We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method.
We show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage; a minimal sketch of such an averaging scheme appears after this list.
arXiv Detail & Related papers (2022-04-20T07:14:21Z)
- Accelerated SGD for Non-Strongly-Convex Least Squares [14.010916616909743]
We consider stochastic approximation for the least squares regression problem in the non-strongly convex setting.
We present the first practical algorithm that achieves the optimal prediction error rates in terms of dependence on the noise of the problem.
arXiv Detail & Related papers (2022-03-03T14:39:33Z)
- SCORE: Approximating Curvature Information under Self-Concordant Regularization [0.0]
We propose a generalized Gauss-Newton with Self-Concordant Regularization (GGN-SCORE) algorithm that updates the minimization variables each time it receives a new input batch.
The proposed algorithm exploits the structure of the second-order information in the Hessian matrix, thereby reducing computational overhead.
arXiv Detail & Related papers (2021-12-14T13:03:04Z)
- Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update [88.73437209862891]
In second-order optimization, a potential bottleneck is computing the Hessian matrix of the objective function at every iteration.
We show that the Gaussian sketching matrix can be drastically sparsified, significantly reducing the computational cost of sketching.
We prove that Newton-LESS enjoys nearly the same problem-independent local convergence rate as Gaussian embeddings.
arXiv Detail & Related papers (2021-07-15T17:33:05Z)
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence guarantees.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
- SPAN: A Stochastic Projected Approximate Newton Method [17.94221425332409]
We propose SPAN, a novel fast approximate Newton method that computes an approximation of the inverse Hessian matrix.
SPAN outperforms existing first-order and second-order optimization methods in terms of convergence wall-clock time.
arXiv Detail & Related papers (2020-02-10T12:42:42Z)
- Variance Reduction with Sparse Gradients [82.41780420431205]
Variance reduction methods such as SVRG and SpiderBoost use a mixture of large and small batch gradients.
We introduce a new sparsity operator: the random-top-k operator (a minimal sketch of one such operator appears after this list).
Our algorithm consistently outperforms SpiderBoost on various tasks including image classification, natural language processing, and sparse matrix factorization.
arXiv Detail & Related papers (2020-01-27T08:23:58Z)
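The Hessian-averaging entry above refers to weighted averaging of stochastic Hessian estimates across Newton iterations. The sketch below illustrates the general idea with a plain uniform running average; the paper's universal weighting scheme is not reproduced here, and all interfaces are illustrative assumptions.

```python
# Hedged sketch of Hessian averaging in a stochastic Newton method: stochastic
# (e.g. subsampled) Hessians from past iterations are combined into a running
# weighted average that is used for the Newton step. The uniform weights below
# are purely illustrative; the referenced paper studies which weighting schemes
# yield superlinear local convergence.
import numpy as np

def averaged_newton_sketch(grad, hess_estimate, w0, iters=20):
    """grad(w): exact gradient; hess_estimate(w): stochastic Hessian estimate."""
    w = w0.copy()
    H_avg = None
    for t in range(1, iters + 1):
        H_t = hess_estimate(w)
        # running weighted average of all Hessian estimates seen so far
        H_avg = H_t if H_avg is None else (1 - 1.0 / t) * H_avg + (1.0 / t) * H_t
        w = w - np.linalg.solve(H_avg, grad(w))   # Newton step with the averaged Hessian
    return w
```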
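The sparse-gradients entry above names a random-top-k sparsification operator. The sketch below shows one plausible hybrid of top-magnitude and uniformly random coordinate selection; the budget split (top_fraction) is an assumption, and the paper's exact definition may differ.

```python
# Hedged sketch of a random-top-k style sparsifier: keep part of the budget for
# the largest-magnitude coordinates and fill the rest with uniformly random
# coordinates. One plausible reading of the operator named above; the paper's
# exact definition may differ.
import numpy as np

def random_top_k(g, k, top_fraction=0.5, seed=0):
    """Return a copy of the 1-D array g with at most k nonzero entries."""
    rng = np.random.default_rng(seed)
    k_top = max(1, int(k * top_fraction))              # budget for top-magnitude coords
    top_idx = np.argpartition(np.abs(g), -k_top)[-k_top:]
    rest = np.setdiff1d(np.arange(g.size), top_idx)
    rand_idx = rng.choice(rest, size=k - k_top, replace=False)
    out = np.zeros_like(g)
    keep = np.concatenate([top_idx, rand_idx])
    out[keep] = g[keep]
    return out
```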