SPAN: A Stochastic Projected Approximate Newton Method
- URL: http://arxiv.org/abs/2002.03687v2
- Date: Tue, 3 Mar 2020 02:01:33 GMT
- Title: SPAN: A Stochastic Projected Approximate Newton Method
- Authors: Xunpeng Huang, Xianfeng Liang, Zhengyang Liu, Yitan Li, Linyun Yu, Yue
Yu, Lei Li
- Abstract summary: We propose SPAN, a novel approximate and fast Newton method to compute the inverse of the Hessian matrix.
SPAN outperforms existing first-order and second-order optimization methods in terms of the convergence wall-clock time.
- Score: 17.94221425332409
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Second-order optimization methods have desirable convergence properties.
However, the exact Newton method requires expensive computation for the Hessian
and its inverse. In this paper, we propose SPAN, a novel approximate and fast
Newton method. SPAN computes the inverse of the Hessian matrix via low-rank
approximation and stochastic Hessian-vector products. Our experiments on
multiple benchmark datasets demonstrate that SPAN outperforms existing
first-order and second-order optimization methods in terms of the convergence
wall-clock time. Furthermore, we provide a theoretical analysis of the
per-iteration complexity, the approximation error, and the convergence rate.
Both the theoretical analysis and experimental results show that our proposed
method achieves a better trade-off between the convergence rate and the
per-iteration efficiency.
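The two ingredients named in the abstract, stochastic Hessian-vector products and a low-rank approximation used to apply an approximate inverse Hessian to the gradient, can be illustrated with a minimal sketch. The logistic-regression oracles, the randomized range finder, and the damping parameter lam below are illustrative assumptions, not SPAN's exact construction.

```python
# Minimal sketch of an approximate Newton step built from stochastic
# Hessian-vector products (HVPs) and a rank-k eigen-approximation.
# The randomized range finder and the damping term "lam" are
# illustrative choices, not the paper's exact algorithm.
import numpy as np

def logistic_loss_grad(w, X, y):
    # Gradient of the binary logistic loss on a data batch.
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)

def stochastic_hvp(w, V, X, y):
    # Hessian-vector products H @ V estimated on a minibatch:
    # for logistic loss, H = X^T diag(p(1-p)) X / n.
    p = 1.0 / (1.0 + np.exp(-X @ w))
    d = p * (1.0 - p)
    return X.T @ (d[:, None] * (X @ V)) / len(y)

def low_rank_newton_step(w, X, y, k=10, lam=1e-3, batch=256, seed=None):
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(y), size=min(batch, len(y)), replace=False)
    Xb, yb = X[idx], y[idx]

    # Randomized range finder: probe the Hessian with k Gaussian vectors.
    Omega = rng.standard_normal((w.size, k))
    Y = stochastic_hvp(w, Omega, Xb, yb)
    Q, _ = np.linalg.qr(Y)                  # orthonormal basis of the range

    # Project the Hessian onto the subspace and eigendecompose the small matrix.
    B = Q.T @ stochastic_hvp(w, Q, Xb, yb)  # k x k projected Hessian
    evals, U = np.linalg.eigh(0.5 * (B + B.T))
    V = Q @ U                               # approximate top-k eigenvectors

    # Apply the damped low-rank inverse to the full gradient:
    # H^{-1} g ~= V diag(1/(evals+lam)) V^T g + (g - V V^T g) / lam
    g = logistic_loss_grad(w, X, y)
    coef = (V.T @ g) / (evals + lam)
    step = V @ coef + (g - V @ (V.T @ g)) / lam
    return w - step
```

In a full implementation, the rank k, the HVP batch size, and the damping would be chosen according to the trade-off between approximation error and per-iteration cost that the paper analyzes.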
Related papers
- Incremental Quasi-Newton Methods with Faster Superlinear Convergence
Rates [50.36933471975506]
We consider the finite-sum optimization problem, where each component function is strongly convex and has Lipschitz continuous gradient and Hessian.
The recently proposed incremental quasi-Newton method is based on the BFGS update and achieves a local superlinear convergence rate.
This paper proposes a more efficient quasi-Newton method by incorporating the symmetric rank-1 update into the incremental framework.
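For reference, the symmetric rank-1 (SR1) update mentioned here is, in its textbook form (the paper's incremental variant is not reproduced): with step $s_t = x_{t+1} - x_t$ and gradient difference $y_t = \nabla f(x_{t+1}) - \nabla f(x_t)$, the Hessian approximation is updated as $B_{t+1} = B_t + \frac{(y_t - B_t s_t)(y_t - B_t s_t)^\top}{(y_t - B_t s_t)^\top s_t}$, which is well defined whenever the denominator is nonzero.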
arXiv Detail & Related papers (2024-02-04T05:54:51Z)
- Fast Nonlinear Two-Time-Scale Stochastic Approximation: Achieving $O(1/k)$ Finite-Sample Complexity [2.5382095320488665]
This paper develops a new variant of two-time-scale stochastic approximation to find the roots of two coupled nonlinear operators.
Our key idea is to leverage the classic Ruppert-Polyak averaging technique to dynamically estimate the operators through their samples.
These averaged estimates are then used in the two-time-scale updates to find the desired solution.
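A minimal sketch of this idea follows: noisy operator samples are smoothed by Ruppert-Polyak-style averaging before being plugged into the two-time-scale updates. The step sizes, averaging weights, and the toy operators in the usage example are illustrative assumptions only, not the paper's exact scheme.

```python
# Two-time-scale stochastic approximation with averaged operator estimates.
import numpy as np

def averaged_two_time_scale(F_sample, G_sample, x0, y0, iters=5000, seed=0):
    rng = np.random.default_rng(seed)
    x, y = float(x0), float(y0)
    f_bar, g_bar = 0.0, 0.0            # running averages of operator samples
    for k in range(1, iters + 1):
        alpha = 1.0 / k ** (2 / 3)     # larger step: x is the "fast" iterate
        beta = 1.0 / k                 # smaller step: y is the "slow" iterate
        lam = 1.0 / k                  # averaging weight (lam = 1 at k = 1)
        # Average the noisy samples instead of using them directly.
        f_bar = (1 - lam) * f_bar + lam * F_sample(x, y, rng)
        g_bar = (1 - lam) * g_bar + lam * G_sample(x, y, rng)
        x -= alpha * f_bar
        y -= beta * g_bar
    return x, y

# Toy usage: the coupled operators F(x, y) = x - y and G(x, y) = y - 1,
# observed under Gaussian noise, have their joint root at x = y = 1.
F = lambda x, y, rng: (x - y) + 0.1 * rng.standard_normal()
G = lambda x, y, rng: (y - 1.0) + 0.1 * rng.standard_normal()
print(averaged_two_time_scale(F, G, x0=0.0, y0=0.0))
```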
arXiv Detail & Related papers (2024-01-23T13:44:15Z)
- Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) methods and adaptive regularization using cubics (ARC) have proven to have very appealing theoretical properties.
We show that TR and ARC methods can accommodate simultaneously inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z)
- Robust empirical risk minimization via Newton's method [9.797319790710711]
A new variant of Newton's method for empirical risk minimization is studied.
The gradient and Hessian of the objective function are replaced by robust estimators.
An algorithm for obtaining robust Newton directions based on the conjugate gradient method is also proposed.
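The Newton direction described here can be illustrated with a short sketch: a robustified gradient estimate is paired with a matrix-free conjugate gradient solve against a Hessian-vector product oracle. The median-of-means estimator and the function names below are illustrative assumptions, not necessarily the paper's exact estimators.

```python
# Newton-type direction via conjugate gradient (CG), accessing the Hessian
# only through a vector-product oracle; the median-of-means gradient is one
# common robust estimator, used here purely as an illustrative stand-in.
import numpy as np

def median_of_means(per_sample_grads, blocks=5):
    # Average per-sample gradients within blocks, then take the
    # coordinate-wise median of the block means (robust to outliers).
    splits = np.array_split(per_sample_grads, blocks)
    return np.median([s.mean(axis=0) for s in splits], axis=0)

def cg_solve(hvp, b, tol=1e-8, max_iter=100):
    # Solve H x = b via conjugate gradient; assumes the (regularized)
    # Hessian operator hvp is symmetric positive definite.
    x = np.zeros_like(b)
    r = b.copy()                 # residual of the initial guess x = 0
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def robust_newton_step(w, per_sample_grads, hvp, lr=1.0):
    g = median_of_means(per_sample_grads)   # robust gradient estimate
    d = cg_solve(hvp, g)                    # approximate Newton direction H^{-1} g
    return w - lr * d
```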
arXiv Detail & Related papers (2023-01-30T18:54:54Z)
- A Unified Convergence Theorem for Stochastic Optimization Methods [4.94128206910124]
We provide a fundamental unified convergence theorem that can be used to derive convergence results for a series of stochastic optimization methods.
As a direct application, we recover almost sure convergence results under general settings.
arXiv Detail & Related papers (2022-06-08T14:01:42Z)
- Stochastic Variance-Reduced Newton: Accelerating Finite-Sum Minimization with Large Batches [22.925108493465363]
We propose Stochastic Variance-Reduced Newton (SVRN), a finite-sum minimization algorithm which enjoys all the benefits of second-order methods.
We show that SVRN can accelerate many second-order methods (such as Subsampled Newton) as well as iterative least squares solvers.
arXiv Detail & Related papers (2022-06-06T16:00:18Z)
- A Stochastic Newton Algorithm for Distributed Convex Optimization [62.20732134991661]
We analyze a Newton algorithm for homogeneous distributed convex optimization, where each machine can calculate gradients of the same population objective.
We show that our method can reduce the number and frequency of required communication rounds compared to existing methods, without hurting performance.
arXiv Detail & Related papers (2021-10-07T17:51:10Z)
- Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study stochastic optimization of the area under the precision-recall curve (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with an improved iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same $O(1/\epsilon^4)$ complexity, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z)
- Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a powerful tool for many machine learning problems.
We propose stocBiO, a novel algorithm featuring a sample-efficient hypergradient estimator.
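The hypergradient estimator is not spelled out in this summary; as a generic illustration, the sketch below approximates the inverse of the inner-problem Hessian with a truncated Neumann series, a common construction in stochastic bilevel optimization. The oracle names, step size eta, and truncation level Q are assumptions for illustration and do not reproduce stocBiO's exact sampling scheme.

```python
# Generic stochastic hypergradient sketch for bilevel optimization:
# d f(x, y*(x)) / dx ~= grad_x f - (d^2 g / dx dy) [d^2 g / dy dy]^{-1} grad_y f,
# with the inverse-Hessian-vector product replaced by a truncated Neumann series.
import numpy as np

def hypergradient(grad_x_f, grad_y_f, hvp_yy_g, jvp_xy_g, x, y, eta=0.1, Q=10):
    # grad_x_f(x, y)   -> outer gradient w.r.t. x
    # grad_y_f(x, y)   -> outer gradient w.r.t. y
    # hvp_yy_g(x, y, v) -> (d^2 g / dy dy) @ v, inner Hessian-vector product
    # jvp_xy_g(x, y, v) -> (d^2 g / dx dy) @ v, cross second-derivative product
    v = grad_y_f(x, y)
    # [H_yy]^{-1} v ~= eta * sum_{i=0}^{Q} (I - eta H_yy)^i v
    total, term = v.copy(), v.copy()
    for _ in range(Q):
        term = term - eta * hvp_yy_g(x, y, term)
        total += term
    inv_hvp = eta * total
    return grad_x_f(x, y) - jvp_xy_g(x, y, inv_hvp)
```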
arXiv Detail & Related papers (2020-10-15T18:09:48Z)
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale, single-loop algorithms that require only one data point per step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)