Unified Convergence Theory of Stochastic and Variance-Reduced Cubic
Newton Methods
- URL: http://arxiv.org/abs/2302.11962v2
- Date: Wed, 6 Sep 2023 14:38:35 GMT
- Title: Unified Convergence Theory of Stochastic and Variance-Reduced Cubic
Newton Methods
- Authors: El Mahdi Chayti and Nikita Doikov and Martin Jaggi
- Abstract summary: We propose a new framework, which we call the helper framework.
It provides a unified view of stochastic and variance-reduced second-order algorithms equipped with global complexity guarantees.
- Score: 41.76752121302988
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study stochastic Cubic Newton methods for solving general, possibly
non-convex minimization problems. We propose a new framework, which we call the
helper framework, that provides a unified view of the stochastic and
variance-reduced second-order algorithms equipped with global complexity
guarantees. It can also be applied to learning with auxiliary information. Our
helper framework offers the algorithm designer high flexibility for
constructing and analyzing stochastic Cubic Newton methods: it allows arbitrary
batch sizes and the use of noisy, possibly biased estimates of the gradients and
Hessians, and it incorporates both variance reduction and lazy Hessian updates.
We recover the best-known complexities for stochastic and variance-reduced Cubic
Newton methods under weak assumptions on the noise. A
direct consequence of our theory is a new lazy stochastic second-order method,
which significantly improves the arithmetic complexity for high-dimensional
problems. We also establish complexity bounds for the classes of
gradient-dominated objectives, which include convex and strongly convex
problems. For Auxiliary Learning, we show that using a helper (auxiliary
function) can outperform training alone if a given similarity measure is small.
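The core update behind these results is the cubic-regularized Newton step built from stochastic gradient and Hessian estimates. The following is a minimal, self-contained sketch of one such step; the oracle names, the simple gradient-descent subproblem solver, and the toy objective are illustrative assumptions, not the paper's helper framework.

```python
import numpy as np

def solve_cubic_subproblem(g, H, M, n_iters=100, lr=0.1):
    """Approximately minimize the cubic model
        m(s) = g^T s + 0.5 * s^T H s + (M / 6) * ||s||^3
    by plain gradient descent (an illustrative solver, not the one from the paper)."""
    s = np.zeros_like(g)
    for _ in range(n_iters):
        model_grad = g + H @ s + 0.5 * M * np.linalg.norm(s) * s
        s -= lr * model_grad
    return s

def stochastic_cubic_newton_step(x, grad_oracle, hess_oracle, M):
    """One step x <- x + argmin_s m(s), where the model is built from (possibly
    noisy, possibly biased) gradient and Hessian estimates; with lazy Hessian
    updates, `hess_oracle` could simply return a previously computed Hessian."""
    g = grad_oracle(x)   # stochastic gradient estimate
    H = hess_oracle(x)   # stochastic (or lazily reused) Hessian estimate
    return x + solve_cubic_subproblem(g, H, M)

if __name__ == "__main__":
    # Toy check on f(x) = 0.5 * ||x||^2 with exact oracles: iterates shrink toward 0.
    x = np.ones(5)
    for _ in range(10):
        x = stochastic_cubic_newton_step(x, lambda z: z, lambda z: np.eye(5), M=1.0)
    print(np.linalg.norm(x))
```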
Related papers
- An Adaptive Second-order Method for a Class of Nonconvex Nonsmooth Composite Optimization [11.56128809794923]
This paper explores a specific type of non-convex, non-smooth sparsity-promoting regularization problem, namely those involving the $\ell_p$ quasi-norm, and analyzes the local convergence properties of the proposed iterations.
arXiv Detail & Related papers (2024-07-24T12:15:59Z) - Optimal Algorithms for Stochastic Complementary Composite Minimization [55.26935605535377]
Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization.
We provide novel excess risk bounds, both in expectation and with high probability.
Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems.
arXiv Detail & Related papers (2022-11-03T12:40:24Z) - Stochastic Variance-Reduced Newton: Accelerating Finite-Sum Minimization
with Large Batches [22.925108493465363]
We propose a finite-sum minimization algorithm which enjoys all the benefits of second-order methods.
We show that SVRN can accelerate many second-order methods (such as Subsampled Newton) as well as iterative least squares solvers.
arXiv Detail & Related papers (2022-06-06T16:00:18Z) - A framework for bilevel optimization that enables stochastic and global
variance reduction algorithms [17.12280360174073]
Bilevel optimization is the problem of minimizing a value function that involves the arg-minimum of another function.
We introduce a novel framework, in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time.
We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm to our framework, has an $O(\frac{1}{T})$ convergence rate and achieves linear convergence under the Polyak-Łojasiewicz assumption.
arXiv Detail & Related papers (2022-01-31T18:17:25Z) - Simple Stochastic and Online Gradient Descent Algorithms for Pairwise
Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handling streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning (a minimal sketch of such an update appears after this list).
arXiv Detail & Related papers (2021-11-23T18:10:48Z) - A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z) - Newton-LESS: Sparsification without Trade-offs for the Sketched Newton
Update [88.73437209862891]
In second-order optimization, a potential bottleneck can be computing the Hessian matrix of the optimized function at every iteration.
We show that the Gaussian sketching matrix can be drastically sparsified, significantly reducing the computational cost of sketching.
We prove that Newton-LESS enjoys nearly the same problem-independent local convergence rate as Gaussian embeddings.
arXiv Detail & Related papers (2021-07-15T17:33:05Z) - Stability and Generalization of Stochastic Gradient Methods for Minimax
Problems [71.60601421935844]
Many machine learning problems, such as Generative Adversarial Networks (GANs), can be formulated as minimax problems.
We provide a comprehensive generalization analysis of stochastic gradient methods for training minimax models.
arXiv Detail & Related papers (2021-05-08T22:38:00Z) - Hybrid Variance-Reduced SGD Algorithms For Nonconvex-Concave Minimax
Problems [26.24895953952318]
We develop an algorithm to solve a class of nonconvex-concave minimax problems.
The algorithm can work with either single or double mini-batch estimates of the derivatives.
arXiv Detail & Related papers (2020-06-27T03:05:18Z)
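To illustrate the pairwise-learning entry above, here is a minimal sketch of an online gradient descent update whose loss depends on a pair of instances. The logistic pairwise loss, the linear model, and the one-example buffer are illustrative assumptions, not the algorithm of that paper.

```python
import numpy as np

def pairwise_loss_grad(w, x_new, y_new, x_old, y_old):
    """Gradient w.r.t. w of log(1 + exp(-sign * w^T (x_new - x_old))),
    where sign = +1 if y_new > y_old and -1 if y_new < y_old."""
    if y_new == y_old:
        return np.zeros_like(w)
    sign = 1.0 if y_new > y_old else -1.0
    diff = x_new - x_old
    margin = sign * (w @ diff)
    return -sign * diff / (1.0 + np.exp(margin))

def online_pairwise_gd(stream, dim, lr=0.1):
    """Process a stream of (x, y) examples, pairing each new point with the
    previous one (a one-example buffer) and taking a gradient step."""
    w = np.zeros(dim)
    prev = None
    for x, y in stream:
        if prev is not None:
            w -= lr * pairwise_loss_grad(w, x, y, *prev)
        prev = (x, y)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = [(rng.normal(size=3), int(rng.random() < 0.5)) for _ in range(200)]
    print(online_pairwise_gd(iter(data), dim=3))
```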
This list is automatically generated from the titles and abstracts of the papers in this site.