Stochastic Gradient Methods with Preconditioned Updates
- URL: http://arxiv.org/abs/2206.00285v2
- Date: Sun, 14 Jan 2024 17:18:32 GMT
- Title: Stochastic Gradient Methods with Preconditioned Updates
- Authors: Abdurakhmon Sadiev, Aleksandr Beznosikov, Abdulla Jasem Almansoori,
Dmitry Kamzolov, Rachael Tappenden, Martin Takáč
- Abstract summary: There are several algorithms for such problems, but existing methods often work poorly when the problem is badly scaled and/or ill-conditioned.
Here we include a preconditioner based on Hutchinson's approach to approximating the diagonal of the Hessian.
We prove linear convergence when both smoothness and the PL condition are assumed.
- Score: 47.23741709751474
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work considers the non-convex finite sum minimization problem. There are
several algorithms for such problems, but existing methods often work poorly
when the problem is badly scaled and/or ill-conditioned, and a primary goal of
this work is to introduce methods that alleviate this issue. Thus, here we
include a preconditioner based on Hutchinson's approach to approximating the
diagonal of the Hessian, and couple it with several gradient-based methods to
give new scaled algorithms: Scaled SARAH and Scaled L-SVRG. Theoretical
complexity guarantees under smoothness assumptions are presented. We prove
linear convergence when both smoothness and the PL condition are assumed. Our
adaptively scaled methods use approximate partial second-order curvature
information and, therefore, can better mitigate the impact of badly scaled
problems. This improved practical performance is demonstrated in the numerical
experiments also presented in this work.
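To illustrate the two ingredients described above, the following is a minimal NumPy sketch, not the authors' implementation: Hutchinson's estimator approximates the Hessian diagonal via diag(H) ≈ E[z ⊙ (Hz)] with Rademacher probes z, and the gradient step is then divided elementwise by a smoothed, clipped version of that estimate. The Hessian-vector-product oracle `hvp`, the smoothing constant `beta`, and the clipping level `alpha` are illustrative assumptions; in the paper the preconditioner is coupled with SARAH or L-SVRG variance-reduced gradient estimators rather than a plain gradient.

```python
import numpy as np

def hutchinson_diag(hvp, dim, num_samples=1, rng=None):
    """Hutchinson estimate of diag(H): average z * (H z) over Rademacher probes z."""
    rng = np.random.default_rng() if rng is None else rng
    est = np.zeros(dim)
    for _ in range(num_samples):
        z = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe vector
        est += z * hvp(z)                      # unbiased estimate of the Hessian diagonal
    return est / num_samples

def scaled_step(w, grad, diag_est, state, lr=0.1, beta=0.999, alpha=1e-3):
    """Preconditioned update: divide the gradient estimate elementwise by a
    smoothed, clipped diagonal estimate (illustrative constants, not the paper's)."""
    state = beta * state + (1.0 - beta) * np.abs(diag_est)  # exponential moving average
    return w - lr * grad / np.maximum(state, alpha), state
```

Replacing `grad` with a SARAH recursive gradient or an L-SVRG estimator gives the kind of scaled variance-reduced update studied in the paper.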
Related papers
- An Optimization-based Deep Equilibrium Model for Hyperspectral Image
Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem.
A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network.
The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
arXiv Detail & Related papers (2023-06-10T08:25:16Z)
- Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z)
- A framework for bilevel optimization that enables stochastic and global
variance reduction algorithms [17.12280360174073]
Bilevel optimization is a problem of minimizing a value function which involves the arg-minimum of another function.
We introduce a novel framework, in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time.
We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm in our framework, has an $O(\frac{1}{T})$ convergence rate, and that it achieves linear convergence under the Polyak-Łojasiewicz assumption.
arXiv Detail & Related papers (2022-01-31T18:17:25Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- A Scalable Second Order Method for Ill-Conditioned Matrix Completion
from Few Samples [0.0]
We propose an iterative algorithm for low-rank matrix completion.
It is able to complete very ill-conditioned matrices with a condition number of up to $10$ from few samples.
arXiv Detail & Related papers (2021-06-03T20:31:00Z)
- AI-SARAH: Adaptive and Implicit Stochastic Recursive Gradient Methods [7.486132958737807]
We present an adaptive variance reduced method with an implicit approach for adaptivity.
We provide convergence guarantees for finite-sum minimization problems and show that faster convergence than SARAH can be achieved if the local geometry permits it.
This algorithm implicitly computes step-size and efficiently estimates local Lipschitz smoothness of functions.
arXiv Detail & Related papers (2021-02-19T01:17:15Z)
- Conditional gradient methods for stochastically constrained convex
minimization [54.53786593679331]
We propose two novel conditional gradient-based methods for solving structured convex optimization problems.
The most important feature of our framework is that only a subset of the constraints is processed at each iteration.
Our algorithms rely on variance reduction and smoothing used in conjunction with conditional gradient steps, and are accompanied by rigorous convergence guarantees.
arXiv Detail & Related papers (2020-07-07T21:26:35Z)
- Adaptive Gradient Methods Converge Faster with Over-Parameterization
(but you should do a line-search) [32.24244211281863]
We study a simplistic setting -- smooth, convex losses with models over-parameterized enough to interpolate the data.
We prove that AMSGrad with constant step-size and momentum converges to the minimizer at a faster $O(1/T)$ rate.
We show that these techniques improve the convergence and generalization of adaptive gradient methods across tasks.
arXiv Detail & Related papers (2020-06-11T21:23:30Z)
- Effective Dimension Adaptive Sketching Methods for Faster Regularized
Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT).
arXiv Detail & Related papers (2020-06-10T15:00:09Z)
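As a concrete illustration of the sketching idea in the last entry, here is a generic sketch-and-solve routine for L2-regularized least squares with a Gaussian embedding; the fixed sketch size `m` and the plain sketch-and-solve strategy are simplifying assumptions and do not reproduce the paper's effective-dimension-adaptive scheme or its SRHT variant.

```python
import numpy as np

def sketched_ridge(A, b, lam, m, rng=None):
    """Approximately solve min_x ||A x - b||^2 + lam * ||x||^2 after compressing
    the n rows of A with an m x n Gaussian embedding (generic sketch-and-solve)."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = A.shape
    S = rng.standard_normal((m, n)) / np.sqrt(m)  # Gaussian embedding
    SA, Sb = S @ A, S @ b                         # sketched data
    return np.linalg.solve(SA.T @ SA + lam * np.eye(d), SA.T @ Sb)
```

As the title of that entry suggests, the paper's refinement is to adapt the sketch to the effective dimension of the regularized problem rather than using a fixed `m` as done here.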