Backward error analysis and the qualitative behaviour of stochastic
optimization algorithms: Application to stochastic coordinate descent
- URL: http://arxiv.org/abs/2309.02082v1
- Date: Tue, 5 Sep 2023 09:39:56 GMT
- Title: Backward error analysis and the qualitative behaviour of stochastic
optimization algorithms: Application to stochastic coordinate descent
- Authors: Stefano Di Giovacchino, Desmond J. Higham, Konstantinos Zygalakis
- Abstract summary: We propose a class of stochastic differential equations that approximate the dynamics of general stochastic optimization methods more closely than the original gradient flow.
We study the mean-square stability of the modified equation in the case of stochastic coordinate descent.
- Score: 1.534667887016089
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic optimization methods have been hugely successful in making
large-scale optimization problems feasible when computing the full gradient is
computationally prohibitive. Using the theory of modified equations for
numerical integrators, we propose a class of stochastic differential equations
that approximate the dynamics of general stochastic optimization methods more
closely than the original gradient flow. Analyzing a modified stochastic
differential equation can reveal qualitative insights about the associated
optimization method. Here, we study mean-square stability of the modified
equation in the case of stochastic coordinate descent.
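To make the backward error analysis idea concrete, here is a standard illustration for plain gradient descent with step size $h$ (a textbook sketch of the modified-equation construction, not a derivation taken from this paper):

```latex
% Modified-equation idea for gradient descent x_{k+1} = x_k - h \nabla f(x_k):
% the iterates follow the corrected flow below to higher order in h than they
% follow the original gradient flow \dot{x} = -\nabla f(x).
\begin{equation*}
  \dot{x} = -\nabla\!\left( f(x) + \tfrac{h}{4}\,\lVert \nabla f(x) \rVert^2 \right)
          = -\nabla f(x) - \tfrac{h}{2}\,\nabla^2 f(x)\,\nabla f(x).
\end{equation*}
```

For a stochastic method, the programme described in the abstract augments such a modified flow with a diffusion term of size $\sqrt{h}$, yielding a modified stochastic differential equation whose mean-square stability can then be analysed. The sketch below runs the method studied here, stochastic (randomized) coordinate descent, on a quadratic test problem and monitors the Monte-Carlo average of $\lVert x_k \rVert^2$ as a crude empirical proxy for that mean-square stability question; the objective, dimension, and step size are illustrative assumptions, not the paper's experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative quadratic objective f(x) = 0.5 * x^T A x (an assumption, not from the paper).
d = 10
A = np.diag(np.linspace(1.0, 10.0, d))

def coordinate_descent_path(x0, h, n_steps):
    """Randomized coordinate descent: update one uniformly chosen coordinate per step."""
    x = x0.copy()
    sq_norms = np.empty(n_steps + 1)
    sq_norms[0] = x @ x
    for k in range(n_steps):
        i = rng.integers(d)        # pick a coordinate uniformly at random
        grad_i = A[i] @ x          # i-th partial derivative of f
        x[i] -= h * grad_i         # update only that coordinate
        sq_norms[k + 1] = x @ x
    return sq_norms

# Monte-Carlo average of ||x_k||^2 over independent runs: a rough empirical
# proxy for mean-square stability at this step size.
h, n_steps, n_runs = 0.15, 500, 200
x0 = rng.standard_normal(d)
mean_sq_norm = np.mean(
    [coordinate_descent_path(x0, h, n_steps) for _ in range(n_runs)], axis=0
)
print(f"mean ||x_0||^2 = {mean_sq_norm[0]:.3e}, mean ||x_K||^2 = {mean_sq_norm[-1]:.3e}")
```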
Related papers
- Multi-level Monte-Carlo Gradient Methods for Stochastic Optimization with Biased Oracles [23.648702140754967]
We consider stochastic optimization when one only has access to biased oracles and obtaining estimates with low bias requires high cost.
We show that biased gradient methods can reduce variance in this regime.
We also show that the proposed methods significantly improve the best-known complexities in the literature for conditional stochastic optimization and risk optimization.
arXiv Detail & Related papers (2024-08-20T17:56:16Z)
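As a generic illustration of the multi-level Monte-Carlo idea in the entry above, the sketch below telescopes a fine-level expectation into coarse-level differences and estimates each term independently, spending most samples on cheap, strongly biased levels. The scalar toy oracle, its bias decay, and the sample allocation are illustrative assumptions, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

TRUE_GRAD = 1.0  # ground truth for this toy scalar example (an assumption)

def coupled_oracles(level, n):
    """Hypothetical coupled pair of biased oracles: the level-l and level-(l-1)
    estimates share the same noise draw, and the bias decays like 2**-level."""
    noise = rng.standard_normal(n)
    fine = TRUE_GRAD + 2.0 ** (-level) + noise
    coarse = TRUE_GRAD + 2.0 ** (-(level - 1)) + noise if level > 0 else np.zeros(n)
    return fine, coarse

def mlmc_gradient(max_level, samples_per_level):
    """Telescoping MLMC estimator: E[g_L] = E[g_0] + sum_l E[g_l - g_{l-1}]."""
    estimate = 0.0
    for level in range(max_level + 1):
        fine, coarse = coupled_oracles(level, samples_per_level[level])
        estimate += np.mean(fine - coarse)
    return estimate

# Many cheap coarse samples, few expensive fine ones: the usual MLMC allocation.
print(mlmc_gradient(max_level=4, samples_per_level=[4096, 2048, 1024, 512, 256]))
```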
- On the Stochastic (Variance-Reduced) Proximal Gradient Method for Regularized Expected Reward Optimization [10.36447258513813]
We consider a regularized expected reward optimization problem in the non-oblivious setting that covers many existing problems in reinforcement learning (RL).
In particular, the method has been shown to admit an $O(\epsilon^{-4})$ sample complexity for reaching an $\epsilon$-stationary point under standard conditions.
Our analysis shows that the sample complexity can be improved from $O(\epsilon^{-4})$ to $O(\epsilon^{-3})$ under additional conditions.
arXiv Detail & Related papers (2024-01-23T06:01:29Z)
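For context on the proximal gradient mechanics behind the entry above, here is a generic stochastic proximal gradient update with an $\ell_1$ regularizer, whose proximal operator is soft-thresholding. The regularizer, step size, and toy gradient are illustrative choices, not the paper's estimator or its variance-reduction scheme.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def stochastic_prox_gradient_step(theta, stochastic_grad, step, reg):
    """Generic update: theta <- prox_{step * reg * ||.||_1}(theta - step * g)."""
    return soft_threshold(theta - step * stochastic_grad, step * reg)

# Toy usage with a noisy gradient of 0.5 * ||theta||^2 (illustrative only).
rng = np.random.default_rng(2)
theta = rng.standard_normal(5)
g = theta + 0.1 * rng.standard_normal(5)
theta = stochastic_prox_gradient_step(theta, g, step=0.1, reg=0.05)
print(theta)
```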
- Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- stochastic gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner, and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z)
- Last-Iterate Convergence of Saddle-Point Optimizers via High-Resolution Differential Equations [83.3201889218775]
Several widely-used first-order saddle-point optimization methods yield an identical continuous-time ordinary differential equation (ODE) when derived naively.
However, the convergence properties of these methods are qualitatively different, even on simple bilinear games.
We adopt a framework studied in fluid dynamics to design differential equation models for several saddle-point optimization methods.
arXiv Detail & Related papers (2021-12-27T18:31:34Z)
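To illustrate the "identical naive ODE" point in the entry above with a standard observation (background, not the paper's high-resolution models): for $\min_x \max_y f(x, y)$, gradient descent-ascent, extragradient, and optimistic gradient all reduce to the same limiting ODE as the step size $h \to 0$; their differences only survive in the $O(h)$ correction terms that high-resolution differential equations retain.

```latex
% Shared naive continuous-time limit of GDA, extragradient and optimistic
% gradient on min_x max_y f(x, y) as the step size h -> 0:
\begin{equation*}
  \dot{x} = -\nabla_x f(x, y), \qquad \dot{y} = \nabla_y f(x, y).
\end{equation*}
% High-resolution models keep the method-dependent O(h) corrections, which is
% what distinguishes the convergence behaviour of the three methods.
```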
- Amortized Implicit Differentiation for Stochastic Bilevel Optimization [53.12363770169761]
We study a class of algorithms for solving bilevel optimization problems in both deterministic and stochastic settings.
We exploit a warm-start strategy to amortize the estimation of the exact gradient.
By using this framework, our analysis shows that these algorithms match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z)
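For background on the implicit differentiation that the entry above amortizes, the standard bilevel hypergradient is given below; stochastic bilevel methods approximate the Hessian-inverse-vector product with noisy estimates, and the warm start amortizes the inner solve. This is the textbook formula, not necessarily the paper's exact estimator.

```latex
% Bilevel problem: min_x F(x) = f(x, y^*(x)) with y^*(x) = argmin_y g(x, y).
% Implicit differentiation of the inner optimality condition gives the hypergradient:
\begin{equation*}
  \nabla F(x) = \nabla_x f\bigl(x, y^*(x)\bigr)
    - \nabla^2_{xy} g\bigl(x, y^*(x)\bigr)
      \bigl[\nabla^2_{yy} g\bigl(x, y^*(x)\bigr)\bigr]^{-1}
      \nabla_y f\bigl(x, y^*(x)\bigr).
\end{equation*}
```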
- A Stochastic Newton Algorithm for Distributed Convex Optimization [62.20732134991661]
We analyze a stochastic Newton algorithm for homogeneous distributed convex optimization, where each machine can calculate stochastic gradients of the same population objective.
We show that our method can reduce the number, and frequency, of required communication rounds compared to existing methods without hurting performance.
arXiv Detail & Related papers (2021-10-07T17:51:10Z)
- Adaptive Sampling Quasi-Newton Methods for Zeroth-Order Stochastic Optimization [1.7513645771137178]
We consider unconstrained stochastic optimization problems with no available gradient information.
We propose an adaptive sampling quasi-Newton method where we estimate the gradients of a stochastic simulation using finite differences within a common random number framework.
We develop modified versions of a norm test and an inner product quasi-Newton test to control the sample sizes used in the approximations and provide global convergence results to the neighborhood of the optimal solution.
arXiv Detail & Related papers (2021-09-24T21:49:25Z)
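A minimal sketch of the gradient estimator described in the entry above: forward finite differences of a noisy simulation evaluated under common random numbers, i.e. the same random seed is reused for every evaluation so that the noise largely cancels in the differences. The toy simulator, seed handling, and difference parameter are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def noisy_simulation(x, seed):
    """Hypothetical stochastic simulation F(x; xi): a quadratic plus seeded noise."""
    rng = np.random.default_rng(seed)
    return 0.5 * float(x @ x) + 0.1 * rng.standard_normal()

def fd_gradient_crn(x, seed, nu=1e-3):
    """Forward-difference gradient estimate under common random numbers:
    reusing the seed makes the noise identical in both evaluations, so it
    cancels and the estimate is accurate up to O(nu) truncation error."""
    base = noisy_simulation(x, seed)
    grad = np.empty_like(x)
    for i in range(x.size):
        x_pert = x.copy()
        x_pert[i] += nu
        grad[i] = (noisy_simulation(x_pert, seed) - base) / nu
    return grad

x = np.array([1.0, -2.0, 0.5])
print(fd_gradient_crn(x, seed=42))  # close to the true gradient, which is x itself
```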
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that, with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
- Robust, Accurate Stochastic Optimization for Variational Inference [68.83746081733464]
We show that common stochastic optimization methods lead to poor variational approximations if the problem is moderately large.
Motivated by these findings, we develop a more robust and accurate optimization framework by viewing the underlying algorithm as producing a Markov chain.
arXiv Detail & Related papers (2020-09-01T19:12:11Z)
- Adaptive First- and Zeroth-order Methods for Weakly Convex Stochastic Optimization Problems [12.010310883787911]
We analyze a new family of adaptive subgradient methods for solving an important class of weakly convex (possibly nonsmooth) optimization problems.
Experimental results indicate that the proposed algorithms empirically outperform zeroth-order gradient descent and its design variant.
arXiv Detail & Related papers (2020-05-19T07:44:52Z)
- Scalable Gradients for Stochastic Differential Equations [40.70998833051251]
The adjoint sensitivity method scalably computes gradients of solutions of ordinary differential equations.
We generalize this method to stochastic differential equations, allowing time-efficient and constant-memory computation of gradients.
We use our method to fit stochastic dynamics defined by neural networks, achieving competitive performance on a 50-dimensional motion capture dataset.
arXiv Detail & Related papers (2020-01-05T23:05:55Z)
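For background on the adjoint sensitivity method that the last entry generalizes to stochastic differential equations, the classical ODE form is sketched below (standard material; the stochastic generalization in the paper requires additional machinery).

```latex
% Classical ODE adjoint sensitivity method. For dz/dt = f(z, t, \theta), a scalar
% loss L(z(t_1)), and the adjoint state a(t) = \partial L / \partial z(t):
\begin{align*}
  \frac{\mathrm{d}a(t)}{\mathrm{d}t}
    &= -\,a(t)^{\top}\,\frac{\partial f(z(t), t, \theta)}{\partial z}, \\
  \frac{\mathrm{d}L}{\mathrm{d}\theta}
    &= \int_{t_0}^{t_1} a(t)^{\top}\,
       \frac{\partial f(z(t), t, \theta)}{\partial \theta}\,\mathrm{d}t.
\end{align*}
% Both are solved backwards in time alongside z(t), giving gradients with
% memory cost independent of the number of solver steps.
```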