Backward error analysis and the qualitative behaviour of stochastic
optimization algorithms: Application to stochastic coordinate descent
- URL: http://arxiv.org/abs/2309.02082v1
- Date: Tue, 5 Sep 2023 09:39:56 GMT
- Title: Backward error analysis and the qualitative behaviour of stochastic
optimization algorithms: Application to stochastic coordinate descent
- Authors: Stefano Di Giovacchino, Desmond J. Higham, Konstantinos Zygalakis
- Abstract summary: We propose a class of stochastic differential equations that approximate the dynamics of general stochastic optimization methods more closely than the original gradient flow.
We study the mean-square stability of the modified equation in the case of stochastic coordinate descent.
- Score: 1.534667887016089
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic optimization methods have been hugely successful in making
large-scale optimization problems feasible when computing the full gradient is
computationally prohibitive. Using the theory of modified equations for
numerical integrators, we propose a class of stochastic differential equations
that approximate the dynamics of general stochastic optimization methods more
closely than the original gradient flow. Analyzing a modified stochastic
differential equation can reveal qualitative insights about the associated
optimization method. Here, we study mean-square stability of the modified
equation in the case of stochastic coordinate descent.
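To make the backward error analysis idea concrete, here is a standard illustration for plain gradient descent with step size $h$ (a textbook sketch of the modified-equation construction, not a derivation taken from this paper):

```latex
% Modified-equation idea for gradient descent x_{k+1} = x_k - h \nabla f(x_k):
% the iterates follow the corrected flow below to higher order in h than they
% follow the original gradient flow \dot{x} = -\nabla f(x).
\begin{equation*}
  \dot{x} = -\nabla\!\left( f(x) + \tfrac{h}{4}\,\lVert \nabla f(x) \rVert^2 \right)
          = -\nabla f(x) - \tfrac{h}{2}\,\nabla^2 f(x)\,\nabla f(x).
\end{equation*}
```

For a stochastic method, the programme described in the abstract augments such a modified flow with a diffusion term of size $\sqrt{h}$, yielding a modified stochastic differential equation whose mean-square stability can then be analysed. The sketch below runs the method studied here, stochastic (randomized) coordinate descent, on a quadratic test problem and monitors the Monte-Carlo average of $\lVert x_k \rVert^2$ as a crude empirical proxy for that mean-square stability question; the objective, dimension, and step size are illustrative assumptions, not the paper's experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative quadratic objective f(x) = 0.5 * x^T A x (an assumption, not from the paper).
d = 10
A = np.diag(np.linspace(1.0, 10.0, d))

def coordinate_descent_path(x0, h, n_steps):
    """Randomized coordinate descent: update one uniformly chosen coordinate per step."""
    x = x0.copy()
    sq_norms = np.empty(n_steps + 1)
    sq_norms[0] = x @ x
    for k in range(n_steps):
        i = rng.integers(d)        # pick a coordinate uniformly at random
        grad_i = A[i] @ x          # i-th partial derivative of f
        x[i] -= h * grad_i         # update only that coordinate
        sq_norms[k + 1] = x @ x
    return sq_norms

# Monte-Carlo average of ||x_k||^2 over independent runs: a rough empirical
# proxy for mean-square stability at this step size.
h, n_steps, n_runs = 0.15, 500, 200
x0 = rng.standard_normal(d)
mean_sq_norm = np.mean(
    [coordinate_descent_path(x0, h, n_steps) for _ in range(n_runs)], axis=0
)
print(f"mean ||x_0||^2 = {mean_sq_norm[0]:.3e}, mean ||x_K||^2 = {mean_sq_norm[-1]:.3e}")
```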
Related papers
- Multi-level Monte-Carlo Gradient Methods for Stochastic Optimization with Biased Oracles [23.648702140754967]
We consider stochastic optimization when one only has access to biased oracles and obtaining estimates with low bias requires high cost.
We show that biased gradient methods can reduce variance in this regime.
We also show that the proposed methods significantly improve the best-known complexities in the literature for conditional stochastic optimization and risk optimization.
arXiv Detail & Related papers (2024-08-20T17:56:16Z)
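As a generic illustration of the multi-level Monte-Carlo idea in the entry above, the sketch below telescopes a fine-level expectation into coarse-level differences and estimates each term independently, spending most samples on cheap, strongly biased levels. The scalar toy oracle, its bias decay, and the sample allocation are illustrative assumptions, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

TRUE_GRAD = 1.0  # ground truth for this toy scalar example (an assumption)

def coupled_oracles(level, n):
    """Hypothetical coupled pair of biased oracles: the level-l and level-(l-1)
    estimates share the same noise draw, and the bias decays like 2**-level."""
    noise = rng.standard_normal(n)
    fine = TRUE_GRAD + 2.0 ** (-level) + noise
    coarse = TRUE_GRAD + 2.0 ** (-(level - 1)) + noise if level > 0 else np.zeros(n)
    return fine, coarse

def mlmc_gradient(max_level, samples_per_level):
    """Telescoping MLMC estimator: E[g_L] = E[g_0] + sum_l E[g_l - g_{l-1}]."""
    estimate = 0.0
    for level in range(max_level + 1):
        fine, coarse = coupled_oracles(level, samples_per_level[level])
        estimate += np.mean(fine - coarse)
    return estimate

# Many cheap coarse samples, few expensive fine ones: the usual MLMC allocation.
print(mlmc_gradient(max_level=4, samples_per_level=[4096, 2048, 1024, 512, 256]))
```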
- On the Stochastic (Variance-Reduced) Proximal Gradient Method for Regularized Expected Reward Optimization [10.36447258513813]
We consider a regularized expected reward optimization problem in the non-oblivious setting that covers many existing problems in reinforcement learning (RL).
In particular, the method has been shown to admit an $O(\epsilon^{-4})$ sample complexity for reaching an $\epsilon$-stationary point under standard conditions.
Our analysis shows that the sample complexity can be improved from $O(\epsilon^{-4})$ to $O(\epsilon^{-3})$ under additional conditions.
arXiv Detail & Related papers (2024-01-23T06:01:29Z)
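For context on the proximal gradient mechanics behind the entry above, here is a generic stochastic proximal gradient update with an $\ell_1$ regularizer, whose proximal operator is soft-thresholding. The regularizer, step size, and toy gradient are illustrative choices, not the paper's estimator or its variance-reduction scheme.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def stochastic_prox_gradient_step(theta, stochastic_grad, step, reg):
    """Generic update: theta <- prox_{step * reg * ||.||_1}(theta - step * g)."""
    return soft_threshold(theta - step * stochastic_grad, step * reg)

# Toy usage with a noisy gradient of 0.5 * ||theta||^2 (illustrative only).
rng = np.random.default_rng(2)
theta = rng.standard_normal(5)
g = theta + 0.1 * rng.standard_normal(5)
theta = stochastic_prox_gradient_step(theta, g, step=0.1, reg=0.05)
print(theta)
```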
- Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- stochastic gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner, and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z)
- Last-Iterate Convergence of Saddle-Point Optimizers via High-Resolution Differential Equations [83.3201889218775]
Several widely-used first-order saddle-point optimization methods yield an identical continuous-time ordinary differential equation (ODE) when derived naively.
However, the convergence properties of these methods are qualitatively different, even on simple bilinear games.
We adopt a framework studied in fluid dynamics to design differential equation models for several saddle-point optimization methods.
arXiv Detail & Related papers (2021-12-27T18:31:34Z)
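To illustrate the "identical naive ODE" point in the entry above with a standard observation (background, not the paper's high-resolution models): for $\min_x \max_y f(x, y)$, gradient descent-ascent, extragradient, and optimistic gradient all reduce to the same limiting ODE as the step size $h \to 0$; their differences only survive in the $O(h)$ correction terms that high-resolution differential equations retain.

```latex
% Shared naive continuous-time limit of GDA, extragradient and optimistic
% gradient on min_x max_y f(x, y) as the step size h -> 0:
\begin{equation*}
  \dot{x} = -\nabla_x f(x, y), \qquad \dot{y} = \nabla_y f(x, y).
\end{equation*}
% High-resolution models keep the method-dependent O(h) corrections, which is
% what distinguishes the convergence behaviour of the three methods.
```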
- Amortized Implicit Differentiation for Stochastic Bilevel Optimization [53.12363770169761]
We study a class of algorithms for solving bilevel optimization problems in both deterministic and stochastic settings.
We exploit a warm-start strategy to amortize the estimation of the exact gradient.
By using this framework, our analysis shows that these algorithms match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z)
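For background on the implicit differentiation that the entry above amortizes, the standard bilevel hypergradient is given below; stochastic bilevel methods approximate the Hessian-inverse-vector product with noisy estimates, and the warm start amortizes the inner solve. This is the textbook formula, not necessarily the paper's exact estimator.

```latex
% Bilevel problem: min_x F(x) = f(x, y^*(x)) with y^*(x) = argmin_y g(x, y).
% Implicit differentiation of the inner optimality condition gives the hypergradient:
\begin{equation*}
  \nabla F(x) = \nabla_x f\bigl(x, y^*(x)\bigr)
    - \nabla^2_{xy} g\bigl(x, y^*(x)\bigr)
      \bigl[\nabla^2_{yy} g\bigl(x, y^*(x)\bigr)\bigr]^{-1}
      \nabla_y f\bigl(x, y^*(x)\bigr).
\end{equation*}
```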
- A Stochastic Newton Algorithm for Distributed Convex Optimization [62.20732134991661]
We analyze a stochastic Newton algorithm for homogeneous distributed convex optimization, where each machine can calculate stochastic gradients of the same population objective.
We show that our method can reduce the number, and frequency, of required communication rounds compared to existing methods without hurting performance.
arXiv Detail & Related papers (2021-10-07T17:51:10Z)
- Adaptive Sampling Quasi-Newton Methods for Zeroth-Order Stochastic Optimization [1.7513645771137178]
We consider unconstrained stochastic optimization problems with no available gradient information.
We propose an adaptive sampling quasi-Newton method where we estimate the gradients of a stochastic simulation using finite differences within a common random number framework.
We develop modified versions of a norm test and an inner product quasi-Newton test to control the sample sizes used in the approximations and provide global convergence results to the neighborhood of the optimal solution.
arXiv Detail & Related papers (2021-09-24T21:49:25Z)
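A minimal sketch of the gradient estimator described in the entry above: forward finite differences of a noisy simulation evaluated under common random numbers, i.e. the same random seed is reused for every evaluation so that the noise largely cancels in the differences. The toy simulator, seed handling, and difference parameter are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def noisy_simulation(x, seed):
    """Hypothetical stochastic simulation F(x; xi): a quadratic plus seeded noise."""
    rng = np.random.default_rng(seed)
    return 0.5 * float(x @ x) + 0.1 * rng.standard_normal()

def fd_gradient_crn(x, seed, nu=1e-3):
    """Forward-difference gradient estimate under common random numbers:
    reusing the seed makes the noise identical in both evaluations, so it
    cancels and the estimate is accurate up to O(nu) truncation error."""
    base = noisy_simulation(x, seed)
    grad = np.empty_like(x)
    for i in range(x.size):
        x_pert = x.copy()
        x_pert[i] += nu
        grad[i] = (noisy_simulation(x_pert, seed) - base) / nu
    return grad

x = np.array([1.0, -2.0, 0.5])
print(fd_gradient_crn(x, seed=42))  # close to the true gradient, which is x itself
```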
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that, with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
- Robust, Accurate Stochastic Optimization for Variational Inference [68.83746081733464]
We show that common stochastic optimization methods lead to poor variational approximations if the problem is moderately large.
Motivated by these findings, we develop a more robust and accurate optimization framework by viewing the underlying algorithm as producing a Markov chain.
arXiv Detail & Related papers (2020-09-01T19:12:11Z)
- Adaptive First- and Zeroth-order Methods for Weakly Convex Stochastic Optimization Problems [12.010310883787911]
We analyze a new family of adaptive subgradient methods for solving an important class of weakly convex (possibly nonsmooth) optimization problems.
Experimental results indicate that the proposed algorithms empirically outperform zeroth-order gradient descent and its design variant.
arXiv Detail & Related papers (2020-05-19T07:44:52Z)
- Scalable Gradients for Stochastic Differential Equations [40.70998833051251]
The adjoint sensitivity method scalably computes gradients of solutions of ordinary differential equations.
We generalize this method to stochastic differential equations, allowing time-efficient and constant-memory computation of gradients.
We use our method to fit stochastic dynamics defined by neural networks, achieving competitive performance on a 50-dimensional motion capture dataset.
arXiv Detail & Related papers (2020-01-05T23:05:55Z)
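For background on the adjoint sensitivity method that the last entry generalizes to stochastic differential equations, the classical ODE form is sketched below (standard material; the stochastic generalization in the paper requires additional machinery).

```latex
% Classical ODE adjoint sensitivity method. For dz/dt = f(z, t, \theta), a scalar
% loss L(z(t_1)), and the adjoint state a(t) = \partial L / \partial z(t):
\begin{align*}
  \frac{\mathrm{d}a(t)}{\mathrm{d}t}
    &= -\,a(t)^{\top}\,\frac{\partial f(z(t), t, \theta)}{\partial z}, \\
  \frac{\mathrm{d}L}{\mathrm{d}\theta}
    &= \int_{t_0}^{t_1} a(t)^{\top}\,
       \frac{\partial f(z(t), t, \theta)}{\partial \theta}\,\mathrm{d}t.
\end{align*}
% Both are solved backwards in time alongside z(t), giving gradients with
% memory cost independent of the number of solver steps.
```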