Related papers: A Homogenization Approach for Gradient-Dominated Stochastic Optimization

A Homogenization Approach for Gradient-Dominated Stochastic Optimization

URL: http://arxiv.org/abs/2308.10630v3
Date: Wed, 29 May 2024 07:22:58 GMT
Title: A Homogenization Approach for Gradient-Dominated Stochastic Optimization
Authors: Jiyuan Tan, Chenyu Xue, Chuwen Zhang, Qi Deng, Dongdong Ge, Yinyu Ye,
Abstract summary: We propose a homogeneous second-order descent method (SHSOD) for functions enjoying gradient dominance. Our findings show that SHSODM matches the best-known sample complexity achieved by other second-order methods for gradient-dominated optimization.
Score: 6.1144486886258065
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Gradient dominance property is a condition weaker than strong convexity, yet sufficiently ensures global convergence even in non-convex optimization. This property finds wide applications in machine learning, reinforcement learning (RL), and operations management. In this paper, we propose the stochastic homogeneous second-order descent method (SHSODM) for stochastic functions enjoying gradient dominance property based on a recently proposed homogenization approach. Theoretically, we provide its sample complexity analysis, and further present an enhanced result by incorporating variance reduction techniques. Our findings show that SHSODM matches the best-known sample complexity achieved by other second-order methods for gradient-dominated stochastic optimization but without cubic regularization. Empirically, since the homogenization approach only relies on solving extremal eigenvector problem at each iteration instead of Newton-type system, our methods gain the advantage of cheaper computational cost and robustness in ill-conditioned problems. Numerical experiments on several RL tasks demonstrate the better performance of SHSODM compared to other off-the-shelf methods.

Related papers

Stochastic Optimization with Optimal Importance Sampling [49.484190237840714]
We propose an iterative-based algorithm that jointly updates the decision and the IS distribution without requiring time-scale separation between the two. Our method achieves the lowest possible variable variance and guarantees global convergence under convexity of the objective and mild assumptions on the IS distribution family.
arXiv Detail & Related papers (2025-04-04T16:10:18Z)
Diffusion Stochastic Optimization for Min-Max Problems [33.73046548872663]
The optimistic gradient method is useful in addressing minimax optimization problems. Motivated by the observation that the conventional version suffers from the need for a large batch size, we introduce and analyze a new formulation termed Samevareps-generativeOGOG.
arXiv Detail & Related papers (2024-01-26T01:16:59Z)
Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics have proven to have some very appealing theoretical properties. We show that TR and ARC methods can simultaneously provide inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z)
Federated Conditional Stochastic Optimization [110.513884892319]
Conditional optimization has found in a wide range of machine learning tasks, such as in-variant learning tasks, AUPRC, andAML. This paper proposes algorithms for distributed federated learning.
arXiv Detail & Related papers (2023-10-04T01:47:37Z)
An Optimization-based Deep Equilibrium Model for Hyperspectral Image Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem. A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network. The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
arXiv Detail & Related papers (2023-06-10T08:25:16Z)
Stochastic Second-Order Methods Provably Beat SGD For Gradient-Dominated Functions [42.57892322514602]
We show that SCRN improves the best-known sample complexity of gradient descent by a factor of $mathcalO(epsilon-1/2)$. We also show that the sample complexity of SCRN can be improved by a factor of $mathcalO(epsilon-1/2)$ using a variance reduction method with time-varying batch sizes.
arXiv Detail & Related papers (2022-05-25T15:33:00Z)
Last-Iterate Convergence of Saddle-Point Optimizers via High-Resolution Differential Equations [83.3201889218775]
Several widely-used first-order saddle-point optimization methods yield an identical continuous-time ordinary differential equation (ODE) when derived naively. However, the convergence properties of these methods are qualitatively different, even on simple bilinear games. We adopt a framework studied in fluid dynamics to design differential equation models for several saddle-point optimization methods.
arXiv Detail & Related papers (2021-12-27T18:31:34Z)
Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks. We develop novel momentum methods with a better iteration of $O (1/epsilon4)$ for finding an $epsilon$stationary solution. We also design a novel family of adaptive methods with the same complexity of $O (1/epsilon4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z)
Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work is on the iteration of zero-th-order (ZO) optimization which does not require first-order information. We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity as well as as function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
The Strength of Nesterov's Extrapolation in the Individual Convergence of Nonsmooth Optimization [0.0]
We prove that Nesterov's extrapolation has the strength to make the individual convergence of gradient descent methods optimal for nonsmooth problems. We give an extension of the derived algorithms to solve regularized learning tasks with nonsmooth losses in settings. Our method is applicable as an efficient tool for solving large-scale $l$1-regularized hinge-loss learning problems.
arXiv Detail & Related papers (2020-06-08T03:35:41Z)
Distributed Stochastic Nonconvex Optimization and Learning based on Successive Convex Approximation [26.11677569331688]
We introduce a novel framework for the distributed algorithmic minimization of the sum of the sum of the agents in a network. We show that the proposed method can be applied to distributed neural networks.
arXiv Detail & Related papers (2020-04-30T15:36:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.