On Convergence of Incremental Gradient for Non-Convex Smooth Functions
- URL: http://arxiv.org/abs/2305.19259v4
- Date: Mon, 12 Feb 2024 12:51:54 GMT
- Title: On Convergence of Incremental Gradient for Non-Convex Smooth Functions
- Authors: Anastasia Koloskova, Nikita Doikov, Sebastian U. Stich, Martin Jaggi
- Abstract summary: In machine learning and network optimization, algorithms like shuffle SGD are popular due to minimizing the number of misses and good cache.
This paper delves into the convergence properties SGD algorithms with arbitrary data ordering.
- Score: 63.51187646914962
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In machine learning and neural network optimization, algorithms like
incremental gradient, and shuffle SGD are popular due to minimizing the number
of cache misses and good practical convergence behavior. However, their
optimization properties in theory, especially for non-convex smooth functions,
remain incompletely explored.
This paper delves into the convergence properties of SGD algorithms with
arbitrary data ordering, within a broad framework for non-convex smooth
functions. Our findings show enhanced convergence guarantees for incremental
gradient and single shuffle SGD. Particularly if $n$ is the training set size,
we improve $n$ times the optimization term of convergence guarantee to reach
accuracy $\varepsilon$ from $O(n / \varepsilon)$ to $O(1 / \varepsilon)$.
Related papers
- Demystifying the Myths and Legends of Nonconvex Convergence of SGD [17.445810977264067]
gradient descent (SGD) and its variants are the main workhorses for solving large-scale optimization problems.
As our analyses, we addressed certain myths and legends related to the non convergence of the gradient.
arXiv Detail & Related papers (2023-10-19T17:58:59Z) - Gradient-free optimization of highly smooth functions: improved analysis
and a new algorithm [87.22224691317766]
This work studies problems with zero-order noisy oracle information under the assumption that the objective function is highly smooth.
We consider two kinds of zero-order projected gradient descent algorithms.
arXiv Detail & Related papers (2023-06-03T17:05:13Z) - Variance-reduced Clipping for Non-convex Optimization [24.765794811146144]
Gradient clipping is a technique used in deep learning applications such as large-scale language modeling.
Recent experimental training have a fairly special behavior in that it mitigates order complexity.
arXiv Detail & Related papers (2023-03-02T00:57:38Z) - Gauss-Newton Temporal Difference Learning with Nonlinear Function Approximation [11.925232472331494]
A Gauss-Newton Temporal Difference (GNTD) learning method is proposed to solve the Q-learning problem with nonlinear function approximation.
In each iteration, our method takes one Gauss-Newton (GN) step to optimize a variant of Mean-Squared Bellman Error (MSBE)
We validate our method via extensive experiments in several RL benchmarks, where GNTD exhibits both higher rewards and faster convergence than TD-type methods.
arXiv Detail & Related papers (2023-02-25T14:14:01Z) - Optimal Stochastic Non-smooth Non-convex Optimization through
Online-to-Non-convex Conversion [56.92236659731376]
We present new algorithms for optimizing non-known, non-smooth objectives based on a novel analysis technique.
For deterministic second-order smooth objectives, applying advanced optimistic online learning techniques enables a new $O(delta0.5) all$ to recover optimal or best-known results.
arXiv Detail & Related papers (2023-02-07T22:09:20Z) - Stochastic Zeroth order Descent with Structured Directions [10.604744518360464]
We introduce and analyze Structured Zeroth order Descent (SSZD), a finite difference approach that approximates a gradient on a set $lleq d directions, where $d is the dimension of the ambient space.
For convex convex we prove almost sure convergence of functions on $O( (d/l) k-c1/2$)$ for every $c1/2$, which is arbitrarily close to the one of the Gradient Descent (SGD) in terms of one number of iterations.
arXiv Detail & Related papers (2022-06-10T14:00:06Z) - Adaptive extra-gradient methods for min-max optimization and games [35.02879452114223]
We present a new family of minmax optimization algorithms that automatically exploit the geometry of the gradient data observed at earlier iterations.
Thanks to this adaptation mechanism, the proposed method automatically detects whether the problem is smooth or not.
It converges to an $varepsilon$-optimal solution within $mathcalO (1/varepsilon)$ iterations in smooth problems, and within $mathcalO (1/varepsilon)$ iterations in non-smooth ones.
arXiv Detail & Related papers (2020-10-22T22:54:54Z) - On the Almost Sure Convergence of Stochastic Gradient Descent in
Non-Convex Problems [75.58134963501094]
This paper analyzes the trajectories of gradient descent (SGD)
We show that SGD avoids saddle points/manifolds with $1$ for strict step-size policies.
arXiv Detail & Related papers (2020-06-19T14:11:26Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence [120.9336529957224]
In this paper, we denote the non-strongly setting on the magnitude of a gradient-free minimax optimization problem.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z) - Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonsmooth functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.