A qualitative difference between gradient flows of convex functions in
finite- and infinite-dimensional Hilbert spaces
- URL: http://arxiv.org/abs/2310.17610v1
- Date: Thu, 26 Oct 2023 17:33:52 GMT
- Title: A qualitative difference between gradient flows of convex functions in
finite- and infinite-dimensional Hilbert spaces
- Authors: Jonathan W. Siegel, Stephan Wojtowytsch
- Abstract summary: We consider gradient flow/gradient descent and heavy ball/accelerated gradient descent optimization for convex objective functions.
In Hilbert spaces, this is optimal: $f(x_t) - \inf f$ can decay to $0$ as slowly as any given function which is monotone decreasing and integrable at $\infty$.
- Score: 2.7195102129095003
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider gradient flow/gradient descent and heavy ball/accelerated
gradient descent optimization for convex objective functions. In the gradient
flow case, we prove the following:
1. If $f$ does not have a minimizer, the convergence $f(x_t)\to \inf f$ can
be arbitrarily slow.
2. If $f$ does have a minimizer, the excess energy $f(x_t) - \inf f$ is
integrable/summable in time. In particular, $f(x_t) - \inf f = o(1/t)$ as
$t\to\infty$.
3. In Hilbert spaces, this is optimal: $f(x_t) - \inf f$ can decay to $0$ as
slowly as any given function which is monotone decreasing and integrable at
$\infty$, even for a fixed quadratic objective.
4. In finite dimension (or more generally, for all gradient flow curves of
finite length), this is not optimal: We prove that there are convex monotone
decreasing integrable functions $g(t)$ which decrease to zero slower than
$f(x_t)-\inf f$ for the gradient flow of any convex function on $\mathbb R^d$.
For instance, we show that any gradient flow $x_t$ of a convex function $f$ in
finite dimension satisfies $\liminf_{t\to\infty} \big(t\cdot \log^2(t)\cdot
\big\{f(x_t) -\inf f\big\}\big)=0$.
This improves on the commonly reported $O(1/t)$ rate and provides a sharp
characterization of the energy decay law. We also note that it is impossible to
establish a rate $O(1/(t\phi(t)))$ for any function $\phi$ which satisfies
$\lim_{t\to\infty}\phi(t) = \infty$, even asymptotically.
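The exponential decay behind point 2 can be checked numerically. The sketch below is illustrative, not from the paper: for the fixed quadratic $f(x) = \tfrac{\lambda}{2}x^2$ on $\mathbb{R}$, the gradient flow has the closed form $x_t = x_0 e^{-\lambda t}$, so the excess energy decays exponentially and in particular $t \cdot (f(x_t) - \inf f) \to 0$, i.e. strictly faster than $1/t$. The step size, horizon, and thresholds are arbitrary choices for the demonstration.

```python
# Illustrative sketch (not from the paper): integrate the gradient flow
# x'(t) = -f'(x) for f(x) = 0.5 * lam * x^2 with explicit Euler and observe
# that t * (f(x_t) - inf f) tends to 0, consistent with the o(1/t) claim.
import math

def gradient_flow_excess(lam=1.0, x0=5.0, t_end=20.0, dt=1e-3):
    """Euler integration of x' = -lam * x; returns samples of t * (f(x_t) - inf f)."""
    f = lambda x: 0.5 * lam * x * x   # inf f = 0, attained at x = 0
    x, t = x0, 0.0
    samples = []
    while t < t_end:
        x -= dt * lam * x             # one Euler step of the gradient flow
        t += dt
        samples.append((t, t * f(x)))
    return samples

samples = gradient_flow_excess()
early = samples[len(samples) // 4][1]   # t * excess around t = 5
late = samples[-1][1]                   # t * excess around t = 20
print(late < early, late < 1e-6)        # prints: True True
```

The weighted excess $t \cdot (f(x_t) - \inf f)$ keeps shrinking, which is exactly what distinguishes the finite-length (e.g. finite-dimensional) case from the worst-case Hilbert-space behavior described in point 3.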
Similar results are obtained in related settings for (1) discrete time
gradient descent, (2) stochastic gradient descent with multiplicative noise and
(3) the heavy ball ODE. In the case of stochastic gradient descent, the
summability of $\mathbb E[f(x_n) - \inf f]$ is used to prove that $f(x_n)\to
\inf f$ almost surely - an improvement on the convergence almost surely up to a
subsequence which follows from the $O(1/n)$ decay estimate.
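The discrete-time analogue can be illustrated the same way. This is a minimal sketch under illustrative assumptions (a one-dimensional quadratic, step size $h = 0.4$), not the paper's construction: for gradient descent on a convex function with a minimizer, the excess energies $f(x_n) - \inf f$ are summable, and summability forces $n \cdot (f(x_n) - \inf f) \to 0$.

```python
# Illustrative sketch (not from the paper): gradient descent
# x_{n+1} = x_n - h * f'(x_n) on the convex quadratic f(x) = x^2 with h = 0.4.
# The excesses f(x_n) - inf f decay geometrically, so their sum converges and
# n * (f(x_n) - inf f) -> 0, matching the o(1/n) claim in the discrete setting.
def run_gd(x0=3.0, h=0.4, steps=200):
    f = lambda x: x * x          # inf f = 0
    grad = lambda x: 2.0 * x
    x, excesses = x0, []
    for _ in range(steps):
        x -= h * grad(x)         # here x_{n+1} = (1 - 2h) * x_n = 0.2 * x_n
        excesses.append(f(x))
    return excesses

excesses = run_gd()
partial_sum = sum(excesses)      # geometric series: 9 * 0.04 / 0.96 = 0.375
tail = 200 * excesses[-1]        # n * excess at n = 200
print(partial_sum < 0.4, tail < 1e-12)   # prints: True True
```

The bounded partial sum is the key quantity: in the stochastic setting the abstract uses the same summability of $\mathbb E[f(x_n) - \inf f]$ to upgrade subsequence convergence to full almost-sure convergence.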
Related papers
- Efficient Continual Finite-Sum Minimization [52.5238287567572]
We propose a key twist on finite-sum minimization, dubbed continual finite-sum minimization.
Our approach significantly improves upon the $\mathcal{O}(n/\epsilon)$ first-order oracle (FO) calls that $\mathrm{StochasticGradientDescent}$ requires.
We also prove that there is no natural first-order method with $\mathcal{O}\left(n/\epsilon^{\alpha}\right)$ gradient complexity for $\alpha < 1/4$, establishing that the first-order complexity of our method is nearly tight.
arXiv Detail & Related papers (2024-06-07T08:26:31Z) - On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm [59.65871549878937]
This paper considers RMSProp and its momentum extension and establishes a convergence rate of $\frac{1}{T}\sum_{k=1}^T \mathbb{E}\big[\|\nabla f(x_k)\|_1\big] \le O\big(\frac{\sqrt{d}}{T^{1/4}}\big)$.
Our convergence rate matches the lower bound with respect to all the coefficients except the dimension $d$.
Our convergence rate can be considered analogous to the $\frac{1}{T}\sum_{k=1}^T \mathbb{E}\big[\|\nabla f(x_k)\|_2^2\big] \le O\big(\frac{1}{\sqrt{T}}\big)$ rate of SGD measured by the $\ell_2$ norm.
arXiv Detail & Related papers (2024-02-01T07:21:32Z) - Accelerated Methods for Riemannian Min-Max Optimization Ensuring Bounded
Geometric Penalties [21.141544548229774]
We study problems of the form $\min_{x \in \mathcal{M}} \max_{y \in \mathcal{N}} f(x, y)$, where $\mathcal{M}$ and $\mathcal{N}$ are Hadamard manifolds.
We show accelerated global convergence rates with reduced dependence on the geometric constants.
arXiv Detail & Related papers (2023-05-25T15:43:07Z) - Beyond Uniform Smoothness: A Stopped Analysis of Adaptive SGD [38.221784575853796]
This work considers the problem of finding a first-order stationary point of a nonconvex function with a potentially unbounded smoothness constant using a stochastic gradient oracle.
We develop a technique that allows us to prove $\mathcal{O}\big(\frac{\mathrm{polylog}(T)\,\sigma}{\sqrt{T}}\big)$ convergence rates without assuming uniform bounds on the noise.
arXiv Detail & Related papers (2023-02-13T18:13:36Z) - Faster Gradient-Free Algorithms for Nonsmooth Nonconvex Stochastic Optimization [20.54801745090522]
We consider problems of the form $\min_{x \in \mathbb{R}^d} f(x) \triangleq \mathbb{E}_{\xi}[F(x; \xi)]$, where $F(\cdot\,; \xi)$ is Lipschitz.
The recently proposed gradient-free method requires at most $\mathcal{O}\big(L^4 d^{3/2} \epsilon^{-4} + L\, d^{3/2} \delta^{-1} \epsilon^{-4}\big)$ zeroth-order complexity to find a $(\delta, \epsilon)$-Goldstein stationary point.
arXiv Detail & Related papers (2023-01-16T13:33:37Z) - ReSQueing Parallel and Private Stochastic Convex Optimization [59.53297063174519]
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density.
We develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings.
arXiv Detail & Related papers (2023-01-01T18:51:29Z) - Accelerated Primal-Dual Gradient Method for Smooth and Convex-Concave
Saddle-Point Problems with Bilinear Coupling [84.47780064014262]
We study a convex-concave saddle-point problem with bilinear coupling, $\min_x \max_y f(x) + y^\top \mathbf{A} x - g(y)$.
arXiv Detail & Related papers (2021-12-30T20:31:46Z) - A first-order primal-dual method with adaptivity to local smoothness [64.62056765216386]
We consider the problem of finding a saddle point for the convex-concave objective $\min_x \max_y f(x) + \langle Ax, y\rangle - g^*(y)$, where $f$ is a convex function with locally Lipschitz gradient and $g$ is convex and possibly non-smooth.
We propose an adaptive version of the Condat-Vu algorithm, which alternates between primal gradient steps and dual steps.
arXiv Detail & Related papers (2021-10-28T14:19:30Z) - On the Self-Penalization Phenomenon in Feature Selection [69.16452769334367]
We describe an implicit sparsity-inducing mechanism based on minimization over a family of kernels.
As an application, we use this sparsity-inducing mechanism to build algorithms that are consistent for feature selection.
arXiv Detail & Related papers (2021-10-12T09:36:41Z) - Convergence Rate of the (1+1)-Evolution Strategy with Success-Based
Step-Size Adaptation on Convex Quadratic Functions [20.666734673282498]
The (1+1)-evolution strategy (ES) with success-based step-size adaptation is analyzed on a general convex quadratic function.
Its convergence rate is derived explicitly and rigorously.
arXiv Detail & Related papers (2021-03-02T09:03:44Z) - On the Convergence of Langevin Monte Carlo: The Interplay between Tail
Growth and Smoothness [10.482805367361818]
We show that for potentials with Lipschitz gradient, i.e. $\beta = 1$, our rate recovers the best known rate of dimension dependency.
Our results are applicable to sampling from a target distribution $\nu_* = e^{-f}$, with convergence measured in KL-divergence.
arXiv Detail & Related papers (2020-05-27T00:26:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.