Convexity of Optimization Curves: Local Sharp Thresholds, Robustness Impossibility, and New Counterexamples
- URL: http://arxiv.org/abs/2509.08954v1
- Date: Wed, 10 Sep 2025 19:28:47 GMT
- Title: Convexity of Optimization Curves: Local Sharp Thresholds, Robustness Impossibility, and New Counterexamples
- Authors: Le Duc Hieu,
- Abstract summary: We study when the emphoptimization curve of first-order methods is convex.<n>For gradient descent on convex $L$--smooth functions, the curve is convex for all stepsizes $eta le 1.75/L$, and this threshold is tight.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study when the \emph{optimization curve} of first--order methods -- the sequence \${f(x\_n)}*{n\ge0}\$ produced by constant--stepsize iterations -- is convex, equivalently when the forward differences \$f(x\_n)-f(x*{n+1})\$ are nonincreasing. For gradient descent (GD) on convex \$L\$--smooth functions, the curve is convex for all stepsizes \$\eta \le 1.75/L\$, and this threshold is tight. Moreover, gradient norms are nonincreasing for all \$\eta \le 2/L\$, and in continuous time (gradient flow) the curve is always convex. These results complement and refine the classical smooth convex optimization toolbox, connecting discrete and continuous dynamics as well as worst--case analyses.
Related papers
- Complexity Bounds for Smooth Convex Multiobjective Optimization [0.0]
We study the oracle complexity of finding $varepsilon$-Pareto stationary points in smooth multiobjective optimization with $m$ objectives.<n>For strongly convex objectives, any span first-order method exhibits linear convergence no faster than $exp(-Theta(T/sqrtkappa)$ after $T$ oracle calls.<n>For convex problems and general span methods with adaptive scalarizations, we establish a universal lower bound of order $1/T2$ on the gradient norm of the final iterate after $T$ steps.
arXiv Detail & Related papers (2025-09-16T21:33:11Z) - Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity [50.25258834153574]
We focus on the class of (strongly) convex $(L0)$-smooth functions and derive new convergence guarantees for several existing methods.<n>In particular, we derive improved convergence rates for Gradient Descent with smoothnessed Gradient Clipping and for Gradient Descent with Polyak Stepsizes.
arXiv Detail & Related papers (2024-09-23T13:11:37Z) - Two-Timescale Gradient Descent Ascent Algorithms for Nonconvex Minimax Optimization [77.3396841985172]
We provide a unified analysis of two-timescale gradient ascent (TTGDA) for solving structured non minimax optimization problems.<n>Our contribution is to design TTGDA algorithms are effective beyond the setting.
arXiv Detail & Related papers (2024-08-21T20:14:54Z) - A qualitative difference between gradient flows of convex functions in
finite- and infinite-dimensional Hilbert spaces [2.7195102129095003]
We consider gradient flow/gradient descent and heavy ball/accelerated gradient descent optimization for convex objective functions.
In Hilbert spaces, this is optimal: $f(x_t) - inf f$ can decay to $0$ as slowly as any given function which is monotone decreasing and integrable at $infty$.
arXiv Detail & Related papers (2023-10-26T17:33:52Z) - Revisiting Subgradient Method: Complexity and Convergence Beyond Lipschitz Continuity [24.45688490844496]
Subgradient method is one of the most fundamental algorithmic schemes for nonsmooth optimization.
In this work, we first extend the typical iteration complexity results for the subgradient method to cover non-Lipschitz convex and weakly convex minimization.
arXiv Detail & Related papers (2023-05-23T15:26:36Z) - Improved Dynamic Regret for Online Frank-Wolfe [54.690867216880356]
We investigate the dynamic regret of online Frank-Wolfe (OFW), which is an efficient projection-free algorithm for online convex optimization.
In this paper, we derive improved dynamic regret bounds for OFW by extending the fast convergence rates of FW from offline optimization to online optimization.
arXiv Detail & Related papers (2023-02-11T07:19:51Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $min_mathbfxmax_mathbfyF(mathbfx) + H(mathbfx,mathbfy)$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present a emphaccelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragrad
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - Thinking Outside the Ball: Optimal Learning with Gradient Descent for
Generalized Linear Stochastic Convex Optimization [37.177329562964765]
We consider linear prediction with a convex Lipschitz loss, or more generally, convex optimization problems of generalized linear form.
We show that in this setting, early iteration stopped Gradient Descent (GD), without any explicit regularization or projection, ensures excess error at most $epsilon$.
But instead of uniform convergence in a norm ball, which we show can guarantee suboptimal learning using $Theta (1/epsilon4)$ samples, we rely on uniform convergence in a distribution-dependent ball.
arXiv Detail & Related papers (2022-02-27T09:41:43Z) - Accelerated Primal-Dual Gradient Method for Smooth and Convex-Concave
Saddle-Point Problems with Bilinear Coupling [84.47780064014262]
We study a linear convex-concave saddle-point problem $min_xmax_y f(x) ytopmathbfA x - g(y)
arXiv Detail & Related papers (2021-12-30T20:31:46Z) - A first-order primal-dual method with adaptivity to local smoothness [64.62056765216386]
We consider the problem of finding a saddle point for the convex-concave objective $min_x max_y f(x) + langle Ax, yrangle - g*(y)$, where $f$ is a convex function with locally Lipschitz gradient and $g$ is convex and possibly non-smooth.
We propose an adaptive version of the Condat-Vu algorithm, which alternates between primal gradient steps and dual steps.
arXiv Detail & Related papers (2021-10-28T14:19:30Z) - Accelerated Gradient Tracking over Time-varying Graphs for Decentralized Optimization [59.65871549878937]
We prove that the practical single loop accelerated gradient tracking needs $O(fracgamma1-sigma_gamma)2sqrtfracLepsilon)$.<n>Our convergence rates improve significantly over the ones of $O(frac1epsilon5/7)$ and $O(fracLmu)5/7frac1 (1-sigma)1.5logfrac1epsilon)$.
arXiv Detail & Related papers (2021-04-06T15:34:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.