Proximal Subgradient Norm Minimization of ISTA and FISTA
- URL: http://arxiv.org/abs/2211.01610v1
- Date: Thu, 3 Nov 2022 06:50:19 GMT
- Title: Proximal Subgradient Norm Minimization of ISTA and FISTA
- Authors: Bowen Li, Bin Shi, Ya-xiang Yuan
- Abstract summary: We show that the squared proximal subgradient norm for the class of iterative shrinkage-thresholding algorithms converges at an inverse square rate.
We also show that the squared proximal subgradient norm for the class of faster iterative shrinkage-thresholding algorithms (FISTA) is accelerated to convergence at an inverse cubic rate.
- Score: 8.261388753972234
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: For first-order smooth optimization, research on the acceleration
phenomenon has a long history. Only recently was the mechanism leading to
acceleration successfully uncovered, via the gradient-correction term and
its equivalent implicit-velocity form. Furthermore, based on the
high-resolution differential equation framework with the corresponding emerging
techniques, phase-space representation and Lyapunov function, the squared
gradient norm of Nesterov's accelerated gradient descent (\texttt{NAG}) method
was discovered to converge at an inverse cubic rate. However, this result cannot be directly
generalized to composite optimization, which is widely used in practice, e.g., the linear
inverse problem with sparse representation. In this paper, we meticulously
examine a pivotal inequality used in composite optimization, involving the step size
$s$ and the Lipschitz constant $L$, and find that it can be tightened. We
apply the tightened inequality within a well-constructed Lyapunov
function and obtain proximal subgradient norm minimization via the
phase-space representation, regardless of the gradient-correction or
implicit-velocity scheme. Furthermore, we demonstrate that the squared proximal
subgradient norm for the class of iterative shrinkage-thresholding algorithms
(ISTA) converges at an inverse square rate, and the squared proximal
subgradient norm for the class of faster iterative shrinkage-thresholding
algorithms (FISTA) is accelerated to convergence at an inverse cubic rate.
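As a concrete companion to the abstract, the sketch below runs ISTA and FISTA on a synthetic LASSO instance and tracks the squared norm of the gradient mapping as a stand-in for the squared proximal subgradient norm; the problem data, the $(k-1)/(k+2)$ momentum, and the gradient-mapping form are illustrative assumptions, not taken from the paper.

```python
# A minimal sketch (not the paper's code): ISTA and FISTA on a synthetic LASSO
# instance, min_x 0.5*||Ax - b||^2 + lam*||x||_1, tracking the squared norm of
# the gradient mapping G_s(x) = (x - prox_{s*g}(x - s*grad_f(x))) / s as a
# proxy for the squared proximal subgradient norm.
import numpy as np

rng = np.random.default_rng(0)
m, n, lam = 60, 120, 0.1
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of grad f
s = 1.0 / L                              # step size s <= 1/L

def grad_f(x):                           # gradient of the smooth part 0.5*||Ax - b||^2
    return A.T @ (A @ x - b)

def prox_l1(v, t):                       # soft-thresholding: prox of t*||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def grad_map_norm_sq(x):                 # squared norm of the gradient mapping at x
    G = (x - prox_l1(x - s * grad_f(x), s * lam)) / s
    return float(G @ G)

def ista(x0, iters=500):
    x, hist = x0.copy(), []
    for _ in range(iters):
        x = prox_l1(x - s * grad_f(x), s * lam)
        hist.append(grad_map_norm_sq(x))
    return x, hist                       # expected to shrink on the order of 1/k^2

def fista(x0, iters=500):
    x, y, hist = x0.copy(), x0.copy(), []
    for k in range(1, iters + 1):
        x_new = prox_l1(y - s * grad_f(y), s * lam)
        y = x_new + (k - 1) / (k + 2) * (x_new - x)   # a common Nesterov momentum choice
        x = x_new
        hist.append(grad_map_norm_sq(x))
    return x, hist                       # expected to shrink on the order of 1/k^3

x0 = np.zeros(n)
print("ISTA :", ista(x0)[1][-1])
print("FISTA:", fista(x0)[1][-1])
```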
Related papers
- An Accelerated Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness [15.656614304616006]
This paper investigates a class of bilevel optimization problems where the upper-level function is nonconvex with potentially unbounded smoothness and the lower-level problem is strongly convex.
These problems have significant applications in data learning, such as text classification using neural networks.
arXiv Detail & Related papers (2024-09-28T02:30:44Z) - Gradient-Variation Online Learning under Generalized Smoothness [56.38427425920781]
Gradient-variation online learning aims to achieve regret guarantees that scale with variations in gradients of online functions.
Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms.
We provide applications to fast-rate convergence in games and to extended adversarial optimization.
arXiv Detail & Related papers (2024-08-17T02:22:08Z) - Directional Smoothness and Gradient Methods: Convergence and Adaptivity [16.779513676120096]
We develop new sub-optimality bounds for gradient descent that depend on the conditioning of the objective along the path of optimization.
Key to our proofs is directional smoothness, a measure of gradient variation that we use to develop upper-bounds on the objective.
We prove that the Polyak step-size and normalized GD obtain fast, path-dependent rates despite using no knowledge of the directional smoothness.
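For readers unfamiliar with the two step-size rules this entry names, here is a minimal sketch on a toy quadratic with known optimal value; the problem data, constants, and iteration counts are illustrative assumptions, not the paper's experiments.

```python
# A minimal sketch (illustrative only, not the paper's experiments) of the two
# step-size rules named above, on a toy strongly convex quadratic whose optimal
# value f* = 0 is known.
import numpy as np

rng = np.random.default_rng(1)
Q = rng.standard_normal((20, 20))
H = Q.T @ Q + np.eye(20)                 # positive-definite Hessian
f = lambda x: 0.5 * x @ H @ x            # minimized at x = 0 with f* = 0
grad = lambda x: H @ x

def polyak_gd(x, f_star=0.0, iters=200):
    for _ in range(iters):
        g = grad(x)
        x = x - (f(x) - f_star) / (g @ g) * g     # Polyak step size
    return x

def normalized_gd(x, eta=0.1, iters=200):
    for _ in range(iters):
        g = grad(x)
        x = x - eta * g / np.linalg.norm(g)       # fixed-length normalized step
    return x                                      # stalls in a neighborhood of size ~eta

x0 = rng.standard_normal(20)
print(f(polyak_gd(x0.copy())), f(normalized_gd(x0.copy())))
```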
arXiv Detail & Related papers (2024-03-06T22:24:05Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
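To make the idea concrete, here is a minimal sketch of the linear-interpolation (lookahead-style) update on a toy rotational vector field; the inner optimizer, interpolation weight, and vector field are illustrative assumptions, not the paper's algorithm or setting.

```python
# A minimal sketch (illustrative, not the paper's algorithm or setting) of the
# linear-interpolation / lookahead idea: average the current iterate with the
# result of a few base-optimizer steps, x <- (1 - lam) * x + lam * T(x).
import numpy as np

def inner_steps(x, field, eta=0.2, k=5):
    # k plain gradient-style steps on the vector field; plays the role of T
    for _ in range(k):
        x = x - eta * field(x)
    return x

def lookahead(x, field, lam=0.5, outer_iters=100):
    for _ in range(outer_iters):
        x = (1.0 - lam) * x + lam * inner_steps(x, field)   # linear interpolation
    return x

# A nearly rotational field (classic source of min-max training instability):
# plain gradient steps on it slowly diverge, while the interpolated updates
# contract toward the equilibrium at the origin.
field = lambda v: np.array([v[1], -v[0]]) + 0.05 * v
print(lookahead(np.array([1.0, 1.0]), field))
```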
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Linear convergence of forward-backward accelerated algorithms without knowledge of the modulus of strong convexity [14.0409219811182]
We show that both Nesterov's accelerated gradient descent (NAG) and FISTA exhibit linear convergence for strongly convex functions.
We emphasize the distinctive approach employed in crafting the Lyapunov function, which involves a dynamically adapting coefficient of kinetic energy.
arXiv Detail & Related papers (2023-06-16T08:58:40Z) - Convex and Non-convex Optimization Under Generalized Smoothness [69.69521650503431]
Classical analyses of convex and non-convex optimization methods often require Lipschitz continuity of the gradient, which limits the class of functions to which the analysis applies.
Recent work generalizes this setting via a non-uniform smoothness condition.
arXiv Detail & Related papers (2023-06-02T04:21:59Z) - Linear Convergence of ISTA and FISTA [8.261388753972234]
We revisit the class of iterative shrinkage-thresholding algorithms (ISTA) for solving the linear inverse problem with sparse representation.
We find that the previous assumption that the smooth part is merely convex understates the structure of the least-squares model.
We generalize the linear convergence to composite optimization in both the objective value and the squared proximal subgradient norm.
arXiv Detail & Related papers (2022-12-13T02:02:50Z) - Revisiting the acceleration phenomenon via high-resolution differential
equations [6.53306151979817]
Nesterov's accelerated gradient descent (NAG) is one of the milestones in the history of first-order algorithms.
We investigate NAG for $\mu$-strongly convex functions based on the techniques of Lyapunov analysis and phase-space representation.
We find that the high-resolution differential equation framework from the implicit-velocity scheme of NAG is perfect and outperforms the gradient-correction scheme.
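For reference, below is a minimal sketch of NAG with the constant momentum commonly used for $\mu$-strongly convex objectives, on a toy quadratic; the paper's Lyapunov and phase-space analysis is not reproduced, and the problem data and iteration count are illustrative assumptions.

```python
# A minimal sketch (illustrative only) of NAG with the constant momentum used
# for mu-strongly convex objectives, on a toy quadratic.
import numpy as np

rng = np.random.default_rng(2)
Q = rng.standard_normal((30, 30))
H = Q.T @ Q + 0.5 * np.eye(30)           # Hessian of f(x) = 0.5 * x^T H x
mu, L = np.linalg.eigvalsh(H)[[0, -1]]   # strong convexity and smoothness constants
grad = lambda x: H @ x                   # minimizer is x = 0

def nag_sc(x0, iters=300):
    s = 1.0 / L
    beta = (1 - np.sqrt(mu * s)) / (1 + np.sqrt(mu * s))
    x_prev, y = x0.copy(), x0.copy()
    for _ in range(iters):
        x = y - s * grad(y)              # gradient step from the extrapolated point
        y = x + beta * (x - x_prev)      # constant-momentum extrapolation
        x_prev = x
    return x_prev

print(np.linalg.norm(nag_sc(rng.standard_normal(30))))   # distance to the minimizer
```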
arXiv Detail & Related papers (2022-12-12T04:36:37Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the stochastic ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
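To make the method concrete, here is a minimal sketch of stochastic extragradient with iterate averaging and scheduled restarts on a toy bilinear game; the noise model, step size, and restart schedule are illustrative assumptions rather than the paper's exact algorithm.

```python
# A minimal sketch (illustrative assumptions: noise model, step size, restart
# schedule) of stochastic extragradient with iterate averaging and scheduled
# restarts on a toy bilinear game min_x max_y x^T A y.
import numpy as np

rng = np.random.default_rng(3)
d = 10
A = rng.standard_normal((d, d))
eta, sigma = 0.05, 0.1                   # step size and noise level

def F(z):                                # game operator: (grad_x, -grad_y)
    x, y = z[:d], z[d:]
    return np.concatenate([A @ y, -A.T @ x])

def seg_restarted(z, rounds=20, inner=200):
    for _ in range(rounds):              # each round ends with a restart
        avg = np.zeros_like(z)
        for t in range(inner):
            noisy = lambda v: F(v) + sigma * rng.standard_normal(2 * d)
            z_half = z - eta * noisy(z)  # extrapolation step
            z = z - eta * noisy(z_half)  # update step
            avg += (z - avg) / (t + 1)   # running average of the iterates
        z = avg                          # restart from the averaged point
    return z

print(np.linalg.norm(seg_restarted(rng.standard_normal(2 * d))))  # distance to the Nash equilibrium at 0
```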
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Hessian-Free High-Resolution Nesterov Acceleration for Sampling [55.498092486970364]
Nesterov's Accelerated Gradient (NAG) for optimization has better performance than its continuous time limit (noiseless kinetic Langevin) when a finite step-size is employed.
This work explores the sampling counterpart of this phenomenon and proposes a diffusion process, whose discretizations can yield accelerated gradient-based MCMC methods.
arXiv Detail & Related papers (2020-06-16T15:07:37Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave min-max problems.
Our experiments on GAN training show that the improved behavior of adaptive over non-adaptive gradient algorithms can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)