The rate of convergence of Bregman proximal methods: Local geometry vs. regularity vs. sharpness
- URL: http://arxiv.org/abs/2211.08043v2
- Date: Wed, 2 Aug 2023 09:12:54 GMT
- Title: The rate of convergence of Bregman proximal methods: Local geometry vs. regularity vs. sharpness
- Authors: Waïss Azizian, Franck Iutzeler, Jérôme Malick, and Panayotis Mertikopoulos
- Abstract summary: We show that the convergence rate of a given method depends sharply on its associated Legendre exponent.
In particular, we show that boundary solutions exhibit a stark separation between methods with a zero and non-zero Legendre exponent.
- Score: 33.48987613928269
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We examine the last-iterate convergence rate of Bregman proximal methods -
from mirror descent to mirror-prox and its optimistic variants - as a function
of the local geometry induced by the prox-mapping defining the method. For
generality, we focus on local solutions of constrained, non-monotone
variational inequalities, and we show that the convergence rate of a given
method depends sharply on its associated Legendre exponent, a notion that
measures the growth rate of the underlying Bregman function (Euclidean,
entropic, or other) near a solution. In particular, we show that boundary
solutions exhibit a stark separation of regimes between methods with a zero and
non-zero Legendre exponent: the former converge at a linear rate, while the
latter converge, in general, sublinearly. This dichotomy becomes even more
pronounced in linearly constrained problems where methods with entropic
regularization achieve a linear convergence rate along sharp directions,
compared to convergence in a finite number of steps under Euclidean
regularization.
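To make the dichotomy concrete, here is a minimal Python sketch of mirror descent on the probability simplex under the two regularizers named in the abstract. The toy setting and all names (`project_simplex`, `mirror_descent`, etc.) are illustrative assumptions, not the paper's code.

```python
import numpy as np

def project_simplex(y):
    """Euclidean projection onto the probability simplex (sort-based method)."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(y) + 1)
    cond = u - css / idx > 0
    rho = idx[cond][-1]
    return np.maximum(y - css[cond][-1] / rho, 0.0)

def euclidean_prox(x, g, step):
    """Prox-mapping of the Euclidean regularizer: a projected gradient step."""
    return project_simplex(x - step * g)

def entropic_prox(x, g, step):
    """Prox-mapping of the entropic regularizer: an exponentiated gradient step."""
    y = x * np.exp(-step * g)
    return y / y.sum()

def mirror_descent(grad, x0, step, n_iters, prox):
    """Bregman proximal iteration: x_{t+1} = prox(x_t, grad(x_t))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = prox(x, grad(x), step)
    return x
```

Near a boundary point of the simplex the two prox-mappings behave very differently, and the Legendre exponent is precisely the quantity that separates their local behavior, hence the linear vs. sublinear rates described above.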
Related papers
- A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variants of the stochastic proximal point method (SPPM).
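For orientation, here is a minimal sketch of one such variant, the vanilla SPPM on a least-squares objective, where each component prox step has a closed form. The problem and names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sppm_least_squares(A, b, gamma, n_iters, rng=None):
    """Stochastic proximal point for f(x) = (1/2n) * sum_i (a_i @ x - b_i)^2.

    Each step solves x_{k+1} = argmin_z f_i(z) + ||z - x_k||^2 / (2 * gamma)
    for a uniformly sampled component, which admits the closed form below.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(n_iters):
        i = rng.integers(n)
        a, b_i = A[i], b[i]
        # Prox of gamma * f_i at x, derived from its first-order optimality condition.
        x = x - gamma * a * (a @ x - b_i) / (1.0 + gamma * (a @ a))
    return x
```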
arXiv Detail & Related papers (2024-05-24T21:09:19Z)
- Incremental Quasi-Newton Methods with Faster Superlinear Convergence Rates [50.36933471975506]
We consider the finite-sum optimization problem, where each component function is strongly convex and has Lipschitz continuous gradient and Hessian.
The recently proposed incremental quasi-Newton method is based on the BFGS update and achieves a local superlinear convergence rate.
This paper proposes a more efficient quasi-Newton method by incorporating the symmetric rank-1 update into the incremental framework.
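For reference, the symmetric rank-1 (SR1) update has the following generic textbook form on an inverse-Hessian estimate; this sketches only the building block, not the paper's incremental scheme.

```python
import numpy as np

def sr1_update(H, s, y, tol=1e-8):
    """Symmetric rank-1 update of an inverse-Hessian approximation H.

    s = x_{k+1} - x_k, y = grad(x_{k+1}) - grad(x_k).
    """
    r = s - H @ y
    denom = r @ y
    # Standard safeguard: skip the update when the denominator is negligible.
    if abs(denom) <= tol * np.linalg.norm(r) * np.linalg.norm(y):
        return H
    return H + np.outer(r, r) / denom
```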
arXiv Detail & Related papers (2024-02-04T05:54:51Z)
- Online Learning Guided Curvature Approximation: A Quasi-Newton Method with Global Non-Asymptotic Superlinear Convergence [22.286753988825048]
We present the first globally convergent quasi-Newton method with an explicit non-asymptotic superlinear convergence rate.
Unlike classical quasi-Newton methods, we build our algorithm upon the hybrid proximal extragradient method.
arXiv Detail & Related papers (2023-02-16T20:58:09Z)
- Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies [115.86431674214282]
We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class.
We show that both methods attain linear convergence rates and $\mathcal{O}(1/\epsilon^2)$ sample complexities using a simple, non-adaptive geometrically increasing step size.
arXiv Detail & Related papers (2022-10-04T06:17:52Z)
- On the Convergence Rates of Policy Gradient Methods [9.74841674275568]
We consider geometrically discounted Markov decision problems with finite state and action spaces.
We show that, under direct policy parametrization, the convergence rates of projected policy gradient methods can be characterized.
arXiv Detail & Related papers (2022-01-19T07:03:37Z)
- On Asymptotic Linear Convergence of Projected Gradient Descent for Constrained Least Squares [22.851500417035947]
This manuscript presents a unified framework for the analysis of projected gradient descent in the context of constrained least squares.
We present a recipe for the convergence analysis of PGD and demonstrate it via a beginning-to-end application of the recipe on four fundamental problems.
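A bare-bones sketch of the PGD iteration being analyzed follows; the names are illustrative assumptions, and `project` stands in for whichever of the four constraint sets is under study.

```python
import numpy as np

def pgd_least_squares(A, b, project, step, n_iters, x0=None):
    """Projected gradient descent for min 0.5 * ||A x - b||^2 over a set C."""
    x = np.zeros(A.shape[1]) if x0 is None else np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = project(x - step * A.T @ (A @ x - b))  # gradient step, then projection
    return x

# Example constraint set: the nonnegative orthant (nonnegative least squares).
project_nonneg = lambda z: np.maximum(z, 0.0)
```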
arXiv Detail & Related papers (2021-12-22T09:49:51Z)
- The Last-Iterate Convergence Rate of Optimistic Mirror Descent in Stochastic Variational Inequalities [29.0058976973771]
We show an intricate relation between the algorithm's rate of convergence and the local geometry induced by the method's underlying Bregman function.
We show that the associated Legendre exponent determines both the optimal step-size policy of the algorithm and the optimal rates attained.
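For concreteness, here is a minimal sketch of the optimistic mirror descent update with an entropic prox-mapping and one oracle call per iteration; the names and the constant step size are illustrative assumptions, not the paper's step-size policy.

```python
import numpy as np

def entropic_prox(x, g, step):
    """Entropic prox-mapping on the simplex (exponentiated gradient step)."""
    y = x * np.exp(-step * g)
    return y / y.sum()

def optimistic_md(V, x0, step, n_iters):
    """Optimistic mirror descent: lead with the last gradient, then correct."""
    x = np.asarray(x0, dtype=float)
    g_last = np.zeros_like(x)
    for _ in range(n_iters):
        x_lead = entropic_prox(x, g_last, step)  # extrapolate with the stale gradient
        g_last = V(x_lead)                       # single oracle call per iteration
        x = entropic_prox(x, g_last, step)       # update from the base point
    return x
```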
arXiv Detail & Related papers (2021-07-05T09:54:47Z)
- On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the stochastic ExtraGradient (SEG) method with constant step size, and propose variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
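A compact sketch of SEG with iterate averaging on a bilinear game min_x max_y x^T M y is given below; all names are illustrative assumptions, and the paper's scheduled restart (re-entering the loop from the averaged pair) is omitted.

```python
import numpy as np

def seg_bilinear(M, x0, y0, step, n_iters, noise=0.0, rng=None):
    """(Stochastic) extragradient with iterate averaging for min_x max_y x @ M @ y."""
    rng = np.random.default_rng() if rng is None else rng
    x, y = np.asarray(x0, float), np.asarray(y0, float)
    x_avg, y_avg = np.zeros_like(x), np.zeros_like(y)
    for _ in range(n_iters):
        # Extrapolation step with a noisy gradient oracle.
        gx = M @ y + noise * rng.standard_normal(x.shape)
        gy = -M.T @ x + noise * rng.standard_normal(y.shape)
        x_half, y_half = x - step * gx, y - step * gy
        # Update step evaluated at the extrapolated point.
        gx = M @ y_half + noise * rng.standard_normal(x.shape)
        gy = -M.T @ x_half + noise * rng.standard_normal(y.shape)
        x, y = x - step * gx, y - step * gy
        x_avg, y_avg = x_avg + x, y_avg + y
    return x_avg / n_iters, y_avg / n_iters
```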
arXiv Detail & Related papers (2021-06-30T17:51:36Z)
- Optimal oracle inequalities for solving projected fixed-point equations [53.31620399640334]
We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space.
We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation.
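As a concrete member of that class, here is a minimal TD(0) sketch with linear function approximation; this is the generic textbook update under assumed names, not the paper's estimator.

```python
import numpy as np

def td0_linear(transitions, phi, gamma, step, dim):
    """TD(0) with linear value approximation V(s) ~= phi(s) @ w.

    transitions: iterable of (s, r, s_next) samples generated by the policy.
    phi: feature map returning an array of shape (dim,).
    """
    w = np.zeros(dim)
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        td_error = r + gamma * (f_next @ w) - f @ w
        w = w + step * td_error * f
    return w
```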
arXiv Detail & Related papers (2020-12-09T20:19:32Z)
- Iterative Pre-Conditioning for Expediting the Gradient-Descent Method: The Distributed Linear Least-Squares Problem [0.966840768820136]
This paper considers the multi-agent linear least-squares problem in a server-agent network.
The goal for the agents is to compute a linear model that optimally fits the collective data points held by all the agents, without sharing their individual local data points.
We propose an iterative pre-conditioning technique that mitigates the deleterious effect of the conditioning of data points on the rate of convergence of the gradient-descent method.
arXiv Detail & Related papers (2020-08-06T20:01:18Z)
- Random extrapolation for primal-dual coordinate descent [61.55967255151027]
We introduce a randomly extrapolated primal-dual coordinate descent method that adapts to sparsity of the data matrix and the favorable structures of the objective function.
We show almost sure convergence of the sequence and optimal sublinear convergence rates for the primal-dual gap and objective values, in the general convex-concave case.
arXiv Detail & Related papers (2020-07-13T17:39:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.