Linear Convergence of Generalized Mirror Descent with Time-Dependent
Mirrors
- URL: http://arxiv.org/abs/2009.08574v2
- Date: Wed, 6 Oct 2021 15:09:20 GMT
- Title: Linear Convergence of Generalized Mirror Descent with Time-Dependent
Mirrors
- Authors: Adityanarayanan Radhakrishnan and Mikhail Belkin and Caroline Uhler
- Abstract summary: We present a PL-based analysis for a generalization of mirror descent with a possibly time-dependent mirror.
Our result establishes sufficient conditions and provides learning rates for linear convergence of stochastic mirror descent and Adagrad.
For locally PL* functions, our analysis implies existence of an interpolating solution and convergence of GMD to this solution.
- Score: 23.738242876364865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Polyak-Lojasiewicz (PL) inequality is a sufficient condition for
establishing linear convergence of gradient descent, even in non-convex
settings. While several recent works use a PL-based analysis to establish
linear convergence of stochastic gradient descent methods, the question remains
as to whether a similar analysis can be conducted for more general optimization
methods. In this work, we present a PL-based analysis for linear convergence of
generalized mirror descent (GMD), a generalization of mirror descent with a
possibly time-dependent mirror. GMD subsumes popular first order optimization
methods including gradient descent, mirror descent, and preconditioned gradient
descent methods such as Adagrad. Since the standard PL analysis cannot be
extended naturally from GMD to stochastic GMD, we present a Taylor-series based
analysis to establish sufficient conditions for linear convergence of
stochastic GMD. As a corollary, our result establishes sufficient conditions
and provides learning rates for linear convergence of stochastic mirror descent
and Adagrad. Lastly, for functions that are locally PL*, our analysis implies
existence of an interpolating solution and convergence of GMD to this solution.
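For readers skimming this entry, the two objects named in the abstract can be summarized as follows. This is a hedged paraphrase using a generic time-dependent mirror map $g_t$, not the paper's exact definitions or constants.

```latex
% Polyak-Lojasiewicz (PL) inequality, standard form:
\tfrac{1}{2}\,\lVert \nabla f(x) \rVert^{2} \;\ge\; \mu \bigl( f(x) - f^{*} \bigr)
\quad \text{for some } \mu > 0 \text{ and all } x.

% A generalized-mirror-descent-style update with a time-dependent mirror g_t
% (written abstractly; the paper's exact formulation may differ):
g_t(x_{t+1}) \;=\; g_t(x_t) \;-\; \eta_t\, \nabla f(x_t).
```

In code, such an update only needs the mirror map and its inverse. The sketch below is a minimal illustration (function names such as `gmd_step` are ours, not the paper's) of how gradient descent and Adagrad-style preconditioning arise as special cases.

```python
import numpy as np

def gmd_step(x, grad_fx, lr, mirror, mirror_inv):
    """One generalized-mirror-descent-style step: map the iterate into the
    mirror (dual) space, take a gradient step there, and map back.  The
    mirror may change from iteration to iteration (time-dependent mirror)."""
    return mirror_inv(mirror(x) - lr * grad_fx)

def gd_step(x, grad_fx, lr):
    """Gradient descent is the special case of an identity mirror."""
    return gmd_step(x, grad_fx, lr, lambda z: z, lambda z: z)

def adagrad_like_step(x, grad_fx, lr, G, eps=1e-8):
    """Adagrad-style preconditioning as a diagonal, time-dependent mirror
    built from accumulated squared gradients G (state kept by the caller)."""
    scale = np.sqrt(G) + eps
    return gmd_step(x, grad_fx, lr, lambda z: scale * z, lambda z: z / scale)
```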
Related papers
- Entropic Mirror Descent for Linear Systems: Polyak's Stepsize and Implicit Bias [55.72269695392027]
This paper focuses on applying entropic mirror descent to solve linear systems. The main challenge for the convergence analysis stems from the unboundedness of the domain. To overcome this without imposing restrictive assumptions, we introduce a variant of Polyak-type stepsizes.
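A minimal sketch of what such an iteration can look like for a consistent linear system, assuming the classical entropic (exponentiated-gradient) update on the positive orthant and the textbook Polyak stepsize with optimal value zero; the paper's actual stepsize variant and domain may differ.

```python
import numpy as np

def entropic_md_polyak(A, b, x0, iters=500):
    """Entropic mirror descent on f(x) = 0.5*||Ax - b||^2 over the positive
    orthant, with a Polyak-type stepsize f(x)/||grad f(x)||^2 (valid when
    the system is consistent, so the optimal value is 0)."""
    x = x0.copy()
    for _ in range(iters):
        r = A @ x - b
        g = A.T @ r                      # gradient of the least-squares loss
        f = 0.5 * r @ r
        if f < 1e-12:
            break
        eta = f / (g @ g + 1e-12)        # Polyak-type stepsize with f* = 0
        x = x * np.exp(-eta * g)         # entropic (multiplicative) update
    return x
```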
arXiv Detail & Related papers (2025-05-05T12:33:18Z) - Euclidean Distance Matrix Completion via Asymmetric Projected Gradient Descent [13.27202712518471]
This paper proposes and analyzes a gradient-type algorithm based on Burer-Monteiro factorization, called Asymmetric Projected Gradient Descent (APGD).
arXiv Detail & Related papers (2025-04-28T07:13:23Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
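One way to read "linear interpolation" here is a Krasnosel'skii-Mann-style averaging of an inner optimizer step, as in lookahead-type schemes. The sketch below is a generic illustration under that reading, not the paper's exact algorithm.

```python
def interpolated_step(params, inner_update, alpha=0.5):
    """Relax an inner update T(params) by linear interpolation:
    params <- (1 - alpha) * params + alpha * T(params).
    For a nonexpansive T, this Krasnosel'skii-Mann-style averaging is the
    classical route to stabilizing otherwise oscillatory iterations."""
    proposal = inner_update(params)          # e.g. one (or several) GDA steps
    return (1.0 - alpha) * params + alpha * proposal
```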
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Distributionally Time-Varying Online Stochastic Optimization under
Polyak-Łojasiewicz Condition with Application in Conditional Value-at-Risk
Statistical Learning [9.749745086213215]
We consider a sequence of optimization problems following a time-varying distribution via the lens of online optimization.
We show that the framework can be applied to the Conditional Value-at-Risk (CVaR) learning problem.
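For context, one common empirical CVaR estimator (standard definition, not taken from the paper) is sketched below.

```python
import numpy as np

def empirical_cvar(losses, alpha=0.95):
    """Conditional Value-at-Risk at level alpha: the average of the worst
    (1 - alpha) fraction of the observed losses."""
    losses = np.sort(np.asarray(losses))
    k = int(np.ceil((1.0 - alpha) * len(losses)))   # number of tail samples
    return losses[-k:].mean()
```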
arXiv Detail & Related papers (2023-09-18T00:47:08Z) - Curvature-Independent Last-Iterate Convergence for Games on Riemannian
Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
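As an illustration of a curvature-agnostic fixed step on a manifold, here is a generic Riemannian gradient step on the unit sphere (tangent-space projection followed by a retraction). The paper itself concerns game dynamics and last-iterate rates, so this is only a simplified single-player sketch.

```python
import numpy as np

def sphere_rgd_step(x, euclid_grad, step):
    """One Riemannian gradient step on the unit sphere with a fixed,
    curvature-agnostic step size: project the Euclidean gradient onto the
    tangent space at x, move, then retract back to the sphere."""
    rgrad = euclid_grad - (x @ euclid_grad) * x   # tangent-space projection
    y = x - step * rgrad
    return y / np.linalg.norm(y)                  # retraction (normalization)
```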
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - Mirror Descent with Relative Smoothness in Measure Spaces, with
application to Sinkhorn and EM [11.007661197604065]
This paper studies the convergence of the mirror descent algorithm in an infinite-dimensional setting.
Applying our result to joint distributions and the Kullback--Leibler divergence, we show that Sinkhorn's primal iterations for optimal transport correspond to a mirror descent.
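Since the entry identifies Sinkhorn's primal iterations with mirror descent on the Kullback--Leibler divergence, here are the standard Sinkhorn scaling iterations for entropic optimal transport (textbook form, not the paper's measure-space derivation).

```python
import numpy as np

def sinkhorn(C, a, b, eps=0.1, iters=1000):
    """Standard Sinkhorn iterations for entropic optimal transport between
    histograms a and b with cost matrix C and regularization eps."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(iters):
        u = a / (K @ v)                  # alternate KL projections
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]   # transport plan
```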
arXiv Detail & Related papers (2022-06-17T16:19:47Z) - Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector
Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
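A hedged sketch of a scaled-gradient-style update for the top generalized eigenvector of a pair (A, B), i.e. maximizing x^T A x subject to x^T B x = 1. The stochastic version studied in the paper replaces A and B with per-sample estimates, and its exact scaling may differ.

```python
import numpy as np

def scaled_gradient_step(x, A, B, step):
    """One deterministic scaled-gradient-style step toward the top
    generalized eigenvector of (A, B); the Rayleigh-quotient term keeps the
    iterate aligned with the constraint x^T B x = 1."""
    rho = (x @ (A @ x)) / (x @ (B @ x))        # generalized Rayleigh quotient
    x = x + step * (A @ x - rho * (B @ x))     # scaled-gradient ascent direction
    return x / np.sqrt(x @ (B @ x))            # renormalize in the B-norm
```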
arXiv Detail & Related papers (2021-12-29T18:46:52Z) - Leveraging Non-uniformity in First-order Non-convex Optimization [93.6817946818977]
Non-uniform refinement of objective functions leads to Non-uniform Smoothness (NS) and the Non-uniform Łojasiewicz inequality (NŁ).
The new definitions inspire new geometry-aware first-order methods that converge to global optimality faster than the classical $\Omega(1/t^2)$ lower bounds.
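For comparison with the PL condition used in the main paper, the non-uniform Łojasiewicz inequality has (up to the paper's exact constants) the following form; the uniform PL case is recovered when the exponent is 1/2 and C(x) is constant. This is a hedged paraphrase, not a quotation.

```latex
\lVert \nabla f(x) \rVert \;\ge\; C(x)\,\bigl\lvert f(x) - f^{*} \bigr\rvert^{\xi},
\qquad \xi \in [0, 1].
```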
arXiv Detail & Related papers (2021-05-13T04:23:07Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
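A minimal sketch of the setting: single-pass, constant-stepsize SGD on the squared loss for linear regression with iterate (tail-)averaging, as is common in this literature. Details such as the exact averaging window are assumptions of the sketch, not taken from the paper.

```python
import numpy as np

def constant_stepsize_sgd(X, y, step=0.01, tail_frac=0.5):
    """Single-pass SGD with a constant step size on 0.5*(x_i @ w - y_i)^2,
    returning the average of the last `tail_frac` fraction of iterates."""
    n, d = X.shape
    w = np.zeros(d)
    iterates = []
    for i in range(n):
        resid = X[i] @ w - y[i]
        w = w - step * resid * X[i]      # stochastic gradient step
        iterates.append(w.copy())
    tail = iterates[int((1 - tail_frac) * n):]
    return np.mean(tail, axis=0)
```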
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Exact Linear Convergence Rate Analysis for Low-Rank Symmetric Matrix
Completion via Gradient Descent [22.851500417035947]
Factorization-based gradient descent is a scalable and efficient algorithm for solving the low-rank matrix completion problem.
We show that gradient descent enjoys fast linear convergence to the global solution of this problem.
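A generic sketch of factorization-based gradient descent for symmetric PSD matrix completion, minimizing the squared error on observed entries of X X^T. The initialization and scaling analyzed in the paper for its exact linear rate are not reproduced here.

```python
import numpy as np

def factored_gd_completion(M_obs, mask, rank, step=0.01, iters=1000):
    """Gradient descent on f(X) = 0.5 * || mask * (X X^T - M_obs) ||_F^2,
    where `mask` (assumed symmetric) marks observed entries."""
    n = M_obs.shape[0]
    X = 0.1 * np.random.randn(n, rank)      # small random initialization
    for _ in range(iters):
        R = mask * (X @ X.T - M_obs)        # residual on observed entries
        X = X - step * (R + R.T) @ X        # gradient of f with respect to X
    return X @ X.T
```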
arXiv Detail & Related papers (2021-02-04T03:41:54Z) - Optimal Sample Complexity of Subgradient Descent for Amplitude Flow via
Non-Lipschitz Matrix Concentration [12.989855325491163]
We consider the problem of recovering a real-valued $n$-dimensional signal from $m$ phaseless, linear measurements.
We establish local convergence of subgradient descent with optimal sample complexity based on the uniform concentration of a random, discontinuous matrix-valued operator.
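A hedged sketch of one subgradient step on the amplitude-based loss for phase retrieval, f(x) = (1/2m) * sum_i (|a_i^T x| - b_i)^2. The initialization and sample-complexity conditions from the paper are not modeled here.

```python
import numpy as np

def amplitude_flow_step(x, A, b, step):
    """One subgradient step on the nonsmooth amplitude loss
    f(x) = (1/2m) * sum_i (|a_i^T x| - b_i)^2 for phaseless measurements b."""
    m = A.shape[0]
    z = A @ x
    subgrad = A.T @ ((np.abs(z) - b) * np.sign(z)) / m
    return x - step * subgrad
```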
arXiv Detail & Related papers (2020-10-31T15:03:30Z) - Asymptotic Errors for Teacher-Student Convex Generalized Linear Models
(or : How to Prove Kabashima's Replica Formula) [23.15629681360836]
We prove an analytical formula for the reconstruction performance of convex generalized linear models.
We show that an analytical continuation may be carried out to extend the result to convex (but not strongly convex) problems.
We illustrate our claim with numerical examples on mainstream learning methods.
arXiv Detail & Related papers (2020-06-11T16:26:35Z) - On the Convergence Rate of Projected Gradient Descent for a
Back-Projection based Objective [58.33065918353532]
We consider a back-projection (BP) based fidelity term as an alternative to the common least squares (LS) term.
We show that using the BP term, rather than the LS term, requires fewer iterations of optimization algorithms.
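To make the comparison concrete, here is a hedged sketch contrasting projected gradient steps on the LS term with steps on a back-projection-style term built from the pseudo-inverse (for full-row-rank A the BP gradient reduces to A^+(Ax - y)). The projection `project_C` is a placeholder for whatever constraint or prior set is used, not an operator specified by the paper.

```python
import numpy as np

def pgd(A, y, project_C, step=0.5, iters=200, use_bp=True):
    """Projected gradient descent with either the least-squares fidelity
    0.5*||Ax - y||^2 or a back-projection-style fidelity 0.5*||A^+(Ax - y)||^2
    (for full-row-rank A the latter's gradient is A^+(Ax - y))."""
    A_pinv = np.linalg.pinv(A)
    x = A_pinv @ y                            # a natural starting point
    for _ in range(iters):
        r = A @ x - y
        grad = A_pinv @ r if use_bp else A.T @ r
        x = project_C(x - step * grad)
    return x
```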
arXiv Detail & Related papers (2020-05-03T00:58:23Z)