Linear Convergence of Generalized Mirror Descent with Time-Dependent
Mirrors
- URL: http://arxiv.org/abs/2009.08574v2
- Date: Wed, 6 Oct 2021 15:09:20 GMT
- Title: Linear Convergence of Generalized Mirror Descent with Time-Dependent
Mirrors
- Authors: Adityanarayanan Radhakrishnan and Mikhail Belkin and Caroline Uhler
- Abstract summary: We present a PL-based analysis for a generalization of mirror descent with a possibly time-dependent mirror.
Our result establishes sufficient conditions and provides learning rates for linear convergence of stochastic mirror descent and Adagrad.
For locally PL* functions, our analysis implies existence of an interpolating solution and convergence of GMD to this solution.
- Score: 23.738242876364865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Polyak-Lojasiewicz (PL) inequality is a sufficient condition for
establishing linear convergence of gradient descent, even in non-convex
settings. While several recent works use a PL-based analysis to establish
linear convergence of stochastic gradient descent methods, the question remains
as to whether a similar analysis can be conducted for more general optimization
methods. In this work, we present a PL-based analysis for linear convergence of
generalized mirror descent (GMD), a generalization of mirror descent with a
possibly time-dependent mirror. GMD subsumes popular first order optimization
methods including gradient descent, mirror descent, and preconditioned gradient
descent methods such as Adagrad. Since the standard PL analysis cannot be
extended naturally from GMD to stochastic GMD, we present a Taylor-series based
analysis to establish sufficient conditions for linear convergence of
stochastic GMD. As a corollary, our result establishes sufficient conditions
and provides learning rates for linear convergence of stochastic mirror descent
and Adagrad. Lastly, for functions that are locally PL*, our analysis implies
existence of an interpolating solution and convergence of GMD to this solution.
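For readers skimming this entry, the two objects named in the abstract can be summarized as follows. This is a hedged paraphrase using a generic time-dependent mirror map $g_t$, not the paper's exact definitions or constants.

```latex
% Polyak-Lojasiewicz (PL) inequality, standard form:
\tfrac{1}{2}\,\lVert \nabla f(x) \rVert^{2} \;\ge\; \mu \bigl( f(x) - f^{*} \bigr)
\quad \text{for some } \mu > 0 \text{ and all } x.

% A generalized-mirror-descent-style update with a time-dependent mirror g_t
% (written abstractly; the paper's exact formulation may differ):
g_t(x_{t+1}) \;=\; g_t(x_t) \;-\; \eta_t\, \nabla f(x_t).
```

In code, such an update only needs the mirror map and its inverse. The sketch below is a minimal illustration (function names such as `gmd_step` are ours, not the paper's) of how gradient descent and Adagrad-style preconditioning arise as special cases.

```python
import numpy as np

def gmd_step(x, grad_fx, lr, mirror, mirror_inv):
    """One generalized-mirror-descent-style step: map the iterate into the
    mirror (dual) space, take a gradient step there, and map back.  The
    mirror may change from iteration to iteration (time-dependent mirror)."""
    return mirror_inv(mirror(x) - lr * grad_fx)

def gd_step(x, grad_fx, lr):
    """Gradient descent is the special case of an identity mirror."""
    return gmd_step(x, grad_fx, lr, lambda z: z, lambda z: z)

def adagrad_like_step(x, grad_fx, lr, G, eps=1e-8):
    """Adagrad-style preconditioning as a diagonal, time-dependent mirror
    built from accumulated squared gradients G (state kept by the caller)."""
    scale = np.sqrt(G) + eps
    return gmd_step(x, grad_fx, lr, lambda z: scale * z, lambda z: z / scale)
```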
Related papers
- Entropic Mirror Descent for Linear Systems: Polyak's Stepsize and Implicit Bias [55.72269695392027]
This paper focuses on applying entropic mirror descent to solve linear systems. The main challenge for the convergence analysis stems from the unboundedness of the domain. To overcome this without imposing restrictive assumptions, we introduce a variant of Polyak-type stepsizes.
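A minimal sketch of what such an iteration can look like for a consistent linear system, assuming the classical entropic (exponentiated-gradient) update on the positive orthant and the textbook Polyak stepsize with optimal value zero; the paper's actual stepsize variant and domain may differ.

```python
import numpy as np

def entropic_md_polyak(A, b, x0, iters=500):
    """Entropic mirror descent on f(x) = 0.5*||Ax - b||^2 over the positive
    orthant, with a Polyak-type stepsize f(x)/||grad f(x)||^2 (valid when
    the system is consistent, so the optimal value is 0)."""
    x = x0.copy()
    for _ in range(iters):
        r = A @ x - b
        g = A.T @ r                      # gradient of the least-squares loss
        f = 0.5 * r @ r
        if f < 1e-12:
            break
        eta = f / (g @ g + 1e-12)        # Polyak-type stepsize with f* = 0
        x = x * np.exp(-eta * g)         # entropic (multiplicative) update
    return x
```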
arXiv Detail & Related papers (2025-05-05T12:33:18Z) - Euclidean Distance Matrix Completion via Asymmetric Projected Gradient Descent [13.27202712518471]
This paper proposes and analyzes a gradient-type algorithm based on Burer-Monteiro factorization, called Asymmetric Projected Gradient Descent (APGD).
arXiv Detail & Related papers (2025-04-28T07:13:23Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
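One way to read "linear interpolation" here is a Krasnosel'skii-Mann-style averaging of an inner optimizer step, as in lookahead-type schemes. The sketch below is a generic illustration under that reading, not the paper's exact algorithm.

```python
def interpolated_step(params, inner_update, alpha=0.5):
    """Relax an inner update T(params) by linear interpolation:
    params <- (1 - alpha) * params + alpha * T(params).
    For a nonexpansive T, this Krasnosel'skii-Mann-style averaging is the
    classical route to stabilizing otherwise oscillatory iterations."""
    proposal = inner_update(params)          # e.g. one (or several) GDA steps
    return (1.0 - alpha) * params + alpha * proposal
```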
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Distributionally Time-Varying Online Stochastic Optimization under
Polyak-Łojasiewicz Condition with Application in Conditional Value-at-Risk
Statistical Learning [9.749745086213215]
We consider a sequence of optimization problems following a time-varying distribution via the lens of online optimization.
We show that the framework can be applied to the Conditional Value-at-Risk (CVaR) learning problem.
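For context, one common empirical CVaR estimator (standard definition, not taken from the paper) is sketched below.

```python
import numpy as np

def empirical_cvar(losses, alpha=0.95):
    """Conditional Value-at-Risk at level alpha: the average of the worst
    (1 - alpha) fraction of the observed losses."""
    losses = np.sort(np.asarray(losses))
    k = int(np.ceil((1.0 - alpha) * len(losses)))   # number of tail samples
    return losses[-k:].mean()
```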
arXiv Detail & Related papers (2023-09-18T00:47:08Z) - Curvature-Independent Last-Iterate Convergence for Games on Riemannian
Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
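As an illustration of a curvature-agnostic fixed step on a manifold, here is a generic Riemannian gradient step on the unit sphere (tangent-space projection followed by a retraction). The paper itself concerns game dynamics and last-iterate rates, so this is only a simplified single-player sketch.

```python
import numpy as np

def sphere_rgd_step(x, euclid_grad, step):
    """One Riemannian gradient step on the unit sphere with a fixed,
    curvature-agnostic step size: project the Euclidean gradient onto the
    tangent space at x, move, then retract back to the sphere."""
    rgrad = euclid_grad - (x @ euclid_grad) * x   # tangent-space projection
    y = x - step * rgrad
    return y / np.linalg.norm(y)                  # retraction (normalization)
```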
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - Mirror Descent with Relative Smoothness in Measure Spaces, with
application to Sinkhorn and EM [11.007661197604065]
This paper studies the convergence of the mirror descent algorithm in an infinite-dimensional setting.
Applying our result to joint distributions and the Kullback--Leibler divergence, we show that Sinkhorn's primal iterations for optimal transport correspond to a mirror descent.
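Since the entry identifies Sinkhorn's primal iterations with mirror descent on the Kullback--Leibler divergence, here are the standard Sinkhorn scaling iterations for entropic optimal transport (textbook form, not the paper's measure-space derivation).

```python
import numpy as np

def sinkhorn(C, a, b, eps=0.1, iters=1000):
    """Standard Sinkhorn iterations for entropic optimal transport between
    histograms a and b with cost matrix C and regularization eps."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(iters):
        u = a / (K @ v)                  # alternate KL projections
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]   # transport plan
```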
arXiv Detail & Related papers (2022-06-17T16:19:47Z) - Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector
Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
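A hedged sketch of a scaled-gradient-style update for the top generalized eigenvector of a pair (A, B), i.e. maximizing x^T A x subject to x^T B x = 1. The stochastic version studied in the paper replaces A and B with per-sample estimates, and its exact scaling may differ.

```python
import numpy as np

def scaled_gradient_step(x, A, B, step):
    """One deterministic scaled-gradient-style step toward the top
    generalized eigenvector of (A, B); the Rayleigh-quotient term keeps the
    iterate aligned with the constraint x^T B x = 1."""
    rho = (x @ (A @ x)) / (x @ (B @ x))        # generalized Rayleigh quotient
    x = x + step * (A @ x - rho * (B @ x))     # scaled-gradient ascent direction
    return x / np.sqrt(x @ (B @ x))            # renormalize in the B-norm
```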
arXiv Detail & Related papers (2021-12-29T18:46:52Z) - Leveraging Non-uniformity in First-order Non-convex Optimization [93.6817946818977]
Non-uniform refinement of objective functions leads to Non-uniform Smoothness (NS) and the Non-uniform Łojasiewicz inequality (NŁ).
The new definitions inspire new geometry-aware first-order methods that converge to global optimality faster than the classical $\Omega(1/t^2)$ lower bounds.
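For comparison with the PL condition used in the main paper, the non-uniform Łojasiewicz inequality has (up to the paper's exact constants) the following form; the uniform PL case is recovered when the exponent is 1/2 and C(x) is constant. This is a hedged paraphrase, not a quotation.

```latex
\lVert \nabla f(x) \rVert \;\ge\; C(x)\,\bigl\lvert f(x) - f^{*} \bigr\rvert^{\xi},
\qquad \xi \in [0, 1].
```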
arXiv Detail & Related papers (2021-05-13T04:23:07Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
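A minimal sketch of the setting: single-pass, constant-stepsize SGD on the squared loss for linear regression with iterate (tail-)averaging, as is common in this literature. Details such as the exact averaging window are assumptions of the sketch, not taken from the paper.

```python
import numpy as np

def constant_stepsize_sgd(X, y, step=0.01, tail_frac=0.5):
    """Single-pass SGD with a constant step size on 0.5*(x_i @ w - y_i)^2,
    returning the average of the last `tail_frac` fraction of iterates."""
    n, d = X.shape
    w = np.zeros(d)
    iterates = []
    for i in range(n):
        resid = X[i] @ w - y[i]
        w = w - step * resid * X[i]      # stochastic gradient step
        iterates.append(w.copy())
    tail = iterates[int((1 - tail_frac) * n):]
    return np.mean(tail, axis=0)
```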
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Exact Linear Convergence Rate Analysis for Low-Rank Symmetric Matrix
Completion via Gradient Descent [22.851500417035947]
Factorization-based gradient descent is a scalable and efficient algorithm for solving the low-rank matrix completion problem.
We show that gradient descent enjoys fast linear convergence to the global solution of this problem.
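A generic sketch of factorization-based gradient descent for symmetric PSD matrix completion, minimizing the squared error on observed entries of X X^T. The initialization and scaling analyzed in the paper for its exact linear rate are not reproduced here.

```python
import numpy as np

def factored_gd_completion(M_obs, mask, rank, step=0.01, iters=1000):
    """Gradient descent on f(X) = 0.5 * || mask * (X X^T - M_obs) ||_F^2,
    where `mask` (assumed symmetric) marks observed entries."""
    n = M_obs.shape[0]
    X = 0.1 * np.random.randn(n, rank)      # small random initialization
    for _ in range(iters):
        R = mask * (X @ X.T - M_obs)        # residual on observed entries
        X = X - step * (R + R.T) @ X        # gradient of f with respect to X
    return X @ X.T
```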
arXiv Detail & Related papers (2021-02-04T03:41:54Z) - Optimal Sample Complexity of Subgradient Descent for Amplitude Flow via
Non-Lipschitz Matrix Concentration [12.989855325491163]
We consider the problem of recovering a real-valued $n$-dimensional signal from $m$ phaseless, linear measurements.
We establish local convergence of subgradient descent with optimal sample complexity based on the uniform concentration of a random, discontinuous matrix-valued operator.
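A hedged sketch of one subgradient step on the amplitude-based loss for phase retrieval, f(x) = (1/2m) * sum_i (|a_i^T x| - b_i)^2. The initialization and sample-complexity conditions from the paper are not modeled here.

```python
import numpy as np

def amplitude_flow_step(x, A, b, step):
    """One subgradient step on the nonsmooth amplitude loss
    f(x) = (1/2m) * sum_i (|a_i^T x| - b_i)^2 for phaseless measurements b."""
    m = A.shape[0]
    z = A @ x
    subgrad = A.T @ ((np.abs(z) - b) * np.sign(z)) / m
    return x - step * subgrad
```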
arXiv Detail & Related papers (2020-10-31T15:03:30Z) - Asymptotic Errors for Teacher-Student Convex Generalized Linear Models
(or : How to Prove Kabashima's Replica Formula) [23.15629681360836]
We prove an analytical formula for the reconstruction performance of convex generalized linear models.
We show that an analytical continuation may be carried out to extend the result to convex (but not strongly convex) problems.
We illustrate our claim with numerical examples on mainstream learning methods.
arXiv Detail & Related papers (2020-06-11T16:26:35Z) - On the Convergence Rate of Projected Gradient Descent for a
Back-Projection based Objective [58.33065918353532]
We consider a back-projection (BP) based fidelity term as an alternative to the common least squares (LS) term.
We show that using the BP term, rather than the LS term, requires fewer iterations of optimization algorithms.
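To make the comparison concrete, here is a hedged sketch contrasting projected gradient steps on the LS term with steps on a back-projection-style term built from the pseudo-inverse (for full-row-rank A the BP gradient reduces to A^+(Ax - y)). The projection `project_C` is a placeholder for whatever constraint or prior set is used, not an operator specified by the paper.

```python
import numpy as np

def pgd(A, y, project_C, step=0.5, iters=200, use_bp=True):
    """Projected gradient descent with either the least-squares fidelity
    0.5*||Ax - y||^2 or a back-projection-style fidelity 0.5*||A^+(Ax - y)||^2
    (for full-row-rank A the latter's gradient is A^+(Ax - y))."""
    A_pinv = np.linalg.pinv(A)
    x = A_pinv @ y                            # a natural starting point
    for _ in range(iters):
        r = A @ x - y
        grad = A_pinv @ r if use_bp else A.T @ r
        x = project_C(x - step * grad)
    return x
```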
arXiv Detail & Related papers (2020-05-03T00:58:23Z)