A Mirror Descent Perspective of Smoothed Sign Descent
- URL: http://arxiv.org/abs/2410.14158v1
- Date: Fri, 18 Oct 2024 03:52:21 GMT
- Title: A Mirror Descent Perspective of Smoothed Sign Descent
- Authors: Shuyang Wang, Diego Klabjan
- Abstract summary: We study the dynamics of smoothed sign descent with a stability constant $\varepsilon$ for regression problems.
By studying dual dynamics, we characterize the convergent solution as an approximate KKT point of minimizing a Bregman divergence style function.
- Score: 14.205909074145598
- License:
- Abstract: Recent work by Woodworth et al. (2020) shows that the optimization dynamics of gradient descent for overparameterized problems can be viewed as low-dimensional dual dynamics induced by a mirror map, explaining the implicit regularization phenomenon from the mirror descent perspective. However, the methodology does not apply to algorithms where update directions deviate from true gradients, such as ADAM. We use the mirror descent framework to study the dynamics of smoothed sign descent with a stability constant $\varepsilon$ for regression problems. We propose a mirror map that establishes equivalence to dual dynamics under some assumptions. By studying dual dynamics, we characterize the convergent solution as an approximate KKT point of minimizing a Bregman divergence style function, and show the benefit of tuning the stability constant $\varepsilon$ to reduce the KKT error.
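As a concrete illustration of the setting in the abstract, below is a minimal, hypothetical sketch of smoothed sign descent on an overparameterized least-squares problem. It assumes the common elementwise update $w \leftarrow w - \eta \, \nabla L(w)/(|\nabla L(w)| + \varepsilon)$, which reduces to sign descent as $\varepsilon \to 0$; the problem sizes, step size, and variable names are illustrative only, and this is not the authors' code.

```python
import numpy as np

# Minimal sketch (not the authors' code): smoothed sign descent on an
# overparameterized least-squares problem y = X w, where each gradient
# coordinate is divided by its magnitude plus a stability constant eps.
# The update rule below is an assumed, standard form of "smoothed sign
# descent"; eps -> 0 recovers sign descent.

rng = np.random.default_rng(0)
n, d = 20, 100                       # fewer samples than parameters
X = rng.standard_normal((n, d))
w_star = np.zeros(d)
w_star[:5] = 1.0                     # sparse ground truth (illustrative)
y = X @ w_star

def loss_and_grad(w):
    r = X @ w - y
    return 0.5 * float(r @ r), X.T @ r

def smoothed_sign_descent(eps, lr=1e-3, steps=20_000):
    w = np.zeros(d)
    for _ in range(steps):
        _, g = loss_and_grad(w)
        w -= lr * g / (np.abs(g) + eps)   # smoothed sign update
    return w

for eps in (1e-1, 1e-3):
    w = smoothed_sign_descent(eps)
    loss, _ = loss_and_grad(w)
    print(f"eps={eps:g}  train loss={loss:.2e}  ||w||_1={np.linalg.norm(w, 1):.3f}")
```

Sweeping $\varepsilon$ in a toy run like this is one informal way to observe the effect of the stability constant, whose role the paper characterizes through dual dynamics and KKT error.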
Related papers
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametrized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by a magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - On Learning Gaussian Multi-index Models with Gradient Flow [57.170617397894404]
We study gradient flow on the multi-index regression problem for high-dimensional Gaussian data.
We consider a two-timescale algorithm, whereby the low-dimensional link function is learnt with a non-parametric model infinitely faster than the subspace parametrizing the low-rank projection.
arXiv Detail & Related papers (2023-10-30T17:55:28Z) - Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent [64.26008239544085]
Gradient flow with any commuting parametrization is equivalent to continuous mirror descent with a related Legendre function (a schematic form of this correspondence is sketched after the related-papers list).
Conversely, continuous mirror descent with any Legendre function can be viewed as gradient flow with a related commuting parametrization.
arXiv Detail & Related papers (2022-07-08T17:47:11Z) - Mirror Descent with Relative Smoothness in Measure Spaces, with application to Sinkhorn and EM [11.007661197604065]
This paper studies the convergence of the mirror descent algorithm in an infinite-dimensional setting.
Applying our result to joint distributions and the Kullback--Leibler divergence, we show that Sinkhorn's primal iterations for optimal transport correspond to a mirror descent.
arXiv Detail & Related papers (2022-06-17T16:19:47Z) - Beyond the Edge of Stability via Two-step Gradient Updates [49.03389279816152]
Gradient Descent (GD) is a powerful workhorse of modern machine learning.
GD's ability to find local minimisers is only guaranteed for losses with Lipschitz gradients.
This work focuses on simple, yet representative, learning problems via analysis of two-step gradient updates.
arXiv Detail & Related papers (2022-06-08T21:32:50Z) - Non-convex online learning via algorithmic equivalence [30.038975353298117]
An algorithmic equivalence technique between non-convex gradient descent and convex mirror descent is presented.
Our analysis is based on a new and simple algorithmic method and proves an $O(T^{2/3})$ regret bound.
arXiv Detail & Related papers (2022-05-30T16:50:34Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
Gradient Langevin Dynamics is one of the most fundamental algorithms to solve non-convex optimization problems.
In this paper, we show two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - On the Role of Optimization in Double Descent: A Least Squares Study [30.44215064390409]
We show an excess risk bound for the gradient descent solution of the least squares objective.
We find that in case of noiseless regression, double descent is explained solely by optimization-related quantities.
We empirically explore if our predictions hold for neural networks.
arXiv Detail & Related papers (2021-07-27T09:13:11Z) - A Continuous-Time Mirror Descent Approach to Sparse Phase Retrieval [24.17778927729799]
We analyze continuous-time mirror descent applied to sparse phase retrieval.
This is the problem of recovering sparse signals from a set of magnitude-only measurements.
We provide a convergence analysis of the algorithm for this problem.
arXiv Detail & Related papers (2020-10-20T10:03:44Z) - On regularization of gradient descent, layer imbalance and flat minima [9.08659783613403]
We analyze the training dynamics of deep linear networks using a new metric, layer imbalance, which defines the flatness of a solution.
We demonstrate that different regularization methods, such as weight decay or noise data augmentation, behave in a similar way.
arXiv Detail & Related papers (2020-07-18T00:09:14Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
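As referenced above in the entry on implicit bias of gradient descent on reparametrized models, the gradient-flow/mirror-descent correspondence can be sketched schematically (an informal sketch only; the precise commuting-parametrization conditions and the construction of the Legendre function are given in that paper, and the notation $g$, $\phi$ here is illustrative): for a parametrization $x = g(w)$ and loss $L$, gradient flow in parameter space,
$$\dot{w}(t) = -\nabla_w L\big(g(w(t))\big),$$
induces on $x(t) = g(w(t))$ the continuous-time mirror descent dynamics
$$\frac{d}{dt}\,\nabla \phi\big(x(t)\big) = -\nabla_x L\big(x(t)\big)$$
for a Legendre function $\phi$ determined by $g$. The paper summarized at the top of this page establishes an analogous dual description when the update direction is the smoothed sign of the gradient rather than the gradient itself.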