Implicit Bias of Gradient Descent on Reparametrized Models: On
Equivalence to Mirror Descent
- URL: http://arxiv.org/abs/2207.04036v1
- Date: Fri, 8 Jul 2022 17:47:11 GMT
- Title: Implicit Bias of Gradient Descent on Reparametrized Models: On
Equivalence to Mirror Descent
- Authors: Zhiyuan Li, Tianhao Wang, Jason D. Lee, Sanjeev Arora
- Abstract summary: Gradient flow with any commuting parametrization is equivalent to continuous mirror descent with a related Legendre function.
Conversely, continuous mirror descent with any Legendre function can be viewed as gradient flow with a related commuting parametrization.
- Score: 64.26008239544085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As part of the effort to understand implicit bias of gradient descent in
overparametrized models, several results have shown how the training trajectory
on the overparametrized model can be understood as mirror descent on a
different objective. The main result here is a characterization of this
phenomenon under a notion termed commuting parametrization, which encompasses
all the previous results in this setting. It is shown that gradient flow with
any commuting parametrization is equivalent to continuous mirror descent with a
related Legendre function. Conversely, continuous mirror descent with any
Legendre function can be viewed as gradient flow with a related commuting
parametrization. The latter result relies upon Nash's embedding theorem.
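Illustration (editorial note, not part of the abstract; the standard quadratic reparametrization is used here as a hedged example rather than the paper's own construction): take $x = G(w) = w \odot w$ on the positive orthant. Gradient flow on $w$ for the loss $L(G(w))$ reads $\dot{w} = -\nabla_w L(G(w)) = -2\, w \odot \nabla L(x)$, which induces $\dot{x} = 2\, w \odot \dot{w} = -4\, x \odot \nabla L(x)$. This is exactly continuous mirror descent $\frac{d}{dt}\nabla R(x(t)) = -\nabla L(x(t))$ with the Legendre function $R(x) = \tfrac{1}{4}\sum_i \big(x_i \log x_i - x_i\big)$, since $\nabla^2 R(x) = \tfrac{1}{4}\,\mathrm{diag}(1/x_1,\dots,1/x_n)$ and hence $\nabla^2 R(x)\,\dot{x} = -\nabla L(x)$. The coordinate vector fields $\nabla_w G_i(w) = 2 w_i e_i$ pairwise commute, so this is a simple instance of the kind of parametrization the result covers.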
Related papers
- A Mirror Descent Perspective of Smoothed Sign Descent [14.205909074145598]
We study the dynamics of smoothed sign descent with a stability constant $\varepsilon$ for regression problems.
By studying dual dynamics, we characterize the convergent solution as an approximate KKT point of minimizing a Bregman divergence style function.
arXiv Detail & Related papers (2024-10-18T03:52:21Z)
- Intrinsic Bayesian Cramér-Rao Bound with an Application to Covariance Matrix Estimation [49.67011673289242]
This paper presents a new performance bound for estimation problems where the parameter to estimate lies in a smooth manifold.
It induces a geometry for the parameter manifold, as well as an intrinsic notion of the estimation error measure.
arXiv Detail & Related papers (2023-11-08T15:17:13Z)
- On Learning Gaussian Multi-index Models with Gradient Flow [57.170617397894404]
We study gradient flow on the multi-index regression problem for high-dimensional Gaussian data.
We consider a two-timescale algorithm, whereby the low-dimensional link function is learnt with a non-parametric model infinitely faster than the subspace parametrizing the low-rank projection.
arXiv Detail & Related papers (2023-10-30T17:55:28Z)
- Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z)
- Stochastic Mirror Descent in Average Ensemble Models [38.38572705720122]
Stochastic mirror descent (SMD) is a general class of training algorithms, which includes the celebrated stochastic gradient descent (SGD) as a special case.
In this paper we explore the performance of the SMD iterates on mean-field ensemble models.
arXiv Detail & Related papers (2022-10-27T11:04:00Z)
- Provable Phase Retrieval with Mirror Descent [1.1662472705038338]
We consider the problem of phase retrieval, which consists of recovering an $n$-dimensional real vector from the magnitudes of its $m$ linear measurements.
We show that when the number of measurements $m$ is large enough, then with high probability, for almost all initializers, the original vector is recovered up to a sign.
arXiv Detail & Related papers (2022-10-17T16:40:02Z)
- The Equalization Losses: Gradient-Driven Training for Long-tailed Object Recognition [84.51875325962061]
We propose a gradient-driven training mechanism to tackle the long-tail problem.
We introduce a new family of gradient-driven loss functions, namely equalization losses.
Our method consistently outperforms the baseline models.
arXiv Detail & Related papers (2022-10-11T16:00:36Z)
- Mirror Descent with Relative Smoothness in Measure Spaces, with application to Sinkhorn and EM [11.007661197604065]
This paper studies the convergence of the mirror descent algorithm in an infinite-dimensional setting.
Applying our result to joint distributions and the Kullback--Leibler divergence, we show that Sinkhorn's primal iterations for optimal transport correspond to a mirror descent.
arXiv Detail & Related papers (2022-06-17T16:19:47Z)
- Implicit Regularization Properties of Variance Reduced Stochastic Mirror Descent [7.00422423634143]
We prove that the discrete VRSMD estimator sequence converges to the minimum mirror interpolant in linear regression.
We derive a model estimation accuracy result in the setting when the true model is sparse.
arXiv Detail & Related papers (2022-04-29T19:37:24Z)
- Channel-Directed Gradients for Optimization of Convolutional Neural Networks [50.34913837546743]
We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error.
We show that defining the gradients along the output channel direction leads to a performance boost, while other directions can be detrimental.
arXiv Detail & Related papers (2020-08-25T00:44:09Z)