Mirror Descent with Relative Smoothness in Measure Spaces, with
application to Sinkhorn and EM
- URL: http://arxiv.org/abs/2206.08873v1
- Date: Fri, 17 Jun 2022 16:19:47 GMT
- Title: Mirror Descent with Relative Smoothness in Measure Spaces, with
application to Sinkhorn and EM
- Authors: Pierre-Cyril Aubin-Frankowski and Anna Korba and Flavien Léger
- Abstract summary: This paper studies the convergence of the mirror descent algorithm in an infinite-dimensional setting.
Applying our result to joint distributions and the Kullback--Leibler divergence, we show that Sinkhorn's primal iterations for optimal transport correspond to a mirror descent.
- Score: 11.007661197604065
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Many problems in machine learning can be formulated as optimizing a convex
functional over a space of measures. This paper studies the convergence of the
mirror descent algorithm in this infinite-dimensional setting. Defining Bregman
divergences through directional derivatives, we derive the convergence of the
scheme for relatively smooth and strongly convex pairs of functionals. Applying
our result to joint distributions and the Kullback--Leibler (KL) divergence, we
show that Sinkhorn's primal iterations for entropic optimal transport in the
continuous setting correspond to a mirror descent, and we obtain a new proof of
its (sub)linear convergence. We also show that Expectation Maximization (EM)
can always formally be written as a mirror descent, and, when optimizing on the
latent distribution while fixing the mixtures, we derive sublinear rates of
convergence.
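To make the Sinkhorn connection concrete, here is a minimal discrete sketch (the paper works with general measures, so the finite cost matrix C, marginals mu and nu, and regularization eps below are illustrative assumptions, not the paper's setting): the alternating marginal-scaling loop implements Sinkhorn's primal iterations, which the abstract says correspond to a mirror descent when the Bregman divergence is taken to be the KL.

```python
import numpy as np

def sinkhorn_primal(C, mu, nu, eps=0.1, n_iters=200):
    """Entropic optimal transport in its primal (coupling-space) form:
    start from the Gibbs kernel and alternately rescale rows and columns
    to match the marginals. Discrete, illustrative sketch only; the paper
    treats general measures."""
    K = np.exp(-C / eps)              # Gibbs kernel exp(-cost/eps)
    pi = K / K.sum()                  # initial coupling, normalized
    for _ in range(n_iters):
        pi *= (mu / pi.sum(axis=1))[:, None]   # KL projection onto {first marginal = mu}
        pi *= (nu / pi.sum(axis=0))[None, :]   # KL projection onto {second marginal = nu}
    return pi

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y = rng.normal(size=5), rng.normal(size=5)
    C = (x[:, None] - y[None, :]) ** 2          # squared-distance cost
    mu = nu = np.full(5, 0.2)                   # uniform marginals
    pi = sinkhorn_primal(C, mu, nu)
    print(pi.sum(axis=1), pi.sum(axis=0))       # both close to the target marginals
```

Each half-step rescales the coupling so that one marginal constraint holds exactly, i.e. it is a KL projection onto that constraint set; the paper's contribution is to interpret the resulting iteration as a mirror descent step in measure space and to deduce (sub)linear rates from relative smoothness and convexity.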
Related papers
- Taming Nonconvex Stochastic Mirror Descent with General Bregman
Divergence [25.717501580080846]
This paper revisits the convergence of stochastic mirror descent (SMD) in the contemporary nonconvex optimization setting.
For the training of linear networks, we develop provably convergent algorithms.
arXiv Detail & Related papers (2024-02-27T17:56:49Z) - Mirror Descent-Ascent for mean-field min-max problems [0.0]
We study two variants of the mirror descent-ascent algorithm for solving min-max problems on the space of measures.
We show that the convergence rates to mixed Nash equilibria, measured in the Nikaido-Isoda error, are of order $\mathcal{O}\left(N^{-1/2}\right)$ and $\mathcal{O}\left(N^{-2/3}\right)$ for the simultaneous and sequential schemes, respectively.
arXiv Detail & Related papers (2024-02-12T22:52:32Z) - Distributed Markov Chain Monte Carlo Sampling based on the Alternating
Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z) - Symmetric Mean-field Langevin Dynamics for Distributional Minimax
Problems [78.96969465641024]
We extend mean-field Langevin dynamics to minimax optimization over probability distributions for the first time with symmetric and provably convergent updates.
We also study time and particle discretization regimes and prove a new uniform-in-time propagation of chaos result.
arXiv Detail & Related papers (2023-12-02T13:01:29Z) - On Learning Gaussian Multi-index Models with Gradient Flow [57.170617397894404]
We study gradient flow on the multi-index regression problem for high-dimensional Gaussian data.
We consider a two-timescale algorithm, whereby the low-dimensional link function is learnt with a non-parametric model infinitely faster than the subspace parametrizing the low-rank projection.
arXiv Detail & Related papers (2023-10-30T17:55:28Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Curvature-Independent Last-Iterate Convergence for Games on Riemannian
Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - Stochastic Mirror Descent: Convergence Analysis and Adaptive Variants
via the Mirror Stochastic Polyak Stepsize [20.376216873620763]
We investigate the convergence of stochastic mirror descent (SMD) under interpolation in relatively smooth and smooth convex optimization.
We propose a new adaptive stepsize scheme, the mirror stochastic Polyak stepsize (mSPS); a finite-dimensional sketch of this kind of update appears after this list.
arXiv Detail & Related papers (2021-10-28T19:49:40Z) - Implicit differentiation for fast hyperparameter selection in non-smooth
convex learning [87.60600646105696]
We study first-order methods when the inner optimization problem is convex but non-smooth.
We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yield sequences of Jacobians converging toward the exact Jacobian.
arXiv Detail & Related papers (2021-05-04T17:31:28Z) - Efficient constrained sampling via the mirror-Langevin algorithm [9.061408029414455]
We propose a new discretization of the mirror-Langevin diffusion and give a crisp proof of its convergence.
For the task of sampling from a log-concave distribution supported on a compact set, our theoretical results are significantly better than the existing guarantees.
arXiv Detail & Related papers (2020-10-30T11:54:24Z) - Distributed Mirror Descent with Integral Feedback: Asymptotic
Convergence Analysis of Continuous-time Dynamics [11.498089180181365]
This work addresses distributed optimization, where a network of agents wants to minimize a global strongly convex objective function.
We propose a continuous-time distributed mirror descent that uses purely local information to converge to the global optimum.
arXiv Detail & Related papers (2020-09-14T21:11:42Z)
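Several entries above, the SMD/mSPS paper in particular (referenced earlier in this list), build on the same finite-dimensional update: mirror descent on the probability simplex with the KL (negative-entropy) geometry. The sketch below is a generic illustration under stated assumptions, not code from any listed paper; the Polyak-type stepsize is a hypothetical stand-in and may differ from the exact mSPS definition.

```python
import numpy as np

def entropic_mirror_descent(grad, x0, stepsize, n_iters=100):
    """Mirror descent on the probability simplex with the KL Bregman geometry:
    x_{k+1} proportional to x_k * exp(-t_k * grad(x_k)). Generic sketch only."""
    x = np.asarray(x0, dtype=float)
    for k in range(n_iters):
        g = grad(x)
        t = stepsize(x, g, k)
        x = x * np.exp(-t * g)        # multiplicative (exponentiated-gradient) step
        x = x / x.sum()               # renormalize onto the simplex
    return x

# Example: minimize f(x) = 0.5 * ||x - p||^2 over the simplex.
p = np.array([0.7, 0.2, 0.1])
grad_f = lambda x: x - p

def polyak_step(x, g, k, c=1.0, f_star=0.0, cap=10.0):
    # Polyak-type stepsize (f(x) - f^*) / (c * ||g||^2), capped; a hypothetical
    # stand-in for mSPS rather than its exact definition.
    f_val = 0.5 * np.sum((x - p) ** 2)
    return min((f_val - f_star) / (c * np.sum(g ** 2) + 1e-12), cap)

x = entropic_mirror_descent(grad_f, np.ones(3) / 3, polyak_step)
print(x)   # approaches p, which lies in the simplex
```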
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.