Projected Langevin dynamics and a gradient flow for entropic optimal
transport
- URL: http://arxiv.org/abs/2309.08598v1
- Date: Fri, 15 Sep 2023 17:55:56 GMT
- Title: Projected Langevin dynamics and a gradient flow for entropic optimal
transport
- Authors: Giovanni Conforti, Daniel Lacker, Soumik Pal
- Abstract summary: We introduce analogous diffusion dynamics that sample from the solution of an entropy-regularized optimal transport problem.
By studying the induced Wasserstein geometry of the submanifold $\Pi(\mu,\nu)$, we argue that the SDE can be viewed as a Wasserstein gradient flow on this space of couplings.
- Score: 0.8057006406834466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The classical (overdamped) Langevin dynamics provide a natural algorithm for
sampling from its invariant measure, which uniquely minimizes an energy
functional over the space of probability measures, and which concentrates
around the minimizer(s) of the associated potential when the noise parameter is
small. We introduce analogous diffusion dynamics that sample from an
entropy-regularized optimal transport, which uniquely minimizes the same energy
functional but constrained to the set $\Pi(\mu,\nu)$ of couplings of two given
marginal probability measures $\mu$ and $\nu$ on $\mathbb{R}^d$, and which
concentrates around the optimal transport coupling(s) for small regularization
parameter. More specifically, our process satisfies two key properties: First,
the law of the solution at each time stays in $\Pi(\mu,\nu)$ if it is
initialized there. Second, the long-time limit is the unique solution of an
entropic optimal transport problem. In addition, we show by means of a new
log-Sobolev-type inequality that the convergence holds exponentially fast, for
sufficiently large regularization parameter and for a class of marginals which
strictly includes all strongly log-concave measures. By studying the induced
Wasserstein geometry of the submanifold $\Pi(\mu,\nu)$, we argue that the SDE
can be viewed as a Wasserstein gradient flow on this space of couplings, at
least when $d=1$, and we identify a conjectural gradient flow for $d \ge 2$.
The main technical difficulties stem from the appearance of conditional
expectation terms which serve to constrain the dynamics to $\Pi(\mu,\nu)$.
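As a point of reference for the first sentence of the abstract, here is a minimal Python sketch (ours, with illustrative parameter values) of the classical overdamped Langevin dynamics via the Euler-Maruyama scheme. It simulates only the unconstrained baseline; the paper's projected process additionally carries conditional-expectation drift terms, not implemented here, that keep the law of the solution in $\Pi(\mu,\nu)$.

```python
# Minimal Euler-Maruyama simulation of classical overdamped Langevin
# dynamics: dX_t = -grad V(X_t) dt + sqrt(2 * eps) dW_t, whose invariant
# measure is proportional to exp(-V(x) / eps). Parameters are illustrative.
import numpy as np

def langevin_sample(grad_V, eps, n_particles=5_000, n_steps=2_000, dt=2e-3, d=2, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_particles, d))
    for _ in range(n_steps):
        noise = rng.standard_normal((n_particles, d))
        x += -grad_V(x) * dt + np.sqrt(2.0 * eps * dt) * noise
    return x

# Example: V(x) = |x|^2 / 2, so the invariant measure is N(0, eps * I).
samples = langevin_sample(lambda x: x, eps=0.5)
print(samples.var(axis=0))  # each coordinate's variance should be close to eps = 0.5
```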
Related papers
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z)
- Symmetric Mean-field Langevin Dynamics for Distributional Minimax Problems [78.96969465641024]
We extend mean-field Langevin dynamics to minimax optimization over probability distributions for the first time with symmetric and provably convergent updates.
We also study time and particle discretization regimes and prove a new uniform-in-time propagation of chaos result.
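For intuition only, here is a toy particle discretization in the spirit of mean-field Langevin descent-ascent; it is not the authors' symmetric scheme, and the objective, noise level, and step sizes are illustrative assumptions.

```python
# Toy noisy descent-ascent on two particle clouds for the saddle objective
# f(x, y) = x^2/2 - y^2/2 + x*y; the noise level lam mimics entropic
# regularization. Illustrative only -- not the paper's symmetric scheme.
import numpy as np

rng = np.random.default_rng(1)
n, dt, steps, lam = 2_000, 0.01, 4_000, 0.1
x = rng.standard_normal(n)  # min player's particles
y = rng.standard_normal(n)  # max player's particles

for _ in range(steps):
    mx, my = x.mean(), y.mean()
    gx = x + my   # E_y[df/dx] for the current clouds
    gy = mx - y   # E_x[df/dy]
    x_new = x - dt * gx + np.sqrt(2 * lam * dt) * rng.standard_normal(n)
    y_new = y + dt * gy + np.sqrt(2 * lam * dt) * rng.standard_normal(n)
    x, y = x_new, y_new

print(x.mean(), y.mean())  # both cloud means should hover near the saddle (0, 0)
```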
arXiv Detail & Related papers (2023-12-02T13:01:29Z)
- High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise [59.25598762373543]
We show high-probability convergence guarantees for learning on streaming data in the presence of heavy-tailed noise.
We demonstrate analytically how the nonlinearity can be chosen to suit a given problem.
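Gradient clipping is a standard example of the nonlinearities studied in this line of work. The sketch below is a hypothetical illustration, not the paper's algorithm: clipped SGD on a quadratic loss whose stochastic gradients carry heavy-tailed (standard Cauchy) noise.

```python
# Clipped SGD on the quadratic loss |theta|^2 / 2 with heavy-tailed
# (standard Cauchy) gradient noise; the clipping keeps each step bounded.
# Hypothetical illustration -- not the paper's exact setting.
import numpy as np

def clipped_sgd(theta0, grad_oracle, steps=50_000, lr=0.01, clip=1.0, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        g = grad_oracle(theta, rng)
        norm = np.linalg.norm(g)
        if norm > clip:
            g = g * (clip / norm)  # nonlinearity: project the gradient onto a ball
        theta -= lr * g
    return theta

grad = lambda th, rng: th + rng.standard_cauchy(th.shape)  # true gradient + heavy tails
print(clipped_sgd(np.full(5, 10.0), grad))  # ends up in a small neighborhood of 0
```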
arXiv Detail & Related papers (2023-10-28T18:53:41Z)
- Lower Complexity Adaptation for Empirical Entropic Optimal Transport [0.0]
Entropic optimal transport (EOT) presents an effective and computationally viable alternative to unregularized optimal transport (OT).
We derive novel statistical bounds for empirical plug-in estimators of the EOT cost.
Our techniques employ empirical process theory and rely on a dual formulation of EOT over a single function class.
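A minimal numpy sketch of such a plug-in estimator under a squared-Euclidean cost, using standard Sinkhorn iterations; the paper's contribution is the statistical analysis of estimators of this kind, so the cost, regularization value, and sample sizes below are illustrative assumptions.

```python
# Plug-in estimator of the entropic OT cost: run Sinkhorn on the empirical
# measures of two samples. Cost, regularization, and sample sizes are
# illustrative assumptions.
import numpy as np

def empirical_eot_cost(x, y, reg=0.5, n_iter=300):
    n, m = len(x), len(y)
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared-Euclidean cost
    K = np.exp(-C / reg)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):            # standard Sinkhorn fixed-point iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]    # entropic optimal coupling
    return (P * C).sum()

rng = np.random.default_rng(0)
x, y = rng.normal(0, 1, (500, 2)), rng.normal(1, 1, (500, 2))
# For these Gaussians the unregularized squared-Wasserstein cost is exactly 2.
print(empirical_eot_cost(x, y))
```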
arXiv Detail & Related papers (2023-06-23T16:06:13Z)
- Single Trajectory Nonparametric Learning of Nonlinear Dynamics [8.438421942654292]
Given a single trajectory of a dynamical system, we analyze the performance of the nonparametric least squares estimator (LSE).
We leverage recently developed information-theoretic methods to establish the optimality of the LSE for nonparametric hypothesis classes.
We specialize our results to a number of scenarios of practical interest, such as Lipschitz dynamics, generalized linear models, and dynamics described by functions in certain classes of Reproducing Kernel Hilbert Spaces (RKHS).
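To make the setting concrete, here is a toy single-trajectory least squares fit with a hypothetical Gaussian feature map; the dynamics, features, and noise level are our illustrative choices, not the paper's.

```python
# Toy single-trajectory nonparametric LSE: regress x_{t+1} on Gaussian
# features of x_t. Dynamics, features, and noise level are illustrative.
import numpy as np

rng = np.random.default_rng(0)
T = 2_000
x = np.zeros(T + 1)
for t in range(T):  # one trajectory of x_{t+1} = f*(x_t) + noise
    x[t + 1] = 0.8 * np.tanh(2 * x[t]) + 0.3 * rng.standard_normal()

centers = np.linspace(-2, 2, 25)
phi = lambda s: np.exp(-(s[:, None] - centers[None, :]) ** 2 / 0.5)
Phi = phi(x[:-1])                                  # feature matrix along the trajectory
w, *_ = np.linalg.lstsq(Phi, x[1:], rcond=None)    # least squares estimate of f*

grid = np.linspace(-1, 1, 5)
print(np.c_[0.8 * np.tanh(2 * grid), phi(grid) @ w])  # true f* vs. fit on visited states
```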
arXiv Detail & Related papers (2022-02-16T19:38:54Z)
- Mean-Square Analysis with An Application to Optimal Dimension Dependence of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymptotic analysis of sampling error in the 2-Wasserstein distance.
Our theoretical analysis is further validated by numerical experiments.
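As a worked example of the step-size bias that a mean-square analysis quantifies, consider Langevin Monte Carlo for a 1-D standard Gaussian target, where the iterates remain Gaussian and an elementary variance recursion (ours, not taken from the paper) exposes the 2-Wasserstein error in closed form.

```python
# Langevin Monte Carlo for the 1-D target exp(-x^2/2)/Z: the iterates stay
# Gaussian, and the variance obeys v <- (1 - h)^2 v + 2h exactly, so the
# stationary 2-Wasserstein error is |sqrt(v_inf) - 1| with v_inf = 1/(1 - h/2).
import numpy as np

for h in [0.5, 0.1, 0.02]:
    v = 0.0
    for _ in range(10_000):
        v = (1 - h) ** 2 * v + 2 * h   # exact variance recursion of the LMC chain
    print(h, abs(np.sqrt(v) - 1.0))    # step-size bias shrinks roughly like h / 4
```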
arXiv Detail & Related papers (2021-09-08T18:00:05Z)
- On Multimarginal Partial Optimal Transport: Equivalent Forms and Computational Complexity [11.280177531118206]
We study the multi-marginal partial optimal transport (POT) problem between $m$ discrete (unbalanced) measures with at most $n$ supports.
We first prove that we can obtain two equivalent forms of the multimarginal POT problem in terms of the multimarginal optimal transport problem via novel extensions of the cost tensor.
We demonstrate that the ApproxMPOT algorithm can approximate the optimal value of the multimarginal POT problem with a computational complexity upper bound of the order $\tilde{\mathcal{O}}(m^3(n+1)^m/\varepsilon^2)$.
arXiv Detail & Related papers (2021-08-18T06:46:59Z)
- Last iterate convergence of SGD for Least-Squares in the Interpolation regime [19.05750582096579]
We study the noiseless model in the fundamental least-squares setup.
We assume that an optimum predictor fits perfectly inputs and outputs, $\langle \theta_*, \phi(X) \rangle = Y$, where $\phi(X)$ stands for a possibly infinite-dimensional non-linear feature map.
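A minimal sketch of this noiseless interpolation setup with the identity feature map $\phi(x) = x$ (a finite-dimensional stand-in for the paper's possibly infinite-dimensional one): the last iterate of single-sample SGD converges without any averaging.

```python
# Noiseless least squares with phi(x) = x: Y = X @ theta_star exactly, and
# the last iterate of single-sample SGD converges without averaging.
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 200
theta_star = rng.standard_normal(d)
X = rng.standard_normal((n, d))
Y = X @ theta_star                      # interpolation: the optimum fits every pair

theta = np.zeros(d)
lr = 0.01
for _ in range(50_000):                 # plain SGD, one sample per step
    i = rng.integers(n)
    theta -= lr * (X[i] @ theta - Y[i]) * X[i]

print(np.linalg.norm(theta - theta_star))  # last-iterate error, near machine precision
```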
arXiv Detail & Related papers (2021-02-05T14:02:20Z)
- Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$, up to some logarithmic factor.
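For orientation, here is a toy tabular run of asynchronous Q-learning, in which only the entry of the visited state-action pair is updated along a single Markovian trajectory; the random MDP and step-size schedule are illustrative assumptions.

```python
# Toy asynchronous Q-learning: a single Markovian trajectory, with only the
# visited (s, a) entry updated at each step. MDP and step sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # transition kernel P[s, a, s']
R = rng.uniform(size=(nS, nA))                 # deterministic rewards

Q = np.zeros((nS, nA))
s = 0
for t in range(1, 200_001):
    a = rng.integers(nA)                       # uniform exploration policy
    s2 = rng.choice(nS, p=P[s, a])
    lr = 1.0 / (1.0 + 0.001 * t)               # slowly decaying step size
    Q[s, a] += lr * (R[s, a] + gamma * Q[s2].max() - Q[s, a])
    s = s2

bellman = R + gamma * (P @ Q.max(axis=1))      # Bellman optimality operator applied to Q
print(np.abs(bellman - Q).max())               # residual should be small at convergence
```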
arXiv Detail & Related papers (2020-06-04T17:51:00Z)
- Function approximation by neural nets in the mean-field regime: Entropic regularization and controlled McKean-Vlasov dynamics [7.1822457112352955]
We consider the problem of function approximation by two-layer neural nets with random weights that are "nearly Gaussian".
We show that the problem can be phrased as global minimization of a free energy functional on the space of paths of probability measures on the weights.
arXiv Detail & Related papers (2020-02-05T20:50:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.